Use Text and Image Together in Cohere Embed 4

OCI Generative AI now supports new features for Cohere Embed 4, including configurable embedding dimensions and mixed text-and-image input through the EmbedText API.

With this update, you can generate embeddings from text only, one image only, or text combined with one image in the same API payload. This helps represent related text and visual context together, such as a product image with a description, a chart with explanatory text, or an image from a document with surrounding content.

To include text and one image in the same request, use the new embedContents array in the EmbedTextDetails request body. The embedContents array is supported only for Embed 4 models. Each item in the array is an EmbedContent object that can contain either text or image content.
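As a rough sketch, a client might assemble such a request body as follows. The embedContents array name comes from this announcement; the per-item field names (`type`, `text`, `image`) and the serving-mode/compartment fields are assumptions for illustration, not verified API names:

```python
# Sketch of an EmbedText request body that combines several text items
# with one base64-encoded image (Embed 4 only).
# ASSUMPTIONS: item field names "type"/"text"/"image" and the servingMode
# shape are illustrative, not confirmed API names.
import base64

def build_embed_payload(texts, image_path,
                        model_id="cohere.embed-v4.0",
                        compartment_id="ocid1.compartment.oc1..example"):
    """Build an EmbedTextDetails-style body with texts plus one image."""
    contents = [{"type": "TEXT", "text": t} for t in texts]
    with open(image_path, "rb") as f:
        # Images are passed through the API as base64-encoded data.
        encoded = base64.b64encode(f.read()).decode("ascii")
    contents.append({"type": "IMAGE", "image": encoded})
    return {
        "compartmentId": compartment_id,
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "embedContents": contents,  # supported only for Embed 4 models
    }
```

The same shape covers text-only (omit the image item) and image-only (omit the text items) requests.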

Key Features

  • Matryoshka embeddings: Configure the output embedding dimension as 256, 512, 1,024, or 1,536
  • Expanded multimodal input: Generate embeddings from:
    • Text only
    • One image only
    • Text and one image in the same API payload
  • API-based image input: Supply images through the API as base64-encoded data
  • One image per payload: You can include multiple text inputs with one image, up to the maximum input size
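Matryoshka embeddings are trained so that a prefix of the full vector is itself a usable lower-dimensional embedding. The service can return the smaller size directly via the configurable dimension, but the idea can also be sketched client-side by truncating a full-length vector and re-normalizing (illustrative only; the helper name is hypothetical):

```python
# Matryoshka-style truncation: keep the first k dimensions of a
# full-length embedding and L2-normalize the result.
# Illustrative sketch; in practice you can request 256, 512, 1,024,
# or 1,536 dimensions from the service directly.
import math

def truncate_embedding(vector, k):
    """Return the first k dimensions, re-normalized to unit length."""
    if k > len(vector):
        raise ValueError("k exceeds the embedding length")
    head = vector[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

Smaller dimensions trade a little retrieval quality for lower storage and faster similarity search.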

For migration instructions from Embed 3 to Embed 4, see the Cohere Embed 4 model page.

For supported regions, see Generative AI Models by Region. For information about the service, see the Generative AI documentation.