Cohere Embed 4

Cohere Embed 4 (cohere.embed-v4.0) is a multimodal embedding model that generates embeddings from text, one image, or text and one image in the same API payload. Image input is available through the API only.

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Key Features

  • Matryoshka embeddings: Supports output dimensions of 256, 512, 1,024, and 1,536. This feature isn't supported in Embed 3 models.
  • Input limits:
    • Console: Up to 96 text inputs per run, with each text input under 512 tokens. This limit applies to on-demand mode.
    • SDK and API: Up to 128,000 total input tokens per run.
  • Output dimensions:
    • Console: 1,536
    • API: 1,536 by default; supports 256, 512, 1,024, and 1,536
  • Input mode:
    • API: Text only, one image only, or several text inputs with one image in the same payload.
    • Only one image is allowed per payload, and image input is available through the API only.
  • Image input:
    • Requires a base64-encoded image.
    • A 512 x 512 image is about 1,610 tokens.
  • Language support:
    • Text: English and multilingual
    • Image: English only
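Because image input must be base64 encoded (see Image input above), the encoding step can be sketched in a few lines of Python; the placeholder bytes below stand in for the contents of a real image file:

```python
import base64

def to_base64(image_bytes: bytes) -> str:
    # Embed 4 requires image input to be base64 encoded
    return base64.b64encode(image_bytes).decode("ascii")

# Placeholder bytes standing in for a real image, e.g.
# open("photo.png", "rb").read()
encoded = to_base64(b"\x89PNG\r\n\x1a\n")
```

Keep the 512 x 512 ≈ 1,610-token estimate in mind when budgeting the request against the total input token limit.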

Use Text and Image in the EmbedText API

To include an image with text, use the embedContents attribute in the EmbedTextDetails request body for the EmbedText API.

The embedContents attribute is an array and is supported only for Embed 4 models. Each item in the array is an EmbedContent object. An EmbedContent object can contain either text content or image content.

Use embedContents when you want to send text and image content in the same EmbedText request. You can include several text entries and one image, up to the maximum input size.

The other parameters for the EmbedText API remain the same.

Important

The embedContents attribute is supported only by Embed 4 models. Don't use embedContents with Embed 3 models.
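As a sketch of what such a request body might look like, the dictionary below mixes several text entries with one image. The field names inside each EmbedContent item ("type", "text", "image") and the data-URI image format are illustrative assumptions, not a verified schema; check the EmbedText API reference for the exact shape:

```python
# Hypothetical EmbedTextDetails payload combining text and one image.
# Item field names and the data-URI format are assumptions for
# illustration, not a verified schema.
payload = {
    "compartmentId": "ocid1.compartment.oc1..example",  # placeholder OCID
    "servingMode": {"servingType": "ON_DEMAND", "modelId": "cohere.embed-v4.0"},
    "embedContents": [
        {"type": "TEXT", "text": "A red bicycle leaning against a brick wall"},
        {"type": "TEXT", "text": "Product photo for the spring catalog"},
        {"type": "IMAGE", "image": "data:image/png;base64,iVBORw0..."},  # truncated placeholder
    ],
}

# Only one image is allowed per payload
image_count = sum(1 for c in payload["embedContents"] if c["type"] == "IMAGE")
```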

On-Demand Mode

On-demand mode is pay-as-you-go and is useful for experimentation, proof-of-concept work, and model evaluation. On the pricing page, this model is listed as:

  • Model Name: Cohere Embed 4
  • OCI Model Name: cohere.embed-v4.0
  • Pricing Page Product Name: Embed Cohere
Important

Dynamic Throttling Limit Change for On-Demand Mode

OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access. Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.

Tip

Because rate limits can change, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success, following industry best practices and improving the overall stability and performance of your integration with the service.
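The recommended exponential back-off can be sketched as a generic retry wrapper; `request_fn` below is a placeholder for any call into the service:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn with exponential back-off plus jitter.

    request_fn is any callable that raises when the service rejects
    (throttles) the request.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Delay grows exponentially (base, 2x, 4x, ...) with random
            # jitter so retries from many clients don't synchronize
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

In production you would catch only the throttling error your SDK raises (for example, an HTTP 429 response) rather than every exception.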

Dedicated AI Cluster for the Model

To use this model with a dedicated AI cluster, create an endpoint for the model in a supported region.

Base Model
  • Model Name: Cohere Embed 4
  • OCI Model Name: cohere.embed-v4.0

Fine-Tuning Cluster
  • Not available for fine-tuning

Hosting Cluster
  • Unit Size: Embed Cohere
  • Required Units: 1

Pricing Page Information
  • Pricing Page Product Name: Embed Cohere - Dedicated
  • For Hosting, Multiply the Unit Price: x1

Request Cluster Limit Increase
  • Limit Name: dedicated-unit-embed-cohere-count
  • For Hosting, Request Limit Increase by: 1
Tip

If you don't have enough hosting capacity, request an increase for the dedicated-unit-embed-cohere-count limit.

Endpoint Rules for Clusters

  • A dedicated AI cluster can hold up to 50 endpoints.
  • All endpoints on a cluster must point to the same base model or to the same version of a custom model; you can't mix the two types on one cluster.
  • Several endpoints for the same model make it easy to assign them to different users or purposes.
Hosting Cluster Unit Size: Embed Cohere

Endpoint Rules
  • Base model: To run the cohere.embed-v4.0 model on several endpoints, create as many endpoints as you need on an Embed Cohere unit-size cluster.
  • Custom model: You can't fine-tune cohere.embed-v4.0, so you can't create or host custom models built from that base.
Cluster Performance Benchmarks

Review the Cohere Embed 4 cluster performance benchmarks for different use cases.

Input Data for Text Embeddings

For text embeddings, you can add sentences, phrases, or paragraphs. In the Console, you can enter text directly or upload a .txt file.

If you use an input file, separate each input sentence, phrase, or paragraph with a newline character.

Console limits:

  • Maximum 96 text inputs per run
  • Each text input must be under 512 tokens

SDK and API limits:

  • Up to 128,000 total input tokens per run
  • Text and image inputs together count toward the total input token limit
  • Only one image is allowed per payload
  • Image input must be base64 encoded

If an input is too long, use the truncate parameter to truncate the start or end of the input. If the input exceeds the token limit and truncate is set to None, the request returns an error.
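The newline-separated file convention and the Console's 96-input limit can be checked client-side before submitting; a minimal sketch:

```python
def split_inputs(text: str, max_inputs: int = 96) -> list[str]:
    """Split newline-separated embedding inputs (one sentence, phrase,
    or paragraph per line) and enforce the Console's 96-input limit."""
    inputs = [line.strip() for line in text.splitlines() if line.strip()]
    if len(inputs) > max_inputs:
        raise ValueError(f"{len(inputs)} inputs exceeds the limit of {max_inputs}")
    return inputs

inputs = split_inputs("A first phrase\nA second phrase\n\nA third phrase")
```

Per-input token counts still need a real tokenizer; this sketch only checks the input count.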

Embedding Model Parameters

You can change the following parameters when using embedding models.

Truncate (truncate)

Truncates tokens at the start or end when the input exceeds the maximum token limit. With truncation disabled (None), an over-limit input returns an error instead.

Embedding Types (embeddingTypes)

Supported values:

  • float (Default)
  • int8
  • uint8
  • binary
  • ubinary
  • base64
Output Dimensions (outputDimensions)

Supported values:

  • 256
  • 512
  • 1024
  • 1536 (Default)
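The allowed values above can be enforced client-side before a request is sent. In this sketch the truncate value casing (NONE, START, END) is an assumption; verify it against your SDK's API reference:

```python
# Allowed values as documented above; truncate casing is assumed.
VALID_TRUNCATE = {"NONE", "START", "END"}
VALID_EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary", "base64"}
VALID_OUTPUT_DIMENSIONS = {256, 512, 1024, 1536}

def validate_params(truncate="NONE", embedding_types=("float",),
                    output_dimensions=1536):
    """Check embedding parameters against the documented values."""
    if truncate not in VALID_TRUNCATE:
        raise ValueError(f"invalid truncate: {truncate}")
    unknown = set(embedding_types) - VALID_EMBEDDING_TYPES
    if unknown:
        raise ValueError(f"invalid embeddingTypes: {unknown}")
    if output_dimensions not in VALID_OUTPUT_DIMENSIONS:
        raise ValueError(f"invalid outputDimensions: {output_dimensions}")
    return {
        "truncate": truncate,
        "embeddingTypes": list(embedding_types),
        "outputDimensions": output_dimensions,
    }
```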

Migrating from Embed 3 to Embed 4

When migrating from Embed 3 to Embed 4, we recommend changing the vector size from 1,024 to 1,536 dimensions and using a new index to help avoid downtime.

  1. Create a new vector index

    Create a new index or collection in your vector database configured for 1,536 dimensions.

  2. Re-embed the data

    Reprocess the source documents with cohere.embed-v4.0 and set outputDimensions=1536. Store the new embeddings in the new index.

  3. Update query logic

    Update the application to use Embed 4 for incoming search queries. Use:
    • input_type="search_query" for queries
    • input_type="search_document" for stored documents
  4. Cut over

    After the new index is fully populated and tested, update the application to use the new 1,536-dimension index.
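The re-embedding step above (step 2, with the input_type values from step 3) can be sketched end to end. Here `embed` and `new_index` are hypothetical placeholders, not real APIs: `embed` stands in for your Generative AI client call and `new_index` for a vector index configured for 1,536 dimensions:

```python
def migrate(documents, embed, new_index, batch_size=96):
    """Re-embed documents with Embed 4 into a new 1,536-dimension index.

    `embed` and `new_index` are placeholder stand-ins: `embed` wraps a
    call to cohere.embed-v4.0, `new_index` is the new vector index.
    """
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        vectors = embed(
            texts=batch,
            model="cohere.embed-v4.0",
            input_type="search_document",  # stored documents
            output_dimensions=1536,        # new vector size
        )
        new_index.upsert(list(zip(batch, vectors)))

# Incoming search queries are then embedded with
# input_type="search_query" against the same 1,536-dimension index.
```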