Cohere Embed 4
Cohere Embed 4 (cohere.embed-v4.0) is a multimodal embedding model that generates embeddings from text, one image, or text and one image in the same API payload. Image input is available through the API only.
Regions for this Model
For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.
Access this Model
The API links list the endpoints for all supported commercial, sovereign, and government regions.
Key Features
- Matryoshka embeddings: Supports output dimensions of 256, 512, 1,024, and 1,536. This feature isn't supported in Embed 3 models.
- Input limits:
- Console: Up to 96 text inputs per run, with each text input under 512 tokens. This limit applies to on-demand mode.
- SDK and API: Up to 128,000 total input tokens per run.
- Output dimensions:
- Console: 1,536
- API: 1,536 by default; supports 256, 512, 1,024, and 1,536
- Input mode:
- API: Supports text only, one image only, or several text inputs with one image in the same payload.
- Only one image is allowed per payload, and image input is available through the API only.
- Image input:
- Requires a base64-encoded image.
- A 512 x 512 image is about 1,610 tokens.
- Language support:
- Text: English and multilingual
- Image: English only
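The base64 requirement for image input can be sketched as follows; this shows only the encoding step, not how the encoded string is wrapped into the request body, which depends on the EmbedText API schema:

```python
import base64

def encode_image_for_embedding(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes for the API's image input.

    A minimal sketch: reads raw bytes (for example, from a PNG file)
    and returns the base64 string the API expects.
    """
    return base64.b64encode(image_bytes).decode("ascii")
```

In practice you would read the bytes with `open(path, "rb").read()` before encoding.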
Use Text and Image in the EmbedText API
To include an image with text, use the embedContents attribute in the EmbedTextDetails request body for the EmbedText API.
The embedContents attribute is an array and is supported only for Embed 4 models. Each item in the array is an EmbedContent object. An EmbedContent object can contain either text content or image content.
Use embedContents when you want to send text and image content in the same EmbedText request. You can include several text entries and one image, up to the maximum input size.
The other parameters for the EmbedText API remain the same.
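A mixed text-and-image request body might be assembled like this. Only the embedContents attribute name and the one-image-per-payload rule come from this page; the per-item field names ("text", "image") and the servingMode shape are assumptions, so check the EmbedContent schema in the API reference before relying on them:

```python
import base64

def build_embed_text_payload(texts, image_bytes=None,
                             model_id="cohere.embed-v4.0"):
    """Assemble an illustrative EmbedText request body using embedContents.

    Each entry in embedContents holds either text content or image
    content; at most one image is included, matching the payload rule.
    """
    contents = [{"text": t} for t in texts]
    if image_bytes is not None:
        # The API requires the image to be base64 encoded.
        contents.append(
            {"image": base64.b64encode(image_bytes).decode("ascii")}
        )
    return {
        "servingMode": {"servingType": "ON_DEMAND", "modelId": model_id},
        "embedContents": contents,
    }
```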
Note: The embedContents attribute is supported only by Embed 4 models. Don't use embedContents with Embed 3 models.
On-Demand Mode
On-demand mode is pay-as-you-go and is useful for experimentation, proof-of-concept work, and model evaluation. On the pricing page, this model is listed as:
| Model Name | OCI Model Name | Pricing Page Product Name |
|---|---|---|
| Cohere Embed 4 | cohere.embed-v4.0 | Embed Cohere |
Dynamic Throttling Limit Change for On-Demand Mode
OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access. Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.
Because rate limits can change, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid retries can lead to further rejections, increased latency, and potentially a temporary block of the client by the Generative AI service. A back-off strategy, such as exponential back-off, spreads requests more evenly, reduces load, and improves retry success, following industry best practices and improving the overall stability and performance of your integration with the service.
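An exponential back-off loop like the one recommended above can be sketched as follows; the `send_request` callable, retry counts, and delays are illustrative and are not part of the OCI SDK:

```python
import random
import time

def call_with_backoff(send_request, max_retries=5,
                      base_delay=1.0, max_delay=30.0):
    """Retry a throttled call with exponential back-off and jitter.

    send_request is any callable that raises an exception on a
    throttling (429) response. The delay doubles on each attempt,
    is capped at max_delay, and is jittered to avoid synchronized
    retries from many clients.
    """
    for attempt in range(max_retries):
        try:
            return send_request()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```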
Dedicated AI Cluster for the Model
To use this model with a dedicated AI cluster, create an endpoint for the model in a supported region.
| Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
|---|---|---|---|---|
| Cohere Embed 4 (cohere.embed-v4.0) | Not available for fine-tuning | Unit size: Embed Cohere | Pricing page product name: Embed Cohere | Limit name: dedicated-unit-embed-cohere-count |
If you don't have enough hosting capacity, request an increase for the dedicated-unit-embed-cohere-count limit.
Endpoint Rules for Clusters
- A dedicated AI cluster can hold up to 50 endpoints.
- All endpoints on a cluster must point either to the same base model or to the same version of a custom model; you can't mix the two types on one cluster.
- Several endpoints for the same model make it easy to assign them to different users or purposes.
| Hosting Cluster Unit Size | Endpoint Rules |
|---|---|
| Embed Cohere | To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster.<br>For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Creating a Limit Increase Request and Service Limits for Generative AI. |
Cluster Performance Benchmarks
Review the Cohere Embed 4 cluster performance benchmarks for different use cases.
OCI Release and Retirement Dates
For release and retirement dates and replacement model options, see the release and retirement pages for the on-demand and dedicated serving modes.
Input Data for Text Embeddings
For text embeddings, you can add sentences, phrases, or paragraphs. In the Console, you can enter text directly or upload a .txt file.
If you use an input file, separate each input sentence, phrase, or paragraph with a newline character.
Console limits:
- Maximum 96 text inputs per run
- Each text input must be under 512 tokens
SDK and API limits:
- Up to 128,000 total input tokens per run
- Text and image inputs together count toward the total input token limit
- Only one image is allowed per payload
- Image input must be base64 encoded
If an input is too long, use the truncate parameter to truncate the start or end of the input. If the input exceeds the token limit and truncate is set to None, the request returns an error.
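The 96-inputs-per-run limit suggests batching large corpora client-side. A minimal sketch, with the batch size defaulting to the Console limit above:

```python
def batch_inputs(texts, batch_size=96):
    """Split a list of text inputs into batches of at most batch_size
    entries each, matching the per-run input limit described above."""
    return [texts[i:i + batch_size]
            for i in range(0, len(texts), batch_size)]
```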
Embedding Model Parameters
You can change the following parameters when using embedding models.
- Truncate (truncate): Truncates tokens at the start or end when the input exceeds the maximum token limit.
- Embedding Types (embeddingTypes): Supported values: float (default), int8, uint8, binary, ubinary, base64.
- Output Dimensions (outputDimensions): Supported values: 256, 512, 1024, 1536 (default).
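The three parameters above might be combined into a request fragment like this. The parameter names and allowed values come from the list above; the dict layout itself and the truncate default are assumptions, not the exact API schema:

```python
def embedding_params(truncate="NONE", embedding_types=None,
                     output_dimensions=1536):
    """Build the parameter portion of an embedding request.

    Validates outputDimensions against the supported values and
    defaults embeddingTypes to float, the documented default.
    """
    allowed_dims = {256, 512, 1024, 1536}
    if output_dimensions not in allowed_dims:
        raise ValueError(
            f"outputDimensions must be one of {sorted(allowed_dims)}")
    return {
        "truncate": truncate,
        "embeddingTypes": embedding_types or ["float"],
        "outputDimensions": output_dimensions,
    }
```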
Migrating from Embed 3 to Embed 4
When migrating from Embed 3 to Embed 4, we recommend changing the vector size from 1,024 to 1,536 dimensions and using a new index to help avoid downtime.
1. Create a new vector index: Create a new index or collection in your vector database configured for 1,536 dimensions.
2. Re-embed the data: Reprocess the source documents with cohere.embed-v4.0 and set outputDimensions=1536. Store the new embeddings in the new index.
3. Update query logic: Update the application to use Embed 4 for incoming search queries. Use input_type="search_query" for queries and input_type="search_document" for stored documents.
4. Cut over: After the new index is fully populated and tested, update the application to use the new 1,536-dimension index.
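The re-embed and query steps can be sketched as two request builders that differ only in the input type and target the 1,536-dimension index. The input_type values and outputDimensions come from the migration notes above; the surrounding payload shape is an assumption:

```python
MODEL_ID = "cohere.embed-v4.0"
NEW_DIMS = 1536  # Embed 4 target, up from Embed 3's 1,024

def document_embed_request(docs):
    """Request fragment for re-embedding stored documents (step 2)."""
    return {"modelId": MODEL_ID, "inputs": docs,
            "input_type": "search_document",
            "outputDimensions": NEW_DIMS}

def query_embed_request(query):
    """Request fragment for embedding an incoming search query (step 3)."""
    return {"modelId": MODEL_ID, "inputs": [query],
            "input_type": "search_query",
            "outputDimensions": NEW_DIMS}
```

Keeping the two builders side by side makes it harder to accidentally embed queries and documents with mismatched input types or dimensions.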