Cohere Rerank 4.0
Cohere Rerank 4.0 is a rerank model available in two variants, Pro and Fast.
How Reranking Works
Reranking improves search relevance by reordering an initial set of retrieved results. After a first retrieval step returns candidate documents, the reranking model compares the query with each candidate and ranks the results from most relevant to least relevant. This helps surface results that better match the user’s intent.
Cohere Rerank 4.0 models support multilingual reranking and can also rerank semi-structured JSON content.
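The two-stage flow described above can be sketched in a few lines. This is a minimal illustration only: `toy_relevance` is a hypothetical stand-in scorer based on term overlap, not the Cohere model, and the function names are assumptions made for this example.

```python
from typing import Callable, List, Optional, Tuple


def toy_relevance(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query terms found in the document.

    A real reranking model replaces this with a learned relevance score.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = toy_relevance,
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Score every candidate against the query and sort from most to least relevant.

    Returns (original_index, score) pairs so callers can map back to the
    first-stage retrieval results.
    """
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n] if top_n is not None else scored


# Candidates as they might come back from a first retrieval step.
candidates = [
    "Quarterly revenue grew in the cloud segment",
    "How to reset a user password",
    "Steps to reset a forgotten password for a user account",
]
ranked = rerank("reset forgotten password", candidates, top_n=2)
for index, score in ranked:
    print(f"{score:.2f}  {candidates[index]}")
```

Returning the original index alongside the score keeps the reranking step independent of how the candidates were retrieved.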
What’s New in Rerank 4.0
Compared with Cohere Rerank v3.5, Rerank 4.0 introduces a larger context window, improved reranking quality, support for model adaptation, and two variants optimized for different workload requirements.
Increased context window
Rerank 4.0 supports a 32,000-token context window, compared with the 4,000-token limit in Rerank v3.5. This larger context window improves handling for long documents and larger candidate inputs, which is useful for dense enterprise content such as financial reports, legal agreements, and technical documentation.
Improved reranking quality
Rerank 4.0 improves result ordering for enterprise retrieval workloads. Compared with Rerank v3.5, it provides stronger relevance ranking for business, finance, and technical content, which can improve the quality of downstream retrieval-augmented generation workflows by surfacing more relevant context.
Self-learning support
Rerank 4.0 introduces self-learning support, which lets you adapt reranking behavior to your data, terminology, and relevance preferences without requiring annotated training data. This can improve retrieval quality for specialized enterprise domains.
Pro and Fast variants
Rerank 4.0 is available in two variants:
- Pro is optimized for higher-precision reranking and more complex retrieval tasks.
- Fast is optimized for lower-latency, higher-throughput workloads.
Multilingual and semi-structured data support
Rerank 4.0 supports reranking for multilingual text and improves handling for semi-structured content, including JSON, tables, and code-like content. This makes it better suited for enterprise datasets that combine natural language with structured or partially structured fields.
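One way to prepare semi-structured records for reranking is to serialize each record to a string before sending it as a candidate document. The sketch below assumes that candidates are passed as strings and that JSON text with sorted keys is an acceptable representation; `to_rerank_documents` and the sample records are hypothetical, so check the RerankText API documentation for the expected document format.

```python
import json
from typing import Any, Dict, List


def to_rerank_documents(records: List[Dict[str, Any]]) -> List[str]:
    """Serialize each record to a JSON string with a stable key order.

    ensure_ascii=False keeps non-English text readable instead of escaping it.
    """
    return [json.dumps(record, sort_keys=True, ensure_ascii=False) for record in records]


# Hypothetical records that mix natural language with structured fields.
records = [
    {"title": "Master services agreement", "type": "legal", "pages": 42},
    {"title": "Rapport financier annuel", "type": "finance", "pages": 120},
]
documents = to_rerank_documents(records)
for document in documents:
    print(document)
```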
Regions for this Model
For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.
Access this Model
The API links list the endpoints for all supported commercial, sovereign, and government regions.
Model Variants
Cohere Rerank 4 includes the following model variants:
| Model | OCI Model Name | Description |
|---|---|---|
| Cohere Rerank 4 Pro | cohere.rerank-v4.0-pro | Multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for quality-focused and complex reranking workloads. |
| Cohere Rerank 4 Fast | cohere.rerank-v4.0-fast | Lightweight multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for lower-latency and higher-throughput workloads. |
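The choice between the two variants can be captured in a small helper. The model names come from the table above; the `choose_model` function and its `latency_sensitive` flag are hypothetical names used only for this sketch.

```python
# OCI model names as listed in the Model Variants table.
MODEL_NAMES = {
    "pro": "cohere.rerank-v4.0-pro",
    "fast": "cohere.rerank-v4.0-fast",
}


def choose_model(latency_sensitive: bool) -> str:
    """Pick Fast for lower-latency, higher-throughput workloads, otherwise Pro."""
    return MODEL_NAMES["fast" if latency_sensitive else "pro"]


print(choose_model(latency_sensitive=True))
print(choose_model(latency_sensitive=False))
```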
Dedicated AI Cluster for the Model
- This model is available only in dedicated mode. It isn't available on-demand.
- For dedicated mode, create a hosting dedicated AI cluster, create an endpoint on that cluster to host the model, and then call the RerankText API or use the corresponding SDK.
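As a rough illustration of a dedicated-mode call, the following sketch assembles a request body for the RerankText API as a plain dictionary. The field names (`compartmentId`, `servingMode`, `servingType`, `endpointId`, `input`, `documents`, `topN`) are assumptions about the request shape and should be verified against the RerankText API documentation. The OCID values are placeholders, and `build_rerank_request` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any, Dict, List


def build_rerank_request(
    query: str,
    documents: List[str],
    compartment_id: str,
    endpoint_id: str,
    top_n: int = 3,
) -> Dict[str, Any]:
    """Assemble a request body that targets an endpoint on a dedicated AI cluster.

    Field names are assumed; confirm them in the RerankText API documentation.
    """
    return {
        "compartmentId": compartment_id,
        # Dedicated mode: the request is routed to the endpoint created on the cluster.
        "servingMode": {"servingType": "DEDICATED", "endpointId": endpoint_id},
        "input": query,
        "documents": list(documents),
        "topN": top_n,
    }


request = build_rerank_request(
    query="termination clause notice period",
    documents=["Either party may terminate with 30 days notice.", "Invoices are due in 45 days."],
    compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
    endpoint_id="ocid1.generativeaiendpoint.oc1..example",  # placeholder OCID
)
print(json.dumps(request, indent=2))
```

Sending the request requires a signed call through an OCI SDK or CLI, which is omitted here.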
For the cluster unit size that matches each model, see the following table.
| Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
|---|---|---|---|---|
| | Not available for fine-tuning | | | |
| | Not available for fine-tuning | | | |
If your tenancy doesn't have enough cluster limits to host the Cohere Rerank 4.0 model on a dedicated AI cluster, request an increase of 1 for the dedicated-unit-rerank-cohere-count limit.
Endpoint Rules for Clusters
- A dedicated AI cluster can hold up to 50 endpoints.
- Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
- Several endpoints for the same model make it easy to assign them to different users or purposes.
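The rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the `Endpoint` class, the `model_id` field, and `validate_cluster` are hypothetical names, and the limit of 50 is the default stated above before any limit increase.

```python
from dataclasses import dataclass
from typing import List

MAX_ENDPOINTS_PER_CLUSTER = 50  # default limit; can be raised with a limit increase request


@dataclass(frozen=True)
class Endpoint:
    name: str
    model_id: str  # the base model name, or the id of one custom model version


def validate_cluster(endpoints: List[Endpoint]) -> List[str]:
    """Return a list of rule violations for the endpoints planned on one cluster."""
    problems = []
    if len(endpoints) > MAX_ENDPOINTS_PER_CLUSTER:
        problems.append(
            f"{len(endpoints)} endpoints exceed the limit of {MAX_ENDPOINTS_PER_CLUSTER}"
        )
    # All endpoints must point to the same base model or the same custom model version.
    if len({endpoint.model_id for endpoint in endpoints}) > 1:
        problems.append("endpoints on one cluster must all point to the same model")
    return problems


planned = [
    Endpoint("search-team", "cohere.rerank-v4.0-pro"),
    Endpoint("support-team", "cohere.rerank-v4.0-pro"),
]
print(validate_cluster(planned))
```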
| Hosting Cluster Unit Size | Endpoint Rules |
|---|---|
| RERANK_COHERE | To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster. For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Creating a Limit Increase Request and Service Limits for Generative AI. |
Release and Retirement Dates
For release and retirement dates and replacement model options, see the following page:
Rerank Model Parameters
For the Rerank model parameters, see the RerankText API documentation.