Cohere Rerank 4.0

Cohere Rerank 4.0 is a rerank model available in two variants, Pro and Fast.

How Reranking Works

Reranking improves search relevance by reordering an initial set of retrieved results. After a first retrieval step returns candidate documents, the reranking model compares the query with each candidate and ranks the results from most relevant to least relevant. This helps surface results that better match the user’s intent.
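
The two-stage flow described above can be sketched in a few lines. This is a minimal illustration only: `overlap_score` is a hypothetical stand-in for the rerank model (a simple term-overlap ratio), not how Cohere Rerank computes relevance, and the function names are invented for this example.

```python
from typing import Callable, List, Optional, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the document.

    ASSUMPTION: placeholder for a real rerank model call.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Score every candidate against the query and sort by descending score.

    Returns (original_index, score) pairs so callers can map back to the
    first-stage results.
    """
    scored = [(index, score_fn(query, doc)) for index, doc in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


# Candidates as they might come back from a first retrieval step.
candidates = [
    "Quarterly revenue grew in the cloud segment",
    "How to reset a forgotten password",
    "Steps to reset your account password safely",
]
ranked = rerank("reset account password", candidates, top_n=2)
for index, score in ranked:
    print(f"{score:.2f}  {candidates[index]}")
```

Returning the original index alongside the score keeps the reranker independent of whatever metadata the first-stage retriever attaches to each candidate.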

Cohere Rerank 4.0 models support multilingual reranking and can also rerank semi-structured JSON content.

What’s New in Rerank 4.0

Compared with Cohere Rerank v3.5, Rerank 4.0 introduces a larger context window, improved reranking quality, support for model adaptation, and two variants optimized for different workload requirements.

Increased context window

Rerank 4.0 supports a 32,000-token context window, compared with the 4,000-token limit in Rerank v3.5. This larger context window improves handling for long documents and larger candidate inputs, which is useful for dense enterprise content such as financial reports, legal agreements, and technical documentation.
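
As a rough illustration of what the larger window means in practice, the sketch below checks whether a query and a document fit within each limit. Two assumptions are made here that the text does not confirm: token counts are estimated at about four characters per token rather than with the model's tokenizer, and the query and a single document are assumed to share the window.

```python
CONTEXT_WINDOW_V4 = 32_000   # tokens, Rerank 4.0 (from the text above)
CONTEXT_WINDOW_V35 = 4_000   # tokens, Rerank v3.5 (from the text above)
CHARS_PER_TOKEN = 4          # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count with ceiling division over character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(query: str, document: str, window: int) -> bool:
    """Return True if the estimated query + document tokens fit in the window.

    ASSUMPTION: the query and one document share the context window.
    """
    return estimate_tokens(query) + estimate_tokens(document) <= window


query = "termination clause notice period"
long_document = "x" * 60_000  # roughly 15,000 estimated tokens

print(fits(query, long_document, CONTEXT_WINDOW_V4))   # within the 32,000 limit
print(fits(query, long_document, CONTEXT_WINDOW_V35))  # exceeds the 4,000 limit
```

For production use, replace the heuristic with real token counts and confirm in the API documentation how longer inputs are handled.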

Improved reranking quality

Rerank 4.0 improves result ordering for enterprise retrieval workloads. Compared with Rerank v3.5, it provides stronger relevance ranking for business, finance, and technical content, which can improve the quality of downstream retrieval-augmented generation workflows by surfacing more relevant context.

Self-learning support

Rerank 4.0 introduces self-learning support, which lets you adapt reranking behavior to your data, terminology, and relevance preferences without requiring annotated training data. This can improve retrieval quality for specialized enterprise domains.

Pro and Fast variants

Rerank 4.0 is available in two variants:

  • Pro is optimized for higher-precision reranking and more complex retrieval tasks.
  • Fast is optimized for lower-latency, higher-throughput workloads.

Multilingual and semi-structured data support

Rerank 4.0 supports reranking for multilingual text and improves handling for semi-structured content, including JSON, tables, and code-like content. This makes it better suited for enterprise datasets that combine natural language with structured or partially structured fields.
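
One way to prepare semi-structured records for reranking is to flatten each JSON object into a single text document with a stable field order. The sketch below assumes documents are submitted as plain strings; the helper name `to_document_text` and the `field: value` layout are illustrative choices, not a format required by the model.

```python
import json
from typing import Any, Dict, List


def to_document_text(record: Dict[str, Any], field_order: List[str]) -> str:
    """Flatten a JSON-like record into 'field: value' lines.

    Fields are emitted in the given order and missing fields are skipped.
    Non-string values are serialized as compact JSON.
    ASSUMPTION: this layout is a convention for this example only.
    """
    lines = []
    for field in field_order:
        if field not in record:
            continue
        value = record[field]
        if not isinstance(value, str):
            value = json.dumps(value, ensure_ascii=False, sort_keys=True)
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


record = {
    "title": "Q3 report",
    "tags": ["finance", "emea"],
    "body": "Revenue rose 4%.",
}
document = to_document_text(record, ["title", "body", "tags"])
print(document)
```

Putting the most informative fields first is a common precaution in case long documents are truncated; whether that applies here depends on how the service handles inputs that exceed the context window.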

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Model Variants

Cohere Rerank 4.0 includes the following model variants:

Model | OCI Model Name | Description
Cohere Rerank 4.0 Pro | cohere.rerank-v4.0-pro | Multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for quality-focused and complex reranking workloads.
Cohere Rerank 4.0 Fast | cohere.rerank-v4.0-fast | Lightweight multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for lower-latency and higher-throughput workloads.

Dedicated AI Cluster for the Model

  • These models are available only in dedicated mode. They aren't available on-demand.
  • For dedicated mode, create a hosting dedicated AI cluster, create an endpoint for the model on that cluster, and then call the RerankText API or use the corresponding SDK.
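
Once an endpoint exists, a rerank call needs the query, the candidate documents, and a reference to the dedicated endpoint. The sketch below only assembles a request body as a plain dictionary. The field names (`compartmentId`, `input`, `documents`, `topN`, `servingMode`) are assumptions about the RerankText request shape and should be verified against the RerankText API documentation; the OCIDs are placeholders, and `build_rerank_request` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any, Dict, List, Optional


def build_rerank_request(
    query: str,
    documents: List[str],
    compartment_id: str,
    endpoint_id: str,
    top_n: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a rerank request body that targets a dedicated endpoint.

    ASSUMPTION: field names mirror the RerankText request; confirm before use.
    """
    if not documents:
        raise ValueError("at least one document is required")
    body: Dict[str, Any] = {
        "compartmentId": compartment_id,
        "input": query,
        "documents": list(documents),
        # Dedicated mode: reference the endpoint on the hosting cluster.
        "servingMode": {"servingType": "DEDICATED", "endpointId": endpoint_id},
    }
    if top_n is not None:
        body["topN"] = top_n
    return body


request_body = build_rerank_request(
    query="reset account password",
    documents=["How to reset a forgotten password", "Quarterly revenue summary"],
    compartment_id="ocid1.compartment.oc1..example",           # placeholder
    endpoint_id="ocid1.generativeaiendpoint.oc1..example",     # placeholder
    top_n=1,
)
print(json.dumps(request_body, indent=2))
```

In practice the SDK's request and serving-mode classes would replace the dictionary, and signing and transport would be handled by the SDK client.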

For the cluster unit size that matches each model, see the following table.

Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase
Model Name: Cohere Rerank 4.0 Pro; OCI Model Name: cohere.rerank-v4.0-pro | Not available for fine-tuning | Unit Size: RERANK_COHERE; Required Units: 1 | Pricing Page Product Name: Cohere Rerank - Dedicated | Limit Name: dedicated-unit-rerank-cohere-count; For Hosting, Request Limit Increase by: 1
Model Name: Cohere Rerank 4.0 Fast; OCI Model Name: cohere.rerank-v4.0-fast | Not available for fine-tuning | Unit Size: RERANK_COHERE; Required Units: 1 | Pricing Page Product Name: Cohere Rerank - Dedicated | Limit Name: dedicated-unit-rerank-cohere-count; For Hosting, Request Limit Increase by: 1

Tip

If your tenancy doesn't have enough cluster limits to host a Cohere Rerank 4.0 model on a dedicated AI cluster, request an increase of 1 for the dedicated-unit-rerank-cohere-count limit.

Endpoint Rules for Clusters

  • A dedicated AI cluster can hold up to 50 endpoints.
  • Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
  • Several endpoints for the same model make it easy to assign them to different users or purposes.
Hosting Cluster Unit Size: RERANK_COHERE
Endpoint Rules:
  • Base model: To run the cohere.rerank-v4.0-pro or cohere.rerank-v4.0-fast model on several endpoints, create as many endpoints as you need, up to the 50-endpoint limit, on a cluster with the RERANK_COHERE unit size.
  • Custom model: You can't fine-tune cohere.rerank-v4.0-pro or cohere.rerank-v4.0-fast, so you can't create or host custom models based on them.
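
The endpoint rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the `Endpoint` record and `validate_cluster` helper are hypothetical names for this example and don't correspond to any OCI SDK type; only the 50-endpoint limit and the same-model rule come from the text.

```python
from dataclasses import dataclass
from typing import List

MAX_ENDPOINTS_PER_CLUSTER = 50  # limit stated in the endpoint rules


@dataclass(frozen=True)
class Endpoint:
    name: str
    model_id: str    # base model name, or custom model version identifier
    is_custom: bool  # True if the endpoint points to a custom model


def validate_cluster(endpoints: List[Endpoint]) -> List[str]:
    """Return a list of rule violations; an empty list means the plan is valid."""
    problems = []
    if len(endpoints) > MAX_ENDPOINTS_PER_CLUSTER:
        problems.append(
            f"{len(endpoints)} endpoints exceed the limit of {MAX_ENDPOINTS_PER_CLUSTER}"
        )
    if len({endpoint.is_custom for endpoint in endpoints}) > 1:
        problems.append("cluster mixes base-model and custom-model endpoints")
    if len({endpoint.model_id for endpoint in endpoints}) > 1:
        problems.append("endpoints point to different models or model versions")
    return problems


plan = [
    Endpoint("search-team", "cohere.rerank-v4.0-pro", is_custom=False),
    Endpoint("support-team", "cohere.rerank-v4.0-pro", is_custom=False),
]
print(validate_cluster(plan))
```

Because the Rerank 4.0 models can't be fine-tuned, every endpoint on a RERANK_COHERE cluster is a base-model endpoint, so in practice only the endpoint count and the same-model rule matter here.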

Rerank Model Parameters

For the Rerank model parameters, see the RerankText API documentation.