Google Gemini 2.5 Pro

Google Gemini 2.5 Pro (google.gemini-2.5-pro) is a multimodal reasoning model that excels at solving complex problems and is the most advanced reasoning Gemini model to date. It's the next iteration after the Gemini 2.0 series and performs better than those models. Gemini 2.5 Pro is great at understanding large datasets and complex problems from different types of input, such as text, images, and code.

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Key Features

Model Name in OCI Generative AI: google.gemini-2.5-pro
Available On-Demand: Access this model on-demand, through the Console playground or the API.
Multimodal Support: Input text, code, and images and get a text output. Documents, audio, and video file inputs are supported through API only. See Document Understanding, Image Understanding, Audio Understanding and Video Understanding.
Knowledge: Has deep domain knowledge in science, mathematics, and code.
Context Length: One million tokens
Maximum Input Tokens: 1,048,576 (Console and API)
Maximum Output Tokens: 65,536 (default) (Console and API)
Excels at These Use Cases: Applications that require powerful in-depth thinking, enhanced reasoning, detailed explanations and deep understanding, such as advanced coding, scientific analysis, and complex content extraction.
Has Reasoning: Yes. Also strong at visual reasoning and image understanding. For reasoning problems, increase the maximum output tokens. See Model Parameters.
Knowledge Cutoff: January 2025

See the following table for the features supported in the Google Vertex AI Platform for OCI Generative AI.

Supported Gemini 2.5 Pro Features
Feature	Supported?
Code execution	Yes
Tuning	No
System instructions	Yes
Structured output	Yes
Batch prediction	No
Function calling	Yes
Count Tokens	No
Thinking	Yes, but turning off the thinking process isn't supported.
Context caching	Yes, the model can cache the input tokens, but this feature isn't controlled through the API.
Vertex AI RAG Engine	No
Chat completions	Yes

For key feature details, see the Google Gemini 2.5 Pro documentation and the Google Gemini 2.5 Pro model card.

Document Understanding

Supported Content Type

Console: not available
API: Supported files are text/plain for text files and application/pdf for PDF files when using inline data.

Supported Document Inputs for the API

URL: Convert a supported document format to a base64-encoded version of the document.
URI: Submit the document as a Uniform Resource Identifier (URI) so that the model can access the file without you uploading it.

For the format, see DocumentContent Reference.

Technical Details

See Document Understanding in the Gemini API documentation.

Image Understanding

Image Size

Console: Maximum image size: 5 MB
API: Maximum images per prompt: 3,000 and maximum image size before encoding: 7 MB

Supported Image Inputs

Console: png and jpeg formats
API: In the Chat operation, submit a base64-encoded version of an image. For example, a 512 x 512 image typically converts to around 1,610 tokens. Supported MIME types are image/png, image/jpeg, image/webp, image/heic, and image/heif. For the format, see ImageContent Reference.
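As a minimal sketch of the API image input described above, the following helper base64-encodes an image after checking the MIME type and the 7 MB pre-encoding limit. The function name `encode_image`, the data URL output form, and the reading of 1 MB as 1,048,576 bytes are assumptions made for illustration. Check the ImageContent Reference for the exact request format.

```python
import base64

# MIME types and the 7 MB limit come from the text above.
SUPPORTED_IMAGE_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}
MAX_IMAGE_BYTES = 7 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def encode_image(image_bytes: bytes, mime_type: str) -> str:
    """Return a base64 data URL for an image (hypothetical helper)."""
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("Image exceeds the 7 MB limit before encoding")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumption: the request carries the image as a data URL.
    return f"data:{mime_type};base64,{encoded}"
```

Validating on the client side avoids sending a request that the service would reject for size or type.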

Technical Details

Supports object detection and segmentation. See Image Understanding in the Gemini API documentation.

Audio Understanding

Supported Audio Formats

Console: not available
API: Supported media files are audio/wav, audio/mp3, audio/aiff, audio/aac, audio/ogg, and audio/flac.

Supported Audio Inputs for the API

URL: Convert a supported audio format to a base64-encoded version of the audio file.
URI: Submit the audio as a Uniform Resource Identifier (URI) so that the model can access the audio without you uploading the file.

For the format, see AudioContent Reference.

Technical Details

Token Conversion: Each second of audio represents 32 tokens, so one minute of audio corresponds to 1,920 tokens.
Non‑speech Detection: The model can recognize non‑speech components such as bird songs and sirens.
Maximum Length: The maximum supported audio length in a single prompt is 9.5 hours. You can submit several files as long as their combined duration stays under 9.5 hours.
Downsampling: The model downsamples audio files to a 16 kbps resolution.
Channel Merging: If an audio source has several channels, the model merges them into a single channel.

See Audio Understanding in the Gemini API documentation.
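The token conversion and maximum length above can be worked through with a short sketch. The function names are hypothetical, and the check treats a combined duration of exactly 9.5 hours as allowed, which is an assumption.

```python
import math

TOKENS_PER_SECOND = 32              # from the token conversion above
MAX_AUDIO_SECONDS = int(9.5 * 3600)  # 9.5 hours expressed in seconds


def audio_tokens(duration_seconds: float) -> int:
    """Estimate the tokens used by an audio clip of the given length."""
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


def fits_in_one_prompt(durations_seconds: list[float]) -> bool:
    """Check that the combined duration of several files stays within the limit."""
    return sum(durations_seconds) <= MAX_AUDIO_SECONDS


print(audio_tokens(60))  # one minute of audio: 1920 tokens
```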

Video Understanding

Supported Video Formats

Console: not available
API: Supported media files are video/mp4, video/mpeg, video/mov, video/avi, video/x-flv, video/mpg, video/webm, video/wmv, and video/3gpp.

Supported Video Inputs for the API

Base64-encoded upload (URL): Convert a supported video format to a base64-encoded version of the video file. The maximum payload is 50 MB after encoding, so the original file must be smaller. For example, a 37.5 MB file becomes about 50 MB when encoded.
URI: Submit a Uniform Resource Identifier (URI) so that the model can access the video without you uploading the file. The maximum payload size is 100 MB.

For the format, see VideoContent Reference.
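The 37.5 MB to 50 MB example follows from how base64 works: every 3 bytes of input become 4 characters of output. The sketch below shows that arithmetic. The function names are hypothetical, and 1 MB is taken as 1,000,000 bytes for round numbers, which is an assumption about how the limit is measured.

```python
MB = 1_000_000  # assumption: decimal megabytes


def base64_encoded_size(raw_bytes: int) -> int:
    """Size in bytes of the padded base64 encoding of raw_bytes of data."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_size(encoded_limit: int) -> int:
    """Largest original file size whose base64 encoding fits the limit."""
    return (encoded_limit // 4) * 3


print(base64_encoded_size(37_500_000) / MB)  # 50.0
print(max_raw_size(50 * MB) / MB)            # 37.5
```

As a rule of thumb, the original file can be at most three quarters of the encoded payload limit.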

Technical Details

See Video Understanding in the Gemini API documentation.

Limits

Tokens per minute (TPM): To request a TPM limit increase, use the limit name gemini-2-5-pro-chat-tokens-per-minute-count (for 100,000 tokens). See Creating a Limit Increase Request.

On-Demand Mode

Note

The Gemini models are available only in the on-demand mode.


Model Name	OCI Model Name	Pricing Page Product Name
Gemini 2.5 Pro	`google.gemini-2.5-pro`	Google - Gemini 2.5 Pro

You can reach the pretrained foundational models in Generative AI through two modes: on-demand and dedicated. Here are key features for the on-demand mode:

You pay as you go for each inference call when you use the models in the playground or when you call the models through the API.
Low barrier to start using Generative AI.
Great for experimentation, proof of concept, and model evaluation.
Available for the pretrained models in regions not listed as (dedicated AI cluster only).

Tip

We recommend implementing a back-off strategy, which involves delaying requests after a rejection. Without one, repeated rapid requests can lead to further rejections over time, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success. This approach follows industry best practices and improves the overall stability and performance of the integration with the service.
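The following is a minimal sketch of exponential back-off with random jitter, one common way to apply the tip above. `RateLimitError`, `call_with_backoff`, and `send_request` are hypothetical names, not part of any OCI SDK. In a real integration, catch whatever exception your client raises when the service rejects a request.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised on a rejected request."""


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request, retrying rejected calls with growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait a random time up to the ceiling so that many
            # clients don't retry at the same moment.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so that the delay can be replaced in tests. The default values are illustrative and not service recommendations.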

OCI Release and Retirement Dates

For release and retirement dates and replacement model options, see the Model Retirement Dates (On-Demand Mode).

Model Parameters

To change the model responses, you can change the values of the following parameters in the playground or the API.

Maximum output tokens

The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token. Because you're prompting a chat model, the response depends on the prompt, and each response doesn't necessarily use up the maximum allocated tokens. For this model, the limits listed in Key Features apply: up to 1,048,576 input tokens and up to 65,536 output tokens for each run.

Tip

For large inputs with difficult problems, set a high value for the maximum output tokens parameter.
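The four-characters-per-token estimate can be turned into a rough sizing helper. This is a sketch only: `estimate_tokens` is a hypothetical name, the ratio is an approximation, and actual token counts depend on the model's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4         # rough estimate from the text above
MAX_OUTPUT_TOKENS = 65_536  # maximum output tokens listed in Key Features


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a piece of text uses."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


# Approximate number of characters that fit in a maximum-length response.
print(MAX_OUTPUT_TOKENS * CHARS_PER_TOKEN)  # 262144
```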

Temperature

The level of randomness used to generate the output text. Min: 0, Max: 2, Default: 1

Tip

Start with the temperature set to 0 or a value less than 1, and increase the temperature as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information.

Top p

A sampling method that controls the cumulative probability of the top tokens to consider for the next token. Assign p a decimal number between 0 and 1 for the probability. For example, enter 0.75 for the top 75 percent to be considered. Set p to 1 to consider all tokens.
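To illustrate top p, the sketch below keeps the most likely tokens until their cumulative probability reaches p, then renormalizes the kept probabilities. It's a simplified illustration with a made-up four-token distribution, not the model's actual implementation.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most likely tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so that the kept probabilities sum to 1.
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


# Hypothetical next-token distribution for illustration.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "bird": 0.05}
print(sorted(top_p_filter(probs, 0.75)))  # ['cat', 'dog']
```

With p set to 1, every token stays in the candidate set, which matches the description above.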

Top k

A sampling method in which the model chooses the next token randomly from the top k most likely tokens. In the Gemini 2.5 models, the top k has a fixed value of 64, which means that the model considers only the 64 most likely tokens (words or word parts) for each step of generation. The final token is then chosen from this list.
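The top k step can be sketched the same way: keep only the k most likely tokens, then draw one of them at random, weighted by probability. The function name and the example distribution are hypothetical. The default of 64 mirrors the fixed value described above.

```python
import random


def sample_top_k(probs: dict[str, float], k: int = 64, rng=random) -> str:
    """Sample the next token from the k most likely candidates."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in ranked]
    weights = [prob for _, prob in ranked]
    # random.choices draws in proportion to the given weights.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical distribution: with k=2, "fish" can never be chosen.
probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}
print(sample_top_k(probs, k=2))
```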

Number of Generations (API only)

The numGenerations parameter in the API controls how many different response options the model generates for each prompt.

When you send a prompt, the Gemini model generates a set of possible answers. By default, it returns only the response with the highest probability (numGenerations = 1).
If you set the numGenerations parameter to a value from 2 through 8, the model generates that many distinct responses.
