Batch prediction for Cloud Storage

This guide shows you how to get batch predictions using Cloud Storage and covers the following topics:

  • Prepare your inputs: Learn how to format your batch request data in a JSON Lines file and store it in Cloud Storage.
  • Submit a batch job: Create a batch prediction job using the Google Cloud console, the REST API, or the Google Gen AI SDK.
  • Monitor the job status: Track the status of your running batch job.
  • Retrieve batch output: Access and interpret the prediction results from your Cloud Storage bucket.

Prepare your inputs

Batch prediction jobs for Gemini models accept a single JSON Lines (JSONL) file stored in Cloud Storage as input. Each line in the file is a separate request to the model and follows the same format as the Gemini API.

For example:

{"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mimeType": "video/mp4"}}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mimeType": "image/jpeg"}}]}], "generationConfig": {"temperature": 0.9, "topP": 1, "maxOutputTokens": 256}}}

You can download a sample batch request file to see the expected format.
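If you prefer to generate the input programmatically, the following is a minimal sketch that writes a JSONL file in the required format and uploads it with the google-cloud-storage client. The bucket name and object path are placeholders that you must replace:

import json

from google.cloud import storage  # pip install google-cloud-storage

# Each line of the input file is one request in the Gemini API format.
requests = [
    {
        "request": {
            "contents": [
                {"role": "user", "parts": [{"text": "Explain batch prediction in one sentence."}]}
            ],
            "generationConfig": {"temperature": 0.9, "topP": 1, "maxOutputTokens": 256},
        }
    },
    {
        "request": {
            "contents": [
                {"role": "user", "parts": [{"text": "What is a JSON Lines file?"}]}
            ]
        }
    },
]

# Serialize one JSON object per line (JSON Lines).
jsonl = "\n".join(json.dumps(r) for r in requests)

# TODO(developer): Replace with your bucket name and object path.
bucket = storage.Client().bucket("mybucket")
bucket.blob("path/to/file.jsonl").upload_from_string(jsonl)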

After you prepare your input data and upload it to a Cloud Storage bucket, grant the AI Platform Service Agent permission to access the input file.
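One way to grant that access is to add a Storage Object Viewer binding for the service agent on the bucket. The following is a sketch that assumes the service agent address follows the service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com pattern; the bucket name and project number are placeholders:

from google.cloud import storage  # pip install google-cloud-storage

# TODO(developer): Replace with your bucket name and project number.
bucket = storage.Client().bucket("mybucket")
service_agent = (
    "serviceAccount:service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com"
)

# Grant the AI Platform Service Agent read access to objects in the bucket.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectViewer", "members": {service_agent}}
)
bucket.set_iam_policy(policy)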

Submit a batch job

You can create a batch job using the Google Cloud console, the Google Gen AI SDK, or the REST API. The following list compares these options:

  • Google Cloud console: A web-based graphical user interface for creating and managing batch jobs. Pros: easy to use, requires no coding, and provides visual feedback. Use case: ideal for one-off tasks, exploring features, or users who prefer a GUI.
  • Google Gen AI SDK: A Python library that provides high-level methods for interacting with the Vertex AI API. Pros: simplifies development and integrates well with Python workflows and data science tools such as notebooks. Use case: recommended for most programmatic use cases, especially for developers working in Python.
  • REST API: A standard HTTP interface for interacting with Vertex AI services directly. Pros: language-agnostic, and offers maximum control and flexibility. Use case: best for integrating with custom applications, services written in languages other than Python, or situations that need direct HTTP control.

Console

  1. In the Vertex AI section of the Google Cloud console, go to the Batch Inference page.

    Go to Batch Inference

  2. Click Create.

REST

To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports Gemini models.
  • PROJECT_ID: Your project ID.
  • MODEL_PATH: The publisher model name, for example, publishers/google/models/gemini-2.5-flash, or the tuned endpoint name, for example, projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID, where MODEL_ID is the model ID of the tuned model.
  • INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/file.jsonl.
  • OUTPUT_FORMAT: To output to a Cloud Storage bucket, specify jsonl.
  • DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
  • OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
  • OUTPUT_URI: For BigQuery, specify the table location such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location such as gs://mybucket/path/to/output.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "my-cloud-storage-batch-prediction-job",
  "model": "MODEL_PATH",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following. This example is abbreviated and illustrative; the exact fields and values depend on your project and job configuration:
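{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "my-cloud-storage-batch-prediction-job",
  "model": "MODEL_PATH",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": ["INPUT_URI"]
    }
  },
  "outputConfig": {
    "predictionsFormat": "jsonl",
    "gcsDestination": {
      "outputUriPrefix": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "...",
  "updateTime": "..."
}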

The response includes a unique identifier for the batch job (BATCH_JOB_ID in the name field), which you can use to poll the job's status. For more information, see Monitor the job status. Note: Custom service accounts, live progress reporting, CMEK, and VPC Service Controls are not supported.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# TODO(developer): Update and un-comment below line
# output_uri = "gs://your-bucket/your-prefix"

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
    # To use a tuned model, set the model param to your tuned model using the following format:
    # model="projects/{PROJECT_ID}/locations/{LOCATION}/models/{MODEL_ID}
    model="gemini-2.5-flash",
    # Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
    src="gs://cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED

Monitor the job status

After you submit a batch job, you can monitor its status using the Google Cloud console, the REST API, or the Google Gen AI SDK.

Console

  1. Go to the Batch Inference page.

    Go to Batch Inference

  2. Select your batch job to monitor its progress.

REST

To monitor a batch prediction job, use the projects.locations.batchPredictionJobs.get method and view the state and completionStats fields in the response.

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports Gemini models.
  • PROJECT_ID: Your project ID.
  • BATCH_JOB_ID: Your batch job ID.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following. This example is abbreviated and illustrative; the exact fields and values depend on your job:
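{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "my-cloud-storage-batch-prediction-job",
  "model": "MODEL_PATH",
  "state": "JOB_STATE_RUNNING",
  "completionStats": {
    "successfulCount": "...",
    "failedCount": "...",
    "incompleteCount": "..."
  },
  "createTime": "...",
  "updateTime": "..."
}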

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Replace with your batch job name
batch_job_name = "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789"

# Get the batch job
batch_job = client.batches.get(name=batch_job_name)

print(f"Job state: {batch_job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_SUCCEEDED

A batch job can have one of the following statuses:

  • JOB_STATE_PENDING: The job is queued and waiting for resources. A job can remain in this state for up to 72 hours before it starts running.
  • JOB_STATE_RUNNING: The input file was successfully validated, and the job is processing.
  • JOB_STATE_SUCCEEDED: The job completed successfully, and the results are available.
  • JOB_STATE_FAILED: The job failed. This can occur if the input file fails validation or if the job doesn't complete within 24 hours of starting.
  • JOB_STATE_CANCELLING: The job is in the process of being canceled.
  • JOB_STATE_CANCELLED: The job was successfully canceled. To cancel a job programmatically, see the sketch after this list.
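If you need to stop a job that is still pending or running, you can request cancellation by job name. The following is a minimal sketch using the Gen AI SDK's batches.cancel method, assuming the same client setup as in the earlier samples; the job name is a placeholder:

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Replace with your batch job name
batch_job_name = "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789"

# Request cancellation. The job transitions to JOB_STATE_CANCELLING and then
# to JOB_STATE_CANCELLED once cancellation completes.
client.batches.cancel(name=batch_job_name)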

Retrieve batch output

When a batch prediction job completes, the output is stored in the Cloud Storage bucket that you specified when you created the job. For successful rows, model responses are stored in the response field. For rows that failed, error details are stored in the status field.

During long-running jobs, completed predictions are continuously exported to the specified output destination. If a job is terminated, all completed rows are exported, and you are charged only for the completed predictions.
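Each output file is itself a JSON Lines file, so you can read results with any Cloud Storage client. The following is an illustrative sketch, not the only approach, that lists the prediction files under the output prefix and separates successful rows from failed ones. The bucket name and prefix are placeholders, and the exact file layout under the output directory can vary by job:

import json

from google.cloud import storage  # pip install google-cloud-storage

# TODO(developer): Replace with the bucket and prefix from your OUTPUT_URI.
bucket_name = "mybucket"
output_prefix = "path/to/output"

client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix=output_prefix):
    # Predictions are written as one or more .jsonl files.
    if not blob.name.endswith(".jsonl"):
        continue
    for line in blob.download_as_text().splitlines():
        row = json.loads(line)
        if row.get("status"):
            # Failed row: error details are in the "status" field.
            print("Error:", row["status"])
        else:
            # Successful row: the model output is in the "response" field.
            parts = row["response"]["candidates"][0]["content"]["parts"]
            print(parts[0]["text"])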

Output examples

Successful example

{
  "status": "",
  "processed_time": "2024-11-01T18:13:16.826+00:00",
  "request": {
    "contents": [
      {
        "parts": [
          {
            "fileData": null,
            "text": "What is the relation between the following video and image samples?"
          },
          {
            "fileData": {
              "fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
              "mimeType": "video/mp4"
            },
            "text": null
          },
          {
            "fileData": {
              "fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
              "mimeType": "image/jpeg"
            },
            "text": null
          }
        ],
        "role": "user"
      }
    ]
  },
  "response": {
    "candidates": [
      {
        "avgLogprobs": -0.5782725546095107,
        "content": {
          "parts": [
            {
              "text": "This video shows a Google Photos marketing campaign where animals at the Los Angeles Zoo take self-portraits using a modified Google phone housed in a protective case. The image is unrelated."
            }
          ],
          "role": "model"
        },
        "finishReason": "STOP"
      }
    ],
    "modelVersion": "gemini-2.0-flash-001@default",
    "usageMetadata": {
      "candidatesTokenCount": 36,
      "promptTokenCount": 29180,
      "totalTokenCount": 29216
    }
  }
}

Failed example

{
  "status": "Bad Request: {\"error\": {\"code\": 400, \"message\": \"Please use a valid role: user, model.\", \"status\": \"INVALID_ARGUMENT\"}}",
  "processed_time": "2025-07-09T19:57:43.558+00:00",
  "request": {
    "contents": [
      {
        "parts": [
          {
            "text": "Explain how AI works in a few words"
          }
        ],
        "role": "tester"
      }
    ]
  },
  "response": {}
}