This guide shows you how to use the Vertex AI RAG Engine LLM parser to process documents. It covers the following topics:
- Introduction to the LLM parser: Learn about the capabilities and benefits of using LLMs for document parsing.
- Supported models and file types: Find out which models and file formats are compatible with the LLM parser.
- Pricing and quotas: Understand the cost structure and usage limits associated with the LLM parser.
- Import files with the LLM parser: Follow steps to import and parse your documents using the API.
- Prompting guidance: Discover how to create custom prompts to tailor the parsing process for your specific needs.
- Parsing quality analysis: Review examples of how the LLM parser improves parsing quality in various scenarios.
Introduction
The Vertex AI RAG Engine LLM parser uses large language models (LLMs) to improve document processing. With the LLM parser, you can do the following:
- Improve semantic understanding: Interpret content across various formats, identify relevant sections, and accurately summarize complex documents.
- Enhance information extraction: Retrieve relevant document chunks and extract meaningful information.
- Process visuals: Understand and interact with visuals like charts and diagrams, describe images, and understand the relationships between charts and text.
These capabilities help improve the quality of generated responses.
Supported models and file types
Models
The LLM parser supports only Gemini models. If you have enabled the RAG API, you have access to the supported models. For a list of supported generation models, see Generative models.
File types
The LLM parser supports the following file types:
- application/pdf
- image/png
- image/jpeg
- image/webp
- image/heic
- image/heif
Pricing and quotas
For pricing details, see Vertex AI pricing.
For applicable quotas, see Rate quotas.
When you use the LLM parser, it calls Gemini models to parse your documents. This process incurs costs that are charged to your project. You can estimate the cost by using the following formula:
cost = number_of_document_files * average_pages_per_document * (average_input_tokens * input_token_pricing_of_selected_model + average_output_tokens * output_token_pricing_of_selected_model)
For example, suppose you have 1,000 PDF files, and each PDF file has 50 pages. The average PDF page contains 500 tokens, and prompting requires an additional 100 tokens, for a total of 600 input tokens per page. The average output is 100 tokens per page.
Suppose your configuration uses Gemini 2.0 Flash-Lite for parsing, which costs $0.075 per 1 million input tokens and $0.30 per 1 million output tokens.
cost = 1,000 * 50 * (600 * 0.075 / 1M + 100 * 0.3 / 1M) = 3.75
The estimated cost is $3.75.
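To check the arithmetic, you can express the same estimate as a short Python snippet. This is a minimal sketch: the prices and document statistics are the example values from above, not live pricing.

# Cost-estimate sketch using the example values above.
# Prices are illustrative, not live Gemini pricing.
NUM_FILES = 1_000             # number of document files
PAGES_PER_FILE = 50           # average pages per document
INPUT_TOKENS_PER_PAGE = 600   # 500 page tokens + 100 prompt tokens
OUTPUT_TOKENS_PER_PAGE = 100  # average output tokens per page
INPUT_PRICE_PER_TOKEN = 0.075 / 1_000_000   # example input rate
OUTPUT_PRICE_PER_TOKEN = 0.30 / 1_000_000   # example output rate

cost = NUM_FILES * PAGES_PER_FILE * (
    INPUT_TOKENS_PER_PAGE * INPUT_PRICE_PER_TOKEN
    + OUTPUT_TOKENS_PER_PAGE * OUTPUT_PRICE_PER_TOKEN
)
print(f"Estimated cost: ${cost:.2f}")  # Estimated cost: $3.75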
Import files with LlmParser enabled
Before you use the code samples, replace the following variables with your values:
- PROJECT_ID: The ID for your Google Cloud project.
- LOCATION: The region where your request is processed.
- RAG_CORPUS_RESOURCE: The ID of your corpus.
- GCS_URI: The Cloud Storage URI of the files you want to import.
- GOOGLE_DRIVE_URI: The Google Drive URI of the files you want to import.
- MODEL_NAME: The resource name of the model used for parsing.
Format:
projects/{project_id}/locations/{location}/publishers/google/models/{model_id}
- CUSTOM_PARSING_PROMPT: (Optional) A custom prompt for the LLM parser to use for parsing documents.
- MAX_PARSING_REQUESTS_PER_MIN: (Optional) The maximum number of requests per minute that the job can make to the Vertex AI model. To choose an appropriate value, see Generative AI on Vertex AI rate limits and the Quotas & System Limits page for your project.
REST
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE/ragFiles:import" \
  -d '{
    "import_rag_files_config": {
      "gcs_source": {
        "uris": ["GCS_URI", "GOOGLE_DRIVE_URI"]
      },
      "rag_file_chunking_config": {
        "chunk_size": 512,
        "chunk_overlap": 102
      },
      "rag_file_parsing_config": {
        "llm_parser": {
          "model_name": "MODEL_NAME",
          "custom_parsing_prompt": "CUSTOM_PARSING_PROMPT",
          "max_parsing_requests_per_min": MAX_PARSING_REQUESTS_PER_MIN
        }
      }
    }
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "RAG_CORPUS_RESOURCE"
LOCATION = "LOCATION"
MODEL_ID = "MODEL_ID"
MODEL_NAME = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
MAX_PARSING_REQUESTS_PER_MIN = 100  # Optional: replace with a value within your quota
CUSTOM_PARSING_PROMPT = "Your custom prompt"  # Optional
PATHS = ["https://drive.google.com/file/123", "gs://my_bucket/my_files_dir"]

# Initialize the Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

transformation_config = rag.TransformationConfig(
    chunking_config=rag.ChunkingConfig(
        chunk_size=1024,  # Optional
        chunk_overlap=200,  # Optional
    ),
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    max_parsing_requests_per_min=MAX_PARSING_REQUESTS_PER_MIN,  # Optional
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,  # Optional
)

rag.import_files(
    CORPUS_NAME,
    PATHS,
    llm_parser=llm_parser_config,
    transformation_config=transformation_config,
)
Prompting
By default, the Vertex AI RAG Engine LLM parser uses a predefined prompt for parsing documents. For specialized documents that might not be suitable for a general prompt, you can specify a custom parsing prompt through the API. When you provide a custom prompt, Vertex AI RAG Engine appends it to the default system prompt before sending the request to Gemini.
Prompt template table
The following table provides a template to help you create a custom prompt for parsing your documents:
Instruction | Template statement | Example |
---|---|---|
Specify role. | You are a/an [Specify the role, such as a factual data extractor or an information retriever]. | You are an information retriever. |
Specify task. | Extract [Specify the type of information, such as factual statements, key data, or specific details] from the [Specify the document source, such as a document, text, article, image, table]. | Extract key data from the sample.txt file. |
Explain how you want the LLM to generate the output according to your documents. | Present each fact in a [Specify the output format, such as a structured list or text format], and link to its [Specify the source location, such as a page, paragraph, table, or row]. | Present each fact in a structured list, and link to its sample page. |
Highlight what should be the focus of the LLM. | Extract [Specify the key data types, such as the names, dates, numbers, attributes, or relationships] exactly as stated. | Extract names and dates. |
Highlight what you want the LLM to avoid. | [List the actions to avoid, such as analysis, interpretation, summarizing, inferring, or giving opinions]. Extract only what the document explicitly says. | No giving opinions. Extract only what the document explicitly says. |
General guidance
When you write a custom prompt, follow these guidelines:
- Be specific: Clearly define the task and the type of information to extract.
- Be detailed: Provide instructions on the output format, source attribution, and how to handle different data structures.
- Be constraining: State what the model should not do, such as perform analysis or interpretation.
- Be clear: Use direct and clear language.
- Be structured: To improve readability, organize instructions with numbered or bulleted lists.
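For example, a complete custom prompt assembled from the template table and these guidelines might look like the following sketch. The wording is illustrative only, not a prompt shipped with the product, and the LlmParserConfig call reuses the rag module and MODEL_NAME from the Python sample above.

# Illustrative custom prompt assembled from the template table above.
CUSTOM_PARSING_PROMPT = (
    "You are an information retriever. "
    "Extract key data, names, and dates from the document exactly as stated. "
    "Present each fact in a structured list, and link each fact to its page "
    "and paragraph. Do not analyze, interpret, summarize, infer, or give "
    "opinions. Extract only what the document explicitly says."
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,
)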
Parsing quality analysis
The following table shows how the LLM parser improves parsing quality in several common scenarios.
Scenario | Result |
---|---|
Parsing information across slides and linking sections | The LLM parser successfully linked section titles on one slide to the detailed information presented on subsequent slides. |
Understanding and extracting information from tables | The LLM parser correctly related columns and headers within a large table to answer specific questions. |
Interpreting flowcharts | The LLM parser was able to follow the logic of a flowchart and extract the correct sequence of actions and corresponding information. |
Extracting data from graphs | The LLM parser could interpret different types of graphs, such as line graphs, and extract specific data points based on the query. |
Capturing relationships between headings and text | The LLM parser, guided by the prompt, paid attention to heading structures and could retrieve all relevant information associated with a particular topic or section. |
Potential to overcome embedding limitations with prompt engineering | While initially hampered by embedding model limitations in some use cases, additional experiments demonstrated that a well-crafted LLM parser prompt could potentially mitigate these issues and retrieve the correct information even when semantic understanding is challenging for the embedding model alone. |
The LLM parser helps the LLM better understand and reason about the context within a document, which can lead to more accurate and comprehensive responses.
Retrieval query
After you enter a prompt that's sent to a generative AI model, the retrieval component in RAG searches through its knowledge base to find information that's relevant to the query. For an example of retrieving RAG files from a corpus based on a query text, see Retrieval query.
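As a sketch, a retrieval call with the Vertex AI SDK for Python might look like the following. It reuses the rag module and CORPUS_NAME from the import sample above; the query text and top_k value are example values, so see the Retrieval query page for the current signature.

# Minimal retrieval sketch against the corpus imported above.
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=CORPUS_NAME)],
    text="What are the key dates mentioned in the report?",  # example query
    rag_retrieval_config=rag.RagRetrievalConfig(top_k=5),  # example value
)

# Print the text of each retrieved context chunk.
for ctx in response.contexts.contexts:
    print(ctx.text)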
What's next
- To learn more about Vertex AI RAG Engine, see Vertex AI RAG Engine overview.
- To learn more about the API, see Vertex AI RAG Engine API.