This guide shows you how to use the Vertex AI RAG Engine LLM parser to process documents. It covers the following topics:
- Introduction to the LLM parser: Learn about the capabilities and benefits of using LLMs for document parsing.
- Supported models and file types: Find out which models and file formats are compatible with the LLM parser.
- Pricing and quotas: Understand the cost structure and usage limits associated with the LLM parser.
- Import files with the LLM parser: Follow steps to import and parse your documents using the API.
- Prompting guidance: Discover how to create custom prompts to tailor the parsing process for your specific needs.
- Parsing quality analysis: Review examples of how the LLM parser improves parsing quality in various scenarios.
Introduction
The Vertex AI RAG Engine LLM parser uses large language models (LLMs) to improve document processing. With the LLM parser, you can do the following:
- Improve semantic understanding: Interpret content across various formats, identify relevant sections, and accurately summarize complex documents.
- Enhance information extraction: Retrieve relevant document chunks and extract meaningful information.
- Process visuals: Understand and interact with visuals like charts and diagrams, describe images, and understand the relationships between charts and text.
These capabilities help improve the quality of generated responses.
Supported models and file types
Models
The LLM parser supports only Gemini models. If you have enabled the RAG API, you have access to the supported models. For a list of supported generation models, see Generative models.
File types
The LLM parser supports the following file types:
- application/pdf
- image/png
- image/jpeg
- image/webp
- image/heic
- image/heif
Pricing and quotas
For pricing details, see Vertex AI pricing.
For applicable quotas, see Rate quotas.
When you use the LLM parser, it calls Gemini models to parse your documents. This process incurs costs that are charged to your project. You can estimate the cost by using the following formula:
cost = number_of_document_files * average_pages_per_document * (average_input_tokens * input_token_pricing_of_selected_model + average_output_tokens * output_token_pricing_of_selected_model)
For example, suppose you have 1,000 PDF files, and each PDF file has 50 pages. The average PDF page contains 500 tokens, and prompting requires an additional 100 tokens, for a total of 600 input tokens per page. The average output is 100 tokens per page.
Suppose your configuration uses Gemini 2.0 Flash-Lite for parsing, which costs $0.075 per 1 million input tokens and $0.30 per 1 million output tokens.
cost = 1,000 * 50 * (600 * 0.075 / 1M + 100 * 0.3 / 1M) = 3.75
The estimated cost is $3.75.
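To check the arithmetic, you can express the same estimate as a short Python snippet. This is a minimal sketch: the prices and document statistics are the example values from above, not live pricing.

# Cost-estimate sketch using the example values above.
# Prices are illustrative, not live Gemini pricing.
NUM_FILES = 1_000             # number of document files
PAGES_PER_FILE = 50           # average pages per document
INPUT_TOKENS_PER_PAGE = 600   # 500 page tokens + 100 prompt tokens
OUTPUT_TOKENS_PER_PAGE = 100  # average output tokens per page
INPUT_PRICE_PER_TOKEN = 0.075 / 1_000_000   # example input rate
OUTPUT_PRICE_PER_TOKEN = 0.30 / 1_000_000   # example output rate

cost = NUM_FILES * PAGES_PER_FILE * (
    INPUT_TOKENS_PER_PAGE * INPUT_PRICE_PER_TOKEN
    + OUTPUT_TOKENS_PER_PAGE * OUTPUT_PRICE_PER_TOKEN
)
print(f"Estimated cost: ${cost:.2f}")  # Estimated cost: $3.75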
Import files with LlmParser enabled
Before you use the code samples, replace the following variables with your values:
- PROJECT_ID: The ID for your Google Cloud project.
- LOCATION: The region where your request is processed.
- RAG_CORPUS_RESOURCE: The ID of your corpus.
- GCS_URI: The Cloud Storage URI of the files you want to import.
- GOOGLE_DRIVE_URI: The Google Drive URI of the files you want to import.
- MODEL_NAME: The resource name of the model used for parsing.
Format:
projects/{project_id}/locations/{location}/publishers/google/models/{model_id}
- CUSTOM_PARSING_PROMPT: (Optional) A custom prompt for the LLM parser to use for parsing documents.
- MAX_PARSING_REQUESTS_PER_MIN: (Optional) The maximum number of requests per minute that the job can make to the Vertex AI model. To choose an appropriate value, see Generative AI on Vertex AI rate limits and the Quotas & System Limits page for your project.
REST
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE/ragFiles:import" \
  -d '{
    "import_rag_files_config": {
      "gcs_source": {
        "uris": ["GCS_URI", "GOOGLE_DRIVE_URI"]
      },
      "rag_file_chunking_config": {
        "chunk_size": 512,
        "chunk_overlap": 102
      },
      "rag_file_parsing_config": {
        "llm_parser": {
          "model_name": "MODEL_NAME",
          "custom_parsing_prompt": "CUSTOM_PARSING_PROMPT",
          "max_parsing_requests_per_min": MAX_PARSING_REQUESTS_PER_MIN
        }
      }
    }
  }'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "RAG_CORPUS_RESOURCE"
LOCATION = "LOCATION"
MODEL_ID = "MODEL_ID"
MODEL_NAME = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
MAX_PARSING_REQUESTS_PER_MIN = 100  # Optional: replace with a value within your quota
CUSTOM_PARSING_PROMPT = "Your custom prompt"  # Optional
PATHS = ["https://drive.google.com/file/123", "gs://my_bucket/my_files_dir"]

# Initialize the Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

transformation_config = rag.TransformationConfig(
    chunking_config=rag.ChunkingConfig(
        chunk_size=1024,  # Optional
        chunk_overlap=200,  # Optional
    ),
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    max_parsing_requests_per_min=MAX_PARSING_REQUESTS_PER_MIN,  # Optional
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,  # Optional
)

rag.import_files(
    CORPUS_NAME,
    PATHS,
    llm_parser=llm_parser_config,
    transformation_config=transformation_config,
)
Prompting
By default, the Vertex AI RAG Engine LLM parser uses a predefined prompt for parsing documents. For specialized documents that might not be suitable for a general prompt, you can specify a custom parsing prompt through the API. When you provide a custom prompt, Vertex AI RAG Engine appends it to the default system prompt before sending the request to Gemini.
Prompt template table
The following table provides a template to help you create a custom prompt for parsing your documents:
Instruction | Template statement | Example |
---|---|---|
Specify role. | You are a/an [Specify the role, such as a factual data extractor or an information retriever]. | You are an information retriever. |
Specify task. | Extract [Specify the type of information, such as factual statements, key data, or specific details] from the [Specify the document source, such as a document, text, article, image, table]. | Extract key data from the sample.txt file. |
Explain how you want the LLM to generate the output according to your documents. | Present each fact in a [Specify the output format, such as a structured list or text format], and link to its [Specify the source location, such as a page, paragraph, table, or row]. | Present each fact in a structured list, and link to its sample page. |
Highlight what should be the focus of the LLM. | Extract [Specify the key data types, such as the names, dates, numbers, attributes, or relationships] exactly as stated. | Extract names and dates. |
Highlight what you want the LLM to avoid. | [List the actions to avoid, such as analysis, interpretation, summarizing, inferring, or giving opinions]. Extract only what the document explicitly says. | No giving opinions. Extract only what the document explicitly says. |
General guidance
When you write a custom prompt, follow these guidelines:
- Be specific: Clearly define the task and the type of information to extract.
- Be detailed: Provide instructions on the output format, source attribution, and how to handle different data structures.
- Be constraining: State what the model should not do, such as perform analysis or interpretation.
- Be clear: Use direct and clear language.
- Be structured: To improve readability, organize instructions with numbered or bulleted lists.
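For example, a complete custom prompt assembled from the template table and these guidelines might look like the following sketch. The wording is illustrative only, not a prompt shipped with the product, and the LlmParserConfig call reuses the rag module and MODEL_NAME from the Python sample above.

# Illustrative custom prompt assembled from the template table above.
CUSTOM_PARSING_PROMPT = (
    "You are an information retriever. "
    "Extract key data, names, and dates from the document exactly as stated. "
    "Present each fact in a structured list, and link each fact to its page "
    "and paragraph. Do not analyze, interpret, summarize, infer, or give "
    "opinions. Extract only what the document explicitly says."
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,
)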
Parsing quality analysis
The following table shows how the LLM parser improves parsing quality in several common scenarios.
Scenario | Result |
---|---|
Parsing information across slides and linking sections | The LLM parser successfully linked section titles on one slide to the detailed information presented on subsequent slides. |
Understanding and extracting information from tables | The LLM parser correctly related columns and headers within a large table to answer specific questions. |
Interpreting flowcharts | The LLM parser was able to follow the logic of a flowchart and extract the correct sequence of actions and corresponding information. |
Extracting data from graphs | The LLM parser could interpret different types of graphs, such as line graphs, and extract specific data points based on the query. |
Capturing relationships between headings and text | The LLM parser, guided by the prompt, paid attention to heading structures and could retrieve all relevant information associated with a particular topic or section. |
Potential to overcome embedding limitations with prompt engineering | While initially hampered by embedding model limitations in some use cases, additional experiments demonstrated that a well-crafted LLM parser prompt could potentially mitigate these issues and retrieve the correct information even when semantic understanding is challenging for the embedding model alone. |
The LLM parser helps the LLM better understand and reason about the context within a document, which can lead to more accurate and comprehensive responses.
Retrieval query
After you enter a prompt that's sent to a generative AI model, the retrieval component in RAG searches through its knowledge base to find information that's relevant to the query. For an example of retrieving RAG files from a corpus based on a query text, see Retrieval query.
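As a sketch, a retrieval call with the Vertex AI SDK for Python might look like the following. It reuses the rag module and CORPUS_NAME from the import sample above; the query text and top_k value are example values, so see the Retrieval query page for the current signature.

# Minimal retrieval sketch against the corpus imported above.
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=CORPUS_NAME)],
    text="What are the key dates mentioned in the report?",  # example query
    rag_retrieval_config=rag.RagRetrievalConfig(top_k=5),  # example value
)

# Print the text of each retrieved context chunk.
for ctx in response.contexts.contexts:
    print(ctx.text)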
What's next
- To learn more about Vertex AI RAG Engine, see Vertex AI RAG Engine overview.
- To learn more about the API, see Vertex AI RAG Engine API.