This document describes the supported document types and file size limits for Vertex AI RAG Engine.
Vertex AI RAG Engine natively supports common document types and offers extended support through the LLM parser for more specialized needs.
Option | Description | Use Case |
---|---|---|
Natively supported types | File types that are directly processed by Vertex AI RAG Engine without extra configuration. | Best for common document formats like PDF, DOCX, and HTML that require straightforward processing. |
LLM parser | A separate tool that parses and extracts text from a wider variety of document formats, often with more customizable options. | Ideal for file types that aren't natively supported, complex layouts, or cases that need custom parsing logic. |
Supported file types and size limits
The following table shows the natively supported file types and their size limits:
File type | File size limit |
---|---|
Google documents | 10 MB when exported from Google Workspace |
Google drawings | 10 MB when exported from Google Workspace |
Google slides | 10 MB when exported from Google Workspace |
HTML file | 10 MB |
JSON file | 10 MB |
JSONL or NDJSON file | 10 MB |
Markdown file | 10 MB |
Microsoft PowerPoint slides (PPTX file) | 10 MB |
Microsoft Word documents (DOCX file) | 50 MB |
PDF file | 50 MB |
Text file | 10 MB |
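As a rough illustration, the limits in the table can be checked before you import a file. The following is a minimal sketch, not part of the Vertex AI SDK: the `check_size` helper, the extension-to-type mapping, and the interpretation of 1 MB as 1,048,576 bytes are all assumptions made for this example. Google Workspace types are omitted because they have no local file extension.

```python
from pathlib import Path

MB = 1024 * 1024  # Assumption: 1 MB is treated as 1,048,576 bytes.

# Limits copied from the table above, keyed by an assumed file extension.
SIZE_LIMITS_BYTES = {
    ".html": 10 * MB,
    ".json": 10 * MB,
    ".jsonl": 10 * MB,
    ".ndjson": 10 * MB,
    ".md": 10 * MB,
    ".pptx": 10 * MB,
    ".docx": 50 * MB,
    ".pdf": 50 * MB,
    ".txt": 10 * MB,
}


def check_size(filename: str, size_bytes: int) -> str:
    """Hypothetical helper: return 'ok', 'too_large', or 'unsupported'."""
    limit = SIZE_LIMITS_BYTES.get(Path(filename).suffix.lower())
    if limit is None:
        # Extension is not in the natively supported table.
        return "unsupported"
    return "ok" if size_bytes <= limit else "too_large"
```

For a file on disk, you could pass `os.path.getsize(path)` as `size_bytes`.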
Support for additional file types
To use file types that aren't natively supported, preprocess them with the LLM parser. The LLM parser extracts text from a wider variety of document formats, and you can then provide that text to Vertex AI RAG Engine.
If you use unsupported file types directly without preprocessing, you might encounter processing errors or receive lower-quality responses.
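One way to avoid sending unsupported files directly is to separate them out before import. The sketch below is only an illustration: `split_by_support` and the extension set are hypothetical names based on the natively supported types listed in this document, and the sketch doesn't call the LLM parser or any Vertex AI API.

```python
from pathlib import Path

# Assumed extensions for the natively supported local file types.
NATIVE_EXTENSIONS = {
    ".html", ".json", ".jsonl", ".ndjson", ".md",
    ".pptx", ".docx", ".pdf", ".txt",
}


def split_by_support(filenames):
    """Hypothetical helper: split names into (native, needs_preprocessing)."""
    native, needs_preprocessing = [], []
    for name in filenames:
        if Path(name).suffix.lower() in NATIVE_EXTENSIONS:
            native.append(name)
        else:
            # Candidates for preprocessing before import.
            needs_preprocessing.append(name)
    return native, needs_preprocessing
```

Files in the second list are the ones to preprocess first.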