Document types for Vertex AI RAG Engine

This document describes the supported document types and file size limits for Vertex AI RAG Engine.

Vertex AI RAG Engine offers native support for common document types and extended support through the LLM parser for more specialized needs.

Option: Natively supported types
Description: File types that Vertex AI RAG Engine processes directly, without extra configuration.
Use case: Best for common document formats such as PDF, DOCX, and HTML that require straightforward processing.

Option: LLM parser
Description: A separate tool that can parse and extract text from a wider variety of document formats, often with more customizable options.
Use case: Ideal for unsupported file types, complex layouts, or when you need custom parsing logic.

Supported file types and size limits

The following table shows the natively supported file types and their size limits:

File type                                File size limit
Google documents                         10 MB when exported from Google Workspace
Google drawings                          10 MB when exported from Google Workspace
Google slides                            10 MB when exported from Google Workspace
HTML file                                10 MB
JSON file                                10 MB
JSONL or NDJSON file                     10 MB
Markdown file                            10 MB
Microsoft PowerPoint slides (PPTX file)  10 MB
Microsoft Word documents (DOCX file)     50 MB
PDF file                                 50 MB
Text file                                10 MB
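
The limits above can be checked locally before you upload a file. The following is a minimal sketch, not part of the Vertex AI SDK: the extension-to-limit mapping, the `check_file` helper, and the choice of 1 MB = 1,048,576 bytes are all assumptions made for illustration. Google Workspace files are left out because they have no local file extension.

```python
from pathlib import Path

# Assumption: 1 MB is treated as 1,048,576 bytes; the service may count differently.
MB = 1024 * 1024

# Hypothetical mapping from file extension to the size limits listed in the table.
SIZE_LIMITS_BYTES = {
    ".html": 10 * MB,
    ".json": 10 * MB,
    ".jsonl": 10 * MB,
    ".ndjson": 10 * MB,
    ".md": 10 * MB,
    ".pptx": 10 * MB,
    ".docx": 50 * MB,
    ".pdf": 50 * MB,
    ".txt": 10 * MB,
}


def check_file(name: str, size_bytes: int) -> str:
    """Return 'ok', 'too_large', or 'unsupported' for a file name and its size."""
    limit = SIZE_LIMITS_BYTES.get(Path(name).suffix.lower())
    if limit is None:
        return "unsupported"
    return "ok" if size_bytes <= limit else "too_large"


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    for name, size in [("report.pdf", 20 * MB), ("deck.pptx", 20 * MB), ("data.csv", 1)]:
        print(name, check_file(name, size))
```

For a file on disk, you could pass `Path(p).stat().st_size` as the size argument.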

Support for additional file types

To use file types that aren't natively supported, preprocess them with the LLM parser. The LLM parser extracts text from a wider variety of document formats, which you can then provide to Vertex AI RAG Engine.

If you use unsupported file types directly without preprocessing, you might encounter processing errors or receive lower-quality responses.
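
To avoid sending unsupported files directly, you can sort a batch of files before ingestion. The sketch below only illustrates the routing decision and calls no Vertex AI API. The `NATIVE_EXTENSIONS` set and the `split_by_support` helper are hypothetical names, and the extensions are an assumed reading of the natively supported file types.

```python
from pathlib import Path

# Assumed extensions for the natively supported file types (hypothetical mapping).
NATIVE_EXTENSIONS = {
    ".html", ".json", ".jsonl", ".ndjson", ".md",
    ".pptx", ".docx", ".pdf", ".txt",
}


def split_by_support(paths):
    """Split paths into (native, needs_preprocessing), preserving input order."""
    native, needs_preprocessing = [], []
    for path in paths:
        if Path(path).suffix.lower() in NATIVE_EXTENSIONS:
            native.append(path)
        else:
            needs_preprocessing.append(path)
    return native, needs_preprocessing


if __name__ == "__main__":
    # Example file names are made up for illustration.
    native, other = split_by_support(["guide.pdf", "book.epub", "notes.TXT"])
    print("native:", native)
    print("preprocess first:", other)
```

Files in the second list are the ones to preprocess before you provide them to Vertex AI RAG Engine.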
