Use embedding models with Vertex AI RAG Engine

This guide describes the available embedding models for Vertex AI RAG Engine and shows you how to create a RAG corpus with each type. This document covers the following topics:

Introduction to embeddings: Explains what embeddings are and their role in semantic retrieval.
Embedding model choices: Compares the different types of embedding models you can use.
Use Vertex AI text embedding models: Shows how to create a RAG corpus with a publisher Gecko model.
Use fine-tuned Vertex AI text embedding models: Shows how to create a RAG corpus with a fine-tuned model.
Use OSS embedding models: Shows how to create a RAG corpus with an open-source model.

The following diagram summarizes the overall workflow for choosing an embedding model and creating a RAG corpus:

Introduction to embeddings

Embeddings are numerical representations of inputs. You can use embeddings in your applications to recognize complex meanings and semantic relationships and to process and produce language.

Embeddings work by converting text, images, and video into arrays of floating-point numbers called vectors. The closer two vectors are in their embedding space, the greater the similarity of their inputs. For example, the vector for "dog" is closer to the vector for "puppy" than to the vector for "car," reflecting their semantic similarity.
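
To make the idea of closeness concrete, the following minimal sketch computes cosine similarity, one common closeness measure, over made-up three-dimensional vectors. The vector values are invented for illustration and are not output from any real model. Real embedding models produce hundreds of dimensions, and this guide doesn't state which distance measure Vertex AI RAG Engine uses internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; not produced by an embedding model.
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.9, 0.15]
car = [0.1, 0.2, 0.95]

dog_puppy = cosine_similarity(dog, puppy)
dog_car = cosine_similarity(dog, car)

# For these toy values, "dog" is closer to "puppy" than to "car".
print(dog_puppy > dog_car)  # True
```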

Embedding models are an important component of semantic retrieval systems. The performance of a retrieval system depends on how well the embedding model maps relationships in your data.

Embedding model choices

Vertex AI RAG Engine supports the following types of embedding models for your RAG corpus:

Embedding model type	Description	Pros	Cons	Use Case
Vertex AI text embedding models	General-purpose models trained and managed by Google.	Easy to use; no deployment required. Strong baseline performance for many tasks.	Might be discontinued by the publisher. Not specialized for niche domains.	Recommended for general tasks, quick prototypes, and when you don't need highly specialized knowledge.
Fine-tuned Vertex AI text embedding models	Vertex AI text embedding models that you fine-tune with your own data.	Highly tailored performance for specific domains. You own the resulting model, so it's not affected by publisher model deprecation.	Requires a tuning dataset and process. Requires deployment, which adds complexity.	Ideal for applications requiring specialized knowledge or improved performance on domain-specific data.
OSS embedding models	Third-party open-source embedding models available in Model Garden.	Access to a wide variety of models. More control over the model and deployment.	Requires deployment and management. Performance and support can vary.	Suitable when you need to use a specific open-source model for research, benchmarking, or custom requirements.

Use Vertex AI text embedding models

The Vertex AI text embedding API uses the publisher Gecko embedding models. These models produce a dense embedding vector with 768 dimensions. Dense embeddings store the meaning of text, unlike sparse vectors, which tend to map words to numbers. This lets you search for passages that align to the meaning of a query, even if the passages don't use the same language.
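
As a rough illustration of why word-level matching can miss relevant passages, the following sketch counts shared words between a query and two passages. The `term_overlap` function is a hypothetical, deliberately simplified stand-in for sparse matching, not how any production retrieval system scores text. It only shows the failure mode that dense embeddings are meant to address.

```python
def term_overlap(query, passage):
    """Count distinct lowercase words shared by the query and the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


query = "how to train a puppy"
passage_same_meaning = "teaching young dogs basic obedience"
passage_same_words = "how to train a neural network"

# The related passage shares no words with the query.
print(term_overlap(query, passage_same_meaning))  # 0
# The unrelated passage shares four words with the query.
print(term_overlap(query, passage_same_words))  # 4
```

A word-overlap score ranks the unrelated passage first, whereas a retrieval system built on dense embeddings can match the query to the passage about dog training.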

Gecko models are available in English-only and multilingual versions. Since publisher Gecko models don't require deployment, they are a convenient way to get started with Vertex AI RAG Engine.

Supported Gecko models

The following table lists the Gecko models that are recommended for use with a RAG corpus.

Model version	Description
`text-embedding-005`	Default embedding model. Recommended for use with a RAG corpus.
`text-embedding-004`
`text-multilingual-embedding-002`	Recommended for use with a RAG corpus.

When Gecko models are discontinued

If a publisher Gecko model is discontinued, you can no longer use it with Vertex AI RAG Engine, even for a RAG corpus created before the discontinuation. When this happens, you must migrate the RAG corpus by creating a new one and re-importing the data.

Alternatively, you can use a fine-tuned Gecko model or a self-deployed OSS embedding model, which are not affected by publisher model discontinuations.

Create a RAG corpus with a publisher Gecko model

The following code samples show you how to create a RAG corpus with a publisher Gecko model.

curl

ENDPOINT=us-central1-aiplatform.googleapis.com
PROJECT_ID=YOUR_PROJECT_ID

# Set this to your choice of publisher Gecko model. The full resource name of the publisher model is required.
# Example: projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/text-embedding-004
ENDPOINT_NAME=YOUR_ENDPOINT_NAME

# Set a display name for your corpus.
# For example, "my test corpus"
CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME

# CreateRagCorpus
# Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
-d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "rag_embedding_model_config" : {
            "vertex_prediction_endpoint": {
                  "endpoint": '\""${ENDPOINT_NAME}"\"'
            }
      }
}'
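
If you prefer to assemble the request body programmatically, the following sketch builds the same JSON payload that the curl sample sends. The helper names `publisher_model_resource` and `create_corpus_body` and the project ID `my-project` are hypothetical. The field names and the resource name format are taken from the curl sample above. The sketch only builds and prints the payload and doesn't call the API.

```python
import json


def publisher_model_resource(project_id, model, location="us-central1"):
    """Build the full resource name of a Google publisher model."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model}"
    )


def create_corpus_body(display_name, endpoint_name):
    """Build the CreateRagCorpus request body used in the curl sample."""
    return {
        "display_name": display_name,
        "rag_embedding_model_config": {
            "vertex_prediction_endpoint": {"endpoint": endpoint_name}
        },
    }


# "my-project" is a placeholder project ID.
endpoint_name = publisher_model_resource("my-project", "text-embedding-004")
body = create_corpus_body("my test corpus", endpoint_name)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` avoids the nested shell quoting that the curl sample needs in order to expand variables inside a single-quoted JSON string.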

Vertex AI SDK for Python

  import vertexai
  from vertexai import rag

  # Set Project
  PROJECT_ID = "YOUR_PROJECT_ID"
  vertexai.init(project=PROJECT_ID, location="us-central1")

  # Configure a Google first-party embedding model
  embedding_model_config = rag.RagEmbeddingModelConfig(
        publisher_model="publishers/google/models/text-embedding-004"
  )

  # Name your corpus
  DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"

  rag_corpus = rag.create_corpus(
        display_name=DISPLAY_NAME, rag_embedding_model_config=embedding_model_config
  )

Use fine-tuned Vertex AI text embedding models

Foundation publisher models are trained on a large dataset and provide a strong baseline for many tasks. However, you might require a model with specialized knowledge or highly tailored performance. In these cases, you can use model tuning to adjust the model's representations with your own data. An additional benefit is that you own the fine-tuned model, so it's unaffected by Gecko model deprecation. All fine-tuned Gecko embedding models produce 768-dimensional embedding vectors. To learn more about these models, see Get text embeddings.

For more information about tuning embedding models, see Tune text embeddings.

The following code samples show you how to create a RAG corpus with your deployed, fine-tuned Gecko model.

curl

  ENDPOINT=us-central1-aiplatform.googleapis.com
  PROJECT_ID=YOUR_PROJECT_ID

  # Your Vertex AI endpoint resource with the deployed fine-tuned Gecko model
  # Example: projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}
  ENDPOINT_NAME=YOUR_ENDPOINT_NAME

  # Set a display name for your corpus.
  # For example, "my test corpus"
  CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME

  # CreateRagCorpus
  # Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
  -d '{
        "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
        "rag_embedding_model_config" : {
                "vertex_prediction_endpoint": {
                      "endpoint": '\""${ENDPOINT_NAME}"\"'
                }
        }
    }'
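
Because the request depends on a correctly formed endpoint resource name, it can help to check the string before you send it. The following sketch is a hypothetical client-side check, not part of the Vertex AI SDK. It assumes only the `projects/.../locations/.../endpoints/...` shape shown in the example comment above, and it rejects names that still contain unexpanded placeholders such as `${PROJECT_ID}`. The server's actual validation rules aren't described in this guide.

```python
import re

# Shape taken from the example: projects/<project>/locations/<location>/endpoints/<id>.
# Segments containing "$", "{", or "}" are treated as unexpanded placeholders.
_ENDPOINT_RE = re.compile(
    r"projects/[^/${}]+/locations/[^/${}]+/endpoints/[^/${}]+"
)


def is_endpoint_resource_name(name):
    """Return True if name has the expected endpoint resource shape."""
    return _ENDPOINT_RE.fullmatch(name) is not None


good = "projects/my-project/locations/us-central1/endpoints/1234567890"
unexpanded = "projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}"

print(is_endpoint_resource_name(good))  # True
print(is_endpoint_resource_name(unexpanded))  # False
```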

Vertex AI SDK for Python

  import vertexai
  from vertexai import rag

  # Set Project
  PROJECT_ID = "YOUR_PROJECT_ID"
  vertexai.init(project=PROJECT_ID, location="us-central1")

  # Your Vertex Endpoint resource with the deployed fine-tuned Gecko model
  ENDPOINT_ID = "YOUR_MODEL_ENDPOINT_ID"
  MODEL_ENDPOINT = f"projects/{PROJECT_ID}/locations/us-central1/endpoints/{ENDPOINT_ID}"

  embedding_model_config = rag.RagEmbeddingModelConfig(
      endpoint=MODEL_ENDPOINT,
  )

  # Name your corpus
  DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"

  rag_corpus = rag.create_corpus(
      display_name=DISPLAY_NAME, rag_embedding_model_config=embedding_model_config
  )

Use OSS embedding models

Vertex AI RAG Engine supports third-party open-source embedding models in English-only and multilingual variants. This table lists the supported E5 models.

Model version	Base model	Parameters	Embedding dimension	English only
`e5-base-v2`	`MiniLM`	109M	768	✔
`e5-large-v2`	`MiniLM`	335M	1,024	✔
`e5-small-v2`	`MiniLM`	33M	384	✔
`multilingual-e5-large`	`xlm-roberta-large`	560M	1,024	✗
`multilingual-e5-small`	`microsoft/Multilingual-MiniLM-L12-H384`	118M	384	✗
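
When you choose among these models, the table can be treated as a simple lookup. The following sketch encodes the dimension and language columns from the table above and filters on them. The `E5_MODELS` dictionary and the `find_models` helper are hypothetical names used for illustration, not part of any Vertex AI API.

```python
# Values copied from the table above.
E5_MODELS = {
    "e5-base-v2": {"dimension": 768, "english_only": True},
    "e5-large-v2": {"dimension": 1024, "english_only": True},
    "e5-small-v2": {"dimension": 384, "english_only": True},
    "multilingual-e5-large": {"dimension": 1024, "english_only": False},
    "multilingual-e5-small": {"dimension": 384, "english_only": False},
}


def find_models(english_only, max_dimension=None):
    """Return sorted model names matching the language and dimension limits."""
    names = [
        name
        for name, spec in E5_MODELS.items()
        if spec["english_only"] == english_only
        and (max_dimension is None or spec["dimension"] <= max_dimension)
    ]
    return sorted(names)


print(find_models(english_only=False))
# ['multilingual-e5-large', 'multilingual-e5-small']
print(find_models(english_only=True, max_dimension=768))
# ['e5-base-v2', 'e5-small-v2']
```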

To use E5 models with Vertex AI RAG Engine, you must deploy the E5 model from Model Garden. To deploy your E5 model, see E5 Text Embedding in the Google Cloud console.

The following code samples show you how to create a RAG corpus with your deployed E5 model.

curl

  ENDPOINT=us-central1-aiplatform.googleapis.com
  PROJECT_ID=YOUR_PROJECT_ID

  # Your Vertex Endpoint resource with the deployed E5 model
  # Example: projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}
  ENDPOINT_NAME=YOUR_ENDPOINT_NAME

  # Set a display name for your corpus.
  # For example, "my test corpus"
  CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME

  # CreateRagCorpus
  # Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
  -d '{
        "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
        "rag_embedding_model_config" : {
                "vertex_prediction_endpoint": {
                      "endpoint": '\""${ENDPOINT_NAME}"\"'
                }
        }
    }'

Vertex AI SDK for Python

  import vertexai
  from vertexai import rag

  # Set Project
  PROJECT_ID = "YOUR_PROJECT_ID"
  vertexai.init(project=PROJECT_ID, location="us-central1")

  # Your Vertex Endpoint resource with the deployed E5 model
  ENDPOINT_ID = "YOUR_MODEL_ENDPOINT_ID"
  MODEL_ENDPOINT = f"projects/{PROJECT_ID}/locations/us-central1/endpoints/{ENDPOINT_ID}"

  embedding_model_config = rag.RagEmbeddingModelConfig(
      endpoint=MODEL_ENDPOINT,
  )

  # Name your corpus
  DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"

  rag_corpus = rag.create_corpus(
      display_name=DISPLAY_NAME, rag_embedding_model_config=embedding_model_config
  )

What's next

Document types for Vertex AI RAG Engine
