The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you
compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:
vector1: an ARRAY value that represents the first vector, in one of the
following forms:
ARRAY<Numerical type>
ARRAY<STRUCT<STRING, Numerical type>>
ARRAY<STRUCT<INT64, Numerical type>>
where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC.
For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.
When a vector is expressed as ARRAY<Numerical type>, each element
of the array denotes one dimension of the vector. An example of a
four-dimensional vector is [0.0, 1.0, 1.0, 0.0].
When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or
ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item
denotes one dimension of the vector. An example of a three-dimensional
vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].
The initial INT64 or STRING value in the STRUCT is used as an
identifier to match the STRUCT values in vector2. The ordering of data
in the array doesn't matter; the values are matched by the identifier rather
than by their position in the array. If either vector has any STRUCT
values with duplicate identifiers, running this function returns an error.
vector2: an ARRAY value that represents the second vector.
vector2 must have the same type as vector1.
For example, if vector1
is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like
[("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an
ARRAY<STRUCT<STRING, FLOAT64>> column.
When vector1 and vector2 are ARRAY<Numerical type> columns,
they must have the same array length.
type: a STRING value that specifies the type of distance to calculate.
Valid values are EUCLIDEAN, MANHATTAN, and COSINE.
If this argument isn't specified, the default value is EUCLIDEAN.
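To make the three distance types concrete, the following is a minimal Python sketch of the standard definitions for vectors in the ARRAY<Numerical type> form. It is an illustration only, not BigQuery's implementation. The function name distance and the parameter name kind are hypothetical, and cosine distance is taken here as 1 minus the cosine similarity.

```python
import math


def distance(v1, v2, kind="EUCLIDEAN"):
    """Reference sketch: distance between two dense vectors (lists of numbers)."""
    # Dense vectors are compared position by position, so the lengths must match.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")

    if kind == "EUCLIDEAN":
        # Square root of the sum of squared per-dimension differences.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    if kind == "MANHATTAN":
        # Sum of absolute per-dimension differences.
        return sum(abs(a - b) for a, b in zip(v1, v2))
    if kind == "COSINE":
        # Assumption: cosine distance = 1 - cosine similarity.
        dot = sum(a * b for a, b in zip(v1, v2))
        norm1 = math.sqrt(sum(a * a for a in v1))
        norm2 = math.sqrt(sum(b * b for b in v2))
        # A zero-length vector makes this division raise ZeroDivisionError;
        # the text does not say how ML.DISTANCE treats that case.
        return 1.0 - dot / (norm1 * norm2)
    raise ValueError(f"unsupported distance type: {kind!r}")
```

With the vectors used in the Example section, [4.1, 0.5, 1.0] and [3.0, 0.0, 2.5], the per-dimension differences are 1.1, 0.5, and -1.5. The Euclidean distance is therefore the square root of 1.21 + 0.25 + 2.25 = 3.71, approximately 1.926136, and the Manhattan distance is 1.1 + 0.5 + 1.5 = 3.1.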
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between
the vectors. The function returns NULL if either vector1 or vector2 is NULL.
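The identifier matching described for the STRUCT forms, together with the NULL rule above, can be sketched in Python as follows. This is an illustration under stated assumptions, not BigQuery's implementation. The names to_mapping and struct_distance are hypothetical, Python's None stands in for NULL, and only the default Euclidean distance is computed. The text does not say what happens when an identifier appears in only one vector, so the sketch rejects such input rather than guess.

```python
import math


def to_mapping(vector):
    """Turn a list of (identifier, value) pairs into a dict, rejecting duplicates."""
    mapping = {}
    for identifier, value in vector:
        if identifier in mapping:
            # Mirrors the documented rule: duplicate identifiers are an error.
            raise ValueError(f"duplicate identifier: {identifier!r}")
        mapping[identifier] = value
    return mapping


def struct_distance(vector1, vector2):
    """Euclidean distance between two vectors given as (identifier, value) pairs."""
    # NULL in, NULL out: None stands in for SQL NULL here.
    if vector1 is None or vector2 is None:
        return None

    m1, m2 = to_mapping(vector1), to_mapping(vector2)

    # Choice made by this sketch only (not documented behavior): refuse
    # vectors whose identifier sets differ.
    if m1.keys() != m2.keys():
        raise ValueError("identifier sets differ; behavior not specified in the text")

    # Values are paired by identifier, so array order does not matter.
    return math.sqrt(sum((m1[key] - m2[key]) ** 2 for key in m1))
```

For example, struct_distance([("a", 0.0), ("b", 1.0), ("c", 1.0)], [("c", 1.0), ("a", 0.0), ("b", 1.0)]) evaluates to 0.0, because the two arguments describe the same vector in a different order.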
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:
1. Create the table t1:

```sql
CREATE TABLE mydataset.t1
(
  v1 ARRAY<FLOAT64>,
  v2 ARRAY<FLOAT64>
)
```

2. Populate t1:

```sql
INSERT mydataset.t1 (v1,v2)
VALUES ([4.1,0.5,1.0], [3.0,0.0,2.5])
```

3. Calculate the Euclidean distance between v1 and v2:

```sql
SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output FROM mydataset.t1
```

This query produces the following output:

+---------------+---------------+-------------------+
| v1            | v2            | output            |
+---------------+---------------+-------------------+
| [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
+---------------+---------------+-------------------+

Note: The VECTOR_SEARCH function (/bigquery/docs/reference/standard-sql/search_functions#vector_search) is another vector function that calculates the distance between vectors. Use the VECTOR_SEARCH function if you need to search a dataset for vectors similar to an input vector. Use the ML.DISTANCE function if you need to compare two specific vectors to determine the distance between them.
What's next
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model (/bigquery/docs/e2e-journey).
Last updated 2025-08-25 UTC.