Explore Multimodal Vector Search in BigQuery

Lab: 30 minutes, 1 Credit, Introductory
This lab may incorporate AI tools to support your learning.

GSP1334

Overview

Imagine you want to search a catalog of product images stored in Cloud Storage to determine whether a similar product has been sold in the past. You can do this by leveraging vector search in BigQuery!

BigQuery is a fully managed, AI-ready data platform that supports seamless integration of external data sources (such as metadata from images in a Cloud Storage bucket) and connections to remote models for vector search to enhance your data analysis workflows with generative AI.

Vector search is a technique for quickly finding similar items based on their semantic meaning (rather than exact keyword matching), and it can be applied to many types of data, including images, videos, and text. Vector search uses embeddings, which are high-dimensional numerical vectors that represent an entity such as text or an image. These embeddings encode the semantics of the entities, which makes the entities easier to compare. One application is searching historical sales for similar items in an inventory dataset or product image catalog.
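To make the idea of comparing embeddings concrete, here is a minimal Python sketch. The three-dimensional vectors and item names are made up for illustration (real embeddings have far more dimensions and come from a model), and the helper name cosine_similarity is an assumption of this sketch, not part of the lab.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; values are invented for illustration only.
embeddings = {
    "red sweater": [0.9, 0.1, 0.2],
    "blue cardigan": [0.8, 0.2, 0.3],
    "garden hose": [0.1, 0.9, 0.7],
}

query = embeddings["red sweater"]
sim_cardigan = cosine_similarity(query, embeddings["blue cardigan"])
sim_hose = cosine_similarity(query, embeddings["garden hose"])

# The cardigan vector points in nearly the same direction as the sweater
# vector, so it scores higher than the unrelated garden hose.
print(sim_cardigan > sim_hose)
```

Items whose vectors point in similar directions score close to 1, which is the sense in which embeddings make entities "easier to compare".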

Given a product name or description, you can use vector search on a BigQuery object table containing metadata for the product images and identify the most closely related products to your search terms. This workflow integrates the power of Vertex AI into BigQuery for vector search, which enables you to analyze multimodal data (including images and text) in BigQuery to extract actionable insights, thereby enhancing decision-making capabilities.

In this lab, you get a quick introduction to how multimodal vector search works in BigQuery. You start by granting some permissions and creating a connection to an embeddings model in BigQuery. Then, you call the model to generate an embeddings table to store vector representations of product images in a Cloud Storage bucket. Last, you run a vector search on product images stored in Cloud Storage to identify the most similar products to your search terms based on image features and metadata.

What you'll do

  • Grant Identity and Access Management (IAM) roles to the BigQuery service account.
  • Connect to a remote embeddings model in BigQuery.
  • Generate embeddings and save them to a table in BigQuery.
  • Run a vector search on the product images in Cloud Storage.

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which could cause extra charges to be incurred on your personal account.
  • Time to complete the lab—remember, once you start, you cannot pause a lab.
Note: Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details pane.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details pane.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To access Google Cloud products and services, click the Navigation menu or type the service or product name in the Search field.

Task 1. Grant IAM roles to the BigQuery service account

In this lab environment, several resources have been pre-created including a BigQuery external connection named remote-models-connection and a BigQuery object table named image_object_table. These resources allow you to access and query unstructured data in Cloud Storage such as images. When the external connection was created, a BigQuery service account was also auto-generated, which can be used to control access to resources.

In this task, you grant specific IAM roles to the auto-generated BigQuery service account to access the image files in Cloud Storage and the Vertex AI resources needed for vector search.

  1. In the Google Cloud console, on the Navigation menu, select BigQuery > Studio.

  2. Expand the arrow next to the project ID, and then expand the arrow next to External connections.

  3. Click the connection named remote-models-connection.

  4. On the Connection info page, copy the Service account ID for use in the next step.

For example: bqcx-1054723899402-whbp@gcp-sa-bigquery-condel.iam.gserviceaccount.com

  5. In the Google Cloud console, on the Navigation menu, select IAM & Admin > IAM.

  6. Click Grant access.

  7. For New principals, enter the service account ID that you copied earlier (such as bqcx-1054723899402-whbp@gcp-sa-bigquery-condel.iam.gserviceaccount.com).

  8. For Select a role, filter for Vertex AI User, and select it from the results.

  9. Click Add another role.

  10. For Select a role, filter for Storage Object Viewer, and select it from the results.

  11. Click Save.
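The console steps above could, in principle, also be expressed as gcloud commands. The Python sketch below only builds the command strings (it does not run them). It assumes the console role names map to the role IDs roles/aiplatform.user and roles/storage.objectViewer, and the project ID my-lab-project is a hypothetical placeholder. The lab itself uses the console, so treat this as an illustration rather than a lab step.

```python
# Role IDs assumed to correspond to "Vertex AI User" and "Storage Object Viewer".
ROLES = ["roles/aiplatform.user", "roles/storage.objectViewer"]

def build_grant_commands(project_id, service_account_id):
    """Return one gcloud command string per role for the given service account."""
    member = f"serviceAccount:{service_account_id}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in ROLES
    ]

commands = build_grant_commands(
    "my-lab-project",  # hypothetical project ID; use the one from your lab
    "bqcx-1054723899402-whbp@gcp-sa-bigquery-condel.iam.gserviceaccount.com",
)
for command in commands:
    print(command)
```

Each printed line grants one role to the connection's service account at the project level, mirroring the two roles selected in the Grant access dialog.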

Click Check my progress to verify the objective. Grant IAM roles to the BigQuery service account

Task 2. Create a BigQuery remote model

Now that you have granted the necessary roles to the BigQuery service account, you can create a remote model in BigQuery to access Vertex AI functionality such as generative AI features.

In this task, you use the BigQuery external connection named remote-models-connection to create a remote model named embeddings_model that connects to an embeddings model available in Vertex AI. Because your image data is stored in a Cloud Storage bucket in an unstructured format, you want to designate a multimodal model as the endpoint to handle this type of data effectively.

  1. In the Google Cloud console, on the Navigation menu, select BigQuery > Studio.

  2. In the SQL query window, paste the following SQL query to create the BigQuery remote model in the existing BigQuery dataset named products, and click Run:

    CREATE OR REPLACE MODEL `{{{project_0.project_id | filled in at lab start}}}.products.embeddings_model`
    REMOTE WITH CONNECTION `{{{project_0.project_id | filled in at lab start}}}.{{{project_0.startup_script.bq_gcp_region | "filled in at lab start"}}}.remote-models-connection`
    OPTIONS (
      ENDPOINT = 'multimodalembedding@001'
    );

Note: If you receive an error related to the service account permissions (which you assigned in the previous task), wait a few minutes, and then run the query again.
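The statement above combines three identifiers: the project ID, the connection's region, and fixed names for the dataset, model, and connection. As a rough illustration of how those pieces fit together, the following Python sketch assembles the same statement from its parts. The function name create_model_sql and the example values my-lab-project and us are hypothetical. In the lab, the real values are filled in for you.

```python
def create_model_sql(project_id, region,
                     dataset="products",
                     model="embeddings_model",
                     connection="remote-models-connection",
                     endpoint="multimodalembedding@001"):
    """Assemble the CREATE MODEL statement from its component identifiers."""
    return (
        f"CREATE OR REPLACE MODEL `{project_id}.{dataset}.{model}`\n"
        f"REMOTE WITH CONNECTION `{project_id}.{region}.{connection}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}');"
    )

# Hypothetical project ID and region, for illustration only.
sql = create_model_sql("my-lab-project", "us")
print(sql)
```

Note that the model path is project.dataset.model, whereas the connection path is project.region.connection.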

Click Check my progress to verify the objective. Create a BigQuery remote model

Task 3. Create an embeddings table for images listed in an object table

After creating a remote model in BigQuery, the next step is to call the model to generate the embeddings based on the image files stored in Cloud Storage. As mentioned in Task 1, an object table named image_object_table has been pre-created for this lab environment and can be used to query the image files.

In this task, you call the embeddings model on image_object_table and store the results in a new table named product_embeddings. When you run the query, BigQuery submits the data in the object table to the multimodal embeddings model, which returns the embeddings that can be used for vector searches to find semantically similar entities.

  1. In the SQL query window, run the following SQL query to create the embeddings table:
    CREATE OR REPLACE TABLE `{{{project_0.project_id | filled in at lab start}}}.products.product_embeddings` AS
    SELECT
      *,
      REGEXP_EXTRACT(uri, r'[^/]+$') AS product_name
    FROM ML.GENERATE_EMBEDDING(
      MODEL `{{{project_0.project_id | filled in at lab start}}}.products.embeddings_model`,
      TABLE `{{{project_0.project_id | filled in at lab start}}}.products.image_object_table`
    );

  2. In the SQL query window, run the following query to review the data in the new embeddings table:

    SELECT * FROM `{{{project_0.project_id | filled in at lab start}}}.products.product_embeddings`
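The REGEXP_EXTRACT call in the first query derives product_name from each image's uri by keeping everything after the last slash. The same pattern can be tried out in Python, as in this small sketch. The bucket name example-bucket is a hypothetical placeholder, and the helper name is an assumption of this example.

```python
import re

def product_name_from_uri(uri):
    """Return the text after the last '/', or None if nothing follows it."""
    match = re.search(r"[^/]+$", uri)
    return match.group(0) if match else None

# Hypothetical bucket name; the file name mirrors the lab's Onesie.jpg example.
name = product_name_from_uri("gs://example-bucket/Onesie.jpg")
print(name)
```

Because the pattern keeps the whole final path segment, the resulting product_name includes the file extension (for example, Onesie.jpg rather than Onesie).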

Click Check my progress to verify the objective. Create an embeddings table for images listed in an object table

Task 4. Run a vector search on product images in Cloud Storage

Recall from the overview of this lab that vector search in BigQuery allows you to use calculated embeddings (which are numerical vectors) to find semantically similar objects. Specifically, vector search compares vectors by a distance metric in the embedding space and ranks them by similarity. In BigQuery, the VECTOR_SEARCH function supports three distance types: Euclidean (the default), cosine, and dot product.
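To give a feel for how the three metrics differ, here is a small Python sketch using two-dimensional toy vectors. It assumes the common convention that cosine distance is one minus cosine similarity. How BigQuery scales or signs each reported distance is not covered here, so treat the functions as textbook definitions rather than a reproduction of BigQuery's output.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the vectors (assumed convention)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)

a = [1.0, 0.0]
b = [0.0, 2.0]  # perpendicular to a
c = [3.0, 0.0]  # same direction as a, but three times longer

print(euclidean_distance(a, c), cosine_distance(a, c), dot_product(a, c))
print(euclidean_distance(a, b), cosine_distance(a, b), dot_product(a, b))
```

Vectors a and c point in the same direction, so their cosine distance is 0 even though their Euclidean distance is 2. Cosine distance ignores vector length, while Euclidean distance and dot product do not.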

In this task, you use the cosine distance type (derived from the cosine of the angle between two vectors, so a smaller distance indicates a closer match) to identify the 3 product images in Cloud Storage that are most similar to a specific search phrase: colorful sweaters for women.

  • In the SQL query window, run the following query that uses the VECTOR_SEARCH function to search the product embeddings table and find the most similar images to the provided search term:
    SELECT base.uri, base.product_name, base.content_type, distance
    FROM VECTOR_SEARCH(
      TABLE products.product_embeddings,
      'ml_generate_embedding_result',
      (
        SELECT ml_generate_embedding_result AS embedding_col
        FROM ML.GENERATE_EMBEDDING(
          MODEL `products.embeddings_model`,
          (SELECT 'colorful sweaters for women' AS content),
          STRUCT(TRUE AS flatten_json_output)
        )
      ),
      TOP_K => 3,
      DISTANCE_TYPE => 'COSINE'
    );
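Conceptually, the query above embeds the search phrase, measures its distance to every stored image embedding, and keeps the three closest rows. The following Python sketch mimics that with a brute-force scan over a tiny in-memory table. The embedding values and most file names are invented for illustration, the function name top_k_search is hypothetical, and this is not a description of how BigQuery executes VECTOR_SEARCH internally.

```python
import heapq
import math

def cosine_distance(a, b):
    """One minus cosine similarity (assumed convention for cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k_search(base_rows, query_embedding, k=3):
    """Return the k closest rows as (distance, product_name), nearest first."""
    scored = [
        (cosine_distance(row["embedding"], query_embedding), row["product_name"])
        for row in base_rows
    ]
    return heapq.nsmallest(k, scored)

# Hypothetical stand-in for the product_embeddings table.
base_rows = [
    {"product_name": "Sweater.jpg", "embedding": [0.9, 0.1, 0.1]},
    {"product_name": "Cardigan.jpg", "embedding": [0.8, 0.3, 0.1]},
    {"product_name": "Scarf.jpg", "embedding": [0.6, 0.5, 0.2]},
    {"product_name": "Onesie.jpg", "embedding": [0.1, 0.9, 0.3]},
    {"product_name": "Mug.jpg", "embedding": [0.0, 0.2, 0.9]},
]

# Hypothetical embedding of the search phrase.
query_embedding = [1.0, 0.1, 0.0]

results = top_k_search(base_rows, query_embedding, k=3)
for distance, product_name in results:
    print(f"{product_name}: {distance:.3f}")
```

As in the SQL query's output, each returned row pairs a base item with its distance to the query, and smaller distances rank first.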

Review the highest matched images in Cloud Storage

[Optional] To see the specific images that have been returned as the highest matches, you can go to the Cloud Storage bucket, and open each image.

  1. In the Google Cloud console, on the Navigation menu, select Cloud Storage > Buckets.

  2. Click the name of the bucket that contains the product images, and then click an image name (such as Onesie.jpg).

  3. Click on the Authenticated URL to see the image (such as https://storage.cloud.google.com/qwiklabs-gcp-04-dc2812426125/Onesie.jpg).

Click Check my progress to verify the objective. Run a vector search on product images in Cloud Storage

Congratulations!

In this lab, you received a quick introduction to vector search in BigQuery and used it to find images in Cloud Storage that are most related to a desired search term.

Next steps / Learn more

Google Cloud training and certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated July 7, 2025

Lab Last Tested July 7, 2025

Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
