
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
This lab includes the following tasks:
- Set up the notebook
- Interact with the model
- Configure and control the model
- Manage the model interaction
- Function calling
- Use context caching
- Retrieve batch prediction results
- Get text embeddings
The Google Gen AI SDK provides a unified interface to Google's generative AI API services. This SDK simplifies the process of integrating generative AI capabilities into applications and services, enabling developers to leverage Google's advanced AI models for various tasks. In this lab, you explore the Google Gen AI SDK, learning to connect to AI services, send diverse prompts, and fine-tune responses from Gemini. You also get hands-on experience with more advanced techniques to prepare you to leverage the power of generative AI for your own projects.
In this lab, you learn how to use the Google Gen AI SDK for Python to interact with Google's generative AI services and models, including Gemini. You cover the tasks listed above.
Before starting this lab, you should be familiar with:
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.
This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
If necessary, copy the Username below and paste it into the Sign in dialog.
You can also find the Username in the Lab Details pane.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
You can also find the Password in the Lab Details pane.
Click Next.
Click through the subsequent pages:
After a few moments, the Google Cloud console opens in this tab.
In the Google Cloud console, on the Navigation menu, click Vertex AI > Workbench.
Find your Workbench instance and click Open JupyterLab.
The JupyterLab interface for your Workbench instance opens in a new browser tab.
Open your notebook file, import your libraries, and choose your model.
Open the notebook file for this lab.
In the Select Kernel dialog, choose Python 3 from the list of available kernels.
Run through these sections of the notebook:
For Project ID, use the project ID shown in the Lab Details pane.
Click Check my progress to verify the objective.
For more information about all AI models and APIs on Vertex AI, refer to Google Models and Model Garden.
Use the generate_content method to generate responses to your prompts. You can pass text to generate_content, and use the .text property to get the text content of the response.
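For example, a minimal sketch of this pattern might look like the following (the project ID, region, and model version here are placeholders for the values used in your lab):

```python
from google import genai

# Create a client for the Vertex AI API (placeholder project and region).
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

# Send a text prompt and read the text content of the response.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Explain what the Google Gen AI SDK is in one sentence.",
)
print(response.text)
```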
You can include text, PDF documents, images, audio and video in your prompt requests and get text or code responses.
You can also pass a file URI to Part.from_uri in the request to the model directly.
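As a sketch, assuming the client created earlier and a readable Cloud Storage object (the URI below is a placeholder):

```python
from google.genai import types

# Combine a file reference and a text instruction in a single request.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        types.Part.from_uri(
            file_uri="gs://your-bucket/meal.png",  # placeholder URI
            mime_type="image/png",
        ),
        "Describe what is in this image.",
    ],
)
print(response.text)
```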
System instructions let you control model behavior. Set a system instruction to give the model additional context to understand the task, provide more customized responses, and adhere to guidelines throughout the user interaction.
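A minimal sketch, assuming the client created earlier (the instruction text is illustrative):

```python
from google.genai import types

# The system instruction shapes the model's behavior for the whole request.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the benefits of context caching.",
    config=types.GenerateContentConfig(
        system_instruction="You are a concise technical writer. Answer in two sentences.",
    ),
)
print(response.text)
```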
Click Check my progress to verify the objective.
You can include parameter values in each call that you send to a model to control how the model generates a response. To learn more, refer to the Experiment with parameter values page.
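A sketch of setting a few common parameters (the values are illustrative, not recommendations):

```python
from google.genai import types

# Sampling parameters influence the randomness and length of the output.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Suggest a name for a flower shop.",
    config=types.GenerateContentConfig(
        temperature=0.4,        # lower values produce more deterministic output
        top_p=0.95,             # nucleus sampling threshold
        max_output_tokens=100,  # cap on the response length
    ),
)
print(response.text)
```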
The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. Refer to the Configure safety filters page for details.
When you make a request to the model, the content is analyzed and assigned a safety rating. You can inspect the safety ratings of the generated content by printing out the model responses, as in this example:
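A sketch of adjusting one filter category and inspecting the resulting ratings (the category and threshold values are illustrative):

```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Tell me a story about a daring rescue.",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_DANGEROUS_CONTENT",
                threshold="BLOCK_MEDIUM_AND_ABOVE",
            ),
        ],
    ),
)

# Each response candidate carries per-category safety ratings.
for rating in response.candidates[0].safety_ratings:
    print(rating.category, rating.probability)
```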
The Gemini API enables you to have freeform conversations across multiple turns.
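For example, a chat session keeps the conversation history for you (a minimal sketch, assuming the client created earlier):

```python
# Create a chat session; prior turns are sent with each new message.
chat = client.chats.create(model="gemini-2.0-flash-001")

response = chat.send_message("Write a function to check if a year is a leap year.")
print(response.text)

# A follow-up turn can refer back to the earlier one.
response = chat.send_message("Now write a unit test for that function.")
print(response.text)
```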
The controlled generation capability in the Gemini API allows you to constrain the model output to a structured format. You can provide a schema as a Pydantic model or a JSON string.
You can also define a response schema in a Python dictionary. Only the fields below are supported; all other fields are ignored.
enum
items
maxItems
nullable
properties
required
In this example, you instruct the model to analyze product review data, extract key entities, perform sentiment classification (multiple choices), provide additional explanation, and output the results in JSON format.
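A minimal sketch of that idea using a Pydantic model as the response schema (the Review model and the review text are hypothetical):

```python
from pydantic import BaseModel
from google.genai import types

# Hypothetical schema for the product-review analysis described above.
class Review(BaseModel):
    product: str
    sentiment: str  # e.g., "positive", "negative", "mixed"
    explanation: str

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Review: The battery life is great, but the screen scratches easily.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Review,
    ),
)
print(response.text)  # JSON conforming to the Review schema
```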
Click Check my progress to verify the objective.
By default, the model returns a response after completing the entire generation process. You can also use the generate_content_stream method to stream the response as it is being generated, in which case the model returns chunks of the response as they are produced.
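A minimal streaming sketch, assuming the client created earlier:

```python
# Chunks arrive as they are generated instead of after completion.
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash-001",
    contents="Tell me a short story about a robot learning to paint.",
):
    print(chunk.text, end="")
```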
You can send asynchronous requests using the client.aio module, which exposes all of the analogous async methods that are available on client. For example, client.aio.models.generate_content is the async version of client.models.generate_content.
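A sketch of the async variant (in a notebook cell you can await the coroutine directly instead of using asyncio.run):

```python
import asyncio

async def main():
    # Same call shape as the sync version, but awaitable.
    response = await client.aio.models.generate_content(
        model="gemini-2.0-flash-001",
        contents="Compose a haiku about asynchronous code.",
    )
    print(response.text)

asyncio.run(main())
```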
You can use the count_tokens method to calculate the number of input tokens before sending a request to the Gemini API. Refer to the List and count tokens page for details.
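For example:

```python
# Count input tokens before sending the request.
token_info = client.models.count_tokens(
    model="gemini-2.0-flash-001",
    contents="What's the highest mountain in Africa?",
)
print(token_info.total_tokens)
```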
Click Check my progress to verify the objective.
Function calling lets you provide a set of tools that the model can use to respond to the user's prompt. You create a description of a function in your code, then pass that description to a language model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.
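A sketch of this flow with a hypothetical get_destination function (the declaration and prompt are illustrative):

```python
from google.genai import types

# Describe a hypothetical function the model may choose to "call".
get_destination = types.FunctionDeclaration(
    name="get_destination",
    description="Get the destination that the user wants to go to",
    parameters={
        "type": "OBJECT",
        "properties": {
            "destination": {"type": "STRING", "description": "Destination name"},
        },
    },
)

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="I'd like to travel to Paris.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_destination])],
    ),
)

# The model returns the matching function name and arguments; your code
# is responsible for actually invoking the function.
print(response.candidates[0].content.parts[0].function_call)
```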
For more examples of function calling, refer to this notebook.
Click Check my progress to verify the objective.
Context caching lets you store frequently used input tokens in a dedicated cache and reference them for subsequent requests. This eliminates the need to repeatedly pass the same set of tokens to a model.
Note: Context caching is only available for stable models with fixed versions (for example, gemini-2.0-flash-001). You must include the version postfix (for example, -001).
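A sketch of creating a cache and referencing it in a later request (the document URI and instructions are placeholders; cached content must also meet a minimum token count):

```python
from google.genai import types

# Store frequently reused context once (placeholder document URI).
cached_content = client.caches.create(
    model="gemini-2.0-flash-001",
    config=types.CreateCachedContentConfig(
        system_instruction="You are an expert analyst of this report.",
        contents=[
            types.Part.from_uri(
                file_uri="gs://your-bucket/large-report.pdf",  # placeholder
                mime_type="application/pdf",
            ),
        ],
    ),
)

# Later requests reference the cache instead of resending those tokens.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the key findings.",
    config=types.GenerateContentConfig(cached_content=cached_content.name),
)
print(response.text)
```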
Click Check my progress to verify the objective.
Unlike online (synchronous) responses, where you are limited to one input request at a time, batch predictions for the Gemini API in Vertex AI let you send a large number of requests to Gemini in a single batch request. The model's responses then populate asynchronously to your output location in Cloud Storage or BigQuery.
Batch predictions are generally more efficient and cost-effective than online predictions when processing a large number of inputs that are not latency sensitive.
The input for batch requests specifies the items to send to your model for prediction.
Batch requests for Gemini accept BigQuery storage sources and Cloud Storage sources. You can learn more about the batch input formats in the Batch text generation page.
This lab uses Cloud Storage as an example. The requirements for Cloud Storage input are:
- File format: JSON Lines (JSONL)
- Located in the us-central1 region
- Appropriate read permissions for the service account
Each request that you send to a model can include parameters that control how the model generates a response. Learn more about Gemini parameters in the Experiment with parameter values page.
The input JSONL file batch_requests_for_multimodal_input_2.jsonl contains one request per line.
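The file's actual contents are not reproduced here, but an illustrative request in the documented batch format might look like this (prompt text and file URI are placeholders; in the JSONL file, each request occupies a single line):

```json
{
  "request": {
    "contents": [
      {
        "role": "user",
        "parts": [
          {"text": "Describe this image."},
          {"fileData": {"fileUri": "gs://your-bucket/image.jpg", "mimeType": "image/jpeg"}}
        ]
      }
    ],
    "generationConfig": {"temperature": 0.4}
  }
}
```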
When a batch prediction task completes, the output is stored in the location specified in your request. The location is in the form of a Cloud Storage or BigQuery URI prefix, for example: gs://path/to/output/data or bq://projectId.bqDatasetId. If not specified, gs://STAGING_BUCKET/gen-ai-batch-prediction is used for a Cloud Storage source and bq://PROJECT_ID.gen_ai_batch_prediction.predictions_TIMESTAMP is used for a BigQuery source.
This lab uses a Cloud Storage bucket as an example for the output location.
You can specify the URI of your Cloud Storage bucket in BUCKET_URI; if it is not specified, a new Cloud Storage bucket in the form gs://PROJECT_ID-TIMESTAMP is created for you.
To make a batch prediction request, you specify a source model ID, an input source and an output location where Vertex AI stores the batch prediction results.
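A minimal sketch of submitting such a job with the SDK (bucket URIs are placeholders, and the client is the one created earlier):

```python
from google.genai import types

# Submit a batch job: model + input JSONL + output destination.
batch_job = client.batches.create(
    model="gemini-2.0-flash-001",
    src="gs://your-bucket/batch_requests_for_multimodal_input_2.jsonl",  # placeholder
    config=types.CreateBatchJobConfig(dest="gs://your-bucket/batch-output"),  # placeholder
)
print(batch_job.name, batch_job.state)
```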
For more information, see the Batch prediction API page. You can also check the job status in the console at https://console.cloud.google.com/vertex-ai/batch-predictions.
When a batch prediction task is complete, the output of the prediction is stored in the location specified in your request. It is also available in batch_job.dest.bigquery_uri or batch_job.dest.gcs_uri.
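A sketch of waiting for the job and then reading its destination:

```python
import time
from google.genai import types

completed_states = {
    types.JobState.JOB_STATE_SUCCEEDED,
    types.JobState.JOB_STATE_FAILED,
    types.JobState.JOB_STATE_CANCELLED,
    types.JobState.JOB_STATE_PAUSED,
}

# Refresh the job until it reaches a terminal state.
while batch_job.state not in completed_states:
    time.sleep(30)
    batch_job = client.batches.get(name=batch_job.name)

print(batch_job.state)
print(batch_job.dest.gcs_uri)  # or batch_job.dest.bigquery_uri
```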
Example output:
Click Check my progress to verify the objective.
You can get text embeddings for a snippet of text by using the embed_content method. All models produce an output with 768 dimensions by default. However, some models give users the option to choose an output dimensionality between 1 and 768. See the Vertex AI text embeddings API page for details.
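For example (the embedding model name is an assumption; use the one specified in your notebook):

```python
from google.genai import types

# Embed a snippet of text, optionally reducing the output dimensionality.
response = client.models.embed_content(
    model="text-embedding-005",  # assumed model name
    contents="What is life?",
    config=types.EmbedContentConfig(output_dimensionality=256),
)
print(len(response.embeddings[0].values))  # 256
```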
Click Check my progress to verify the objective.
Congratulations! You've successfully explored the Google Gen AI SDK, learning to connect to AI services, send diverse prompts, and fine-tune responses from the Gemini model. You've also gained hands-on experience with more advanced techniques like managing interactions, using context caching, and working with embeddings. Now you're well-equipped to leverage the power of generative AI for your own projects.
Check out the following resources to learn more about Gemini:
Google Cloud training and certification helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated May 05, 2025
Lab Last Tested May 05, 2025
Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.