
Supervised Fine-tuning with Gemini for Image Captioning


Lab · 1 hour 30 minutes · 1 Credit · Introductory

Note: This lab may incorporate AI tools to support your learning.

GSP1296

Overview

Gemini is a family of generative AI models developed by Google DeepMind and designed for multimodal use cases. The Gemini API gives you access to the various Gemini models, such as Gemini Pro, Gemini Flash, and Gemini Flash-Lite.

In this lab, you learn how to fine-tune the Gemini Flash generative model using the Vertex AI Supervised Fine-Tuning feature. Supervised fine-tuning lets you use your own training data to refine the base model's capabilities for your specific tasks.

Supervised fine-tuning uses labeled examples (in this case, images paired with captions) to tune the base Gemini model. Each example demonstrates the output you want from the model during inference.

Before you begin tuning, ensure your training data is high quality, well labeled, and directly relevant to the target task. This is crucial, as low-quality data can adversely affect performance and introduce bias into the fine-tuned model.
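
To make the training data format concrete, here is a minimal sketch of what a single image-captioning example in a file like train.jsonl might look like, assuming the Vertex AI tuning dataset format for Gemini. The Cloud Storage URI, prompt, and caption below are placeholders, not assets from this lab.

```python
import json

# Hypothetical example of one line in a tuning dataset such as train.jsonl.
# The gs:// URI, prompt, and caption are placeholders, not lab assets.
example = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {"fileData": {"mimeType": "image/jpeg",
                              "fileUri": "gs://your-bucket/images/example.jpg"}},
                {"text": "Describe this image in a single caption."},
            ],
        },
        {
            "role": "model",
            "parts": [{"text": "A brown dog running across a grassy field."}],
        },
    ]
}

# Each example is written as one JSON object per line.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```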

When you start the lab, the environment will contain the resources that are shown in the following diagram.

By the end of the lab, you will have used the architecture to perform several tasks.

The following list describes each numbered task in relation to the lab architecture.

1. Open the notebook in Vertex AI Workbench and choose kernel.
2. Install packages, and configure the notebook for your project:
You use the Google Gen AI SDK to work with Gemini through the Developer API and Vertex AI. This requires you to install Python libraries and reference them throughout the lab. You also need to configure the notebook to access resources in your project, such as the Cloud Storage bucket provided to you at lab launch.
3. Prepare the dataset:
In this task you download the dataset files including the images, and prepare them for use with your notebook.
4. Compare baseline model performance to ground truth:
You will compare the baseline Gemini Flash model against the ground truth of a single image in the dataset.
5. Evaluate the baseline model with the validation dataset:
You will use val.jsonl with your prompt and the baseline Gemini Flash model to evaluate its results. Note: you will use various evaluation metrics like rougeL_precision, rougeL_recall, and rougeL_fmeasure to evaluate the tuned model's performance.
6. Create the fine-tuned model:
You use your training data train.jsonl along with the baseline Gemini Flash model to create a fine-tuned model for your specific image captioning use case.
7. Evaluate the fine-tuned model:
You use evaluation metrics such as rougeL to evaluate the fine-tuned model against the validation dataset val.jsonl and compare its performance with that of the baseline Gemini Flash model.

Prerequisites

Before starting this lab, you should be familiar with:

  • Basic Python programming.
  • General API concepts.
  • Running Python code in a Jupyter notebook on Vertex AI Workbench.

Objectives

In this lab, you will:

  • Describe the process of supervised fine-tuning.
  • Fine-tune the baseline Gemini Flash model with your training dataset.
  • Evaluate the results of image captioning with the fine-tuned model.
  • Compare the results of the fine-tuned model with the baseline model and ground truth.

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which could cause extra charges to be incurred on your personal account.
  • Time to complete the lab—remember, once you start, you cannot pause a lab.
Note: Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details pane.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details pane.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To access Google Cloud products and services, click the Navigation menu or type the service or product name in the Search field.

Task 1. Open the notebook in Vertex AI Workbench and choose kernel

Task 1a. Open the notebook

  1. In the Google Cloud console, on the Navigation menu, click Vertex AI > Workbench.

  2. Find the instance and click on the Open JupyterLab button.

The JupyterLab interface for your Workbench instance opens in a new browser tab.

Note: If you do not see notebooks in JupyterLab, please follow these additional steps to reset the instance:

1. Close the browser tab for JupyterLab, and return to the Workbench home page.

2. Select the checkbox next to the instance name, and click Reset.

3. After the Open JupyterLab button is enabled again, wait one minute, and then click Open JupyterLab.

Task 1b. Choose the kernel

  1. Open the notebook file.

  2. In the Select Kernel dialog, choose Python 3 from the list of available kernels.

Task 2. Install packages and configure the notebook

In this task, you install the required Python packages, restart the kernel runtime, configure your notebook to use your project and region, and import libraries.

  1. Complete the steps in the Install packages and configure the notebook section of the notebook.

    For Project ID and Location, use the values provided for your lab, if these aren't already populated for you.
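
If you want a sense of what these setup cells typically contain, the following is a minimal sketch based on the Google Gen AI SDK; the project ID, location, and exact cell contents in your notebook may differ.

```python
# Minimal setup sketch using the Google Gen AI SDK; your notebook's cells may differ.
# %pip install --upgrade --quiet google-genai

from google import genai

PROJECT_ID = "your-project-id"  # placeholder: use the Project ID provided for your lab
LOCATION = "us-central1"        # placeholder: use the Location provided for your lab

# Create a client that routes Gemini requests through Vertex AI in your project.
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
```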

Click Check my progress to verify the objective. Install packages and configure the notebook.

Task 3. Prepare the dataset

In this task, you download the dataset files including the images, and prepare them for use with your notebook.

  1. Complete the Prepare the dataset section of the notebook.
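
As a rough illustration of this step, the sketch below copies dataset files from a Cloud Storage bucket and inspects one record; the bucket path and file names are placeholders and may not match the notebook.

```python
import json

# Placeholder source path: substitute the bucket and prefix used in your notebook.
# In a notebook cell you might copy the files with, for example:
#   !gsutil -m cp -r gs://your-source-bucket/image-captioning/* .

# Peek at one validation record to confirm the structure of the JSONL files.
with open("val.jsonl") as f:
    record = json.loads(f.readline())

print(json.dumps(record, indent=2))
```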

Click Check my progress to verify the objective. Prepare the dataset.

Task 4. Compare baseline model performance to ground truth

In this task, you will compare the baseline Gemini Flash model against the ground truth of a single image in the dataset.

  1. Run through the Compare baseline model performance to ground truth section of the notebook.
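
The sketch below shows one way to generate a baseline caption for a single image and print it next to the ground-truth caption. It assumes the client created in Task 2; the model name, image URI, prompt, and ground-truth caption are placeholders that may differ from the notebook.

```python
from google.genai import types

# Placeholders: take the image URI, prompt, and ground-truth caption from a
# record in the dataset you prepared in the previous task.
image_uri = "gs://your-bucket/images/example.jpg"
prompt = "Describe this image in a single caption."
ground_truth = "A brown dog running across a grassy field."

# Ask the baseline model to caption the image.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",  # the baseline model used in your notebook may differ
    contents=[
        types.Part.from_uri(file_uri=image_uri, mime_type="image/jpeg"),
        prompt,
    ],
)

print("Baseline caption:", response.text)
print("Ground truth:    ", ground_truth)
```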

Click Check my progress to verify the objective. Compare baseline model performance to ground truth.

Task 5. Evaluate the baseline model with the validation dataset

In this task, you will use val.jsonl with your prompt and the baseline Gemini Flash model to evaluate its results.

  1. Run through the Evaluate the baseline model with the validation dataset section of the notebook.
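
Your notebook may use its own evaluation utility; as an illustration of what the rougeL metrics measure, the sketch below scores a single predicted caption against its ground truth using the open-source rouge-score package.

```python
# %pip install --quiet rouge-score
from rouge_score import rouge_scorer

# ROUGE-L measures the longest common subsequence between prediction and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

ground_truth = "A brown dog running across a grassy field."   # placeholder reference
prediction = "A dog runs through a field of grass."           # placeholder model output

score = scorer.score(ground_truth, prediction)["rougeL"]
print(f"rougeL_precision: {score.precision:.3f}")
print(f"rougeL_recall:    {score.recall:.3f}")
print(f"rougeL_fmeasure:  {score.fmeasure:.3f}")
```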

Click Check my progress to verify the objective. Evaluate the baseline model with the validation dataset.

Task 6. Create the fine-tuned model

In this task, you use your training data train.jsonl along with the baseline Gemini Flash model to create a fine-tuned model for your specific image captioning use case.

  1. Run through the Create the fine-tuned model section of the notebook.
Note: The tuning job takes several minutes to complete. Wait for it to finish before proceeding to the subsequent cells of the notebook. Important: To get credit for the progress check on this task, you must open the tuning job details by clicking the VIEW TUNING JOB button, open the Experiment details by clicking VIEW EXPERIMENT, and wait until the state of the job is 'Succeeded'.
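
For orientation, here is a minimal sketch of launching a supervised tuning job with the Google Gen AI SDK; the base model name, dataset URI, display name, and hyperparameters are placeholders, and the notebook's actual cells may differ.

```python
from google.genai import types

# Placeholder values: use the train.jsonl URI from your bucket and the base
# model name from your notebook.
tuning_job = client.tunings.tune(
    base_model="gemini-2.0-flash-001",
    training_dataset=types.TuningDataset(
        gcs_uri="gs://your-bucket/train.jsonl",
    ),
    config=types.CreateTuningJobConfig(
        tuned_model_display_name="image-captioning-tuned",
        epoch_count=1,
    ),
)

# The job runs asynchronously; re-fetch it to check its state until it succeeds.
tuning_job = client.tunings.get(name=tuning_job.name)
print(tuning_job.name, tuning_job.state)
```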

Click Check my progress to verify the objective. Create the fine-tuned model.

Task 7. Evaluate the fine-tuned model

In this task, you use evaluation metrics such as rougeL to evaluate the fine-tuned model against the validation dataset val.jsonl and compare its performance with that of the baseline Gemini Flash model.

  1. Run through the Evaluate the fine-tuned model section of the notebook.
Important: As noted in the notebook, if you see an error when running the first cell in this task, such as "Endpoint projects/... misconfigured, traffic split not set", please wait. It takes about 10 minutes after the previous cell has completed running for Vertex AI to update the traffic split setting for the deployed endpoint, and you will not be able to run any online predictions against this endpoint until that value is 100 percent. While you are waiting, you can review the Google documentation on Define your evaluation metrics, which will help you understand what rougeL is and why it is applicable to this use case.
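
Once the tuned model's endpoint is serving traffic, calling it looks much like calling the baseline model. The sketch below assumes the tuning_job object and client from the earlier tasks; the attribute names follow the Google Gen AI SDK, and the image URI and prompt are placeholders.

```python
from google.genai import types

# Re-fetch the completed tuning job, then call its deployed endpoint.
tuning_job = client.tunings.get(name=tuning_job.name)

response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,  # the tuned model's endpoint resource name
    contents=[
        types.Part.from_uri(file_uri="gs://your-bucket/images/example.jpg",
                            mime_type="image/jpeg"),
        "Describe this image in a single caption.",
    ],
)

print("Tuned model caption:", response.text)
# Score captions like this against the val.jsonl ground truth with the same rougeL
# metrics used for the baseline, then compare the two sets of scores.
```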

Click Check my progress to verify the objective. Evaluate the fine-tuned model.

Congratulations!

In this lab, you learned how to use the supervised fine-tuning capability of Vertex AI to fine-tune Gemini using custom data to enhance its image captioning ability.

Next steps / learn more

Check out the following resources to learn more about Gemini:

Google Cloud training and certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated July 12, 2025

Lab Last Tested July 12, 2025

Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
