MLOps with Vertex AI

1 hour 30 minutes 9 Credits

GSP274

Overview

Taking a TensorFlow model that you trained in your Jupyter notebook and deploying the SavedModel to Vertex AI doesn't scale to hundreds of models and large teams. Retraining is difficult because the ops team has to set up operations, monitoring, and scheduling on top of a workflow that is clunky and far from minimal.

For an ML model to be placed into production, it must meet the following requirements:

  • The model should be under version control. Source code control systems such as GitHub work better with text files (such as .py files) than with mixtures of text and binaries (which is what .ipynb files are).

  • The entire process, from dataset creation to training to deployment, must be driven by code. This lets a training run be retriggered automatically, using GitHub Actions or GitLab Continuous Integration, whenever new code is checked in.

  • The entire process should be invokable from a single entry point, so that the retraining can be triggered by non-code changes such as the arrival of new data in a Cloud Storage bucket.

  • It should be easy to monitor the performance of models and endpoints and take measures to fix some subset of issues that arise without having to modify the model code.

Together, these criteria go by the name MLOps. Google Cloud, in general, and Vertex AI, in particular, provide a number of MLOps capabilities. However, to take advantage of these built-in capabilities, you need to separate the model code from the ops code, and express everything in Python files rather than in notebooks.

In this lab, you automate the entire Machine Learning (ML) process by creating a Vertex training pipeline. You create a single entry point for the end-to-end training run, which is triggered whenever new code is checked in, when new data is received, or when changes in feature distribution or model evaluation are detected.
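
As a minimal sketch of what a single entry point can look like, the snippet below builds the one command that every trigger would run, assuming the pipeline script is the train_on_vertexai.py used later in this lab. The function names build_command and on_trigger, the trigger labels, and the example project and bucket names are hypothetical; a real setup would execute the command (for example with subprocess.run) instead of only printing it.

```python
import shlex


def build_command(project, bucket, develop=False):
    """Return the one command that every trigger runs to retrain end to end."""
    cmd = ["python3", "train_on_vertexai.py", "--project", project, "--bucket", bucket]
    if develop:
        cmd.append("--develop")  # quicker run; omit the flag for the full dataset
    return cmd


def on_trigger(reason, project, bucket):
    """Hypothetical hook shared by CI jobs, storage events, and monitoring alerts."""
    cmd = build_command(project, bucket)
    print(f"Retraining because of {reason}: {shlex.join(cmd)}")
    return cmd


if __name__ == "__main__":
    # Placeholder project and bucket names, for illustration only
    for reason in ("new code", "new data", "feature drift"):
        on_trigger(reason, "my-project", "my-project-dsongcp")
```

Because every trigger funnels into the same function, a code check-in and the arrival of new data start exactly the same training run.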

This lab uses a set of code samples and scripts developed for Data Science on the Google Cloud Platform, 2nd Edition from O'Reilly Media, Inc.

Objectives

  • Develop and deploy a model using Python

  • Make predictions from the deployed model

Setup and requirements

Qwiklabs setup

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
    Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise cause extra charges to be incurred on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
    Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Task 1. Prepare your environment

  1. In the Cloud Console, on the Navigation menu, click Compute Engine > VM instances.

  2. Click the SSH button in line with the startup-vm VM to launch a terminal and connect.

  3. In the terminal, enter the following command to clone the repository:

git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp

  4. Navigate to the repository directory:

cd ~/data-science-on-gcp/10_mlops

  5. Install the required Python packages:

pip3 install google-cloud-aiplatform cloudml-hypertune kfp numpy tensorflow
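
If you want to confirm that the installation worked before moving on, a small check like the following can help. It is only a sketch: the mapping from pip package names to importable module names reflects how these packages are usually imported, and missing_packages is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name it is imported as
# (for example, cloudml-hypertune installs a module named "hypertune")
REQUIRED = {
    "google-cloud-aiplatform": "google.cloud.aiplatform",
    "cloudml-hypertune": "hypertune",
    "kfp": "kfp",
    "numpy": "numpy",
    "tensorflow": "tensorflow",
}


def missing_packages(required):
    """Return the pip names whose modules cannot be located."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # A missing parent package (such as "google") raises instead of returning None
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages(REQUIRED) or "none")
```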

Task 2. Run a standalone model

Jupyter notebooks are great for development, but putting those notebooks directly into production is not ideal, even though Vertex AI allows you to do this. It's good practice to convert your initial prototyping model code into a Python file and then continue all development in it.

  1. In the startup-vm terminal, use cat to examine the file model.py:
cat model.py

The file model.py was created by extracting all the previously developed Keras model code from the Jupyter notebook. See chapter 9 of Data Science on the Google Cloud Platform, 2nd Edition for more information about the Keras model code.

  2. Run model.py in development mode to make sure it works:

export PROJECT_ID=$(gcloud info --format='value(config.project)')
export BUCKET_NAME=$PROJECT_ID-dsongcp
python3 model.py --bucket $BUCKET_NAME --develop

You could run it on the full dataset by dropping the --develop flag, but that takes more time to complete.
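
The --bucket and --develop flags come from the command above; how model.py uses them internally is not shown in this lab. The sketch below illustrates one common way a development flag can shrink a training run. The function names and the example sizes are illustrative assumptions, not the values used by the real script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in this task."""
    parser = argparse.ArgumentParser(description="Train the model (sketch)")
    parser.add_argument("--bucket", required=True,
                        help="Cloud Storage bucket that holds the training data")
    parser.add_argument("--develop", action="store_true",
                        help="quick smoke-test run on a small subset of the data")
    return parser.parse_args(argv)


def training_plan(develop):
    """Illustrative settings only; the real model.py may choose differently."""
    if develop:
        return {"max_examples": 10_000, "epochs": 1}
    return {"max_examples": None, "epochs": 10}  # None means use the full dataset


if __name__ == "__main__":
    # Placeholder bucket name, for illustration only
    args = parse_args(["--bucket", "my-project-dsongcp", "--develop"])
    print(args.bucket, training_plan(args.develop))
```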

Task 3. Develop and deploy model using Vertex AI

To develop and deploy a model on Vertex AI, a training pipeline must do five things in code:

  • Load up a managed dataset in Vertex AI
  • Set up training infrastructure to run model.py
  • Train the model by invoking functions in model.py on the managed dataset
  • Find the endpoint to which to deploy the model
  • Deploy the model to the endpoint
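
The five steps above can be sketched with the google-cloud-aiplatform SDK roughly as follows. This is a hedged outline, not the contents of train_on_vertexai.py: the display name "flights", the CSV path, the machine types, and the container image naming pattern are assumptions made for illustration, and the function cannot run without a real project, bucket, and credentials.

```python
def container_uris(tf_version="2.6", cpu_only=True):
    """Build prebuilt-container image names (the naming pattern is an assumption)."""
    tag = tf_version.replace(".", "-")
    device = "cpu" if cpu_only else "gpu"
    train = f"us-docker.pkg.dev/vertex-ai/training/tf-{device}.{tag}:latest"
    serve = f"us-docker.pkg.dev/vertex-ai/prediction/tf2-{device}.{tag}:latest"
    return train, serve


def train_and_deploy(project, bucket, region="us-central1", name="flights"):
    # Imported here so the helper above stays usable without the SDK installed
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region, staging_bucket=f"gs://{bucket}")
    train_image, serve_image = container_uris()

    # 1. Load a managed dataset (the CSV path is hypothetical)
    dataset = aiplatform.TabularDataset.create(
        display_name=f"{name}-data",
        gcs_source=[f"gs://{bucket}/path/to/training_data.csv"],
    )

    # 2. Set up training infrastructure to run model.py
    job = aiplatform.CustomTrainingJob(
        display_name=f"{name}-train",
        script_path="model.py",
        container_uri=train_image,
        model_serving_container_image_uri=serve_image,
    )

    # 3. Train the model on the managed dataset
    model = job.run(
        dataset=dataset,
        model_display_name=name,
        args=["--bucket", bucket],
        replica_count=1,
        machine_type="n1-standard-4",
        sync=True,
    )

    # 4. Find the endpoint, creating it if it does not exist yet
    endpoints = aiplatform.Endpoint.list(
        filter=f'display_name="{name}"', order_by="create_time desc"
    )
    endpoint = endpoints[0] if endpoints else aiplatform.Endpoint.create(display_name=name)

    # 5. Deploy the model to the endpoint; key "0" refers to the model being deployed
    model.deploy(endpoint=endpoint, traffic_split={"0": 100},
                 machine_type="n1-standard-2", min_replica_count=1, max_replica_count=1)
    return endpoint
```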
  1. In the startup-vm terminal, examine the train_on_vertexai.py script, which is used to develop and deploy the pipeline on Vertex AI:

cat train_on_vertexai.py

That's it! Now, you have a Python program that you can run anytime to retrain and/or deploy the trained model. Typically, the MLOps person does not replace the existing model outright, but sends only a small fraction of the traffic to the new model.
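
Sending only a small fraction of traffic to a newly deployed model is expressed in Vertex AI as a traffic split: a map from deployed model ID to an integer percentage, where the key "0" stands for the model being deployed and the values add up to 100. The helper below is a sketch of how such a map could be computed; canary_split is a hypothetical name, and the resulting dictionary is what you would pass as the traffic_split argument of Model.deploy.

```python
def canary_split(current, new_pct):
    """Give new_pct percent of traffic to the new model and scale the rest.

    current maps existing deployed model IDs to their present percentages.
    """
    if not 0 <= new_pct <= 100:
        raise ValueError("new_pct must be between 0 and 100")
    total = sum(current.values())
    if total == 0:
        # Nothing is serving yet, so the new model has to take all the traffic
        return {"0": 100}

    remaining = 100 - new_pct
    # Integer division keeps every share a whole percentage
    scaled = {model_id: (pct * remaining) // total for model_id, pct in current.items()}
    # Hand any rounding leftover to the largest existing share so the sum is exactly 100
    largest = max(scaled, key=scaled.get)
    scaled[largest] += remaining - sum(scaled.values())

    split = {"0": new_pct}
    split.update(scaled)
    return split


if __name__ == "__main__":
    # "123" is a placeholder deployed model ID
    print(canary_split({"123": 100}, 10))
```

With one existing model that currently serves all traffic, asking for 10 percent gives the new model 10 and leaves 90 with the existing one.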

  2. Run the training pipeline in develop mode using the following command, which passes in the PROJECT_ID and the storage bucket name BUCKET_NAME:

python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --develop --cpuonly --tfversion 2.6

Note: If you encounter a model building error, rerun the command. One example of a model building error is the service account lacking sufficient permissions to write objects to the Cloud Storage bucket.

  3. In the Cloud Console, on the Navigation menu, click Vertex AI > Training to monitor the training pipeline. When the status is Finished, click the training pipeline name to track the model deployment status.

Note: It takes approximately 30 minutes to complete the job.

Click Check my progress to verify the objective. Develop and deploy the model using Vertex AI

Task 4. Make predictions from the deployed model

Sending a normal prediction request to the model endpoint returns a response that contains the model's predictions. In the Python sample output later in this task, the explanations field is None.

There are two ways to call the deployed model: using bash and using Python. You can use either one.

Call the model using bash

You can send prediction requests using the bash script call_predict.sh by entering the following commands:

cd ../09_vertexai
bash ./call_predict.sh
cd ../10_mlops

Output:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
4600611737312428032
Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[[0.60023278], [0.747012496]]

Call the model using Python

You can also send prediction requests from Python using call_predict.py by entering the following command.

python3 call_predict.py

Output:

Prediction(predictions=[[0.60023278], [0.747012496]], deployed_model_id='8618297593950109696', explanations=None)
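
The response printed above has three fields: predictions, deployed_model_id, and explanations. The sketch below shows one way to unpack it. The namedtuple is only a stand-in that mirrors the printed field names so the snippet runs without calling an endpoint, and flatten_scores is a hypothetical helper that assumes each inner list holds a single score for one instance in the request.

```python
from collections import namedtuple

# Stand-in with the same field names as the response printed above
Prediction = namedtuple("Prediction", ["predictions", "deployed_model_id", "explanations"])


def flatten_scores(response):
    """Turn [[p1], [p2], ...] into [p1, p2, ...]."""
    return [row[0] for row in response.predictions]


# Values copied from the sample output above
response = Prediction(
    predictions=[[0.60023278], [0.747012496]],
    deployed_model_id="8618297593950109696",
    explanations=None,
)

for i, score in enumerate(flatten_scores(response)):
    print(f"instance {i}: {score:.3f}")
```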

Optional exercises

These exercises are optional. You can continue with them in any remaining time, or end the lab without completing them. Skipping them won't affect your passing percentage.

Run a Vertex AI pipeline on full dataset

You can also run a Vertex AI pipeline on the full dataset by dropping the --develop flag, using the command below. However, it takes much more time to complete.

python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME

Note: You can use the --cpuonly argument to run the Vertex AI pipeline without GPU accelerators.

You can also add the --tfversion argument to the command to specify the TensorFlow version:

python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --tfversion 2.6

Train an AutoML model using Vertex AI

AutoML lets you create and train a model with minimal technical effort. Even if you want the flexibility of a custom training application, you can use AutoML to quickly prototype models and explore new datasets before investing in development. AutoML on Vertex AI enables you to build a code-free model based on the training tabular dataset.

Run the training pipeline in AutoML mode using the following command, which passes in PROJECT_ID, the storage bucket name BUCKET_NAME, and the --automl flag.

python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --automl
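
For orientation, this is roughly what an AutoML training step looks like when written against the google-cloud-aiplatform SDK. It is a sketch under assumptions, not the code behind the --automl flag: the display name, the target column "ontime", the one node hour budget, and treating the problem as classification are all hypothetical choices, and train_automl needs an existing managed tabular dataset plus valid credentials to run.

```python
def node_hours_to_budget(node_hours):
    """Vertex AI expresses the AutoML training budget in milli node hours."""
    return int(node_hours * 1000)


def train_automl(project, dataset, region="us-central1", target_column="ontime"):
    # Imported here so the helper above stays usable without the SDK installed
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="flights-automl",                  # hypothetical name
        optimization_prediction_type="classification",  # assumed problem type
    )
    # dataset is an aiplatform.TabularDataset created beforehand
    return job.run(
        dataset=dataset,
        target_column=target_column,                    # hypothetical label column
        budget_milli_node_hours=node_hours_to_budget(1),
        model_display_name="flights-automl",
        disable_early_stopping=False,
    )
```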

Hyperparameter tuning

Our custom model is better than AutoML, but could it be even better? There are a number of hyperparameters that could be tuned: learning rate, batch size, number of layers/nodes in the neural network, number of buckets, number of embedding dimensions, and so on.

To add this capability to the Python code, this is what you have to do:

  • Parameterize the model in model.py
  • Implement a shorter training run
  • Write out metrics during training
  • Implement a hyperparameter tuning pipeline
  • Run the best trial to completion
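
The first three items in the list above can be sketched as follows. The flag names and default values are illustrative assumptions rather than the ones in model.py. The metric reporting uses the cloudml-hypertune package installed in Task 1, imported inside the function so that the argument parsing can be tried without it.

```python
import argparse


def parse_hparams(argv=None):
    """Expose tunable values as flags so each tuning trial can set them."""
    parser = argparse.ArgumentParser(description="Parameterized training (sketch)")
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--num_buckets", type=int, default=5)
    # Fewer steps gives the shorter training run that each trial needs
    parser.add_argument("--train_steps", type=int, default=1000)
    return parser.parse_args(argv)


def report_metric(tag, value, step):
    """Write out a metric so the tuning service can compare trials."""
    import hypertune  # provided by the cloudml-hypertune package

    hypertune.HyperTune().report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=tag,
        metric_value=value,
        global_step=step,
    )


if __name__ == "__main__":
    hp = parse_hparams(["--learning_rate", "0.01", "--batch_size", "128"])
    print(vars(hp))
```

The tuning pipeline then launches several trials with different flag values and compares the metric that each trial reports.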

Recall that the model training file is model.py in the code repository of the book, and the pipeline orchestrator is train_on_vertexai.py.

Run the training pipeline using the following command, which passes in PROJECT_ID and the storage bucket name BUCKET_NAME, and sets --num_hparam_trials to 10.

python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --num_hparam_trials 10

Congratulations!

In this lab, you learned how to automate the entire ML process by creating a Vertex training pipeline. You created a single entry point for the end-to-end training run. The entry point is triggered whenever new code is checked in, when new data is received, or when changes in feature distribution or model evaluation are detected.

Finish your Quest

This self-paced lab is part of the Data Science on Google Cloud: Machine Learning Quest. A Quest is a series of related labs that form a learning path. Completing this Quest earns you the badge above, to recognize your achievement. You can make your badges public and link to them in your online resume or social media account. Enroll in this Quest and get immediate completion credit if you've taken this lab. See other available Quests.

Take your next lab

Continue your Quest with Real Time Machine Learning with Cloud Dataflow and Vertex AI.

Next steps / learn more

The source of this lab:

Manual Last Updated June 13, 2022
Lab Last Tested June 13, 2022

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.