MLOps with Vertex AI
GSP274
Overview
Taking a TensorFlow model that you trained in your Jupyter notebook and deploying the SavedModel to Vertex AI doesn't scale to hundreds of models and large teams. Retraining will be difficult because the ops team has to set up all of the operations, monitoring, and scheduling on top of a process that is clunky and far from minimal.
For an ML model to be placed into production, it must meet the following requirements:
- The model should be under version control. Source code control systems such as GitHub work better with text files (such as .py files) than with mixtures of text and binaries (which is what .ipynb files are).
- The entire process, from dataset creation to training to deployment, must be driven by code. This makes it possible to automatically retrigger a training run using GitHub Actions or GitLab Continuous Integration whenever new or changed code is checked in.
- The entire process should be invokable from a single entry point, so that retraining can be triggered by non-code changes such as the arrival of new data in a Cloud Storage bucket.
- It should be easy to monitor the performance of models and endpoints and take measures to fix some subset of issues that arise without having to modify the model code.
Together, these criteria go by the name MLOps. Google Cloud, in general, and Vertex AI, in particular, provide a number of MLOps capabilities. However, to take advantage of these built-in capabilities, you have to separate the model code from the ops code and express everything in Python rather than in notebooks.
In this lab, you automate the entire Machine Learning (ML) process by creating a Vertex training pipeline. You create a single entry point for the end-to-end training run, which is triggered whenever new code is checked in, when new data is received, or when changes in feature distribution or model evaluation are detected.
This lab uses a set of code samples and scripts developed for Data Science on the Google Cloud Platform, 2nd Edition from O'Reilly Media, Inc.
What you'll learn
- Develop and deploy a model using Python
- Make predictions from the deployed model
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab. Remember: once you start, you cannot pause a lab.
How to start your lab and sign in to the Google Cloud Console
- Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:
- The Open Google Console button
- Time remaining
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
- Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account.
- If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.
- Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.
Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.
Note: Using your own Google Cloud account for this lab may incur extra charges.
- Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.
After a few moments, the Cloud Console opens in this tab.
Task 1. Prepare your environment
- In the Cloud Console, on the Navigation menu, click Compute Engine > VM instances.
- Click the SSH button inline with the startup-vm VM to launch a terminal and connect.
- In the terminal, enter the following command to clone the repository:
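A representative command, assuming the lab uses the public code repository that accompanies the book:

```bash
# Repository URL is an assumption: the public repo that accompanies
# Data Science on the Google Cloud Platform, 2nd Edition.
git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp
```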
- Navigate to the repository directory:
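Assuming the book repository's chapter layout, the Vertex AI code for this lab lives in the chapter 9 directory:

```bash
# Directory name is assumed from the book repository's chapter layout.
cd data-science-on-gcp/09_vertexai
```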
- Install the required Python packages:
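A plausible install step; the exact package list used by the lab is an assumption:

```bash
# Package list is illustrative, not the lab's exact list.
pip3 install --user google-cloud-aiplatform cloudml-hypertune tensorflow
```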
Task 2. Run a standalone model
Jupyter notebooks are great for development, but putting those notebooks directly into production is not ideal, even though Vertex AI allows you to do this. It's good practice to convert your initial prototyping model code into a Python file and then continue all development in it.
- In the startup-vm terminal, use cat to examine the file model.py:
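For example:

```bash
cat model.py
```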
The file model.py was created by extracting all the previously developed Keras model code from the Jupyter notebook. See Chapter 9 of Data Science on the Google Cloud Platform, 2nd Edition for more information about the Keras model code.
- Run model.py in development mode to make sure it works:
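A representative invocation; the --bucket and --develop flag names are assumptions based on the book's code:

```bash
# BUCKET_NAME is the Cloud Storage bucket provided by the lab.
# Flag names are assumed from the book's code; check model.py's argparse options.
python3 model.py --bucket $BUCKET_NAME --develop
```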
You could run it on the full dataset by dropping the --develop flag, but it would take more time to complete.
Task 3. Develop and deploy model using Vertex AI
To develop and deploy a model on Vertex AI, the training pipeline must do five things in code:
- Load up a managed dataset in Vertex AI
- Set up training infrastructure to run model.py
- Train the model by invoking functions in model.py on the managed dataset
- Find the endpoint to which to deploy the model
- Deploy the model to the endpoint
- In the startup-vm terminal, examine the train_on_vertexai.py script used to develop and deploy the pipeline on Vertex AI:
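For example:

```bash
cat train_on_vertexai.py
```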
That's it! You now have a Python program that you can run anytime to retrain and/or deploy the trained model. Typically, the MLOps engineer does not replace the entire model at once, but routes only a small fraction of the traffic to the newly deployed model.
- Run the training pipeline using the following command, which passes in the PROJECT_ID and the storage bucket name BUCKET_NAME in develop mode:
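A representative command, assuming the flag names used by the book's train_on_vertexai.py (a --region flag may also be required, depending on the script version):

```bash
# PROJECT_ID is read from the active gcloud configuration;
# BUCKET_NAME is the Cloud Storage bucket provided by the lab.
export PROJECT_ID=$(gcloud config get-value project)
python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --develop
```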
- In the Cloud Console, on the Navigation menu, click Vertex AI > Training to monitor the training pipeline. When the status is Finished, click the training pipeline name to track the model deployment status.
Click Check my progress to verify the objective.
Task 4. Make predictions from the deployed model
Sending a normal prediction request to the model endpoint returns a response that contains feature attributions.
Here, there are two different ways to call the model: using bash and using Python. You can use either one to call the deployed model.
Call the model using bash
- You can send prediction requests using the bash script call_predict.sh by entering the following command:
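For example:

```bash
bash call_predict.sh
```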
Output:
Call the model using Python
- In a Pythonic way, you can send prediction requests using call_predict.py by entering the following command:
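For example:

```bash
python3 call_predict.py
```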
Output:
Optional exercises
This is an optional exercise for this lab; you can continue with any remaining time or end the lab without completing the exercise below. It won't affect your passing percentage.
Run a Vertex AI pipeline on full dataset
- You can also run the Vertex AI pipeline on the full dataset by dropping the --develop flag, using the command below. However, it takes much more time to complete:
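Assuming the same flag names as before, the full-dataset run is the earlier command without --develop:

```bash
# Dropping --develop trains on the full dataset (flag names assumed as above).
python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME
```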
- You can add the --cpuonly argument to run the Vertex AI pipeline without GPU accelerators.
- You can also add the --tfversion argument to the command to specify the TensorFlow version:
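A representative command; --cpuonly and --tfversion are assumed flag names, and the version value is illustrative:

```bash
# Both flags and the TensorFlow version value are assumptions.
python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --cpuonly --tfversion 2.6
```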
Train an AutoML model using Vertex AI
AutoML lets you create and train a model with minimal technical effort. Even if you want the flexibility of a custom training application, you can use AutoML to quickly prototype models and explore new datasets before investing in development. AutoML on Vertex AI enables you to build a code-free model based on the tabular training dataset.
- Run the training pipeline using the following command, passing the PROJECT_ID and the storage bucket name BUCKET_NAME in automl mode:
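A representative command; the --automl flag name is an assumption based on the book's code:

```bash
# --automl is an assumed flag that switches the pipeline to AutoML training.
python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --automl
```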
Hyperparameter tuning
Our custom model is better than AutoML, but could it be even better? There are a number of hyperparameters that could be tuned: learning rate, batch size, the number of layers and nodes in the neural network, the number of buckets, the number of embedding dimensions, and so on.
To add this capability to the Python code, this is what you have to do:
- Parameterize the model in model.py.
- Implement a shorter training run.
- Write out metrics during training.
- Implement a hyperparameter tuning pipeline.
- Run the best trial to completion.
Recall that the model training file exists as model.py in the code repository of this book, and the pipeline orchestrator is train_on_vertexai.py.
- Run the training pipeline using the following command, passing the PROJECT_ID, the storage bucket name BUCKET_NAME, and num_hparam_trials set to 10:
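A representative command; the --num_hparam_trials flag name is an assumption based on the book's code:

```bash
# --num_hparam_trials is an assumed flag; 10 trials as described in the step above.
python3 train_on_vertexai.py --project $PROJECT_ID --bucket $BUCKET_NAME --num_hparam_trials 10
```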
Congratulations!
In this lab, you learned how to automate the entire ML process by creating a Vertex training pipeline. You created a single entry point for the end-to-end training run, which can be triggered whenever new code is checked in, when new data is received, or when changes in feature distribution or model evaluation are detected.
Finish your quest
This self-paced lab is part of the Data Science on Google Cloud: Machine Learning quest. A quest is a series of related labs that form a learning path. Completing this quest earns you a badge to recognize your achievement. You can make your badge or badges public and link to them in your online resume or social media account. Enroll in this quest and get immediate completion credit. See the Google Cloud Skills Boost catalog to see all available quests.
Take your next lab
Continue your Quest with Real Time Machine Learning with Cloud Dataflow and Vertex AI, or check out these Google Cloud Skills Boost labs:
Next steps / Learn more
The source of this lab:
Google Cloud training and certification
...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated Aug 8, 2022
Lab Last Tested Aug 8, 2022
Copyright 2023 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.