
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
In this lab, you build and train a forecasting model with Vertex AI AutoML.
This Qwiklabs hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended)
- Time to complete the lab once started
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.
Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Choose an account page.
On the Choose an account page, click Use Another Account. The Sign in page opens.
Paste the username that you copied from the Connection Details panel. Then copy and paste the password.
After a few moments, the Cloud console opens in this tab.
Google Cloud Shell is a virtual machine loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
In Cloud console, on the top right toolbar, click the Open Cloud Shell button.
Click Continue.
It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
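Two commands you typically run first in Cloud Shell to confirm the session (a sketch; the exact output depends on your temporary lab credentials and project):

```shell
# List the active (temporary lab) credentials
gcloud auth list
# Show the project ID the session is configured to use
gcloud config list project
```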
This lab uses data from the Iowa Liquor Sales dataset from BigQuery Public Datasets. This dataset consists of the wholesale liquor purchases in the US state of Iowa since 2012.
Once you are in the public dataset, you can browse the original raw data by clicking View Dataset. To access the table, in the left navigation bar expand the bigquery-public-data project, expand the iowa_liquor_sales dataset (you may have to click More Results at the bottom of the list first), then click the sales table. You can select the Preview tab to see a selection of rows from the dataset.
For the purposes of this lab, some basic data pre-processing has already been done to group the purchases by day. You use a CSV file that is extracted from the BigQuery table.
The columns in the CSV file are:
In this case, you forecast the overall purchases in one time-series, so the id column is set to 0 for each row.
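The pre-processing itself is not part of this lab, but a query of roughly this shape could produce the daily grouping described above (a sketch; the lab's actual extraction query is not shown, and any holiday column is derived separately):

```shell
# Aggregate the public sales table into one row per day.
# Column names id/ds/y follow the CSV layout used later in the lab.
bq query --use_legacy_sql=false '
SELECT
  0                 AS id,  -- single time-series: constant identifier
  date              AS ds,  -- the day being aggregated
  SUM(sale_dollars) AS y    -- total purchases for that day
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY date
ORDER BY date'
```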
In this task, you import the data from the Cloud Storage bucket configured for your project into a dataset.
In the Vertex AI menu, navigate to Datasets.
Click Create Dataset.
On the Create dataset page, set the Dataset name to iowa_daily.
For the data type and objective, click Tabular, and then select Forecasting.
Click Create.
The next step is to import data into the dataset. It is best practice to import from a BigQuery table, but the data currently resides in a Cloud Storage bucket supplied to you, so it must first be loaded into a BigQuery table.
You may need to click Authorize for Cloud Shell to continue.
In this task, you import training data into BigQuery.
In the Google Cloud Console, on the Navigation menu, right-click on BigQuery and select Open link in new tab.
In the BigQuery tab in your browser, in the Explorer pane, click the vertical ellipsis () next to your project and select Create dataset.
In the Create dataset pop-up page, set the Dataset ID to iowa_daily.
Set the Location type to Region.
Set the Region to us-central1 (Iowa).
Leave the remaining settings on their default values and click Create Dataset.
Expand the node for your project in the Explorer pane, click the vertical ellipsis () next to the iowa_daily dataset and select Create table.
In the Create table page, under Source, in the Create table from field, select Google Cloud Storage.
Paste the value of DATA_INPUT_FILE displayed in Cloud Shell earlier into the Select file from Cloud Storage bucket field, excluding the "gs://" prefix.
Select CSV in the File format field.
In the Destination section, enter forecasting into the Table field.
In the Schema section, check the Auto detect box.
Leave the remaining settings on their default values and click Create Table.
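If you prefer the command line, the same dataset and table can be created with the bq tool from Cloud Shell (a sketch; PROJECT_ID and DATA_INPUT_FILE stand for the values shown in your own session):

```shell
# Create the dataset in the same region used throughout the lab
bq mk --dataset --location=us-central1 "${PROJECT_ID}:iowa_daily"
# Load the CSV from Cloud Storage with schema auto-detection
bq load --autodetect --source_format=CSV \
  "${PROJECT_ID}:iowa_daily.forecasting" "${DATA_INPUT_FILE}"
```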
In this task, you configure Vertex AI to import training data from BigQuery.
Return to the Vertex AI browser tab and, on the Source tab, select Select a table or view from BigQuery.
Select a path in the BigQuery path field by clicking Browse.
In the Select path page, enter forecasting in the search box and click Search.
Check the forecasting table associated with your Project ID and click Select.
Click Continue.
In this task, you create a model and configure it to use the imported data.
After a few minutes, AutoML notifies you that the import has been completed. At that point, you can configure the model features.
Select the Series identifier column to be id. There is only one time-series in this dataset, so this is a formality.
Select the Timestamp column to be ds.
Click Generate Statistics. This process takes a few minutes; you may continue to the next step while it runs. When it completes, you will see a summary of the data to be imported.
Click Train New Model to begin the training process.
Select the AutoML radio button.
Click Continue.
Select the Target column to be y. This is the value that you are predicting.
If not already set earlier, set the Series identifier column to id and the Timestamp column to ds.
Set Data granularity to Daily and Forecast horizon to 7. This field specifies the number of periods that the model can predict into the future.
Set the Context window to 7. The model will use data from the previous 7 days to make each prediction. There are trade-offs between shorter and longer windows; generally, a value between 1x and 10x the forecast horizon is recommended.
Check the box to Export test dataset to BigQuery.
In the BigQuery path field, enter your Project ID, iowa_daily, and test_data, separated by periods, following the format guidance shown in the field.
The Data split column enables you to select specific rows to be used for training, validation, and testing. When you create your training data, you add a column that can contain one of the following (case sensitive) values:
- TRAIN
- VALIDATE
- TEST
- UNASSIGNED
The values in this column must be one of the two following combinations:
- All of TRAIN, VALIDATE, and TEST
- Only TEST and UNASSIGNED
Every row must have a value for this column; it cannot be the empty string.
For example, with all sets specified, a column might contain values such as TRAIN, TRAIN, VALIDATE, TEST. With only the test set specified, it might contain UNASSIGNED, UNASSIGNED, TEST, UNASSIGNED.
The Data split column can have any valid column name; its transformation type can be Categorical, Text, or Auto.
If the value of the Data split column is UNASSIGNED, Vertex AI automatically assigns that row to the training or validation set.
When you train your model, you should select a Manual data split and specify this column as the data split column.
In this step, you will specify more details about how you'd like to train the model.
Set the holiday column Transformation to Automatic.
Set the holiday column Feature type to Covariate.
Set the Available at forecast column to Available, because you know whether a given date is a holiday in advance.
Do not edit the Weight column; it defaults to weighting all rows evenly.
Expand Advanced Options and set the Optimization objective to RMSE.
Click Continue.
In this task, you initiate the model training.
Set a budget of your choice. For the purposes of this lab, 1 node hour is sufficient to train the model, but be aware that this is only reasonable for small 'toy' datasets. In production, this value is typically in the range of 6-12 node hours.
Click Start Training to begin the training process.
You can click Vertex AI > Training in the Google Cloud Console menu to see the progress of the model training.
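Once training completes, the model also appears in the Vertex AI model registry. From Cloud Shell you can list models in the lab's region (a sketch, assuming the us-central1 region used earlier; the list is empty until training finishes):

```shell
# Lists trained models in the given region
gcloud ai models list --region=us-central1
```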
You complete this lab up to this point without waiting for the training result. Depending on the resources available, training takes 1-2 hours to complete. In production, you would receive an email notifying you of completion. When the model is ready, you can evaluate its performance and make predictions with it. You will continue with model evaluation and model prediction in the next two labs of this course.
In this lab, you practiced with data preparation, model building, and model training with Vertex AI AutoML.
You're ready to build your own forecasting model!
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
- 1 star = Very dissatisfied
- 2 stars = Dissatisfied
- 3 stars = Neutral
- 4 stars = Satisfied
- 5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.