
BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery, you can query terabytes of data without managing any infrastructure or needing a database administrator. BigQuery uses SQL and follows a pay-as-you-go pricing model, so you can focus on analyzing data to find meaningful insights.
BigQuery Machine Learning (BigQuery ML) is a feature of BigQuery that lets data analysts create, train, evaluate, and predict with machine learning models with minimal coding.
Collaborative filtering provides a way to generate product recommendations for users, or user targeting for products. The starting point is a table with three columns: a user id, an item id, and the rating that the user gave the item. This table can be sparse: users don't have to rate every product. Based only on the ratings, the technique finds similar users and similar products and estimates the rating that a user would give a product they have not seen. You can then recommend the products with the highest predicted ratings to a user, or target a product at the users with the highest predicted ratings for it.
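To make the "sparse table of ratings in, predicted rating out" idea concrete, here is a minimal sketch of user-based neighborhood collaborative filtering, a classic variant that is not the matrix factorization method BigQuery ML uses later in this lab. It assumes a tiny hypothetical ratings list (the user names, item ids, and ratings are invented) and uses cosine similarity over co-rated items to weight other users' ratings of an unseen item.

```python
from math import sqrt

# Hypothetical sparse ratings: (user_id, item_id, rating). Not every user rates every item.
RATINGS = [
    ("alice", "m1", 5.0), ("alice", "m2", 3.0), ("alice", "m3", 4.0),
    ("bob", "m1", 4.0), ("bob", "m2", 3.0), ("bob", "m4", 5.0),
    ("carol", "m2", 1.0), ("carol", "m3", 2.0), ("carol", "m4", 2.0),
]

def by_user(triples):
    """Group (user, item, rating) triples into {user: {item: rating}}."""
    table = {}
    for user, item, rating in triples:
        table.setdefault(user, {})[item] = rating
    return table

def cosine(a, b):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(table, user, item):
    """Similarity-weighted average of other users' ratings for an item `user` has not rated.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, ratings in table.items():
        if other == user or item not in ratings:
            continue
        sim = cosine(table[user], ratings)
        num += sim * ratings[item]
        den += abs(sim)
    return num / den if den else None

table = by_user(RATINGS)
print(round(predict(table, "alice", "m4"), 2))
```

Raw cosine similarity on all-positive ratings tends to rate every pair of users as similar; production neighborhood methods usually mean-center ratings first. Matrix factorization, used in the rest of this lab, avoids the pairwise comparison entirely by learning compact factor vectors.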
To illustrate recommender systems in action, you will use the MovieLens dataset. This is a dataset of movie reviews released by GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota, with funding from the US National Science Foundation.
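In this lab, you learn to perform the following tasks:
- Create a BigQuery dataset and load the MovieLens data into it
- Explore the MovieLens dataset
- View the metrics of a trained matrix factorization model
- Use the trained model to make recommendations for a user
- Identify the users to target for a specific movie
- Run batch predictions for all users and movies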
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and lists UI updates.
In this task, you use the command line in Cloud Shell to create a BigQuery dataset to store the MovieLens data, and then load the MovieLens data into the dataset from a Cloud Storage bucket.
Click Check my progress to verify the objective.
In this task, you explore and verify the MovieLens dataset using the BigQuery Query editor.
Confirm that the dataset contains over 138 thousand users, nearly 27 thousand movies, and slightly more than 20 million ratings.
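The three numbers above correspond to counting distinct users, distinct movies, and total rows in the ratings data (in SQL, COUNT(DISTINCT ...) and COUNT(*)). The sketch below shows the same bookkeeping in Python over a small in-memory sample; the Rating record and the sample rows are hypothetical stand-ins for the real ratings table.

```python
from typing import Iterable, NamedTuple

class Rating(NamedTuple):
    """One row of a ratings table (hypothetical field names)."""
    user_id: int
    movie_id: int
    rating: float

def summarize(ratings: Iterable[Rating]) -> dict:
    """Count distinct users, distinct movies, and total ratings in a single pass."""
    users, movies, total = set(), set(), 0
    for r in ratings:
        users.add(r.user_id)
        movies.add(r.movie_id)
        total += 1
    return {"num_users": len(users), "num_movies": len(movies), "total_ratings": total}

sample = [Rating(1, 10, 4.0), Rating(1, 20, 3.5), Rating(2, 10, 5.0)]
print(summarize(sample))
```

Counting movies from the ratings rows only counts movies that have at least one rating; a separate movies table can list more titles than that.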
Feel free to run additional queries (for example, against the movielens_movies table) until you are comfortable with the dataset.
Click Check my progress to verify the objective.
In this task, you view the metrics for a trained model that was generated using matrix factorization.
Matrix factorization is a collaborative filtering technique that relies on two vectors, called the user factors and the item factors. The user factors vector is a low-dimensional representation of a user_id, and the item factors vector similarly represents an item_id.
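In the basic form of matrix factorization, the predicted rating for a user and an item is the dot product of their factor vectors, and the factors are learned by minimizing squared error on the known ratings plus an L2 penalty. The sketch below trains such factors with plain stochastic gradient descent; that optimizer is a swapped-in choice for illustration and not necessarily how BigQuery ML trains its model. The num_factors and l2_reg arguments echo the BigQuery ML options of the same names; lr, epochs, seed, and the sample ratings are hypothetical.

```python
import math
import random

def dot(a, b):
    """Predicted rating: dot product of a user factor vector and an item factor vector."""
    return sum(x * y for x, y in zip(a, b))

def train_mf(ratings, num_factors=2, l2_reg=0.05, lr=0.02, epochs=2000, seed=0):
    """Fit user factors P and item factors Q with stochastic gradient descent.

    ratings: list of (user_id, item_id, rating) triples (a sparse table).
    num_factors: length of each factor vector.
    l2_reg: L2 penalty that keeps factor values small to limit overfitting.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.0, 0.5) for _ in range(num_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.5) for _ in range(num_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - dot(P[u], Q[i])
            for k in range(num_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error plus the L2 penalty.
                P[u][k] += lr * (err * qi - l2_reg * pu)
                Q[i][k] += lr * (err * pu - l2_reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root-mean-square error of the predicted ratings on the given triples."""
    se = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

# Hypothetical ratings: 3 users, 4 items, not every pair rated.
RATINGS = [
    (1, "m1", 5.0), (1, "m2", 3.0), (1, "m3", 4.0),
    (2, "m1", 4.0), (2, "m2", 3.0), (2, "m4", 5.0),
    (3, "m2", 1.0), (3, "m3", 2.0), (3, "m4", 2.0),
]
P, Q = train_mf(RATINGS)
print(f"training RMSE: {rmse(P, Q, RATINGS):.3f}")
print(f"predicted rating of user 1 for unseen m4: {dot(P[1], Q['m4']):.2f}")
```

Raising num_factors lets the model capture more structure but gives it more parameters to fit, which is why an L2 penalty such as l2_reg is usually tuned alongside it.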
To perform a matrix factorization of the data, you use the typical BigQuery ML syntax, except that the model_type is matrix_factorization and you have to identify which columns play which roles in the collaborative filtering setup.
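As an illustration of "which columns play which roles", the helper below renders a BigQuery ML CREATE MODEL statement in which user_col, item_col, and rating_col name the user, item, and rating columns. It is a sketch only: the model and table names, and the num_factors and l2_reg values in the usage line, are hypothetical illustrative values, and you do not need to run the resulting query in this lab because the trained model is provided.

```python
def create_mf_model_sql(model, table, user_col, item_col, rating_col, num_factors, l2_reg):
    """Render a CREATE MODEL statement for a BigQuery ML matrix factorization model.

    user_col, item_col, and rating_col tell BigQuery ML which input column holds
    the user id, the item id, and the rating. All names passed in are caller-chosen.
    """
    return f"""CREATE OR REPLACE MODEL `{model}`
OPTIONS (
  model_type = 'matrix_factorization',
  user_col = '{user_col}',
  item_col = '{item_col}',
  rating_col = '{rating_col}',
  num_factors = {num_factors},
  l2_reg = {l2_reg}
) AS
SELECT {user_col}, {item_col}, {rating_col}
FROM `{table}`"""

# Hypothetical model/table names and illustrative option values.
print(create_mf_model_sql("movies.movie_recommender", "movies.movielens_ratings",
                          "userId", "movieId", "rating", num_factors=16, l2_reg=0.2))
```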
To apply matrix factorization to the movie ratings data, you would run a BigQuery ML query to create the model. However, creating this model type can take up to 40 minutes and requires a Google Cloud project with reservation-based resources, which the Qwiklabs environment does not provide.
A model has been created in the Cloud Training project's cloud-training-demos BigQuery dataset for use in the rest of the lab.
The num_factors and l2_reg options were selected after much experimentation to speed up training of the model.
In this task, you use the trained model to provide recommendations.
Let’s find the best comedy movies to recommend to the user whose userId is 903.
This result includes movies the user has already seen and rated in the past.
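Conceptually, recommending comedies to one user means scoring every comedy with the user's factor vector, sorting by predicted rating, and optionally skipping movies the user has already rated. The sketch below does that in memory; the factor values, movie ids, and the recommend helper are hypothetical, and genres are stored as pipe-separated strings in the style of the MovieLens movies file (for example "Comedy|Romance").

```python
def dot(a, b):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def recommend(user, P, Q, genres, rated, genre="Comedy", k=5, exclude_seen=False):
    """Return the top-k (movie_id, predicted_rating) pairs in `genre` for `user`.

    genres maps movie_id -> pipe-separated genre string.
    rated maps user -> set of movie_ids the user has already rated; those
    movies are skipped only when exclude_seen is True.
    """
    seen = rated.get(user, set())
    scored = [
        (movie, dot(P[user], Q[movie]))
        for movie in Q
        if genre in genres.get(movie, "").split("|")
        and not (exclude_seen and movie in seen)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical factors and metadata for illustration only.
P = {903: [1.0, 0.5]}
Q = {1: [2.0, 1.0], 2: [1.0, 1.0], 3: [3.0, 0.0], 4: [0.5, 3.0]}
GENRES = {1: "Comedy", 2: "Comedy|Romance", 3: "Drama", 4: "Comedy"}
RATED = {903: {1}}

print(recommend(903, P, Q, GENRES, RATED))
print(recommend(903, P, Q, GENRES, RATED, exclude_seen=True))
```

Keeping exclude_seen off by default matches the behavior described in this lab, where already-rated movies appear in the results unless you filter them out explicitly.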
Click Check my progress to verify the objective.
In this task, you identify the users who are likely to rate a specific movie the highest. Sometimes you have a product and need to find the customers who are likely to appreciate it.
Suppose you want more reviews for movieId=96481, which has only one rating, so you want to send coupons to the 100 users who are likely to rate it the highest.
The result gives us 100 users to target.
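Customer targeting flips the recommendation query around: hold the movie fixed and rank users by predicted rating. The sketch below uses Python's heapq.nlargest to keep only the top k users without sorting everyone; the factor values and the target_users helper are hypothetical, and k defaults to the 100 users mentioned above.

```python
import heapq

def dot(a, b):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def target_users(movie, P, Q, k=100):
    """Return up to k user ids with the highest predicted rating for `movie`, best first."""
    item = Q[movie]
    return heapq.nlargest(k, P, key=lambda user: dot(P[user], item))

# Hypothetical factors for four users and one movie.
P = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [0.2, 0.1]}
Q = {96481: [2.0, 1.0]}
print(target_users(96481, P, Q, k=2))
```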
Click Check my progress to verify the objective.
In this task, you run a query to obtain batch predictions for users and movies.
What if you want predictions for every user and movie combination? Instead of pulling distinct users and movies as in the previous query, you can use a convenience function that carries out batch predictions for every movieId and userId encountered during training.
Without the LIMIT clause, the results would be too large to return with the default settings, but the limited output gives you a sense of the kinds of predictions this model can make.
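Batch prediction is the cross product of all training users and all training movies, scored one pair at a time (BigQuery ML provides this as the ML.RECOMMEND function). With more than 138 thousand users and nearly 27 thousand movies, that is roughly 3.7 billion user-movie pairs, which is why a limit matters. The sketch below is an in-memory analog, not ML.RECOMMEND itself: a generator yields predictions lazily, and itertools.islice plays the role of LIMIT so only the requested rows are computed. The factor values are hypothetical.

```python
from itertools import islice, product

def dot(a, b):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(x * y for x, y in zip(a, b))

def predict_all(P, Q):
    """Lazily yield (user_id, movie_id, predicted_rating) for every user-movie pair."""
    for user, movie in product(P, Q):
        yield user, movie, dot(P[user], Q[movie])

# Hypothetical factors for two users and three movies.
P = {1: [1.0, 0.5], 2: [0.2, 1.0]}
Q = {10: [2.0, 1.0], 20: [1.0, 3.0], 30: [0.5, 0.5]}

# Analogous to LIMIT 4: only the first four rows are ever computed.
for row in islice(predict_all(P, Q), 4):
    print(row)
```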
As shown earlier, you can filter out movies the user has already seen and rated. Already-seen items aren't filtered out by default because in some situations (restaurant recommendations, for example) it is perfectly reasonable to recommend items the user has liked in the past.
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.