
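Before you begin
- The lab creates a Google Cloud project and a set of resources for you to use for a fixed period of time.
- Labs are timed and cannot be paused. If you end the lab partway through, you must start over.
- To begin, click Start lab in the upper-left corner of the screen.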
Set up the data environment
Aggregating Site Traffic by User with SQL
In this lab, you:
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
Before you begin your work on Google Cloud, you need to ensure that your project has the correct permissions within Identity and Access Management (IAM).
In the Google Cloud console, on the Navigation menu, select IAM & Admin > IAM.
Confirm that the default compute Service Account {project-number}-compute@developer.gserviceaccount.com is present and has the editor role assigned. The account prefix is the project number, which you can find on Navigation menu > Cloud Overview > Dashboard.
If the account does not have the editor role, follow the steps below to assign the required role.
Note the project number (for example, 729328892908), and replace {project-number} with your project number.
For the purposes of this lab, you will mainly be using a Theia Web IDE hosted on Google Compute Engine. It has the lab repo pre-cloned. There is Java language server support, as well as a terminal for programmatic access to Google Cloud APIs via the gcloud command line tool, similar to Cloud Shell.
The lab repo has been cloned to your environment. Each lab is divided into a labs folder with code to be completed by you, and a solution folder with a fully workable example to reference if you get stuck.
Click the File Explorer button to look:
You can also create multiple terminals in this environment, just as you would with Cloud Shell:
By running gcloud auth list in the terminal, you can see that you're logged in as a provided service account, which has the exact same permissions as your lab user account:
If at any point your environment stops working, you can try resetting the VM hosting your IDE from the GCE console like this:
In this lab, you rewrite your previous BatchUserTraffic pipeline so that it performs the following:
CommonLog object.
As in the prior labs, the first step is to generate data for the pipeline to process. You will open the lab environment and generate the data as before:
Click Check my progress to verify the objective.
Open BatchUserTrafficSQLPipeline.java in your IDE, which can be found in 4_SQL_Batch_Analytics/labs/src/main/java/com/mypackage/pipeline.
This pipeline already contains the necessary code to accept command-line options for the input path and one output table name, as well as code to read in events from Google Cloud Storage, parse those events, and write results to BigQuery. However, some important parts are missing.
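Beam pipeline options are passed on the command line in the form --name=value. In the lab, Beam's own options machinery does the parsing for you; the plain-Java sketch below only illustrates that convention and is not the lab's implementation. The class OptionsSketch, the bucket, and the table value are hypothetical; the option names inputPath and aggregateTableName are the ones this lab mentions later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OptionsSketch {
    /** Parses arguments of the form --name=value into a map; anything else is ignored. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require the "--" prefix and at least one character of name before '='.
            if (arg.startsWith("--") && eq > 2) {
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // Hypothetical values; the real path and table come from your lab project.
        String[] example = {
            "--inputPath=gs://my-bucket/events.json",
            "--aggregateTableName=my-project:logs.user_traffic"
        };
        Map<String, String> options = parse(example);
        System.out.println(options.get("inputPath"));
        System.out.println(options.get("aggregateTableName"));
    }
}
```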
As in the previous lab, the next step in the pipeline is to aggregate the events by each unique user_id and count pageviews for each. This time, however, you will perform the aggregation in SQL with SqlTransform instead of Java-based transforms.
Before you implement this, you will need to add a SQL dependency to your pipeline.
Open the pom.xml file for this pipeline located at 4_SQL_Batch_Analytics/labs/ and add the following dependency:
In BatchUserTrafficSQLPipeline.java, ensure that the following imports are at the top of the file:
Beam SQL can be implemented in the Apache Calcite dialect.
The SqlTransform.query(queryString) method is the only API to create a PTransform from a string representation of the SQL query. You can apply this PTransform either to a single PCollection or to a PCollectionTuple that holds multiple PCollections (more on PCollectionTuples later).
When it is being applied to a single PCollection, it can be referenced via the table name PCOLLECTION in the query:
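As a minimal sketch of such a query, the class below builds a SQL string that reads from PCOLLECTION, counts rows per user_id, and names the count pageviews. The class name PageviewQuery and the step name "AggregateSQL" are hypothetical, and the Beam call is shown only as a comment because it needs the Beam SQL extension on the classpath.

```java
class PageviewQuery {
    // PCOLLECTION is the table name used for a single input PCollection.
    static final String TABLE = "PCOLLECTION";

    /** Builds a query that counts rows per user_id and names the count "pageviews". */
    static String build() {
        return "SELECT user_id, COUNT(*) AS pageviews "
             + "FROM " + TABLE + " "
             + "GROUP BY user_id";
    }

    public static void main(String[] args) {
        String query = build();
        System.out.println(query);
        // In the pipeline, the string would be handed to Beam roughly like this
        // (assumes a PCollection named logs whose elements have a schema):
        //   logs.apply("AggregateSQL", SqlTransform.query(query));
    }
}
```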
The resulting output is a Row object with an associated schema that can be further mutated in SQL transforms or other Java PTransforms, or stored in a sink.
Add a SqlTransform to your pipeline, supplying a SQL string to aggregate the number of hits by user_id and naming this 'pageviews'.
You can also perform additional aggregations if you wish. For reference, this was the PTransform previously used:
Some helpful SQL aggregate functions:
If you get stuck, reference the solution for hints.
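To check what the aggregation should return, it can help to compute the same result on a small in-memory sample. The sketch below is a plain-Java analogy of COUNT(*) grouped by user_id, not Beam code; the class GroupBySketch and the sample user IDs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupBySketch {
    /** Counts occurrences of each user ID, like COUNT(*) with GROUP BY user_id. */
    static Map<String, Long> pageviews(List<String> userIds) {
        return userIds.stream()
            .collect(Collectors.groupingBy(id -> id, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // One entry per event; each value is the user_id of that event.
        List<String> userIds = Arrays.asList("alice", "bob", "alice", "carol", "alice");
        System.out.println(pageviews(userIds)); // prints {alice=3, bob=1, carol=1}
    }
}
```

Each key of the resulting map corresponds to one output row of the SQL query, and each value to that row's pageviews column.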
You want to store all the raw results in BigQuery for further SQL aggregations later in the UI.
Write the CommonLog objects directly to BigQuery, to a table name referenced by a command-line option rawTableName.
You can do this as before by terminating the first branch of the pipeline with a semicolon and starting each branch with logs.apply(). Don't forget to add this new command-line option to the pipeline options along with inputPath and aggregateTableName. Remember also to change the type hint on BigQueryIO.<Object>write().
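The branching shape can be pictured with a plain-Java analogy: one shared input feeds two independent statements, each ending with a semicolon. This is a sketch of the idea only, not Beam code. The class BranchingSketch and its write method are hypothetical; the explicit type argument in BranchingSketch.<String>write(...) uses the same Java syntax as the type hint on BigQueryIO.<Object>write().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BranchingSketch {
    /** Hypothetical sink: T plays the role of the type hint on a generic write call. */
    static <T> List<T> write(List<T> rows) {
        return Collections.unmodifiableList(new ArrayList<>(rows));
    }

    /** Aggregate branch: counts rows per user ID. */
    static Map<String, Long> countByUser(List<String> userIds) {
        Map<String, Long> counts = new TreeMap<>();
        for (String id : userIds) {
            counts.merge(id, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shared input, standing in for the parsed logs collection.
        List<String> logs = Arrays.asList("alice", "bob", "alice");

        // Branch 1: store the raw rows unchanged; the statement ends with a semicolon.
        List<String> rawTable = BranchingSketch.<String>write(logs);

        // Branch 2: starts again from the same input and aggregates it.
        Map<String, Long> aggregateTable = countByUser(logs);

        System.out.println(rawTable);       // prints [alice, bob, alice]
        System.out.println(aggregateTable); // prints {alice=2, bob=1}
    }
}
```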
Navigate to Navigation Menu > Dataflow to see the status of your pipeline.
Once your pipeline has finished, go to the BigQuery UI to query the two resulting tables. Make sure that logs.raw exists and has data populated, as you will need that later in the lab.
Click Check my progress to verify the objective.
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.