arrow_back

Visualizing Data with Google Data Studio

Join Sign in

Visualizing Data with Google Data Studio

1 hour 30 minutes 7 Credits

GSP197

Google Cloud Self-Paced Labs

Overview

This lab demonstrates how to use Google Data Studio to visualize data stored in Google Cloud SQL for MySQL.

The data set used in this lab comes from the US Bureau of Transport Statistics and contains historic information about internal flights in the United States. This data set can be used to demonstrate a wide range of data science concepts and techniques and is used in all of the labs in the Data Science on Google Cloud Platform Quest.

Objectives

  • Create Cloud SQL database views

  • Create a Cloud SQL Datasource in Google Data Studio

  • Create a Data Studio report with a date range control

  • Create multiple charts using Cloud SQL database views

Setup and Requirements

This lab will have assets provisioned for it. You can log in to the lab with the provided credentials, but only being the lab exercises after the green progress bar has disappeared, and you see the Lab Running green light on the Qwiklabs page.

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
  • Time to complete the lab.

Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.

Note: If you are using a Chrome OS device, open an Incognito window to run this lab.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.

    Open Google Console

  2. Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Sign in

    Tip: Open the tabs in separate windows, side-by-side.

  3. In the Sign in page, paste the username that you copied from the left panel. Then copy and paste the password.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Training credentials. If you have your own Google Cloud account, do not use it for this lab (avoids incurring charges).

  4. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.

Cloud Shell icon

Click Continue.

cloudshell_continue.png

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:

Cloud Shell Terminal

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

You can list the active account name with this command:

gcloud auth list

(Output)

Credentialed accounts: - <myaccount>@<mydomain>.com (active)

(Example output)

Credentialed accounts: - google1623327_student@qwiklabs.net

You can list the project ID with this command:

gcloud config list project

(Output)

[core] project = <project_ID>

(Example output)

[core] project = qwiklabs-gcp-44776a13dea667a6

Preparing your Environment

This lab uses a data set, code samples, and scripts developed for the Data Science on Google Cloud Platform book from O'Reilly Media, Inc. and specifically covers the data visualization tasks covered in Chapter 3, "Creating Compelling Dashboards".

A workspace virtual machine has been configured for you to allow you to manage your Cloud SQL instance.

Connect to your workspace virtual machine instance

On the Google Cloud console click Compute Engine > VM Instances. There will only be a single instance running, the pre-configured workspace you will use in this lab.

Click the SSH button to open an SSH console to the workspace virtual machine instance.

Connect to the Lab Cloud SQL instance

This lab uses a pre-configured Cloud SQL Database that has been pre-loaded with two months of sample flight data for January and February 2015, obtained from the US Bureau of Transport Statistics. The flight data is in a table called flights in the bts database.

Get the IP address of your Cloud SQL instance by running the following in the SSH shell:

MYSQLIP=$(gcloud sql instances describe \ flights --format="value(ipAddresses.ipAddress)")

Connect to the mysql command line interface:

mysql --host=$MYSQLIP --user=root --password

When prompted for a password enter Passw0rd.

You will now be in the mysql command line mysql>.

Check the import by entering the following commands:

use bts; describe flights;

You should see a table with 27 rows identifying the field names in the flights table.

Count the number of records by running:

select count(*) from flights;

This will report that there are 899159 records in the flights table.

Now create some table views that will let you easily look at the sets of flights that are delayed by 10, 15 and 20 minutes respectively. You'll use them later in the lab.

Run these commands:

CREATE VIEW delayed_10 AS SELECT * FROM flights WHERE dep_delay > 10; CREATE VIEW delayed_15 AS SELECT * FROM flights WHERE dep_delay > 15; CREATE VIEW delayed_20 AS SELECT * FROM flights WHERE dep_delay > 20;

Exit the mysql interactive console:

exit

Connect Data Studio to visually analyze the data set

Go back to the Console click the Navigation menu to display the left menu. In the Databases section, click on SQL.

Click on the Instance ID to see the Details page. Copy the Instance Connection Name of your flights Cloud SQL instance to a text file. You will need to refer to it later.

Open a new tab in the same browser as your Google Cloud console.

Open Google Data Studio by navigating to http://datastudio.google.com/ .

Click Data sources in the center menu, and then click the + Create button on the top left to create a new data source.

Click Data source.

datastudionew.png

Check the Terms of service check box and click Continue.

Select No on each of the questions about email alerts and market surveys then click Continue.

Find the Cloud SQL for MySQL tile in Google Connectors and click then select it.

Click AUTHORIZE to enable access from Data Studio to your Cloud sources.

Your lab account should be selected, click ALLOW. If you are using a private browser instance there will only be one account listed so you won't have to make the selection.

Under Database Authentication, add the following:

Instance Connection Name

The name you saved earlier

Database

bts

Username

root

Password

Passw0rd

Click AUTHENTICATE.

Click the flights table to select it and then click the blue CONNECT button on the upper right of the screen.

flightconnect.png

You can see the 28 fields.

Scroll down to FL_DATE field, click the Text data type field, and then click Date & Time > Date to change the data type to the date format you will need later.

Click the Cloud SQL for MySQL - bts name in the upper left corner and change it to Cloud SQL for MySQL - Flights.

Create a Scatter chart using Data Studio

Click CREATE REPORT at the top right of the page.

Click ADD TO REPORT to confirm that you want to add the Cloud SQL for MySQL - Flights database as a data source.

Click Add a chart then select scatter chart icon from the drop-down, then draw a rectangle on the report canvas to hold the chart.

addchart.png

In the DATA properties pane on the right, click on the field for each setting and change to the following:

Metric X

DEP_DELAY

Metric Y

ARR_DELAY

Dimension

UNIQUE CARRIER

Hover your mouse over the data type icon (SUM) of the Metric X property.

metricx.png

Click the pencil icon that it changes to to edit the aggregation type for this metric. Change the aggregation type to Average.

e70b47921beed75a.png

Click outside the aggregation type box to return to the property pane.

You'll do the same with the data type icon (SUM) of the Metric Y property. Change the aggregation type to Average.

Click outside the aggregation type box to return to the property pane.

Click the STYLE tab.

In the Style menu click the Trendline drop-down and select Linear.

In the ribbon above the graph, click Add a control dropdown and select Date range control.

DateControl.png

Draw a rectangle below the graph, and a Date range control is added.

Click the VIEW button on the upper right to change to the interactive report view to test this control.

View.png

You will only see data if the range includes dates between Jan 1st 2015 and Feb 28 2015 because the dataset is limited to those dates in this lab.

Add additional chart types to your report

Click the Edit button on the upper right to add more chart items.

Click Add a chart then select pie chart icon from the drop-down, then draw a rectangle on the report canvas to hold the pie chart.

addchart.png

With the pie chart selected, click + ADD A FIELD on the bottom right of the screen. You may need to make sure you have selected the DATA property tab on the right hand side of the screen to see this.

Click the <- ALL FIELDS to go to the field property summary.

Click the context menu icon to the right of the ARR_DELAY field (three dots) and select Duplicate.

84dff62bd30c18eb.png

Click + ADD A FIELD on the right of the screen.

Enter ISLATE in the Field Name text box.

In the Formula text box enter the following formula:

CASE WHEN ( Copy of ARR_DELAY <15) THEN "ON TIME" ELSE "LATE" END

The field name must register correctly. If you do not see the syntax highlighting as shown below then double check the formula or use the Available Fields selector on the right to select the Copy of ARR_DELAY field.

20b5d685b137de67.png

Click SAVE and then click DONE.

In the DATA properties pane on the right, change the Dimension for the Pie chart to the new ISLATE calculated field.

The pie chart now displays the percentage of ontime and late flights.

Add a bar graph

Click Add a chart then select bar chart icon from the drop-down, and draw a rectangle on the report canvas to hold the bar chart.

In the DATA properties pane on the right change to the following settings:

Metric 1 (Default)

DEP_DELAY

Metric 2 ( Click + Add metric)

ARR_DELAY

Dimension

UNIQUE CARRIER

Sort

UNIQUE CARRIER

Sort Order

Ascending

Click the STYLE tab.

In the Style menu under Bar chart select Vertical.

Create additional dashboard items for different departure delay thresholds

At the beginning of the lab you created 3 database table views. You're now going to create charts that will display the delay thresholds for these tables.

Add an additional Data Source for the Delayed_10 database table view

Copy the pie chart and the bar chart so that you now have two sets. The report canvas should now look similar to this:

copyofpiebar.png

Select the second pie chart and click Cloud SQL for MySQL- Flights in the Data Source in the property list.

Click + ADD DATA at the bottom of the menu.

Click Cloud SQL for MySQL in the Google Connectors section of the selection pane.

Under Database Authentication, add the following:

Instance Connection Name

The name you saved earlier

Database

bts

Username

root

Password

Passw0rd

Click AUTHENTICATE.

Click the delayed_10 table to select it and then click ADD button on the bottom right of the screen.

Click ADD TO REPORT.

Click Cloud SQL for MySQL- Flights in the Data Source in the property list select Cloud SQL for MySQL - bts.

Click Edit data source.

editdatasource.png

Click the Cloud SQL for MySQL -bts name in the upper left corner and change it to Delayed_10.

Click DONE to update the data source name.

Recreate the copy of the Arr_Delay field and the ISLATE calculated field.

Click + ADD A FIELD on the bottom right of the screen. You may need to make sure you have selected the DATA property tab on the right hand side of the screen to see this.

If you cannot see the full list of fields with their data type and Aggregation type displayed then click the <- ALL FIELDS to go to the field property summary.

Click the context menu icon to the right of the ARR_DELAY field and select Duplicate.

84dff62bd30c18eb.png

Click + ADD A FIELD on the right side of the screen.

Enter ISLATE in the Field Name text box.

Enter the following formula in the Formula text box:

CASE WHEN ( Copy of ARR_DELAY <15) THEN "ON TIME" ELSE "LATE" END

The field name must register correctly. If you do not see the syntax highlighting as shown below then double check the formula or use the Available Fields selector on the right to select the Copy of ARR_DELAY field.

20b5d685b137de67.png

Click SAVE and then click DONE.

Now change the Dimension for the Delayed_10 pie chart to the new ISLATE calculated field.

The second pie chart now displays the percentage of ontime and late flights for the Delayed_10 view.

delay10.png

Create the remaining dashboard views

If you have time you can repeat the last two sections, where you added an additional database view for the Dealyed_15 and Delayed_20 views.

Congratulations!

Now you know how to use Google Data Studio to visualize data stored in Google Cloud SQL for MySQL database tables and views.

2ea99a2e13bf2db4.png

Finish Your Quest

This self-paced lab is part of the Qwiklabs Quest, Data Science on Google Cloud. A Quest is a series of related labs that form a learning path. Completing this Quest earns you the badge above, to recognize your achievement. You can make your badge (or badges) public and link to them in your online resume or social media account. Enroll in this Quest and get immediate completion credit if you've taken this lab. See other available Qwiklabs Quests.

Take Your Next Lab

Continue your Quest with Processing Data with Google Cloud Dataflow, or check out these suggestions:

Next Steps / Learn More

Here are some follow-up steps:

Google Cloud Training & Certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated July 19, 2021
Lab Last Tested July 19, 2021

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.