Overview
In this lab you analyze 2 different public datasets, run queries on them, separately and then combined, to derive interesting insights.
What you'll learn
In this lab, you will:
- Carry out interactive queries on the BigQuery console.
- Combine and run analytics on multiple datasets.
Prerequisites
This is a fundamental level lab and assumes some experience with BigQuery and SQL.
Introduction
This lab uses two public datasets in BigQuery: weather data from the US National Oceanic and Atmospheric Administration (NOAA), and bicycle rental data from New York City.
You will encounter, for the first time, several aspects of Google Cloud Platform that are of great benefit to scientists:
-
Serverless. No need to download data to your machine in order to work with it - the dataset will remain on the cloud.
-
Ease of use. Run ad-hoc SQL queries on your dataset without having to prepare the data, like indexes, beforehand. This is invaluable for data exploration.
-
Scale. Carry out data exploration on extremely large datasets interactively. You don't need to sample the data in order to work with it in a timely manner.
-
Shareability. You will be able to run queries on data from different datasets without any issues. BigQuery is a convenient way to share datasets. Of course, you can also keep your data private, or share them only with specific persons -- not all data need to be public.
The end-result is that you will find if there are lesser bike rentals on rainy days.
Setup and requirements
Lab setup
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
-
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method.
On the left is the Lab Details panel with the following:
- The Open Google Cloud console button
- Time remaining
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
-
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account.
-
If necessary, copy the Username below and paste it into the Sign in dialog.
{{{user_0.username | "Username"}}}
You can also find the Username in the Lab Details panel.
-
Click Next.
-
Copy the Password below and paste it into the Welcome dialog.
{{{user_0.password | "Password"}}}
You can also find the Password in the Lab Details panel.
-
Click Next.
Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.
Note: Using your own Google Cloud account for this lab may incur extra charges.
-
Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
Note: To view a menu with a list of Google Cloud products and services, click the Navigation menu at the top-left, or type the service or product name in the Search field.
Task 1. Explore bicycle rental data
In this task, you explore the public bigquery-public-data
dataset, specifically the new_york_citibike
dataset, by examining the schema of the citibike_trips
table and running queries to analyze trip duration and calculate the total distance traveled by each bicycle.
- In the Google Cloud console, in the Navigation menu (
) click BigQuery.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and lists UI updates.
-
Click Done.
-
In the left pane, click + Add data , then click Star a project by name, next in the pop-up window type bigquery-public-data, finally click Star.
-
In the BigQuery console, you see two projects in the left pane, one named your Qwiklabs project ID, and one named bigquery-public-data.
-
In the left pane of the BigQuery console, select bigquery-public-data > new_york_citibike > citibike_trips table.
-
In the Table (citibike_trips) window, click the Schema tab.
-
Examine the column names and the datatypes.
-
Click the + SQL query to compose a new query.

Enter the following query:
SELECT
MIN(start_station_name) AS start_station_name,
MIN(end_station_name) AS end_station_name,
APPROX_QUANTILES(tripduration, 10)[OFFSET (5)] AS typical_duration,
COUNT(tripduration) AS num_trips
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`
WHERE
start_station_id != end_station_id
GROUP BY
start_station_id,
end_station_id
ORDER BY
num_trips DESC
LIMIT
10
-
Click Run. Look at the result and try to determine what this query does ?
Hint: Typical duration for the 10 most common one-way rentals
-
Next, run the query below to find another interesting fact: total distance traveled by each bicycle in the dataset. Note that the query limits the results to only top 5.
WITH
trip_distance AS (
SELECT
bikeid,
ST_Distance(ST_GeogPoint(s.longitude,
s.latitude),
ST_GeogPoint(e.longitude,
e.latitude)) AS distance
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`,
`bigquery-public-data.new_york_citibike.citibike_stations` as s,
`bigquery-public-data.new_york_citibike.citibike_stations` as e
WHERE
start_station_name = s.name
AND end_station_name = e.name)
SELECT
bikeid,
SUM(distance)/1000 AS total_distance
FROM
trip_distance
GROUP BY
bikeid
ORDER BY
total_distance DESC
LIMIT
5
Note: For this query, we also used the other table in the dataset called citibike_stations to get bicycle station information.
Task 2. Explore the weather dataset
In this task, you explore the ghcn_d
dataset, by previewing the ghcnd_2015
table and running a query to retrieve rainfall data from a New York weather station.
-
In the left pane of the BigQuery Console, select the newly added bigquery-public-data
project and select ghcn_d > ghcnd_2015.
-
Then click on the Preview tab. Your console should resemble the following:

Examine the columns and some of the data values.
- Click the + SQL query to compose a new query and enter the following:
SELECT
wx.date,
wx.value/10.0 AS prcp
FROM
`bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
WHERE
id = 'USW00094728'
AND qflag IS NULL
AND element = 'PRCP'
ORDER BY
wx.date
- Click Run.
This query will return rainfall (in mm) for all days in 2015 from a weather station in New York whose id is provided in the query (the station corresponds to NEW YORK CNTRL PK TWR ).
Task 3. Find correlation between rain and bicycle rentals
In this task, you analyze the correlation between rain and bicycle rentals by joining the new_york_citibike.citibike_trips
dataset with the ghcn_d.ghcnd_2015
weather dataset, determining the average number of bicycle trips on rainy versus non-rainy days.
How about joining the bicycle rentals data against weather data to learn whether there are fewer bicycle rentals on rainy days?
- Click the + SQL query to compose a new query and enter the following:
WITH bicycle_rentals AS (
SELECT
COUNT(starttime) as num_trips,
EXTRACT(DATE from starttime) as trip_date
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY trip_date
),
rainy_days AS
(
SELECT
date,
(MAX(prcp) > 5) AS rainy
FROM (
SELECT
wx.date AS date,
IF (wx.element = 'PRCP', wx.value/10, NULL) AS prcp
FROM
`bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
WHERE
wx.id = 'USW00094728'
)
GROUP BY
date
)
SELECT
ROUND(AVG(bk.num_trips)) AS num_trips,
wx.rainy
FROM bicycle_rentals AS bk
JOIN rainy_days AS wx
ON wx.date = bk.trip_date
GROUP BY wx.rainy
- Click Run.
Now you can see the results of joining the bicycle rental dataset with a weather dataset that comes from a completely different source:

Running the query yields that, yes, New Yorkers ride the bicycle 47% fewer times when it rains.
Summary
In this lab you did ad-hoc queries on two datasets. You were able to query the data without setting up any clusters, creating any indexes, etc. You were also able to mash up the two datasets and get some interesting insights. All without ever leaving your browser!
Congratulations!
You learned how to run some very interesting queries on BigQuery!
End your lab
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
- 1 star = Very dissatisfied
- 2 stars = Dissatisfied
- 3 stars = Neutral
- 4 stars = Satisfied
- 5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.