arrow_back

Manage a partitioned table in BigQuery

Sign in Join
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses

Manage a partitioned table in BigQuery

Lab 1 hour 30 minutes universal_currency_alt 2 Credits show_chart Introductory
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses
important icon IMPORTANT:

desktop/labtop icon Make sure to complete this hands-on lab on a desktop/laptop only.

check icon There are only 5 attempts permitted per lab.

quiz target icon As a reminder – it is common to not get every question correct on your first try, and even to need to redo a task; this is part of the learning process.

timer icon Once a lab is started, the timer cannot be paused. After 1 hour and 30 minutes, the lab will end and you’ll need to start again.

tip icon For more information review the Lab technical tips reading.

Activity overview

Partitioned tables are a powerful tool that cloud data analysts can use to improve query performance and reduce costs.

A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query.

In this lab, you’ll partition tables by specifying a partition column which is used to segment the table, manage tables using default and customized settings, add and export data, and query tables.

Scenario

Meredith, the head merchandiser of The Look eCommerce, wants to better understand the behavior of customers at the company's brick-and-mortar stores. Meredith is responsible for overseeing the company's product selection. But, they are concerned about not having a complete understanding of how customers are using the stores and the products they want to buy.

So Meredith asks Huan, a data scientist who works with the merchandising team, for help. Huan's role is to study the number of people who are dropped off near brick-and-mortar stores after taking a taxi ride. Huan believes that this data can provide valuable insights into customer shopping patterns. This information can help the merchandising team make more informed decisions about each store's stock.

To visualize the data for the merchandising team, Huan built a dashboard that shows various stores and how many people are dropped off nearby. But, it looks like the dashboard is not working out as Huan hoped! When they open up their email, they find multiple complaints from users that the dashboard is too slow. Huan also notices that the costs of running the dashboard are too high.

As a cloud data analyst for TheLook eCommerce, you have been asked to help to make the dashboard more efficient and cost-effective so that the merchandising team can have the data they need.

You ask Artem, the data architect for help. They tell you the problem may be related to how the data is being partitioned. The more data that must be scanned for each query, the more resources the query will use. They suggest that you test if partitions can be used to scan less data for queries. This will reduce both time and costs.

You suspect that the problem may be related to how the data is being partitioned. So you decide to test if partitions can be used to scan less data for queries.

Here’s how you'll do this task: First, you’ll conduct an analysis of partitioned tables. Next, you’ll check the amount of data analyzed. Then, you’ll run several queries, clearing the cache after each query. Finally, you’ll use partitions as filters.

Setup

Before you click Start Lab

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges incurred to your personal account.
  • Time to complete the lab---remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. On the left is the Lab Details panel with the following:

    • Time remaining
    • The Open Google Cloud console button
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
    Note: If you need to pay for the lab, a pop-up opens for you to select your payment method.
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.

    Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.

    Note: If the Choose an account dialog displays, click Use Another Account.
  3. If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.

{{{user_0.username | "Google Cloud username"}}}

You can also find the Google Cloud username in the Lab Details panel.

  1. Copy the Google Cloud password below and paste it into the Welcome dialog. Click Next.
{{{user_0.password | "Google Cloud password"}}}

You can also find the Google Cloud password in the Lab Details panel.

Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  1. Click through the subsequent pages:
    • Accept the terms and conditions
    • Do not add recovery options or two-factor authentication (because this is a temporary account)
    • Do not sign up for free trials

After a few moments, the Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left. Google Cloud console menu with the Navigation menu icon highlighted

Task 1. Analyze a partitioned table

In this task, you’ll analyze a partitioned table to identify the performance and cost issues that may be causing Huan's dashboard to be slow and expensive.

  1. In the Cloud console, from the Navigation menu (Navigation Menu), select BigQuery > BigQuery Studio.
Note: The Welcome to BigQuery in the Cloud Console message box may appear, providing links to the quickstart guide and the release notes for UI updates. Click Done to proceed.
  1. Expand the list of datasets by clicking the drop-down arrow next to the project ID.
Note: You may need to select a project first. To do so, click Select a project in the Google Cloud console title bar, then select the project link from the Select a project dialog.
  1. Expand the thelook_gcda dataset.
  2. Select the taxi_trips table.

Notice the icon indicating that the taxi_trips table is partitioned.

The partitioned table icon

Hint: You'll find the partition information by clicking the Details tab.

Task 2. Use partitions to reduce the amount of data examined

In this task, you'll discover how partitioning limits the amount of data that needs to be examined each time a query is executed. Partitions increase query performance and aid cost reduction.

  1. In BigQuery Studio, click + Compose new query. A new Untitled tab opens.
  2. Copy and paste the following query into the Untitled tab:
SELECT * FROM `thelook_gcda.taxi_trips`LIMIT 10;
  1. Click Run.

  2. In the SQL Query editor action bar, click More > Query settings. The Query settings dialog opens.

Note: If your screen view is too small, you may need to first click on the overflow menu icon, which is displayed on the action bar of the Untitled query tab. This will allow you to access the More menu. Query dialog box, highlighting the overflow menu icon.

Now, disable the query cache preference. In the Session management section, ensure that the Use session mode checkbox is cleared.

The Query settings page, which includes the unchecked Use session mode box.

  1. Click Save.

  2. Replace the previous query in the Untitled tab with the following SQL query:

SELECT * FROM `thelook_gcda.taxi_trips` order by pickup_datetime DESC LIMIT 100;

Notice the number of MB this query will process when run. For small browser windows, hover over the green check to view the number of MB this query will process when run.

  1. Click Run.
Tip: The amount of bytes processed displays on the JOB INFORMATION tab in the Query results section. Note: The most recent date is in the pickup_datetime column. The earliest data is from December 2022.
  1. Replace the previous query in the Untitled tab with the following SQL query:
SELECT * FROM `thelook_gcda.taxi_trips` WHERE pickup_datetime > '2022-11-01' order by pickup_datetime ASC;

Notice the number of MB this query will process when run. For small browser windows, hover over the green check to view the number of MB this query will process when run.

  1. Click Run.

Note: Notice that the amount of data processed is approximately 50 times less than the original query. If you run the same query multiple times and observe "0" in Bytes processed, it indicates that the results are being retrieved from the cache. Note: If you decide to open a new Untitled tab, you'll need to configure the previous cache steps to ensure that the Use cached results checkbox is cleared.
  1. Replace the previous query in the Untitled tab with the following SQL query:
SELECT * FROM `thelook_gcda.taxi_trips` WHERE pickup_datetime > '1900-01-01' order by pickup_datetime ASC LIMIT 100;
  1. Click Run.
Note: You should now notice that 489 MB were scanned. This is because the filter includes all partitions. The date in the WHERE statement goes way back to the 1900s, and all rows in the table will satisfy that criterion since the oldest pickup_datetime in this table is December 2008 as displayed in column pickup_datetime.

Click Check my progress to verify that you have completed this task correctly.

Use partitions to reduce the amount of data examined

Conclusion

Great work! You have successfully conducted an analysis of partitioned tables in BigQuery and helped Huan make their dashboard more efficient and cost-effective.

By examining the amount of data analyzed and running queries, you have explored ways to address the reported slow performance of the dashboard.

Additionally, by utilizing partitions as filters, you have tested the effectiveness of reducing the data scanned for queries, potentially providing insights for optimizing query efficiency and reducing costs for the dashboard.

You’re well on your way to understanding how to utilize data sources in BigQuery.

End your lab

Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.

Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.

Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.