
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you restart it, you'll have to start from the beginning.
- On the top left of your screen, click Start lab to begin
Troubleshoot a query
/ 50
Work with nested data types
/ 50
Understanding data types and structures is essential for working effectively with data in BigQuery. Flat data types store a single value, such as a number or text string, while nested data types can store multiple values or complex structures with various data types. Knowing the difference between flat and nested data types helps you organize and query data in BigQuery efficiently.
In this lab, you'll explore flat and nested data types in BigQuery and learn how to use these data types effectively in queries.
You work as a cloud data analyst for TheLook eCommerce, a global clothing company that specializes in innovative fashion and ethical and sustainable sourcing. The head merchandiser, Meredith, has asked for your help with two data-related problems. First, Meredith is having trouble running a query to generate a summary of sales by country and year. Second, they are concerned that ineffective marketing has caused some products to underperform in specific countries, but they’re unsure how to use existing data to gain the insights she needs. To help Meredith, you'll need to apply your knowledge of flat and nested data types.
Here’s how you’ll do this task: First, you’ll explore data types in BigQuery. Next, you’ll troubleshoot a query to learn more about how data types work. Then, you’ll explore nested data types and how they are different from flat data types. Finally, you’ll work with nested data types to track down why one product is not selling in Singapore and deliver valuable insight that can help Meredith get the answers they need.
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Click the Start Lab button. On the left is the Lab Details panel with the following:
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.
Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.
If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.
You can also find the Google Cloud username in the Lab Details panel.
You can also find the Google Cloud password in the Lab Details panel.
After a few moments, the Console opens in this tab.
Artem, the data architect, recommends exploring the table Meredith wants to access before troubleshooting their query. This will help you better understand the data, including the columns and data types.
In this task, you’ll explore a table’s schema to identify the data types and relevant columns and learn more about flat data types.
A STRING is a sequence of characters that can include letters, numbers, and punctuation. When a STRING contains numbers, it is still read as text.
An INTEGER is a positive or negative whole number. Integers do not contain decimal points.
Both STRINGs and INTEGERs are flat data types because they contain a single value.
In BigQuery, columns can be NULLABLE, REQUIRED, and REPEATED.
NULLABLE, REQUIRED, and REPEATED can be used with flat data types and nested RECORDs.
Now that you’ve explored the data, you’re ready to help Meredith with their query. Meredith is trying to generate a summary of sales by country and by year. Currently, they are using the orders_by_state table, but they’re getting an error and can’t find the issue.
In this task, you’ll use your understanding of data types to troubleshoot and fix the query so that Meredith can create a summary of sales.
This is the query that Meredith tried to run to get the sum for the total_sales
column.
Did you spot an exclamation mark in line one when you entered the query? This is a warning from BigQuery indicating that the query may contain errors. BigQuery scans all queries for potential errors before they are run, and it will alert you to any errors it finds.
To clarify, the signature of a function describes its parameters and data types. Currently, the SUM() function is taking in a STRING, a flat data type whose value is read as text. However, this is not a supported data type.
To fix the error, you'll need to change the query to convert the data type from a STRING to a data type that includes a number. Since you are working with numbers that may contain a decimal point, the NUMERIC data type is a good choice.
CAST is a valuable tool to use in SQL because it converts data from one type to another. The query now works because it changes the data type of total_sales
from a STRING to NUMERIC, a supported data type that is read as a number and can be used for mathematical calculations. This will resolve the error that was shown.
Now that the issue with total_sales is solved, you can help Meredith get the information they need.
This query returns the orders by country and by year. The query will also sort the results by country name in ascending order, and by year in descending order, ensuring that the latest year appears first.
Click Check my progress to verify that you have completed this task correctly.
Meredith has asked you to create a report on the effectiveness of marketing for certain products in 2021. They suspect that a lack of customer awareness might have led to fewer sales in some markets, and a report can help determine if unsuccessful marketing was the cause. You’re unsure what data is needed, so you reach out to Artem, the data architect, for help. Artem explains that Google Analytics captures website views in the ga4_events table, and you can search for an event named view_item in the event_name column to find the data you need.
BigQuery can be tricky, as it contains two main types: arrays and STRUCTs. REPEATED columns are arrays, which means they can store multiple values of the same data type. RECORD columns are structs, which means they can store multiple values of different data types. You can even have an entire array that is made up of STRUCTS called a REPEATED RECORD column.
You thank Artem and get down to work.
In this task, you’ll explore nested data types in the ga4_events table and learn more about nested data types.
A STRING, INTEGER, and FLOAT are flat data types, while a RECORD is a nested data type that functions as a special kind of STRUCT. STRUCTs enable you to treat various pieces of data as one unit, even if they are different data types. For instance, the geo RECORD comprises several columns, such as country, region, and city, which together represent a geographical reference for the event, or the location of the customer who viewed the product at the time of the event.
The Repeated mode is exclusive to nested data types in BigQuery. When a column is marked as Repeated, it indicates that there can be multiple items for every row in the table. Additionally, repeated columns can hold multiple values of varying data types.
After exploring nested data types, you're almost ready to assist Meredith in understanding how marketing in 2021 affected product sales.
Before that, you reach out to Artem for more advice. Artem tells you that when working with flat data types like STRING, you can reference them directly. But when working with nested data types like RECORD, you'll need to reference them using dot notation.
You thank Artem and start thinking about how to use the data you've explored to help Meredith. Since Meredith noticed that the sales of Google Dino Game Tees in Singapore were much lower than expected, you decide to track down how many people viewed the Dino Tee on the website in that market during 2021. This insight could help Meredith better understand if ineffective marketing may have cost the product sales.
In this task, you’ll generate a report of the number of views for the product Google Dino Game Tee in 2021 for Singapore, a market where very few Dino tees were sold.
This query retrieves the event_date, event_name, and geo from the ga4_events table and inspect the results.
That is a lot of results. To view only the relevant information, you'll need to filter the events by type.
This query returns only the rows that match the view_item event type.
When creating a report, a good strategy is to start with the total number of rows in the dataset. Then, you can add filters to narrow down the results. It’s important to keep track of how many rows the filtered dataset has, so that you can understand how your filters are affecting the results.
To get the results you need for Meredith, you’ll need to filter the results by country. Country is a column nested within the geo-nested data type. To reference a column in a nested data type, you use dot notation.
This query filters events by country, specifically Singapore. Note how country is shown to be nested inside of geo with geo.country.
Finally, you need to filter on the product name. The following query adds the items column, which has the data type of RECORD and is marked as REPEATED. This means that it is a collection of RECORDs. In other words, the items column is an array of STRUCTs!
Click Run.
Scroll down to view the results.
Each row numbered under the Row column in the results represents an event, an item view. Notice that some of the event rows have multiple items listed within them, as is the case in row 19. This is how BigQuery shows columns that repeat within the same row.
The number of rows in the results remains the same at 65 since the query is still finding a total of 65 events. However, row 19 in the results set has 9 nested items in the items column. This means that although row 19 corresponds to 1 event, it actually contains 9 items.
Now, let’s try to list the item name in the query since Meredith is interested in the item with the name Google Dino Game Tee.
In this instance, none of the rows are returned. Instead, the query returns an error. BigQuery cannot determine which of the nested items is being referenced since items is a collection (in other words, an array that contains multiple values of the STRUCT data type).
One way to work with nested columns in a query is to simply un-nest those columns so they can be easily referenced. The operation is known as “flattening”, and uses the unnest function that can be referenced like a table.
The query that follows uses unnest on the column items and defines the item as an alias, and cross-joins with each individual event row. This will select all events where a user viewed an item in Singapore.
In the previous step, you had 65 rows that matched the view_item event name and the country Singapore. However, after the UNNEST operation, you now have 571 rows! This is because the UNNEST operation flattens the items array, which means that each item in the array is now a separate row.
Using the flattened dataset, add a filter so only products named Google Dino Game Tee are displayed in the result set.
How many rows are returned now? Did the marketing campaign generate a lot of views on the website?
It seems only three events match the criteria. This means that for the entire country of Singapore and the entire year of 2021, only three people looked at the Google Dino Game Tee product on the website. That's not a lot of views for a country with millions of people. It seems like you have valuable insight, so share it with Meredith to help them better understand if marketing impacted the product sales in 2021. Good job!
Click Check my progress to verify that you have completed this task correctly.
If you’d like to review the concepts covered in this lab, refer to the lesson on data types.
Great work!
As a cloud data analyst at TheLook eCommerce, you used your knowledge of flat and nested data types to help Meredith fix a query and gain insights into a lack of product sales in a specific market.
First, you explored flat data types and applied your knowledge to fix Meredith’s query so that they could run the sales report they needed. Then, you explored nested data types to help Meredith uncover why the Google Game Dino Tees were not selling in Singapore.
You’re well on your way to understanding how to use flat and nested data types to query data with BigQuery.
Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.
Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
This content is not currently available
We will notify you via email when it becomes available
Great!
We will contact you via email if it becomes available
One lab at a time
Confirm to end all existing labs and start this one