Building an IoT Analytics Pipeline on Google Cloud

1 hour 10 minutes 5 Credits

GSP088

Overview

The term Internet of Things (IoT) refers to the interconnection of physical devices with the global Internet. These devices are equipped with sensors and networking hardware, and each is globally identifiable. Taken together, these capabilities afford rich data about items in the physical world.

Cloud IoT Core is a fully managed service that allows you to easily and securely connect, manage, and ingest data from millions of globally dispersed devices. The service connects IoT devices that use the standard Message Queue Telemetry Transport (MQTT) protocol to other Google Cloud data services.

Cloud IoT Core has two main components:

  • A device manager for registering devices with the service, so you can then monitor and configure them.
  • A protocol bridge that supports MQTT, which devices can use to connect to Google Cloud.
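To make the role of the protocol bridge more concrete, the sketch below assembles the MQTT client ID and telemetry topic that a device would present to the bridge, following the naming format described in the Cloud IoT Core documentation. The class name and the sample project ID are placeholders for illustration; the registry and device names are the ones used later in this lab.

```python
from dataclasses import dataclass

# Public MQTT bridge endpoint for Cloud IoT Core (TLS).
MQTT_BRIDGE_HOST = "mqtt.googleapis.com"
MQTT_BRIDGE_PORT = 8883


@dataclass(frozen=True)
class DeviceIdentity:
    """Hypothetical helper that names a device the way the bridge expects."""
    project_id: str
    region: str
    registry_id: str
    device_id: str

    def client_id(self) -> str:
        # The MQTT client ID is the full resource path of the device.
        return (
            f"projects/{self.project_id}/locations/{self.region}"
            f"/registries/{self.registry_id}/devices/{self.device_id}"
        )

    def telemetry_topic(self) -> str:
        # Telemetry events are published to this device-scoped MQTT topic.
        return f"/devices/{self.device_id}/events"


# "my-project" is a placeholder project ID.
device = DeviceIdentity("my-project", "us-central1", "iotlab-registry", "temp-sensor-istanbul")
print(device.client_id())
print(device.telemetry_topic())
```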

What You'll Learn

In this lab, you will dive into Google Cloud's IoT services and complete the following tasks:

  • Connect and manage MQTT-based devices using Cloud IoT Core (using simulated devices)

  • Ingest a stream of information from Cloud IoT Core using Cloud Pub/Sub.

  • Process the IoT data using Cloud Dataflow.

  • Analyze the IoT data using BigQuery.

Prerequisites

This is an advanced-level lab. A baseline knowledge of the following services will help you understand the steps you'll follow and the commands you'll run:

  • BigQuery
  • Cloud Pub/Sub
  • Dataflow
  • IoT

If you want to get up to speed in any of the above services, check out the following Qwiklabs:

Setup and Requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise cause extra charges on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Enable APIs

Throughout this lab you'll be using your Project ID. You may want to copy and save it now for use later.

  1. In the Cloud Console, click Navigation menu > APIs & Services.

  2. Scroll down in the list of enabled APIs, and confirm that these APIs are enabled:

    • Google Cloud IoT API
    • Cloud Pub/Sub API
    • Dataflow API
  3. If one or more APIs are not enabled, click the ENABLE APIS AND SERVICES button at the top. Search for the APIs by name and enable each one for your current project.
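If you prefer to check the list programmatically, the following sketch compares a list of enabled service names against the three APIs this lab needs. The function name is hypothetical, and the sample input is made up; in practice you could obtain the enabled services from the output of gcloud services list --enabled.

```python
# Display names from the console mapped to their service names.
REQUIRED_SERVICES = {
    "Google Cloud IoT API": "cloudiot.googleapis.com",
    "Cloud Pub/Sub API": "pubsub.googleapis.com",
    "Dataflow API": "dataflow.googleapis.com",
}


def missing_apis(enabled_services):
    """Return the display names of required APIs that are not enabled."""
    enabled = set(enabled_services)
    return [
        title
        for title, service in REQUIRED_SERVICES.items()
        if service not in enabled
    ]


# Made-up example: only Pub/Sub and BigQuery are enabled.
print(missing_apis(["pubsub.googleapis.com", "bigquery.googleapis.com"]))
```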

Ensure that the Dataflow API is successfully enabled

To ensure access to the necessary API, restart the connection to the Dataflow API.

  1. In the Cloud Console, enter Dataflow API in the top search bar. Click on the result for Dataflow API.

  2. Click Manage.

  3. Click Disable API.

If asked to confirm, click Disable.

  4. Click Enable.

When the API has been enabled again, the page will show the option to disable.

Create a Cloud Pub/Sub topic

Cloud Pub/Sub is an asynchronous global messaging service. By decoupling senders and receivers, it allows for secure and highly available communication between independently written applications. Cloud Pub/Sub delivers low-latency, durable messaging.

In Cloud Pub/Sub, publisher applications and subscriber applications connect with one another through the use of a shared string called a topic. A publisher application creates and sends messages to a topic. Subscriber applications create a subscription to a topic to receive messages from it.

In an IoT solution built with Cloud IoT Core, device telemetry data is forwarded to a Cloud Pub/Sub topic.
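The decoupling described above can be illustrated with a toy, in-memory broker. This is only a sketch of the publish/subscribe idea: it is not the Cloud Pub/Sub API, it delivers messages synchronously, and it stores nothing durably. The class and method names are invented for this example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Minimal in-memory stand-in for a topic-based messaging service."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        # A subscriber only needs to know the topic name, not the publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> int:
        # A publisher only needs to know the topic name, not the subscribers.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = ToyBroker()
received = []
broker.subscribe("iotlab", received.append)
delivered = broker.publish("iotlab", b'{"device": "temp-sensor-istanbul"}')
print(delivered, received)
```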

To define a new Cloud Pub/Sub topic:

  1. In the Cloud Console, enter Pub/Sub in the top search bar and open the result. You should land on the Topics page.
  2. Click + CREATE TOPIC. The Create a topic dialog shows you a partial URL path.
Note: If you see qwiklabs-resources as your project name, cancel the dialog and return to the Cloud Console. Use the menu to the right of the Google Cloud logo to select the correct project. Then return to this step.
  3. Add this string as your Topic ID:

iotlab
  4. Click CREATE TOPIC.

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully created a Cloud Pub/Sub topic, you will see an assessment score.

Create a Cloud Pub/Sub Topic
  1. Click on the Topics page to see the new topic whose partial URL ends in iotlab. Click the three-dot icon at the right edge of its row to open the context menu. Choose View permissions.

  2. In the Permissions dialog, click ADD PRINCIPAL and paste the following into the New principals field:

cloud-iot@system.gserviceaccount.com
  3. From the Select a role menu, give the new principal the Pub/Sub > Pub/Sub Publisher role.

  4. Click Save.
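Behind the console dialog, the steps above add a binding to the topic's IAM policy. The sketch below shows, under the assumption that the policy is represented as a plain dictionary in the usual bindings/role/members shape, how such a binding could be merged in. The helper function is hypothetical and does not call any Google Cloud API.

```python
import copy

IOT_SERVICE_ACCOUNT = "serviceAccount:cloud-iot@system.gserviceaccount.com"
PUBLISHER_ROLE = "roles/pubsub.publisher"


def add_binding(policy, role, member):
    """Return a copy of the policy in which the member holds the role."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already present: add the member only if it is missing.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


policy = add_binding({}, PUBLISHER_ROLE, IOT_SERVICE_ACCOUNT)
print(policy)
```

Applying the same binding twice leaves the policy unchanged, which mirrors how granting an existing role in the console has no further effect.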

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully added IAM binding policy to Pub/Sub topic, you will see an assessment score.

Add IAM binding policy to Pub/Sub topic

Create a BigQuery dataset

BigQuery is a serverless data warehouse. Tables in BigQuery are organized into datasets. In this lab, messages published into Pub/Sub will be aggregated and stored in BigQuery.

To create a new BigQuery dataset:

  1. In the Cloud Console, go to Navigation menu > BigQuery.

  2. Click Done.

  3. To create a dataset, click on the View actions icon next to your Project ID and then select Create dataset.

  4. Name the dataset iotlabdataset, leave all the other fields as they are, and click Create dataset.

  5. Click on your project name to see the newly created dataset under your project.

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully created a BigQuery dataset, you will see an assessment score.

Create a BigQuery dataset
  1. To create a table, click on the View actions icon next to the iotlabdataset dataset and select Create table.

  2. Ensure that the source field is set to Empty table.

  3. In the Destination section's Table name field, enter sensordata.

  4. In the Schema section, click the + Add field button and add the following fields:

    • timestamp, set the field's Type to TIMESTAMP.
    • device, set the field's Type to STRING.
    • temperature, set the field's Type to FLOAT.
  5. Leave the other defaults unmodified. Click Create Table.
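The schema you just entered can also be written down as data. The sketch below expresses the three fields in the JSON-style layout BigQuery uses for schemas and adds a small, hypothetical checker for candidate rows. The NULLABLE mode is an assumption based on leaving the defaults unmodified, and the sample row values are made up.

```python
SENSORDATA_SCHEMA = [
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {"name": "device", "type": "STRING", "mode": "NULLABLE"},
    {"name": "temperature", "type": "FLOAT", "mode": "NULLABLE"},
]


def row_problems(row):
    """List the ways a candidate row disagrees with the sensordata schema."""
    problems = []
    expected = {field["name"] for field in SENSORDATA_SCHEMA}
    for name in sorted(set(row) - expected):
        problems.append(f"unexpected field: {name}")
    if "device" in row and not isinstance(row["device"], str):
        problems.append("device must be a string")
    if "temperature" in row:
        value = row["temperature"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append("temperature must be a number")
    return problems


good_row = {"timestamp": 1656374400, "device": "temp-sensor-istanbul", "temperature": 21.5}
print(row_problems(good_row))
```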

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully created an empty table in BigQuery Dataset, you will see an assessment score.

Create an empty table in BigQuery Dataset

Create a Cloud Storage bucket

Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

For this lab Cloud Storage will provide working space for your Cloud Dataflow pipeline.

  1. In the Cloud Console, go to Navigation menu > Cloud Storage.

  2. Click +CREATE BUCKET.

  3. For Name, use your Project ID then add -bucket, then click Continue.

  4. For Location type, click Multi-region if it is not already selected.

  5. For Location, choose the selection closest to you.

  6. Click CREATE.

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully created a Cloud Storage bucket, you will see an assessment score.

Create a Cloud Storage Bucket

Set up a Cloud Dataflow Pipeline

Cloud Dataflow is a serverless way to carry out data analysis. In this lab, you will set up a streaming data pipeline that reads sensor data from Pub/Sub and writes it to BigQuery.

  1. In the Cloud Console, in the top search bar enter "Dataflow", then click on Dataflow.

  2. In the top menu bar, click + CREATE JOB FROM TEMPLATE.

  3. For Job name, enter iotlabflow.

  4. For Regional Endpoint, choose the region as us-central1.

  5. For Dataflow template, choose Pub/Sub Topic to BigQuery. When you choose this template, the form updates to reveal new fields below.

  6. For Input Pub/Sub topic, choose from the dropdown menu. The resulting string will look like this: projects/qwiklabs-gcp-d2e509fed105b3ed/topics/iotlab

  7. For BigQuery output table, enter a value of the form Project ID:dataset.table, where the dataset and table are iotlabdataset.sensordata. The resulting string will look like this: qwiklabs-gcp-d2e509fed105b3ed:iotlabdataset.sensordata

  8. For Temporary location, enter gs:// followed by your Cloud Storage bucket name and then /tmp/. The resulting string will look like this: gs://qwiklabs-gcp-d2e509fed105b3ed-bucket/tmp/

  9. Click SHOW OPTIONAL PARAMETERS.

  10. For Max workers, enter 2.

  11. Click RUN JOB.

A new streaming job is started. You can now see a visual representation of the data pipeline.
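The three strings entered in the form all derive from your Project ID. As a quick way to double-check them, the sketch below builds each one from the naming conventions used in this lab. The function name and dictionary keys are illustrative labels, not the template's own parameter names.

```python
def dataflow_form_values(project_id, topic="iotlab", dataset="iotlabdataset", table="sensordata"):
    """Build the values entered in the Dataflow job form for this lab."""
    return {
        # Full resource path of the Pub/Sub topic.
        "input_topic": f"projects/{project_id}/topics/{topic}",
        # BigQuery table in Project ID:dataset.table form.
        "output_table": f"{project_id}:{dataset}.{table}",
        # The bucket name is the Project ID followed by -bucket.
        "temp_location": f"gs://{project_id}-bucket/tmp/",
    }


values = dataflow_form_values("qwiklabs-gcp-d2e509fed105b3ed")
for label, value in values.items():
    print(f"{label}: {value}")
```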

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully set up a Cloud Dataflow Pipeline, you will see an assessment score.

Set up a Cloud Dataflow Pipeline (region: us-central1)

Prepare your Compute Engine VM

In your project, a pre-provisioned VM instance named iot-device-simulator will let you run instances of a Python script that emulate an MQTT-connected IoT device. Before you emulate the devices, you will also use this VM instance to populate your Cloud IoT Core device registry.

To connect to the iot-device-simulator VM instance:

  1. In the Cloud Console, go to Navigation menu > Compute Engine > VM Instances. You'll see your VM instance listed as iot-device-simulator.

  2. Click the SSH drop-down arrow and select Open in browser window.

  3. In your SSH session, enter the following commands to create a virtual environment:

sudo pip3 install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
  4. Initialize the gcloud SDK:

gcloud auth login --no-launch-browser

If you get the error message "Command not found," you might have forgotten to exit your previous SSH session and start a new one.

When you are asked whether to authenticate with an @developer.gserviceaccount.com account or to log in with a new account, choose log in with a new account.

When you are asked "Are you sure you want to authenticate with your personal account? Do you want to continue (Y/n)?" enter Y.

  5. Click on the URL shown to open a new browser window that displays a verification code.

  6. Copy the verification code and paste it in response to the "Enter verification code:" prompt, then press Enter.

  7. Enter this command to update the system's information about Debian Linux package repositories:

sudo apt-get update
  8. Enter this command to make sure that various required software packages are installed:

sudo apt-get install python-pip openssl git -y
  9. Use pip to add needed Python components:

pip install pyjwt paho-mqtt cryptography
  10. Enter this command to add data to analyze during this lab:

git clone http://github.com/GoogleCloudPlatform/training-data-analyst

Create a registry for IoT devices

To register devices, you must create a registry for the devices. The registry is a point of control for devices.

To create the registry:

  1. In your SSH session on the iot-device-simulator VM instance, run the following, adding your Project ID as the value for PROJECT_ID:

export PROJECT_ID=

Your completed command will look like this: export PROJECT_ID=qwiklabs-gcp-d2e509fed105b3ed

  2. You must choose a region for your IoT registry. At this time, these regions are supported:

  • us-central1
  • europe-west1
  • asia-east1

Choose the region that is closest to you. To set an environment variable containing your preferred region, enter this command followed by the region name:

export MY_REGION=

Your completed command will look like this: export MY_REGION=us-central1.

  3. Enter this command to create the device registry:

gcloud iot registries create iotlab-registry \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --event-notification-config=topic=projects/$PROJECT_ID/topics/iotlab
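To see how the two environment variables feed into the command, the sketch below assembles the same gcloud invocation as an argument list and rejects regions outside the supported set listed above. The function name is hypothetical and "my-project" is a placeholder; the code only builds and prints the command, it does not run it.

```python
import shlex

SUPPORTED_REGIONS = ("us-central1", "europe-west1", "asia-east1")


def registry_create_command(project_id, region, registry_id="iotlab-registry", topic="iotlab"):
    """Build the argument list for the registry creation command."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region}")
    return [
        "gcloud", "iot", "registries", "create", registry_id,
        f"--project={project_id}",
        f"--region={region}",
        # Telemetry events from devices are forwarded to this Pub/Sub topic.
        f"--event-notification-config=topic=projects/{project_id}/topics/{topic}",
    ]


command = registry_create_command("my-project", "us-central1")
print(shlex.join(command))
```

A list built this way could be passed to subprocess.run on a machine where gcloud is installed and authenticated.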

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully created a Registry for IoT Devices, you will see an assessment score.

Create a Registry for IoT Devices

Create a Cryptographic Keypair

To allow IoT devices to connect securely to Cloud IoT Core, you must create a cryptographic keypair.

In your SSH session on the iot-device-simulator VM instance, enter these commands to create the keypair in the appropriate directory:

cd $HOME/training-data-analyst/quests/iotlab/
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem \
  -nodes -out rsa_cert.pem -subj "/CN=unused"

This openssl command creates an RSA cryptographic keypair. It writes the private key to a file called rsa_private.pem and a self-signed certificate containing the public key to rsa_cert.pem.
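The private key is used by the device to sign a JSON Web Token (JWT), which Cloud IoT Core verifies against the public key registered for that device. The sketch below builds the token's claims and the unsigned header.payload portion using only the standard library. The function names and the 60-minute lifetime are assumptions for illustration, and the RS256 signature itself (for example via the PyJWT package installed earlier) is deliberately left out.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def _b64url(data: bytes) -> str:
    # JWT segments use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_claims(project_id, issued_at, lifetime=timedelta(minutes=60)):
    """Claims for a device token: issue time, expiry, and audience."""
    return {
        "iat": int(issued_at.timestamp()),
        "exp": int((issued_at + lifetime).timestamp()),
        "aud": project_id,  # the audience is the project ID
    }


def signing_input(claims):
    """Return the 'header.payload' string that would be signed with RS256."""
    header = {"alg": "RS256", "typ": "JWT"}
    parts = (header, claims)
    return ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in parts
    )


# "my-project" is a placeholder project ID.
claims = jwt_claims("my-project", datetime(2022, 6, 28, tzinfo=timezone.utc))
unsigned_token = signing_input(claims)
print(unsigned_token)
```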

Add simulated devices to the registry

For a device to be able to connect to Cloud IoT Core, it must first be added to the registry.

  1. In your SSH session on the iot-device-simulator VM instance, enter this command to create a device called temp-sensor-buenos-aires:

gcloud iot devices create temp-sensor-buenos-aires \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256
  2. Enter this command to create a device called temp-sensor-istanbul:

gcloud iot devices create temp-sensor-istanbul \
  --project=$PROJECT_ID \
  --region=$MY_REGION \
  --registry=iotlab-registry \
  --public-key path=rsa_cert.pem,type=rs256

Test Completed Task

Click Check my progress to verify your performed task. If you have successfully added Simulated Devices to the Registry, you will see an assessment score.

Add Simulated Devices to the Registry

Run simulated devices

  1. In your SSH session on the iot-device-simulator VM instance, enter these commands to download the CA root certificates from pki.goog to the appropriate directory:

cd $HOME/training-data-analyst/quests/iotlab/
curl -o roots.pem -s -m 10 --retry 0 "https://pki.goog/roots.pem"
  2. Enter this command to run the first simulated device:

python cloudiot_mqtt_example_json.py \
  --project_id=$PROJECT_ID \
  --cloud_region=$MY_REGION \
  --registry_id=iotlab-registry \
  --device_id=temp-sensor-buenos-aires \
  --private_key_file=rsa_private.pem \
  --message_type=event \
  --algorithm=RS256 > buenos-aires-log.txt 2>&1 &

It will continue to run in the background.

  3. Enter this command to run the second simulated device:

python cloudiot_mqtt_example_json.py \
  --project_id=$PROJECT_ID \
  --cloud_region=$MY_REGION \
  --registry_id=iotlab-registry \
  --device_id=temp-sensor-istanbul \
  --private_key_file=rsa_private.pem \
  --message_type=event \
  --algorithm=RS256

Telemetry data will flow from the simulated devices through Cloud IoT Core to your Cloud Pub/Sub topic. In turn, your Dataflow job will read messages from your Pub/Sub topic and write their contents to your BigQuery table.
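The following sketch traces one message through that flow in miniature: a JSON payload is decoded and mapped onto the three columns of the sensordata table. The payload shape shown here is an assumption inferred from the table schema, not a copy of what the simulator script sends, and the function name and sample values are made up.

```python
import json


def payload_to_row(payload: bytes) -> dict:
    """Decode a JSON telemetry payload into a row for the sensordata table."""
    message = json.loads(payload.decode("utf-8"))
    return {
        "timestamp": message["timestamp"],
        "device": message["device"],
        "temperature": float(message["temperature"]),
    }


# Made-up payload in the assumed shape.
sample_payload = json.dumps(
    {"timestamp": 1656374400, "device": "temp-sensor-buenos-aires", "temperature": 18}
).encode("utf-8")

row = payload_to_row(sample_payload)
print(row)
```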

Analyze the Sensor Data Using BigQuery

To analyze the data as it is streaming:

  1. In the Cloud Console, open the Navigation menu and select BigQuery.

  2. Enter the following query in the Query editor and click RUN:

SELECT timestamp, device, temperature
FROM iotlabdataset.sensordata
ORDER BY timestamp DESC
LIMIT 100
  3. Confirm that the query returns rows of sensor data.

  4. Browse the Results. What is the temperature trend at each of the locations?
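One simple way to answer the trend question is to compare each device's earliest and latest reading. The sketch below does this for a handful of made-up rows in the same shape as the query results; the function name is hypothetical, and integer timestamps are used only to keep the example short.

```python
from collections import defaultdict


def temperature_trend(rows):
    """Return latest minus earliest temperature for each device."""
    by_device = defaultdict(list)
    # Sort by time so the first and last readings per device are meaningful.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        by_device[row["device"]].append(row["temperature"])
    return {
        device: round(temps[-1] - temps[0], 2)
        for device, temps in by_device.items()
    }


# Made-up sample rows, not real lab output.
sample_rows = [
    {"timestamp": 3, "device": "temp-sensor-istanbul", "temperature": 24.0},
    {"timestamp": 1, "device": "temp-sensor-buenos-aires", "temperature": 20.0},
    {"timestamp": 2, "device": "temp-sensor-istanbul", "temperature": 25.0},
    {"timestamp": 4, "device": "temp-sensor-buenos-aires", "temperature": 21.5},
]

print(temperature_trend(sample_rows))
```

A positive value means the device is warming over the sampled period, and a negative value means it is cooling.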

Congratulations!

In this lab, you used Cloud IoT Core and other services on Google Cloud to collect, process, and analyze IoT data.

Finish your Quest

This self-paced lab is part of the IoT in the Google Cloud and Data Engineering quests. A Quest is a series of related labs that form a learning path. Completing a quest earns you a badge to recognize your achievement. You can make your badges public and link to them in your online resume or social media account. Enroll in a quest and get immediate completion credit if you've taken this lab. See other available quests.

Take Your Next Lab

Continue your quest with ETL Processing on Google Cloud Using Dataflow and BigQuery.

Manual Last Updated June 28, 2022
Lab Last Tested June 28, 2022

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.