Using OpenTSDB to Monitor Time-Series Data on Cloud Platform

1 hour 15 minutes 5 Credits

GSP142

Overview

In this lab you will learn how to collect, record, and monitor time-series data on Google Cloud using OpenTSDB running on Google Kubernetes Engine and Cloud Bigtable.

Time-series data is a highly valuable asset that you can use for several applications, including trending, monitoring, and machine learning. You can generate time-series data from server infrastructure, application code, and other sources. OpenTSDB can collect and retain large amounts of time-series data with a high degree of granularity.

In this hands-on lab you will create a scalable data collection layer using Kubernetes Engine and work with the collected data using Bigtable. The following diagram illustrates the high-level architecture of the solution:

[Diagram: high-level architecture of the solution]

Objectives

  • Create a new Bigtable instance.

  • Create a new Kubernetes Engine cluster.

  • Deploy OpenTSDB to your Kubernetes Engine cluster.

  • Send time-series metrics to OpenTSDB.

  • Visualize metrics using OpenTSDB and Grafana.

Setup and Requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could result in extra charges to your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab, to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

  1. Click the Activate Cloud Shell icon at the top of the Google Cloud console.

  2. Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. The output contains a line that declares the PROJECT_ID for this session:

Your Cloud Platform project in this session is set to YOUR_PROJECT_ID

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  1. (Optional) You can list the active account name with this command:

gcloud auth list

Output:

ACTIVE: *
ACCOUNT: student-01-xxxxxxxxxxxx@qwiklabs.net
To set the active account, run:
    $ gcloud config set account `ACCOUNT`

  2. (Optional) You can list the project ID with this command:

gcloud config list project

Output:

[core]
project = <project_ID>

Example output:

[core]
project = qwiklabs-gcp-44776a13dea667a6

Note: For full documentation of gcloud, refer to the gcloud CLI overview guide.

Preparing your environment

Enter the following commands in Cloud Shell to prepare your environment.

Set the default Compute Engine zone to us-central1-f to create your Bigtable cluster:

gcloud config set compute/zone us-central1-f

Clone the git repository containing the sample code:

git clone https://github.com/GoogleCloudPlatform/opentsdb-bigtable.git

Change to the sample code directory:

cd opentsdb-bigtable

Creating a Bigtable instance

You will be using Cloud Bigtable to store the time-series data that you collect. You must create a Bigtable instance to do that work.

Bigtable is a key/wide-column store that works especially well for time-series data, as explained in Bigtable Schema Design for Time Series Data. Bigtable supports the HBase API, which makes it easy for you to use software designed to work with Apache HBase, such as OpenTSDB. You can learn about the HBase schema used by OpenTSDB in the OpenTSDB documentation.

A key component of OpenTSDB is the AsyncHBase client, which enables it to bulk-write to HBase in a fully asynchronous, non-blocking, thread-safe manner. When you use OpenTSDB with Bigtable, AsyncHBase is implemented as the AsyncBigtable client.

The ability to easily scale to meet your needs is a key feature of Bigtable. This lab uses a single-node development cluster because it is sufficient for the task and is economical. You should start your projects in a development cluster, moving to a larger production cluster when you are ready to work with production data. The Bigtable documentation includes detailed discussion about performance and scaling to help you pick a cluster size for your own work.

Now you will create your Bigtable instance.

  1. In Cloud Shell, set environment variables for the Bigtable instance identifier and for the Google Cloud zone where you will create your Bigtable cluster and GKE cluster:

export BIGTABLE_INSTANCE_ID=bt-opentsdb
export ZONE=us-central1-f

  2. Create the Bigtable instance:

gcloud bigtable instances create ${BIGTABLE_INSTANCE_ID} \
    --cluster-config=id=${BIGTABLE_INSTANCE_ID}-${ZONE},zone=${ZONE},nodes=1 \
    --display-name=OpenTSDB

Click Check my progress to verify the objective. Create Bigtable instance

Creating a Kubernetes Engine cluster

Kubernetes Engine provides a managed Kubernetes environment. After you create a Kubernetes Engine cluster, you can deploy Kubernetes pods to it. This Qwiklab uses Kubernetes Engine and Kubernetes pods to run OpenTSDB.

OpenTSDB separates its storage from its application layer, which enables it to be deployed across multiple instances simultaneously. By running in parallel, it can handle a large amount of time-series data. Packaging OpenTSDB into a Docker container enables easy deployment at scale using Kubernetes Engine.

In Cloud Shell create a Kubernetes cluster by running the following command:

gcloud container clusters create opentsdb-cluster \
    --cluster-version=1.20 \
    --zone=us-central1-f \
    --machine-type n1-standard-4 \
    --scopes "https://www.googleapis.com/auth/cloud-platform"

Adding the cloud-platform scope to your Kubernetes cluster allows your OpenTSDB container to interact with Bigtable. You can pull images from Google Container Registry without adding a scope for Cloud Storage, because the cluster can read from Cloud Storage by default. You might need additional scopes in other deployments.

Click Check my progress to verify the objective. Create Kubernetes Engine cluster

Create the images used to deploy and test OpenTSDB

To deploy and demonstrate OpenTSDB with a Bigtable storage backend, this guide uses a series of Docker container images that are deployed to GKE. You build several of these images using code from an accompanying GitHub repository with Cloud Build. When deploying infrastructure to GKE, a container repository is used. In this guide, you use Artifact Registry to manage these container images.

  1. In Cloud Shell, set the environment variables for your project, the Google Cloud region where you will create your Artifact Registry repository, and the Bigtable instance and zone used earlier:

export PROJECT_ID=$(gcloud config get project)
export REGION=us-central1
export AR_REPO=opentsdb-bt-repo
export BIGTABLE_INSTANCE_ID=bt-opentsdb
export ZONE=us-central1-f

  2. Create an Artifact Registry repository:

gcloud artifacts repositories create ${AR_REPO} \
    --repository-format=docker \
    --location=${REGION} \
    --description="OpenTSDB on bigtable container images"

Create and manage the images used to deploy and demonstrate OpenTSDB

Two Docker container images are used in this lab. The first image is used for two purposes: to perform the one-time Bigtable database setup for OpenTSDB, and to deploy the read and write service containers for the OpenTSDB deployment. The second image is used to generate sample metric data to demonstrate your OpenTSDB deployment.

When you submit the container image build job to Cloud Build, you tag the images so that they are stored in the Artifact Registry after they are built.

  1. Set the environment variables for the OpenTSDB server image that uses Bigtable as the storage backend:

export SERVER_IMAGE_NAME=opentsdb-server-bigtable
export SERVER_IMAGE_TAG=2.4.1

  2. Build the image using Cloud Build:

gcloud builds submit \
    --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO}/${SERVER_IMAGE_NAME}:${SERVER_IMAGE_TAG} \
    build

Because you tagged the image appropriately, when the build is complete, the image will be managed by your Artifact Registry repository.

  3. Set the environment variables for the demonstration time series data generation image:

export GEN_IMAGE_NAME=opentsdb-timeseries-generate
export GEN_IMAGE_TAG=0.1

  4. Build the image using Cloud Build:

cd generate-ts
./build-cloud.sh
cd ..
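
The image tag passed to Cloud Build follows the Artifact Registry naming pattern REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG. As a rough illustration, the Python sketch below composes that path from the same values as the environment variables set in this section. The helper name image_uri and the project ID my-project are placeholders for illustration and are not part of the lab repository.

```python
def image_uri(region: str, project_id: str, repo: str, name: str, tag: str) -> str:
    """Compose an Artifact Registry Docker image path (hypothetical helper)."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{name}:{tag}"


# Values mirror the lab's environment variables; "my-project" is a placeholder.
server_image = image_uri(
    "us-central1", "my-project", "opentsdb-bt-repo",
    "opentsdb-server-bigtable", "2.4.1",
)
print(server_image)
```

The region prefix and the repository name must match the repository you created, otherwise the built image cannot be stored there.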

Create a ConfigMap with configuration details

Kubernetes uses the ConfigMap to decouple configuration details from the container image in order to make applications more portable. The configuration for OpenTSDB is specified in the opentsdb.conf file. A ConfigMap containing the opentsdb.conf file is included with the sample code.

In this and the following steps, you use the GNU envsubst utility to replace environment variable placeholders in the YAML template files with the respective values for your deployment.

  • Create a ConfigMap from the updated opentsdb-config.yaml file:

envsubst < configmaps/opentsdb-config.yaml.tpl | kubectl create -f -
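
To see what the envsubst step does, the following Python sketch performs the same kind of substitution on a small template string. The template text is a made-up fragment, not the contents of opentsdb-config.yaml.tpl, and string.Template only approximates envsubst for the ${NAME} and $NAME placeholder forms.

```python
from string import Template

# Hypothetical template fragment; the real .tpl files live in the lab repository.
template_text = (
    "project: ${PROJECT_ID}\n"
    "instance: ${BIGTABLE_INSTANCE_ID}\n"
    "zone: ${ZONE}\n"
)

# Stand-in for the shell environment; "my-project" is a placeholder value.
env = {
    "PROJECT_ID": "my-project",
    "BIGTABLE_INSTANCE_ID": "bt-opentsdb",
    "ZONE": "us-central1-f",
}

# safe_substitute leaves unknown placeholders untouched instead of raising.
rendered = Template(template_text).safe_substitute(env)
print(rendered)
```

One difference worth keeping in mind: envsubst replaces unset variables with an empty string rather than leaving the placeholder in place, so make sure every variable is exported before running the lab commands.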

Click Check my progress to verify the objective. Create ConfigMap

Create OpenTSDB tables in Bigtable

Before you can read or write data using OpenTSDB, you need to create the necessary tables in Bigtable to store that data. Follow these steps to create a Kubernetes job that creates the tables.

  1. In Cloud Shell, launch the job:

envsubst < jobs/opentsdb-init.yaml.tpl | kubectl create -f -

The job can take a minute or more to complete. Verify that the job has completed successfully:

kubectl describe jobs

The output should indicate 1 Succeeded under the Pods Statuses heading. Do not proceed until you see this status.

Click Check my progress to verify the objective. Create OpenTSDB tables in Bigtable

  2. Examine the table creation job logs:

OPENTSDB_INIT_POD=$(kubectl get pods --selector=job-name=opentsdb-init \
    --output=jsonpath={.items..metadata.name})
kubectl logs $OPENTSDB_INIT_POD

The output is similar to the following:

create 'tsdb-uid',
  {NAME => 'id', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'},
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 3.2730 seconds

create 'tsdb',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.8440 seconds

create 'tsdb-tree',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.5420 seconds

create 'tsdb-meta',
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.9910 seconds

The output lists each table that was created. This job runs several table creation commands, each using the format of create TABLE_NAME. The tables are successfully created when you have output in the form of 0 row(s) in TIME seconds.

  • TABLE_NAME: the name of the table that the job creates
  • TIME: the amount of time it took to create the table
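
If you want to check the log programmatically instead of reading it, a pattern match on the two line formats described above is enough. The Python sketch below uses an abbreviated, hand-typed copy of the sample output rather than real job logs, and the variable names are illustrative only.

```python
import re

# Abbreviated sample in the shape of the job log shown above.
log = """\
create 'tsdb-uid', {NAME => 'id'}, {NAME => 'name'}
0 row(s) in 3.2730 seconds
create 'tsdb', {NAME => 't', VERSIONS => 1}
0 row(s) in 1.8440 seconds
create 'tsdb-tree', {NAME => 't', VERSIONS => 1}
0 row(s) in 1.5420 seconds
create 'tsdb-meta', {NAME => 'name'}
0 row(s) in 1.9910 seconds
"""

# Table names follow the word "create" in single quotes.
created = re.findall(r"create '([^']+)'", log)
# Each successful creation is followed by a "0 row(s) in TIME seconds" line.
confirmations = re.findall(r"0 row\(s\) in ([\d.]+) seconds", log)

print(created)
print(len(created) == len(confirmations))
```

A mismatch between the number of create commands and the number of confirmation lines would suggest that at least one table was not created.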

Data Model

The tables you just created will store data points from OpenTSDB. In a later step, you will configure a test service to write time-series data into these tables. Time-series data points are organized and stored as follows:

Field       Required                        Description                                        Example
metric      Required                        Item that is being measured - the default key      sys.cpu.user
timestamp   Required                        Epoch time of the measurement                      1497561091
value       Required                        Measurement value                                  89.3
tags        At least one tag is required    Qualifies the measurement for querying purposes    hostname=www cpu=0 env=prod

The metric, timestamp, and tags (tag key and tag value) form the row key. The timestamp is normalized to one hour, to ensure that a row does not contain too many data points. For more information, see HBase Schema.
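
The hourly normalization can be illustrated with a little arithmetic. The Python sketch below is a simplification: OpenTSDB's actual row keys are binary and use numeric UIDs for metric and tag names (see the HBase schema documentation), whereas this version builds a readable string. The function names and the string layout are assumptions made for illustration.

```python
from typing import Dict, Tuple

HOUR = 3600  # each row covers one hour of data points


def split_timestamp(ts: int) -> Tuple[int, int]:
    """Split an epoch timestamp into (hour-aligned base, offset within the hour)."""
    base = ts - (ts % HOUR)
    return base, ts - base


def readable_row_key(metric: str, ts: int, tags: Dict[str, str]) -> str:
    """Human-readable stand-in for a row key (not the real binary format)."""
    base, _ = split_timestamp(ts)
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}:{base}:{tag_part}"


tags = {"hostname": "www", "cpu": "0", "env": "prod"}
print(split_timestamp(1497561091))                     # (1497560400, 691)
print(readable_row_key("sys.cpu.user", 1497561091, tags))
```

Two measurements of the same series taken within the same hour map to the same key and differ only in their offset, which is how a single row can hold many data points without growing unbounded.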

Deploying OpenTSDB

The rest of this Qwiklab provides instructions for making the sample scenario work. The following diagram shows the architecture you will use:

[Diagram: architecture of the OpenTSDB deployment]

This Qwiklab uses two OpenTSDB Kubernetes deployments: one deployment sends metrics to Bigtable and the other deployment reads from it. Using two deployments prevents long-running reads and writes from blocking each other. The Pods in each deployment use the same container image. OpenTSDB provides a daemon called tsd that runs in each container.

A single tsd process can handle a high throughput of events per second. To distribute load, each deployment in this guide creates three replicas of the read and write Pods.

  1. In Cloud Shell, create a deployment for writing metrics:

envsubst < deployments/opentsdb-write.yaml.tpl | kubectl create -f -

The configuration information for the write deployment is in the opentsdb-write.yaml.tpl file in the deployments folder of the guide repository.

  2. Create a deployment for reading metrics:

envsubst < deployments/opentsdb-read.yaml.tpl | kubectl create -f -

The configuration information for the reader deployment is in the opentsdb-read.yaml.tpl file in the deployments folder of the guide repository.

  3. Check that the deployments for reading and writing metrics are running:

kubectl get pods

Repeat the last command until you see that the opentsdb-read and opentsdb-write pods all have a status of Running.

NAME                              READY   STATUS    RESTARTS
opentsdb-read-6c464c8f99-rjg24    1/1     Running   0
opentsdb-read-6c464c8f99-s7hfq    1/1     Running   0
opentsdb-read-6c464c8f99-tslgh    1/1     Running   0
opentsdb-write-7b488bc569-bpx4d   1/1     Running   0
opentsdb-write-7b488bc569-ffln2   1/1     Running   0
opentsdb-write-7b488bc569-qhrls   1/1     Running   0
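
Rather than re-running the command and checking by eye, the same check can be scripted against the JSON form of the pod list (kubectl get pods -o json). This is a minimal Python sketch, assuming pod names begin with opentsdb-read and opentsdb-write as in the listing above. The sample data is hand-written, and the pod phase is only an approximation of the STATUS column.

```python
def all_running(pod_list, prefixes=("opentsdb-read", "opentsdb-write")):
    """Return True if every matching pod reports the Running phase."""
    pods = [p for p in pod_list.get("items", [])
            if p["metadata"]["name"].startswith(prefixes)]
    # An empty match means the deployments have not created pods yet.
    return bool(pods) and all(p["status"]["phase"] == "Running" for p in pods)


# Hand-written sample in the shape of `kubectl get pods -o json`.
sample = {"items": [
    {"metadata": {"name": "opentsdb-read-6c464c8f99-rjg24"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "opentsdb-write-7b488bc569-bpx4d"},
     "status": {"phase": "Pending"}},
]}
print(all_running(sample))  # False: one pod is still Pending
```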

In a production deployment, you can increase the number of tsd Pods that are running, either manually or by using autoscaling in Kubernetes. Similarly, you can increase the number of instances in your GKE cluster manually or by using cluster autoscaler.

Click Check my progress to verify the objective. Deploy OpenTSDB

Create the OpenTSDB services

In order to provide consistent network connectivity to the deployments, you will create two Kubernetes services. One service writes metrics into OpenTSDB and the other reads.

  1. In Cloud Shell, create the service for writing metrics:

kubectl create -f services/opentsdb-write.yaml

The configuration information for the metrics writing service is contained in opentsdb-write.yaml in the services folder of the example repository. This service is created inside your Kubernetes cluster and is reachable by other services running in your cluster. In the next section of this lab you write metrics to this service.

  2. Create the service for reading metrics:

kubectl create -f services/opentsdb-read.yaml

The configuration information for the metrics reading service is contained in opentsdb-read.yaml in the services folder of the example repository. This service is created inside your Kubernetes cluster and is accessible to other services running in your cluster. Later in this lab, Grafana queries metrics through this service.

  3. Check that the opentsdb-write and opentsdb-read services are running:

kubectl get services

You should see the opentsdb-write and opentsdb-read services listed.

NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes       ClusterIP   10.3.240.1     <none>        443/TCP    33m
opentsdb-read    ClusterIP   10.3.254.251   <none>        4242/TCP   8s
opentsdb-write   ClusterIP   10.3.240.32    <none>        4242/TCP   39s

Click Check my progress to verify the objective. Create OpenTSDB services

Writing time-series data to OpenTSDB

There are several mechanisms to write data into OpenTSDB. After you define service endpoints, you can direct processes to begin writing data to them. This guide deploys a Python service that emits demonstrative time-series data for two metrics: Cluster Memory Utilization (memory_usage_gauge) and Cluster CPU Utilization (cpu_node_utilization_gauge).

  • In Cloud Shell, deploy the time series metric generator to your cluster:

envsubst < deployments/generate.yaml.tpl | kubectl create -f -
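
For reference, a data point with the fields from the Data Model section can be serialized in two common ways: as a line for OpenTSDB's telnet-style put command, or as a JSON body for its HTTP /api/put endpoint. The Python sketch below only builds the payloads and sends nothing. The helper names are hypothetical, and this is not the code of the lab's generator service.

```python
import json


def to_put_line(metric, timestamp, value, tags):
    """Format a data point as a telnet-style OpenTSDB put line."""
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"put {metric} {timestamp} {value} {tag_str}"


def to_json_body(metric, timestamp, value, tags):
    """Format a data point as a JSON body for the HTTP /api/put endpoint."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}
    )


# Example values taken from the Data Model table.
point = ("sys.cpu.user", 1497561091, 89.3,
         {"hostname": "www", "cpu": "0", "env": "prod"})
print(to_put_line(*point))
print(to_json_body(*point))
```

Either form would be sent to the opentsdb-write service on port 4242. At least one tag must be present, as noted in the Data Model section.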

Examine the example time-series data with OpenTSDB

You can query time-series metrics by using the opentsdb-read service endpoint that you deployed earlier. You can use the data in a variety of ways. One common option is to visualize it. OpenTSDB includes a basic interface to visualize metrics that it collects. This lab uses Grafana, a popular alternative for visualizing metrics that provides additional functionality.
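
As an illustration of what a query against the read endpoint looks like outside of a visualization tool, the Python sketch below builds a URL for OpenTSDB's HTTP /api/query endpoint. It assumes the in-cluster service name opentsdb-read and port 4242 from the service listing earlier, the metric name memory_usage_gauge emitted by the generator, and a sum aggregator over the last hour. It only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_query_url(host, metric, start="1h-ago", aggregator="sum", port=4242):
    """Build an OpenTSDB /api/query URL (illustrative helper, sends nothing)."""
    params = urlencode({"start": start, "m": f"{aggregator}:{metric}"})
    return f"http://{host}:{port}/api/query?{params}"


url = build_query_url("opentsdb-read", "memory_usage_gauge")
print(url)
```

The service name resolves only from inside the cluster, so a request to this URL would have to come from a pod or go through port forwarding.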

Running Grafana in your cluster requires a process similar to the one you used to set up OpenTSDB. In addition to creating a ConfigMap and a deployment, you need to configure port forwarding so that you can access Grafana while it is running in your Kubernetes cluster.

  1. In Cloud Shell, create the Grafana ConfigMap using the configuration information in the grafana.yaml file in the configmaps folder of the guide repository:

kubectl create -f configmaps/grafana.yaml
  2. Check that the Grafana configmap has been created:

kubectl get configmaps

You should now see grafana-config in the list of configmaps:

NAME              DATA   AGE
grafana-config    3      15s
opentsdb-config   1      18m

  3. Create the Grafana deployment using the configuration information in grafana.yaml in the deployments folder of the example repository:

kubectl create -f deployments/grafana.yaml
  4. Check that the Grafana deployment is available:

kubectl get deployments

Repeat the last command until you see the AVAILABLE value for the grafana deployment report as 1.

NAME                READY   UP-TO-DATE   AVAILABLE   AGE
grafana             1/1     1            1           12s
heapster-opentsdb   1/1     1            1           114s
opentsdb-read       3/3     3            3           2m45s
opentsdb-write      3/3     3            3           12m

Click Check my progress to verify the objective. Examining time-series data with OpenTSDB

  5. Get the name of the Grafana pod in the cluster and use it to set up port forwarding:

GRAFANA_PODS=$(kubectl get pods --selector=app=grafana \
    --output=jsonpath={.items..metadata.name})
kubectl port-forward $GRAFANA_PODS 8080:3000

Verify that forwarding was successful. The output is similar to the following:

Forwarding from 127.0.0.1:8080 -> 3000
  6. To connect to the Grafana web interface, in Cloud Shell, click Web Preview and then select Preview on port 8080.

A new browser tab opens and connects to the Grafana web interface. After a few moments, the browser displays graphs like this:

[Screenshot: Grafana dashboard with graphs of the cluster metrics]

This deployment of Grafana has been customized for this lab. The files configmaps/grafana.yaml and deployments/grafana.yaml configure Grafana to:

  • connect to the opentsdb-read service
  • allow anonymous authentication
  • display some basic cluster metrics

A deployment of Grafana in a production environment would implement the proper authentication mechanisms and use richer time-series graphs.

Congratulations!

You have now successfully completed the Using OpenTSDB to Monitor Time-Series Data on Cloud Platform lab.

Finish Your Quest

This self-paced lab is part of the Qwiklabs Quest Google Cloud Solutions II: Data and Machine Learning. A Quest is a series of related labs that form a learning path. Completing this Quest earns you a badge to recognize your achievement. You can make your badge (or badges) public and link to them in your online resume or social media account. Enroll in this Quest and get immediate completion credit if you've taken this lab.

Next Steps / Learn More

  • To learn how to improve the performance of your uses of OpenTSDB, consult Bigtable Schema Design for Time Series Data.
  • The video Bigtable in Action, from Google Cloud Next '17, describes field promotion and other performance considerations.
  • The documentation on cluster scopes for Kubernetes Engine Clusters describes default scopes, such as Cloud Storage, and scopes you can add for other Google services.

Manual Last Updated April 22, 2022
Lab Last Tested April 22, 2022

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.