Creating Custom Interactive Dashboards with Bokeh and BigQuery

Join Sign in

Creating Custom Interactive Dashboards with Bokeh and BigQuery

1 hour 10 minutes 9 Credits


Google Cloud Self-Paced Labs

In this lab you will learn how to build a custom interactive dashboard application on Google Cloud by using the Bokeh library to visualize data from publicly available BigQuery datasets. You will also learn how to deploy this application with both security and scalability in mind.

Powerful software-as-a-service (SaaS) solutions like Data Studio are extremely useful when building interactive data visualization dashboards. However, these types of solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. While these libraries offer a lot of flexibility for building dashboards with tailored features and visualizations, a major challenge remains: deploying these dashboards to the internet to enable easy stakeholder access.



Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
  • Time to complete the lab.

Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.

Note: If you are using a Chrome OS device, open an Incognito window to run this lab.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.

    Open Google Console

  2. Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Sign in

    Tip: Open the tabs in separate windows, side-by-side.

  3. In the Sign in page, paste the username that you copied from the left panel. Then copy and paste the password.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Training credentials. If you have your own Google Cloud account, do not use it for this lab (avoids incurring charges).

  4. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.

Cloud Shell icon

Click Continue.


It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:

Cloud Shell Terminal

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

You can list the active account name with this command:

gcloud auth list


Credentialed accounts: - <myaccount>@<mydomain>.com (active)

(Example output)

Credentialed accounts: -

You can list the project ID with this command:

gcloud config list project


[core] project = <project_ID>

(Example output)

[core] project = qwiklabs-gcp-44776a13dea667a6

Understanding the demo application

The demo application displays a dashboard that processes and visualizes publicly available data for all fifty U.S. states. The dashboard provides a simple level of interaction: the user can select any of the fifty states from a dropdown widget to drill down and display information about that state. The dashboard contains four different modules, and each module displays different types of information:

A screenshot of the running demo follows:


Overall architecture

At a high-level, the lab's architecture consists of six main elements. The following diagram shows how these components interact:


These elements include:

  • Client web browsers on the desktop machines and mobile devices of the dashboard users.
  • An authentication proxy managed by Cloud Identity-Aware Proxy, responsible for controlling access to the application.
  • An HTTPS Load Balancer responsible both for effectively distributing incoming web traffic to the appropriate backends and for handling SSL decryption/encryption of data coming in and out of the system.
  • The application backend responsible for serving the dashboard's dynamic content (HTML and plot data).
  • The static assets backend serving the static JavaScript and CSS files needed for client web browser rendering.

Application backend

The application backend receives incoming requests from the load balancer and then fetches and transforms the appropriate data from BigQuery. The backend returns the requested dynamic content, in the form of HTML over HTTP, and returns plotting data through WebSockets.

The following diagram shows the architecture: arch.png

The application backend contains two main subcomponents:

  • The Bokeh service. The application's centerpiece, this service contains pods running the Bokeh servers that query data from BigQuery and generate the dynamic content consumed by the client web browsers.
  • The Memcached service. Contains pods running Memcached servers that are responsible for caching frequently requested data.

These subcomponents exist in Docker containers and are deployed as Kubernetes pods to Kubernetes Engine to enable horizontal scalability.

Deploying the demo dashboard application

Initializing the environment

Download the source code by running the following command in Cloud Shell:

gsutil cp gs://spls/gsp139/ . unzip

Set your default Compute Engine zone.

export ZONE=us-east1-c gcloud config set compute/zone ${ZONE}

To allow the demo application to access BigQuery, create a service account key by running the following commands:

export PROJECT_ID=$(gcloud info --format='value(config.project)') export SERVICE_ACCOUNT_NAME="dashboard-demo" cd bigquery-bokeh-dashboard gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME \ --display-name="Bokeh/BigQuery Dashboard Demo" gcloud projects add-iam-policy-binding $PROJECT_ID \ --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}" \ --role roles/bigquery.user gcloud iam service-accounts keys create --iam-account \ "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}" \ service-account-key.json

Click Check my progress to verify the objective. Create a service account, key and granting role to service account

The Bokeh service

For this lab, the Bokeh service contains two Bokeh pods. Each pod runs a separate Bokeh server instance, which is a wrapper around Tornado's non-blocking web server. This server can serve information synchronously over HTTP for HTML content and asynchronously through Websockets for the dashboard's plotting data.

Each Bokeh server has been configured to fork into four different work subprocesses, reflected by the --num-procs parameter in the container Dockerfile's CMD statement:

CMD bokeh serve --disable-index-redirect --num-procs=4 --port=5006 --address= --allow-websocket-origin=$DASHBOARD_DEMO_DOMAIN

View the content of Dockerfile with:

cat dashboard/Dockerfile

The Bokeh server automatically load balances incoming traffic between the subprocesses. This approach increases the performance and resilience of each pod. The number of subprocesses used in this tutorial is arbitrary. In a real-world scenario, you would adjust this number based on actual production traffic and the memory and CPU resources available in the cluster.

BigQuery is accessed directly from the Bokeh application by using the read_gbq method. For details, see the relevant documentation, available in the Pandas library's gbq extension. This method has a very simple interface. It accepts a query as a parameter and submits it over the network to BigQuery. Other required parameters include the project ID and the service account key, both of which can be passed through environment variables loaded from Kubernetes secrets. The read_gbq method then transparently handles the pagination of results returned by BigQuery, and consolidates them into a single Pandas DataFrame object, which Bokeh can directly process.

As an illustration, here is an example of a query submitted with read_gbq to collect the top 100 most populated ZIP Codes in California (state code 'CA'):

query = """ SELECT A.zipcode, Population, City, State_code FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS A JOIN `bigquery-public-data.utility_us.zipcode_area` AS B ON A.zipcode = B.zipcode WHERE gender = '' AND state_code = '%(state)s' ORDER BY population DESC LIMIT 100 """ import os import pandas dataframe = query % 'CA', project_id=os.environ['GOOGLE_PROJECT_ID'], private_key=os.environ['GOOGLE_APPLICATION_CREDENTIALS'], dialect='standard' )

Note that, because the resulting data needs to travel over the network, offloading as much pre-processing to BigQuery as possible is highly recommended. For example, do any necessary downsampling at the query level (by using the LIMIT clause, targeting specific fields in the SELECT clause, pre-segmenting tables, or pre-grouping results using the GROUP BY clause) in order to reduce the overall network traffic and speed up response times.

Recall that the demo dashboard displays four different plot modules. When the user selects a U.S. state, the application sends four separate queries (one for each module) to BigQuery, processing a combined total of 7 gigabytes of data. Each of these BigQuery queries takes an average of 1 to 3 seconds to run. During this processing lapse, the application remains idle, waiting for the results to return from BigQuery. To optimize the dashboard's overall response time, these four queries run in parallel on separate threads, as shown in the following code:

def fetch_data(state): """ Fetch data from BigQuery for the given US state by running the queries for all dashboard modules in parallel. """ t0 = time.time() # Collect fetch methods for all dashboard modules fetch_methods = { getattr(module, 'fetch_data') for module in modules} # Create a thread pool: one separate thread for each dashboard module with concurrent.futures.ThreadPoolExecutor(max_workers=len(fetch_methods)) as executor: # Prepare the thread tasks tasks = {} for key, fetch_method in fetch_methods.items(): task = executor.submit(fetch_method, state) tasks[task] = key # Run the tasks and collect results as they arrive results = {} for task in concurrent.futures.as_completed(tasks): key = tasks[task] results[key] = task.result() # Return results once all tasks have been completed t1 = time.time() timer.text = '(Execution time: %s seconds)' % round(t1 - t0, 4) return results

When all the results have been received from BigQuery, Bokeh's graph library plots the data. For example, the following code plots the evolution of air pollutant levels over time:

def make_plot(self, dataframe): self.source = ColumnDataSource(data=dataframe) palette = all_palettes['Set2'][6] hover_tool = HoverTool(tooltips=[ ("Value", "$y"), ("Year", "@year"), ]) self.plot = figure( plot_width=600, plot_height=300, tools=[hover_tool], toolbar_location=None) columns = { 'pm10': 'PM10 Mass (µg/m³)', 'pm25_frm': 'PM2.5 FRM (µg/m³)', 'pm25_nonfrm': 'PM2.5 non FRM (µg/m³)', 'lead': 'Lead (¹/₁₀₀ µg/m³)', } for i, (code, label) in enumerate(columns.items()): self.plot.line( x='year', y=code, source=self.source, line_width=3, line_alpha=0.6, line_color=palette[i], legend=label) self.title = Paragraph(text=TITLE) return column(self.title, self.plot)

Deploying the Bokeh service

To facilitate horizontal scalability for the application, all components of the application backend are deployed with Docker containers using Kubernetes. The steps for deploying these components to Kubernetes Engine follow.

First, create a Kubernetes Engine cluster, with an arbitrary number of 2 nodes, by running the following command in the Cloud Shell:

gcloud container clusters create dashboard-demo-cluster \ --tags dashboard-demo-node \ --num-nodes 2 \ --zone ${ZONE}

This will take a few minutes.

Click Check my progress to verify the objective. Deploying the Bokeh service

Now create Kubernetes secrets to securely store the project ID and the service account key that you created previously.

export PROJECT_ID=$(gcloud info --format='value(config.project)') kubectl create secret generic project-id \ --from-literal project-id=$PROJECT_ID kubectl create secret generic service-account-key \ --from-file service-account-key=service-account-key.json

Create a static IP address:

gcloud compute addresses create dashboard-demo-static-ip --global

Define a domain using, a free DNS service that automatically resolves domains containing an IP address back to that same IP address:

export STATIC_IP=$(gcloud compute addresses describe \ dashboard-demo-static-ip --global --format="value(address)") export DOMAIN="${STATIC_IP}" echo "My domain: ${DOMAIN}"

Create a Kubernetes secret with the domain:

kubectl create secret generic domain --from-literal domain=$DOMAIN

The Bokeh service is now ready to be deployed with this command:

kubectl create -f kubernetes/bokeh.yaml

The last step deploys an arbitrary number of pods, in this case two, each one running a separate instance of the demo application. It also exposes as environment variables the three Kubernetes secrets created earlier through the env.valueFrom.secretKeyRef entry, so that this information can be retrieved by the application to access BigQuery.

For details, see bokeh.yaml.

Check that the two Bokeh pods have started and are ready:

kubectl get pods

You should see two pods listed similar to the following:

NAME READY STATUS RESTARTS AGE bokeh-3952007524-9gc0l 1/1 Running 0 1m bokeh-3952007524-vzn8q 1/1 Running 0 1m

Repeat the last command until the pods show 1/1 READY.

Click Check my progress to verify the objective. Create static IP address and Bokeh service

The Memcached Service

When you don't expect the data displayed on a dashboard to change for a given period of time, it's important to consider performance and cost optimizations. For this purpose, and by default, BigQuery internally caches the results from duplicated queries for a period of twenty-four hours. This greatly speeds up response times and avoids unnecessary billing charges. You can find more information about this caching mechanism in the documentation.

This caching mechanism can be further improved by caching frequently used data locally inside the dashboard application's memory. This approach takes BigQuery out of the loop and avoids unnecessary network traffic to collect the cached data. This tutorial uses Memcached to handle this type of application-level caching. Memcached is extremely fast and reduces response times from a few seconds to a few milliseconds in the case of this tutorial.

One peculiarity of Memcached is that it does not have a built-in load-balancing mechanism. Instead, load-balancing between multiple Memcached pods must be done by the Memcached client, which is included in each Bokeh pod, as illustrated in the following diagram:


For the Memcached pods to be discoverable by the Memcached client, the Memcached service must be declared as headless, which is achieved by using the clusterIP: None entry in the Kubernetes configuration file (see memcached.yaml). The Memcached client can then retrieve IP addresses of the individual pods by querying kube-dns for the memcached.default.svc.cluster.local domain name.

Deploying the Memcached Service

Similar to the Bokeh service, the Memcached service is deployed by using a Docker container to Kubernetes Engine using Kubernetes.

Deploy the Memcached pods and headless service by running the following command:

kubectl create -f kubernetes/memcached.yaml

Check that the two Memcached pods have started and are ready:

kubectl get pods

You should see two more pods listed with the overall list now looking similar to the following:

NAME READY STATUS RESTARTS AGE bokeh-3952007524-9gc0l 1/1 Running 0 4m bokeh-3952007524-vzn8q 1/1 Running 0 4m memcached-3872116640-3bwcj 1/1 Running 0 1m memcached-3872116640-t25zb 1/1 Running 0 1m

Click Check my progress to verify the objective. Deploy the Memcached pods

Adding the load balancer

At this point, the dashboard application is running inside a private service. You need to introduce an HTTPS Load Balancer to expose the application to the Internet so users can access the dashboard. You will do this in three steps:

  1. Create an SSL certificate.

  2. Create a load balancer.

  3. Extend the connection timeout.

Creating an SSL Certificate

In real-world scenarios, data displayed on dashboards can be sensitive or private. You should use SSL to encrypt this data before it gets sent to the users over the Internet. The first step is to create an SSL certificate.

In a production setting, you would have to issue a certificate through an official Certificate Authority. For this lab, you can issue a self-signed certificate by running these commands:

export STATIC_IP=$(gcloud compute addresses describe \ dashboard-demo-static-ip --global --format="value(address)") export DOMAIN="${STATIC_IP}" mkdir /tmp/dashboard-demo-ssl cd /tmp/dashboard-demo-ssl openssl genrsa -out ssl.key 2048 openssl req -new -key ssl.key -out ssl.csr -subj "/CN=${DOMAIN}" openssl x509 -req -days 365 -in ssl.csr -signkey ssl.key -out ssl.crt cd -

Creating the HTTPS Load Balancer

The HTTPS load balancer comprises multiple resources:

  • A global forwarding rule.
  • A health check.
  • A backend service.
  • A URL map.
  • A target proxy.

To create all of those resources simultaneously included in the GitHub source repository:


Verify that the dashboard is live by visiting the URL returned by this command:

echo https://${STATIC_IP}

If your browser returns a "This site can't be reached" message, the HTTPS load balancer is still being deployed. Traffic coming in and out of the system uses HTTPS and Secure WebSockets, while traffic inside the system remains unencrypted using regular HTTP and WebSockets. This level of encryption is sufficient in most scenarios. In a production environment, if you require extra security, consider enforcing SSL encryption in the Google Cloud project's Virtual Private Cloud (VPC) network. Setting up SSL encryption in the VPC network is out of scope for this tutorial.

Extending the connection timeout

By default, the HTTPS load balancer closes all connections that are open for longer than 30 seconds (for details, refer to the documentation).

To allow the dashboard's WebSocket connections to stay open for a longer period of time, you need to increase the connection timeout. The following gcloud command sets the timeout to one day (86,400 seconds):

gcloud compute backend-services update dashboard-demo-service \ --global --timeout=86400

Click Check my progress to verify the objective. Adding the load balancer

Redirecting static content delivery

The Bokeh pods can directly serve all the static JavaScript and CSS files required to render the dashboard. However, letting those pods handle this task has a number of drawbacks:

  • It places an unnecessary load on the Bokeh pods, potentially reducing their capacity for serving other types of resources, such as the dynamic plotting data.
  • It forces clients to download assets from the same origin server, regardless of where the clients are geographically located.

To alleviate those drawbacks, create a separate dedicated backend to efficiently serve static assets.

This static assets backend uses two main components:

  • A bucket hosted in Cloud Storage, responsible for holding the source JavaScript and CSS assets.
  • Cloud CDN, responsible for globally distributing these assets closer to potential dashboard users, resulting in lower latency and faster page loads.

The following diagram shows the architecture:


Creating the static assets backend

Extract the static assets from the Bokeh container and store them locally in a temporary directory by running the following commands:

gcloud auth configure-docker --quiet export DOCKER_IMAGE="" export STATIC_PATH=$(docker run $DOCKER_IMAGE bokeh info --static) export TMP_PATH="/tmp/dashboard-demo-static" mkdir $TMP_PATH docker run $DOCKER_IMAGE tar Ccf $(dirname $STATIC_PATH) - \ static | tar Cxf $TMP_PATH -

Now create a bucket and upload all the assets to that bucket using the gsutil command included in the Cloud SDK:

export BUCKET="${PROJECT_ID}-dashboard-demo" # Create the bucket gsutil mb gs://$BUCKET

Upload the assets to the bucket

gsutil -m cp -z js,css -a public-read \ -r $TMP_PATH/static gs://$BUCKET

The -z option applies gzip encoding to files with the js and css filename extensions, which saves network bandwidth and space in Cloud Storage, reducing storage costs and speeding up page loads. The -a option sets the permissions to public-read so that the uploaded assets can be publicly accessible by all clients.

Create the backend bucket, including Cloud CDN support using the --enable-cdn option:

gcloud compute backend-buckets create dashboard-demo-backend-bucket \ --gcs-bucket-name=$BUCKET --enable-cdn

Click Check my progress to verify the objective. Creating the static assets backend

Configure the load balancer to redirect static content

Connect the backend bucket to the HTTPS load balancer for all traffic requesting assets inside the /static/* directory:

gcloud compute url-maps add-path-matcher dashboard-demo-urlmap \ --default-service dashboard-demo-service \ --path-matcher-name bucket-matcher \ --backend-bucket-path-rules "/static/*=dashboard-demo-backend-bucket"

Verify that the CDN is working by checking that the response headers for static assets contain x-goog-* parameters:

curl -k -sD - -o /dev/null "https://${DOMAIN}/static/js/bokeh.min.js" | grep x-goog

If you get no response wait a few minutes and run the command again. The configuration change can take up to 10 minutes to propagate.

Once the CDN is working you will see a result similar to the following:

x-goog-generation: 1517995207579164 x-goog-metageneration: 1 x-goog-stored-content-encoding: gzip x-goog-stored-content-length: 149673 x-goog-hash: crc32c=oIwVmQ== x-goog-hash: md5=LDj7Jj8zw8kWdnQsOShEFg== x-goog-storage-class: STANDARD

Click Check my progress to verify the objective. Configure the load balancer to redirect static content

Controlling access with user authentication

While the SSL encryption previously enabled at the load balancer level protects the transport, it is a recommended practice to add an authentication layer in order to control access to the application and to prevent potential abuse. One option is to configure OAuth authentication and enable Cloud Identity-Aware Proxy.

Configure the OAuth consent screen for your application

The first step is to set up an OAuth consent screen.

Open this link in a separate window to begin configuring your OAuth consent screen.

For User type, select External, and then click Create.

Enter Dashboard demo for Application name.

Under Support email, select your Qwiklabs student email address.

Click Save.

The rest of the information can remain as its default value.

Under Credentials, click Create credentials > OAuth client ID.

Under Application type, select Web application, and then enter Dashboard demo as the Name value.

Then click Create.

Save client ID and client secret shown in the OAuth client window. You will need them in the next steps:


Click OK to close the OAuth client window.

Click on your "Dashboard demo" credential.

For Authorized redirect URIs, add an entry with the following format. (Replace [CLIENT_ID] with the client IP value that you noted in the earlier step.)[CLIENT_ID]:handleRedirect

Ensure the redirect URI has been added, and then press Save.

Enable the Cloud Identity-Aware Proxy

Enable Cloud Identity-Aware Proxy by running the following commands in the Cloud Shell. Replace the [CLIENT_ID] and [CLIENT_SECRET] values with the actual values noted in the previous step:

gcloud beta compute backend-services update dashboard-demo-service --global \ --iap enabled,oauth2-client-id=[CLIENT_ID],oauth2-client-secret=[CLIENT_SECRET]

Finally, add a member to allow access to the application. Replace the [USER_ACCOUNT] value with the Qwiklabs username you used when starting the lab. The Username is in the Connection Details section of the lab.

gcloud projects add-iam-policy-binding $PROJECT_ID \ --role roles/iap.httpsResourceAccessor \ --member user:[USER_EMAIL]


Bindings: - members: - role: roles/appengine.appAdmin - members: - role: roles/appengineflex.serviceAgent - members: - role: roles/bigquery.admin - members: - role: roles/bigquery.user - members: - role: roles/bigquerydatatransfer.serviceAgent ...

This configuration change can take 5 to 10 minutes to propagate. When done, the application is protected by Cloud Identity-Aware Proxy, and only the user with the email address provided above can access it.

Click Check my progress to verify the objective. Enable the Cloud Identity-Aware Proxy

Refresh the dashboard in your browser (https://[STATIC_IP] and you will be redirected to a Google user authentication page to select an existing authenticated user session or log in as a new user. You can add more users by using the same process detailed in the last step above. Once signed in, you will be able to view your dashboard:


If you get an error that resembles the following:


NOTE: If you get an error that resembles same as above then in Chrome, simple type thisisunsafe anywhere in the browser (Click in the browser and type) and it'll bypass it.

Return to the Dashboard demo client ID. Your Authorized redirect URI may have not gone through:


If not, run through the steps above again to add the appropriate redirect URI. Then open a new tab and visit https://[STATIC_IP]—you will be prompted with the following Sign in with Google page:



You have now successfully completed the Creating Custom Interactive Dashboards with Bokeh and BigQuery Qwiklab.

Take your Next Lab

Continue learning:

Next Steps / Learn More

If you want to process and upload your own data to BigQuery, check out:

Google Cloud Training & Certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated December 08, 2021
Lab Last Tested December 08, 2021

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.