
Cloud Operations and Service Mesh with Anthos


AHYBRID041 Observing Anthos Services

Lab · 1 hour 30 minutes · 5 Credits · Intermediate
Note: This lab may incorporate AI tools to support your learning.

Overview

In this lab, you learn to install Cloud Service Mesh (CSM) on Google Kubernetes Engine. Cloud Service Mesh is a managed service based on Istio, the leading open source service mesh.

A service mesh gives you a framework for connecting, securing, and managing microservices. It provides a networking layer on top of Kubernetes with features such as advanced load balancing capabilities, service-to-service authentication, and monitoring without requiring any changes in service code.

Cloud Service Mesh has a suite of additional features and tools that help you observe and manage secure, reliable services in a unified way. In this lab you also learn how to use some of these features:

  • Service metrics and logs for HTTP(S) traffic within your mesh's GKE cluster are automatically ingested to Google Cloud.
  • Preconfigured service dashboards give you the information you need to understand your services.
  • In-depth telemetry lets you dig deep into your metrics and logs, filtering and slicing your data on a wide variety of attributes.
  • Service-to-service relationships at a glance help you understand who connects to which service and the services that each service depends on.
  • Service-level objectives (SLOs) provide insights into the health of your services. You can easily define an SLO and alert on your own standards of service health.

Cloud Service Mesh is the easiest and richest way to implement an Istio-based service mesh on your GKE Enterprise clusters.

Objectives

In this lab, you learn how to perform the following tasks:

  • Install Cloud Service Mesh, with tracing enabled and configured to use Cloud Trace as the backend.
  • Deploy Bookinfo, an Istio-enabled multi-service application.
  • Enable external access using an Istio Ingress Gateway.
  • Use the Bookinfo application.
  • Evaluate service performance using Cloud Trace features within Google Cloud.
  • Create and monitor service-level objectives (SLOs).
  • Leverage the Cloud Service Mesh Dashboard to understand service performance.

Setup and requirements

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

After you complete the initial sign-in steps, the project dashboard appears.

  1. Click Select a project, highlight your GCP Project ID, and click Open to select your project.

Activate Google Cloud Shell

Google Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud.

Google Cloud Shell provides command-line access to your Google Cloud resources.

  1. In Cloud console, on the top right toolbar, click the Open Cloud Shell button.

  2. Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID.

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  • You can list the active account name with this command:
gcloud auth list

Output:

Credentialed accounts:
- @.com (active)

Example output:

Credentialed accounts:
- google1623327_student@qwiklabs.net
  • You can list the project ID with this command:
gcloud config list project

Output:

[core] project =

Example output:

[core] project = qwiklabs-gcp-44776a13dea667a6

Note: Full documentation of gcloud is available in the gcloud CLI overview guide.

Note: The lab environment has already been partially configured: a GKE cluster named gke has been created and registered.

Task 1. Install Cloud Service Mesh with tracing enabled

A Google Kubernetes Engine (GKE) cluster named gke has already been created and registered. You will install Cloud Service Mesh onto this cluster and override the standard configuration to enable the optional tracing components.

Configure cluster access for kubectl and verify the cluster

To set environment variables for use in scripts, in Cloud Shell, run the following commands:

  1. Set the Name environment variable.

    CLUSTER_NAME=gke
  2. Set the Zone and Region environment variables.

    CLUSTER_ZONE={{{ project_0.default_zone| "Zone added at lab start" }}}
    CLUSTER_REGION={{{ project_0.default_region| "Region added at lab start" }}}
  3. Set the Project ID environment variable.

    PROJECT_ID={{{ project_0.project_id | "PROJECT ID added at lab start" }}}
  4. Set the Project Number environment variable.

    PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} \
      --format="value(projectNumber)")
  5. Set the Fleet Project ID environment variable.

    FLEET_PROJECT_ID="${PROJECT_ID}"
  6. Set the IDNS environment variable.

    IDNS="${PROJECT_ID}.svc.id.goog"
  7. Set the output directory.

    DIR_PATH=.
  8. Verify the environment variables are set correctly.

    printf '\nCLUSTER_NAME:'$CLUSTER_NAME'\nCLUSTER_ZONE:'$CLUSTER_ZONE'\nPROJECT_ID:'$PROJECT_ID'\nPROJECT_NUMBER:'$PROJECT_NUMBER'\nFLEET PROJECT_ID:'$FLEET_PROJECT_ID'\nIDNS:'$IDNS'\nDIR_PATH:'$DIR_PATH'\n'

    Output:

    CLUSTER_NAME:gke
    CLUSTER_ZONE:{{{ project_0.default_zone| "Zone" }}}
    PROJECT_ID:{{{ project_0.project_id | "PROJECT ID" }}}
    PROJECT_NUMBER:946429310725
    FLEET PROJECT_ID:{{{ project_0.project_id | "PROJECT ID" }}}
    IDNS:{{{ project_0.project_id | "PROJECT ID" }}}.svc.id.goog
    DIR_PATH:.
  9. Configure kubectl to manage your GKE cluster:

    gcloud container clusters get-credentials $CLUSTER_NAME \
      --zone $CLUSTER_ZONE --project $PROJECT_ID
  10. Review your kubectl configuration:

    kubectl config view

Output:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://34.67.123.27
  name: gke_qwiklabs-gcp-04-6163e6198bad_{{{ project_0.default_zone }}}_gke
contexts:
- context:
    cluster: gke_qwiklabs-gcp-04-6163e6198bad_{{{ project_0.default_zone }}}_gke
    user: gke_qwiklabs-gcp-04-6163e6198bad_{{{ project_0.default_zone }}}_gke
  name: gke
current-context: gke
kind: Config
preferences: {}
users:
- name: gke_qwiklabs-gcp-04-6163e6198bad_{{{ project_0.default_zone }}}_gke
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: /usr/lib/google-cloud-sdk/bin/gcloud
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
  11. Check that your cluster is running:
gcloud container clusters list

Output:

NAME: gke
LOCATION: {{{ project_0.default_zone }}}
MASTER_VERSION: 1.24.8-gke.2000
MASTER_IP: 35.192.65.244
MACHINE_TYPE: e2-standard-2
NODE_VERSION: 1.24.8-gke.2000
NUM_NODES: 3
STATUS: RUNNING

Note: The master version in your installation might be different because you are using GKE's regular release channel to install the cluster.
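
Optionally, before moving on, confirm that kubectl can reach the cluster. This is a quick sanity check, not a graded lab step:

# All three gke nodes should report a Ready status
kubectl get nodes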

Enable GKE Enterprise

  1. Enable GKE Enterprise from the command line. This also automatically enables the Fleet (GKE Hub) API:

gcloud services enable --project="${PROJECT_ID}" \
  anthos.googleapis.com

  2. Register the GKE cluster called gke to the Fleet in the project:

gcloud container clusters update gke --enable-fleet --region "${CLUSTER_ZONE}"

  3. Verify that the cluster has been registered in your fleet successfully:

gcloud container fleet memberships list --project "${PROJECT_ID}"
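
If you want to assert the registration from a script, you can filter the membership list by name. This is an optional sketch using the standard --filter and --format flags; an empty result would mean the membership is missing:

# Prints the full resource name of the gke membership, if it exists
gcloud container fleet memberships list --project "${PROJECT_ID}" \
  --filter="name:gke" --format="value(name)"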

Install Cloud Service Mesh

  1. Enable Cloud Service Mesh on the fleet project:

gcloud container fleet mesh enable --project "${PROJECT_ID}"

  2. Enable automatic management of the Cloud Service Mesh control plane:

gcloud container fleet mesh update \
  --management automatic \
  --memberships gke \
  --project "${PROJECT_ID}" \
  --location "$CLUSTER_REGION"

  3. Verify that the control plane is being managed:

gcloud container fleet mesh describe --project "${PROJECT_ID}"

Wait until controlPlaneManagement changes state from PROVISIONING to REVISION_READY. This might take several minutes.

Output:

createTime: '2024-10-09T08:36:54.101719145Z'
membershipSpecs:
  projects/251431549018/locations/us-east1/memberships/gke:
    mesh:
      management: MANAGEMENT_AUTOMATIC
membershipStates:
  projects/251431549018/locations/us-east1/memberships/gke:
    servicemesh:
      conditions:
      - code: VPCSC_GA_SUPPORTED
        details: This control plane supports VPC-SC GA.
        documentationLink: http://cloud.google.com/service-mesh/docs/managed/vpc-sc
        severity: INFO
      controlPlaneManagement:
        details:
        - code: REVISION_READY
          details: 'Ready: asm-managed'
        implementation: TRAFFIC_DIRECTOR
        state: ACTIVE
      dataPlaneManagement:
        details:
        - code: MANAGED_CONTROL_PLANE_REQUIRED
          details: Requires active managed control plane.
        state: FAILED_PRECONDITION
    state:
      code: OK
      description: 'Revision ready for use: asm-managed.'
      updateTime: '2024-10-09T08:46:33.932321311Z'
name: projects/qwiklabs-gcp-04-e66a83de81ad/locations/global/features/servicemesh
resourceState:
  state: ACTIVE
spec: {}
updateTime: '2024-10-09T08:38:23.722727135Z'
  4. Enable Cloud Service Mesh to send telemetry to Cloud Trace:

cat <<EOF | kubectl apply -n istio-system -f -
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: enable-cloud-trace
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: stackdriver
EOF

  5. Verify the config map has been enabled:

kubectl get configmap

Output:

NAME               DATA   AGE
kube-root-ca.crt   1      48m
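
Optionally, you can also confirm that the Telemetry resource you applied earlier exists. This is an extra check rather than a lab step, and it assumes the managed control plane has installed the Istio Telemetry CRD:

# List the Telemetry resource created by the heredoc above
kubectl get telemetry enable-cloud-trace -n istio-system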

Congratulations!

You now have a GKE cluster with Cloud Service Mesh installed. Kubernetes metrics are being recorded to Cloud Monitoring, logs are being recorded to Cloud Logging, and distributed trace information is being sent to Cloud Trace.

Task 2. Install the microservices-demo application on the cluster

Online Boutique is a cloud-native microservices demo application consisting of 10 microservices. It is a web-based ecommerce app where users can browse items, add them to the cart, and purchase them.

Google uses this application to demonstrate the use of technologies like Kubernetes/GKE, Istio/ASM, Google Cloud's operations suite, gRPC, and OpenCensus. The application works on any Kubernetes cluster (such as a local one) as well as on Google Kubernetes Engine, and it's easy to deploy with little to no configuration.

For more information about the application, refer to the GitHub repo.

Configure the mesh data plane

  1. Enable Istio sidecar injection:

    kubectl label namespace default istio.io/rev- istio-injection=enabled --overwrite

    Output:

    namespace/default labeled
  2. To enable Google to manage your data plane so that the sidecar proxies are automatically updated for you, annotate the namespace:

    kubectl annotate --overwrite namespace default \
      mesh.cloud.google.com/proxy='{"managed":"true"}'

    Output:

    namespace/default annotated
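
If you'd like to double-check the namespace configuration before deploying workloads (an optional sanity check, not a lab step), inspect the label and annotation you just set:

    # The Labels and Annotations sections should show istio-injection=enabled
    # and mesh.cloud.google.com/proxy={"managed":"true"}
    kubectl describe namespace default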

Install the Online Boutique application on the GKE cluster

  1. Deploy the application:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml
    kubectl patch deployments/productcatalogservice -p '{"spec":{"template":{"metadata":{"labels":{"version":"v1"}}}}}'
  2. To be able to access the application from outside the cluster, install the ingress Gateway:

    git clone https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages
    kubectl apply -f anthos-service-mesh-packages/samples/gateways/istio-ingressgateway
  3. Install the required custom resource definitions:

    kubectl apply -k "github.com/kubernetes-sigs/gateway-api/config/crd/experimental?ref=v0.6.0"
    kubectl kustomize "https://github.com/GoogleCloudPlatform/gke-networking-recipes.git/gateway-api/config/mesh/crd" | kubectl apply -f -
  4. Configure the Gateway:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/istio-manifests.yaml
  5. In the console, go to Kubernetes Engine, click Workloads, and then, under Networking, click Gateways, Services & Ingress. Verify that the new deployments and services have been created on the gke cluster.

    Note: You can filter these pages by cluster, object type, and namespace to make it easier to parse the information presented.
  6. Take a couple of minutes to investigate the demo application using the console and UI.

    Note: When the workloads show an OK status, find the IP address associated with the frontend-external service on the Services & Ingress page within the console.
  7. Explore the cluster using Cloud Shell.
  8. View the deployments using Cloud Shell:

    kubectl get deployments

    Output:

    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    adservice               1/1     1            1           2m39s
    cartservice             1/1     1            1           2m41s
    checkoutservice         1/1     1            1           2m44s
    currencyservice         1/1     1            1           2m40s
    emailservice            1/1     1            1           2m45s
    frontend                1/1     1            1           2m43s
    istio-ingressgateway    3/3     3            3           2m24s
    loadgenerator           1/1     1            1           2m41s
    paymentservice          1/1     1            1           2m42s
    productcatalogservice   1/1     1            1           2m42s
    recommendationservice   1/1     1            1           2m43s
    redis-cart              1/1     1            1           2m39s
    shippingservice         1/1     1            1           2m40s
  9. View the services using Cloud Shell:

    kubectl get services

    Output:

    NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                                      AGE
    adservice               ClusterIP      10.28.7.144    <none>          9555/TCP                                     7m37s
    cartservice             ClusterIP      10.28.12.9     <none>          7070/TCP                                     7m40s
    checkoutservice         ClusterIP      10.28.15.239   <none>          5050/TCP                                     7m42s
    currencyservice         ClusterIP      10.28.14.145   <none>          7000/TCP                                     7m39s
    emailservice            ClusterIP      10.28.11.181   <none>          5000/TCP                                     7m43s
    frontend                ClusterIP      10.28.1.40     <none>          80/TCP                                       7m41s
    frontend-external       LoadBalancer   10.28.15.84    34.66.85.60     80:31445/TCP                                 7m41s
    istio-ingressgateway    LoadBalancer   10.28.0.95     34.68.255.184   15021:30153/TCP,80:31266/TCP,443:30286/TCP   7m22s
    kubernetes              ClusterIP      10.28.0.1      <none>          443/TCP                                      62m
    paymentservice          ClusterIP      10.28.1.138    <none>          50051/TCP                                    7m41s
    productcatalogservice   ClusterIP      10.28.8.22     <none>          3550/TCP                                     7m40s
    recommendationservice   ClusterIP      10.28.0.104    <none>          8080/TCP                                     7m42s
    redis-cart              ClusterIP      10.28.5.71     <none>          6379/TCP                                     7m38s
    shippingservice         ClusterIP      10.28.5.69     <none>          50051/TCP                                    7m39s
  10. Open a new tab and enter the IP address for the frontend-external service.

  11. Click through various pages to get a sense of the application.
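
If you prefer the command line, here is a convenience sketch (not a required lab step) that fetches the frontend-external IP address from Cloud Shell and confirms the frontend responds:

    # Read the external IP assigned to the frontend-external LoadBalancer service
    FRONTEND_IP=$(kubectl get service frontend-external \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo "http://${FRONTEND_IP}"
    # Print just the HTTP status code of the home page (expect 200)
    curl -s -o /dev/null -w "%{http_code}\n" "http://${FRONTEND_IP}"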

Task 3. Review Google Cloud's operations suite functionality

When you install a GKE or an Anthos cluster, you can enable cluster logs and metrics to be collected and forwarded to Cloud Logging and Cloud Monitoring. That gives you visibility into the cluster, its nodes, its pods, and even the containers in that cluster. However, GKE and Anthos don't monitor the communication between microservices.

With Cloud Service Mesh, because every request goes through an Envoy proxy, microservice telemetry information can be collected and inspected. Envoy proxy extensions then send that telemetry to Google Cloud, where you can inspect it. Use Cloud Trace dashboards to investigate requests and their latencies and obtain a breakdown from all services involved in a request.

  1. In the Google Cloud Console, on the Navigation menu, click Trace.

    A trace graph displays service requests made within the demo application.

  2. Click a dot that displays higher up in the graph (representing a higher overall request time).

  • How long did the request take?
  • When did the request occur?
  • What service was being called?
  • What other services were called during execution of this request?
  • Where was most of the time spent in processing this request?

Review the Cloud Trace documentation for more details on understanding trace information.
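
Traces can also be retrieved programmatically. As a hedged sketch (the Cloud Trace v1 ListTraces REST endpoint is shown; the response format differs from what the console displays), you could pull a few recent traces from Cloud Shell:

    # Requires the PROJECT_ID variable set earlier in the lab
    curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://cloudtrace.googleapis.com/v1/projects/${PROJECT_ID}/traces?pageSize=5"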

Task 4. Deploy a canary release that has high latency

In this task, you deploy a new version of a service that has an issue causing high latency. In subsequent tasks, you use the observability tools to diagnose and resolve the issue.

  1. In Cloud Shell, clone the repository that has the configuration files you need for this part of the lab:

    git clone https://github.com/GoogleCloudPlatform/istio-samples.git \
      ~/istio-samples
  2. Create the new destination rule on the gke cluster:

    kubectl apply -f ~/istio-samples/istio-canary-gke/canary/destinationrule.yaml
  3. Create the new product catalog on the gke cluster:

    kubectl apply -f ~/istio-samples/istio-canary-gke/canary/productcatalog-v2.yaml
  4. Create a traffic split on the gke cluster:

    kubectl apply -f ~/istio-samples/istio-canary-gke/canary/vs-split-traffic.yaml

    Note: You are creating:

    • A DestinationRule to set routing of requests between the service versions
    • A new deployment of the product catalog service that has high latency
    • A VirtualService to split product catalog traffic 75% to v1 and 25% to v2

    You can open each of the configuration files in the Cloud Shell editor to better understand the definition of each new resource.
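
For reference, here is a minimal sketch of what a traffic-splitting VirtualService like the one in vs-split-traffic.yaml looks like. The field names follow the Istio networking API, but the exact contents of the file in the repo may differ:

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: productcatalogservice
    spec:
      hosts:
      - productcatalogservice
      http:
      - route:
        # 75% of requests go to the v1 subset, 25% to the slow v2 canary
        - destination:
            host: productcatalogservice
            subset: v1
          weight: 75
        - destination:
            host: productcatalogservice
            subset: v2
          weight: 25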

Task 5. Define your service level objective

When you are not using Cloud Service Mesh, you can define SLOs with the Service Monitoring API. When you are using Cloud Service Mesh, as with the gke cluster, you can define and monitor SLOs via the Cloud Service Mesh dashboard.

  1. In the Google Cloud Console, on the Navigation menu, click Kubernetes Engine to open the GKE Dashboard.

    Notice that there is one cluster registered in the fleet.

  2. On the side panel, under Features, click Service Mesh to go to the Cloud Service Mesh dashboard.

    A summary of service performance, including SLO information, is displayed. You will define a new SLO for the product catalog service.

  3. In the Services list, click productcatalogservice.

  4. In the menu pane, click Health.

  5. Click +Create SLO.

  6. In the Set your SLI slideout, for metric, select Latency.

  7. Select Request-based as the method of evaluation.

  8. Click Continue.

  9. Set Latency Threshold to 1000, and click Continue.

  10. Set Period type to Calendar.

  11. Set Period length to Calendar day.

    Note: Ninety-nine percent availability over a single day is different from 99% availability over a month. The first SLO would not permit more than 14 minutes of consecutive downtime (24 hrs * 1%), but the second SLO would allow consecutive downtime up to ~7 hours (30 days * 1%).
  12. Set Performance goal to 99.5%.

    The Preview graph shows how your goal is reflected against real historical data.

  13. Click Continue.

  14. Review Display name: 99.5% - Latency - Calendar day.

    You can adjust this as needed.

    The autogenerated JSON document is also displayed. You could use the APIs instead to automate the creation of SLOs in the future.

  15. To create the SLO, click +Create SLO.
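
If you later automate SLO creation through the Service Monitoring API, the SLO you just built corresponds roughly to the following ServiceLevelObjective JSON. This is a hedged sketch: the distributionFilter string depends on your mesh's latency metric and labels and is left as a placeholder here:

    {
      "displayName": "99.5% - Latency - Calendar day",
      "goal": 0.995,
      "calendarPeriod": "DAY",
      "serviceLevelIndicator": {
        "requestBased": {
          "distributionCut": {
            "distributionFilter": "LATENCY_DISTRIBUTION_METRIC_FILTER",
            "range": { "min": 0, "max": 1000 }
          }
        }
      }
    }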

Task 6. Diagnose the problem

Use service metrics to see where the problem is

  1. Click on your SLO entry in the SLO list.

    This displays an expanded view. Your SLO will probably show that you are already out of error budget. If not, wait 3-5 minutes and refresh the page. Eventually, you will exhaust your error budget, because too many of the requests to this service will hit the new backend, which has high latency.

  2. In the menu pane, click Metrics.

    Scroll down to the Latency section of the Metrics view and note that the service latency increased a few minutes earlier, around the time you deployed the canary version of the service.

  3. From the Breakdown By dropdown, select Source service.

    Which pods are showing high latency and causing the overall failure to hit your SLO?

  4. To return to the Service Mesh page, in the menu pane, click Service Mesh.

    One SLO is flagged as out of error budget, and a warning indicator is displayed next to the problem service in the Services listing.

    Note: You have only defined a single SLO for a single service. In a real production environment, you would probably have multiple SLOs for each service.

    Also, you have not defined any alerting policy for your SLO. In production, you would probably have Cloud Monitoring fire an alert if you exhaust your error budget faster than expected; see the sketch below.
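
As an illustrative sketch of that alerting idea (not a lab step), an SLO burn-rate alerting policy can be created from a JSON file using the gcloud alpha monitoring surface. The SLO resource name below is a hypothetical placeholder that you would replace with the full name of your own SLO:

# Write a policy that fires when the 1-hour error-budget burn rate exceeds 10x
cat > burn-rate-policy.json <<'EOF'
{
  "displayName": "Product catalog SLO burn rate",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Error budget burning too fast",
      "conditionThreshold": {
        "filter": "select_slo_burn_rate(\"projects/PROJECT_NUMBER/services/SERVICE_ID/serviceLevelObjectives/SLO_ID\", \"3600s\")",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 10,
        "duration": "0s"
      }
    }
  ]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=burn-rate-policy.json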

Use Cloud Trace to better understand where the delay is

  1. In the Google Cloud Console, on the Navigation menu, click Trace > Trace explorer.

  2. Click a dot plotted at around 3000 ms; it should represent one of the requests to the product catalog service.

    Note that all the time seems to be spent within the catalog service itself. Although calls are made to other services, they all appear to return very quickly, and something within the product catalog service is taking a long time.

    Note: Cloud Service Mesh is automatically collecting information about network calls within the mesh and providing trace data that documents time spent on these calls. This is useful and required no extra developer effort.

    However, how time is spent within the workload, in this case the product catalog service pod, isn't instrumented directly by Istio. If needed, to get this level of detail, the developer would add instrumentation logic within the service itself.

Task 7. Roll back the release and verify an improvement

  1. In Cloud Shell, back out the destination rule canary release:

    kubectl delete -f ~/istio-samples/istio-canary-gke/canary/destinationrule.yaml
  2. In Cloud Shell, back out the product catalog canary release:

    kubectl delete -f ~/istio-samples/istio-canary-gke/canary/productcatalog-v2.yaml
  3. In Cloud Shell, back out the traffic split canary release:

    kubectl delete -f ~/istio-samples/istio-canary-gke/canary/vs-split-traffic.yaml
  4. In the Google Cloud Console, on the Navigation menu, click Kubernetes Engine > Service Mesh.

  5. Click on productcatalogservice, and then in the menu pane, click Health.

    Note the current compliance percentage.

  6. Click Metrics.

    On the latency chart, all the latency series show a dip that corresponds to when you rolled back the bad version of the workload.

  7. Return to the Health page.

  8. Compare the current compliance metric with the one you saw earlier. It should be higher now, reflecting the fact that you are no longer seeing high-latency requests.
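
To confirm the rollback from Cloud Shell (an optional check; it assumes the canary pods carried the version=v2 label applied by the canary manifests), list what remains for the product catalog service:

    # Only the v1 productcatalogservice pod should remain
    kubectl get pods -l app=productcatalogservice
    # The canary DestinationRule and traffic-splitting VirtualService should be gone
    kubectl get virtualservice,destinationrule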

Task 8. Visualize your mesh with the Cloud Service Mesh dashboard

  1. On the Navigation menu, click Kubernetes Engine > Service Mesh.

  2. View the Topology on the right side.

    A chart representing your service mesh is displayed.

    If you don't see a full topology chart, it's possible that not all the topology data has been collected. It can take 10 or more minutes for this data to be reflected in the chart. You can proceed to the next section and return later to see the chart.

  3. Click on the frontend workload node and note the services called by that workload.

Take a couple of minutes to explore further and better understand the architecture of the application. You can rearrange nodes, drill down into workloads to see constituent deployments and pods, change time spans, etc.

Congratulations! You've used Google Cloud's operations suite tooling to evaluate, troubleshoot, and improve service performance on your GKE Enterprise cluster.

Review

In this lab, you learned about logging, monitoring, and tracing using Google Cloud's operations suite.

End your lab

When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
