In this lab, you explore how to collect and analyze telemetry from Anthos
clusters on bare metal deployments and services.
You leverage Google Cloud's operations suite to do so. Anthos clusters on bare
metal write their logs into Cloud Logging and their metrics into Cloud Monitoring.
Once the data is stored in the cloud, you can use the dashboards in Google Cloud's
operations suite to understand and troubleshoot service performance, and to
review how it changes over time. In addition, you install Anthos
Service Mesh to collect tracing information automatically.
Bare metal doesn't always mean bare metal
Anthos clusters on bare metal can indeed run on bare metal servers, but can
also run on virtual machines in VMware, AWS, or even GCE. A bare metal
install doesn't take direct advantage of VMware, AWS, or GKE APIs; it uses
a more generic approach to making Anthos work on your cluster.
In this lab, you run Anthos clusters on bare metal on top of GCE VMs.
This does require a little extra work as the load balancer VMs need Layer 2
connectivity, so the VMs have been created and configured to use
VXLAN, which encapsulates Layer 2 connections on a Layer 3 network. In a pure
bare metal deployment, you would just skip this step and everything else would
remain the same.
Objectives
In this lab, you learn how to perform the following tasks:
Configure the level of logging in your Anthos on bare metal
clusters.
Evaluate service performance using Cloud Monitoring and Cloud Logging features within Google Cloud.
Install Anthos Service Mesh and investigate the telemetry in Cloud Tracing.
Enable Node Problem Detector to detect issues in your cluster nodes.
Setup and requirements
In this task, you use Qwiklabs and perform initialization steps for your lab.
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
After you complete the initial sign-in steps, the project dashboard appears.
Click Select a project, highlight your Google Cloud Project ID, and click
Open to select your project.
Activate Google Cloud Shell
Google Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud.
Google Cloud Shell provides command-line access to your Google Cloud resources.
In Cloud console, on the top right toolbar, click the Open Cloud Shell button.
Click Continue.
It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
You can list the active account name with gcloud auth list, and the configured project with gcloud config list project. The project listing looks like this:
[core]
project = qwiklabs-gcp-44776a13dea667a6
Note: Full documentation of gcloud is available in the gcloud CLI overview guide.
Task 1. Explore the pre-created environment
This environment has been pre-configured with a hybrid Anthos cluster on bare
metal running on Google Cloud's Compute Engine VMs. The hybrid deployment model is a specialized multi-cluster deployment that enables you to run user workloads on your admin cluster. In this type of deployment, you could add more user clusters to support multiple teams or workload types.
Note: Hybrid cluster deployments:
Allow reuse of control plane nodes for user workloads.
Are an efficient choice when there are no security concerns about
running user workloads on your admin cluster, which contains
sensitive data.
Check the provisioned Compute Engine infrastructure
In the Console, go to Navigation menu > Compute Engine > VM instances and verify that you have five VMs. It should look like this:
The pre-created VMs have the following functions:
setup-vm: a VM used to create the rest of the infrastructure. Review
the startup_script to see the steps taken to create the infrastructure
and install the Anthos software.
abm-ws: an admin workstation in the same network as the cluster, used to
perform the configuration, creation, and management of the Anthos cluster on
bare metal.
abm-user-cp1: a control plane node.
abm-user-w1 & abm-user-w2: two worker nodes.
Go to Navigation menu > Kubernetes Engine > Clusters and verify that you have
a single Anthos cluster registered. Notice you have not logged in yet. It
should look like this:
Task 2. Log in to your Anthos cluster
When you create Anthos clusters in Google Cloud, AWS, or VMware,
you typically use an environment-specific installation process that
takes advantage of native APIs.
When you create a bare metal cluster, the installation process doesn't
automatically create machines for you (typically, they are physical
machines so they can't be created out of thin air). That doesn't mean,
however, that you can't create "bare metal" clusters running on VMs
in any of those environments.
In this lab, the "bare metal" cluster has been created on GCE VMs. It will
behave almost identically to a bare metal cluster running on physical devices
in your data center. Where the administration in the lab deviates from a pure
bare metal scenario, the lab instructions will make it clear.
The cluster in the lab is made of a single control plane node and
two worker nodes. In a production environment, you might consider using three nodes
for high availability of both the data and the control plane.
Investigate the configuration in the admin workstation
Set the Zone environment variable:
ZONE={{{ project_0.default_zone| "Zone added at lab start" }}}
In Cloud Shell, SSH into the admin workstation, accepting the defaults for any
questions:
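The SSH command itself does not appear in this copy of the lab. The following is a minimal sketch, assuming the admin workstation is the abm-ws VM listed earlier and that you connect as root (the root@abm-ws prompt later in the lab suggests this). The helper name ssh_admin_ws is hypothetical.

```shell
# Hypothetical helper: open an SSH session to the admin workstation.
# Assumes the VM is named abm-ws and the login user is root.
ssh_admin_ws() {
  # Refuse to run without a zone so gcloud does not prompt for one.
  if [ -z "${ZONE:-}" ]; then
    echo "ZONE is not set; set it to the lab's default zone first" >&2
    return 1
  fi
  gcloud compute ssh root@abm-ws --zone "${ZONE}"
}
```

After setting ZONE as shown in the previous step, run ssh_admin_ws to start the session.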
Set the KUBECONFIG environment variable with the path to the cluster's
configuration file to run kubectl commands on the cluster:
cd baremetal
export CLUSTER_ID=abm-hybrid-cluster
export KUBECONFIG=$HOME/baremetal/bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID-kubeconfig
kubectl get nodes -o wide
You should see output that looks like this:
Create a token that you can use to log in to the cluster from the Console:
kubectl create token admin-user -n kube-system
Select the token in the SSH session (this will copy the token - don't try to copy with CTRL+C).
Find the abm-admin-cluster entry in the cluster list showing in the
Console and click the Actions icon (three dots) at the far right of the row.
Select Log in, select Token, then paste the token from your
Clipboard into the provided field. Click Login. When you're done,
it should look like this:
Congratulations! You have successfully signed in to your Anthos on bare metal hybrid cluster!
Check all the observability tools installed in the cluster:
kubectl -n kube-system get pods -l "managed-by=stackdriver"
Output:
NAME                                          READY   STATUS
gke-metrics-agent-2g59c                       1/1     Running
gke-metrics-agent-m476n                       1/1     Running
gke-metrics-agent-rznvw                       1/1     Running
kube-state-metrics-6c6f7b8bb7-rzbpb           1/1     Running
node-exporter-mc6pl                           1/1     Running
node-exporter-vxnpp                           1/1     Running
node-exporter-xj8ng                           1/1     Running
stackdriver-log-forwarder-4kjlm               1/1     Running
stackdriver-log-forwarder-f8jhc               1/1     Running
stackdriver-log-forwarder-x9lbv               1/1     Running
stackdriver-metadata-agent-cluster-level-js   1/1     Running
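To confirm at a glance that every agent in a listing like the one above is healthy, you could filter the output for pods that are not in the Running state. This is a small sketch; pods_not_running is a hypothetical helper name, and it assumes STATUS is the third column, as in the default kubectl get pods output.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of every pod whose STATUS column (third field) is not "Running".
# The header line (NR == 1) is skipped.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}
```

For example, kubectl -n kube-system get pods -l "managed-by=stackdriver" | pods_not_running prints nothing when all agents are running.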
Note: Cloud Logging and Cloud Monitoring are installed and activated in each
cluster when you create a new admin, user, or hybrid cluster. The
Stackdriver agents include several components on each cluster:
Stackdriver Operator (stackdriver-operator-*). Manages
the lifecycle for all other Stackdriver agents deployed onto the
cluster.
Stackdriver Custom Resource. A resource that is
automatically created as part of the Anthos clusters on bare metal
installation process.
Stackdriver Log Forwarder (stackdriver-log-forwarder-*).
A Fluent Bit daemonset that forwards logs from each machine to
Cloud Logging. The Log Forwarder buffers the log entries on the node
locally and re-sends them for up to 4 hours. If the buffer gets full
or if the Log Forwarder can't reach the Cloud Logging API for more
than 4 hours, logs are dropped.
Stackdriver Metadata Collector
(stackdriver-metadata-agent-*). A deployment that sends metadata for
Kubernetes resources such as pods, deployments, or nodes to the
Stackdriver Resource Metadata API; this data is used to enrich
metric queries by enabling you to query by deployment name, node
name, or even Kubernetes service name.
You can always disable Cloud Logging and Cloud Monitoring by deleting the resources using kubectl delete. If you want to use a third-party service for monitoring your cluster, check out the documentation to find guides to work with third-party solutions such as Elastic Stack, Splunk Connect, or Datadog.
Task 4. Explore audit logs
Verify that the configuration file used to create the Anthos cluster on bare metal does not have the setting to disable the collection of audit logs:
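The verification command is not shown in this copy of the lab. One way to perform the check is sketched below, assuming the setting is spelled disableCloudAuditLogging in the cluster configuration file. The helper name audit_logging_status is hypothetical.

```shell
# Hypothetical helper: report whether a cluster config file disables audit logs.
# Assumes the setting is spelled disableCloudAuditLogging. Commented-out lines
# are ignored because the pattern is anchored to the start of the line.
audit_logging_status() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  if grep -Eq '^[[:space:]]*disableCloudAuditLogging:[[:space:]]*true' "$1"; then
    echo "disabled"
  else
    echo "enabled"
  fi
}
```

Pass it the path to your cluster configuration file, for example bmctl-workspace/$CLUSTER_ID/$CLUSTER_ID.yaml if your workspace follows that layout (an assumption here). The helper prints enabled when the setting is absent, commented out, or false.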
Note: Audit logs keep a chronological record of calls made to a cluster's Kubernetes API server. Audit logs are useful for investigating suspicious API requests and for collecting statistics.
Access the audit logs using the gcloud CLI:
export PROJECT_ID=$(gcloud config get-value project)
# Escape the inner double quotes so that they survive into the generated script.
echo "gcloud logging read 'logName=\"projects/${PROJECT_ID}/logs/externalaudit.googleapis.com%2Factivity\"
AND resource.type=\"k8s_cluster\"
AND protoPayload.serviceName=\"anthosgke.googleapis.com\"' --limit 2 --freshness 300d" > get_audit_logs.sh
sh get_audit_logs.sh
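Access the audit logs from the Console by navigating to Navigation menu > Logging > Logs Explorer and entering the following query, which uses the same filter as the gcloud command above. Replace PROJECT_ID in the query with your project ID, and then click Run query:
logName="projects/PROJECT_ID/logs/externalaudit.googleapis.com%2Factivity"
AND resource.type="k8s_cluster"
AND protoPayload.serviceName="anthosgke.googleapis.com"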
On the Logs Explorer screen, remove all filters from the query box and click Run query.
Notice on the left side of the screen that you can choose to explore
Kubernetes Container, Cluster, and Node logs. If you don't see them, click
the bubble named "Log fields" to view them.
In the left-side logs field panel, select Kubernetes Node.
Now select the abm-user-cp1 node name to further filter the logs.
Task 6. Explore application logs
Return to the Cloud Shell window you were using to SSH into the admin
workstation. Verify that the configuration file used to create the Anthos cluster on bare
metal has the setting enabled to collect metrics and logs from your applications:
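The command for this check is missing from this copy of the lab. As a rough sketch, assuming the setting is the enableApplication key under clusterOperations in the cluster configuration file, you could print its value as follows. The helper name app_telemetry_setting is hypothetical.

```shell
# Hypothetical helper: print the value of enableApplication from a cluster
# config file, or "unset" if the key is absent or commented out.
app_telemetry_setting() {
  awk '
    /^[ \t]*enableApplication:/ { print $2; found = 1; exit }
    END { if (!found) print "unset" }
  ' "$1"
}
```

A result of true indicates that application logs and metrics are collected. The path to pass in is your cluster configuration file inside the bmctl workspace.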
Notice that a log was created every time that you ran the curl command.
Task 7. Explore cluster metrics
In the Console, go to Navigation menu > Monitoring > Dashboards. Notice that
the Anthos on bare metal installer has created some dashboards for you already.
Click the Anthos cluster control plane uptime dashboard to see the availability
of your cluster. It should look something like this.
Explore the other dashboards and the metrics available.
Task 8. Explore application metrics
In the Console, go to Navigation menu > Monitoring > Dashboards.
In the dashboards list, locate the Anthos cluster pod status and click Open dashboard settings icon > Copy Dashboard.
Name it Hello App Anthos cluster pod status and click Copy.
Click the newly created dashboard. You see that it gathers a variety of
metrics for all pods in your cluster. Change it so that
it only monitors the hello-app pod that you created earlier. To do so, click
the Edit Dashboard button at the top right of the screen.
Change all the metrics to only monitor your hello-app pod. For that, select
one of the graphs and see how an Options tab opens on the left side of the
screen. Remove all filters, and create a single filter with the following
details. Then, click Done and do the same in the next graph.
Key          Value
Label        container
Comparison   = (equals)
Value        hello-app
Once you are done modifying all the graphs, click on Close Editor on the
top right corner of your screen.
Congratulations! You have successfully created a dashboard in Google Cloud to monitor your hello-app pod running on your on-premises cluster.
Task 9. Install Anthos Service Mesh
Return to the Cloud Shell window you were using to SSH into the admin
workstation. Create an overlay file that will be used in the Anthos Service Mesh
installation to enable Cloud Trace:
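The overlay content is not included in this copy of the lab. The sketch below writes an IstioOperator overlay that enables tracing and selects the Stackdriver tracer, which is the usual way to send traces to Cloud Trace. Treat it as an assumption about what the lab's overlay contains, not as the lab's own file. The helper name write_trace_overlay and the output file name are hypothetical.

```shell
# Hypothetical helper: write an IstioOperator overlay that turns on tracing
# and selects the Stackdriver (Cloud Trace) tracer. The target file name is
# passed as the first argument.
write_trace_overlay() {
  cat > "$1" <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: true
  values:
    global:
      proxy:
        tracer: stackdriver
EOF
}
```

For example, write_trace_overlay cloud-trace.yaml creates the file in the current directory so that it can be passed to the Anthos Service Mesh installer as a custom overlay.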
View the Hipster Shop pods that have been created in the demo namespace.
Notice that they have a 2/2 in a ready state. That means that the two containers
are ready, including the application container and the mesh sidecar container.
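Besides reading the 2/2 column, you can list the container names of each pod to confirm that the sidecar is present. The sketch below assumes the sidecar container is named istio-proxy, which is the standard name for Istio-based meshes. The helper name pods_without_sidecar is hypothetical.

```shell
# Hypothetical helper: print the pods in a namespace that have no container
# named istio-proxy (assumed to be the mesh sidecar).
pods_without_sidecar() {
  # Emit one line per pod: "<pod-name> <container> <container> ..."
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' |
    awk 'NF > 0 {
      has = 0
      for (i = 2; i <= NF; i++) if ($i == "istio-proxy") has = 1
      if (!has) print $1
    }'
}
```

Running pods_without_sidecar demo should print nothing when every Hipster Shop pod was injected with the sidecar.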
Get the external IP from the istio-ingressgateway to access the Hipster
Shop that you just deployed:
kubectl get svc istio-ingressgateway -n demo
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP
istio-ingressgateway LoadBalancer 10.96.7.34 10.200.0.103
Access the Hipster Shop using the IP you copied in the previous step:
curl 10.200.0.103
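If you prefer not to copy the IP by hand, the lookup and the request can be combined. This is a sketch only; check_shop is a hypothetical helper name, and it assumes the gateway exposes the shop over plain HTTP on the default port, as the curl command above implies.

```shell
# Hypothetical helper: look up the ingress gateway's external IP and report
# the HTTP status code returned by the Hipster Shop front page.
check_shop() {
  ip=$(kubectl get svc istio-ingressgateway -n demo \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$ip" ]; then
    echo "no external IP assigned yet" >&2
    return 1
  fi
  # -s silences progress output, -o discards the body, -w prints the status code.
  curl -s -o /dev/null -w '%{http_code}\n' "http://${ip}"
}
```

Each call to check_shop also generates a request that should show up as a trace in the next step.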
In the Console, go to Navigation menu > Trace > Trace List and select a trace
from the graph that you would like to review. You see the breakdown by
service so that you can investigate bottlenecks and networking issues.
Task 11. Detect and repair node problems
Many node problems can affect the pods running on the node, such as issues in
the kernel, hardware, or container runtime. These problems are invisible to the
upstream layers in the cluster management stack.
Node Problem Detector
(NPD) detects common node problems and reports node events and conditions. NPD
runs as a systemd service on each node and can be enabled and disabled.
Return to the Cloud Shell window you were using to SSH into the admin
workstation. Check that the following conditions are being checked on the worker
nodes by default:
kubectl describe node abm-user-w2
In the Conditions section of the output, verify that the following
conditions are present:
Checks provided by the Node Problem Detector:
FrequentContainerdRestart
NoFrequentContainerdRestart
KernelHasNoDeadlock
ReadonlyFilesystem
FilesystemIsNotReadOnly
FrequentUnregisterNetDevice
NoFrequentUnregisterNetDevice
ContainerRuntimeIsHealthy
KubeletIsHealthy
FrequentKubeletRestart
FrequentDockerRestart
Checks provided by Kubernetes:
MemoryPressure
DiskPressure
PIDPressure
Ready
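Instead of scanning the full describe output, you can print only the conditions that look unhealthy. The sketch below assumes the usual Kubernetes convention: problem-type conditions are healthy when their status is False, and Ready is healthy when its status is True. The helper name unhealthy_conditions is hypothetical.

```shell
# Hypothetical helper: list node conditions that indicate a problem.
# Assumes Ready is healthy when True and every other condition when False.
unhealthy_conditions() {
  # Emit one "<type>=<status>" line per condition, then filter.
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    awk -F= '
      $1 == "Ready" && $2 != "True" { print; next }
      $1 != "Ready" && $2 != "False" && NF == 2 { print }
    '
}
```

On a healthy node, unhealthy_conditions abm-user-w2 prints nothing. After the container runtime is stopped later in this task, it should list the affected conditions.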
Also, verify that there are not any Events indicating there might be a problem.
Node Problem Detector is enabled by default in your cluster starting in
Anthos clusters on bare metal v1.10.
Stop the container runtime to simulate a problem in the node:
sudo systemctl stop containerd
Go back to the first Cloud Shell window and check the conditions of the node where
you stopped containerd:
kubectl describe node abm-user-w2
Notice that the status of the Condition "ContainerRuntimeUnhealthy" is
"true". Also, notice that the following events now appear:
ContainerdUnhealthy
NodeNotReady
ContainerGCFailed
All errors might take a couple of seconds to appear.
In the second Cloud Shell window, you can get more information by running
journalctl. Press q to exit:
journalctl -u node-problem-detector
Alternatively, you can also check the logs in Cloud Logging. To do so, in
the Console, go to Navigation menu > Logging > Logs Explorer and enter the
following query. Replace PROJECT_ID in the query with your project ID, and then click Run query:
If you get disconnected from Cloud Shell and want to connect to the hybrid cluster:
# From the hybrid workstation (root@abm-ws)
export KUBECONFIG=$KUBECONFIG:~/baremetal/bmctl-workspace/abm-hybrid-cluster/abm-hybrid-cluster-kubeconfig
kubectl get nodes
Review
In this lab, you explored logs and metrics from Anthos clusters on bare
metal. You learned how to access cluster, application, and audit
logs, as well as cluster and application metrics.
End your lab
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
1 star = Very dissatisfied
2 stars = Dissatisfied
3 stars = Neutral
4 stars = Satisfied
5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.