Exploring Cost-optimization for GKE Virtual Machines


Lab: 1 hour 30 minutes, 5 Credits, Intermediate

The underlying infrastructure of a Google Kubernetes Engine cluster is made up of nodes, which are individual Compute Engine VM instances. This lab shows how optimizing your cluster's infrastructure can help save costs and lead to a more efficient architecture for your applications.

You will learn strategies to help maximize utilization (and avoid underutilization) of your valuable infrastructure resources by selecting properly shaped machine types for an example workload. In addition to the type of infrastructure you’re using, the physical geographical location of that infrastructure also impacts cost. Through this exercise, you will explore how to create a cost-effective strategy for managing higher-availability regional clusters.


In this lab, you will learn how to:

  • Examine Resource Usage of a Deployment
  • Scale Up a Deployment
  • Migrate Your Workload to a Node Pool with an Optimized Machine Type
  • Explore Location Options for your Cluster
  • Monitor Flow Logs between Pods in Different Zones
  • Move a Chatty Pod to Minimize Cross-Zonal Traffic Costs

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could cause extra charges to be incurred on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details panel.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details panel.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To view a menu with a list of Google Cloud products and services, click the Navigation menu at the top-left.

This lab generates a small cluster you will use. The provisioning of the cluster takes about 2-5 minutes.

If you've pressed the Start Lab button and see a blue "resources being provisioned" message with a loading circle, your cluster is still being created.

You can begin reading the next instructions and explanations while you wait, but any shell commands won't work until your resources are done provisioning.

Task 1. Understanding Node machine types

General overview

A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits. Machine types are grouped and curated by families for different workloads.

When choosing a machine type for your node pool, the general purpose machine type family typically offers the best price-performance ratio for a variety of workloads. The general purpose machine types consist of the N-series and E2-series:

A list of machine types, including E2, N2, N2D, and N1, along with their specifications such as memory and vCPUs.

The differences between the machine types may or may not matter for your app. In general, E2s have similar performance to N1s but are optimized for cost. Often, using the E2 machine type alone can help save on costs.

However, with a cluster, it's most important that the resources utilized are optimized based on your application’s needs. For bigger applications or deployments that need to scale heavily, it can be cheaper to stack your workloads on a few optimized machines rather than spreading them across many general purpose ones.

Understanding the details of your app is important for this decision-making process. If your app has specific requirements, you can make sure the machine type is shaped to fit the app.

In the following section, you will take a look at a demo app and migrate it to a node pool with a well-shaped machine type.

Task 2. Choosing the right machine type for the Hello app

Inspect the Hello demo cluster's requirements

On startup, your lab generated a Hello Demo Cluster with two e2-medium (2 vCPU, 4 GB memory) nodes. This cluster is deploying one replica of a simple web application called Hello App, a web server written in Go that responds to all requests with the message "Hello, World!".

  1. Once your lab has finished provisioning, in the Cloud Console, click on your Navigation Menu and then click on Kubernetes Engine.
  2. In the Kubernetes Clusters window, select your hello-demo-cluster.

  3. In the following window, select the Nodes tab:

The Nodes tab highlighted within the hello-demo-cluster.

You should now see a list of your cluster's nodes:

A list of nodes, along with their specifications, such as status, CPU requests, and namespace.

Observe how GKE has utilized the resources of your cluster. You can see how much CPU and memory is being requested on each node as well as how much your nodes could potentially allocate.

  1. Click on the first node of your cluster.

Look at the Pods section. You should see your hello-server pod in the default namespace. If you don't see a hello-server pod, go back and select the second node of your cluster instead.

You'll notice the hello-server pod is requesting 400 mcpu. You should also see a handful of other kube-system pods running. These are loaded to help enable GKE's cluster services, like monitoring.

Several pods listed in the Pods section along with their statuses set to Running.

  1. Press the Back button to return to the previous Nodes page.

Already, you'll notice that it takes two e2-medium nodes to run one replica of your Hello App along with the essential kube-system services. Also, while you're using most of the cluster's CPU resources, you're only using about one third of its allocatable memory.

If the workload for this app were completely static, you could create a machine type with a custom fitted shape that has the exact amount of cpu and memory needed. By doing this, you would consequently save costs on your overall cluster infrastructure.

However, GKE clusters often run multiple workloads and those workloads will typically need to be scaled up and down.

What would happen if the Hello App were to be scaled up?

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

  1. Click Activate Cloud Shell at the top of the Google Cloud console.

When you are connected, you are already authenticated, and the project is set to your Project_ID. The output contains a line that declares the Project_ID for this session:

Your Cloud Platform project in this session is set to {{{project_0.project_id | "PROJECT_ID"}}}

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  2. (Optional) You can list the active account name with this command:
gcloud auth list
  3. Click Authorize.

Output:

ACTIVE: *
ACCOUNT: {{{user_0.username | "ACCOUNT"}}}

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  4. (Optional) You can list the project ID with this command:
gcloud config list project

Output:

[core]
project = {{{project_0.project_id | "PROJECT_ID"}}}

Note: For full documentation of gcloud, in Google Cloud, refer to the gcloud CLI overview guide.

Scale up Hello app

  1. Access your cluster's credentials:
gcloud container clusters get-credentials hello-demo-cluster --zone {{{project_0.default_zone | "ZONE"}}}
  1. Scale up your Hello-Server:
kubectl scale deployment hello-server --replicas=2

Click Check my progress to verify that you've performed the above task. Scale Up Hello App

  1. Back in the Console, select Workloads from the Kubernetes Engine menu on the left.

You might see your hello-server with the Does not have minimum availability error status.

Note: In your lab you might not see the error. Depending on the Kubernetes version of your cluster, kube-system pods can have smaller resource requests and the cluster might be able to accommodate the new workload. If you don't see the error, do not worry. It has no effect on the completion of this lab.
  1. Click on the error message to get status details. You will see that the reason is Insufficient cpu.

This is to be expected. If you remember, the cluster barely had any more cpu resources and you requested another 400m with another replica of the hello-server.

  1. Increase your node pool to handle your new request:
gcloud container clusters resize hello-demo-cluster --node-pool my-node-pool \
    --num-nodes 3 --zone {{{project_0.default_zone | "ZONE"}}}
  1. When asked to continue, type y and press enter.

  2. In the Console, refresh the Workloads page until you see the status of your hello-server workload turn to OK:

hello-server with "OK" status on the Workloads page

Examine your cluster

With the workload successfully scaled up, navigate back to the nodes tab of your cluster.

  1. Click on hello-demo-cluster:

hello-demo-cluster highlighted on the nodes tab

  1. Then, click on the Nodes tab.

The larger node pool is able to handle the heavier workload, but look at how your infrastructure's resources are being utilized.

Several nodes listed within the larger node pool, along with information such as status and storage requests.

Although GKE uses a cluster's resources to the best of its ability, there is room for some optimization here. You can see that one of your nodes is using most of its memory, but two of your nodes have a considerable amount of unused memory.

At this point, if you continued to scale up the app, you would start to see a similar pattern. Kubernetes would attempt to find a node for each new replica of the hello-server deployment, fail, and then create a new node with roughly 600m of cpu.

A binpacking problem

A binpacking problem is one in which you must fit items of various volumes/shapes into a finite number of regularly shaped “bins” or containers. Essentially, the challenge is to fit the items into the fewest number of bins, “packing” them as efficiently as possible.

This is similar to the challenge faced when trying to optimize Kubernetes clusters for the applications they run. You have a number of applications, likely with various resource requirements (i.e. memory and cpu). You must try to fit these applications into the infrastructure resources Kubernetes manages for you (where most of your cluster’s cost likely lies) as efficiently as possible.

Your Hello Demo Cluster does not employ very efficient binpacking. It would be more cost-efficient to configure Kubernetes to use a machine type more fitted to this workload.
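
To make the packing problem concrete, here is a minimal Python sketch of a first-fit strategy. It is a simplification for illustration, not how the Kubernetes scheduler actually places pods. It assumes each replica requests 400m of CPU (as the hello-server pod does) and that a small node has roughly 600m left for workloads; the 1600m figure for a larger node is a hypothetical value chosen for the example.

```python
def first_fit(pod_requests_m, node_free_m):
    """Place each pod on the first node with room; open a new node otherwise.

    Returns the list of leftover (unused) millicores per node.
    """
    nodes = []  # remaining free millicores on each node opened so far
    for request in pod_requests_m:
        if request > node_free_m:
            raise ValueError("pod does not fit even on an empty node")
        for i, free in enumerate(nodes):
            if free >= request:
                nodes[i] = free - request
                break
        else:
            # No existing node had room, so a new node is needed.
            nodes.append(node_free_m - request)
    return nodes


replicas = [400] * 4                  # four hello-server replicas at 400m each
small = first_fit(replicas, 600)      # ~600m usable per small node (from the text)
large = first_fit(replicas, 1600)     # hypothetical larger node

print(f"small nodes: {len(small)}, unused CPU: {sum(small)}m")
print(f"large nodes: {len(large)}, unused CPU: {sum(large)}m")
```

Under these assumed numbers, four replicas need four small nodes and strand 800m of CPU, while a single larger node holds all four with nothing left over.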

Note: For simplicity, this lab focuses on optimizing one application. In reality, your Kubernetes cluster will likely be running many applications with varying requirements. Kubernetes has tools to help you match your application workloads to various machines Kubernetes has access to. You can use multiple GKE Node Pools to have one Kubernetes cluster manage multiple machine types.

Migrate to optimized node pool

  • Create a new node pool with a larger machine type:
gcloud container node-pools create larger-pool \
  --cluster=hello-demo-cluster \
  --machine-type=e2-standard-2 \
  --num-nodes=1 \
  --zone={{{project_0.default_zone | "ZONE"}}}

Click Check my progress to verify that you've performed the above task. Create node pool

Now, you can migrate pods to the new node pool by following these steps:

  1. Cordon the existing node pool: This operation marks the nodes in the existing node pool (my-node-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.
  2. Drain the existing node pool: This operation gracefully evicts the workloads running on the nodes of the existing node pool (my-node-pool).
  • First, cordon the original node pool:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=my-node-pool -o=name); do kubectl cordon "$node"; done
  • Next, drain the pool:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=my-node-pool -o=name); do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done

At this point, you should see that your pods are running on the new larger-pool node pool:

kubectl get pods -o=wide
  1. With the pods migrated, it's safe to delete the old node pool:
gcloud container node-pools delete my-node-pool --cluster hello-demo-cluster --zone {{{project_0.default_zone | "ZONE"}}}
  1. When asked to continue, type y and press enter.

Deletion can take about 2 minutes. Read the next section while you wait.

Cost analysis

You're now running the same workload that required three e2-medium machines on one e2-standard-2 machine.

Take a look at the hourly cost of running e2 standard and shared-core machine types:

Standard: Several Standard e2 machine types listed, along with their specifications, such as Virtual CPUs, memory, and price.

Shared Core: Several Shared e2 machine types listed, along with their specifications, such as vCPUs, memory, and price.

The cost of three e2-medium machines would be about $0.10 an hour, while one e2-standard-2 is listed at about $0.067 an hour.

Saving about $0.03 an hour may seem small, but this cost can add up over the lifetime of a running application. It would be even more noticeable at a larger scale. Because the e2-standard-2 machine can pack your workload more efficiently and there's less unused space, the cost of scaling up would grow less quickly.
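
The arithmetic behind this comparison can be sketched in a few lines of Python. The two hourly prices are the approximate figures quoted above and will differ by region and over time; the 730 hours per month is an assumption used only to show how the difference accumulates.

```python
# Approximate hourly prices quoted in this lab (they vary by region and date).
THREE_E2_MEDIUM_PER_HOUR = 0.10
ONE_E2_STANDARD_2_PER_HOUR = 0.067

# Assumption for illustration: an average month has about 730 hours.
HOURS_PER_MONTH = 730

hourly_saving = THREE_E2_MEDIUM_PER_HOUR - ONE_E2_STANDARD_2_PER_HOUR
monthly_saving = hourly_saving * HOURS_PER_MONTH

print(f"hourly saving:  ${hourly_saving:.3f}")
print(f"monthly saving: ${monthly_saving:.2f}")
```

With these inputs the saving is roughly $0.033 an hour, or about $24 a month, for this single small workload.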

This is interesting because e2-medium is a shared-core machine type, which is designed to be cost-effective for small, non-resource-intensive applications. But for the Hello App's current workload, a node pool with a larger machine type ends up being the more cost-effective strategy.

In the Cloud Console, you should still be on the Nodes tab of your hello-demo cluster. Refresh this tab and examine the CPU Requested and CPU Allocatable fields for your larger-pool node.

Notice there's room for further optimization. The new node could fit another replica of your workload without needing to provision another node. Or again, you could potentially choose a custom sized machine type that fits the CPU and memory needs of the application saving even more resources.

It should be noted that these prices will vary depending on the location of your cluster. The next part of this lab will deal with selecting the best region and managing a regional cluster.

Selecting the appropriate location for a cluster

Regions and zones overview

Compute Engine resources, used for your cluster's nodes, are hosted in multiple locations worldwide. These locations are composed of regions and zones. A region is a specific geographical location where you can host your resources. Regions have three or more zones.

Resources that live in a zone, such as virtual machine instances or zonal persistent disks, are referred to as zonal resources. Other resources, like static external IP addresses, are regional. Regional resources can be used by any resource in that region, regardless of zone, while zonal resources can only be used by other resources in the same zone.

When choosing a region or zone, it's important to think about:

  1. Handling failures - If your resources for your app are only distributed in one zone and that zone becomes unavailable, your app will also become unavailable. For larger scale, high demand apps it's often best practice to distribute resources across multiple zones or regions in order to handle failures.
  2. Decreased network latency - To decrease network latency, you might want to choose a region or zone that is close to your point of service. For example, if you mostly have customers on the East Coast of the US, then you might want to choose a primary region and zone that is close to that area.

Best practices for clusters

Costs vary between regions based on a variety of factors. For example, resources in the us-west2 region tend to be more expensive than those in us-central1.

When selecting a region or zone for your cluster, examine what your app is doing. For a latency-sensitive production environment, placing your app in a region/zone with decreased network latency and increased efficiency would likely give you the best performance-to-cost ratio.

However, a non-latency-sensitive dev environment could be placed in a less expensive region to reduce costs.

Note: For more information on VMs and pricing per region, refer to the VM instance pricing documentation.

Handling cluster availability

The types of available clusters in GKE include zonal (single-zone or multi-zonal) and regional. At face value, a single-zone cluster will be the least expensive option. However, for high-availability of your applications, it is best to distribute your cluster’s infrastructure resources across zones.

For many cases, prioritizing availability in your cluster through a multi-zonal or regional cluster results in the best cost-to-performance architecture.

Note: A multi-zonal cluster has at least one additional zone defined but only has a single replica of the control plane running in a single zone. Workloads can still run during an outage of the control plane's zone, but no configurations can be made to the cluster until the control plane is available.

A regional cluster has multiple replicas of the control plane, running in multiple zones within a given region. Nodes also run in each zone where a replica of the control plane runs. Regional clusters consume the most resources but offer the best availability.

Learn more from the article Types of clusters.

Task 3. Managing a regional cluster


Managing your cluster's resources across multiple zones becomes a little more complex. If you're not careful, it's possible to accumulate extra costs from unnecessary inter-zonal communication between your pods.

In this section, you'll observe the network traffic of your cluster and move two chatty pods, pods which are generating a lot of traffic to one another, to be in the same zone.

  1. In your Cloud Shell tab, create a new regional cluster (this command will take a few minutes to complete):
gcloud container clusters create regional-demo --region={{{project_0.default_region | "REGION"}}} --num-nodes=1

To demonstrate traffic between your pods and nodes, you will create two pods on separate nodes in your regional cluster. You will then use ping to generate traffic from one pod to the other, which you can monitor.

  1. Run this command to create a manifest for your first pod:
cat << EOF > pod-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    security: demo
spec:
  containers:
  - name: container-1
    image: wbitt/network-multitool
EOF
  1. Create the first pod in Kubernetes by using this command:
kubectl apply -f pod-1.yaml
  1. Next, run this command to create a manifest for your second pod:
cat << EOF > pod-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-2
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - demo
        topologyKey: ""
  containers:
  - name: container-2
    image:
EOF
  1. Create the second pod in Kubernetes:
kubectl apply -f pod-2.yaml

Click Check my progress to verify that you've performed the above task. Check Pod Creation

The pods you created use the node-hello container and output a Hello Kubernetes message when requested.

If you look back at the pod-2.yaml file you created, you can see that Pod Anti Affinity is a defined rule. This enables you to ensure that the pod is not scheduled on the same node as pod-1. This is done by matching an expression based on pod-1’s security: demo label. Pod Affinity is used to ensure pods are scheduled on the same node, while Pod Anti Affinity is used to ensure pods are not scheduled on the same node.

Note: Kubernetes also has a concept of Node Affinity, which can help you optimize which applications are run on what machine types.

In this case, Pod Anti Affinity is being used to help illustrate traffic between nodes, but smart use of Pod Anti Affinity and Pod Affinity can help you utilize your regional cluster's resources even better.
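
The effect of the two rules can be illustrated with a toy Python model. This is only a sketch of the idea, not the Kubernetes scheduler: it assumes the rule is evaluated per node, and the node names (node-a, node-b, node-c) are hypothetical.

```python
def allowed_nodes(nodes, running_pods, label, anti_affinity):
    """Return the nodes a new pod may use under a required (anti-)affinity rule.

    label is a (key, value) pair matched against the labels of running pods.
    """
    key, value = label
    # Nodes that already host a pod carrying the matching label.
    occupied = {
        pod["node"] for pod in running_pods if pod["labels"].get(key) == value
    }
    if anti_affinity:
        # Anti-affinity: avoid every node that hosts a matching pod.
        return [node for node in nodes if node not in occupied]
    # Affinity: only nodes that host a matching pod qualify.
    return [node for node in nodes if node in occupied]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
running = [{"name": "pod-1", "node": "node-a", "labels": {"security": "demo"}}]

anti = allowed_nodes(nodes, running, ("security", "demo"), anti_affinity=True)
same = allowed_nodes(nodes, running, ("security", "demo"), anti_affinity=False)

print("anti-affinity candidates:", anti)
print("affinity candidates:", same)
```

In this model, the anti-affinity rule leaves every node except the one running pod-1, while the affinity rule leaves only that node.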

  1. View the pods you created:
kubectl get pod pod-1 pod-2 --output wide

You will see both pods returned with a Running status and internal IPs.

Sample output:

NAME    READY   STATUS    RESTARTS   AGE     IP   NODE                                           NOMINATED NODE   READINESS GATES
pod-1   1/1     Running   0          4m40s        gke-regional-demo-default-pool-abb297f1-tz3b
pod-2   1/1     Running   0          4m31s        gke-regional-demo-default-pool-28b6c708-qn7q

Take note of the IP address of pod-2. You will use it in the following command.

Simulate traffic

  1. Get a shell to your pod-1 container:
kubectl exec -it pod-1 -- sh
  1. In your shell, send a request to pod-2 replacing [POD-2-IP] with the internal IP displayed for pod-2:
ping [POD-2-IP]

Take note of the average latency it takes to ping pod-2 from pod-1.

Examine flow logs

With pod-1 pinging pod-2, you can enable flow logs on the subnet of the VPC in which the cluster was created to observe the traffic.

  1. In the Cloud Console, open the Navigation Menu and select VPC Network in the Networking section.
  1. Locate the default subnet in the region and click on it.

Highlighted default subnet for us-central1

  1. Click Edit at the top of the screen.

  2. Select Flow Logs to be On.

  3. Then, click Save.

  4. Next, click View Flow Logs in Logs Explorer.

The View Flow Logs option highlighted within the Flow Logs menu.

You'll now see a list of logs that display a large amount of information any time something was sent or received from one of your instances.

A list of logs, along with their summary, timestamps, and severity.

If no logs appear, replace the / before vpc_flows in the query with %2F, as shown in the screenshot above.

This can be a little difficult to read. Next, export it to a BigQuery table so you can query the relevant information.

  1. Click on More actions > Create Sink.

Two options in the More actions drop-down menu: Create sink, and Manage alerts.

  1. Name your sink FlowLogsSample.

  2. Click Next.

Sink destination

  • For your Sink Service, select BigQuery Dataset.
  • For your BigQuery Dataset, select Create new BigQuery dataset.
  • Name your dataset us_flow_logs, and click CREATE DATASET.

Everything else can be left as-is.

  1. Click Create Sink.

  2. Now, inspect your newly created dataset. In the Cloud Console, from the Navigation Menu in the Analytics section, click BigQuery.

  1. Click Done.

  2. Select your project name, and then select the us_flow_logs to see the newly created table. If no table is there, you may need to refresh until it has been created.

  3. Click on the compute_googleapis_com_vpc_flows_xxx table under your us_flow_logs dataset.

The Explorer pane, which includes the search box, the pinned projects, and table under the us_central_flow_logs dataset.

  1. Click on Query > In new tab.

  2. In the BigQuery Editor, paste this in between SELECT and FROM:
jsonPayload.src_instance.zone AS src_zone, jsonPayload.src_instance.vm_name AS src_vm, jsonPayload.dest_instance.zone AS dest_zone, jsonPayload.dest_instance.vm_name
  1. Click Run.

Query results displayed in the BigQuery Editor, along with the options: Save, More, and Schedule.

You'll see the flow logs from before but filtered by source zone, source vm, destination zone, and destination vm.

Locate a few rows where there are calls being made between two different zones in your regional-demo cluster.

Two rows within the regional-demo cluster: us-central1-a and us-central1-c.

Note: Your logs will not be exactly the same numerically as the example image.

Observing the flow logs, you can see that there is frequent traffic between different zones.
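
If you export the query results, picking out the cross-zone rows is a simple comparison of the two zone columns. The Python sketch below shows the idea on a handful of made-up rows; the row values are hypothetical, and only the src_zone and dest_zone column names come from the query in this lab.

```python
from collections import Counter


def count_cross_zone(rows):
    """Count flow-log rows whose source and destination zones differ."""
    counts = Counter()
    for row in rows:
        src = row.get("src_zone")
        dest = row.get("dest_zone")
        # Skip rows without both zones (for example, traffic leaving the VPC).
        if src and dest and src != dest:
            counts[(src, dest)] += 1
    return counts


# Hypothetical sample rows for illustration only.
sample_rows = [
    {"src_zone": "us-central1-a", "dest_zone": "us-central1-c"},
    {"src_zone": "us-central1-c", "dest_zone": "us-central1-a"},
    {"src_zone": "us-central1-a", "dest_zone": "us-central1-a"},
    {"src_zone": "us-central1-a", "dest_zone": "us-central1-c"},
    {"src_zone": "us-central1-a", "dest_zone": None},
]

cross_zone = count_cross_zone(sample_rows)
for (src, dest), n in cross_zone.most_common():
    print(f"{src} -> {dest}: {n} flows")
```

Any zone pair that shows up here with a high count is a candidate for co-locating the pods involved.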

Next, you will move the pods into the same zone and observe the benefits.

Move a chatty pod to minimize cross-zonal traffic costs

  1. Back in Cloud Shell, press Ctrl + C to cancel the ping command.

  2. Type the exit command to exit pod-1's shell:
exit

  1. Run this command to edit the pod-2 manifest:
sed -i 's/podAntiAffinity/podAffinity/g' pod-2.yaml

This changes your Pod Anti Affinity rule into a Pod Affinity rule while still using the same logic. Now pod-2 will be scheduled on the same node as pod-1.

  1. Delete the current running pod-2:
kubectl delete pod pod-2
  1. With pod-2 deleted, recreate it using the newly edited manifest:
kubectl create -f pod-2.yaml

Click Check my progress to verify that you've performed the above task. Simulate Traffic

  1. View the pods you created and ensure they are both Running:
kubectl get pod pod-1 pod-2 --output wide

From the output, you can see that pod-1 and pod-2 are now running on the same node.

Take note of the IP address of pod-2. You will use it in the following command.

  1. Get a shell to your pod-1 container:
kubectl exec -it pod-1 -- sh
  1. In your shell, send a request to pod-2 replacing [POD-2-IP] with the internal IP for pod-2 from the earlier command:
ping [POD-2-IP]

You'll notice the average ping time between these pods is much lower now.

At this point, you can go back to your flow logs BigQuery dataset and check recent logs to verify there are no more undesired inter-zonal communications.

Cost analysis

Take a look at the VM-VM egress pricing within Google Cloud:

Three Google Cloud traffic types listed, along with their prices which range from $0 to $0.01 per GB.

When the pods were pinging each other from different zones, it was costing $0.01 per GB. While that may seem small, it could add up very quickly in a large scale cluster with multiple services making frequent calls between zones.

When you moved the pods into the same zone, the pinging became free of charge.
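
To get a feel for how this adds up, the following Python sketch estimates a monthly bill from the per-GB prices above. The traffic volume of 500 GB per day and the 30-day month are hypothetical values chosen for illustration; a real ping generates far less traffic.

```python
# Per-GB prices taken from the pricing discussed above.
CROSS_ZONE_PRICE_PER_GB = 0.01
SAME_ZONE_PRICE_PER_GB = 0.0


def monthly_egress_cost(gb_per_day, price_per_gb, days=30):
    """Estimate the monthly cost of VM-to-VM traffic at a flat per-GB price."""
    return gb_per_day * days * price_per_gb


# Hypothetical volume: services exchanging 500 GB per day between pods.
cross_zone_cost = monthly_egress_cost(500, CROSS_ZONE_PRICE_PER_GB)
same_zone_cost = monthly_egress_cost(500, SAME_ZONE_PRICE_PER_GB)

print(f"cross-zone: ${cross_zone_cost:.2f} per month")
print(f"same zone:  ${same_zone_cost:.2f} per month")
```

At that assumed volume, keeping the chatty pods in one zone avoids about $150 a month in network charges.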


You explored cost-optimization for virtual machines that are part of a GKE cluster: first by migrating a workload to a node pool with a better-fitted machine type, then by building an understanding of the pros and cons of different regions, and finally by moving a chatty pod within a regional cluster so that it is always in the same zone as the pod it communicates with.

This lab has shown you cost-effective tools and strategies for GKE VMs, but optimizing your virtual machines means first understanding your application and its needs. Knowing what kinds of workloads you will be running and estimating your application's demands will almost always influence the decision as to which location and machine type will be most effective for the virtual machines underlying your GKE cluster.

Efficient utilization of your cluster's infrastructure will go a long way toward optimizing your costs.

Next steps / Learn more

Manual Last Updated April 30, 2024

Lab Last Tested April 30, 2024

Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.