
Improving Network Performance II


1 hour 30 minutes 7 Credits

GSP046


Overview

In this hands-on lab you'll learn through some real-world scenarios: you'll re-create the environments and work on improving network performance using load balancers and other Google Cloud products. At the end, you'll go through an exercise on adjusting the TCP window size and observing the effect on bandwidth.

This lab was adapted from blog posts by Colt McAnlis: Profiling GCP's Load Balancers, Removing the Need for Caching Servers with GCP's Load Balancers, and The Bandwidth Delay Problem. Colt blogs about Google Cloud network performance on Medium.

Some of the resources you'll need for this lab have been created for you to save time, and some you will create.

What you'll learn

  • What load balancers are offered through Google Cloud

  • How load balancers can improve network performance

  • How to resize your window

Prerequisites

  • Basic knowledge of Google Cloud services (best obtained by having previously taken the labs in the Google Cloud Essentials Quest)

  • Basic Google Cloud networking and TCP/IP knowledge (best obtained by having taken the earlier labs in the Networking in the Google Cloud Quest)

  • Basic Unix/Linux command line knowledge

Setup and Requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges to be incurred on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Use case 1: Performance overhead using Google Cloud load balancer

With an estimated 4 billion IoT devices in the world by 2020, scaling your services to handle the sudden load of a million new devices is an important feature.

This concern was explicitly highlighted by the IoT Roulette group. Their company offers a scalable way for IoT devices to connect with cloud backends through a common and simple transport layer and SDK. As you can imagine, scale is really important to IoT Roulette.

Profiling Google Cloud's load balancers

If the term Load Balancer is new to you, the gist is this: These are intermediary systems which scale up new backends based upon frontend work load. When configured the right way, load balancers make it possible to do cool stuff, like 1,000,000 queries per second.

Google Cloud provides two primary load balancers: TCP/UDP (aka L4) and HTTP (aka L7). The TCP/UDP load balancer acts much like you'd expect - a request comes in from the client to the load balancer, and is forwarded along to a backend directly.

[Diagram: TCP/UDP (L4) load balancer forwarding client requests directly to a backend]

The HTTP load balancer, on the other hand, has some serious magic going on. For the HTTP load balancer, traffic is proxied through Google Front End (GFE) Servers which are typically located close to the edge of Google's global network. The GFE terminates the TCP session and connects to a backend in a region which has the capacity to serve the traffic.

[Diagram: HTTP (L7) load balancer proxying traffic through Google Front End servers at the network edge]
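A handy side effect of this proxying, worth noting for later: responses that pass through a GFE typically carry a Via: 1.1 google response header. Once your HTTP load balancer is up, you can confirm the GFE is in the path (the <lb-ip> placeholder is whatever frontend IP you end up with):

# Proxied responses from the HTTP load balancer usually include "Via: 1.1 google"
curl -sI http://<lb-ip>/ | grep -i '^via'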

So, with this in mind, let's set up some load balancers and test them, to determine which type of Google Cloud load balancer will provide the best performance.

Establishing baseline with no load balancer

Before talking about what types of overhead the load balancers add, we first need to get a baseline estimate of connectivity to a cloud instance from a machine outside of the cloud cluster. IoT Roulette was more concerned with bandwidth performance than simple ping times, so we tested 500 iterations of a curl command from an instance in Europe fetching from an instance in us-central1.

[Graph: 500 curl iterations from Europe to us-central1, no load balancer]

Now you try. Create a baseline for the instances provided in the lab.

Create baseline for network speed without load balancers

Create a new instance with the following configuration:

  • Name: instance-3
  • Region: europe-west1
  • Zone: europe-west1-d
  • Boot Disk: CentOS 7

Then:

  • Click the Advanced options drop-down.
  • Click the Networking tab.
  • In Network interfaces, click the default interface and select:
  • Network: nw-testing from the drop-down
  • Subnetwork: europe-west1

Click Create.
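For reference, roughly the same instance could be created from the command line with gcloud. This is a sketch only; the CentOS image family and project names are assumptions, while the zone, network, and subnet come from the steps above:

# Sketch: create instance-3 with the same settings as the console steps
gcloud compute instances create instance-3 \
    --zone=europe-west1-d \
    --image-family=centos-7 --image-project=centos-cloud \
    --network=nw-testing --subnet=europe-west1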

Click Check my progress to verify the objective.

Create a new VM instance

Now SSH into instance-3 and execute the following command:

sudo yum install bc

Run the following shell script (on instance-3) at the command prompt. Substitute in the external IP address of any instance from the instance group (i.e. instances with the prefix myinstance-grp-pri) where it says <ip>.

for ((i=1;i<=10;i++)); do echo total_time: $(curl -s -o /dev/null -w '1000*%{time_total}\n' http://<ip> | bc); done

The above command outputs the total time, in milliseconds, of ten curl runs against a simple web server, testing between Europe and the US to get a baseline performance between regions.

Example Output (yours will differ):

total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 211.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 212.000

Copy the curl command output. You're going to use it in the next step.
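As an aside, here is a more readable equivalent of the one-liner above, with an optional average tacked on. The <ip> placeholder is the same; the awk summary is an extra convenience, not part of the lab's command:

for i in $(seq 1 10); do
  # curl prints "1000*<seconds elapsed>", which bc evaluates to milliseconds
  curl -s -o /dev/null -w '1000*%{time_total}\n' "http://<ip>" | bc
done | awk '{ print "total_time:", $1; sum += $1 }
            END { printf "average: %.1f ms\n", sum / NR }'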

Comparing load balancer types

Now you'll determine the performance overhead of using a load balancer. You'll create an HTTP load balancer and a TCP load balancer, test them, and compare the differences.

HTTP Load Balancer

Navigate to Navigation menu > Network Services > Load balancing.

Click on Create load balancer, then click Start Configuration on the HTTP(S) Load Balancing tile.


On the Create a load balancer page, leave the default settings and click Continue.

Name the load balancer my-http-lb.


Create frontend

Click on Frontend Configuration. You don't have to do anything here. Keep the default values.

Backend configuration

Click Backend configuration.

From the Backend services & backend buckets drop-down, select Create a backend service. Use the following configuration:

  • Name: my-http-lb-backend
  • Instance group: Select myinstance-grp-pri-igm from the drop-down.
  • Port: 80
  • Health Check: Select Create a health check.


Configure the Health Check with the following details:

  • Name: instance-group-hc
  • Protocol: TCP
  • Port: 80
  • Proxy Protocol: None
  • Request: /
  • Check Interval: 10 seconds
  • Unhealthy Threshold: 3 consecutive failures
  • Click Save.


  • Cloud CDN: check Enable Cloud CDN

Click Create.

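For reference, here is a rough gcloud equivalent of the health check and backend service configured above. This is a sketch, not what the console runs; in particular, the instance group's zone (us-central1-a here) is an assumption:

# Health check matching the settings above
gcloud compute health-checks create tcp instance-group-hc \
    --port=80 --check-interval=10s --unhealthy-threshold=3
# Backend service with Cloud CDN enabled
gcloud compute backend-services create my-http-lb-backend \
    --global --protocol=HTTP \
    --health-checks=instance-group-hc --enable-cdn
# Attach the instance group as a backend (zone assumed)
gcloud compute backend-services add-backend my-http-lb-backend \
    --global \
    --instance-group=myinstance-grp-pri-igm \
    --instance-group-zone=us-central1-a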

Create Backend Bucket

Again from the Backend services & backend buckets drop-down, choose Create a backend bucket.


Create a bucket with the following configuration:

  • Name: my-http-lb-backend-bucket
  • Cloud Storage Bucket: Click Browse, then click the New Bucket icon.

Storage bucket details:

  • Name: Enter a unique name for your bucket
  • Storage class: Multi-Regional
  • Location: United States
  • Click Create
  • Then click Select.

One more setting:

  • Cloud CDN: check Enable Cloud CDN


Click Create.
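The backend bucket also has a short gcloud equivalent; a sketch, with the bucket name left as a placeholder for whatever unique name you chose above:

gcloud compute backend-buckets create my-http-lb-backend-bucket \
    --gcs-bucket-name=<your-unique-bucket-name> --enable-cdn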

Configure Host and Path Rules

Now click on Routing rules.

In the my-http-lb-backend-bucket row, configure:

  • Hosts: *
  • Paths: /static/*

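For reference, the same routing rule can be expressed with gcloud. This is a sketch: the path matcher name is made up, and it assumes the load balancer's URL map shares the my-http-lb name:

# Route /static/* to the backend bucket; everything else to the backend service
gcloud compute url-maps add-path-matcher my-http-lb \
    --path-matcher-name=static-paths \
    --default-service=my-http-lb-backend \
    --backend-bucket-path-rules='/static/*=my-http-lb-backend-bucket' \
    --new-hosts='*'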

Review and Finalize

Click Review and Finalize to see all of the settings for your HTTP Load Balancer, then click the Create button when you are done.


Click Check my progress to verify the objective. It might take a while for the check to pass.

Create HTTP load balancer

Create TCP load balancer

Navigate to Navigation menu > Network Services > Load balancing.

Click on Create load balancer, then click Start Configuration on the TCP Load Balancing tile:


Under Backend Type, select Target Pool or Target Instance.

The other sections can use the default settings for your load balancer. Click Continue.


Name the load balancer my-tcp-lb.

Choose us-central1 for the region.

Ensure Backend configuration is selected.

Backend configuration

Use the following configuration:

  • Instance Group: select myinstance-grp-pri-igm
  • Health Check: Select Create a health check from the drop-down.


Configure the Health Check with the following details:

  • Name: my-tcp-lb-hc
  • Click Save.


Create Frontend

Click on Frontend Configuration. Add port number 80, and leave the other values as they are. Then click Done.

Review and Finalize

Click Review and Finalize to see all of the settings for your TCP load balancer, then click Create.
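For reference, here is a rough gcloud sketch of a target-pool based TCP load balancer like this one. The instance names are placeholders, the zone is an assumption, and target pools use the legacy HTTP health check type; the console wires the instance group's VMs into the pool for you:

# Sketch: legacy health check, target pool, and forwarding rule
gcloud compute http-health-checks create my-tcp-lb-hc --port=80
gcloud compute target-pools create my-tcp-lb \
    --region=us-central1 --http-health-check=my-tcp-lb-hc
gcloud compute target-pools add-instances my-tcp-lb \
    --instances=<instance-names> --instances-zone=us-central1-a
gcloud compute forwarding-rules create my-tcp-lb-fr \
    --region=us-central1 --ports=80 --target-pool=my-tcp-lb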

Click Check my progress to verify the objective.

Create TCP load balancer

Test HTTP load balancer

Navigate to Navigation menu > Network Services > Load balancing.

Click on my-http-lb load balancer and copy the IP:Port under the Details tab.

SSH into instance-3 and execute the following command (make sure to replace <ip>:<port> with the IP:Port you copied from the HTTP load balancer):

for ((i=1;i<=10;i++)); do echo total_time: $(curl -s -o /dev/null -w '1000*%{time_total}\n' http://<ip>:<port> | bc); done

The above command outputs ten curl timings in milliseconds.

Example Output (yours will differ):

total_time: 121.000
total_time: 109.000
total_time: 119.000
total_time: 121.000
total_time: 109.000
total_time: 120.000
total_time: 120.000
total_time: 109.000
total_time: 251.000
total_time: 122.000

Copy the curl command output. You're going to use it in the next step.

Test TCP load balancer

Navigate to Navigation menu > Network Services > Load balancing.

Click on the my-tcp-lb load balancer and copy the IP:Port under the Details tab.

SSH into instance-3 and execute the following command (make sure to replace <ip>:<port> with the IP:Port you copied from the TCP load balancer):

for ((i=1;i<=10;i++)); do echo total_time: $(curl -s -o /dev/null -w '1000*%{time_total}\n' http://<ip>:<port> | bc); done

The above command outputs ten curl timings in milliseconds.

Example Output (yours will differ):

total_time: 212.000
total_time: 212.000
total_time: 213.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 212.000
total_time: 211.000
total_time: 212.000

Look back at the baseline data you collected earlier and compare it to the results from the two load balancers. You can see that the HTTP load balancer is faster than the TCP load balancer.

Why HTTP load balancing can be faster

When you look at how the HTTP load balancer is working under the hood, you can see why there's a difference in performance.

When a request hits the HTTP load balancer, the TCP session stops there. The GFEs then move the request on to Google's private network and all further interactions happen between the GFE and your specific backend.

Now here's the important bit: after a GFE connects to a backend to handle a request, it keeps that connection open. This means that future requests from this GFE to this specific backend will not need the overhead of creating a connection. Instead, it can just start sending data ASAP.

[Graphs: response times showing latency spikes on first requests, then lower latency once GFE connections are reused]

The result of this setup is that the first query, which causes the GFE to open a connection to the backend, sees higher response times (those are the large spikes). However, subsequent packets routed to the same backend see a lower minimum latency.

Google Cloud load balancer conclusion

So, for IoT Roulette's case, the decision was made to use an HTTP load balancer. The more popular your service gets in a region, the better your HTTP load balancer will perform. The first ~100 clients will result in GFE connections being made to the backends, while the next ~100 clients will have faster fetches since those connections have already been established.

While the HTTP load balancer was ideal for IoT Roulette, there's a whole set of reasons why a TCP balancer might be better for your use case. To figure out which one is best for the scenario you're running into, please watch the NEXT 2016 talk or read the official docs.

Use case 2: Load balancing and caching / CDN

Tax Lemming is a startup out of Vancouver, BC, which focuses on helping you make sense of the purchases, taxes, and basic bookkeeping for your small business. Tax Lemming has too many instances spinning up, and needs to cut that number down before going for another round of VC funding.

Like most web based applications, Tax Lemming's architecture looks something like this:

[Diagram: clients fetching static and dynamic content through a single Apache server instance backed by a relational DB]

At first glance, it's easy to see that this can become expensive quickly. All the static content is being sent through the server instance, so they end up paying for compute hours, and each request requires the Apache server to re-hit the relational DB.

The common answer

For most developers of web based applications, a solution to this problem looks something like this:

[Diagram: Nginx reverse proxy cache in front of the Apache server, with large static assets offloaded to a CDN]

Generally: add Nginx as a reverse caching proxy in front of the Apache server, and modify the source assets so the client fetches the big files from the CDN rather than eating server time (and gets them sent out faster).

Although this is a tried-and-true solution, there are a few issues:

  • The reverse proxy (aka Nginx, Varnish, Squid, etc.) needs a whole new instance to be set up per region. Technically it's still cheaper than the load on the server instance itself, but that's still a lot of overhead in terms of cost to cache & send content.
  • Serving static assets through a third-party CDN typically requires a whole new URL scheme (e.g. instead of url="./abc/tacos.jpg" you get url="cdn.cloudr.com/1721617282.jpg"). This isn't so much a performance issue as an aesthetic one.

Given these two nuances, there are some Google Cloud features that Tax Lemming can use to improve performance.

Additional load balancer features

Google Cloud Load Balancer already allows you to split traffic between instance groups, regions, etc. It also has two nice features that could help Tax Lemming reduce their instance count further and cut some upkeep costs as well.

CDN the cacheable requests

Google Cloud's load balancer can cache common requests with Google Cloud CDN. This reduces latency as well as the number of requests that need to be served by the instance. As long as the response is cached, it is served directly at Google's network edge.

You enabled CDN when you created the load balancer earlier!


Here's what the performance graph looks like when fetching the request through the load balancer to the instance directly vs. fetching it through the load balancer with Enable Cloud CDN turned on:

[Graph: response times fetching through the load balancer to the instance vs. with Cloud CDN enabled]

You can see it: once the asset gets cached, the response time drops significantly.
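One simple way to check whether a response was served from Cloud CDN's cache, once your load balancer is serving traffic: cache hits typically include an Age header (seconds since the object was cached). The IP and path below are placeholders:

# First request warms the cache; a repeat request should show an Age header
curl -sI http://<lb-ip>/static/logo.png | grep -i '^age'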

What's even better is that no extra instances are needed for this process. While Nginx, Varnish, and Squid require dedicated hosting on a VM, Google's load balancer + CDN is serverless.

Cloud Storage for static assets

If your content is static, you can reduce the load on the web servers by serving content directly from Cloud Storage. Typically, your compute URL is separated from your CDN URL (e.g. www.gcpa.com/mypage/17266282.jpg vs cdni.cloudcdn.com/17266282.jpg). With Google Cloud Load Balancer, you're able to create a host routing path, so that gcpa.com/mypage will route over to fetch assets from a public Cloud Storage bucket, which is cached by Google Cloud CDN.

The setup for adding a Cloud Storage bucket is straightforward. Earlier in the lab you created a storage bucket. You can have a backend service, which points to an instance group, or a backend bucket, which points to a Cloud Storage bucket.


Performance of your backend bucket is improved even further by enabling Cloud CDN when you create it in the load balancer UI.

Here is a quick exercise for creating a storage bucket with a static folder:

Create a Storage Bucket

Navigate to Navigation menu > Storage, then click on "Create Bucket".

  • Name: Enter a unique name for your bucket and then click Continue.
  • Storage class: Multi-Regional
  • Location: United States
  • Click Create


Create a static Folder

While in the bucket, click "Create folder" in the top toolbar.

  • Name: static

Click Create.

Created this way, the bucket can be selected when you build the backend bucket for your load balancer.
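As a quick sanity check once the bucket exists, you could drop a test asset into the static folder with gsutil and make it publicly readable so the load balancer can serve it. This is a sketch; logo.png is a hypothetical file, and the bucket name is your own:

# Upload a test asset into the bucket's static/ folder
gsutil cp logo.png gs://<your-unique-bucket-name>/static/
# Make it publicly readable so it can be served via the backend bucket
gsutil acl ch -u AllUsers:R gs://<your-unique-bucket-name>/static/logo.png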

Click Check my progress to verify the objective.

Create a Storage Bucket

Load balancing and caching conclusion

Armed with this information about the power of Google's Load Balancer, Tax Lemming updated their architecture:

[Diagram: updated architecture with Cloud Load Balancing, Cloud CDN, and a backend bucket for static assets]

This change resulted in lots of new caching for dynamic requests (with the proper headers), which helped reduce the number of requests to the backend, spinning up fewer instances to service the same load.

Use case 3: Google Cloud networking and bandwidth delay problem

Tutorama is a company built to create a crowd-sourced solution to instructional videos. Users all over the world can upload screencasts, recordings, and other videos to help teach people how to do everything from properly walking a dog to changing the oil in your car. Tutorama recently upgraded their connection from on-premises to Google Cloud, but despite having big pipes to connect to Google Cloud instances, they still get poor performance.

We've looked at some networking issues already, and they were all ruled out with simple tests:

  • Core count - Their 8-vCPU machine should have a max of 16 Gbits/sec, so that's not the problem.
  • Internal / external IP - This doesn't impact the throughput. Something else is keeping it arbitrarily low.
  • Region - Obviously we're crossing regions here, but that's kinda the point. We can't solve this by putting the box closer to the client.

So what's going on?

Bandwidth delay product

Like most modern operating systems, Linux does a good job of auto-tuning the TCP buffers. In some cases, though, the default maximum Linux TCP buffer sizes are still too small. When this is the case, throughput suffers from what's known as the bandwidth delay problem.

The TCP window is the maximum number of bytes that can be sent before an ACK must be received. If either the sender or receiver is frequently forced to stop and wait for ACKs for previously sent packets, gaps are created in the data flow, which limits the maximum throughput of the connection.

[Diagram: sender sitting idle while waiting for ACKs when the TCP window is too small]

Finding the right window size

The optimal window size is twice the bandwidth delay product. You can compute the optimal window size if you know the RTT and the available bandwidth on both ends. There are lots of great resources that explain how to compute your window sizes. Rather than covering that here, check out this bandwidth calculator: https://www.switch.ch/network/tools/tcp_throughput/.
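As a quick worked example (hypothetical numbers, not Tutorama's): the bandwidth delay product is just bandwidth times round-trip time, and you can compute it at the shell with the bc tool you installed earlier:

# Hypothetical link: 1 Gbit/sec of available bandwidth, 100 ms RTT
BW_BITS=1000000000
RTT_S=0.100
# BDP in bytes = bandwidth (bits/sec) * RTT (sec) / 8
BDP=$(echo "$BW_BITS * $RTT_S / 8" | bc)
echo "BDP: $BDP bytes"                                # 12500000 bytes (~12.5 MB)
echo "optimal window: $(echo "$BDP * 2" | bc) bytes"  # twice the BDP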

For Tutorama, we determined their maximum available bandwidth and the maximum anticipated latency, and plugged them into one of the available calculators. Their tcp_rmem value was set to 125 KB and their tcp_wmem to 64 KB, then the test was re-run:

[Screenshot: iperf re-run showing roughly 2.10 - 2.20 Mbits/sec]

2.10 - 2.20 Mbits/sec is much better than what they were getting, but not as good as the default value (90 Mbits/sec). To see why, we looked at the default values for a new instance:

[Screenshot: default TCP buffer values on a new instance]

As such, it's generally a good idea to leave net.ipv4.tcp_mem alone, as the defaults are fine. A number of performance experts say to also increase net.core.optmem_max to match net.core.rmem_max and net.core.wmem_max, but we have not found that it makes any difference. Using the default window size usually provides the best bandwidth.
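If you want to see where your own instances stand before touching anything, all of these settings can be read with sysctl; this is a read-only check using the standard Linux key names discussed above:

# Print current TCP buffer limits and core socket memory settings
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem \
       net.core.rmem_max net.core.wmem_max net.core.optmem_max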

Change the window size

Your turn to try this out. In your lab, you have two instances that you'll compare. First you'll find the current value of your bandwidth delay product, then you'll change the size of your window and re-run the test to see what happens.

In the Cloud Console, navigate to Navigation menu > Compute Engine and review instance-1 and instance-2.


Default TCP window and bandwidth

SSH into instance-1, then run this command:

cat /proc/sys/net/ipv4/tcp_rmem

This is the receiver.

Example Output

gcpstaging6945_student@instance-1:~$ cat /proc/sys/net/ipv4/tcp_rmem
4096    87380   6291456

These are the minimum, default, and maximum TCP receive buffer sizes, in bytes, for instance-1.

SSH into instance-2, then run this command:

cat /proc/sys/net/ipv4/tcp_wmem

This is the sender.

Example Output

gcpstaging6945_student@instance-2:~$ cat /proc/sys/net/ipv4/tcp_wmem
4096    16384   4194304

These are the minimum, default, and maximum TCP send buffer sizes, in bytes, for instance-2.

Now find out the default bandwidth between these instances. Note the TCP window size when you run iperf.

In the receiver's SSH window, run:

iperf -s

Example Output:

gcpstaging6945_student@instance-1:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------

In the sender's SSH window, run:

iperf -c <external ip address of receiver>

Example Output

gcpstaging6945_student@instance-2:~$ iperf -c 104.155.8.102
------------------------------------------------------------
Client connecting to 104.155.8.102, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.40.0.2 port 47356 connected with 104.155.8.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   241 MBytes   202 Mbits/sec

In your lab, note what your bandwidth is.

Adjusted window and bandwidth

Now adjust the window size, and see what happens to your bandwidth.

In the receiver's SSH window, run the following to adjust the tcp_rmem to the value below:

sudo sysctl -w net.ipv4.tcp_rmem="4096 65536 131072"

Example Output

gcpstaging6945_student@instance-1:~$ sudo sysctl -w net.ipv4.tcp_rmem="4096 65536 131072"

In the sender's SSH window run the following to adjust the tcp_wmem to the value below:

sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 131072"

Example Output

gcpstaging6945_student@instance-2:~$ sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 131072"

In the receiver's SSH window, change the window size to 64kb:

iperf -s -w 64kb

Example Output

gcpstaging6945_student@instance-1:~$ iperf -s -w 64kb
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  125 KByte (WARNING: requested 62.5 KByte)
------------------------------------------------------------

Run the following in the sender's SSH window:

iperf -c <external ip address of receiver>

Example Output

gcpstaging6945_student@instance-2:~$ iperf -c 104.155.8.102
------------------------------------------------------------
Client connecting to 104.155.8.102, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 10.40.0.2 port 47344 connected with 104.155.8.102 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.4 sec  5.75 MBytes  4.63 Mbits/sec

With the TCP window constrained this way, the new transfer rate and bandwidth are much slower. As noted earlier, the default window size usually provides the best bandwidth.
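If you want to keep experimenting, you can put the buffers back to the defaults you recorded earlier. The values below come from the example outputs above; use the numbers from your own instances:

# On the receiver (instance-1):
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456"
# On the sender (instance-2):
sudo sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"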

If You Have More Time

In the Qwiklabs lab interface, under Student Resources on the left-hand side, you'll see links to videos related to this lab. They are worth watching!


Congratulations!


Finish Your Quest

This self-paced lab is part of the Qwiklabs Quest Network Performance and Optimization. A Quest is a series of related labs that form a learning path. Completing this Quest earns you the badge above, to recognize your achievement. You can make your badge (or badges) public and link to them in your online resume or social media account. Enroll in this Quest and get immediate completion credit if you've taken this lab. See other available Qwiklabs Quests.

Take Your Next Lab

Continue your Quest with Building High-throughput VPN, or check out these suggestions:

Next Steps / Learn More

Google Cloud training and certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated August 12, 2022
Lab Last Tested August 12, 2022