Omnibond: Creating an HPC Environment in Google Cloud with CloudyCluster

1 hour 30 minutes 5 Credits

This lab was developed with our partner, Omnibond. Your personal information may be shared with Omnibond, the lab sponsor, if you have opted-in to receive product updates, announcements, and offers in your Account Profile.

GSP862

Google Cloud Self-Paced Labs

Overview

In this lab, you create a complete turn-key High Performance Computing (HPC) environment in Google Cloud. This environment will provide the familiar look and feel of on-prem HPC systems but with the added elasticity and scalability of Google Cloud.

In this lab you see how CloudyCluster can easily create HPC/HTC jobs that will run on-prem or in CloudyCluster on Google Cloud. You can rely on the familiar look and feel of a standard HPC environment while embracing the capabilities and elasticity of Google Cloud. The HPC jobs can be easily configured to support many instance types including GPU, preemptible, and any number of memory & CPU configurations. You will always have the latest computational technology at your fingertips.

With CloudyCluster, users can now take advantage of the GUI developed by Open OnDemand. In this lab you will be able to experience many of the Open OnDemand HPC tools. Upload and download files with a file browser-like interface. Draft job scripts with the built-in web editor. Spin up new computing instances and have them tear down automatically after your specified work window. The current release includes JupyterLab via JupyterHub and Cloudy desktop capabilities.

As part of Google Cloud and CloudyCluster you have a vast array of storage technologies available to you. In this lab you will take advantage of High Performance Parallel Storage for job execution.

Objectives

In this lab, you will learn how to perform the following tasks:

  • Set up Cloud Firestore
  • Create a project-wide SSH Key
  • Use Cloud Shell to run a script to create a Service Account with the necessary permissions for the CloudyCluster Control Instance
  • Launch CloudyCluster
  • Launch the Control Instance from the Google Cloud Marketplace
  • Configure the Control Instance
  • Set up the HPC environment
  • Run an HPC job
  • Set up the sample HPC job
  • Edit the sample job
  • Launch the sample job
  • Delete CloudyCluster
  • Delete the HPC environment
  • Delete the Control Instance

Prerequisites

Familiarity with HPC concepts and batch processes will help you understand the goals. This lab can also be used in conjunction with courses and workshops that require an introduction to HPC.

The following labs will help provide the Google Cloud foundations for this lab:

Setup and Requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
  • Time to complete the lab.

Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.

Note: If you are using a Chrome OS device, open an Incognito window to run this lab.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.

  2. Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Open the tabs in separate windows, side-by-side.

  3. In the Sign in page, paste the username that you copied from the left panel. Then copy and paste the password.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Training credentials. If you have your own Google Cloud account, do not use it for this lab, to avoid incurring charges.

  4. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.

Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID.

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

You can list the active account name with this command:

gcloud auth list

(Output)

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

(Example output)

Credentialed accounts:
 - google1623327_student@qwiklabs.net

You can list the project ID with this command:

gcloud config list project

(Output)

[core]
project = <project_ID>

(Example output)

[core]
project = qwiklabs-gcp-44776a13dea667a6
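
When scripting the steps that follow, it can help to confirm that the active project is the temporary lab project rather than a personal one. The sketch below is a minimal guard, not part of the lab. It assumes lab project IDs start with qwiklabs-gcp-, which is inferred only from the example output above, and the function name is_lab_project is hypothetical.

```shell
# Hypothetical guard: succeed only for project IDs that look like lab projects.
# Assumption: lab project IDs start with "qwiklabs-gcp-", as in the example output.
is_lab_project() {
  case "$1" in
    qwiklabs-gcp-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query gcloud when it is installed, so the sketch is harmless elsewhere.
if command -v gcloud >/dev/null 2>&1; then
  project=$(gcloud config get-value project 2>/dev/null) || project=""
  if ! is_lab_project "$project"; then
    echo "Warning: '$project' does not look like a lab project" >&2
  fi
fi
```

The check is only a naming heuristic; the credentials panel remains the authoritative source for which project to use.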

Google Cloud Project Setup

To start, you will set up the project for Native Cloud Firestore Mode. Follow these steps in the Google Cloud Console to activate it.

  1. In the Navigation menu, under Databases, click Firestore.

  2. Click Select Native Mode.

  3. Next you will choose your location. For the purposes of this lab, choose nam5 (United States).

  4. Lastly, click Create Database. Once your database has been created, you are ready to continue to the next sections.

Create a project-wide SSH Key

CloudyCluster uses project-wide SSH keys to provide debugging access to the instances. For more information on how to configure them, you can refer to the Managing SSH Keys in Metadata page.

  1. Open a new Cloud Shell window.

  2. In Cloud Shell, use ssh-keygen to generate a new key pair. Replace [USERNAME] with your Google Cloud login. If you are unclear what your username is, use the output of the whoami command in your Cloud Shell as your [USERNAME]. For the lab, it's okay to use an empty passphrase.

ssh-keygen -t rsa -f ~/.ssh/vm-ssh-key -C [USERNAME]
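
If you prefer to skip the interactive passphrase prompts, the key pair can be generated in a single non-interactive call. This is a sketch rather than a lab step: make_lab_key is a hypothetical helper, and it relies on the standard ssh-keygen options -N (the new passphrase, empty here) and -q (quiet). An empty passphrase is acceptable only because this is a temporary lab key.

```shell
# Hypothetical helper: create an RSA key pair with an empty passphrase.
# Usage: make_lab_key <key-file> <comment>
make_lab_key() {
  key_file=$1
  comment=$2
  # Refuse to overwrite an existing key; ssh-keygen would otherwise prompt.
  if [ -e "$key_file" ]; then
    echo "Key already exists: $key_file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$key_file")"
  ssh-keygen -q -t rsa -N "" -f "$key_file" -C "$comment"
}

# Example call for this lab (left commented so nothing is created by accident):
# make_lab_key "$HOME/.ssh/vm-ssh-key" "$(whoami)"
```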

Import the public key to Google Cloud

  1. In Cloud Shell, register your public key with Google Cloud:

gcloud compute config-ssh --ssh-key-file=~/.ssh/vm-ssh-key

You can ignore the warning No host aliases were added... because the command also attempts to update Compute Engine VM instances, but no instances have been created yet.

  2. In the Navigation menu, go to Compute Engine > Metadata.

  3. Click SSH Keys. Verify the SSH key exists with your student username.

Create a Service Account for the CloudyCluster Control Instance

  1. Back in Cloud Shell, run the following commands to first set some environment variables:

export PROJECT=$(gcloud config get-value project)
export NAME="omnibond-sa"

  2. Next, run the following commands to create and assign roles to the new service account.

# Create the service account
gcloud iam service-accounts create $NAME --project=$PROJECT --display-name "CloudyCluster service account"

# Add the Datastore Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/datastore.user

# Add the Network Admin Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/compute.networkAdmin

# Add the Security Admin Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/compute.securityAdmin

# Add the Project IAM Admin Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/resourcemanager.projectIamAdmin

# Add the Service Account Admin Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/iam.serviceAccountAdmin

# Add the Service Account User Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/iam.serviceAccountUser

# Add the Compute Instance Admin Permissions
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/compute.instanceAdmin.v1

# Add Stackdriver Permission
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/logging.logWriter

# Add Storage Admin Permission
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$NAME@$PROJECT.iam.gserviceaccount.com --role roles/storage.admin
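
The nine role bindings above differ only in the role name, so they can also be written as a loop. The following is a minimal sketch under stated assumptions: the helper names sa_email and bind_roles and the DRY_RUN switch are hypothetical, and my-project is a placeholder used only when PROJECT is not already set. With DRY_RUN=1 (the default here) the commands are printed instead of executed.

```shell
# Placeholders: reuse PROJECT and NAME from the previous step when they are set.
PROJECT=${PROJECT:-my-project}   # "my-project" is a placeholder, not a real project
NAME=${NAME:-omnibond-sa}
DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands only, 0 = run them

# The roles listed in this lab, one per line.
ROLES="roles/datastore.user
roles/compute.networkAdmin
roles/compute.securityAdmin
roles/resourcemanager.projectIamAdmin
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/compute.instanceAdmin.v1
roles/logging.logWriter
roles/storage.admin"

# Build the service account email from its name and project ID.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Bind every role in ROLES to the service account.
bind_roles() {
  member="serviceAccount:$(sa_email "$NAME" "$PROJECT")"
  for role in $ROLES; do
    if [ "$DRY_RUN" = 1 ]; then
      echo "gcloud projects add-iam-policy-binding $PROJECT --member $member --role $role"
    else
      gcloud projects add-iam-policy-binding "$PROJECT" --member "$member" --role "$role"
    fi
  done
}
```

Calling bind_roles with the default settings prints one gcloud command per role for review; setting DRY_RUN=0 before calling it applies the bindings.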

Click Check my progress to verify the objective. Google Cloud Project Setup

Launch CloudyCluster

  1. Click the left navigation menu and select Marketplace.

  2. Search for CloudyCluster and click the matching result.

  3. From the CloudyCluster offering in the Google Cloud Marketplace, select Launch.

  4. In the deployment setup page, keep the default values, and enter the name of the service account you just created in the Script Created SA field.

  5. Select Allow CloudyCluster to use the Scopes of the Script Created SA.

  6. Accept the GCP Terms of Service and click Deploy.

  7. Once the deployment has finished, select the Admin URL link.

Note: the Admin URL link can take up to 5 minutes to become active.

Click Check my progress to verify the objective. Launch CloudyCluster

Configure the Control Instance

After you have launched the image, complete the following initial setup tasks. It takes about 5 minutes for the instance to come up and fully self-configure.

  1. Once DNS propagates, you will be taken to a page that prompts you to provide the custom metadata startup_key to authenticate that you launched the instance.

  2. Back in the Google Cloud console, click on the VM instance name.

  3. Scroll down to the Custom Metadata field. Copy the unique value and paste it into the CloudyCluster prompt.

  4. Next, provide the Admin User information to complete the initial setup (all information you provide is used only in CloudyCluster and never leaves your Google Cloud project).

  5. Create a CloudyCluster username and password to be encrypted and stored in the database in your Google Cloud project. Your password must contain at least 8 characters and include a capital letter and a number.

Note: make sure to save these CloudyCluster credentials somewhere as you will be using them later in the lab.
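
The password rule stated above (at least 8 characters, with a capital letter and a number) can be checked locally before you type the password into the form. This is a sketch only: password_ok is a hypothetical helper, and CloudyCluster may apply additional checks that are not described in this lab.

```shell
# Hypothetical pre-check mirroring the rule stated in this lab:
# at least 8 characters, at least one capital letter, at least one digit.
password_ok() {
  pw=$1
  [ "${#pw}" -ge 8 ] || return 1
  case "$pw" in *[[:upper:]]*) ;; *) return 1 ;; esac
  case "$pw" in *[[:digit:]]*) ;; *) return 1 ;; esac
  return 0
}
```

For example, `password_ok "Cluster2021" && echo ok` prints ok, while a value with no capital letter or no digit returns a non-zero status.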

You have successfully completed the initial control instance configuration.

HPC Environment Setup

In this section you will follow the steps below to create an HPC Environment.

  1. Click Overview. On the Environment Overview page, click the + button at the top left to create a new environment.

  2. Use the pulldown menus to select options for Availability Zone and Instance SSH Key. For these, use us-central1-a and the SSH key you created earlier.

  3. Enter a Name for your Environment. Environment names must be alphanumeric, between 3 and 30 characters long, and may not contain spaces or special characters. For this lab, you can use envname.

  4. Select the radio button for one of the pre-configured setup options. For this lab you will use Slurm and Test.

  5. Select the Quick Start link at the bottom of the screen.

    • Use the pre-populated Scheduler Name or assign a name, and select a Scheduler Instance Type and Scheduler Type.
    • Select a Compute Instance Type of n1-standard-1.
    • Accept the default file system name and mountpoint.
    • Select your storage Instance Type of n1-standard-1.
    • For number of storage instances, select 1.
    • Accept the default total Parallel Storage size in gigabytes. This storage space will be divided among your file system instances.
    • Enter a Login Instance Name and Instance Type of n1-standard-1.
    • Click the Next arrow button to proceed.

  6. The Final Review screen will display. You will see a summary of all instances to be created, with a cost per hour for each type and a total cost per hour for your HPC environment (not including charges for network egress).

  7. Enter a network CIDR representing an IP address range that is permitted to access your environment. For this lab, enter 0.0.0.0/0.

  8. Carefully review your environment configuration and make sure n1-standard-1 is selected for all instance types (this lab restricts which instance types can be used), then select Create Environment.

Note: the environment will take roughly 15 minutes to deploy. You can continue on to the next section to see what is going on behind the scenes and browse more resources while you wait.
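
The environment naming rule from the setup steps (alphanumeric, 3 to 30 characters, no spaces or special characters) is easy to get wrong when choosing your own name. The sketch below checks a candidate name locally. The helper env_name_ok is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that the length bounds are inclusive.

```shell
# Hypothetical pre-check for the environment naming rule in this lab.
# Assumptions: ASCII letters and digits only; 3 and 30 are both allowed lengths.
env_name_ok() {
  name=$1
  len=${#name}
  # Reject names outside the 3 to 30 character range.
  if [ "$len" -lt 3 ] || [ "$len" -gt 30 ]; then
    return 1
  fi
  # Reject any character that is not a letter or digit.
  case "$name" in
    *[!A-Za-z0-9]*) return 1 ;;
  esac
  return 0
}
```

With this check, envname passes, while names such as my env or env-name are rejected.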

Click Check my progress to verify the objective. HPC Environment Setup

Behind the Scenes

While you wait for the environment to deploy, you can watch the following video. This video goes behind the scenes and provides more details on the architecture behind CloudyCluster, including how it configures your own Private & Secure HPC/HTC environment in Google Cloud.

You can also check out the following resources, which are also linked in the Student Resources lab panel on the left.

HPC Job Execution

Set up sample HPC Job

  1. After your environment has been created, click the Access tab and select the link for Open OnDemand. Log in with the CloudyCluster credentials you created earlier.

  2. Proceed to Clusters and select sched Shell Access for a terminal session (sched was the scheduler name from the setup step).

  3. Select Yes and enter your password.

  4. To set up the sample job, copy the sample jobs to the shared file system and switch into the correct directory.

cp -R /software/samplejobs/ /mnt/orangefs/
cd /mnt/orangefs/samplejobs/mpi/GCP

Edit the Sample Job

  1. Edit the sample job you would like to run. You can do this using vim or the Open OnDemand file editor.

  2. Navigate to /mnt/orangefs/samplejobs/mpi/GCP. Select mpi_prime.sh, then click the Edit button.

  3. Add the following directive to the file, below the #!/bin/bash line:

#CC -gcpit n1-standard-2

This is a ccq directive that sets the job to use a specific instance type. The other #CC directives can be found in the CloudyCluster documentation.

  4. Remove the extra comment character (#) from the lines for the scheduler you are running.

  5. For this lab, change the following lines to match.

#Uncomment this section for use with Slurm HPC Scheduler
#SBATCH -N 1
#SBATCH --ntasks-per-node=2

mpirun -np 2 $SHARED_FS_NAME/samplejobs/mpi/mpi_prime

  6. Review the complete file, then click Save.
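
As a reference while editing, the lines this lab asks you to add or change can be collected in one place. The sketch below writes only those lines to a scratch file so you can compare them with your edited mpi_prime.sh. It is not the complete sample file: any other content of the real script, such as sections for other schedulers, is not shown in this text and is omitted. The variable name job_sketch is hypothetical.

```shell
# Write the edited lines named in this lab to a scratch file for comparison.
# This is NOT the full mpi_prime.sh; other lines of the real file are omitted.
job_sketch=$(mktemp)

# The quoted heredoc keeps $SHARED_FS_NAME unexpanded, exactly as in the job file.
cat > "$job_sketch" <<'EOF'
#!/bin/bash
#CC -gcpit n1-standard-2
#SBATCH -N 1
#SBATCH --ntasks-per-node=2
mpirun -np 2 $SHARED_FS_NAME/samplejobs/mpi/mpi_prime
EOF

# Show the directives so they can be checked against the edited job script.
grep -E '^#(CC|SBATCH)' "$job_sketch"
```

The #CC line is read by ccq and the #SBATCH lines by Slurm; both look like comments to the shell, which is why they sit directly under the interpreter line.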

Compile the mpi_prime binary

  1. In the terminal window, navigate to /mnt/orangefs/samplejobs/mpi/GCP/.

  2. Run the following command:

mpi_prime_compile.sh

Launch the Sample Job

  1. In the same directory, launch the job by running the ccqsub command.

ccqsub mpi_prime.sh

  2. Check the status with ccqstat:

ccqstat

You can also use the command watch ccqstat to see it go through the stages. You can use CTRL + C to stop that command.

Click Check my progress to verify the objective. HPC Job Execution

After job submission the job will go through the following states:

  • Pending: the job is waiting to be processed by CCQ.
  • CreatingCG: the instances are being created.
  • Provisioning: final node setup is in progress.
  • CCQueued: the job is in the process of being submitted to the scheduler.
  • Running: the job has been handed off to the scheduler, which is Slurm for this job.
  • Completed: the job has finished running successfully, and the instances will tear down automatically.
  • Error: the job encountered an error during execution. The error is logged in the Administration -> Errors tab within the CloudyCluster UI.
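
Instead of watching the status by hand, the states listed above can be polled in a small loop that stops at Completed or Error. This is a sketch under stated assumptions: wait_for_job is a hypothetical helper, the exact output format of ccqstat is not shown in this lab, and the loop simply assumes the state name appears as plain text in the output for a single submitted job.

```shell
# Hypothetical helper: poll a status command until the job finishes.
# Usage: wait_for_job <status-command> [interval-seconds] [max-polls]
# Assumption: the command's output contains the state name as plain text.
wait_for_job() {
  status_cmd=$1
  interval=${2:-30}
  max_polls=${3:-120}
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    out=$($status_cmd 2>&1) || true
    case "$out" in
      *Error*)     echo "Error";     return 1 ;;
      *Completed*) echo "Completed"; return 0 ;;
    esac
    polls=$((polls + 1))
    sleep "$interval"
  done
  # Give up after max_polls attempts without a final state.
  echo "Timeout"
  return 2
}
```

In the scheduler terminal this could be called as `wait_for_job ccqstat 30`. If several jobs are listed at once, the text match is too coarse and the output would need to be filtered first.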

When operating in your own project, you can pause your HPC Environment if you do not have any more jobs to run. Once everything is paused, you can also stop the control node from the Google Cloud Console and restart it when you are ready to run jobs again.

Deleting CloudyCluster

HPC Environment Deletion

  1. Delete or terminate any HPC Environments you have running by selecting the trash can icon in the upper right corner. This will delete the whole cluster.

  2. All data associated with this cluster will be removed, including any Cloud Storage or OrangeFS Storage. Make sure you back up any needed data to a different location.

Control Instance Deletion

[Three screenshots illustrating the Control Instance deletion steps]

Congratulations!

In this lab you launched CloudyCluster and the Control Instance from the Google Cloud Marketplace. You then configured the Control Instance, set up an HPC environment, edited and launched a sample HPC job, and deleted the HPC environment and Control Instance.

Next Steps / Learn More

Be sure to check out the following for more practice with Omnibond:

Google Cloud Training & Certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated September 29, 2021

Lab Last Tested September 29, 2021

Copyright 2021 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.