Build a Serverless App with Cloud Run that Creates PDF Files

1 hour 5 Credits

GSP644

Overview

For the labs in the Google Cloud Serverless Workshop: Pet Theory Quest, you will read through a fictitious business scenario and assist the characters with their serverless migration plan.

Twelve years ago, Lily started the Pet Theory chain of veterinary clinics. Pet Theory currently sends invoices in DOCX format to clients, but many clients have complained that they are unable to open them. To improve customer satisfaction, Lily has asked Patrick in IT to investigate an alternative to improve the current situation.

Pet Theory's Ops team is a single person, so they are keen to invest in a cost-efficient solution that doesn't require a lot of ongoing maintenance. After analyzing the various processing options, Patrick decides to use Cloud Run.

Cloud Run is serverless, so it abstracts away infrastructure management and lets you focus on building your application instead of worrying about overhead. As a Google serverless product, it can scale to zero, meaning it incurs no cost when it is not in use. Because it runs containers, it also lets you package custom binaries with your code, so you can build consistent, isolated artifacts.

In this lab, you will build a PDF converter web app on Cloud Run that automatically converts files stored in Cloud Storage into PDFs stored in a separate bucket.

Architecture

This diagram gives you an overview of the services you will be using and how they connect to one another:

Architecture diagram

What you'll learn

In this lab, you will learn how to:

  • Convert a Node.js application to a container.
  • Build containers with Google Cloud Build.
  • Create a Cloud Run service that converts files to PDF files in the cloud.
  • Use event processing with Cloud Storage.

Prerequisites

This is a fundamental-level lab. It assumes familiarity with the console and shell environments. Experience with Firebase will be helpful, but it is not required. Before taking this lab, it is recommended that you complete the following Google Cloud Skills Boost labs:

You should also be comfortable editing files. You can use your favorite text editor (like nano, vi, etc.) or you can launch the code editor from Cloud Shell, which can be found in the top ribbon:

Code Editor icon

Once you're ready, scroll down and follow the steps below to set up your lab environment.

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise cause extra charges to be incurred on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left. Navigation menu icon

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

  1. Click Activate Cloud Shell Activate Cloud Shell icon at the top of the Google Cloud console.

When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. The output contains a line that declares the PROJECT_ID for this session:

Your Cloud Platform project in this session is set to YOUR_PROJECT_ID

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  1. (Optional) You can list the active account name with this command:

gcloud auth list
  1. Click Authorize.

  2. Your output should now look like this:

Output:

ACTIVE: *
ACCOUNT: student-01-xxxxxxxxxxxx@qwiklabs.net

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  1. (Optional) You can list the project ID with this command:

gcloud config list project

Output:

[core]
project = <project_ID>

Example output:

[core]
project = qwiklabs-gcp-44776a13dea667a6

Note: For full documentation of gcloud in Google Cloud, refer to the gcloud CLI overview guide.

Task 1. Understanding the task

Pet Theory would like to convert their invoices into PDFs so that customers can open them reliably. The team wants to accomplish this conversion automatically to minimize the workload for Lisa, the office manager.

Ruby, Pet Theory's computer consultant, gets a message from Patrick in IT...

Patrick

Patrick, IT Administrator

Hi Ruby,

I've done some research and found that LibreOffice is good at converting many different file formats to PDF.

Would it be possible to run LibreOffice in the cloud without having to maintain the servers?

Patrick

Ruby

Ruby, Software Consultant

Hey Patrick,

I think I have just the thing for this type of situation.

I just watched a great video from Next 19 about Cloud Run on YouTube. It looks like we can run LibreOffice in a serverless environment with Cloud Run. No server maintenance is needed!

I'll send over some resources that will help you get set up.

Ruby

Help Patrick set up and deploy Cloud Run.

Task 2. Enable the Cloud Run API

  1. Open the Navigation menu (Navigation menu icon) and click APIs & Services > Library. In the search bar, enter "Cloud Run" and select the Cloud Run API from the results list.

  2. Click Enable and then hit the back button in your browser twice. Your Console should now resemble the following:

Cloud Run API result

Task 3. Deploy a simple Cloud Run service

Ruby has developed a Cloud Run prototype and would like Patrick to deploy it onto Google Cloud. Now help Patrick establish the PDF Cloud Run service for Pet Theory.

  1. Open a new Cloud Shell session and run the following command to clone the Pet Theory repository:

git clone https://github.com/rosera/pet-theory.git
  1. Then change your current working directory to lab03:

cd pet-theory/lab03
  1. Edit package.json with Cloud Shell Code Editor or your preferred text editor. In the "scripts" section, add "start": "node index.js", as shown below:

...
"scripts": {
  "start": "node index.js",
  "test": "echo \"Error: no test specified\" && exit 1"
},
...
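  1. Now run the following commands in Cloud Shell to install the packages that your conversion script will be using:
npm install express
npm install body-parser
npm install child_process
npm install @google-cloud/storage

Note: child_process is a built-in Node.js module, so require('child_process') works without installing anything; the npm install child_process command is not needed for the code to run.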
  1. Now open the lab03/index.js file and review the code.

The application will be deployed as a Cloud Run service that accepts HTTP POSTs. If the POST request is a Pub/Sub notification about an uploaded file, the service writes the file details to the log. If not, the service simply returns the string "OK".
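The request-handling rule described above can be sketched as a small, testable function. This is a minimal sketch, assuming the Pub/Sub push body shape { message: { data: <base64-encoded JSON> } }; the function name describeRequest and the bucket and file names are hypothetical and do not come from lab03/index.js.

```javascript
// Hypothetical helper (not part of lab03/index.js): returns the decoded file
// details when the body looks like a Pub/Sub push request, or null otherwise,
// in which case the service would simply reply "OK".
function describeRequest(body) {
  const data = body && body.message && body.message.data;
  if (typeof data !== 'string') {
    return null; // Not a Pub/Sub push request.
  }
  try {
    // Pub/Sub push delivers the published payload base64-encoded.
    return JSON.parse(Buffer.from(data, 'base64').toString('utf8'));
  } catch (err) {
    return null; // The payload was not base64-encoded JSON.
  }
}

// Example body shaped like a push delivery of a Cloud Storage notification.
// The bucket and file names are placeholders.
const payload = { bucket: 'my-project-upload', name: 'invoice.docx' };
const body = {
  message: { data: Buffer.from(JSON.stringify(payload)).toString('base64') },
};

console.log(describeRequest(body)); // → { bucket: 'my-project-upload', name: 'invoice.docx' }
console.log(describeRequest({})); // → null
```

Keeping the decoding in a pure function like this makes it possible to unit test without running Express or Pub/Sub.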

  1. Review the file named lab03/Dockerfile.

This file is called a manifest; it provides a recipe that Docker follows to build an image. Each line begins with an instruction that tells Docker how to process the information that follows:

  • The first line indicates that the base image should use node v12 (node:12) as the template for the image to be created.

  • The last line indicates the command to run when the container starts, which in this instance is "npm start".

  1. To build the container image for the REST API, use Google Cloud Build. Run this command to start the build process:

gcloud builds submit \
  --tag gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter

The command builds a container with your code and puts it in the Container Registry of your project.

  1. Return to the Cloud Console, open the navigation menu, and select Container Registry > Images. You should see your container hosted:

Container Registry with pdf-converter listed

Test completed task

Click Check my progress to verify that you've performed the above task.

Build a simple REST API
  1. Return to your code editor tab and in Cloud Shell run the following command to deploy your application:

gcloud run deploy pdf-converter \
  --image gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter \
  --platform managed \
  --region us-east1 \
  --no-allow-unauthenticated \
  --max-instances=1
  1. When the deployment is complete, you will see a message like this:
Service [pdf-converter] revision [pdf-converter-00001] has been deployed and is serving 100 percent of traffic at https://pdf-converter-[hash].a.run.app
  1. Create the environment variable $SERVICE_URL for the app so you can easily access it:

SERVICE_URL=$(gcloud beta run services describe pdf-converter --platform managed --region us-east1 --format="value(status.url)")
echo $SERVICE_URL

Test completed task

Click Check my progress to verify that you've performed the above task.

Create a Revision for Cloud Run
  1. Make an anonymous POST request to your new service:

curl -X POST $SERVICE_URL

This will result in an error message saying "Your client does not have permission to get the URL". This is good; you don't want the service to be callable by anonymous users.

  1. Now try invoking the service as an authorized user:

curl -X POST -H "Authorization: Bearer $(gcloud auth print-identity-token)" $SERVICE_URL

If you get the response "OK", you have successfully deployed a Cloud Run service. Well done!
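If you prefer to test the service from a script instead of curl, the same authenticated request can be sketched in Node.js. This sketch assumes Node.js 18 or later (for the global fetch) and an identity token obtained with gcloud auth print-identity-token; invokeService is a hypothetical helper, not part of the lab.

```javascript
// Hypothetical client helper: sends the same authenticated POST as the curl
// command above. Assumes Node.js 18+ for the global fetch; `fetchImpl` can be
// replaced (for example in tests) with any fetch-compatible function.
async function invokeService(serviceUrl, idToken, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(serviceUrl, {
    method: 'POST',
    headers: { Authorization: `Bearer ${idToken}` },
  });
  if (!res.ok) {
    // Requests without a valid identity token are rejected (e.g. 401 or 403).
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}
```

For example, after running export SERVICE_URL and export ID_TOKEN=$(gcloud auth print-identity-token) in the shell, invokeService(process.env.SERVICE_URL, process.env.ID_TOKEN) should resolve to the service's "OK" response, while an invalid token makes it reject.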

Task 4. Trigger your Cloud Run service when a new file is uploaded

Now that the Cloud Run service has been successfully deployed, Ruby would like Patrick to create a staging area for the data to be converted. The Cloud Storage bucket will use an event trigger to notify the application when a file has been uploaded and needs to be processed.

  1. Run the following command to create a bucket in Cloud Storage for the uploaded docs:

gsutil mb gs://$GOOGLE_CLOUD_PROJECT-upload
  1. Create another bucket for the processed PDFs:
gsutil mb gs://$GOOGLE_CLOUD_PROJECT-processed
  1. Now return to your Cloud Console tab, open the Navigation menu, and select Cloud Storage. Verify that the buckets have been created. Other buckets used by the platform will also be listed.

Test completed task

Click Check my progress to verify that you've performed the above task.

Create two Cloud Storage buckets
  1. In Cloud Shell, run the following command to tell Cloud Storage to send a Pub/Sub notification whenever a new file has finished uploading to the upload bucket:

gsutil notification create -t new-doc -f json -e OBJECT_FINALIZE gs://$GOOGLE_CLOUD_PROJECT-upload

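The notifications are published to a Pub/Sub topic named "new-doc", which gsutil creates if it does not already exist. The -f json flag sends each object's metadata as a JSON payload, and -e OBJECT_FINALIZE limits notifications to objects that have finished uploading.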
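Besides the JSON payload, each Cloud Storage notification carries message attributes such as eventType, bucketId, and objectId. A minimal sketch, assuming those standard notification attributes: finalizedObject is a hypothetical helper, and the lab's index.js reads the bucket and file name from the decoded payload instead.

```javascript
// Hypothetical helper: picks the bucket and object name out of the message
// attributes that Cloud Storage attaches to each notification, and ignores
// anything that is not an OBJECT_FINALIZE event.
function finalizedObject(message) {
  const attrs = (message && message.attributes) || {};
  // The notification was created with -e OBJECT_FINALIZE, but checking the
  // attribute keeps the handler safe if more event types are enabled later.
  if (attrs.eventType !== 'OBJECT_FINALIZE') {
    return null;
  }
  return { bucket: attrs.bucketId, name: attrs.objectId };
}

// Placeholder attribute values, as they might appear on a pushed message.
const message = {
  attributes: {
    eventType: 'OBJECT_FINALIZE',
    payloadFormat: 'JSON_API_V1',
    bucketId: 'my-project-upload',
    objectId: 'invoice.docx',
  },
};

console.log(finalizedObject(message)); // → { bucket: 'my-project-upload', name: 'invoice.docx' }
```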

Test completed task

Click Check my progress to verify that you've performed the above task.

Create a Pub/Sub topic for handling notifications from storage bucket
  1. Next, create a new service account that Pub/Sub will use to invoke the Cloud Run service:

gcloud iam service-accounts create pubsub-cloud-run-invoker --display-name "PubSub Cloud Run Invoker"
  1. Give the new service account permission to invoke the PDF converter service:

gcloud beta run services add-iam-policy-binding pdf-converter --member=serviceAccount:pubsub-cloud-run-invoker@$GOOGLE_CLOUD_PROJECT.iam.gserviceaccount.com --role=roles/run.invoker --platform managed --region us-east1
  1. Find your project number by running this command:

gcloud projects list

Look for the project whose name starts with "qwiklabs-gcp-". You will be using the value of the Project Number in the next command.

Project number highlighted in output

  1. Create a PROJECT_NUMBER environment variable, replacing [project number] with the Project Number from the last command:

PROJECT_NUMBER=[project number]
  1. Then allow Pub/Sub to create authentication tokens in your project by granting the Pub/Sub service agent the Service Account Token Creator role:

gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com --role=roles/iam.serviceAccountTokenCreator
  1. Finally, create a Pub/Sub subscription so that the PDF converter can run whenever a message is published on the topic "new-doc".

gcloud beta pubsub subscriptions create pdf-conv-sub --topic new-doc --push-endpoint=$SERVICE_URL --push-auth-service-account=pubsub-cloud-run-invoker@$GOOGLE_CLOUD_PROJECT.iam.gserviceaccount.com
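A push subscription delivers each message to $SERVICE_URL as an HTTP POST with a JSON body. To exercise the service without uploading a real file, you could build a body of that shape yourself. This is a sketch under assumptions: buildPushEnvelope, the placeholder IDs, and the bucket and file names are hypothetical, and the body includes only the fields this lab's code reads plus a few common ones.

```javascript
// Hypothetical helper: builds a request body shaped like a Pub/Sub push
// delivery, so the handler can be exercised locally without uploading a file.
function buildPushEnvelope(objectMetadata, attributes = {}) {
  return {
    message: {
      attributes,
      // Push subscriptions deliver the published payload base64-encoded.
      data: Buffer.from(JSON.stringify(objectMetadata)).toString('base64'),
      messageId: 'local-test-1', // Placeholder; real IDs are assigned by Pub/Sub.
    },
    // Placeholder subscription path; "my-project" is not a real project ID.
    subscription: 'projects/my-project/subscriptions/pdf-conv-sub',
  };
}

// Print an envelope as JSON; redirect the output to a file to reuse it with curl.
const envelope = buildPushEnvelope({ bucket: 'my-project-upload', name: 'invoice.docx' });
console.log(JSON.stringify(envelope, null, 2));
```

For example, you might save the output with node envelope.js > envelope.json (both file names are placeholders) and send it with curl -X POST -H "Authorization: Bearer $(gcloud auth print-identity-token)" -H "Content-Type: application/json" -d @envelope.json $SERVICE_URL. The prototype service should then log the file details; the converter built in Task 6 would instead try to download the named object.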

Test completed task

Click Check my progress to verify that you've performed the above task.

Create a Pub/Sub subscription

Task 5. See if the Cloud Run service is triggered when files are uploaded to Cloud Storage

To verify the application is working as expected, Ruby asks Patrick to upload some test data to the upload bucket and then check Cloud Logging.

  1. Copy some test files into your upload bucket:

gsutil -m cp gs://spls/gsp644/* gs://$GOOGLE_CLOUD_PROJECT-upload
  1. Once the upload is done, return to your Cloud Console tab, open the navigation menu, and select Logging from under the Operations section.

  2. In the first dropdown, filter your results to Cloud Run Revision and click Apply. Then click Run Query.

  3. In the Query results, look for a log entry that starts with file: and click it. It shows a dump of the file data that Pub/Sub sends to your Cloud Run service when a new file is uploaded.

  4. Can you find the name of the file you uploaded in this object?

Query results

Note: If you do not see any log entries that begin with "file", try clicking on the "load newer logs" button near the bottom of the page.
  1. Now return to the code editor tab and run the following command in Cloud Shell to clean up your upload bucket by deleting the files in it:

gsutil -m rm gs://$GOOGLE_CLOUD_PROJECT-upload/*

Task 6. Docker containers

Patrick needs to convert a backlog of invoices to PDFs so all customers can open them. He emails Ruby for some help...

Patrick

Patrick, IT Administrator

Hi Ruby

Based on your findings, I think we can automate this process and also move to using PDF as the invoice format.

I spent a bit of time yesterday coding a solution and built a Node.js script to do what we need. Could you take a look?

Patrick

Patrick sends Ruby the code fragment he wrote to produce a PDF from a file:

const {promisify} = require('util');
const exec = promisify(require('child_process').exec);

const cmd = 'libreoffice --headless --convert-to pdf --outdir ' +
  `/tmp "/tmp/${fileName}"`;
const { stdout, stderr } = await exec(cmd);
if (stderr) {
  throw stderr;
}
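Patrick's fragment builds a shell command string, so a file name containing a double quote, or shell syntax such as $(...), could break or alter the command. One alternative, swapped in here rather than taken from the lab, is child_process.execFile, which passes arguments to LibreOffice without a shell. A minimal sketch, assuming the libreoffice binary is on the PATH; libreOfficeArgs, expectedPdfPath, and convertWithExecFile are hypothetical names.

```javascript
const { promisify } = require('util');
const { execFile } = require('child_process');
const path = require('path');

const execFileAsync = promisify(execFile);

// Hypothetical helper: LibreOffice arguments as an array. execFile passes
// each element to the program directly, without a shell, so quotes or
// shell syntax in a file name cannot change the command.
function libreOfficeArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// Hypothetical helper: where LibreOffice writes the PDF for a given input
// (same base name, .pdf extension, inside outDir). Uses POSIX paths because
// the container runs Linux.
function expectedPdfPath(inputPath, outDir) {
  const stem = path.posix.basename(inputPath, path.posix.extname(inputPath));
  return path.posix.join(outDir, `${stem}.pdf`);
}

// Runs the conversion. execFile rejects when LibreOffice exits with a
// non-zero status, so stderr output alone is not treated as a failure.
async function convertWithExecFile(inputPath, outDir = '/tmp') {
  await execFileAsync('libreoffice', libreOfficeArgs(inputPath, outDir));
  return expectedPdfPath(inputPath, outDir);
}
```

Design note: the fragment above treats any stderr output as an error, whereas this sketch relies on the exit code, since LibreOffice may print warnings to stderr even when a conversion succeeds.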

Ruby replies to Patrick...

Ruby

Ruby, Software Consultant

Hi Patrick

Cloud Run uses containers, so we need to provide your application in this format. For the next step we need to create a Dockerfile manifest for the application.

Your code uses LibreOffice. Can you send me the command for installing that software? I will need to include it in the container.

Ruby

Patrick

Patrick, IT Administrator

Hi Ruby

Awesome, here is how I usually install LibreOffice on servers in the office:

apt-get update -y && apt-get install -y libreoffice && apt-get clean

Let me know if you need any more information.

Patrick

Building the container will require the integration of a number of components:

Components: index.js, LibreOffice, Express, body-parser, child_process, and @google-cloud/storage

Update the Docker container

With all the files identified, the Dockerfile can now be created. Help Ruby set up and deploy the container.

The LibreOffice package was not included in the container before, so it now needs to be added. Patrick has already provided the commands he uses to install LibreOffice, so Ruby will add them as a RUN command in the Dockerfile.

  • Open the Dockerfile manifest and add the RUN apt-get update -y && apt-get install -y libreoffice && apt-get clean line, as shown below:

FROM node:12
RUN apt-get update -y \
    && apt-get install -y libreoffice \
    && apt-get clean
WORKDIR /usr/src/app
COPY package.json package*.json ./
RUN npm install --only=production
COPY . .
CMD [ "npm", "start" ]

Deploy the new version of the pdf-conversion service

  1. Open the index.js file and add the following package requirements at the top of the file:

const {promisify} = require('util');
const {Storage} = require('@google-cloud/storage');
const exec = promisify(require('child_process').exec);
const storage = new Storage();
  1. Replace the existing app.post('/', async (req, res) => { ... }) handler with the following code:

app.post('/', async (req, res) => {
  try {
    const file = decodeBase64Json(req.body.message.data);
    await downloadFile(file.bucket, file.name);
    const pdfFileName = await convertFile(file.name);
    await uploadFile(process.env.PDF_BUCKET, pdfFileName);
    await deleteFile(file.bucket, file.name);
  }
  catch (ex) {
    console.log(`Error: ${ex}`);
  }
  res.set('Content-Type', 'text/plain');
  res.send('\n\nOK\n\n');
})
  1. Now add the following helper functions, which download, convert, delete, and upload files, to the bottom of the file:

async function downloadFile(bucketName, fileName) {
  const options = {destination: `/tmp/${fileName}`};
  await storage.bucket(bucketName).file(fileName).download(options);
}

async function convertFile(fileName) {
  const cmd = 'libreoffice --headless --convert-to pdf --outdir /tmp ' +
    `"/tmp/${fileName}"`;
  console.log(cmd);
  const { stdout, stderr } = await exec(cmd);
  if (stderr) {
    throw stderr;
  }
  console.log(stdout);
  // Declared with const so it does not become an implicit global variable.
  const pdfFileName = fileName.replace(/\.\w+$/, '.pdf');
  return pdfFileName;
}

async function deleteFile(bucketName, fileName) {
  await storage.bucket(bucketName).file(fileName).delete();
}

async function uploadFile(bucketName, fileName) {
  await storage.bucket(bucketName).upload(`/tmp/${fileName}`);
}
  1. Ensure your index.js file looks like the following:
Note: To avoid any formatting errors, it's recommended that you replace all of the code in your index.js file with this example code.

const {promisify} = require('util');
const {Storage} = require('@google-cloud/storage');
const exec = promisify(require('child_process').exec);
const storage = new Storage();
const express = require('express');
const bodyParser = require('body-parser');
const app = express();

app.use(bodyParser.json());

const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log('Listening on port', port);
});

app.post('/', async (req, res) => {
  try {
    const file = decodeBase64Json(req.body.message.data);
    await downloadFile(file.bucket, file.name);
    const pdfFileName = await convertFile(file.name);
    await uploadFile(process.env.PDF_BUCKET, pdfFileName);
    await deleteFile(file.bucket, file.name);
  }
  catch (ex) {
    console.log(`Error: ${ex}`);
  }
  res.set('Content-Type', 'text/plain');
  res.send('\n\nOK\n\n');
})

function decodeBase64Json(data) {
  return JSON.parse(Buffer.from(data, 'base64').toString());
}

async function downloadFile(bucketName, fileName) {
  const options = {destination: `/tmp/${fileName}`};
  await storage.bucket(bucketName).file(fileName).download(options);
}

async function convertFile(fileName) {
  const cmd = 'libreoffice --headless --convert-to pdf --outdir /tmp ' +
    `"/tmp/${fileName}"`;
  console.log(cmd);
  const { stdout, stderr } = await exec(cmd);
  if (stderr) {
    throw stderr;
  }
  console.log(stdout);
  // Declared with const so it does not become an implicit global variable.
  const pdfFileName = fileName.replace(/\.\w+$/, '.pdf');
  return pdfFileName;
}

async function deleteFile(bucketName, fileName) {
  await storage.bucket(bucketName).file(fileName).delete();
}

async function uploadFile(bucketName, fileName) {
  await storage.bucket(bucketName).upload(`/tmp/${fileName}`);
}
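The helper functions above build local paths from the full object name. For objects stored under "folder" prefixes (for example invoices/2022/inv-001.docx), that path includes directories that may not exist under /tmp, and --outdir /tmp writes the PDF to /tmp/inv-001.pdf rather than next to the input, so uploadFile would not find it. A minimal sketch, assuming you want flat local files but want to keep the prefix in the processed bucket; conversionPaths is a hypothetical helper, not part of the lab.

```javascript
const path = require('path');

// Hypothetical helper: derives the local and destination paths for an object
// name that may contain "folders" (e.g. "invoices/2022/inv-001.docx").
function conversionPaths(objectName, workDir = '/tmp') {
  const base = path.posix.basename(objectName); // e.g. "inv-001.docx"
  const stem = base.replace(/\.\w+$/, ''); // same extension rule as convertFile
  return {
    // Flat download location, so no subdirectories are needed under workDir.
    localInput: path.posix.join(workDir, base),
    // Where `--outdir <workDir>` makes LibreOffice write the PDF.
    localPdf: path.posix.join(workDir, `${stem}.pdf`),
    // Object name for the processed bucket, keeping the original prefix.
    pdfObjectName: objectName.replace(/\.\w+$/, '.pdf'),
  };
}
```

The destination name could then be passed to the storage client's upload call, for example storage.bucket(bucketName).upload(localPdf, { destination: pdfObjectName }).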
  • The main logic is in these lines of the request handler:

const file = decodeBase64Json(req.body.message.data);
await downloadFile(file.bucket, file.name);
const pdfFileName = await convertFile(file.name);
await uploadFile(process.env.PDF_BUCKET, pdfFileName);
await deleteFile(file.bucket, file.name);

Whenever a file has been uploaded, this service gets triggered. It performs these tasks, one per line above:

  • Extracts the file details from the Pub/Sub notification.
  • Downloads the file from Cloud Storage to the local disk under /tmp. On Cloud Run, this is not a physical disk but an in-memory file system, so files written there consume the instance's memory.
  • Converts the downloaded file to PDF.
  • Uploads the PDF file to Cloud Storage. The environment variable process.env.PDF_BUCKET contains the name of the Cloud Storage bucket to write PDFs to. You will assign a value to this variable when you deploy the service below.
  • Deletes the original file from Cloud Storage.
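The five steps above can be sketched as one function with the storage and conversion calls injected, which makes the ordering easy to check in a test. This is a sketch, not the lab's code: processNotification and the steps object are hypothetical, and the step functions are assumed to have the same signatures as the helpers in index.js.

```javascript
// Hypothetical sketch of the handler's five steps with the I/O functions
// injected, so the ordering can be checked without Cloud Storage or LibreOffice.
async function processNotification(messageData, pdfBucket, steps) {
  // 1. Extract the file details from the Pub/Sub notification.
  const file = JSON.parse(Buffer.from(messageData, 'base64').toString());
  // 2. Download the original to local storage.
  await steps.downloadFile(file.bucket, file.name);
  // 3. Convert it to PDF.
  const pdfFileName = await steps.convertFile(file.name);
  // 4. Upload the PDF to the processed bucket.
  await steps.uploadFile(pdfBucket, pdfFileName);
  // 5. Delete the original last; any earlier rejection skips this step and
  //    leaves the original in the upload bucket.
  await steps.deleteFile(file.bucket, file.name);
  return pdfFileName;
}
```

Because the delete runs last, a failed download, conversion, or upload leaves the original file in place. Note also that the lab's handler catches every error and still responds with "OK" (HTTP 200), so Pub/Sub treats the message as delivered and does not retry it.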

The rest of index.js implements the functions called by this top-level code.

It's time to deploy the service and set the PDF_BUCKET environment variable. It's also a good idea to give LibreOffice 2 GB of RAM to work with (see the --memory option in the deploy command).
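Because /tmp on Cloud Run is held in memory, the downloaded original and the generated PDF keep using part of the instance's memory until they are deleted or the instance shuts down. A minimal sketch of a cleanup step, assuming you call it after uploadFile; removeLocalFiles is a hypothetical helper and uses only fs.promises APIs available in the node:12 base image.

```javascript
const fsp = require('fs').promises;

// Hypothetical helper: removes local working files once a conversion is done,
// freeing the memory that /tmp files occupy on Cloud Run.
async function removeLocalFiles(...filePaths) {
  await Promise.all(filePaths.map(async (filePath) => {
    try {
      await fsp.unlink(filePath);
    } catch (err) {
      // Ignore files that were never created (e.g. after a failed conversion).
      if (err.code !== 'ENOENT') {
        throw err;
      }
    }
  }));
}
```

For example, a handler could call removeLocalFiles(`/tmp/${file.name}`, `/tmp/${pdfFileName}`) after the upload, ideally in a finally block so cleanup also runs when a step fails.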

  1. Run the following command to build the container:

gcloud builds submit \
  --tag gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter

Test completed task

Click Check my progress to verify that you've performed the above task.

Create another build for the REST API
  1. Now deploy the latest version of your application:

gcloud run deploy pdf-converter \
  --image gcr.io/$GOOGLE_CLOUD_PROJECT/pdf-converter \
  --platform managed \
  --region us-east1 \
  --memory=2Gi \
  --no-allow-unauthenticated \
  --max-instances=1 \
  --set-env-vars PDF_BUCKET=$GOOGLE_CLOUD_PROJECT-processed

Because the container now includes LibreOffice, this build and deployment take longer than the previous ones. This is a good time to get up and stretch for a few minutes.
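The deploy command supplies the destination bucket through --set-env-vars PDF_BUCKET=.... If that flag is omitted, index.js only finds out when uploadFile runs, and the resulting error is caught and logged by the handler. A minimal sketch of checking the variable once at startup instead; requirePdfBucket is a hypothetical helper, not part of the lab.

```javascript
// Hypothetical helper: reads the destination bucket from the environment and
// throws immediately if PDF_BUCKET is missing or empty.
function requirePdfBucket(env = process.env) {
  const bucket = env.PDF_BUCKET;
  if (!bucket) {
    throw new Error('PDF_BUCKET environment variable is not set');
  }
  return bucket;
}
```

For example, const PDF_BUCKET = requirePdfBucket(); near the top of index.js would stop the process before it starts listening, so a missing variable surfaces when the revision starts rather than on the first upload.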

Click Check my progress to verify the objective.

Create a new Revision

Task 7. Testing the pdf-conversion service

  1. Once the deployment commands finish, make sure that the service was deployed correctly by running:

curl -X POST -H "Authorization: Bearer $(gcloud auth print-identity-token)" $SERVICE_URL
  1. If you get the response "OK", you have successfully deployed the updated Cloud Run service. LibreOffice can convert many file types to PDF, including DOCX, XLSX, JPG, PNG, and GIF.

  2. Run the following command to upload some example files:

gsutil -m cp gs://spls/gsp644/* gs://$GOOGLE_CLOUD_PROJECT-upload
  1. Return to the Cloud Console, open the Navigation menu and select Cloud Storage. Open the -upload bucket and click on the Refresh button a couple of times to see how the files are deleted, one by one, as they are converted to PDFs.

  2. Then click Buckets from the left menu, and click on the bucket whose name ends in "-processed". It should contain PDF versions of all files. Feel free to open the PDF files to make sure they were properly converted:

Note: If you don't see all of the converted PDF files in the -processed bucket, re-run the upload command.
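LibreOffice accepts many input types, including the DOCX, XLSX, JPG, PNG, and GIF examples mentioned in this task. If other files may land in the upload bucket (for example PDFs that need no conversion), a handler could check the extension before converting. A minimal sketch; the allowlist and isConvertible are hypothetical and should be adjusted to the formats you actually expect.

```javascript
const path = require('path');

// Hypothetical allowlist built from the examples named in this lab;
// LibreOffice supports many more formats than these.
const CONVERTIBLE_EXTENSIONS = new Set(['.docx', '.xlsx', '.jpg', '.png', '.gif']);

// Returns true when the object name ends in an allowlisted extension
// (case-insensitive), so other uploads can be skipped instead of converted.
function isConvertible(objectName) {
  return CONVERTIBLE_EXTENSIONS.has(path.posix.extname(objectName).toLowerCase());
}
```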

Congratulations!

Pet Theory now has a system for converting their archive of old files to PDFs. By simply uploading the old files to the "upload" bucket, the pdf-converter service converts them and writes them as PDFs to the "processed" bucket.

Finish your quest

This self-paced lab is part of the Google Cloud Run Serverless Workshop quest. A quest is a series of related labs that form a learning path. Completing this quest earns you a badge to recognize your achievement. You can make your badge or badges public and link to them in your online resume or social media account. Enroll in this quest and get immediate completion credit. Refer to the Google Cloud Skills Boost catalog for all available quests.

Take your next lab

Continue your quest with the next lab in the series, Build a Resilient, Asynchronous System with Cloud Run and Pub/Sub.

Next steps / Learn more

Learn more about Serverless Containers: Next 19 video on YouTube


Manual Last Updated September 23, 2022

Lab Last Tested September 23, 2022

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.