Creating a De-identified Copy of Data in Cloud Storage

Lab · 45 minutes · 1 Credit · Introductory

GSP1073

Overview

Cloud Data Loss Prevention (Cloud DLP) is a fully managed service designed to help discover, classify, and protect sensitive information. In this lab, you create and run a Cloud DLP job using the De-identify (DeID) Findings Action to create a redacted and de-identified copy of some data in Cloud Storage. You also learn how to create a de-identification template to define how to redact the data.

De-identification process overview diagram showing Input and Output buckets

As part of the setup process of this lab, two Cloud Storage buckets have been created for you: an "input" bucket with sample folders and files, and an "output" bucket for the redacted data.

Objectives

In this lab, you:

  • Create a Cloud DLP de-identification template for structured and unstructured data
  • Configure a Cloud DLP Inspection Job Trigger with De-identify Findings Action enabled
  • Create a Cloud DLP Inspection Job
  • View results of the inspection job and view new de-identified files in Cloud Storage

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could result in extra charges to your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details panel.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details panel.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To view a menu with a list of Google Cloud products and services, click the Navigation menu at the top-left.

Task 1. Create de-identify templates

Create a template for unstructured data

In this section, you configure and create a de-identify template for unstructured data.

  1. In the Google Cloud console, open the Navigation menu and click Security > Data Loss Prevention.

  2. Click the Configuration tab.

  3. In the Configuration > Templates section, click Create Template.

create template

  4. On the Create Template page, define the following options:

    • For Template type, select De-identify (remove sensitive data).
    • For Data transformation type, select InfoType
    • For Template ID, enter deid_unstruct1
    • For Display name, enter deid_unstruct1 template
    • For Description, leave the field empty.
    • For Resource location, use the default setting Global (any region).
  5. Click Continue.

  6. For the Transformation Rule, select Replace with infoType name.

  7. For InfoTypes to transform, select Any detected infoTypes defined in an inspection template or inspect config that are not specified in other rules.

configure deidentification

  8. Click Create.

Click Check my progress to verify the objective. Create a Template for Unstructured Data
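
For readers who want to see what the console settings above correspond to outside the UI, here is a minimal sketch of an equivalent request body, written with the camelCase field names used by the DLP v2 REST API. It only builds and prints JSON and does not call any service. The function name build_unstructured_template_request is hypothetical, and the exact request the console sends is not shown in this lab, so treat the structure as an approximation.

```python
import json


def build_unstructured_template_request(template_id: str, display_name: str) -> dict:
    """Build a request body for a de-identify template that replaces
    each finding with the name of its infoType."""
    return {
        "templateId": template_id,
        "deidentifyTemplate": {
            "displayName": display_name,
            "deidentifyConfig": {
                "infoTypeTransformations": {
                    "transformations": [
                        {
                            # No "infoTypes" list is given, so the rule is
                            # meant to cover every detected infoType that is
                            # not handled by another rule.
                            "primitiveTransformation": {
                                "replaceWithInfoTypeConfig": {}
                            }
                        }
                    ]
                }
            },
        },
    }


unstructured_request = build_unstructured_template_request(
    "deid_unstruct1", "deid_unstruct1 template"
)
print(json.dumps(unstructured_request, indent=2))
```

Leaving the infoTypes list out of the single transformation mirrors the "Any detected infoTypes" choice in the console.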

Create a template for structured data

In this section, you configure and create a de-identify template for structured data.

  1. In the Google Cloud console, navigate back to the Data Loss Prevention page.

  2. Click the Configuration tab.

  3. In the Configuration > Templates section, click Create Template.

create template

  4. On the Create Template page, define the following options:

    • For Template type, select De-identify (remove sensitive data).
    • For Data transformation type, select Record
    • For Template ID, enter deid_struct1
    • For Display name, enter deid_struct1 template
    • For Description, leave the field empty.
    • For Resource location, use the default setting Global (any region).
  5. Click Continue.

  6. For the Transformation Rule, add the following field names: ssn, ccn, email, vin, id, agent_id, user_id.

configure structured deidentification

  7. For the Transformation type, select Primitive field transformation.

  8. For Transformation method, select Replace. This replaces the contents of every cell in any field whose name matches one in the list you provided.

  9. Click + Add Transformation Rule.

  10. For this new rule, add the field: message.

  11. For the Transformation type, select Match on infoType and click Add Transformation.

  12. For the Transformation Method, select Replace with infoType name.

  13. For InfoTypes to transform, select Any detected infoTypes defined in an inspection template or inspect config that are not specified in other rules. This applies infoType inspection and redaction to the message field of any file that has one.

configure structured deidentification final config

  14. Click Create.

Click Check my progress to verify the objective. Create a template for structured data
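
The two rules in the structured template can be sketched as a record transformation with two field transformations, again using DLP v2 REST field names. This is an illustration that only builds JSON. The function name and the replacement string "[redacted]" are assumptions, since the lab does not state which replacement value the Replace method uses.

```python
import json

# Field names taken from the first transformation rule in the lab.
PRIMITIVE_FIELDS = ["ssn", "ccn", "email", "vin", "id", "agent_id", "user_id"]


def build_structured_template_request(
    template_id: str, display_name: str, replacement: str = "[redacted]"
) -> dict:
    """Build a request body with one whole-cell rule and one infoType rule."""
    # Rule 1: replace the whole cell for the listed fields.
    # The replacement value is a placeholder chosen for this sketch.
    replace_whole_cell = {
        "fields": [{"name": name} for name in PRIMITIVE_FIELDS],
        "primitiveTransformation": {
            "replaceConfig": {"newValue": {"stringValue": replacement}}
        },
    }
    # Rule 2: inspect the free-text "message" field and replace each
    # finding with the name of its infoType.
    inspect_message = {
        "fields": [{"name": "message"}],
        "infoTypeTransformations": {
            "transformations": [
                {"primitiveTransformation": {"replaceWithInfoTypeConfig": {}}}
            ]
        },
    }
    return {
        "templateId": template_id,
        "deidentifyTemplate": {
            "displayName": display_name,
            "deidentifyConfig": {
                "recordTransformations": {
                    "fieldTransformations": [replace_whole_cell, inspect_message]
                }
            },
        },
    }


structured_request = build_structured_template_request(
    "deid_struct1", "deid_struct1 template"
)
print(json.dumps(structured_request, indent=2))
```

The first rule never inspects the cell contents, while the second rule leaves the message text intact apart from the findings it replaces.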

Task 2. Create a DLP inspection job trigger

  1. In the Google Cloud console, navigate back to the Data Loss Prevention page.

  2. Click the Inspection tab.

  3. Click Create Job and Job Triggers.

  4. To configure input data, do the following:

    • In the Name section, define the following options:
      • For Job ID, enter DeID_Storage_Demo1.
      • Keep Resource location set to Global (any region).
    • In the Storage type list, select Google Cloud Storage, and then define the following options:
      • For the Location Type select Scan a bucket with optional include/exclude rules.
      • For the URL, enter the URL of the input bucket that was created for you.
      • Set “Percentage of included objects scanned within the bucket” to 100% and select No Sampling.
Note: Make sure the storage bucket URL contains no whitespace.

configure DLP job

  5. Leave the rest of the fields as default and click Continue.

  6. Under Configure detection, leave all fields as default and click Continue.

  7. Under Add Actions, toggle to enable Make a de-identify copy.

  8. Enter the two templates that you created above in the respective boxes:

    • projects//locations/global/deidentifyTemplates/deid_unstruct1
    • projects//locations/global/deidentifyTemplates/deid_struct1
Note: Make sure there are no spaces in the de-identification template paths.

DLP templates

  9. For the Cloud Storage output location, specify the URL of the output bucket.

This writes the redacted output to the second bucket that was created for you.

  10. Click Continue.

  11. For Schedule, select Create a trigger to run the job on a periodic schedule and select Weekly.

  12. Click Continue.

  13. Scroll down and click Create > Confirm Create.

  14. You should now have a job trigger under Inspection > Job Triggers.

job starting

Click Check my progress to verify the objective. Create a DLP Inspection Job Trigger
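
As a rough picture of what this job trigger contains, the sketch below assembles a comparable request body using DLP v2 REST field names. It is not the exact request the console sends. The project ID and bucket names are hypothetical placeholders, the detection settings are omitted because the lab leaves them at their defaults, and the sampling method is left unset on the assumption that this matches No Sampling. The helper also rejects whitespace in paths, which is the mistake the notes above warn about.

```python
import json
import re

WEEK_SECONDS = 7 * 24 * 60 * 60  # weekly schedule expressed in seconds


def reject_whitespace(value: str, label: str) -> str:
    """Return value unchanged, or raise if it contains any whitespace."""
    if re.search(r"\s", value):
        raise ValueError(f"{label} contains whitespace: {value!r}")
    return value


def template_name(project_id: str, template_id: str) -> str:
    """Full resource name of a de-identify template in the global location."""
    name = f"projects/{project_id}/locations/global/deidentifyTemplates/{template_id}"
    return reject_whitespace(name, "template path")


def build_job_trigger_request(
    project_id: str, trigger_id: str, input_bucket: str, output_bucket: str
) -> dict:
    reject_whitespace(input_bucket, "input bucket")
    reject_whitespace(output_bucket, "output bucket")
    deidentify_action = {
        "deidentify": {
            "cloudStorageOutput": f"gs://{output_bucket}",
            "transformationConfig": {
                "deidentifyTemplate": template_name(project_id, "deid_unstruct1"),
                "structuredDeidentifyTemplate": template_name(project_id, "deid_struct1"),
            },
        }
    }
    return {
        "triggerId": trigger_id,
        "jobTrigger": {
            "status": "HEALTHY",
            "triggers": [
                {"schedule": {"recurrencePeriodDuration": f"{WEEK_SECONDS}s"}}
            ],
            "inspectJob": {
                "storageConfig": {
                    "cloudStorageOptions": {
                        "fileSet": {"regexFileSet": {"bucketName": input_bucket}},
                        "filesLimitPercent": 100,
                    }
                },
                "actions": [deidentify_action],
            },
        },
    }


# "my-project", "my-input-bucket" and "my-output-bucket" are placeholders.
trigger_request = build_job_trigger_request(
    "my-project", "DeID_Storage_Demo1", "my-input-bucket", "my-output-bucket"
)
print(json.dumps(trigger_request, indent=2))
```

Keeping the two template names in one helper makes it harder to mistype one path and not the other.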

Task 3. Run DLP Inspection and review results

  1. In the Google Cloud console, navigate back to the Data Loss Prevention page.

  2. Click the Inspection tab.

  3. Under Job Triggers you should see the job trigger that you created.

  4. Select this job trigger.

  5. Click Run Now.

  6. This creates and runs a new job instance.

  7. Select the job instance from the Triggered jobs section below.

Note: If you do not see a job, refresh the page, or wait a minute and refresh again.
  8. Monitor the job and wait for its status to show Done.

  9. Once the job is Done, review the results on this page to see what was found in the bucket.

Great! You should see your findings populated and an overview of your job results at the bottom.

DLP job complete
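
If you were monitoring a job from a script rather than from the console, the wait for Done could be sketched as a simple polling loop. The version below is a generic illustration: get_state is a hypothetical callable that you would supply to return the job's current state as a string, and the demo feeds it a canned sequence instead of calling the service. The state names follow the DLP job states (PENDING, RUNNING, DONE, CANCELED, FAILED).

```python
import time
from typing import Callable

# States after which the job will not change any more.
TERMINAL_STATES = {"DONE", "CANCELED", "FAILED"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it reports a terminal state, then return it."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Demo with a canned sequence of states in place of a real service call.
fake_states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
final_state = wait_for_job(lambda: next(fake_states), poll_seconds=0)
print(final_state)  # DONE
```

Returning the terminal state instead of only checking for DONE lets the caller tell a failed or canceled job apart from a finished one.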

View de-identified output

  1. On the job results page, click on Configuration.

  2. Scroll down to the section Output bucket for de-identified Cloud Storage Data.

  3. Click on the bucket link to be taken to that Cloud Storage Bucket.

  4. Explore the various folders and files to see what has been redacted. For example, clicking one of the images in the image folder should show something like:

redacted image

For further exploration, you can try the following:

  • Change the settings in the de-identification templates to try out different ways to de-identify and transform data. See the Cloud DLP transformation reference for the available options. You can also try turning on different tokenization or pseudonymization methods using Cloud KMS.
  • Try editing the DLP Job Trigger and adjusting what kind of data is being inspected for and then run another job by clicking “Run Now” from the triggers page. For example, if you turn off PERSON_NAME detection, the names should no longer be redacted.
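
To make the effect of turning an infoType off more concrete, here is a toy illustration of the "Replace with infoType name" transformation. It does not use Cloud DLP or perform any detection. The findings are hard-coded for the example, and the Finding type and the replace_with_info_type function are hypothetical names invented for this sketch.

```python
from typing import Iterable, NamedTuple, Set


class Finding(NamedTuple):
    start: int      # offset of the first character of the finding
    end: int        # offset just past the last character
    info_type: str  # for example "PERSON_NAME"


def replace_with_info_type(
    text: str, findings: Iterable[Finding], enabled: Set[str]
) -> str:
    """Replace each enabled finding with its infoType name in brackets."""
    pieces = []
    cursor = 0
    for finding in sorted(findings):
        # Skip disabled infoTypes and findings that overlap an earlier one.
        if finding.info_type not in enabled or finding.start < cursor:
            continue
        pieces.append(text[cursor:finding.start])
        pieces.append(f"[{finding.info_type}]")
        cursor = finding.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Agent Jane Doe emailed jane@example.com"
name, email = "Jane Doe", "jane@example.com"
findings = [
    Finding(text.index(name), text.index(name) + len(name), "PERSON_NAME"),
    Finding(text.index(email), text.index(email) + len(email), "EMAIL_ADDRESS"),
]

all_types = replace_with_info_type(text, findings, {"PERSON_NAME", "EMAIL_ADDRESS"})
no_names = replace_with_info_type(text, findings, {"EMAIL_ADDRESS"})
print(all_types)  # Agent [PERSON_NAME] emailed [EMAIL_ADDRESS]
print(no_names)   # Agent Jane Doe emailed [EMAIL_ADDRESS]
```

With PERSON_NAME removed from the enabled set, the name passes through unchanged, which is the behavior you should expect to see in the output bucket after adjusting the trigger.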

Congratulations!

In this lab, you created Cloud DLP de-identification templates for structured and unstructured data, configured a Job Trigger with De-identify Findings Action enabled, created an Inspection Job, and viewed the results of the inspection job.

Next steps / Learn more

Be sure to check out the following documentation for more practice with Cloud DLP:

Manual Last Updated May 25, 2023

Lab Last Tested May 25, 2023

Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.