Creating a De-identified Copy of Data in Cloud Storage

45 minutes 1 Credit

GSP1073

Overview

Cloud Data Loss Prevention (Cloud DLP) is a fully managed service that helps you discover, classify, and protect sensitive information. In this lab, you create and run a Cloud DLP job that uses the De-identify Findings action to produce a redacted, de-identified copy of data stored in Cloud Storage. Along the way, you also create de-identify templates that define how the data is redacted.

As part of the lab setup, two Cloud Storage buckets have been created for you: an "input" bucket containing sample folders and files, and an "output" bucket for the redacted data.

Objectives

In this lab, you will:

  • Create Cloud DLP De-Identification Templates for structured and unstructured data

  • Configure a Cloud DLP Inspection Job Trigger with the De-identify Findings action enabled

  • Create a Cloud DLP Inspection Job

  • View results of the inspection job and view new de-identified files in Cloud Storage

Setup and Requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could otherwise result in extra charges to your personal account.
  • Time to complete the lab. Remember: once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud products and services by clicking the Navigation menu at the top-left.

Create De-identify Templates

Create a template for unstructured data

In this section, you will configure and create a de-identify template for unstructured data.

  1. In the Google Cloud Console, in the Navigation menu, click Security > Data Loss Prevention.

  2. Click the Configuration tab.

  3. In the Configuration > Templates section, click Create Template.

  4. On the Create Template page, define the following options:

    • For Template type, select De-identify (remove sensitive data).
    • For Data transformation type, select InfoType.
    • For Template ID, enter deid_unstruct1
    • For Display name, enter deid_unstruct1 template
    • For Description, leave the field empty.
    • For Resource location, use the default setting Global (any region).
  5. Click Continue.

  6. For the Transformation Rule, select Replace with infoType name.

  7. For InfoTypes to transform, select Any detected infoTypes defined in an inspection template or inspect config that are not specified in other rules.

  8. Click Create.

Click Check my progress to verify the objective: Create a Template for Unstructured Data.
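For reference, the unstructured template can also be expressed as a Cloud DLP v2 REST request (`projects.locations.deidentifyTemplates.create`). This is a minimal sketch that only builds the URL and JSON body and sends nothing. `PROJECT_ID` is a hypothetical placeholder. The mapping from console options to API fields (InfoType to `infoTypeTransformations`, Replace with infoType name to `replaceWithInfoTypeConfig`) is our reading of the API; the lab itself does not state it.

```python
import json

# Hypothetical placeholder: replace with your lab project ID.
PROJECT_ID = "my-project"

DLP_API = "https://dlp.googleapis.com/v2"


def unstructured_template_request(project_id: str) -> tuple[str, dict]:
    """Build (URL, JSON body) for creating the deid_unstruct1 template.

    Only builds the request; nothing is sent.
    """
    url = f"{DLP_API}/projects/{project_id}/locations/global/deidentifyTemplates"
    body = {
        "templateId": "deid_unstruct1",
        "deidentifyTemplate": {
            "displayName": "deid_unstruct1 template",
            "deidentifyConfig": {
                # Data transformation type "InfoType".
                "infoTypeTransformations": {
                    "transformations": [
                        {
                            # No "infoTypes" list, so the rule covers every
                            # infoType requested by the inspection config.
                            "primitiveTransformation": {
                                # "Replace with infoType name".
                                "replaceWithInfoTypeConfig": {}
                            }
                        }
                    ]
                }
            },
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = unstructured_template_request(PROJECT_ID)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

To try it, you could POST the printed body with any authenticated HTTP client, for example `curl` with a bearer token from `gcloud auth print-access-token`.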

Create a template for structured data

In this section, you will configure and create a de-identify template for structured data.

  1. In the Google Cloud Console, navigate back to the Data Loss Prevention page.

  2. Click the Configuration tab.

  3. In the Configuration > Templates section, click Create Template.

  4. On the Create Template page, define the following options:

    • For Template type, select De-identify (remove sensitive data).
    • For Data transformation type, select Record.
    • For Template ID, enter deid_struct1
    • For Display name, enter deid_struct1 template
    • For Description, leave the field empty.
    • For Resource location, use the default setting Global (any region).
  5. Click Continue.

  6. For the Transformation Rule, add the following field names: ssn, ccn, email, vin, id, agent_id, user_id.

  7. For the Transformation type, select Primitive field transformation.

  8. For Transformation method, select Replace. This replaces the contents of every cell in any field whose name matches one in the list you provided.

  9. Click + Add Transformation Rule.

  10. For this new rule, add the field: message.

  11. For the Transformation type, select Match on infoType and click Add Transformation.

  12. For the Transformation Method, select Replace with infoType name.

  13. For InfoTypes to transform, select Any detected infoTypes defined in an inspection template or inspect config that are not specified in other rules. This applies infoType inspection and redaction to the contents of any field named message.

  14. Click Create.

Click Check my progress to verify the objective: Create a Template for Structured Data.
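The structured template can be sketched the same way, as a `deidentifyTemplates.create` body that uses `recordTransformations`. This is a hedged illustration. `PROJECT_ID` is a hypothetical placeholder. The replacement string `[redacted]` is an assumption, because the lab does not say what value the Replace method writes. `simulate_rule_1` is a local toy that mimics only the first rule on a Python dict and does not call Cloud DLP.

```python
import json

# Hypothetical placeholder: replace with your lab project ID.
PROJECT_ID = "my-project"

DLP_API = "https://dlp.googleapis.com/v2"

# Field names from the first transformation rule in the steps above.
REPLACED_FIELDS = ["ssn", "ccn", "email", "vin", "id", "agent_id", "user_id"]

# Hypothetical replacement value; the lab does not specify one.
REPLACEMENT = "[redacted]"


def structured_template_request(project_id: str) -> tuple[str, dict]:
    """Build (URL, JSON body) for creating the deid_struct1 template."""
    url = f"{DLP_API}/projects/{project_id}/locations/global/deidentifyTemplates"
    field_rules = [
        {
            # Rule 1: primitive "Replace" on every listed field.
            "fields": [{"name": name} for name in REPLACED_FIELDS],
            "primitiveTransformation": {
                "replaceConfig": {"newValue": {"stringValue": REPLACEMENT}}
            },
        },
        {
            # Rule 2: inspect the message field and replace each finding
            # with its infoType name.
            "fields": [{"name": "message"}],
            "infoTypeTransformations": {
                "transformations": [
                    {"primitiveTransformation": {"replaceWithInfoTypeConfig": {}}}
                ]
            },
        },
    ]
    body = {
        "templateId": "deid_struct1",
        "deidentifyTemplate": {
            "displayName": "deid_struct1 template",
            # Data transformation type "Record".
            "deidentifyConfig": {
                "recordTransformations": {"fieldTransformations": field_rules}
            },
        },
    }
    return url, body


def simulate_rule_1(row: dict) -> dict:
    """Local toy illustration of rule 1 only; this is not a DLP call."""
    return {k: (REPLACEMENT if k in REPLACED_FIELDS else v) for k, v in row.items()}


if __name__ == "__main__":
    url, body = structured_template_request(PROJECT_ID)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

For example, `simulate_rule_1({"user_id": "u-17", "country": "CA"})` returns `{"user_id": "[redacted]", "country": "CA"}`. Only the listed field is replaced. The message rule depends on infoType inspection, so it cannot be mimicked this simply.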

Create a DLP Inspection Job Trigger

  1. In the Google Cloud Console, navigate back to the Data Loss Prevention page.

  2. Click the Inspection tab.

  3. Click Create Job and Job Triggers.

  4. To configure input data, do the following:

    • In the Name section, define the following options:
      • For Job ID, enter DeID_Storage_Demo1.
      • Keep Resource location set to Global (any region).
    • In the Storage type list, select Google Cloud Storage, and then define the following options:
      • For Location Type, select Scan a single file or folder path.
      • For the URL enter: -input
      • Set “Percentage of included objects scanned within the bucket” to 100% and select No Sampling.
Note: For the storage bucket URL, make sure there is no whitespace.

  5. Leave the rest of the fields at their defaults and click Continue.

  6. Under Configure detection, leave all fields at their defaults and click Continue.

  7. Under Add Actions, toggle to enable Make a De-identify copy.

  8. Enter the paths of the two templates that you created earlier in their respective boxes:

    • projects//locations/global/deidentifyTemplates/deid_unstruct1
    • projects//locations/global/deidentifyTemplates/deid_struct1
Note: Make sure there are no spaces in the de-identification template paths.

  9. For the Cloud Storage output location specify:

This writes the redacted output to the second (output) bucket that was created for you.

  10. Click Continue.

  11. For Schedule, select Create a trigger to run the job on a periodic schedule and select Weekly.

  12. Click Continue.

  13. Scroll down and click Create > Confirm Create.

  14. You should now see the job trigger under Inspection > Job Triggers.

Click Check my progress to verify the objective: Create a DLP Inspection Job Trigger.
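The trigger configured above corresponds roughly to a `projects.locations.jobTriggers.create` request. This minimal sketch builds that request without sending it. Several parts are assumptions:

- The project and bucket names are hypothetical placeholders, because the lab generates per-student names.
- The console's Job ID is assumed to map to `triggerId`.
- The handling of No Sampling and the recursive `**` wildcard is a guess at how the console translates those options.

The Weekly schedule becomes a duration of 7 × 24 × 3600 = 604800 seconds.

```python
import json

# Hypothetical placeholders: the lab generates these names per student.
PROJECT_ID = "my-project"
INPUT_BUCKET = "my-project-input"
OUTPUT_BUCKET = "my-project-output"

DLP_API = "https://dlp.googleapis.com/v2"
WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800 seconds


def template_path(project_id: str, template_id: str) -> str:
    """Resource name of a global de-identify template."""
    return f"projects/{project_id}/locations/global/deidentifyTemplates/{template_id}"


def job_trigger_request(
    project_id: str, input_bucket: str, output_bucket: str
) -> tuple[str, dict]:
    """Build (URL, JSON body) for creating the DeID_Storage_Demo1 trigger."""
    url = f"{DLP_API}/projects/{project_id}/locations/global/jobTriggers"
    inspect_job = {
        "storageConfig": {
            "cloudStorageOptions": {
                # Assumed: a trailing "**" scans the bucket recursively.
                "fileSet": {"url": f"gs://{input_bucket}/**"},
                # Scan 100% of objects. No byte limit or sample method is set,
                # which is our reading of the console's "No Sampling".
                "filesLimitPercent": 100,
            }
        },
        # No inspectConfig: assumed to leave detection at its defaults.
        "actions": [
            {
                "deidentify": {
                    "transformationConfig": {
                        "deidentifyTemplate": template_path(project_id, "deid_unstruct1"),
                        "structuredDeidentifyTemplate": template_path(project_id, "deid_struct1"),
                    },
                    "cloudStorageOutput": f"gs://{output_bucket}",
                }
            }
        ],
    }
    body = {
        "triggerId": "DeID_Storage_Demo1",
        "jobTrigger": {
            "inspectJob": inspect_job,
            # "Weekly" schedule expressed as a protobuf Duration string.
            "triggers": [{"schedule": {"recurrencePeriodDuration": f"{WEEK_SECONDS}s"}}],
            "status": "HEALTHY",
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = job_trigger_request(PROJECT_ID, INPUT_BUCKET, OUTPUT_BUCKET)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Both template paths go into one `deidentify` action. The unstructured template handles free text and the structured template handles tabular files, which mirrors the two boxes you filled in.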

Run DLP Inspection and review results

  1. In the Google Cloud Console, navigate back to the Data Loss Prevention page.

  2. Click the Inspection tab.

  3. Under Job Triggers, you should see the job trigger that you created.

  4. Select this job trigger.

  5. Click Run Now. This creates and runs a new job instance.

  6. Select the job instance from the Triggered jobs section below.

Note: If you do not see a job, you may need to refresh the screen or wait a minute and refresh.
  7. Monitor the job and wait for its status to show Done.

  8. When the job is Done, review the results on this page to see what was found in the input bucket.
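Clicking Run Now corresponds to the `jobTriggers.activate` method, which starts a new job from the trigger. The job's `state` field then advances until the job finishes. This minimal sketch builds the activate URL and polls a caller-supplied function until a terminal state is reached. `get_state` is a hypothetical hook: in practice it might call `dlpJobs.get` on the job name that `activate` returns and read `state`. The project ID passed in is a placeholder.

```python
import time
from typing import Callable

DLP_API = "https://dlp.googleapis.com/v2"

# DlpJob states after which the job will not change again.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELED"}


def activate_url(project_id: str, trigger_id: str) -> str:
    """URL for POSTing a jobTriggers.activate request (the API side of Run Now)."""
    return (
        f"{DLP_API}/projects/{project_id}/locations/global/"
        f"jobTriggers/{trigger_id}:activate"
    )


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 60,
) -> str:
    """Poll get_state() until it returns a terminal state, then return it.

    get_state is a hypothetical hook, for example a function that calls
    dlpJobs.get and returns the job's "state" field.
    """
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Skip the sleep after the final poll.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

For example, with `states = iter(["PENDING", "RUNNING", "DONE"])`, calling `wait_for_job(lambda: next(states), poll_seconds=0)` returns `"DONE"` on the third poll. Injecting `get_state` keeps the polling logic separate from authentication and HTTP details.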

Great! You should see your findings populated and an overview of your job results at the bottom.
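The findings overview comes from the job's inspection result. If you fetch the job as JSON (for example with `dlpJobs.get`), per-infoType counts appear under `inspectDetails.result.infoTypeStats`. This minimal sketch tallies those counts from a job dict you supply and makes no API call. It assumes the standard proto3 JSON mapping, where int64 values such as `count` are serialized as strings, so each count is converted explicitly.

```python
import json
import sys


def summarize_findings(job: dict) -> list[tuple[str, int]]:
    """Total findings per infoType in a DlpJob JSON dict, largest first."""
    result = job.get("inspectDetails", {}).get("result", {})
    totals: dict[str, int] = {}
    for stat in result.get("infoTypeStats", []):
        name = stat["infoType"]["name"]
        # int64 fields such as count are serialized as JSON strings.
        totals[name] = totals.get(name, 0) + int(stat.get("count", 0))
    # Sort by descending count, then by name for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py job.json  (a DlpJob saved as JSON)
    with open(sys.argv[1]) as f:
        for name, count in summarize_findings(json.load(f)):
            print(f"{name}: {count}")
```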

View De-identified output

  1. On the job results page, click Configuration.

  2. Scroll down to the Output bucket for de-identified Cloud Storage Data section.

  3. Click the bucket link to open that Cloud Storage bucket.

  4. Explore the various folders and files to see what has been redacted. For example, opening one of the images in the image folder should show a redacted version of that image.

For further exploration, you can try the following:

  • Change the settings in the De-identification templates to try out different ways to de-identify and transform data (see the Cloud DLP transformation reference). You can also try different tokenization or pseudonymization methods that use Cloud KMS.

  • Try editing the DLP Job Trigger to adjust which kinds of data it inspects for, and then run another job by clicking Run Now on the triggers page. For example, if you turn off PERSON_NAME detection, names should no longer be redacted.
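Both exploration ideas map onto small configuration changes. The following is a hedged sketch:

- The infoType list is an illustrative example, not the console's default detection set.
- The KMS key name and wrapped key are hypothetical placeholders for a key you would create in Cloud KMS yourself.
- `inspect_config_without` builds an `inspectConfig` that omits chosen infoTypes such as PERSON_NAME.
- `kms_pseudonymize_rule` builds an infoType transformation that uses `cryptoDeterministicConfig` (deterministic encryption with a KMS-wrapped key). It could replace Replace with infoType name in the unstructured template.

```python
import json

# Illustrative infoType list; not the console's default detection set.
EXAMPLE_INFO_TYPES = [
    "PERSON_NAME",
    "EMAIL_ADDRESS",
    "PHONE_NUMBER",
    "US_SOCIAL_SECURITY_NUMBER",
    "CREDIT_CARD_NUMBER",
]

# Hypothetical placeholders for a key you would create in Cloud KMS.
KMS_KEY_NAME = "projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key"
WRAPPED_KEY_B64 = "BASE64_WRAPPED_KEY"


def inspect_config_without(info_types: list[str], excluded: set[str]) -> dict:
    """Build an inspectConfig that looks for every listed infoType except the excluded ones."""
    return {"infoTypes": [{"name": n} for n in info_types if n not in excluded]}


def kms_pseudonymize_rule(wrapped_key_b64: str, kms_key_name: str, surrogate: str) -> dict:
    """Build an infoType transformation that pseudonymizes findings with a KMS-wrapped key."""
    return {
        "primitiveTransformation": {
            "cryptoDeterministicConfig": {
                "cryptoKey": {
                    "kmsWrapped": {
                        "wrappedKey": wrapped_key_b64,
                        "cryptoKeyName": kms_key_name,
                    }
                },
                # Name used to annotate pseudonymized values in the output.
                "surrogateInfoType": {"name": surrogate},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(inspect_config_without(EXAMPLE_INFO_TYPES, {"PERSON_NAME"}), indent=2))
    print(json.dumps(kms_pseudonymize_rule(WRAPPED_KEY_B64, KMS_KEY_NAME, "TOKEN"), indent=2))
```

Deterministic encryption maps the same input to the same token each time. Redacted records therefore stay joinable, which a plain infoType-name replacement cannot offer.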

Congratulations!

In this lab, you created Cloud DLP De-Identification templates for structured and unstructured data, configured a Job Trigger with the De-identify Findings action enabled, created an Inspection Job, and viewed the results of the inspection job.

Next Steps / Learn More

Be sure to check out the following documentation for more practice with Cloud DLP:

Manual Last Updated January 30, 2023

Lab Last Tested January 30, 2023

Copyright 2023 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.