Ingesting FHIR Data with the Healthcare API

1 hour 1 Credit

GSP457

Overview

Cloud Healthcare API provides a managed solution for storing and accessing healthcare data in Google Cloud, providing a critical bridge between existing care systems and applications hosted on Google Cloud. Using the API, you can unlock significant new capabilities for data analysis, machine learning and application development, and use these capabilities to build the next generation of healthcare solutions.

In this lab you will explore the basic functionality of Cloud Healthcare API using the Fast Healthcare Interoperability Resources (FHIR) data model, export data to BigQuery, and access that data in BigQuery via SQL.

What you'll learn

In this lab, you will:

  • Gain a general understanding of Cloud Healthcare API and its role in managing healthcare data.
  • Learn how to create Cloud Healthcare API datasets and stores.
  • Import and export FHIR data using the Cloud Healthcare API.
  • Export data from Cloud Healthcare API to BigQuery.
  • Access data in BigQuery via SQL.

Healthcare API introduction

Cloud Healthcare API provides a managed solution for storing and accessing healthcare data in Google Cloud Platform (GCP), providing a critical bridge between existing care systems and applications hosted on Google Cloud. Using the API, you can unlock significant new capabilities for data analysis, machine learning and application development, and use these capabilities to build the next generation of healthcare solutions.

The API comprises three modality-specific interfaces that implement key industry-wide standards for healthcare data:

  • FHIR, an emerging standard for health data interchange
  • HL7v2, the most widely adopted method for health systems integration
  • DICOM, the dominant standard for radiology and imaging-related disciplines

Each interface is backed by a standards-compliant data store that provides read, write, search, and other operations on the data.

The Cloud Healthcare API provides a number of key features that are critical to bridging current technologies to the next generation of healthcare systems and applications:

  • Standards conformance - Google supports the use of standards-based interoperability through its participation in a number of healthcare standards bodies. In the Cloud Healthcare API each modality-specific data store and its associated API is substantially conformant with its relevant standard. For example, FHIR stores implement published versions of the FHIR specification such as STU3 and R4 (the version used in this lab), and DICOM stores implement DICOMweb, a web-based standard for exchanging medical images. In future updates, we expect to support additional versions of these specifications as well as the ability to request a resource in a different version than its canonical representation.
  • Compliance with privacy regulations - GCP provides detailed guidance regarding how it supports compliance with HIPAA in the US, PIPEDA in Canada, and other global privacy standards at cloud.google.com/security/compliance.
  • Data location control - The Cloud Healthcare API treats data location as a core component of the API. You have the option to select the storage location for each dataset from a list of currently available locations which correspond to distinct geographic areas aligned with GCP's regional structure. Future GCP regions will allow for the distribution of storage across wider geographic areas.
  • Security - The Cloud Healthcare API security model is based on Google's proven Identity and Access Management (IAM) system. IAM's fine-grained permissions give you complete control over access to your healthcare data. In addition, we've created open-source proxies for our powerful Apigee API Management system, which provides comprehensive threat detection and traffic management capabilities that allow you to securely expose sensitive ePHI to patient and provider applications.
  • Bulk import and export - The Cloud Healthcare API's DICOM and FHIR modalities support bulk import and export of data, making it easier to transfer data via the Cloud Storage system.
  • De-identification - De-identification support for DICOM and FHIR is available, making it much easier to redact patient information from studies and records for research and other purposes. The de-identification process operates on a data store basis.
  • Auditability - Both administrative and data access requests to the Cloud Healthcare API can be audited. Logs are available through Cloud Logging (formerly Stackdriver Logging).
  • High availability - Availability for mission-critical scenarios is made possible through Google Cloud's robust and highly redundant infrastructure.

For many applications, the Cloud Healthcare API can provide a modern alternative to legacy stacks implementing DICOM, HL7v2 or FHIR STU3 standards, simplifying data integration with existing systems and enabling application developers to focus on their differentiating features such as UX and intelligence.

Healthcare API concepts

To get the most out of the Cloud Healthcare API, there are a few key concepts you'll want to understand. The information below should give you a good sense of Cloud Healthcare API capabilities, but you can find more details in the documentation.

General structure of the Cloud Healthcare API

The Cloud Healthcare API exposes interfaces that enable you to perform different types of functions:

  • Administrative functions, such as creating or listing datasets and stores that will contain your data;
  • Data access functions that allow you to create, update, delete and search the data stored in Cloud Healthcare API, or to perform bulk import and export operations;
  • Security functions that allow you to impose access controls on data stored in Cloud Healthcare API;
  • De-identification functions that allow you to replace ePHI with anonymized data, or to obfuscate ePHI so that it cannot be used;
  • Metadata functions, such as retrieval of a FHIR capabilities statement for the FHIR API.

These functions may vary slightly depending on the modality of data (FHIR, HL7v2 or DICOM) being operated on. For example, data retrieval operations against a FHIR data store use an API that conforms to the FHIR standard, but data retrieval operations against an HL7v2 store use operations better suited to operating on HL7v2-structured data.

Datasets and stores

All Cloud Healthcare API usage occurs within the context of a Google Cloud Platform project. Projects form the basis for creating, enabling, and using all GCP services including managing APIs, enabling billing, adding and removing collaborators, and managing permissions for GCP resources. Cloud Healthcare API can be used in one or many GCP projects, as appropriate; this flexibility allows you to separate production from non-production usage, for example, or to segregate applications and resources in order to better manage access or accommodate different development lifecycles.

Within a project, data ingested through Cloud Healthcare API is stored in a dataset, which resides in a geographic location corresponding to a specific GCP region. You use the Cloud Healthcare API's administrative functions to create a dataset in a particular location; doing so facilitates implementation of data location requirements for the countries in which your applications provide services. For example, you can choose to create a dataset in GCP's "us-central1" region for US-based applications, or in an EU or UK region for applications serving those customers. This level of location control is also available in other GCP products, which can be combined with Cloud Healthcare API to create a complete application architecture. A list of generally available GCP products and the regions in which they are implemented can be found in the Google Cloud locations documentation.

Because each healthcare data modality has different structural and processing characteristics, datasets are split into modality-specific stores. A single dataset can contain one or many stores, and those stores can all service the same modality or different modalities as application needs dictate. Using multiple stores in the same dataset might be appropriate if a given application processes different types of data, for example, or if you'd like to be able to separate data according to its source hospital, clinic, department, etc. An application can access as many datasets or stores as its requirements dictate with no performance penalty, so it's important to design your overall dataset and store architecture to meet the organization's broad goals for locality, partitioning, access control, and so on.

The diagram below illustrates two datasets in a GCP project, each of which contains multiple stores.

healthcareAPI_data_model.png

There are many ways to structure datasets and stores. As you design systems that use the Cloud Healthcare API, you may want to take the following into consideration:

  • Security and access control: Rules can be defined at both a dataset and store level, but you may choose to group all data for a particular application into the same dataset, and set access control rules such that only that application can access the dataset.

  • Application requirements: An application processing different types of data may have all of its data for all modalities in a single dataset.

  • Source systems: Often, the structure of healthcare data can vary according to the source system and modality. Separating data for different source systems into their own datasets may facilitate processing.

  • Intended use: Data from different systems can have different intended uses, such as research, analytics or machine learning predictions. Grouping data by intended use may facilitate ingestion into the target system.

  • Separating ePHI from de-identified data: Cloud Healthcare API data de-identification functions read from a source dataset and write the output into a new dataset that you specify. If you are preparing data to be used by researchers, for example, this approach to de-identifying data may be a consideration in how you use datasets to segregate data.

API structure

Data in Cloud Healthcare API datasets and stores can be accessed and managed using a REST API that identifies each store using its project, location, dataset, store type and store name. This API implements modality-specific standards for access that are consistent with industry standards for that modality. For example, the Cloud Healthcare DICOM API natively provides operations for reading DICOM studies and series that are consistent with the DICOMweb standard, and supports the DICOM DIMSE C-STORE protocol via an open-source adapter. Similarly, the FHIR API provides operations for accessing or searching FHIR entity types that are based on the FHIR standard, and the HL7v2 API provides operations for reading and searching HL7v2 messages based on HL7v2 message or segment criteria.

Operations that access a modality-specific store use a request path that consists of two pieces: a base path and a modality-specific request path. Administrative operations, which generally operate only on locations, datasets and stores, may use only the base path, but modality-specific data retrieval operations use both the base path (for identifying the store to be accessed) and the request path (for identifying the actual data to be retrieved).

To reference a particular store within a Cloud Healthcare API dataset, you would use a base path structured like this:

/projects/<PROJECT>/locations/<LOCATION>/datasets/<DATASET>/<STORE-TYPE>/<STORE-NAME>

A concrete base path example might look like this:

/projects/myProj/locations/us-central1/datasets/central-ds1/hl7V2Stores/clinical-store1

which references a Cloud Healthcare HL7 v2 store in the Google Cloud project "myProj", in the "us-central1" region, in a dataset called "central-ds1", and with a name of "clinical-store1". This is an HL7 v2 store because of the "hl7V2Stores" type; to access a FHIR store in the same dataset you would use the "fhirStores" type, and if the store contained DICOM data you would use the "dicomStores" type.
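As a minimal sketch of how the five components combine, the shell function below assembles a base path from its parts. The function name hc_base_path is hypothetical and only illustrates the path layout described above; it does not call the API.

```shell
# Build the base path for a Cloud Healthcare API store.
# Arguments: project, location, dataset, store type, store name.
# hc_base_path is an illustrative helper, not part of any Google tool.
hc_base_path() {
  printf '/projects/%s/locations/%s/datasets/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Reproduces the concrete HL7 v2 example from the text.
hc_base_path myProj us-central1 central-ds1 hl7V2Stores clinical-store1
```

Swapping the fourth argument for fhirStores or dicomStores yields the base path for a store of that modality in the same dataset.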

To access a specific piece of data, the base path is used in combination with a request path that is formatted according to the appropriate modality standard. For example, a request to read a specific FHIR "Patient" entity using the entity ID might look like this:

<basePath>/fhir/Patient/{patient_id}

with /Patient/{patient_id} being a path—structured according to the FHIR standard—for the Patient resource whose identifier is specified by {patient_id}. Similarly, DICOMweb requests to a DICOM store might look like this:

<basePath>/dicomWeb/studies/{study_id}/series?PatientName={patient_name}

where {study_id} identifies a particular DICOM study, and the patient's name is specified by {patient_name}. In this example, the path specification is consistent with the DICOMweb standard path structure.
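The sketch below shows one way a base path and a modality-specific request path could be joined into a full REST URL. The helper names are hypothetical, and the host, the /v1 prefix and the /fhir/ path segment are assumptions taken from the curl commands used later in this lab rather than from this section.

```shell
# Assumed API root, matching the curl commands later in this lab.
HC_API_ROOT="https://healthcare.googleapis.com/v1"

# URL for a FHIR read of a single Patient resource.
# Arguments: base_path, patient_id. (Hypothetical helper.)
fhir_patient_url() {
  printf '%s%s/fhir/Patient/%s\n' "$HC_API_ROOT" "$1" "$2"
}

# URL for a DICOMweb search of the series in one study, filtered by
# patient name. Arguments: base_path, study_id, patient_name.
# The patient name must already be URL-encoded. (Hypothetical helper.)
dicom_series_url() {
  printf '%s%s/dicomWeb/studies/%s/series?PatientName=%s\n' "$HC_API_ROOT" "$1" "$2" "$3"
}
```

Either URL would then be requested with an HTTP GET that carries an OAuth access token in the Authorization header, as the curl commands in the streaming section do.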

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents conflicts between your personal account and the Student account, which could cause extra charges to be incurred on your personal account.
  • Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:

    • The Open Google Console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog. Click Next.

  4. Copy the Password from the Lab Details panel and paste it into the Welcome dialog. Click Next.

    Important: You must use the credentials from the left panel. Do not use your Google Cloud Skills Boost credentials.

    Note: Using your own Google Cloud account for this lab may incur extra charges.
  5. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Cloud Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left.

Activate Cloud Shell

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.

  1. In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.

  2. Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. The output contains a line that declares the PROJECT_ID for this session:

Your Cloud Platform project in this session is set to YOUR_PROJECT_ID

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  3. (Optional) You can list the active account name with this command:

gcloud auth list

(Output)

ACTIVE: *
ACCOUNT: student-01-xxxxxxxxxxxx@qwiklabs.net

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

  4. (Optional) You can list the project ID with this command:

gcloud config list project

(Output)

[core]
project = <project_ID>

(Example output)

[core]
project = qwiklabs-gcp-44776a13dea667a6

For full documentation of gcloud, see the gcloud command-line tool overview in the Cloud SDK documentation.

Define variables needed

In Cloud Shell run the following to set variables needed for this lab:

export PROJECT_ID=$(gcloud config list --format 'value(core.project)')
export PROJECT_NUMBER=$(gcloud projects list --filter=projectId:$PROJECT_ID \
  --format="value(projectNumber)")
export LOCATION=us-central1
export DATASET_ID=dataset1
export FHIR_STORE_ID=fhirstore1
export TOPIC=fhir-topic
export HL7_STORE_ID=hl7v2store1
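Because every later command depends on these variables, it can help to confirm they are all set before continuing. The function below is a small sketch of such a check; check_lab_vars is a hypothetical helper name, and the variable list simply mirrors the exports above.

```shell
# Report any lab variable that is unset or empty.
# Returns non-zero if at least one variable is missing.
# check_lab_vars is an illustrative helper, not part of the lab tooling.
check_lab_vars() {
  missing=0
  for name in PROJECT_ID PROJECT_NUMBER LOCATION DATASET_ID FHIR_STORE_ID TOPIC HL7_STORE_ID; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_lab_vars && echo "all variables set" after the exports prints the confirmation only when nothing is missing, which is useful if your Cloud Shell session restarts partway through the lab.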

Enable the Healthcare API

In the Cloud Console, go to Navigation menu > Healthcare.

Healthcare_menu.png

Click Enable.

healthcare_API.png

Create BigQuery datasets

  1. Run the following in Cloud Shell to create a dataset in BigQuery:

bq --location=us-east1 mk --dataset --description HCAPI-dataset $PROJECT_ID:$DATASET_ID

You'll see a success message:

Dataset '{project_id}:{dataset_id}' successfully created.

  2. Create a second dataset in BigQuery:

bq --location=us-east1 mk --dataset --description HCAPI-dataset-de-id $PROJECT_ID:de_id

(Output)

Dataset '{project_id}:de_id' successfully created.

  3. Set up the appropriate permissions to enable exporting data from the FHIR store to BigQuery.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
  --role=roles/bigquery.dataEditor

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
  --role=roles/bigquery.jobUser
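To confirm that both roles were granted, you could list the roles bound to the Cloud Healthcare service agent. The sketch below assumes the commonly used gcloud pattern of flattening the IAM policy and filtering by member; the two function names are hypothetical.

```shell
# Email address of the Cloud Healthcare service agent for a project
# number, following the pattern used in the binding commands above.
# Argument: project_number. (Hypothetical helper.)
hc_service_agent() {
  printf 'service-%s@gcp-sa-healthcare.iam.gserviceaccount.com\n' "$1"
}

# Print the roles currently bound to that service agent on the project.
# Arguments: project_id, project_number. (Hypothetical helper.)
list_agent_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$(hc_service_agent "$2")" \
    --format="value(bindings.role)"
}
```

Calling list_agent_roles "$PROJECT_ID" "$PROJECT_NUMBER" should include roles/bigquery.dataEditor and roles/bigquery.jobUser in its output once the bindings have been applied.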

Healthcare API setup

Create a Cloud Healthcare API dataset to hold the data stores:

gcloud healthcare datasets create $DATASET_ID \
  --location=$LOCATION

Create FHIR store

From the Datasets Browser screen, click the Refresh icon.

Then, click on dataset1.

Next, click Create Data Store.

Select the type: FHIR

Click in the ID field and name the data store fhirstore1.

Click Next.

Under Configure your FHIR Store, select R4.

Click Next.

Under Stream resource changes to BigQuery, make no changes and click Next.

In the Receive Cloud Pub/Sub notifications section, click the Select a Cloud Pub/Sub topic field, and then click the Create a topic link.

create-topic.png

Name the topic fhir-topic, then click Create Topic.

Click Create.

Your first FHIR store is now created.

Create a second datastore by clicking Create Data Store.

Select FHIR in the Type dropdown.

Name the ID of the data store de_id.

Click Next.

Select R4 for the FHIR Store Configuration option.

Click Create. Your second FHIR store is now created.

You should now see the two FHIR stores listed on the Data stores view.
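For reference, the same two stores could in principle be created from Cloud Shell instead of the Console. The sketch below is an alternative to the steps above, not an additional step (running it after the Console flow would fail because the stores already exist). The function name is hypothetical, and the --version and --pubsub-topic flags reflect the gcloud healthcare command surface as best understood here, so check gcloud healthcare fhir-stores create --help before relying on them.

```shell
# Create the Pub/Sub topic and both R4 FHIR stores from the command line.
# Relies on the variables defined earlier in this lab.
# create_lab_fhir_stores is an illustrative helper name.
create_lab_fhir_stores() {
  # Topic that receives notifications about resource changes.
  gcloud pubsub topics create "$TOPIC"

  # First store: R4, with Pub/Sub notifications enabled.
  gcloud healthcare fhir-stores create "$FHIR_STORE_ID" \
    --dataset="$DATASET_ID" \
    --location="$LOCATION" \
    --version=R4 \
    --pubsub-topic="projects/$PROJECT_ID/topics/$TOPIC"

  # Second store: R4, destination for de-identified data.
  gcloud healthcare fhir-stores create de_id \
    --dataset="$DATASET_ID" \
    --location="$LOCATION" \
    --version=R4
}
```

Scripting the setup this way makes it repeatable, while the Console flow shows each configuration option as you go.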

Click Check my progress to verify the objective.

Data creation

Import to FHIR data stores

Now you'll import sample data into the FHIR stores and stream to BigQuery.

Load the sample FHIR data into your FHIR store by running the command below, which takes data from an existing Cloud Storage bucket and loads it into the FHIR store:

gcloud healthcare fhir-stores import gcs $FHIR_STORE_ID \
  --dataset=$DATASET_ID \
  --location=$LOCATION \
  --gcs-uri=gs://spls/gsp457/fhir_devdays_gcp/fhir1/* \
  --content-structure=BUNDLE_PRETTY

This may take a couple of minutes to complete.

Click on the Operations tab to monitor the process.

The CreateDataset operation has succeeded, and the ImportResources1 operation may still be running. Wait until the import operation has completed before moving on.
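If you prefer to wait from Cloud Shell rather than watching the Operations tab, a polling loop along the following lines could work. This is a sketch under assumptions: wait_for_operation is a hypothetical helper, the operation ID must be copied from the Operations tab or the command output, and the loop assumes that gcloud healthcare operations describe reports a done field that prints as True when the operation has finished.

```shell
# Poll a long-running Healthcare API operation until it reports done.
# Arguments: operation_id, optional polling interval in seconds
# (defaults to 10, an arbitrary choice). Hypothetical helper.
wait_for_operation() {
  while :; do
    op_done=$(gcloud healthcare operations describe "$1" \
      --dataset="$DATASET_ID" \
      --location="$LOCATION" \
      --format="value(done)")
    # Accept either capitalization of the boolean value.
    case "$op_done" in
      True|true) break ;;
    esac
    sleep "${2:-10}"
  done
  echo "operation $1 finished"
}
```

The loop only checks for completion; it does not distinguish success from failure, so inspect the operation in the Operations tab if the import does not produce the expected data.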

Click on the Data Stores tab to view the datastores again.

FHIR bulk export

Using Cloud Shell, bulk export the FHIR data in fhirstore1 to the first BigQuery dataset created:

gcloud beta healthcare fhir-stores export bq $FHIR_STORE_ID \
  --dataset=$DATASET_ID \
  --location=$LOCATION \
  --bq-dataset=bq://$PROJECT_ID.$DATASET_ID \
  --schema-type=analytics

This may take a couple of minutes to complete.

You can view progress in the Operations tab in the Console.

Click on the Data Stores tab to view the datastores again once the operation is complete.

FHIR data de-identification

Click the Actions button for fhirstore1.

From the dropdown, select de-identify.

Select dataset1 as the dataset and de_id as the destination data store.

Click Append for the pop-up.

Click Next.

Click de-identify.

You can view progress in the Operations tab in the Console.

Click on the Data Stores tab to view the datastores again once the operation is complete.

Wait for this operation to complete before moving to the next step.

FHIR bulk export

Using Cloud Shell, bulk export the FHIR data in de_id to the second BigQuery dataset created earlier. Before running, ensure that the previous bulk export has completed.

gcloud beta healthcare fhir-stores export bq de_id \
  --dataset=$DATASET_ID \
  --location=$LOCATION \
  --bq-dataset=bq://$PROJECT_ID.de_id \
  --schema-type=analytics
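Once both exports have finished, listing the tables in each BigQuery dataset is a quick way to confirm that data arrived. The sketch below wraps the bq ls command in a hypothetical helper and assumes the two dataset names used in this lab.

```shell
# List the tables in both BigQuery datasets used by this lab.
# Relies on PROJECT_ID and DATASET_ID from the earlier exports.
# list_exported_tables is an illustrative helper name.
list_exported_tables() {
  for bq_dataset in "$DATASET_ID" de_id; do
    bq ls "$PROJECT_ID:$bq_dataset"
  done
}
```

Each dataset should contain one table per exported FHIR resource type, including the Patient table queried in the next section.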

Click Check my progress to verify the objective.

Exploring data in BigQuery

In Cloud Console, use the Navigation menu to open BigQuery.

In the left pane, under Resources, select your Project ID and expand the drop-down. You should see the two recently created datasets named dataset1 and de_id.

Select dataset1 and expand the drop-down.

Navigate to the Patient table and preview the Schema.

In the Query Editor tab, add the following SQL command to view patient data exported from the FHIR stores.

SELECT
  id AS patient_id,
  name[safe_offset(0)].given AS given_name,
  name[safe_offset(0)].family AS family,
  birthDate AS birth_date
FROM dataset1.Patient
LIMIT 10

Then, click Run.

Click Check my progress to verify the objective.

In the Query window, execute the following SQL command to view de-identified patient data exported from the de-identified FHIR store.

SELECT
  id AS patient_id,
  name[safe_offset(0)].given AS given_name,
  name[safe_offset(0)].family AS family,
  birthDate AS birth_date
FROM de_id.Patient
LIMIT 10

See the difference in the data? In the results from the de-identified store, given_name and family have been redacted and birth_date has been date-shifted, while the non-PHI patient_id is retained.
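The same comparison can also be run from Cloud Shell with the bq command-line tool. The sketch below wraps the query in a hypothetical helper, patient_preview, that takes the BigQuery dataset name as its only argument; it assumes PROJECT_ID is still set from earlier in the lab.

```shell
# Preview ten patients from the Patient table of a BigQuery dataset.
# Argument: bigquery_dataset_id (dataset1 or de_id in this lab).
# patient_preview is an illustrative helper name.
patient_preview() {
  bq query --use_legacy_sql=false \
    "SELECT
       id AS patient_id,
       name[SAFE_OFFSET(0)].given AS given_name,
       name[SAFE_OFFSET(0)].family AS family,
       birthDate AS birth_date
     FROM \`$PROJECT_ID.$1.Patient\`
     LIMIT 10"
}
```

Running patient_preview dataset1 followed by patient_preview de_id places the original and de-identified rows one after the other in the terminal.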

Streaming data export

In this section, you will create a new FHIR Patient resource in the FHIR store and export the newly created FHIR resource to BigQuery using streaming export.

  1. Using BigQuery's UI, query the dataset for patients with the family name Smith.

SELECT
  id AS patient_id,
  name[safe_offset(0)].given AS given_name,
  name[safe_offset(0)].family AS family,
  birthDate AS birth_date
FROM `dataset1.Patient`
WHERE name[safe_offset(0)].family = 'Smith'

This will not return any results. You will now stream this patient into the dataset and query again to demonstrate the newly created resource.

  2. To enable BigQuery streaming, you must update the FHIR store's streamConfigs field. To update the FHIR store, make a PATCH request with the following information:

  • The parent dataset
  • The FHIR store
  • The BigQuery dataset
  • The BigQuery project
  • An update mask
  • An access token

Run this PATCH request in your Cloud Shell:

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'streamConfigs': [
      {
        'bigqueryDestination': {
          'datasetUri': 'bq://$PROJECT_ID.$DATASET_ID',
          'schemaConfig': {
            'schemaType': 'ANALYTICS'
          }
        }
      }
    ]
  }" \
  "https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/datasets/$DATASET_ID/fhirStores/$FHIR_STORE_ID?updateMask=streamConfigs"

  3. Run this command to create a new Patient resource in your FHIR store:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/fhir+json; charset=utf-8" \
  --data "{
    \"name\": [
      {
        \"use\": \"official\",
        \"family\": \"Smith\",
        \"given\": [
          \"Darcy\"
        ]
      }
    ],
    \"gender\": \"female\",
    \"birthDate\": \"1970-01-01\",
    \"resourceType\": \"Patient\"
  }" \
  "https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/datasets/$DATASET_ID/fhirStores/$FHIR_STORE_ID/fhir/Patient"

In this API call, you are creating a new FHIR Patient resource in the FHIR store fhirstore1.

  4. Query the dataset for the family name Smith again:

SELECT
  id AS patient_id,
  name[safe_offset(0)].given AS given_name,
  name[safe_offset(0)].family AS family,
  birthDate AS birth_date
FROM `dataset1.Patient`
WHERE name[safe_offset(0)].family = 'Smith'

You should now see the newly created patient, Darcy Smith. This is a result of the streaming FHIR data export of the newly created FHIR Patient resource to the BigQuery dataset.
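The new patient can also be looked up directly in the FHIR store, independently of BigQuery, using a FHIR search. The sketch below assumes the standard FHIR Patient search parameter family and reuses the URL and token pattern from the curl commands above; fhir_search_by_family is a hypothetical helper name.

```shell
# Search the lab's FHIR store for patients by family name.
# Argument: family_name (must be URL-safe, for example Smith).
# Relies on PROJECT_ID, LOCATION, DATASET_ID and FHIR_STORE_ID.
# fhir_search_by_family is an illustrative helper name.
fhir_search_by_family() {
  curl -s \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://healthcare.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/datasets/$DATASET_ID/fhirStores/$FHIR_STORE_ID/fhir/Patient?family=$1"
}
```

Calling fhir_search_by_family Smith should return a FHIR Bundle of matching Patient resources, which lets you compare what the store holds with what the streaming export wrote to BigQuery.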

Lab review

Cloud Healthcare API provides a comprehensive facility for ingesting, storing, managing, and securely exposing healthcare data in FHIR, DICOM, and HL7 v2 formats. Using Cloud Healthcare API, you can ingest and store data from electronic health records systems (EHRs), radiological information systems (RISs), and custom healthcare applications. You can then immediately make that data available to applications for analysis, machine learning prediction and inference, and consumer access.

Cloud Healthcare API enables application access to healthcare data via widely-accepted, standards-based interfaces such as FHIR STU3 and DICOMweb. These APIs allow data ingestion into modality-specific data stores, which support data retrieval, update, search and other functions using familiar standards-based interfaces.

Further, the API integrates with other capabilities in Google Cloud through two primary mechanisms:

  • Cloud Pub/Sub, which provides near-real-time updates when data is ingested into a Cloud Healthcare API data store, and
  • Import/export APIs, which allow you to integrate Cloud Healthcare API into both Google Cloud Storage and Google BigQuery.

Using Cloud Pub/Sub with Google Cloud Functions enables you to invoke machine learning models on healthcare data, storing the resulting predictions back in a Cloud Healthcare API data store. A similar integration with Cloud Dataflow supports transformation and cleansing of healthcare data prior to use by applications.

To support healthcare research, Cloud Healthcare API offers de-identification capabilities for FHIR and DICOM. This feature allows customers to share data with researchers working on new cutting-edge diagnostics and medicines.

Congratulations

In this lab you:

  • Gained a general understanding of Cloud Healthcare API and its role in managing healthcare data.
  • Learned how to create datasets and stores for FHIR data.
  • Imported FHIR data from Cloud Storage
  • Exported FHIR data to BigQuery using both bulk export and streaming
  • Reviewed a number of queries against FHIR data in BigQuery

Finish Your Quest

This self-paced lab is part of the Qwiklabs Cloud Healthcare API Quest. A Quest is a series of related labs that form a learning path. Completing a Quest earns you a badge to recognize your achievement. You can make your badge (or badges) public and link to them in your online resume or social media account. Enroll in this Quest and get immediate completion credit if you've taken this lab. See other available Qwiklabs Quests.

Take your next lab

Continue your quest with Ingesting DICOM Data with the Healthcare API.

End your lab

When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Manual Last Updated: October 14, 2021
Lab Last Tested: October 14, 2021

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.