Building Codeless Pipelines on Cloud Data Fusion
In this lab you will learn how to create a Cloud Data Fusion instance and deploy a sample pipeline.
This lab teaches you how to use the Pipeline Studio in Cloud Data Fusion to build an ETL pipeline. Pipeline Studio exposes the building blocks and built-in plugins you need to assemble a batch pipeline, one node at a time. You will also use the Wrangler plugin to build transformations and apply them to the data flowing through the pipeline.
In this lab you’ll work with Wrangler directives, the instructions interpreted by the Wrangler plugin, the “Swiss Army Knife” of plugins in the Data Fusion platform. Directives let you encapsulate your transformations in one place and group transformation tasks into manageable blocks.
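As an illustration of what a block of directives looks like, here is a minimal sketch of a Wrangler recipe. The column name `body` and the CSV layout (`id`, `name`, `price`) are assumptions for this example, not values from the lab; the directive names follow the CDAP Wrangler syntax.

```
// Parse the raw input column `body` as comma-separated values (assumed column name)
parse-as-csv :body ',' false
// Drop the original raw column once it has been parsed
drop body
// Rename the generated columns to meaningful names (assumed schema)
rename body_1 id
rename body_2 name
rename body_3 price
// Cast the price column to a numeric type
set-type :price double
```

In Pipeline Studio, a recipe like this is attached to a single Wrangler node, so all of these steps travel together as one reusable transformation block.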
In addition to batch pipelines, Data Fusion also lets you create realtime pipelines that process events as they are generated. Currently, realtime pipelines execute using Apache Spark Streaming on Cloud Dataproc clusters. In this lab you will also learn how to build a streaming pipeline using Data Fusion.