Course Overview
Stream processing is increasingly popular because it gives businesses real-time metrics on their operations. This course covers how to build streaming data pipelines on Google Cloud. It describes Pub/Sub for handling incoming streaming data, shows how to apply aggregations and transformations to streaming data using Dataflow, and explains how to store processed records in BigQuery or Cloud Bigtable for analysis. Learners get hands-on experience building streaming data pipeline components on Google Cloud by using Qwiklabs.
Who should attend
This class is intended for data analysts, data scientists, and programmers who want to build systems for demanding requirements such as high availability, resiliency, high throughput, and real-time streaming analytics on Google Cloud.
Prerequisites
- Experience analyzing and visualizing big data, implementing cloud-based big data solutions, and transforming/processing datasets.
- Google Cloud Big Data and Machine Learning Fundamentals (or equivalent experience).
- Some knowledge of Java.
Course Objectives
- Interpret use cases for real-time streaming analytics
- Manage data events by using the Pub/Sub asynchronous messaging service
- Write streaming pipelines and run transformations where necessary
- Integrate Dataflow, BigQuery, and Pub/Sub for real-time streaming and analysis
Outline: Building Resilient Streaming Analytics Systems on Google Cloud (BRSAS)
Module 1 - Introduction to Processing Streaming Data
Topics:
- Introduction to processing streaming data
Objectives:
- Explain streaming data processing.
- Describe the challenges with streaming data.
- Identify the Google Cloud products and tools that can help address streaming data challenges.
Module 2 - Serverless Messaging with Pub/Sub
Topics:
- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
Objectives:
- Describe the Pub/Sub service.
- Explain how Pub/Sub works.
- Simulate real-time streaming sensor data using Pub/Sub.
Module 3 - Dataflow Streaming Features
Topics:
- Streaming data challenges
- Dataflow windowing
Objectives:
- Describe the Dataflow service.
- Build a stream processing pipeline for live traffic data.
- Demonstrate how to handle late data by using watermarks, triggers, and accumulation.
Module 4 - High-Throughput BigQuery and Bigtable Streaming Features
Topics:
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Bigtable
- Optimizing Bigtable performance
Objectives:
- Describe how to perform ad hoc analysis on streaming data using BigQuery and dashboards.
- Discuss Cloud Bigtable as a low-latency solution.
- Describe how to architect for Bigtable and how to ingest data into Bigtable.
- Highlight performance considerations for the relevant services.
Module 5 - Advanced BigQuery Functionality and Performance
Topics:
- Analytic window functions
- Geographic Information System (GIS) functions
- Performance considerations
Objectives:
- Review some of BigQuery’s advanced analysis capabilities.
- Discuss ways to improve query performance.