Course Overview
Explore ways to derive insights from data at scale using BigQuery, Google Cloud’s serverless, highly scalable, and cost-effective cloud data warehouse. This course uses lectures, demos, and hands-on labs to teach you the fundamentals of BigQuery, including how to create a data transformation pipeline, build a BI dashboard, ingest new datasets, and design schemas at scale.
Who should attend
- Data Analysts, Business Analysts, Business Intelligence professionals
- Cloud Data Engineers who will be partnering with Data Analysts to build scalable data solutions on Google Cloud
Prerequisites
Basic proficiency with ANSI SQL (reference)
Course Objectives
- Derive insights from data using the analysis and visualization tools on Google Cloud
- Load, clean, and transform data at scale with Dataprep
- Explore and Visualize data using Google Data Studio
- Troubleshoot, optimize, and write high performance queries
- Practice with pre-built ML APIs for image and text understanding
- Train classification and forecasting ML models using SQL with BigQuery ML
Follow On Courses
Outline: From Data to Insights with Google Cloud Platform (DIGCP)
Module 1: Introduction to Data on Google Cloud Platform
Topics:
- Analytics Challenges Faced by Data Analysts
- Big Data On-premise Versus on the Cloud
- Real-world Use Cases of Companies Transformed Through Analytics on the Cloud
- Google Cloud Project Basics
Objectives:
- Compare data infrastructure on-premises versus on Google Cloud.
Module 2: Analyzing Large Datasets with BigQuery
Topics:
- Data Analyst Tasks, Challenges, and Google Cloud Data Tools
- Fundamental BigQuery Features
- Google Cloud Tools for Analysts, Data Scientists, and Data Engineers
Objectives:
- Identify data analyst tasks, and challenges, and introduce Google Cloud data tools
- Explore 9 fundamental BigQuery features
- Compare the differences in roles and toolsets between data analysts, data scientists, and data engineers
- Access the BigQuery web UI and explore a public dataset with basic SQL
Module 3: Exploring your Data with SQL
Topics:
- Common Data Exploration Techniques
- Use SQL to Query Public Datasets
Objectives:
- Compare common data exploration techniques
- Identify the key components of a basic SQL SELECT statement and common pitfalls
- Discuss the basics of SQL functions and how they create calculated fields with input parameters
- Explore BigQuery public datasets
- Troubleshoot dataset quality issues by analyzing duplicate records with SQL in the BigQuery Web UI
Module 4: Cleaning and Transforming your Data with Dataprep
Topics:
- 5 Principles of Dataset Integrity
- Dataset Shape and Skew
- Clean and Transform Data using SQL
- Introducing Dataprep by Trifacta
Objectives:
- Characterize different dataset shapes and potential skew
- Clean and transform data using SQL
- Clean and transform data using Dataprep
Module 5: Visualizing Insights and Creating Scheduled Queries
Topics:
- Data Visualization Principles
- Common Data Visualization Pitfalls
- Looker Studio
Objectives:
- Compare data visualizations and make recommendations for improvement
- Create dashboards and visualizations with Looker Studio
Module 6: Storing and Ingesting New Datasheets
Topics:
- Permanent vs Temporary Data Tables
- Ingesting New Datasheets
Objectives:
- Differentiate between permanent and temporary data tables
- Identify what types and formats of data BigQuery can ingest
- Differentiate between native BigQuery table storage and external data source connections
- Load new data into BigQuery
Module 7: Enriching your Data Warehouse with JOINs
Topics:
- Merge Historical Data Tables with UNION
- Introduce Table Wildcards for Easy Merges
- Review Data Schemas: Linking Data Across Multiple Tables
- JOIN Examples and Pitfalls
Objectives:
- Explain when to use UNIONs and when to use JOINs
- Identify the key pitfalls when joining and merging datasets
- Differentiate between join types visually
- Explain how union wildcards work and when to use them
- Write SQL JOINs and UNIONs against a dataset in the BigQuery web UI
Module 8: Advanced Features and Partitioning your Queries and Tables for Advanced Insights
Topics:
- Advanced Functions (Statistical, Analytic, User-defined)
- Date-Partitioned Tables
Objectives:
- Identify the available statistical approximation functions and user-defined functions
- Apply large-scale record estimation with approximate aggregation functions
- Deconstruct an analytical window query and explain when to use RANK() and PARTITION
- Explain when to use Common Table Expressions (WITH) to break apart complex queries
Module 9: Designing Schemas that Scale: Arrays and Structs in BigQuery
Topics:
- BigQuery Versus Traditional Relational Data Architecture
- ARRAY and STRUCT Syntax
- BigQuery Architecture
Objectives:
- Differentiate between BigQuery and traditional data architecture
- Work with ARRAYs and STRUCTs as part of nested fields in data schemas
Module 10: Optimizing Queries for Performance
Topics:
- BigQuery Performance Pitfalls
- Prevent Data Hotspots
- Diagnose Performance Issues with the Query Explanation Map
Objectives:
- Identify BigQuery performance pitfalls
- Discuss the Query Explanation map and how to interpret MAX and AVG processing times per stage
- Describe how to analyze and troubleshoot broken queries
Module 11: Controlling Access with Data Security Best Practices
Topics:
- Hashing Columns
- Authorized Views
- IAM and BigQuery Dataset Roles
- Access Pitfalls
Objectives:
- Review data access roles within Google Cloud and BigQuery
- Highlight key data access pitfalls and how to avoid them
Module 12: Predicting Visitor Return Purchases with BigQuery ML
Topics:
- Machine Learning on Structured Data
- Scenario: Predicting Customer Lifetime Value
- Choosing the Right Model Type
- Creating ML models with SQL
Objectives:
- Explain how ML on structured data drives value
- Describe how customer LTV can be predicted with an ML model
- Choose the right model type for different structured data use cases
- Create ML models with SQL
Module 13: Deriving Insights from Unstructured Data Using Machine Learning
Topics:
- ML Drives Business Value
- How does ML on Unstructured Data Work?
- Choosing the Right ML Approach
- Pre-built AI Building Blocks
- Customizing Pre-built Models with AutoML
- Building a Custom Model
Objectives:
- Discuss how ML is able to drive business value
- Explain how ML on unstructured data works
- Differentiate between pre-built ML models, custom models, and new models when considering an AI application strategy