Course Overview
Please note: attendees work together in teams of at least 5, and the advertised price is per team of 5.
Course Content
This OpenHack enables attendees to develop, implement, and operationalize ETL pipelines for a multi-source data warehouse solution on Microsoft Azure. It simulates a real-world scenario in which an online DVD company’s data arrives from many disparate sources but must be stored in a single location, made sense of, and then used to feed a wide variety of downstream systems. During the “hacking”, attendees will focus on (1) systematically ingesting and securing data from multiple sources, and (2) transforming the data to fit the business’s required schema and monitoring the dataflow with levels of DevOps testing.
Technical Scenarios
- Disparate data sources: ingest data from multiple, differing sources into a single location with one normalized schema for standardized downstream use
- Security of data: protect data at all times while using ETL pipelines
- DevOps: learn how to use a production pipeline to handle the data layer
Technologies: Azure Data Lake Storage, Azure Data Factory, Azure Databricks, Azure DevOps, SQL Data Warehouse
Who should attend
- App Developers
- Customers who are trying to handle and store data from multiple sources
- Customers who need a DevOps solution that considers data management
Prerequisites
Knowledge Prerequisites
To be successful and get the most out of this OpenHack, participants should have existing knowledge of relational database structures and concepts (e.g. tables, joins, SQL) and experience with either SSIS or programming languages like Scala or Python. Previous experience creating ETL pipelines, source control management, automated testing, and build and release automation will help you advance more quickly. Familiarity with Azure fundamentals is also recommended.
Tooling Prerequisites
To avoid any delays with downloading or installing tooling, you are encouraged to have the following ready to go!
- Install the Integrated Development Environment (IDE) of your choice, e.g. Visual Studio, Visual Studio Code, Eclipse, or IntelliJ
- Download Azure CLI
- SQL Server Database Tooling (Azure Data Studio/SSMS)
- SQL Server Data Tools (including BI tools), if using Visual Studio as your IDE
Post Learning Recommendations
- Implement a Data Warehouse with Azure SQL Data Warehouse
- Large-Scale Data Processing with Azure Data Lake Storage Gen2
- Core Cloud Services - Azure data storage options
- Azure for the Data Engineer
- Perform data engineering with Azure Databricks
- Architect a data platform in Azure
Course Objectives
By the end of the OpenHack, attendees will have built a fully operating Modern Data Warehouse with a corresponding CI/CD pipeline that takes data management into account and meets top-quality data consumption requirements such as reliability, scalability, and maintainability.
- A modern cloud solution that provides higher reliability, scalability, and maintainability for large amounts of data
- An introduction to new data storage services that meet the needs of multiple, unique data streams
Outline: OpenHack – Modern Data Warehousing (OHMDW)
Challenge 1: Select and provision storage for an enterprise data lake
In this challenge, you will…
Learning objectives:
- Compare and contrast Azure storage offerings
- Provision the selected Azure storage service
Challenge 2: Ingest data from cloud sources
In this challenge, you will…
Learning objectives:
- Orchestrate the ingestion of data from multiple cloud-based sources to a single cloud-based store
- Ensure the protection of specific customer data at all times, using the current technology set and solution architecture
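As a rough illustration of the data-protection objective, the sketch below pseudonymizes sensitive fields with a keyed hash before a record is written onward. This is only one possible approach (alongside options such as encryption at rest or access control) and is not necessarily what the OpenHack itself uses. The field names, the `pseudonymize` helper, and the way the secret is passed in are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names; real source schemas will differ.
SENSITIVE_FIELDS = {"email", "phone"}


def pseudonymize(record: dict, secret: bytes) -> dict:
    """Return a copy of `record` with sensitive fields replaced by keyed hashes.

    Assumption: `secret` is supplied by a secure store at runtime and is
    never hard-coded or checked into source control.
    """
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            # HMAC-SHA256 keeps the value joinable across sources
            # without exposing the original text.
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            masked[key] = digest.hexdigest()
        else:
            masked[key] = value
    return masked
```

Because the hash is deterministic for a given secret, the same customer still matches across sources after masking, which keeps downstream joins possible.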
Challenge 3: Pull data from on-premises and establish source control
In this challenge, you will…
Learning objectives:
- Orchestrate the ingestion of data specifically from maintained “on-premises” solutions
- Implement a cloud-based source control repository for the developed solution
Challenge 4: Transform and normalize data within the lake and establish branch policies
In this challenge, you will…
Learning objectives:
- Transform data into a normalized schema for downstream consumption
- Create policies that ensure all future changes go through an appropriate review process
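To make the idea of a normalized schema concrete, here is a minimal sketch in plain Python that maps rows from two differently shaped sources onto one common layout. The source column names, date formats, and target fields are assumptions made for illustration only; the actual feeds and tooling in the OpenHack (for example Azure Databricks) may look quite different.

```python
from datetime import datetime


def from_source_a(row: dict) -> dict:
    """Map a row from hypothetical source A (compact dates, PascalCase keys)."""
    return {
        "customer_id": str(row["CustomerID"]),
        "title": row["MovieTitle"].strip(),
        "rented_on": datetime.strptime(row["RentalDate"], "%Y%m%d").date().isoformat(),
    }


def from_source_b(row: dict) -> dict:
    """Map a row from hypothetical source B (US-style dates, short keys)."""
    return {
        "customer_id": str(row["cust"]),
        "title": row["film"].strip(),
        "rented_on": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
    }


def normalize(rows_a: list, rows_b: list) -> list:
    """Combine both sources into one list that shares a single schema."""
    return [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]
```

Keeping one small mapping function per source makes each mapping easy to review and test on its own when a new source is added.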
Challenge 5: Populate a data warehouse and implement unit tests
In this challenge, you will…
Learning objectives:
- Transform the data from the various source systems into a common data warehouse schema to support the generation of specific reports mandated by the business
- Orchestrate the dataflow into the data warehouse in an automated manner
- Build out unit tests across core components of the data pipeline
- Integrate automated testing into the code review process
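A unit test for a pipeline component can be very small. The sketch below uses Python's standard `unittest` module against a hypothetical `clean_title` transform; both the function and the test cases are illustrative assumptions, not part of the OpenHack material.

```python
import unittest


def clean_title(raw: str) -> str:
    """Hypothetical transform: trim and collapse whitespace in a title."""
    return " ".join(raw.split())


class CleanTitleTests(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(clean_title("  The   Matrix "), "The Matrix")

    def test_is_idempotent(self):
        # Running the transform twice should not change the result.
        once = clean_title("  The   Matrix ")
        self.assertEqual(clean_title(once), once)
```

Tests like these can be run locally with `python -m unittest` and, in principle, by a build pipeline as part of a code review gate.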
Challenge 6: Differential data loads and telemetry
In this challenge, you will…
Learning objectives:
- Modify the solution to support differential data loads in addition to the original bulk load
- Automate data load and processing to run daily
- Implement rich telemetry into the dataflow and deployment pipelines
- Add error handling to raise pipeline issues in real time
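One common way to implement a differential load is a high-water mark: remember the latest modification timestamp already loaded and pick up only rows changed after it. The sketch below shows that idea in plain Python. The `modified_at` field and the `select_changed` helper are hypothetical, and the code assumes every timestamp is an ISO 8601 string in the same format so that string comparison matches chronological order.

```python
def select_changed(rows: list, last_watermark: str):
    """Return (rows modified after last_watermark, new watermark).

    Assumption: each row carries a `modified_at` ISO 8601 string in a
    consistent format, so lexicographic order equals time order.
    """
    changed = [r for r in rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run repeats the window.
    new_watermark = max((r["modified_at"] for r in changed), default=last_watermark)
    return changed, new_watermark
```

A daily run would persist the returned watermark after a successful load and pass it back in on the next run; the initial bulk load corresponds to starting from a watermark earlier than any row.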
Challenge 7: Automated deployment with validation and approval
In this challenge, you will…
Learning objectives:
- Operationalize the solution deployment process through automation
- Create and implement a testing environment
- Implement automated deployment processes and policies