Course Overview

Present-day high-performance computing (HPC) and deep learning applications benefit from, and even require, cluster-scale GPU compute power. Writing CUDA applications that can correctly and efficiently utilize GPUs across a cluster requires a distinct set of skills. In this workshop, you’ll learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.

Please note that once a booking has been confirmed, it is non-refundable. This means that after you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.

Prerequisites

Intermediate experience writing CUDA C/C++ applications.

Suggested materials to satisfy the prerequisites:

Fundamentals of Accelerated Computing with CUDA C/C++
Accelerating CUDA C++ Applications with Multiple GPUs
Accelerating CUDA C++ Applications with Concurrent Streams
Scaling Workloads Across Multiple GPUs with CUDA C++

Course Objectives

By participating in this workshop, you’ll:

Learn several methods for writing multi-GPU CUDA C++ applications
Use a variety of multi-GPU communication patterns and understand their tradeoffs
Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges

Follow On Courses

Fundamentals of Accelerated Computing with CUDA Python (FACCP)

Outline: Scaling CUDA C++ Applications to Multiple Nodes (SCCAMN)

Introduction

Meet the instructor.
Create an account at courses.nvidia.com/join

Multi-GPU Programming Paradigms

Survey multiple techniques for programming CUDA C++ applications for multiple GPUs using a Monte-Carlo approximation of pi CUDA C++ program.
Use CUDA to utilize multiple GPUs.
Learn how to enable and use direct peer-to-peer memory communication.
Write an SPMD version with CUDA-aware MPI.

Introduction to NVSHMEM

Learn how to write code with NVSHMEM and understand its symmetric memory model.
Use NVSHMEM to write SPMD code for multiple GPUs.
Utilize symmetric memory to let all GPUs access data on other GPUs.
Make GPU-initiated memory transfers.

Halo Exchanges with NVSHMEM

Practice common coding motifs like halo exchanges and domain decomposition using NVSHMEM, and work on the assessment.
Write an NVSHMEM implementation of a Laplace equation Jacobi solver.
Refactor a single GPU 1D wave equation solver with NVSHMEM.
Complete the assessment and earn a certificate.

Final Review

Learn about application tradeoffs on GPU clusters.
Review key learnings and answer questions.
Complete the workshop survey.

Prices & Delivery methods

Online Training

Duration
1 day

Price

US $ 500

Enroll now

Request a date

Classroom Training

Duration
1 day

Price

United States: US $ 500

Enroll now

Request a date

Schedule

Currently there are no training dates scheduled for this course.