GPU AI SDK Training

H54FXS

    Course ID

    H54FXS

    Duration

    1 day

    Format

    ILT/VILT


    Overview

    This course provides a comprehensive introduction to GPU computing for AI/ML workloads. Topics include GPU architecture fundamentals, optimized workflows, framework integration, performance tuning, debugging, and advanced programming techniques. By the end of the training, you should be able to effectively utilize GPUs for deep learning acceleration and enterprise-scale workloads.


    The course opens with a discussion of the audience's background and setup prerequisites so that the examples and exercises can be tailored accordingly.


    Audience

    This course is ideal for:

    • Data science, machine learning, and AI professionals
    • Technical leaders and managers with programming backgrounds
    • MLOps, ModelOps, and LLMOps professionals
    • Software engineers and DevOps professionals with some exposure to machine learning
    • GPU enthusiasts with Python programming backgrounds

    Prerequisites

    Before attending this course, we recommend that you have:


    Intermediate Python programming knowledge:

    • Familiarity with data structures, list comprehensions, lambdas, classes, and loops
    • Awareness of libraries such as NumPy, Pandas, Matplotlib, and Seaborn

    Beginner-level machine learning and/or deep learning:

    • The machine learning lifecycle
    • Data cleaning and preparation process
    • Introductory machine learning math, such as linear algebra and probability

    General operating systems and computer architecture understanding:

    • High-level understanding of parallelization and multiprocessing
    • Familiarity with hardware components such as CPU, RAM, and GPU
    • Basic concepts of system memory, paging, and partitioning
    • Basic Linux (Bash) or Windows (PowerShell) scripting

    These prerequisites ensure that you enter the course with a solid foundation, maximizing your ability to understand and engage with the material.

    Objectives

    After completing this course, you should be able to:

    • Understand GPU architectures and their role in accelerating AI workloads
    • Implement deep learning frameworks on GPUs for efficient model training
    • Optimize single-GPU and multi-GPU performance with advanced strategies
    • Write and control custom GPU kernels for specialized operations
    • Debug, troubleshoot, and profile GPU performance in production scenarios

    Course outline

    Module 1: GPU Fundamentals and Deep Learning Acceleration


    Topics

    • CPU vs. GPU vs. TPU architectural differences, parallelism, and inference
    • Checking GPU availability and utilization
    • Getting started with acceleration

    Objectives

    • Understand architectural differences between CPU, GPU, and TPU and their role in parallelism and inference
    • Learn how to check GPU availability and utilization
    • Explore initial steps to enable acceleration in AI/ML workflows
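
    As a rough illustration of checking GPU availability and utilization, the sketch below shells out to the `nvidia-smi` command-line tool and parses its CSV output. It assumes an NVIDIA GPU with the driver tools installed; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical and are not taken from the course material.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = "name,utilization.gpu,memory.used,memory.total"


def _to_int(value):
    """Convert a numeric field to int; return None for values such as '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name is tolerated.
        name, util, used, total = [f.strip() for f in line.rsplit(",", 3)]
        gpus.append({
            "name": name,
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus():
    """Return one dict per visible GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (or nvidia-smi is not on PATH).")
    for gpu in gpus:
        print(gpu)
```

    Deep learning frameworks expose their own device queries as well; this version only depends on the Python standard library, so it can run before any framework is installed.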

    Module 2: Understanding GPU-Optimized Deep Learning Workflow

    Topics

    • GPU acceleration libraries for deep learning
    • Data flow between CPU, GPU, model, and output
    • Best practices for batching, shuffling, and memory management

    Objectives

    • Understand how GPU acceleration libraries (CUDA, cuDNN) and Tensor Core hardware optimize computations
    • Learn the flow of data between CPU, GPU, model, and output
    • Apply best practices for batching, shuffling, and memory management in GPU-enabled training
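
    The batching and shuffling practices named above can be sketched without any framework. The generator below is a minimal, hypothetical helper (`iter_batches` is not a course or library API) that shuffles sample indices with a seed and yields them in fixed-size batches; a real pipeline would normally rely on the framework's own data loader.

```python
import random


def iter_batches(num_samples, batch_size, shuffle=True, seed=0, drop_last=False):
    """Yield lists of sample indices, one list per batch.

    Passing a different seed per epoch (for example, the epoch number)
    gives a different but reproducible order each epoch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        batch = indices[start:start + batch_size]
        # Optionally skip a short final batch so every batch has the same
        # shape, which keeps per-step GPU memory use predictable.
        if drop_last and len(batch) < batch_size:
            break
        yield batch


if __name__ == "__main__":
    for epoch in range(2):
        print(epoch, list(iter_batches(10, 4, seed=epoch)))
```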

    Module 3: TensorFlow and Keras on GPUs


    Topics

    • Moving tensors and models to GPU
    • Training networks on datasets with and without GPU acceleration
    • Monitoring GPU memory usage and training speed

    Objectives

    • Learn how to move tensors and models to GPUs
    • Compare training performance with and without GPU acceleration
    • Monitor GPU memory and training speed in real time

    Module 4: Performance Optimization for Single-GPU and Multi-GPU Training

    Topics

    • Identifying and reducing data pipeline bottlenecks
    • Mixed precision training
    • Gradient accumulation and efficient batch sizing strategies

    Objectives

    • Identify and reduce data pipeline bottlenecks
    • Apply mixed precision training for faster computation and reduced memory usage
    • Use gradient accumulation and batch sizing strategies for optimal training
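
    To make the idea behind gradient accumulation concrete, the sketch below uses a one-parameter linear model in plain Python. It assumes a mean squared error loss and equal-sized micro-batches; under those assumptions, averaging the micro-batch gradients reproduces the full-batch gradient, which is why accumulation can stand in for a larger batch that does not fit in GPU memory. The function names are hypothetical.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_steps so the running
    total equals the gradient of the mean loss over the whole batch.
    """
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch must split into equal-sized micro-batches")
    num_steps = len(xs) // micro_batch_size
    total = 0.0
    for step in range(num_steps):
        lo = step * micro_batch_size
        hi = lo + micro_batch_size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_steps
    return total


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 9)]
    ys = [2.0 * x + 1.0 for x in xs]
    w = 0.5
    print("full batch :", mse_grad(w, xs, ys))
    print("accumulated:", accumulated_grad(w, xs, ys, micro_batch_size=2))
```

    In a framework, the same pattern usually means scaling each micro-batch loss by the number of accumulation steps and applying the optimizer update only after the last micro-batch.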

    Module 5: Advanced Controls

    Topics

    • Custom GPU kernels
    • Avoiding race conditions
    • Multidimensional grids
    • Sorting and memory optimization techniques

    Objectives

    • Learn to design and implement custom GPU kernels
    • Understand how to avoid race conditions in GPU programming
    • Explore multidimensional grid strategies and advanced GPU operations
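
    The index arithmetic behind a two-dimensional grid can be illustrated on the CPU. The sketch below is an emulation in plain Python, not a real GPU kernel: it assumes the common CUDA-style mapping `row = blockIdx.y * blockDim.y + threadIdx.y` (and likewise for columns), rounds the grid size up so the whole array is covered, and uses a bounds check so surplus threads do nothing. Because each element is then owned by exactly one thread, writes of this form do not race.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def grid_dims(height, width, block=(16, 16)):
    """Number of blocks along (y, x) needed to cover a height x width array."""
    block_y, block_x = block
    return ceil_div(height, block_y), ceil_div(width, block_x)


def emulate_2d_launch(height, width, block=(16, 16)):
    """Count how many emulated threads write each (row, col) element."""
    block_y, block_x = block
    grid_y, grid_x = grid_dims(height, width, block)
    hits = [[0] * width for _ in range(height)]
    for block_idx_y in range(grid_y):
        for block_idx_x in range(grid_x):
            for thread_idx_y in range(block_y):
                for thread_idx_x in range(block_x):
                    # Global coordinates of this emulated thread.
                    row = block_idx_y * block_y + thread_idx_y
                    col = block_idx_x * block_x + thread_idx_x
                    # Guard: edge blocks contain threads past the array bounds.
                    if row < height and col < width:
                        hits[row][col] += 1
    return hits


if __name__ == "__main__":
    print(grid_dims(100, 70))
    hits = emulate_2d_launch(33, 20)
    print(all(h == 1 for row in hits for h in row))
```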

    Module 6: Debugging, Troubleshooting, and Best Practices

    Topics

    • Debugging out-of-memory (OOM) and bottlenecks
    • Framework-specific GPU error handling (TensorFlow, PyTorch)
    • Profiling GPU performance

    Objectives

    • Debug out-of-memory (OOM) and data bottleneck issues in GPU workflows
    • Understand common framework-specific GPU errors in TensorFlow and PyTorch
    • Profile and optimize GPU performance for production readiness
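
    One common response to out-of-memory failures is to retry with a smaller batch. The sketch below shows that pattern in a framework-neutral way; `run_with_batch_backoff` and `train_step` are hypothetical names, and the built-in `MemoryError` stands in for a framework's own OOM exception (for example, `tf.errors.ResourceExhaustedError` in TensorFlow or `torch.cuda.OutOfMemoryError` in recent PyTorch releases), which a caller would pass in through `oom_exceptions`.

```python
def run_with_batch_backoff(train_step, batch_size, min_batch_size=1,
                           oom_exceptions=(MemoryError,)):
    """Call train_step(batch_size), halving batch_size after each OOM error.

    Returns (batch_size_used, result). Re-raises the OOM error if even
    min_batch_size does not fit.
    """
    while True:
        try:
            return batch_size, train_step(batch_size)
        except oom_exceptions:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    def fake_train_step(batch_size):
        # Stand-in for a real training step: pretend anything above 16 is too big.
        if batch_size > 16:
            raise MemoryError("simulated out-of-memory")
        return f"trained with batch_size={batch_size}"

    print(run_with_batch_backoff(fake_train_step, 128))
```

    Backoff of this kind hides the symptom rather than the cause, so it is best paired with profiling to find what is actually consuming memory.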

    Lab outline

    Lab 1: GPU Fundamentals and Environment Setup

    • Verify GPU availability and utilization
    • Prepare the workspace for acceleration

    Lab 2: Building a GPU-Optimized Training Workflow


    • Implement data flow, batching, and memory management in a sample deep learning pipeline

    Lab 3: Running Deep Learning Models on GPUs

    • Train models on GPU using TensorFlow/Keras
    • Compare performance with CPU execution

    Lab 4: Performance Optimization Techniques


    • Apply mixed precision training, gradient accumulation, and batch sizing to improve efficiency

    Lab 5: Custom GPU Kernel Programming

    • Develop and run custom GPU kernels, manage grids, and handle synchronization safely

    Lab 6: Debugging and Troubleshooting GPU Workloads

    • Debug OOM issues, resolve framework-specific errors, and profile GPU performance
