H54FXS | GPU AI SDK Training

Course data sheet

Course ID	H54FXS
Duration	1 day
Format	ILT/VILT

Overview

This course provides a comprehensive introduction to GPU computing for AI/ML workloads. Topics include GPU architecture fundamentals, optimized workflows, framework integration, performance tuning, debugging, and advanced programming techniques. By the end of the training, you should be able to effectively utilize GPUs for deep learning acceleration and enterprise-scale workloads.

The course opens with a discussion of attendees' backgrounds and setup prerequisites so that the examples and exercises can be tailored accordingly.

Audience

This course is ideal for:

Data science, machine learning, and AI professionals
Technical leaders and managers with programming backgrounds
MLOps, ModelOps, and LLMOps professionals
Software engineers and DevOps professionals with some exposure to machine learning
GPU enthusiasts with Python programming backgrounds

Prerequisites

Before attending this course, we recommend that you have:

Intermediate Python programming knowledge:

Familiarity with data structures, list comprehensions, lambdas, classes, and loops
Awareness of libraries like NumPy, Pandas, Matplotlib, and Seaborn

Beginner-level machine learning and/or deep learning:

The machine learning lifecycle
Data cleaning and preparation process
Beginning machine learning math like linear algebra and probability

General operating systems and computer architecture understanding:

High-level understanding of parallelization and multiprocessing
Familiarity with hardware components such as CPU, RAM, and GPU
Basic concepts of system memory, paging, and partitioning

Basic Linux (Bash) or Windows (PowerShell) scripting

These prerequisites ensure that you enter the course with a solid foundation, maximizing your ability to understand and engage with the material.

Objectives

After completing this course, you should be able to:

Understand GPU architectures and their role in accelerating AI workloads
Implement deep learning frameworks on GPUs for efficient model training
Optimize single-GPU and multi-GPU performance with advanced strategies
Write and control custom GPU kernels for specialized operations
Debug, troubleshoot, and profile GPU performance in production scenarios

Course outline

Module 1: GPU Fundamentals and Deep Learning Acceleration

Topics

CPU vs. GPU vs. TPU architectural differences, parallelism, and inference
Checking GPU availability and utilization
Getting started with acceleration

Objectives

Understand architectural differences between CPU, GPU, and TPU and their role in parallelism and inference
Learn how to check GPU availability and utilization
Explore initial steps to enable acceleration in AI/ML workflows
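
As an illustration of the "checking GPU availability and utilization" topic in Module 1, the following is a minimal sketch that is not taken from the course material. It assumes an NVIDIA GPU whose driver provides the `nvidia-smi` command-line tool; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical. On a machine without `nvidia-smi`, the query returns an empty list.

```python
import shutil
import subprocess

# Properties requested from nvidia-smi via its --query-gpu option.
QUERY_FIELDS = "name,memory.total,memory.used,utilization.gpu"


def _to_int(value):
    """Convert a numeric field to int; return None for values such as '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip lines that do not match the four requested fields
        name, total, used, util = parts
        gpus.append({
            "name": name,
            "memory_total_mib": _to_int(total),
            "memory_used_mib": _to_int(used),
            "utilization_pct": _to_int(util),
        })
    return gpus


def query_gpus():
    """Return GPU stats, or [] when nvidia-smi is missing or reports an error."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus() or [{"name": "no NVIDIA GPU detected"}]:
        print(gpu)
```

Keeping the parsing separate from the subprocess call makes the parser easy to check against captured output on a machine that has no GPU.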

Module 2: Understanding GPU-Optimized Deep Learning Workflow

Topics

GPU acceleration libraries for deep learning
Data flow between CPU, GPU, model, and output
Best practices for batching, shuffling, and memory management

Objectives

Understand how GPU acceleration libraries (CUDA, cuDNN) and Tensor Cores optimize computations
Learn the flow of data between CPU, GPU, model, and output
Apply best practices for batching, shuffling, and memory management in GPU-enabled training
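
The batching and shuffling practices named in Module 2 can be sketched in plain Python. This is an illustrative example rather than course material: `iter_batches` and its parameters are hypothetical names, and in a real pipeline each yielded batch would typically be converted to a tensor and copied from host memory to the GPU before the forward pass.

```python
import random


def iter_batches(samples, batch_size, shuffle=True, drop_last=False, seed=None):
    """Yield lists of samples in an order determined by `seed`.

    drop_last=True discards a final batch smaller than batch_size, which
    keeps every batch the same shape.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(range(len(samples)))
    if shuffle:
        # A dedicated Random instance avoids touching the global RNG state.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        if drop_last and len(batch_idx) < batch_size:
            break
        yield [samples[i] for i in batch_idx]


if __name__ == "__main__":
    data = list(range(10))
    for epoch in range(2):
        # Using the epoch number as the seed gives a different but
        # reproducible order in every epoch.
        print(epoch, list(iter_batches(data, batch_size=4, seed=epoch)))
```

Shuffling indices instead of the samples themselves leaves the source data untouched, which matters when the dataset is large or shared between workers.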

Module 3: TensorFlow and Keras on GPUs

Topics

Moving tensors and models to GPU
Training networks on datasets with and without GPU acceleration
Monitoring GPU memory usage and training speed

Objectives

Learn how to move tensors and models to GPUs
Compare training performance with and without GPU acceleration
Monitor GPU memory and training speed in real time
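
To give a flavor of the CPU versus GPU comparison in Module 3, here is a small sketch that times a matrix multiplication on each available device. It is an assumption-laden illustration, not the course's lab code: it assumes TensorFlow 2.x with eager execution, `compare_devices` is a hypothetical helper, and the function returns an empty dict when TensorFlow is not installed. A single matmul is only a rough proxy for full training speed.

```python
import time


def compare_devices(size=512, repeats=5):
    """Time a square matmul on the CPU and, if present, on the first GPU.

    Returns a dict mapping device name to average seconds per matmul,
    or an empty dict when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {}

    devices = ["/CPU:0"]
    if tf.config.list_physical_devices("GPU"):
        devices.append("/GPU:0")

    timings = {}
    for device in devices:
        with tf.device(device):  # ops created in this block run on `device`
            a = tf.random.uniform((size, size))
            b = tf.random.uniform((size, size))
            tf.matmul(a, b).numpy()  # warm-up run; .numpy() waits for the result
            start = time.perf_counter()
            for _ in range(repeats):
                result = tf.matmul(a, b)
            result.numpy()  # force completion before stopping the clock
            timings[device] = (time.perf_counter() - start) / repeats
    return timings


if __name__ == "__main__":
    for device, seconds in compare_devices().items():
        print(f"{device}: {seconds * 1e3:.3f} ms per matmul")
```

The warm-up call is there because the first operation on a GPU usually includes one-time initialization cost that would otherwise distort the measurement.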

Module 4: Performance Optimization for Single-GPU and Multi-GPU Training

Topics

Identifying and reducing data pipeline bottlenecks
Mixed precision training
Gradient accumulation and efficient batch sizing strategies

Objectives

Identify and reduce data pipeline bottlenecks
Apply mixed precision training for faster computation and reduced memory usage
Use gradient accumulation and batch sizing strategies for optimal training
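
Gradient accumulation, listed in Module 4, rests on a simple identity: the gradient of a mean loss over a large batch equals the size-weighted sum of the gradients over its micro-batches. The sketch below checks that identity for a one-parameter linear model in plain Python. The model and the function names are illustrative assumptions; a framework would do the same bookkeeping on tensors and apply one optimizer step after the last micro-batch.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Rebuild the full-batch gradient from micro-batch gradients."""
    total = 0.0
    n = len(xs)
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch so that
        # a smaller final micro-batch does not skew the average.
        total += mse_gradient(w, mx, my) * (len(mx) / n)
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 6.0, 8.0, 11.0]
    full = mse_gradient(0.5, xs, ys)
    accumulated = accumulated_gradient(0.5, xs, ys, micro_batch_size=2)
    print(abs(full - accumulated) < 1e-9)
```

Because only one micro-batch has to be resident at a time, this trades extra forward and backward passes for a lower peak memory footprint.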

Module 5: Advanced Controls

Topics

Custom GPU kernels
Avoiding race conditions
Multidimensional grids
Sorting and memory optimization techniques

Objectives

Learn to design and implement custom GPU kernels
Understand how to avoid race conditions in GPU programming
Explore multidimensional grid strategies and advanced GPU operations
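
The multidimensional grid idea in Module 5 can be previewed without a GPU. The sketch below simulates a 2D kernel launch sequentially in plain Python; it is a teaching aid built on assumptions, not a real kernel, and `launch_2d`, `add_matrices`, and `ceil_div` are hypothetical names. The index arithmetic mirrors the usual CUDA pattern of block index times block size plus thread index.

```python
def ceil_div(n, d):
    """Number of blocks of size d needed to cover n elements."""
    return (n + d - 1) // d


def launch_2d(kernel, grid_dim, block_dim, *args):
    """Call `kernel` once per (block, thread) pair, emulating a 2D launch.

    A GPU would run these calls in parallel; here they run one by one.
    """
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    kernel((bx, by), (tx, ty), block_dim, *args)


def add_matrices(block_idx, thread_idx, block_dim, a, b, out, rows, cols):
    # Global coordinates of this simulated thread.
    col = block_idx[0] * block_dim[0] + thread_idx[0]
    row = block_idx[1] * block_dim[1] + thread_idx[1]
    # Guard threads that fall outside the data when the grid overshoots.
    if row < rows and col < cols:
        # Each element has exactly one writer, so no two threads race.
        out[row][col] = a[row][col] + b[row][col]


if __name__ == "__main__":
    rows, cols = 3, 5
    a = [[r * cols + c for c in range(cols)] for r in range(rows)]
    b = [[1] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    block = (2, 2)
    grid = (ceil_div(cols, block[0]), ceil_div(rows, block[1]))
    launch_2d(add_matrices, grid, block, a, b, out, rows, cols)
    print(out)
```

Giving every output element a single owning thread is the simplest way to rule out write races; kernels in which several threads update the same location need atomics or an explicit reduction instead.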

Module 6: Debugging, Troubleshooting, and Best Practices

Topics

Debugging out-of-memory (OOM) and bottlenecks
Framework-specific GPU error handling (TensorFlow, PyTorch)
Profiling GPU performance

Objectives

Debug out-of-memory (OOM) and data bottleneck issues in GPU workflows
Understand common framework-specific GPU errors in TensorFlow and PyTorch
Profile and optimize GPU performance for production readiness
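
One common response to the out-of-memory errors covered in Module 6 is to retry with a smaller batch. The sketch below shows that pattern with a simulated failure; it is a hedged illustration, not the course's method. `find_max_batch_size` and `fake_train_step` are hypothetical, and Python's built-in `MemoryError` stands in for the framework exception, which in practice would be something like `tf.errors.ResourceExhaustedError` in TensorFlow or `torch.cuda.OutOfMemoryError` in recent PyTorch versions.

```python
def find_max_batch_size(train_step, start_batch_size, oom_errors=(MemoryError,)):
    """Halve the batch size until `train_step` succeeds; return that size.

    `oom_errors` is the tuple of exception types treated as out-of-memory.
    """
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            train_step(batch_size)
            return batch_size
        except oom_errors:
            batch_size //= 2  # back off and try again with half the samples
    raise RuntimeError("even batch size 1 does not fit in memory")


def fake_train_step(batch_size, capacity=100):
    """Hypothetical stand-in that fails above `capacity` samples per batch."""
    if batch_size > capacity:
        raise MemoryError(f"simulated OOM at batch size {batch_size}")


if __name__ == "__main__":
    print(find_max_batch_size(fake_train_step, 512))
```

Passing the exception types as a parameter keeps the retry logic independent of any one framework. A real training loop would usually also release cached GPU memory between attempts.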

5 reasons to choose HPE as your training partner

Learn HPE and in-demand IT industry technologies from expert instructors.
Build career-advancing power skills.
Enjoy personalized learning journeys aligned to your company’s needs.
Choose how you learn: in-person, virtually, or online, anytime, anywhere.
Sharpen your skills with access to real environments in virtual labs.

Explore our simplified purchase options, including HPE Education Learning Credits.

Lab outline

Lab 1: GPU Fundamentals and Environment Setup

Verify GPU availability and utilization
Prepare the workspace for acceleration

Lab 2: Building a GPU-Optimized Training Workflow

Implement data flow, batching, and memory management in a sample deep learning pipeline

Lab 3: Running Deep Learning Models on GPUs

Train models on GPU using TensorFlow/Keras
Compare performance with CPU execution

Lab 4: Performance Optimization Techniques

Apply mixed precision training, gradient accumulation, and batch sizing to improve efficiency

Lab 5: Custom GPU Kernel Programming

Develop and run custom GPU kernels, manage grids, and handle synchronization safely

Lab 6: Debugging and Troubleshooting GPU Workloads

Debug OOM issues, resolve framework-specific errors, and profile GPU performance

Learn more

HPE.com/ww/learnAI

© Copyright 2025 Hewlett Packard Enterprise Development LP. The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty.

Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.

All third-party marks are property of their respective owners.

a50014080enw, H54FXS A.00, November 2025