Course Outline
Introduction
- What is GPU programming?
- Why is GPU programming used?
- What are the challenges and trade-offs of GPU programming?
- What frameworks and tools are available for GPU programming?
- Choosing the right framework and tool for your application
OpenCL
- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL
- Creating a basic OpenCL program that performs vector addition
- Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronise threads
- Using the OpenCL C language to write kernels that execute on the device and manipulate data
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
- Using OpenCL memory spaces, such as global, local, constant, and private, to optimise data transfers and memory accesses
- Using the OpenCL execution model to control work-items, work-groups, and ND-ranges that define parallelism
- Debugging and testing OpenCL programs using tools such as CodeXL
- Optimising OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling
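The OpenCL topics above (device setup, buffers, kernel source, ND-range launch) can be sketched in a single minimal host program. This is an illustrative outline only, not course material: it assumes an OpenCL 2.0+ SDK and runtime are installed, and it omits most error handling and resource release for brevity.

```c
/* Minimal OpenCL vector addition sketch (error handling mostly omitted).
 * Assumes an OpenCL SDK; build with e.g.: gcc vecadd.c -lOpenCL */
#include <CL/cl.h>
#include <stdio.h>

/* OpenCL C kernel: one work-item computes one element of c. */
static const char *src =
    "__kernel void vecadd(__global const float *a,\n"
    "                     __global const float *b,\n"
    "                     __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Query a platform and device, then build a context and queue. */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);

    /* Allocate device memory in the global space; copy inputs at creation. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    /* Compile the kernel at run time and bind its arguments. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vecadd", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    /* Execution model: a 1D ND-range with one work-item per element. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL); /* blocking read */

    printf("c[1] = %f\n", c[1]); /* expect 3.0 */
    return 0;
}
```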
CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA
- Creating a basic CUDA program that performs vector addition
- Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronise threads
- Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
- Using CUDA memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses
- Using the CUDA execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimising CUDA programs using techniques such as coalescing, caching, prefetching, and profiling
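As a sketch of how the CUDA topics above fit together, the following minimal vector-addition program shows device memory allocation, host-device copies, a kernel, and a grid/block launch. It assumes the CUDA toolkit is installed and omits error checking for brevity.

```cuda
// Minimal CUDA vector addition sketch (error checks omitted for brevity).
// Build with the CUDA toolkit: nvcc vecadd.cu -o vecadd
#include <cstdio>

// Kernel: each thread computes one element of c.
__global__ void vecadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the tail block
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Allocate device (global) memory and copy inputs host -> device.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Execution model: a grid of blocks, each block a group of threads.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecadd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back; this blocking copy also synchronises with the kernel.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("hc[1] = %f\n", hc[1]);  // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
}
```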
ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm
- Creating a basic ROCm program that performs vector addition
- Using the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronise threads
- Using HIP C++, the kernel language of the ROCm stack, to write kernels that execute on the device and manipulate data
- Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
- Using ROCm memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses
- Using the ROCm execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing ROCm programs using tools such as the ROCm Debugger and ROCm Profiler
- Optimising ROCm programs using techniques such as coalescing, caching, prefetching, and profiling
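One ROCm topic above, querying device information, can be sketched with the HIP runtime API that ROCm ships. This is an illustrative fragment, not course material; it assumes a ROCm installation and prints a few of the properties relevant to the memory and execution topics in this outline.

```cpp
// Sketch: querying device properties with the HIP runtime API on ROCm.
// Build with ROCm installed: hipcc devinfo.cpp -o devinfo
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);  // number of visible GPU devices
    for (int d = 0; d < count; ++d) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  total global memory:     %zu MiB\n", prop.totalGlobalMem >> 20);
        printf("  shared memory per block: %zu KiB\n", prop.sharedMemPerBlock >> 10);
        printf("  max threads per block:   %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```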
HIP
- What is HIP?
- What are the advantages and disadvantages of HIP?
- Setting up the development environment for HIP
- Creating a basic HIP program that performs vector addition
- Using the HIP language to write kernels that execute on the device and manipulate data
- Using HIP built-in functions, variables, and libraries to perform common tasks and operations
- Using HIP memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses
- Using the HIP execution model to control threads, blocks, and grids that define parallelism
- Debugging and testing HIP programs using tools such as the ROCm Debugger and ROCm Profiler
- Optimising HIP programs using techniques such as coalescing, caching, prefetching, and profiling
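A minimal HIP vector addition, sketched below under the assumption of a ROCm/hipcc toolchain, deliberately mirrors the CUDA version topic-for-topic: `hipMalloc`/`hipMemcpy` correspond to `cudaMalloc`/`cudaMemcpy`, and the thread/block/grid launch is the same. That one-to-one correspondence is what makes porting between the two largely mechanical.

```cpp
// Minimal HIP vector addition sketch (error checks omitted for brevity).
// Build with: hipcc vecadd.cpp -o vecadd
#include <hip/hip_runtime.h>
#include <cstdio>

// Kernel: each thread computes one element of c.
__global__ void vecadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Device (global) memory and host -> device copies.
    float *da, *db, *dc;
    hipMalloc(&da, bytes); hipMalloc(&db, bytes); hipMalloc(&dc, bytes);
    hipMemcpy(da, ha, bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb, bytes, hipMemcpyHostToDevice);

    // Same threads/blocks/grid execution model as CUDA.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecadd<<<blocks, threads>>>(da, db, dc, n);  // hipcc supports this launch syntax
    hipMemcpy(hc, dc, bytes, hipMemcpyDeviceToHost);

    printf("hc[1] = %f\n", hc[1]);  // expect 3.0
    hipFree(da); hipFree(db); hipFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
}
```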
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, ROCm, and HIP
- Evaluating GPU programs using benchmarks and metrics
- Learning best practices and tips for GPU programming
- Exploring current and future trends and challenges in GPU programming
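Two metrics commonly used when evaluating GPU programs, as in the comparison topics above, are speedup over a CPU baseline and effective memory bandwidth. A minimal sketch (the function names and the timing figures are illustrative placeholders, not measurements):

```python
# Sketch: common metrics for evaluating GPU kernels against a CPU baseline.
# The timing numbers below are illustrative placeholders, not measurements.

def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run is than the CPU baseline."""
    return cpu_seconds / gpu_seconds

def effective_bandwidth_gbps(bytes_read: int, bytes_written: int,
                             seconds: float) -> float:
    """Effective memory bandwidth in GB/s: total bytes moved per second."""
    return (bytes_read + bytes_written) / seconds / 1e9

# Vector addition over 2**20 floats: reads a and b, writes c,
# so 3 arrays of 4-byte elements cross the memory bus.
n = 1 << 20
print(speedup(0.50, 0.02))                                  # 25x
print(effective_bandwidth_gbps(2 * n * 4, n * 4, 0.0001))   # ~126 GB/s
```

Comparing the achieved bandwidth against the device's peak is a quick way to tell whether a memory-bound kernel such as vector addition is close to its hardware limit.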
Summary and Next Steps
Requirements
- A solid understanding of C/C++ and of parallel programming concepts.
- Foundational knowledge of computer architecture and memory hierarchy.
- Experience with command-line tools and code editors.
Audience
- Developers eager to learn the basics of GPU programming and the key frameworks and tools for developing GPU applications.
- Developers aiming to write portable and scalable code capable of running across different platforms and devices.
- Programmers interested in exploring the benefits and challenges of GPU programming and optimisation.
21 Hours
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 3900 € + VAT*
Contact us for an exact quote and to hear about our latest promotions.