GPU Programming with CUDA and Python Training Course

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API created by NVIDIA.

This instructor-led, live training (online or onsite) is aimed at intermediate-level developers who wish to use CUDA to build Python applications that run in parallel on NVIDIA GPUs.

By the end of this training, participants will be able to:

Use the Numba compiler to accelerate Python applications running on NVIDIA GPUs.
Create, compile and launch custom CUDA kernels.
Manage GPU memory.
Convert a CPU-based application into a GPU-accelerated application.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

This course is available as onsite live training in Hong Kong or online live training.

Course Outline

Introduction

What is GPU programming?
Why use CUDA with Python?
Key concepts: Threads, Blocks, Grids

Overview of CUDA Features and Architecture

GPU vs CPU architecture
Understanding SIMT (Single Instruction, Multiple Threads)
CUDA programming model

Setting up the Development Environment

Installing CUDA Toolkit and drivers
Installing Python and Numba
Setting up and verifying the environment

Parallel Programming Fundamentals

Introduction to parallel execution
Understanding threads and thread hierarchies
Working with warps and synchronization
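
As a rough illustration of the warp topic above, the sketch below counts how many warps a block occupies and how many lanes sit idle in its last warp. It assumes the warp size of 32 threads used by NVIDIA GPUs; the helper names `warps_per_block` and `idle_lanes` are hypothetical and not part of any CUDA or Numba API.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_per_block(threads_per_block):
    """Number of warps needed to hold a block (ceiling division)."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def idle_lanes(threads_per_block):
    """Lanes in the last warp that have no thread assigned to them."""
    return warps_per_block(threads_per_block) * WARP_SIZE - threads_per_block


# A 100-thread block needs 4 warps, leaving 28 lanes of the last warp unused.
print(warps_per_block(100), idle_lanes(100))
# A 256-thread block fills 8 warps exactly.
print(warps_per_block(256), idle_lanes(256))
```

This is one reason block sizes are commonly chosen as multiples of 32.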

Working with the Numba Compiler

Introduction to Numba
Writing CUDA kernels with Numba
Understanding @cuda.jit decorators

Building a Custom CUDA Kernel

Writing and launching a basic kernel
Using threads for element-wise operations
Managing grid and block dimensions
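
The grid and block sizing covered in this module can be sketched in plain Python, without a GPU. The example below is a CPU-side emulation under stated assumptions, not Numba or CUDA code: `blocks_for` and `emulate_add_one` are hypothetical names, and the nested loops stand in for the threads that a real kernel launch would run in parallel.

```python
def blocks_for(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover n_elements."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def emulate_add_one(data, threads_per_block=4):
    """Emulate an element-wise kernel: one (block, thread) pair per index."""
    n = len(data)
    out = list(data)
    for block_idx in range(blocks_for(n, threads_per_block)):
        for thread_idx in range(threads_per_block):
            # Global index, computed the same way as in a 1D CUDA kernel:
            # block index * block size + thread index within the block.
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds guard: the last block may have spare threads
                out[i] = data[i] + 1
    return out


result = emulate_add_one([10, 20, 30, 40, 50, 60])
print(blocks_for(6, 4), result)
```

The bounds guard matters because the grid is rounded up to whole blocks: with 6 elements and 4 threads per block, 2 blocks provide 8 threads, and the last 2 must do nothing.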

Memory Management

Types of GPU memory (global, shared, local, constant)
Memory transfer between host and device
Optimizing memory usage and avoiding bottlenecks

Advanced Topics in GPU Acceleration

Shared memory and synchronization
Using streams for asynchronous execution
Multi-GPU programming basics

Converting CPU-based Applications to GPU

Profiling CPU code
Identifying parallelizable sections
Porting logic to CUDA kernels

Troubleshooting

Debugging CUDA applications
Common errors and how to resolve them
Tools and techniques for testing and validation

Summary and Next Steps

Review of key concepts
Best practices in GPU programming
Resources for continued learning

Requirements

Python programming experience
Experience with NumPy (ndarrays, ufuncs, etc.)

Audience

Developers

14 Hours

Open Training Courses require 5+ participants.

Dates are subject to availability and take place between 09:30 and 16:30.

Testimonials (1)

Very interactive with various examples, with a good progression in complexity between the start and the end of the training.

Jenny - Andheo

Course - GPU Programming with CUDA and Python

HK$ 75300 (Classroom)

Related Courses

Advanced Python: Best Practices and Design Patterns

28 Hours

This intensive, hands-on course covers advanced Python techniques, engineering best practices, and commonly used design patterns to build maintainable, testable, and high-performance Python applications. It emphasizes modern tooling, typing, concurrency models, architecture patterns, and deployment-ready workflows.

This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level Python developers who wish to adopt professional practices and patterns for production-grade Python systems.

By the end of this training, participants will be able to:

Apply Python typing, dataclasses, and type-checking to increase code reliability.
Use design patterns and architecture principles to structure robust applications.
Implement concurrency and parallelism correctly using asyncio and multiprocessing.
Build well-tested code with pytest, property-based testing, and CI pipelines.
Profile, optimize, and harden Python applications for production.
Package, distribute, and deploy Python projects using modern tools and containers.

Format of the Course

Interactive lectures and short demos.
Hands-on labs and coding exercises each day.
Capstone mini-project integrating patterns, testing, and deployment.

Course Customization Options

To request a customized training or focus area (data, web, or infra), please contact us to arrange.

Agentic AI Engineering with Python — Build Autonomous Agents

21 Hours

This course teaches practical engineering techniques to design, build, test, and deploy agentic (autonomous) systems using Python. It covers the agent loop, tool integrations, memory and state management, orchestration patterns, safety controls, and production considerations.

This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level ML engineers, AI developers, and software engineers who wish to build robust, production-ready autonomous agents using Python.

By the end of this training, participants will be able to:

Design and implement the agent loop and decision-making workflows.
Integrate external tools and APIs to extend agent capabilities.
Implement short-term and long-term memory architectures for agents.
Coordinate multi-step orchestrations and agent composability.
Apply safety, access control, and observability best practices for deployed agents.

Format of the Course

Interactive lecture and discussion.
Hands-on labs building agents with Python and popular SDKs.
Project-based exercises that produce deployable prototypes.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Introduction to Data Science and AI using Python

35 Hours

This is a 5-day introduction to Data Science and Artificial Intelligence (AI).

The course is delivered with examples and exercises using Python.

Artificial Intelligence with Python (Intermediate Level)

35 Hours

Artificial Intelligence with Python is the development of intelligent systems using Python’s extensive ecosystem of AI and machine learning libraries.

This instructor-led, live training (online or onsite) is aimed at intermediate-level Python programmers who wish to design, implement, and deploy AI solutions using Python.

By the end of this training, participants will be able to:

Implement AI algorithms using Python’s core AI libraries.
Work with supervised, unsupervised, and reinforcement learning models.
Integrate AI solutions into existing applications and workflows.
Evaluate model performance and optimize for accuracy and efficiency.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

Algorithmic Trading with Python and R

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at business analysts who wish to automate trade with algorithmic trading, Python, and R.

By the end of this training, participants will be able to:

Employ algorithms to buy and sell securities at specialized increments rapidly.
Reduce costs associated with trade using algorithmic trading.
Automatically monitor stock prices and place trades.

Applied AI from Scratch in Python

28 Hours

This is a 4-day course introducing AI and its application using the Python programming language. There is an option to have an additional day to undertake an AI project on completion of this course.

AWS Cloud9 and Python: A Practical Guide

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at intermediate-level Python developers who wish to enhance their Python development experience using AWS Cloud9.

By the end of this training, participants will be able to:

Set up and configure AWS Cloud9 for Python development.
Understand the AWS Cloud9 IDE interface and features.
Write, debug, and deploy Python applications in AWS Cloud9.
Collaborate with other developers using the AWS Cloud9 platform.
Integrate AWS Cloud9 with other AWS services for advanced deployments.

Building Chatbots in Python

21 Hours

ChatBots are computer programs that automatically simulate human responses via chat interfaces. ChatBots help organizations maximize their operational efficiency by providing easier and faster options for their user interactions.

In this instructor-led, live training, participants will learn how to build chatbots in Python.

By the end of this training, participants will be able to:

Understand the fundamentals of building chatbots
Build, test, deploy, and troubleshoot various chatbots using Python

Audience

Developers

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Note

To request a customized training for this course, please contact us to arrange.

Administration of CUDA

35 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at beginner-level system administrators and IT professionals who wish to install, configure, manage, and troubleshoot CUDA environments.

By the end of this training, participants will be able to:

Understand the architecture, components, and capabilities of CUDA.
Install and configure CUDA environments.
Manage and optimize CUDA resources.
Debug and troubleshoot common CUDA issues.

Scaling Data Analysis with Python and Dask

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at data scientists and software engineers who wish to use Dask with the Python ecosystem to build, scale, and analyze large datasets.

By the end of this training, participants will be able to:

Set up the environment to start building big data processing with Dask and Python.
Explore the features, libraries, tools, and APIs available in Dask.
Understand how Dask accelerates parallel computing in Python.
Learn how to scale the Python ecosystem (NumPy, SciPy, and Pandas) using Dask.
Optimize the Dask environment to maintain high performance in handling large datasets.

Data Analysis with Python, Pandas and Numpy

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at intermediate-level Python developers and data analysts who wish to enhance their skills in data analysis and manipulation using Pandas and NumPy.

By the end of this training, participants will be able to:

Set up a development environment that includes Python, Pandas, and NumPy.
Create a data analysis application using Pandas and NumPy.
Perform advanced data wrangling, sorting, and filtering operations.
Conduct aggregate operations and analyze time series data.
Visualize data using Matplotlib and other visualization libraries.
Debug and optimize their data analysis code.

FARM (FastAPI, React, and MongoDB) Full Stack Development

14 Hours

This instructor-led, live training (online or onsite) is aimed at developers who wish to use the FARM (FastAPI, React, and MongoDB) stack to build dynamic, high-performance, and scalable web applications.

By the end of this training, participants will be able to:

Set up the necessary development environment that integrates FastAPI, React, and MongoDB.
Understand the key concepts, features, and benefits of the FARM stack.
Learn how to build REST APIs with FastAPI.
Learn how to design interactive applications with React.
Develop, test, and deploy applications (front end and back end) using the FARM stack.

Developing APIs with Python and FastAPI

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at developers who wish to use FastAPI with Python to build, test, and deploy RESTful APIs more easily and quickly.

By the end of this training, participants will be able to:

Set up the necessary development environment to develop APIs with Python and FastAPI.
Create APIs more quickly and easily using the FastAPI library.
Learn how to create data models and schemas based on Pydantic and OpenAPI.
Connect APIs to a database using SQLAlchemy.
Implement security and authentication in APIs using the FastAPI tools.
Build container images and deploy web APIs to a cloud server.

Fraud Detection with Python and TensorFlow

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at data scientists who wish to use TensorFlow to analyze potential fraud data.

By the end of this training, participants will be able to:

Create a fraud detection model in Python and TensorFlow.
Build linear regressions and linear regression models to predict fraud.
Develop an end-to-end AI application for analyzing fraud data.

Accelerating Python Pandas Workflows with Modin

14 Hours

This instructor-led, live training in Hong Kong (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.

By the end of this training, participants will be able to:

Set up the necessary environment to start developing Pandas workflows at scale with Modin.
Understand the features, architecture, and advantages of Modin.
Know the differences between Modin, Dask, and Ray.
Perform Pandas operations faster with Modin.
Implement the entire Pandas API and functions.
