Performance Optimization on Ascend, Biren, and Cambricon Training Course
Ascend, Biren, and Cambricon are prominent AI hardware platforms in China, each providing distinct acceleration and profiling capabilities tailored for large-scale AI workloads.
This instructor-led, live training (available online or onsite) is aimed at advanced AI infrastructure and performance engineers who want to optimize model inference and training workflows across Chinese AI chip ecosystems.
Upon completion of this training, participants will be equipped to:
- Benchmark models on Ascend, Biren, and Cambricon platforms.
- Identify system bottlenecks and memory or compute inefficiencies.
- Implement optimizations at the graph, kernel, and operator levels.
- Tune deployment pipelines to enhance throughput and reduce latency.
Format of the Course
- Interactive lectures and discussions.
- Practical application of profiling and optimization tools on each platform.
- Guided exercises focused on real-world tuning scenarios.
Course Customization Options
- To request a customized training version based on your specific performance environment or model type, please contact us to make arrangements.
Course Outline
Performance Concepts and Metrics
- Latency, throughput, power consumption, and resource utilization
- System versus model-level bottlenecks
- Profiling for inference versus training phases
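The latency and throughput metrics above can be made concrete with a small measurement harness. The sketch below is a minimal, framework-agnostic Python example; the lambda stands in for a model inference call, and the percentile arithmetic assumes enough samples for the tail to be meaningful.

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    """Measure per-call latency (ms) and throughput (calls/s) for fn."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the statistics
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
        "throughput": 1000.0 * iters / sum(samples),  # calls per second
    }

# Stand-in for a model inference call.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

On real hardware the same structure applies, with the lambda replaced by a session or graph execution call and warm-up sized to cover JIT compilation and cache effects.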
Profiling on Huawei Ascend
- Utilizing CANN Profiler and MindInsight
- Kernel and operator diagnostics
- Offload patterns and memory mapping
Profiling on Biren GPU
- Performance monitoring features via the Biren SDK
- Kernel fusion, memory alignment, and execution queues
- Power and temperature-aware profiling
Profiling on Cambricon MLU
- BANGPy and Neuware performance tools
- Kernel-level visibility and log interpretation
- Integration of the MLU profiler with deployment frameworks
Graph and Model-Level Optimization
- Graph pruning and quantization strategies
- Operator fusion and computational graph restructuring
- Input size standardization and batch tuning
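As a toy illustration of the quantization step listed above, here is a symmetric per-tensor int8 scheme in plain Python. Production quantizers add calibration data and per-channel scales, so treat this as a sketch of the underlying arithmetic only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

weights = [0.813, -1.27, 0.054, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Rounding error is bounded by half a quantization step (scale / 2).
assert max_err <= scale / 2 + 1e-9
```

The same bound explains why quantization quality depends on the weight distribution: a single outlier inflates the scale and therefore the error on every other value, which is what per-channel scaling mitigates.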
Memory and Kernel Optimization
- Optimizing memory layout and reuse
- Efficient buffer management across different chipsets
- Platform-specific kernel-level tuning techniques
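One kernel-level technique covered here, loop tiling, can be illustrated without any vendor SDK. The pure-Python sketch below restructures a matrix multiply into tiles so each block of data is reused while it is "hot"; on an accelerator the same idea is applied to on-chip buffer sizes rather than CPU caches, and the tile size here is an arbitrary illustrative choice.

```python
def matmul_naive(a, b, n):
    """Reference triple-loop multiply of n x n matrices (nested lists)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

def matmul_tiled(a, b, n, tile=8):
    """Same product, computed tile by tile to improve data reuse/locality."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

n = 12
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
# Integer-valued inputs keep the float sums exact, so results match bitwise.
assert matmul_tiled(a, b, n, tile=5) == matmul_naive(a, b, n)
```

Note that tiling changes the accumulation order, so with general floating-point data the tiled and naive results agree only to rounding tolerance.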
Cross-Platform Best Practices
- Performance portability: abstraction strategies
- Developing shared tuning pipelines for multi-chip environments
- Example: Tuning an object detection model across Ascend, Biren, and MLU
Summary and Next Steps
Requirements
- Experience in AI model training or deployment pipelines
- Understanding of GPU/MLU compute principles and model optimization techniques
- Basic familiarity with performance profiling tools and metrics
Audience
- Performance engineers
- Machine learning infrastructure teams
- AI system architects
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 3900 € + VAT*
Contact us for an exact quote and to hear our latest promotions
(*The final price may vary depending on the technical specialization of the course, the level of customization, the method of delivery and the number of learners)
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 Hours
Huawei Ascend is a family of AI processors engineered for high-performance inference and training.
This instructor-led live training, available both online and onsite, targets intermediate-level AI engineers and data scientists eager to develop and fine-tune neural network models utilizing Huawei’s Ascend platform alongside the CANN toolkit.
Upon completion of this training, participants will be equipped to:
- Establish and configure the CANN development environment.
- Create AI applications leveraging MindSpore and CloudMatrix workflows.
- Enhance performance on Ascend NPUs through custom operators and tiling techniques.
- Deploy models across edge or cloud infrastructures.
Course Format
- Interactive lectures accompanied by group discussions.
- Practical application of Huawei Ascend and the CANN toolkit within sample projects.
- Guided exercises concentrating on model construction, training, and deployment.
Customization Options
- To arrange customized training tailored to your specific infrastructure or datasets for this course, please contact us.
Deploying AI Models with CANN and Ascend AI Processors
14 Hours
CANN (Compute Architecture for Neural Networks) serves as Huawei's AI compute stack, designed to facilitate the deployment and optimization of AI models on Ascend AI processors.
This instructor-led live training, available either online or onsite, is tailored for intermediate-level AI developers and engineers aiming to efficiently deploy trained AI models onto Huawei Ascend hardware. The course utilizes the CANN toolkit alongside popular frameworks such as MindSpore, TensorFlow, and PyTorch.
Upon completion of this training, participants will gain the ability to:
- Comprehend the CANN architecture and its pivotal role within the AI deployment pipeline.
- Convert and adapt models from leading frameworks into formats compatible with Ascend.
- Utilize tools such as ATC, OM model conversion utilities, and MindSpore for both edge and cloud inference tasks.
- Identify deployment challenges and optimize performance on Ascend hardware.
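To give a flavour of the conversion workflow, the command below sketches an ONNX-to-OM conversion with ATC. The model name, output path, and SoC version are placeholders, and exact flag values should be verified against the CANN release in use.

```shell
# Convert an ONNX model to an Ascend offline model (.om) with ATC.
# Model name, output path, and soc_version are illustrative placeholders;
# --framework=5 selects ONNX input. Check flags with `atc --help`.
atc --model=resnet50.onnx \
    --framework=5 \
    --output=resnet50_ascend \
    --soc_version=Ascend310
```

The resulting .om file is then loaded through AscendCL or a serving framework for inference.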
Course Format
- Interactive lectures combined with practical demonstrations.
- Hands-on lab exercises employing CANN tools, Ascend simulators, or physical devices.
- Practical deployment scenarios derived from real-world AI models.
Customization Options
- To request a customized version of this course, please contact us to make arrangements.
AI Inference and Deployment with CloudMatrix
21 Hours
CloudMatrix serves as Huawei's consolidated platform for AI development and deployment, engineered to facilitate scalable, production-ready inference pipelines.
This instructor-led live training, available online or onsite, targets beginner to intermediate AI professionals seeking to deploy and monitor AI models using CloudMatrix with CANN and MindSpore integration.
Upon completion of this training, participants will be capable of:
- Utilizing CloudMatrix for model packaging, deployment, and serving.
- Converting and optimizing models for Ascend chipsets.
- Establishing pipelines for both real-time and batch inference tasks.
- Monitoring deployments and tuning performance within production environments.
Course Format
- Interactive lectures and discussions.
- Practical application of CloudMatrix in real-world deployment scenarios.
- Guided exercises focusing on conversion, optimization, and scaling.
Course Customization Options
- For customized training tailored to your specific AI infrastructure or cloud environment, please contact us to make arrangements.
GPU Programming on Biren AI Accelerators
21 Hours
Biren AI Accelerators are high-performance GPUs engineered for AI and HPC workloads, supporting large-scale training and inference.
This instructor-led, live training (available online or onsite) targets intermediate to advanced developers looking to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
Upon completion of this training, participants will be able to:
- Grasp Biren GPU architecture and memory hierarchy.
- Configure the development environment and utilize Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Implement performance tuning and debugging techniques.
Course Format
- Interactive lectures and discussions.
- Practical application of the Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Customization Options
- To request customized training tailored to your application stack or integration requirements, please contact us to arrange.
Cambricon MLU Development with BANGPy and Neuware
21 Hours
Cambricon MLUs (Machine Learning Units) are specialised AI chips optimised for inference and training in edge and datacentre environments.
This instructor-led, live training (available online or onsite) is designed for intermediate-level developers who wish to build and deploy AI models using the BANGPy framework and Neuware SDK on Cambricon MLU hardware.
By the end of this training, participants will be able to:
- Set up and configure the BANGPy and Neuware development environments.
- Develop and optimise Python- and C++-based models for Cambricon MLUs.
- Deploy models to edge and data centre devices running the Neuware runtime.
- Integrate ML workflows with MLU-specific acceleration features.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of BANGPy and Neuware for development and deployment.
- Guided exercises focused on optimisation, integration, and testing.
Course Customisation Options
- To request a customised training for this course based on your Cambricon device model or use case, please contact us to arrange.
Introduction to CANN for AI Framework Developers
7 Hours
CANN (Compute Architecture for Neural Networks) is Huawei's AI computing toolkit designed to compile, optimize, and deploy AI models on Ascend AI processors.
This instructor-led live training, available online or onsite, is tailored for beginner-level AI developers who want to grasp how CANN fits into the model lifecycle, from training to deployment, and how it integrates with frameworks such as MindSpore, TensorFlow, and PyTorch.
By the end of this training, participants will be able to:
- Understand the purpose and architecture of the CANN toolkit.
- Set up a development environment using CANN and MindSpore.
- Convert and deploy a simple AI model to Ascend hardware.
- Gain foundational knowledge for future CANN optimization or integration projects.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with simple model deployment.
- Step-by-step walkthrough of the CANN toolchain and integration points.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
CANN for Edge AI Deployment
14 Hours
Huawei's Ascend CANN toolkit enables high-performance AI inference on edge devices such as the Ascend 310, providing the tools needed to compile, optimise, and deploy models in environments with constrained compute and memory resources.
This instructor-led, live training (available online or onsite) is designed for intermediate-level AI developers and integrators looking to deploy and optimise models on Ascend edge devices using the CANN toolchain.
Upon completion of this training, participants will be able to:
- Prepare and convert AI models for the Ascend 310 using CANN tools.
- Construct lightweight inference pipelines employing MindSpore Lite and AscendCL.
- Optimise model performance for environments with limited compute and memory.
- Deploy and monitor AI applications in real-world edge use cases.
Course Format
- Interactive lectures and demonstrations.
- Practical lab sessions focusing on edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Course Customisation Options
- To request customised training for this course, please contact us to arrange.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 Hours
Huawei's comprehensive AI stack — spanning from the low-level CANN SDK to the high-level MindSpore framework — provides a tightly integrated environment for developing and deploying AI solutions, specifically optimized for Ascend hardware.
This instructor-led training, available either online or onsite, is designed for technical professionals at beginner to intermediate levels who want to understand how CANN and MindSpore collaborate to manage the AI lifecycle and support infrastructure decisions.
Upon completing this training, participants will be equipped to:
- Comprehend the layered architecture of Huawei’s AI compute stack.
- Identify the role of CANN in model optimization and hardware-level deployment.
- Evaluate the MindSpore framework and its toolchain in comparison to industry alternatives.
- Position Huawei's AI stack effectively within enterprise or cloud/on-premise environments.
Course Format
- Interactive lectures and discussions.
- Live system demonstrations and case-based walkthroughs.
- Optional guided labs focusing on the model flow from MindSpore to CANN.
Customization Options
- To request tailored training for this course, please contact us to arrange.
Optimizing Neural Network Performance with CANN SDK
14 Hours
CANN SDK (Compute Architecture for Neural Networks) serves as Huawei's foundational AI compute platform, empowering developers to fine-tune and maximise the performance of deployed neural networks on Ascend AI processors.
This instructor-led training session (available online or onsite) targets advanced AI developers and system engineers aiming to enhance inference performance through CANN’s advanced toolset, which includes the Graph Engine, TIK, and capabilities for custom operator development.
Upon completion of this training, participants will be equipped to:
- Comprehend CANN's runtime architecture and performance lifecycle.
- Utilise profiling tools and the Graph Engine for performance analysis and optimisation.
- Develop and optimise custom operators using TIK and TVM.
- Address memory bottlenecks and boost model throughput.
Course Format
- Interactive lectures and discussions.
- Practical labs featuring real-time profiling and operator tuning.
- Optimisation exercises based on edge-case deployment examples.
Course Customisation Options
- To arrange a bespoke training session for this course, please contact us.
CANN SDK for Computer Vision and NLP Pipelines
14 Hours
The CANN SDK (Compute Architecture for Neural Networks) offers robust deployment and optimization utilities for real-time AI applications in computer vision and NLP, particularly on Huawei Ascend hardware.
This instructor-led live training (available online or onsite) is designed for intermediate-level AI professionals seeking to build, deploy, and optimize vision and language models using the CANN SDK for production-grade solutions.
Upon completing this training, participants will be capable of:
- Deploying and optimizing CV and NLP models utilizing CANN and AscendCL.
- Leveraging CANN utilities to convert models and integrate them into active pipelines.
- Enhancing inference performance for tasks such as detection, classification, and sentiment analysis.
- Constructing real-time CV/NLP pipelines tailored for edge or cloud-based deployment environments.
Course Format
- Interactive lectures and demonstrations.
- Practical labs focused on model deployment and performance profiling.
- Designing live pipelines using real-world CV and NLP scenarios.
Customization Options
- For those requiring customized training for this course, please contact us to arrange.
Building Custom AI Operators with CANN TIK and TVM
14 Hours
CANN TIK (Tensor Instruction Kernel) and Apache TVM facilitate advanced optimization and customization of AI model operators for Huawei Ascend hardware.
This instructor-led live training (available online or onsite) is designed for advanced system developers who want to create, deploy, and fine-tune custom operators for AI models using CANN’s TIK programming model and TVM compiler integration.
Upon completing this training, participants will be able to:
- Write and test custom AI operators using the TIK DSL for Ascend processors.
- Integrate custom operators into the CANN runtime and execution graph.
- Utilize TVM for operator scheduling, auto-tuning, and benchmarking.
- Debug and optimize instruction-level performance for custom computation patterns.
Course Format
- Interactive lectures and demonstrations.
- Practical coding of operators using TIK and TVM pipelines.
- Testing and tuning on Ascend hardware or simulators.
Customization Options
- To request a customized training session for this course, please contact us to make arrangements.
Migrating CUDA Applications to Chinese GPU Architectures
21 Hours
Chinese GPU architectures, including Huawei Ascend, Biren, and Cambricon MLUs, provide CUDA alternatives specifically designed for local AI and HPC markets.
This instructor-led live training, available online or onsite, is designed for advanced GPU programmers and infrastructure specialists aiming to migrate and optimize existing CUDA applications for deployment on Chinese hardware platforms.
Upon completion of this training, participants will be able to:
- Evaluate the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Port CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Compare performance metrics and identify optimization opportunities across different platforms.
- Address practical challenges related to cross-architecture support and deployment.
Course Format
- Interactive lectures and discussions.
- Hands-on labs focused on code translation and performance comparison.
- Guided exercises concentrating on multi-GPU adaptation strategies.
Course Customization Options
- To request customized training tailored to your platform or CUDA project, please contact us to arrange.