Course Outline
Introduction:
- Apache Spark within the Hadoop Ecosystem
- Overview of Python and Scala
Basics (Theory):
- Architecture
- RDDs (Resilient Distributed Datasets)
- Transformations and Actions
- Stages, Tasks, and Dependencies
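The split between transformations and actions covered in the theory block can be previewed without a cluster. The following stdlib-only sketch (not Spark itself, just an analogy) uses Python generator expressions, which are lazy in the same way Spark transformations are: `map`/`filter`-style steps only record what to do, and nothing executes until an "action" (here, `list()`; in Spark, e.g. `collect()` or `count()`) forces evaluation.

```python
# Plain-Python analogy for Spark's lazy transformations vs. eager actions.
# In Spark, map/filter build a lineage graph and no work happens until an
# action runs. Generator expressions behave the same way.

data = range(1, 6)                           # stand-in for an RDD of 1..5

squared = (x * x for x in data)              # "transformation": lazy, no work yet
evens = (x for x in squared if x % 2 == 0)   # another lazy transformation

result = list(evens)                         # "action": forces evaluation
print(result)                                # [4, 16]
```

Until the final line runs, no squaring or filtering has happened, which mirrors how Spark can optimize and schedule a whole chain of transformations at once.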
Practical Workshop: Understanding the Basics in a Databricks Environment:
- Exercises using the RDD API
- Fundamental action and transformation functions
- PairRDD
- Joins
- Caching strategies
- Exercises using the DataFrame API
- Spark SQL
- DataFrame operations: select, filter, group, sort
- UDF (User-Defined Functions)
- Exploring the DataFrame API
- Streaming
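As a taste of the PairRDD exercises above, the semantics of a key-based aggregation such as `reduceByKey` can be sketched in plain Python (this is a stdlib illustration of the concept, not Spark code; the function name `reduce_by_key` is ours): group (key, value) pairs by key, then fold each key's values with a binary function. On a cluster, Spark additionally shuffles pairs so equal keys land on the same partition before reducing.

```python
import functools
from collections import defaultdict

def reduce_by_key(pairs, func):
    """Conceptual stand-in for PairRDD.reduceByKey on a single machine."""
    groups = defaultdict(list)
    for key, value in pairs:      # the "shuffle": co-locate equal keys
        groups[key].append(value)
    # Fold each key's values with the given binary function.
    return {k: functools.reduce(func, vs) for k, vs in groups.items()}

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
print(reduce_by_key(pairs, lambda x, y: x + y))   # {'a': 9, 'b': 6}
```

The same group-then-aggregate idea underlies the DataFrame `groupBy` operations practiced in the second half of the workshop.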
Practical Workshop: Understanding Deployment in an AWS Environment:
- Introduction to AWS Glue
- Understanding the differences between AWS EMR and AWS Glue
- Example jobs in both environments
- Evaluating pros and cons
Additional Topics:
- Introduction to Apache Airflow orchestration
Requirements
Programming skills (preferably in Python or Scala)
Basic knowledge of SQL
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 3900 € + VAT*
Contact us for an exact quote and to hear our latest promotions
Testimonials (3)
Having hands-on sessions / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Course - Apache Spark in the Cloud
Get to learn Spark Streaming, Databricks and AWS Redshift