Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Overview of Python and Scala

Basics (Theory):

  • Architecture
  • RDD
  • Transformations and Actions
  • Stages, Tasks, and Dependencies
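The key idea behind "Transformations and Actions" above — transformations are lazy and only build a lineage, while an action triggers the computation — can be previewed with a plain-Python analogy using generators (this is an illustration of lazy evaluation, not actual Spark code):

```python
# Plain-Python analogy for Spark's lazy evaluation (NOT actual Spark code).
# In Spark, transformations like map/filter only record a lineage graph;
# an action such as collect() or count() triggers the real computation.

data = range(1, 6)  # stand-in for an RDD holding [1, 2, 3, 4, 5]

# "Transformations": generators are lazy, so nothing runs yet.
doubled = (x * 2 for x in data)       # like rdd.map(lambda x: x * 2)
big = (x for x in doubled if x > 4)   # like .filter(lambda x: x > 4)

# "Action": materializing the pipeline finally executes both steps.
result = list(big)                    # like rdd.collect()
print(result)  # [6, 8, 10]
```

Until `list(big)` runs, no element has been doubled or filtered — the same reason Spark can optimize a whole chain of transformations before executing a single task.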

Practical Workshop: Understanding the Basics in a Databricks Environment:

  • Exercises using the RDD API
  • Fundamental action and transformation functions
  • PairRDD
  • Joins
  • Caching strategies
  • Exercises using the DataFrame API
  • SparkSQL
  • DataFrame operations: select, filter, group, sort
  • UDFs (User-Defined Functions)
  • Exploring the DataFrame API
  • Streaming
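As a rough preview of the PairRDD join exercises listed above, here is a plain-Python sketch of how a key-based inner join behaves — illustration only, not Spark code; the function name and sample data are made up for the example:

```python
# Plain-Python sketch of a PairRDD-style inner join (NOT actual Spark code).
# In Spark, rdd1.join(rdd2) pairs up values that share the same key and
# yields (key, (left_value, right_value)) tuples.
from collections import defaultdict

def inner_join(left, right):
    """Join two lists of (key, value) pairs on matching keys."""
    right_index = defaultdict(list)
    for k, v in right:
        right_index[k].append(v)
    # Emit one output pair per key match, mirroring the shape
    # Spark's PairRDD join produces.
    return [(k, (lv, rv)) for k, lv in left for rv in right_index[k]]

users = [(1, "ana"), (2, "bob"), (3, "eve")]
orders = [(1, "book"), (1, "pen"), (3, "mug")]

print(inner_join(users, orders))
# [(1, ('ana', 'book')), (1, ('ana', 'pen')), (3, ('eve', 'mug'))]
```

Note that key 2 ("bob") has no matching order and is dropped, exactly as in an inner join; the workshop exercises explore this behavior on real RDDs.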

Practical Workshop: Understanding Deployment in an AWS Environment:

  • Introduction to AWS Glue
  • Understanding the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Evaluating pros and cons

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration: 21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group, online live training, starting from 3900 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
