Course Outline

  1. Scala Primer

    • Quick introduction to Scala
    • Labs: Getting to know Scala
  2. Spark Basics

    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and running Spark
  3. First Look at Spark

    • Running Spark in local mode
    • Spark Web UI
    • Spark shell
    • Dataset analysis – part 1
    • Inspecting RDDs
    • Labs: Exploring the Spark shell
  4. RDDs

    • RDD concepts
    • Partitions
    • RDD operations and transformations
    • RDD types
    • Key-Value pair RDDs
    • MapReduce on RDDs
    • Caching and persistence
    • Labs: Creating and inspecting RDDs; Caching RDDs
  5. Spark API Programming

    • Introduction to the Spark API / RDD API
    • Submitting your first program to Spark
    • Debugging and logging
    • Configuration properties
    • Labs: Programming with the Spark API; Submitting jobs
  6. Spark SQL

    • SQL support in Spark
    • DataFrames
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats: JSON and Parquet
    • Labs: Creating and querying DataFrames; Evaluating data formats
  7. MLlib

    • Introduction to MLlib
    • MLlib algorithms
    • Labs: Writing MLlib applications
  8. GraphX

    • GraphX library overview
    • GraphX APIs
    • Labs: Processing graph data using Spark
  9. Spark Streaming

    • Streaming overview
    • Evaluating streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs: Writing Spark Streaming applications
  10. Spark and Hadoop

    • Hadoop introduction (HDFS / YARN)
    • Hadoop and Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
  11. Spark Performance and Tuning

    • Broadcast variables
    • Accumulators
    • Memory management and caching
  12. Spark Operations

    • Deploying Spark in production
    • Sample deployment templates
    • Configurations
    • Monitoring
    • Troubleshooting

Requirements

PRE-REQUISITES

Familiarity with Java, Scala, or Python (as used in our labs)
Basic understanding of the Linux development environment, including command-line navigation and file editing with vi or nano

Duration: 21 hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's availability.
  • Format: Online (live), in-company (at your offices), or hybrid.

Investment

Price per private group for live online training: from 3900 € + VAT

Contact us for an exact quote and to hear about our latest promotions.
