Course Outline

Section 1: Introduction to Hadoop

  • Hadoop history, concepts
  • ecosystem
  • distributions
  • high-level architecture
  • Hadoop myths
  • Hadoop challenges
  • hardware / software
  • lab: first look at Hadoop

Section 2: HDFS

  • Design and architecture
  • concepts (horizontal scaling, replication, data locality, rack awareness)
  • daemons: NameNode, Secondary NameNode, DataNode
  • communications / heartbeats
  • data integrity
  • read / write path
  • NameNode High Availability (HA), Federation
  • labs: interacting with HDFS

Section 3: MapReduce

  • concepts and architecture
  • daemons (MRv1): JobTracker / TaskTracker
  • phases: driver, mapper, shuffle/sort, reducer
  • MapReduce Version 1 and Version 2 (YARN)
  • Internals of MapReduce
  • Introduction to Java MapReduce programs
  • labs: running a sample MapReduce program

Section 4: Pig

  • Pig vs Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations & Joins
  • User-defined functions (UDFs)
  • labs: writing Pig scripts to analyze data

Section 5: Hive

  • architecture and design
  • data types
  • SQL support in Hive
  • Creating and querying Hive tables
  • partitions
  • joins
  • text processing
  • labs: processing data with Hive

Section 6: HBase

  • concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time-series data on HBase
  • schema design
  • labs: interacting with HBase using the shell; programming with the HBase Java API; schema design exercise

Requirements

  • Proficiency in the Java programming language (most programming exercises are conducted in Java)
  • Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi / nano)

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines. A functioning Hadoop cluster will be provided.

Students will need the following:

  • an SSH client (Linux and Mac systems already include SSH clients; for Windows, PuTTY is recommended)
  • a browser to access the cluster (Firefox is recommended)

Duration: 28 hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group for live online training: from 5200 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
