Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
Section 1: Introduction to Hadoop
- Hadoop history, concepts
- ecosystem
- distributions
- high level architecture
- Hadoop myths
- Hadoop challenges
- hardware / software
- lab : first look at Hadoop
Section 2: HDFS
- Design and architecture
- concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons : NameNode, Secondary NameNode, DataNode
- communications / heart-beats
- data integrity
- read / write path
- NameNode High Availability (HA), Federation
- labs : Interacting with HDFS
Section 3 : MapReduce
- concepts and architecture
- daemons (MRV1) : jobtracker / tasktracker
- phases : driver, mapper, shuffle/sort, reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internals of MapReduce
- Introduction to Java MapReduce programs
- labs : Running a sample MapReduce program
Section 4 : Pig
- Pig vs Java MapReduce
- Pig job flow
- PigLatin language
- ETL with Pig
- Transformations & Joins
- User defined functions (UDF)
- labs : writing Pig scripts to analyze data
Section 5: Hive
- architecture and design
- data types
- SQL support in Hive
- Creating Hive tables and querying
- partitions
- joins
- text processing
- labs : various labs on processing data with Hive
Section 6: HBase
- concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data on HBase
- schema design
- labs : Interacting with HBase using shell; programming in HBase Java API ; Schema design exercise
Requirements
- Proficiency in the Java programming language (most programming exercises are conducted in Java)
- Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi / nano)
Lab environment
Zero Install : There is no need to install Hadoop software on students’ machines! A functioning Hadoop cluster will be provided for the students.
Students will need the following items
- an SSH client (Linux and Mac systems already include SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster (Firefox is recommended)
28 Hours
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 5200 € + VAT*
Contact us for an exact quote and to hear our latest promotions
Testimonials (1)
Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already