Course Outline
1: HDFS (17%)
- Explain the roles of the HDFS daemons.
- Describe the standard operation of an Apache Hadoop cluster, covering both data storage and processing.
- Identify the computing system features that drive the need for a system like Apache Hadoop.
- Outline the primary objectives of HDFS design.
- In specific scenarios, determine the appropriate use cases for HDFS Federation.
- Identify the components and daemons within an HDFS HA-Quorum cluster.
- Analyze the role of HDFS security, specifically Kerberos.
- Select the optimal data serialization method for a given scenario.
- Describe the pathways for file reading and writing.
- Recognize the commands used to manipulate files in the Hadoop File System Shell.
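The File System Shell bullet above covers commands such as the following (a sketch; the paths are illustrative, and the commands require a running HDFS cluster):

```shell
# Create a directory in HDFS (path is illustrative)
hdfs dfs -mkdir -p /user/alice/data

# Copy a local file into HDFS
hdfs dfs -put localfile.txt /user/alice/data/

# List directory contents
hdfs dfs -ls /user/alice/data

# Print a file to stdout
hdfs dfs -cat /user/alice/data/localfile.txt

# Check block and replication health for a path
hdfs fsck /user/alice/data -files -blocks

# Remove a file (goes to trash if trash is enabled)
hdfs dfs -rm /user/alice/data/localfile.txt
```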
2: YARN and MapReduce version 2 (MRv2) (17%)
- Comprehend how upgrading a cluster from Hadoop 1 to Hadoop 2 impacts cluster configurations.
- Learn how to deploy MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
- Understand the fundamental design strategy of MapReduce v2 (MRv2).
- Determine how YARN manages resource allocation.
- Trace the workflow of a MapReduce job running on YARN.
- Identify which files require modification and the necessary changes to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
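As a sketch of the MRv1-to-MRv2 migration bullet, the key configuration changes usually look like the following (the property names are standard Hadoop 2 settings; the hostname is a placeholder, and exact values depend on your distribution):

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of MRv1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: point NodeManagers at the ResourceManager -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value> <!-- placeholder hostname -->
</property>

<!-- yarn-site.xml: enable the shuffle service MapReduce needs -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```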
3: Hadoop Cluster Planning (16%)
- Identify key considerations when selecting hardware and operating systems to host an Apache Hadoop cluster.
- Evaluate options for selecting an operating system.
- Grasp kernel tuning and disk swapping mechanisms.
- Given a scenario and workload pattern, identify the hardware configuration suitable for that situation.
- For a given scenario, determine the ecosystem components required for your cluster to meet SLAs.
- Cluster sizing: Based on a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O.
- Understand disk sizing and configuration, including JBOD versus RAID, SANs, virtualization, and cluster disk sizing requirements.
- Network Topologies: Understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.
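The cluster-sizing bullets above largely reduce to simple arithmetic. A minimal sketch (the ingest, retention, and overhead figures are illustrative assumptions; the replication factor of 3 is the HDFS default):

```python
def raw_storage_needed_tb(daily_ingest_tb, retention_days,
                          replication=3, temp_overhead=0.25):
    """Estimate raw disk needed for an HDFS cluster.

    replication: HDFS default block replication factor (3).
    temp_overhead: fraction reserved for intermediate/shuffle
                   data (illustrative assumption).
    """
    logical = daily_ingest_tb * retention_days
    return logical * replication * (1 + temp_overhead)

def nodes_needed(raw_tb, disks_per_node=12, disk_tb=4):
    """Round up to whole DataNodes for a JBOD layout (illustrative)."""
    per_node = disks_per_node * disk_tb
    return int(-(-raw_tb // per_node))  # ceiling division

raw = raw_storage_needed_tb(daily_ingest_tb=1, retention_days=365)
print(raw)                 # 1368.75 TB raw
print(nodes_needed(raw))   # 29 nodes
```

The point of the exercise is that replication and temporary space roughly quadruple the logical data volume before you even size for growth.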
4: Hadoop Cluster Installation and Administration (25%)
- In a given scenario, identify how the cluster manages disk and machine failures.
- Analyze logging configurations and the format of logging configuration files.
- Understand the fundamentals of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available cluster monitoring tools.
- Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Manager, Sqoop, Hive, and Pig.
- Identify the function and purpose of available tools for managing the Apache Hadoop file system.
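The logging-configuration bullet refers to Hadoop's Log4j setup. A sketch of the typical log4j.properties knobs (the property and appender names are standard Log4j/Hadoop entries; the size and count values are illustrative):

```properties
# Root logger: level plus the appender defined below
log4j.rootLogger=INFO,RFA

# Rolling file appender: cap file size and number of backups kept
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# Quiet a chatty logger without changing the global level
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
```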
5: Resource Management (10%)
- Understand the overarching design goals of each Hadoop scheduler.
- In a given scenario, determine how the FIFO Scheduler allocates cluster resources.
- In a given scenario, determine how the Fair Scheduler allocates cluster resources under YARN.
- In a given scenario, determine how the Capacity Scheduler allocates cluster resources.
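The difference between these schedulers can be sketched in a few lines. Below is a toy model of fair-share allocation, not the actual YARN Fair Scheduler, which also handles queues, weights, minimum shares, and preemption:

```python
def fair_share(capacity, demands):
    """Split `capacity` across pools: pools demanding less than an
    equal share keep their full demand; the surplus is redistributed
    among the rest (water-filling). Toy model only."""
    shares = {pool: 0 for pool in demands}
    remaining = dict(demands)
    cap = capacity
    while remaining and cap > 0:
        equal = cap / len(remaining)
        # Fully satisfy every pool that wants no more than an equal share.
        small = {p: d for p, d in remaining.items() if d <= equal}
        if not small:
            # Everyone wants more: split what is left equally.
            for p in remaining:
                shares[p] += equal
            return shares
        for p, d in small.items():
            shares[p] += d
            cap -= d
            del remaining[p]
    return shares

# 100 containers, three pools: the small pool gets its full demand,
# and the two big pools split the remainder equally.
print(fair_share(100, {"etl": 20, "adhoc": 70, "ml": 70}))
# {'etl': 20, 'adhoc': 40.0, 'ml': 40.0}
```

By contrast, a FIFO scheduler would give the oldest job everything it asks for before the next job sees any resources, and the Capacity Scheduler carves the cluster into queues with guaranteed capacities.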
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Learn how to monitor cluster daemons.
- Identify and monitor CPU usage on master nodes.
- Describe methods for monitoring swap and memory allocation across all nodes.
- Identify how to view and manage Hadoop’s log files.
- Interpret log files.
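Interpreting log files mostly means recognizing the Log4j layout Hadoop daemons use: timestamp, level, logger class, message. A small sketch that pulls those fields apart (the sample line is made up but follows the real format):

```python
import re

# Hadoop daemon logs follow Log4j's "%d{ISO8601} %p %c: %m" pattern.
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<msg>.*)"
)

def parse(line):
    """Return the log fields as a dict, or None for non-matching lines."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

sample = ("2024-05-01 09:15:42,113 WARN "
          "org.apache.hadoop.hdfs.server.namenode.NameNode: "
          "Low on available disk space")  # illustrative, not a real log
rec = parse(sample)
print(rec["level"], rec["logger"])
# WARN org.apache.hadoop.hdfs.server.namenode.NameNode
```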
Requirements
- Foundational skills in Linux administration
- Basic programming proficiency
35 Hours
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 6500 € + VAT*
Contact us for an exact quote and to hear our latest promotions
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely enjoyed the trainer's broad expertise.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
I mostly liked that the trainer gave real-life examples.