Course Outline

1: HDFS (17%)

  • Explain the roles of the HDFS daemons.
  • Describe the standard operation of an Apache Hadoop cluster, covering both data storage and processing.
  • Identify the computing system features that drive the need for a system like Apache Hadoop.
  • Outline the primary objectives of HDFS design.
  • In specific scenarios, determine the appropriate use cases for HDFS Federation.
  • Identify the components and daemons within an HDFS HA-Quorum cluster.
  • Analyze the role of HDFS security, specifically Kerberos.
  • Select the optimal data serialization method for a given scenario.
  • Describe the file read and write paths.
  • Recognize the commands used to manipulate files in the Hadoop File System Shell.

2: YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster configuration.
  • Learn how to deploy MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
  • Understand the fundamental design strategy of MapReduce v2 (MRv2).
  • Determine how YARN manages resource allocation.
  • Trace the workflow of a MapReduce job running on YARN.
  • Identify which files require modification and the necessary changes to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.

3: Hadoop Cluster Planning (16%)

  • Identify the key considerations when selecting hardware and operating systems to host an Apache Hadoop cluster.
  • Evaluate options for selecting an operating system.
  • Understand kernel tuning and disk swapping.
  • Given a scenario and workload pattern, identify the hardware configuration suitable for that situation.
  • For a given scenario, determine the ecosystem components required for your cluster to meet SLAs.
  • Cluster sizing: Based on a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O.
  • Disk Sizing and Configuration, covering JBOD versus RAID, SANs, virtualization, and cluster disk sizing requirements.
  • Network Topologies: Understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.

4: Hadoop Cluster Installation and Administration (25%)

  • In a given scenario, identify how the cluster manages disk and machine failures.
  • Analyze logging configurations and the format of logging configuration files.
  • Understand the fundamentals of Hadoop metrics and cluster health monitoring.
  • Identify the function and purpose of available cluster monitoring tools.
  • Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system.

5: Resource Management (10%)

  • Understand the overarching design goals of each Hadoop scheduler.
  • In a given scenario, determine how the FIFO Scheduler allocates cluster resources.
  • In a given scenario, determine how the Fair Scheduler allocates cluster resources under YARN.
  • In a given scenario, determine how the Capacity Scheduler allocates cluster resources.

6: Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection capabilities.
  • Analyze the NameNode and JobTracker Web UIs.
  • Learn how to monitor cluster daemons.
  • Identify and monitor CPU usage on master nodes.
  • Describe methods for monitoring swap and memory allocation across all nodes.
  • Identify how to view and manage Hadoop’s log files.
  • Interpret log files.

Requirements

  • Foundational skills in Linux administration
  • Basic programming proficiency

Duration: 35 hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group, online live training, starting from 6500 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
