Course Outline

Section 1: Data Management in HDFS

  • Various Data Formats (JSON, Avro, Parquet)
  • Compression Schemes
  • Data Masking
  • Labs: analyzing different data formats; enabling compression

Section 2: Advanced Pig

  • User-defined Functions
  • Introduction to Pig Libraries (Elephant Bird, DataFu)
  • Loading Complex Structured Data using Pig
  • Pig Tuning
  • Labs: advanced Pig scripting; parsing complex data types

Section 3: Advanced Hive

  • User-defined Functions
  • Compressed Tables
  • Hive Performance Tuning
  • Labs: creating compressed tables; evaluating table formats and configuration

Section 4: Advanced HBase

  • Advanced Schema Modeling
  • Compression
  • Bulk Data Ingest
  • Wide-table vs. Tall-table comparison
  • HBase and Pig
  • HBase and Hive
  • HBase Performance Tuning
  • Labs: tuning HBase; accessing HBase data from Pig and Hive; using Phoenix for data modeling

Requirements

  • Proficiency in the Java programming language (as most programming exercises are conducted in Java)
  • Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
  • A solid working knowledge of Hadoop

Lab environment

Zero Install: Students do not need to install Hadoop software on their personal machines. A fully operational Hadoop cluster will be provided for the labs.

Students will need the following

21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group, online live training, starting from 3900 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
