Duration : 35 days | 1 hr theory + 1 hr practical per day

 

Module 1 : Introduction to Big Data and Hadoop (HDFS and MapReduce)

1. Big Data Introduction

2. Hadoop Introduction

3. HDFS Introduction

4. MapReduce Introduction

 

 Module 2 : Deep Dive in HDFS

1. HDFS Design

2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)

3. Rack Awareness

4. Reading from and Writing to HDFS

5. HDFS Federation and High Availability (Hadoop 2.x.x)

6. Parallel Copying using DistCp

7. HDFS Command Line Interface

 

 Module 2A : HDFS File Operation Lifecycle (Supplementary)

1. File Read Cycle from HDFS - DistributedFileSystem - FSDataInputStream

2. Failure or Error Handling When File Reading Fails

3. File Write Cycle to HDFS - FSDataOutputStream

4. Failure or Error Handling When File Writing Fails

 

 Module 3 : Understanding MapReduce

1. JobTracker and TaskTracker

2. Topology of a Hadoop Cluster

3. Example of MapReduce - Map Function - Reduce Function

4. Java Implementation of MapReduce

5. Data Flow of MapReduce

6. Use of Combiner
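
As a rough illustration of topics 3, 5 and 6 above, the sketch below simulates a word-count job in plain Python. It is not the Hadoop Java API: the names map_fn, reduce_fn and run_job, and the two sample input splits, are assumptions made up for this example. It only shows how a map function, an optional combiner and a reduce function fit together, and how the combiner can reduce the number of records sent across the shuffle.

```python
from collections import defaultdict

def map_fn(line):
    # Map function: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce function: sum all counts seen for one word.
    return word, sum(counts)

def run_job(splits, use_combiner=True):
    # Hypothetical driver: each split stands for the input of one map task.
    shuffled = defaultdict(list)   # word -> counts arriving at the reducer
    records_shuffled = 0           # records sent from map side to reduce side
    for split in splits:
        map_output = defaultdict(list)
        for line in split:
            for word, one in map_fn(line):
                map_output[word].append(one)
        for word, counts in map_output.items():
            if use_combiner:
                # Combiner: apply the reduce logic locally on the map side.
                counts = [reduce_fn(word, counts)[1]]
            shuffled[word].extend(counts)
            records_shuffled += len(counts)
    result = dict(reduce_fn(word, counts) for word, counts in shuffled.items())
    return result, records_shuffled

splits = [["big data big"], ["data hadoop big"]]
with_combiner, shuffled_with = run_job(splits, use_combiner=True)
without_combiner, shuffled_without = run_job(splits, use_combiner=False)
print(with_combiner, shuffled_with, shuffled_without)
```

Reusing the reduce logic as the combiner works here because addition is associative and commutative; the final counts are the same with or without it, and only the shuffle volume changes.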

 

 Module 4 : MapReduce Internals

1. How MapReduce Works

2. Anatomy of MapReduce Job (MR-1)

3. Submission & Initialization of MapReduce Job (What Happens?)

4. Assigning & Execution of Tasks

5. Monitoring & Progress of MapReduce Job

6. Completion of Job

7. Handling of MapReduce Job - Task Failure - TaskTracker Failure - JobTracker Failure

 

 Module 5 : YARN

1. Limitations of the Current Architecture (Classic)

2. What are the Requirements?

3. YARN Architecture

6. Progress and Monitoring of the Job

7. Failure Handling in YARN - Task Failure - Application Master Failure - Node Manager Failure - Resource Manager Failure

 Module 6 : Apache Pig

1. What is Pig?

2. Introduction to Pig Data Flow Engine

3. Pig and MapReduce in Detail

4. When Should Pig Be Used?

5. Pig and Hadoop Cluster

6. Pig Interpreter and MapReduce

7. Pig Relations and Data Types

8. PigLatin Example in Detail

9. Debugging and Generating Examples in Apache Pig

 

 Module 7 : Fundamentals of Apache Hive

1. What is Hive?

2. Architecture of Hive

3. Hive Services

4. Hive Clients

5. How Hive Differs from a Traditional RDBMS

6. Introduction to HiveQL

7. Data Types and File Formats in Hive

8. File Encoding

9. Common Problems While Working with Hive

 

 Module 8 : HBase Introduction and NoSQL

 3 Hours of Detailed Class on Interview Questions.
