Hadoop

Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications running on clustered systems.

Course Content:

Big Data

❖       Its characteristics

❖       Some facts and figures

❖       Importance of Big Data

❖       Need for understanding and analysing Big Data

❖       Basics of Data Analytics

❖       Problems with existing systems

 

Introduction to Hadoop

❖       What is Hadoop?

❖       Architecture

❖       Hadoop Job Process

❖       File Anatomy

❖       Read Operations

❖       Write Operations

❖       Useful Configurations

❖       core-site.xml

❖       hdfs-site.xml

❖       mapred-site.xml

 

HDFS

Significance of HDFS in Hadoop

❖       HDFS Features

❖       Daemons of Hadoop and functionalities

❖       NameNode

❖       DataNode

❖       JobTracker

❖       TaskTracker

❖       Secondary NameNode

❖       Data Storage in HDFS

❖       Blocks

❖       Heartbeats

❖       Data Replication

❖       Accessing HDFS

❖       CLI (Command Line Interface): Unix and Hadoop Commands

❖       Java Based Approach

 

Map Reduce

❖       Introduction to MapReduce

❖       MapReduce Architecture

❖       MapReduce Programming Model

❖       MapReduce Algorithm and Phases

❖       Basic MapReduce Program

❖       Driver Code

❖       Mapper Code

❖       Reducer Code

 

Hadoop Ecosystem

❖       What is an ecosystem?

❖       Different ecosystem projects

❖       Sqoop

❖       Hive

❖       Pig

❖       Flume

❖       Ambari

❖       Hue

 

Introduction to Hadoop Ecosystem

❖       Need for the Hadoop Ecosystem

❖       Problems with MapReduce Architecture

❖       How the Ecosystem Tools help

❖       Initializing and Configuring a FLUME Agent

❖       Understanding OOZIE Workflows

 

SQOOP

❖       Importing Data using SQOOP

❖       Exporting Data

❖       Selective Imports and Exports

❖       Creating Hive Tables

 

Apache Pig

Introduction to PIG

❖       Working with GRUNT

❖       Operating PIG in local mode

❖       Working with MapReduce Mode of PIG

❖       Using Loops and Conditional statements

❖       Introduction to Bags and Items in PIG

 

Apache Hive

Working with Hive Shell

❖       Configuring HIVE Warehouse

❖       Creating different types of Tables

❖       Executing Simple Queries

❖       Creating Tables out of Data Set Results

❖       Working with External Tables

❖       Assigning Permissions

❖       Working with JOINS

 

Apache Flume

Configuring Flume Agent

❖       Starting the Agent

❖       Flume Configurations

 

Deeper Dive

Advanced HDFS

❖       Secondary NameNode

❖       Federation

❖       High Availability

Advanced MapReduce

❖       Demo of Precedence levels

❖       Partitioners

❖       Combiners

 

Administration

Cluster Planning

❖       Understanding hardware and software requirements of a Hadoop cluster

❖       Different modes of operation of Hadoop

❖       Precedence levels

❖       Some dos and don’ts

 

Data Visualization

❖       Working with the Hadoop ODBC Connector

❖       Data Visualization using Excel

❖       Exporting Hive data

❖       Creating graphs and interactive charts for your Hive data

❖       Analysing Hive data using Power View in Excel

 

Generation 2

❖       YARN

❖       Federation

❖       High Availability

 

Pentaho Data Integration

Map Reduce

❖       Hive

❖       Pig