Big Data and Hadoop

Become a Hadoop expert by mastering MapReduce, YARN, Pig, Hive, HBase, Oozie, Flume and Sqoop while working on industry-based use cases and projects.

About the Course

The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. The course covers core concepts in depth, along with hands-on implementation on varied industry use cases.

Course Objectives

At the end of the course, participants should be able to:
>Master the concepts of HDFS and the MapReduce framework.
>Master Hadoop installation (Hadoop 1.x and Hadoop 2.x) in various modes.
>Work with the HDFS file system and try out examples.
>Get introduced to Pig and Hive scripting.
>Learn data loading techniques using Pig, Hive and Sqoop.
>Perform data analytics using Pig and Hive in pseudo-distributed mode.
>Perform data analytics using Pig and Hive in cluster mode.
>Implement HBase and MapReduce integration.
>Schedule jobs using Oozie.
>Implement best practices for Hadoop development.
>Set up a Hadoop cluster and write complex MapReduce programs.
>Work on a real-life project on Big Data analytics.

Who should go for this course?

Predictions say 2015 will be the year Hadoop finally becomes a cornerstone of the business technology agenda. To stay ahead of the game, Hadoop has become a must-know technology for the following professionals:
>Analytics Professionals.
>Project Managers.
>Testing Professionals.
>Mainframe Professionals.
>Software Developers and Architects.
>Graduates aiming to build a career in Big Data.

Course Curriculum:

1. Understanding Big Data and Hadoop
Learning Objectives - In this module, you will understand Big Data, the limitations of the existing solutions for Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop Architecture, HDFS, Anatomy of File Write and Read, Rack Awareness.
Topics - Big Data, Limitations and Solutions of Existing Data Analytics Architecture, Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x Core Components, Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework, Anatomy of File Write and Read, Rack Awareness.
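
Rack awareness can be illustrated with a toy replica-placement function. The sketch below mimics the default HDFS policy for replication factor 3 (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack); the `choose_replicas` helper and the cluster layout are invented for illustration and are not actual HDFS code.

```python
# Toy sketch of HDFS default replica placement (replication factor 3):
# 1st replica on the writer's node, 2nd on a node in a different rack,
# 3rd on another node in the same rack as the 2nd.
# The function and data layout are illustrative, not real HDFS internals.

def choose_replicas(writer, nodes_by_rack):
    """Pick three nodes for a block written from `writer` = (rack, node)."""
    writer_rack, writer_node = writer
    replicas = [writer]  # 1st replica: the writer's own node

    # 2nd replica: a node on a different rack
    other_rack = next(r for r in nodes_by_rack if r != writer_rack)
    second = (other_rack, nodes_by_rack[other_rack][0])
    replicas.append(second)

    # 3rd replica: a different node on that same remote rack
    third_node = next(n for n in nodes_by_rack[other_rack] if n != second[1])
    replicas.append((other_rack, third_node))
    return replicas

cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(choose_replicas(("rack1", "n1"), cluster))
```

Because the three copies span two racks, the failure of one entire rack cannot lose every replica of a block.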

2. Hadoop Architecture and HDFS
Learning Objectives - In this module, you will learn the Hadoop Cluster Architecture, Important Configuration files in a Hadoop Cluster, Data Loading Techniques.
Topics - Hadoop 2.x Cluster Architecture, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Password-Less SSH, MapReduce Job Execution, Data Loading Techniques: Hadoop Copy Commands, FLUME, SQOOP.
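
The Hadoop copy commands (`hadoop fs -put` and `hadoop fs -get`) move files between the local file system and HDFS. The toy model below mimics only their put/get/ls semantics with an in-memory dictionary standing in for HDFS; the `MiniHDFS` class is an invented illustration, not a real Hadoop API.

```python
# Toy stand-in for HDFS to illustrate the semantics of the copy commands
# `hadoop fs -put <local> <hdfs>` and `hadoop fs -get <hdfs> <local>`.
# `MiniHDFS` is an invented illustration, not a real Hadoop API.

class MiniHDFS:
    def __init__(self):
        self.files = {}           # HDFS path -> file bytes

    def put(self, local_files, src, dst):
        """Like `hadoop fs -put`: copy a local file into HDFS."""
        self.files[dst] = local_files[src]

    def get(self, local_files, src, dst):
        """Like `hadoop fs -get`: copy an HDFS file back to local disk."""
        local_files[dst] = self.files[src]

    def ls(self, prefix="/"):
        """Like `hadoop fs -ls`: list files under a path."""
        return sorted(p for p in self.files if p.startswith(prefix))

local = {"sales.csv": b"id,amount\n1,10\n"}
fs = MiniHDFS()
fs.put(local, "sales.csv", "/user/data/sales.csv")
print(fs.ls("/user"))                              # ['/user/data/sales.csv']
fs.get(local, "/user/data/sales.csv", "copy.csv")
print(local["copy.csv"] == local["sales.csv"])     # True
```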

3. Hadoop MapReduce Framework - I
Learning Objectives - In this module, you will understand Hadoop MapReduce framework and the working of MapReduce on data stored in HDFS. You will learn about YARN concepts in MapReduce.
Topics - MapReduce Use Cases, Traditional way Vs MapReduce way, Why MapReduce, Hadoop 2.x MapReduce Architecture, Hadoop 2.x MapReduce Components, YARN MR Application Execution Flow, YARN Workflow, Anatomy of MapReduce Program, Demo on MapReduce.
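
The classic MapReduce example is word count. The sketch below imitates the map, shuffle/sort, and reduce phases in plain Python, without Hadoop itself; the function names are illustrative.

```python
from collections import defaultdict

# Plain-Python imitation of the MapReduce word-count flow
# (map -> shuffle/sort -> reduce); no Hadoop involved.

def map_phase(line):
    # Emit (word, 1) pairs, like a Mapper's map() method.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like the framework's shuffle/sort step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word, like a Reducer's reduce() method.
    return key, sum(values)

lines = ["Hadoop stores data", "Hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hadoop"], counts["data"])  # 2 2
```

In real Hadoop the same three roles are played by the Mapper class, the framework's shuffle, and the Reducer class, with the data partitioned across many machines.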

4. Pig
Learning Objectives - In this module, you will learn Pig, the types of use cases where Pig can be used, how Pig relates to MapReduce, and Pig Latin scripting.
Topics - About Pig, MapReduce Vs Pig, Pig Use Cases, Programming Structure in Pig, Pig Running Modes, Pig Components, Pig Execution, Pig Latin Programs, Data Models in Pig, Pig Data Types, Practical Demo on Pig Latin Covering Various Concepts, Pig UDFs, Pig Demo on a Data Set.
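
Pig's data model groups tuples into bags. A Pig Latin flow such as `grpd = GROUP sales BY product; sums = FOREACH grpd GENERATE group, SUM(sales.amount);` can be mimicked in plain Python to show what those operators do; the relation and field names are invented for illustration.

```python
from collections import defaultdict

# Plain-Python mimic of a small Pig Latin flow over (product, amount) tuples:
#   grpd = GROUP sales BY product;
#   sums = FOREACH grpd GENERATE group, SUM(sales.amount);
# The relation and field names are illustrative.

sales = [("book", 12), ("pen", 3), ("book", 5), ("pen", 2)]

# GROUP sales BY product -> one bag of tuples per group key
grpd = defaultdict(list)
for product, amount in sales:
    grpd[product].append((product, amount))

# FOREACH grpd GENERATE group, SUM(sales.amount)
sums = {group: sum(amount for _, amount in bag) for group, bag in grpd.items()}
print(sums)  # {'book': 17, 'pen': 5}
```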

5. Hive
Learning Objectives - This module will help you understand Hive concepts, loading and querying data in Hive, and Hive UDFs.
Topics - Hive Background, Hive Use Case, About Hive, Hive Vs Pig, Hive Architecture and Components, Metastore in Hive, Limitations of Hive, Comparison with Traditional Database, Hive Data Types and Data Models, Partitions and Buckets, Hive Tables (Managed Tables and External Tables), Importing Data, Querying Data, Managing Outputs, Hive Script, Hive UDF, Hive Demo on Data Set.
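
Hive's bucketing clause (for example `CLUSTERED BY (user_id) INTO 4 BUCKETS`) assigns each row to a bucket by hashing the bucketing column. The sketch below shows the partition-into-buckets idea with a simple modulo hash; Hive's real hash function differs, so treat this as an assumption-laden illustration.

```python
# Illustration of Hive-style bucketing: a row goes to bucket
# hash(bucket_column) % num_buckets. Hive's real hash function differs;
# this sketch only shows the partition-into-buckets idea.

NUM_BUCKETS = 4

def bucket_for(user_id):
    # Deterministic toy hash so the example is reproducible.
    return user_id % NUM_BUCKETS

rows = [(101, "alice"), (102, "bob"), (105, "carol"), (106, "dave")]
buckets = {b: [] for b in range(NUM_BUCKETS)}
for user_id, name in rows:
    buckets[bucket_for(user_id)].append(name)

print(buckets[1], buckets[2])  # ['alice', 'carol'] ['bob', 'dave']
```

Because every row with the same key lands in the same bucket, joins and sampling on the bucketed column can read only the relevant buckets.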

6. HBase
Learning Objectives - You will acquire in-depth knowledge of HBase, the HBase architecture and its components.
Topics - HBase: Introduction to NoSQL Databases and HBase, HBase v/s RDBMS, HBase Components, HBase Architecture, HBase Cluster Deployment.
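
HBase stores data as a sparse, sorted map: row key, then column family, then column qualifier, then value. The toy in-memory version below shows that model; the `MiniHBaseTable` class, table contents and family names are invented for illustration and are not the real HBase client API.

```python
# Toy model of the HBase data model: a sorted map of
# row key -> column family -> column qualifier -> value.
# `MiniHBaseTable` and the names used are invented for illustration.

class MiniHBaseTable:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {family: {qualifier: value}}

    def put(self, row, family, qualifier, value):
        # Column families are fixed at table creation, as in HBase.
        assert family in self.families, "unknown column family"
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row, family, qualifier):
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

    def scan(self):
        # Rows come back sorted by row key, as an HBase scan would.
        return sorted(self.rows)

users = MiniHBaseTable(families=["info"])
users.put("row2", "info", "name", "bob")
users.put("row1", "info", "name", "alice")
print(users.get("row1", "info", "name"))  # alice
print(users.scan())                       # ['row1', 'row2']
```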

7. Zookeeper and Oozie
Learning Objectives - This module will cover advanced HBase concepts. We will see demos on bulk loading and filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster, and why HBase uses ZooKeeper. This module will also cover the Apache Oozie workflow scheduler for Hadoop jobs.
Topics - HBase Data Model, HBase Shell, HBase Client API, Data Loading Techniques, ZooKeeper Data Model, ZooKeeper Service, Demos on Bulk Loading, Getting and Inserting Data, Oozie, Oozie Components, Oozie Workflow, Scheduling with Oozie, Demo on Oozie Workflow, Oozie Coordinator, Oozie Commands, Oozie Web Console.
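
An Oozie workflow is a directed acyclic graph of actions, where each action runs only after its predecessors complete. A minimal sketch of that control flow in Python, with invented action names and an invented `run_workflow` helper:

```python
# Minimal sketch of Oozie-style workflow control flow: actions form a DAG,
# and each action runs only after all of its prerequisites have finished.
# Action names and the `run_workflow` helper are invented for illustration.

def run_workflow(actions, deps):
    """Run actions in an order that respects `deps` (action -> prerequisites)."""
    done, order = set(), []
    while len(done) < len(actions):
        for action in actions:
            if action not in done and all(d in done for d in deps.get(action, [])):
                order.append(action)   # "execute" the action here
                done.add(action)
    return order

actions = ["import", "clean", "aggregate", "export"]
deps = {"clean": ["import"], "aggregate": ["clean"], "export": ["aggregate"]}
print(run_workflow(actions, deps))
# ['import', 'clean', 'aggregate', 'export']
```

A real Oozie workflow expresses the same dependencies in a `workflow.xml`, and the Oozie coordinator adds time- and data-based triggering on top.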

8. Flume, Sqoop and Hadoop Project
Learning Objectives - This module will cover Flume and Sqoop demos and working on a real-life project.
Topics - Flume and Sqoop Demo and Hadoop Project Demo.