Become an expert in Big Data Hadoop by gaining hands-on knowledge of MapReduce, Hadoop architecture, Pig and Hive, Flume, and the Apache Oozie workflow scheduler. You will also become familiar with HBase, ZooKeeper, and Sqoop concepts while working on industry-based use cases and projects.
This Big Data course also prepares you for the Cloudera CCA175 certification with simulation exams and real-life projects. The Cloudera certification is one of the most sought-after Big Data certifications in the industry. After completing the ISEL Big Data and Hadoop training, you will be exam-ready for the Cloudera certification and job-ready for your next Big Data assignment.
New job opportunities in Big Data and Hadoop are opening up for IT professionals, and the scope is enormous. According to a recent study, there will be 181,000 Big Data roles in the U.S. in 2018. By 2020, the Big Data and Hadoop market is estimated to grow at a compound annual growth rate (CAGR) of 58%, surpassing $16 billion.
MAPREDUCE and HDFS
- Introduction to BIG DATA and Its characteristics
- 4 V’s of BIG DATA
- What is Hadoop?
- Why Hadoop?
- Core Components of Hadoop
- Intro to HDFS and its Architecture
- HDFS commands
- Intro to MAPREDUCE
- Versions of HADOOP
- What is a Daemon?
- Hadoop Daemons
- What is the NameNode?
- What is a DataNode?
- What is the Secondary NameNode?
- What is the JobTracker?
- What is a TaskTracker?
- Read/Write operations in HDFS
- Complete Overview of Hadoop 1.x and its architecture
- Rack awareness
- Introduction to Block size
- Introduction to replication factor
- Introduction to HeartBeat signal/pulse
- MAPREDUCE Architecture
- What is Mapper phase?
- What is shuffle and sort phase?
- What is Reducer phase?
- What is split?
- Difference between Block and split
- Intro to first wordcount program using MAPREDUCE
- Different classes for running MAPREDUCE program using Java
- Mapper class
- Reducer Class and its role
- Driver class
- Submitting the wordcount MAPREDUCE program
- Going through the Job’s system output
- Intro to Partitioner with example
- Intro to Combiner with example
- Intro to Counters and their types
- Different types of counters
- Joins in HADOOP
- Map-side joins
- Reduce-side joins
- Different types of input/output formats in HADOOP
- Use cases for HDFS and MapReduce programs using Java
- Schedulers in Hadoop
- FIFO, Fair, and capacity schedulers
- Speculative Execution in Hadoop
- Difference between Hadoop 1.x and Hadoop 2.x
- Intro to YARN and its core components (RM, NM, and AM)
- YARN architecture
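The mapper, shuffle and sort, and reducer phases covered in this module can be previewed with a small sketch. This is plain Python that only imitates the MapReduce data flow on one machine; it is not the Hadoop Java API used in the course, and the function names (`mapper`, `shuffle_and_sort`, `reducer`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort phase: group all values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(word, counts) for word, counts in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file stored in HDFS.
    sample = ["big data hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real Hadoop job the framework performs the shuffle and sort across the cluster, and the mapper and reducer run as separate tasks on different nodes; the sketch collapses all of that into one process to show only the flow of key-value pairs.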
PIG
- Intro to PIG
- Why PIG?
- The difference between MAPREDUCE and PIG
- When to go with MAPREDUCE?
- When to go with PIG?
- PIG datatypes
- What is a field in PIG?
- What is a tuple in PIG?
- What is a bag in PIG?
- Intro to the Grunt shell
- Local Mode
- MAPREDUCE mode
- Running PIG programs
- PIG Script
- Intro to PIG UDFs
- Writing PIG UDF using Java
- Registering PIG UDF
- Running PIG UDF
- Different types of UDFs in PIG
- WordCount program using PIG script
- Use cases for PIG scripts
HIVE
- Intro to HIVE
- Why HIVE?
- History of HIVE
- Difference between PIG and HIVE
- HIVE data types
- Complex data types
- What is Metastore and its importance?
- Different types of tables in HIVE
- Managed tables
- External tables
- Running HIVE queries
- Intro to HIVE partitions
- Intro to HIVE Buckets
- How to perform the JOINS using HIVE queries
- Intro to HIVE UDFs
- Different types of UDFs in HIVE
- Running HIVE queries for wordcount example
- Use cases for HIVE
HBASE
- Intro to HBASE
- Intro to NoSQL database
- Sparse and dense concepts in RDBMS
- Intro to columnar/column oriented database
- Core architecture of HBase
- Why HBase?
- HDFS vs HBase
- Intro to Regions, RegionServers, and HMaster
- Limitations of HBase
- Integration of Hive with HBase
- HBase commands
- Use cases for HBASE
FLUME
- Intro to Flume
- Intro to sources, sinks, Flume master, and Flume agents
- Importance of Flume agents
- Live Demo on copying the LOG DATA into HDFS
SQOOP
- Intro to Sqoop
- Importing and exporting data between an RDBMS and HDFS
- Intro to incremental imports and their types
- Use cases for importing MySQL data into HDFS
ZOOKEEPER
- Intro to Zookeeper
- Zookeeper operations
OOZIE
- Intro to Oozie
- What is job.properties?
- What is workflow.xml?
- Scheduling jobs in Oozie
- Scheduling MapReduce, HIVE, and PIG jobs using Oozie workflows
- Setting up the VMware for Hadoop
- Installing all Hadoop Components
- Intro to Hadoop Distributions
- Intro to Cloudera and its major components
- Discussion of different projects
From the course:
- Learn to write complex MapReduce code on both MRv1 and MRv2 (YARN) and understand the Hadoop architecture.
- Perform analytics and learn the high-level scripting frameworks Pig and Hive.
- Gain a full understanding of the Hadoop ecosystem and its advanced components, such as Flume and the Apache Oozie workflow scheduler.
- Get familiar with other concepts: HBase, ZooKeeper, and Sqoop.
- Get hands-on experience with various Hadoop cluster configurations and environments.
- Learn about optimization and troubleshooting.
- Acquire in-depth knowledge of the Hadoop architecture by learning about the Hadoop Distributed File System (HDFS 1.0 and HDFS 2.0).
- Work on a real-life project built to industry standards.
- Architects and developers who design, develop, and maintain Hadoop-based solutions
- Data Analysts, BI Analysts, BI Developers, SAS Developers, and related profiles who analyze Big Data in a Hadoop environment
- Consultants who are actively involved in a Hadoop project
- Experienced Java software engineers who need to understand and develop Java MapReduce applications for Hadoop 2.0
On successful completion of the course and its requisites, the candidate will receive the Big Data Hadoop Developer Certification, accredited by the Ministry of Skill Development and Entrepreneurship, Govt. of India.