
Learn Big Data- Hadoop and MapReduce 2 DVD Set (45 hours of Content and 2 Real Time Projects)


Highlights

  • Digi Pathshala
  • Stream: Hadoop
  • Format: DVD
  • Duration: 55 Hours
  • For queries and concerns, drop an email to learning@snapdeal.com
  • SUPC: SDL328915758

Description

We will dispatch the DVDs containing the course content within 72 hours of purchase. The DVDs can be used to get started with the course and to continue it.


Product Description

This training will help you set up Hadoop and learn Hadoop and MapReduce programming from beginner to advanced level. The course is also designed to prepare you for the Cloudera Certified Developer for Apache Hadoop (CCDH) certification.

Here are the major components of this training course:

Introduction to Hadoop and its ecosystem, MapReduce and HDFS; deep dive into MapReduce and YARN; deep dive into Pig & Hive; introduction to HBase architecture; Hadoop cluster setup and running MapReduce jobs; advanced MapReduce.

Hands-on Exercise:

  1. End-to-end POC using YARN (Hadoop 2.0)

Learning Objectives:
The key objectives of this online Big Data Hadoop Tutorial and training program are to enable developers to:

  1. Program in YARN (MRv2), the latest version of Hadoop, Release 2.0
  2. Implement HBase, MapReduce integration, advanced usage and advanced indexing
  3. Work through advanced MapReduce exercises – examples include Facebook sentiment analysis, the LinkedIn shortest-path algorithm and inverted indexing
  4. Derive an insight into the field of Data Science
  5. Understand the Apache Hadoop framework
  6. Learn to work with the Hadoop Distributed File System (HDFS)
  7. Learn how MapReduce interacts with data and processes it
  8. Design and develop applications involving large data sets using the Hadoop ecosystem
  9. Differentiate between the new and old APIs for Hadoop
  10. Understand how YARN manages compute resources in clusters

Last but not least, the Hadoop online tutorial program prepares programmers for better career opportunities in the world of Big Data!

In this course we have covered almost all the concepts of each component in depth: Partitioner, Combiner, MapReduce SQL programs, MapReduce Counters, Secondary Sorting, Record Reader, reading Excel files, reading JSON files, reading URL files, reading data from Twitter and performing analytics, Hive SerDe, Hive UDF and UDTF, Pig Storage, Pig UDF, Load Func, Eval Func, Filter Func, NameNode architecture, DataNode architecture, Secondary NameNode architecture and MapReduce architecture.
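
For a flavour of one of these concepts, here is a minimal sketch (not taken from the course DVDs) of a custom Partitioner written against the Hadoop Java API; the class name and the A-M/N-Z split are hypothetical choices for illustration:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical example: send words beginning with A-M to one reducer
    // and words beginning with N-Z to another, replacing the default
    // hash-based partitioning.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            if (key.getLength() == 0 || numPartitions == 1) {
                return 0;
            }
            char first = Character.toUpperCase(key.toString().charAt(0));
            int bucket = (first <= 'M') ? 0 : 1;
            return bucket % numPartitions; // always stay within the reducer count
        }
    }

A driver would enable it with job.setPartitionerClass(FirstLetterPartitioner.class); the same pattern applies to any custom Partitioner.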

Course Structure:

1. 45-hour course
2. Real-time explanations of the concepts
3. 95% technical content and 5% PPT (used only for architecture discussion), plus 50 hours of practical course content with no PPT
4. 50 real-time POCs
5. 2 real-time projects
6. Interview questions

Benefits of taking this course:

  1. We provide an Ubuntu VMware image on which you can learn the subject from scratch. Unlike Cloudera and Hortonworks, where everything comes pre-installed and developers cannot learn the installation concepts, we deliver the training on plain Ubuntu so that developers also learn the installations.
  2. Cluster setup document
  3. Python and C++ examples of MapReduce
  4. Sqoop concepts in depth
  5. HBase discussion with commands and the Java API (most courses skip the Java API discussion)
  6. Oozie with a coordinator example

After completing this course you will be proficient in all areas of Hadoop, with real-time scenarios, and you will be able to work on any Hadoop project without anyone's support.

Course Curriculum:

Module I. Introduction to Big Data and Hadoop
* What is Big Data?
* What are the challenges for processing big data?
* What technologies support big data?
* The 3 V's of Big Data (and growing)
* What is Hadoop?
* Why Hadoop and its Use cases
* History of Hadoop
* Different Ecosystems of Hadoop.
* Advantages and Disadvantages of Hadoop
* Real Life Use Cases

Module II. HDFS (Hadoop Distributed File System)
* HDFS architecture
* Features of HDFS
* Where does it fit and where doesn't it fit?
* HDFS daemons and its functionalities
* Name Node and its functionality
* Data Node and its functionality
* Secondary Name Node and its functionality
* Data Storage in HDFS
* Introduction about Blocks
* Data replication
* Accessing HDFS
* CLI (Command Line Interface) and admin commands
* Java-based approach (see the sketch after this module's outline)
* Hadoop Administration
* Hadoop Configuration Files
* Configuring Hadoop Domains
* Precedence of Hadoop Configuration
* Diving into Hadoop Configuration
* Scheduler
* Rack Awareness
* Cluster Administration Utilities
* Rebalancing HDFS Data
* Copying Large Amounts of Data from HDFS
* FSImage and the edits log file, theoretically and practically
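
As referenced above, here is a minimal sketch of the Java-based approach to HDFS using the FileSystem API; the file paths are hypothetical placeholders, and the cluster address is assumed to come from core-site.xml:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from core-site.xml on the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical paths, for illustration only
            Path local = new Path("/tmp/sample.txt");
            Path remote = new Path("/user/training/sample.txt");

            // Copy a local file into HDFS, then read it back line by line
            fs.copyFromLocalFile(local, remote);
            try (FSDataInputStream in = fs.open(remote);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

The same FileSystem handle also exposes create(), delete() and listStatus(), covering the kinds of operations the CLI commands in this module perform.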

Module III. MAPREDUCE
* Map Reduce architecture
* JobTracker, TaskTracker and their functionality
* Job execution flow
* Configuring development environment using Eclipse
* Map Reduce Programming Model
* How to write a basic MapReduce job (a minimal sketch follows this module's outline)
* Running the Map Reduce jobs in local mode and distributed mode
* Different Data types in Map Reduce
* How to use Input Formatters and Output Formatters in Map Reduce Jobs
* Input formatters and their associated Record Readers, with examples
* Text Input Formatter
* Key Value Text Input Formatter
* Sequence File Input Formatter
* How to write custom Input Formatters and its Record Readers
* Output formatters and their associated Record Writers, with examples
* Text Output Formatter
* Sequence File Output Formatter
* How to write custom Output Formatters and its Record Writers
* How to write Combiners and Partitioners, and how to use them
* Importance of Distributed Cache
* Importance of Counters and how to use them
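
As referenced above, here is a minimal sketch of a basic MapReduce job against the new (org.apache.hadoop.mapreduce) API – the classic word-count mapper and reducer, trimmed down for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emit (word, 1) for every word in the input line
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sum the counts emitted for each word
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

A driver wires these up with job.setMapperClass and job.setReducerClass; because addition is associative, the same reducer class can also serve as the combiner discussed above.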

Module IV. Advanced MapReduce Programming – programs shared as part of the POCs
* Joins - Map Side and Reduce Side
* Use of Secondary Sorting
* Importance of the Writable and WritableComparable APIs
* How to write Map Reduce Keys and Values
* Use of Compression techniques
* Snappy, LZO and Zip
* How to debug Map Reduce Jobs in Local and Pseudo Mode.
* Introduction to Map Reduce Streaming and Pipes with examples
* Job Submission
* Job Initialization
* Task Assignment
* Task Execution
* Progress and Status Updates
* Job Completion
* Failures
* Task Failure
* TaskTracker Failure
* JobTracker Failure
* Job Scheduling
* Shuffle & Sort in Depth
* Diving into Shuffle and Sort
* Dive into Input Splits
* Dive into Buffer Concepts
* Dive into Configuration Tuning
* Dive into Task Execution
* The Task Assignment Environment
* Speculative Execution
* Output Committers
* Task JVM Reuse
* Multiple Inputs & Multiple Outputs
* Built-in Counters
* Dive into Counters – Job Counters & User-Defined Counters (a short counters sketch follows this module's outline)
* SQL operations using Java MapReduce
* Introduction to YARN (Next-Generation MapReduce)
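
As referenced in the counters item above, user-defined counters take only a few lines with the Java API. This sketch is illustrative: the enum name and the comma-separated record format are hypothetical, not from the course materials:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RecordQualityMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Hypothetical user-defined counter group: each enum constant
        // becomes a counter shown in the job's final status report.
        public enum RecordQuality { WELL_FORMED, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {
                context.getCounter(RecordQuality.WELL_FORMED).increment(1);
                context.write(value, NullWritable.get());
            } else {
                // Count malformed records instead of failing the whole job
                context.getCounter(RecordQuality.MALFORMED).increment(1);
            }
        }
    }

Counting bad records this way keeps the job running on dirty data while still surfacing data-quality numbers alongside Hadoop's built-in counters.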

Module V. Apache HIVE
* Hive Introduction
* Hive architecture
* Driver
* Compiler
* Semantic Analyzer
* Hive Integration with Hadoop
* Hive Query Language (HiveQL)
* SQL vs HiveQL
* Hive Installation and Configuration
* Hive, MapReduce and Local Mode
* Hive DDL and DML Operations
* Hive Services
* CLI
* Schema Design
* Views
* Indexes
* HiveServer
* Metastore
* Embedded Metastore Configuration
* External Metastore Configuration
* Transformations in Hive
* UDFs in Hive (a minimal UDF sketch follows this module's outline)
* How to write simple Hive queries
* Usage
* Tuning
* Hive with HBase Integration
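
As referenced above, here is a minimal sketch of a Hive UDF built on the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function itself (trim-and-lowercase) is a hypothetical example, not from the course materials:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hive resolves the evaluate() method by reflection, so the
    // signature below defines the UDF's SQL-visible behaviour.
    public final class NormalizeString extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // preserve SQL NULL semantics
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Packaged in a jar, it would be registered in a session with ADD JAR and CREATE TEMPORARY FUNCTION before being called like any built-in function.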

Module VI. Apache PIG
* Introduction to Apache Pig
* Map Reduce Vs Apache Pig
* SQL Vs Apache Pig
* Different data types in Pig
* Modes of Execution in Pig
* Local Mode
* Map Reduce Mode
* Execution Mechanism
* Grunt Shell
* Script
* Embedded
* Transformations in Pig
* How to write a simple Pig script
* UDFs in Pig (a minimal UDF sketch follows this module's outline)
* Pig with HBase Integration
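
As referenced above, here is a minimal sketch of a Pig UDF using the EvalFunc base class; the upper-casing function is a hypothetical example, not from the course materials:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical example Pig UDF: returns the upper-cased form of
    // its first argument, or null for empty or null input.
    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

In a Pig script the jar would be loaded with REGISTER, and the function invoked by its fully qualified class name inside a FOREACH ... GENERATE statement.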

Module VII. Apache SQOOP
* Introduction to Sqoop
* MySQL client and Server Installation
* How to connect to Relational Database using Sqoop
* Sqoop commands, with examples of the import and export commands
* Transferring an Entire Table
* Specifying a Target Directory
* Importing Only a Subset of Data
* Protecting Your Password
* Using a File Format Other Than CSV
* Compressing Imported Data
* Speeding Up Transfers
* Overriding Type Mapping
* Controlling Parallelism
* Encoding Null Values
* Importing All Your Tables
* Incremental Import
* Importing Only New Data
* Incrementally Importing Mutable Data
* Preserving the Last Imported Value
* Storing Passwords in the Metastore
* Overriding Arguments to a Saved Job
* Sharing the Metastore Between Sqoop Clients
* Importing Data from Two Tables
* Using Custom Boundary Queries
* Renaming Sqoop Job Instances
* Importing Queries with Duplicate Columns
* Transferring Data from Hadoop
* Inserting Data in Batches
* Updating or Inserting at the Same Time
* Exporting Corrupted Data

Module VIII. Apache FLUME
* Introduction to Flume
* Flume agent usage

Module IX. Apache HBase
* HBase Introduction
* HBase Basics
* Categories of NoSQL Databases
- Key-Value Database
- Document Database
- Column-Family Database
- Graph Database
* Column Families
* Scans
* HBase Installation
* HBase Architecture
* Storage
* Write-Ahead Log
* MapReduce Integration
* MapReduce over HBase
* HBase Usage
* Key Design
* Bloom Filters
* Versioning
* Filters
* HBase Clients
* REST
* Thrift
* Hive with HBase Integration
* Web-Based UI
* HBase Admin
* Schema Definition
* Basic CRUD Operations
* CRUD Operations using the Java API (a minimal sketch follows this module's outline)
* ZooKeeper
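
As referenced above, here is a minimal sketch of CRUD operations with the HBase Java client (the Connection/Table API of HBase 1.x and later); the table name 'users', column family 'info' and the sample values are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudExample {
        public static void main(String[] args) throws Exception {
            // Reads the ZooKeeper quorum etc. from hbase-site.xml
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Create/update: put one cell into row1
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);

                // Read: fetch the cell back
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(name));

                // Delete: remove the whole row
                table.delete(new Delete(Bytes.toBytes("row1")));
            }
        }
    }

The table and its column family would be created beforehand, for example from the HBase shell, which this module covers alongside the Java API.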

Module X. Apache OOZIE
* Introduction to Oozie
* Executing workflow jobs (a minimal client sketch follows this module's outline)
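
As referenced above, here is a minimal sketch of submitting and monitoring a workflow with the Oozie Java client; the server URL, HDFS application path and property names are hypothetical placeholders:

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class OozieWorkflowExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical Oozie server URL
            OozieClient client = new OozieClient("http://localhost:11000/oozie");

            // Job configuration: where the workflow app lives, plus its parameters
            Properties conf = client.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:8020/user/training/wordcount-wf");
            conf.setProperty("inputDir", "/user/training/input");
            conf.setProperty("outputDir", "/user/training/output");

            // Submit and start the workflow, then poll until it finishes
            String jobId = client.run(conf);
            while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10_000);
            }
            System.out.println("Workflow finished: " + client.getJobInfo(jobId).getStatus());
        }
    }

The workflow itself is defined in a workflow.xml stored at the application path; a coordinator, as promised in the course benefits, wraps such a workflow in a time- or data-triggered schedule.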

Module XI. Hadoop installation on Linux, and installation of all other ecosystem components on Linux.

Module XII. Cluster setup (200-node cluster) knowledge sharing, with a setup document.

Module XIII. Cloudera & Hortonworks

Module XIV. 50 POCs / real-time use cases, and discussion of 2 projects



Terms & Conditions

The images represent the actual product, though the color of the image and the product may differ slightly.