Working with Spark – Big Data Hadoop MapReduce

Introduction


Before moving to the Spark RDD concept, which is the foundation of Spark, we need to understand the concept of Hadoop MapReduce. RDDs improve on Hadoop MapReduce by keeping intermediate results in memory, which is how Spark can run jobs up to 100 times faster. In this blog post we are going to discuss Hadoop MapReduce to understand the concept only.

Hope it will be interesting.

 

What is MapReduce


MapReduce is a software framework and programming model for processing huge volumes of data. Hadoop can run MapReduce programs written in different languages such as Java, Ruby, Python, and C++.


How MapReduce Works


As the name suggests, MapReduce is a combination of Mapping and Reducing.

 

We can divide MapReduce into the following phases.


· Input Splits

· Mapping

· Shuffling

· Reducing

 

Before examining those phases, let’s take an example.

 

Please consider the following input data to a MapReduce word-count program.

 

Mayuree how are you

We welcome Mayuree in India

Mayuree is a best developer

 

Please follow the diagram below.

[Diagram: the sample input flowing through the Input Splits, Mapping, Shuffling, and Reducing phases]

Final Output:

Mayuree 3
how 1
are 1
you 1
We 1
welcome 1
in 1
India 1
is 1
a 1
best 1
developer 1
 

Input Splits

An input to a MapReduce job is divided into fixed-size pieces called input splits. An input split is a chunk of the input that is consumed by a single map task.
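To make this concrete, here is a minimal plain-Python sketch. It is not how Hadoop actually splits files (Hadoop splits by byte ranges, typically one split per HDFS block); for this sketch we simply treat each line of our sample input as one split.

# Sample input from the example above. For this sketch we treat
# each line as one input split; real Hadoop splits are byte ranges
# of the input file, roughly one per HDFS block.
input_data = (
    "Mayuree how are you\n"
    "We welcome Mayuree in India\n"
    "Mayuree is a best developer"
)

splits = input_data.split("\n")  # one "split" per line
print(splits)
# ['Mayuree how are you', 'We welcome Mayuree in India', 'Mayuree is a best developer']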

Mapping

This is the very first phase in the execution of a MapReduce program. In this phase, the data in each split is passed to a mapping function, which produces output key/value pairs.
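For the word-count example, the mapping function reads a split and emits a (word, 1) pair for every word it finds. A minimal plain-Python sketch (Hadoop’s real Mapper API is different; this only illustrates the idea):

# Word-count mapper: emit the key/value pair (word, 1) for every
# word in a split.
def mapper(split):
    for word in split.split():
        yield (word, 1)

splits = ["Mayuree how are you",
          "We welcome Mayuree in India",
          "Mayuree is a best developer"]

mapped = [pair for s in splits for pair in mapper(s)]
print(mapped[:4])
# [('Mayuree', 1), ('how', 1), ('are', 1), ('you', 1)]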

Shuffling

This phase consumes the output of the Mapping phase. Its task is to consolidate the relevant records from the Mapping phase output, grouping all values that share the same key.
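Continuing the sketch, shuffling groups every (word, 1) pair by its key, so each word ends up with the list of all its values (in real Hadoop this grouping happens across the network between the map and reduce tasks):

from collections import defaultdict

# Shuffle: group the pairs emitted by the mappers by key.
mapped = [('Mayuree', 1), ('how', 1), ('are', 1), ('you', 1),
          ('We', 1), ('welcome', 1), ('Mayuree', 1), ('in', 1),
          ('India', 1), ('Mayuree', 1), ('is', 1), ('a', 1),
          ('best', 1), ('developer', 1)]

shuffled = defaultdict(list)
for key, value in mapped:
    shuffled[key].append(value)

print(shuffled['Mayuree'])  # [1, 1, 1]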

Reducing

In this phase, the output values from the Shuffling phase are aggregated. The reducer combines the values for each key and returns a single output value per key. In short, this phase summarizes the complete dataset.
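Finally, the reducer collapses each key’s list of values into a single number by summing it, which yields the word counts shown in the final output above. Again a plain-Python sketch, not the real Hadoop Reducer API:

# Reduce: sum each key's list of 1s to get the final word count.
shuffled = {'Mayuree': [1, 1, 1], 'how': [1], 'are': [1], 'you': [1],
            'We': [1], 'welcome': [1], 'in': [1], 'India': [1],
            'is': [1], 'a': [1], 'best': [1], 'developer': [1]}

def reducer(key, values):
    return (key, sum(values))

counts = dict(reducer(k, v) for k, v in shuffled.items())
print(counts['Mayuree'])  # 3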


Hope you like it.

 

