MapReduce and Its Applications, Challenges, and Architecture: A Comprehensive Review and Directions for Future Research

A Survey on MapReduce Implementations

International Journal of Cloud Applications and Computing, 2016

MapReduce, a distinguished and successful platform for parallel data processing, is attracting significant momentum from both academia and industry as the volume of data to capture, transform, and analyse grows rapidly. Although MapReduce is used in many applications to analyse large-scale data sets, there is still much debate among scientists and researchers about its efficiency, performance, and usability for supporting more classes of applications. This survey presents a comprehensive review of various implementations of the MapReduce framework. The authors first give an overview of the MapReduce programming model. They then present a broad description of the technical aspects of the most successful implementations of the MapReduce framework reported in the literature and discuss their main strengths and weaknesses. Finally, the authors conclude by comparing MapReduce implementations and discussing open issues and challenges in enhancing MapReduce.

The MapReduce Framework: Analysis and Research Perspectives

MapReduce is a simple and powerful programming model that enables the development of scalable parallel applications to process large amounts of data scattered across a cluster of machines. The original implementation of the MapReduce framework had some limitations, which much of the follow-up research has addressed since its introduction. The framework is gaining considerable attention in both the research and industrial communities because of its capacity to process large data. In this review paper, we discuss how the MapReduce framework is used in different applications and for different purposes, and we analyse the architecture of MapReduce from different research perspectives.

A Study on MapReduce: Challenges and Trends

Indonesian Journal of Electrical Engineering and Computer Science, 2016

Nowadays we are all surrounded by big data. The term 'Big Data' itself indicates huge volume, high velocity, variety, and veracity, i.e. uncertainty of data, which has given rise to new difficulties and challenges. The big data generated may be structured, semi-structured, or unstructured, and existing databases and systems face many difficulties in processing, analyzing, storing, and managing such data. The big data challenges include protection, curation, capture, analysis, searching, visualization, storage, transfer, and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. Considerable effort has been put in by different researchers to make it simple, easy, effective, and efficient. In our survey paper we emphasize the working of MapReduce, its challenges, opportunities, and recent trends so that researchers can consider further improvements.

MapReduce: an infrastructure review and research insights

The Journal of Supercomputing, 2019

In the current decade, searching massive data to find "hidden" and valuable information within it is a growing activity. This search can involve heavy processing over considerable data, which has led to the development of solutions that process such huge information using distributed and parallel processing. Among parallel programming models, one that has gained a great deal of popularity is MapReduce. The goal of this paper is to survey research conducted on the MapReduce framework in the context of its open-source implementation, Hadoop, in order to summarize and report on this wide topic area at the infrastructure level. We conducted a systematic review of the prevalent topics dealing with MapReduce in seven areas: (1) performance; (2) job/task scheduling; (3) load balancing; (4) resource provisioning; (5) fault tolerance in terms of availability and reliability; (6) security; and (7) energy efficiency. We ran our study as a quantitative and qualitative evaluation of the trend in research publications published between January 1, 2014, and November 1, 2017. Since MapReduce is a challenge-prone area for researchers to work on and extend, this work is a useful guideline for getting feedback and starting research.

Parallel data processing with MapReduce

ACM SIGMOD Record, 2012

A prominent parallel data processing tool, MapReduce is gaining significant momentum from both industry and academia as the volume of data to analyze grows rapidly. While MapReduce is used in many areas where massive data analysis is required, there are still debates on its performance, efficiency per node, and simple abstraction. This survey intends to assist the database and open source communities in understanding various technical aspects of the MapReduce framework. In this survey, we characterize the MapReduce framework and discuss its inherent pros and cons. We then introduce its optimization strategies reported in the recent literature. We also discuss the open issues and challenges raised on parallel data analysis with MapReduce.

MapReduce: A Parallel Framework

2019

MapReduce is a practical model used for processing large-scale data, that is, huge volumes of data, at very high speed. It is a parallel-processing programming model that helps achieve near-real-time results. Designed by Google, MapReduce uses the Map and Reduce functions to process and generate large data sets. The division of one large problem into smaller tasks is carried out by the Map function; in contrast, the Reduce function reads and combines the intermediate results to form a single final result. Thus this paper is a brief analysis of the basics of MapReduce and its applications.
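As a purely illustrative sketch of this division-and-combination idea (not code from the paper), the Python fragment below splits one large summation problem into smaller per-chunk tasks and then combines the intermediate results into a single final value; the input data and chunk size are arbitrary assumptions.

```python
from functools import reduce

def map_task(chunk):
    # Smaller task: compute a partial (intermediate) result for one chunk.
    return sum(chunk)

def combine(partial_a, partial_b):
    # Reduce step: merge two intermediate results into one.
    return partial_a + partial_b

data = list(range(1_000))                                      # arbitrary example input
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]   # division into smaller tasks

intermediate = [map_task(c) for c in chunks]        # Map: one partial result per chunk
final_result = reduce(combine, intermediate)        # Reduce: single final result

assert final_result == sum(data)
```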

An Extensive Investigation of the MapReduce Technology

International Journal of Computer Sciences and Engineering (IJCSE), Vol. 5, Issue 10, pp. 218-225, 2017

Over the last three or four years, the field of "big data" has appeared as the new frontier in the wide spectrum of IT-enabled innovations and opportunities allowed by the information revolution. Today, there is a rising need to analyse very large datasets, which have been coined big data, and which require unique storage and processing infrastructures. MapReduce is a programming model whose goal is to process big data in a parallel and distributed manner. In MapReduce, the client describes a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. In this paper, we aim to present a close-up view of MapReduce. MapReduce is a well-known framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of every map and reduce task before it can be consumed. Finally, we also compare RDBMS and MapReduce and discuss well-known scheduling algorithms in this field.
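To make the map/reduce signatures described above concrete, the following is a minimal word-count sketch (an illustration of the programming model, not code from the paper): the map function takes a key/value pair and emits intermediate key/value pairs, and the reduce function merges all intermediate values associated with the same intermediate key.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # map(k1, v1) -> list of (k2, v2): emit one intermediate pair per word.
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    # reduce(k2, [v2, ...]) -> v3: merge all values sharing the same key.
    return sum(counts)

documents = {"d1": "big data big clusters", "d2": "big data"}  # toy input

# Grouping of intermediate values by key (done by the framework in a real system).
groups = defaultdict(list)
for doc_id, text in documents.items():
    for word, count in map_fn(doc_id, text):
        groups[word].append(count)

word_counts = {word: reduce_fn(word, counts) for word, counts in groups.items()}
print(word_counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```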

MapReduce Model: A Paradigm for Large Data Processing

Zenodo (CERN European Organization for Nuclear Research), 2023

The input key/value pairs [1] split the data into different segments based on the assigned keys and values, which are sent to different machines in parallel over the cluster; each machine processes the data mapped to it, and the intermediate output is produced by two scripts, the map script and the reduce script [3]. The MapReduce architecture has three phases: Mapper, Reducer, and Shuffler (Khezr & Navimipour, 2017) [5].
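The three phases named in this abstract can be sketched as a small single-machine pipeline (an illustrative assumption, not the cited architecture itself): a mapper phase over the input segments, a shuffle phase that sorts and groups intermediate pairs by key, and a reducer phase that merges each group into the final output.

```python
from itertools import groupby
from operator import itemgetter

def run_pipeline(segments, mapper, reducer):
    # Mapper phase: each input segment is processed independently.
    intermediate = [pair for seg in segments for pair in mapper(seg)]

    # Shuffle phase: sort the intermediate pairs and group them by key.
    intermediate.sort(key=itemgetter(0))
    grouped = ((key, [value for _, value in group])
               for key, group in groupby(intermediate, key=itemgetter(0)))

    # Reducer phase: merge the values of each key into the final output.
    return {key: reducer(key, values) for key, values in grouped}

segments = ["to be or not to be", "to map or to reduce"]   # toy input splits
result = run_pipeline(segments,
                      mapper=lambda seg: [(w, 1) for w in seg.split()],
                      reducer=lambda key, values: sum(values))
print(result)
```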

Analysis of MapReduce Methods

2020

As the use of social media networking platforms increases, a huge volume of data is generated, and with it a large amount of memory is required for storing the data. There are issues in storing such large data sets and in processing them in a parallel and distributed fashion, including challenges such as the critical path problem, the reliability problem, equal-split issues, single-split issues, and aggregation issues. To overcome these problems we have the MapReduce method, which allows us to use parallel computation and distributed processing without having to worry about issues like fault tolerance and reliability. For these reasons, this survey paper presents the various implementation methods of MapReduce and compares them with respect to volume and variety.

MapReduce

Communications of the ACM, 2008

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
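As a rough, single-machine approximation of this programming model (this is not Google's runtime; a process pool merely stands in for the cluster workers the abstract refers to), the sketch below runs user-specified map tasks in parallel worker processes, groups the intermediate pairs by key, and then applies the user-specified reduce function.

```python
from collections import defaultdict
from multiprocessing import Pool

def user_map(split):
    # User-specified map function: emit (word, 1) for every word in a split.
    return [(word, 1) for word in split.split()]

def user_reduce(item):
    # User-specified reduce function: sum all counts recorded for one word.
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    splits = ["large datasets on clusters", "clusters of machines", "large clusters"]
    with Pool() as pool:                        # stand-in for the cluster's worker machines
        mapped = pool.map(user_map, splits)     # map tasks run in parallel
        groups = defaultdict(list)
        for pairs in mapped:                    # shuffle: group intermediate values by key
            for word, count in pairs:
                groups[word].append(count)
        results = dict(pool.map(user_reduce, groups.items()))
    print(results)
```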