Beyond Hadoop: The Paradigm Shift of Data From Stationary to Streaming Data for Data Analytics

Big Data Streaming Platforms to Support Real-time Analytics

2020

In recent years, data has grown exponentially due to the evolution of technology. Data flows quickly and continuously, so it must be processed in real time. Several big data streaming platforms have therefore emerged for processing large amounts of data. Companies now have difficulty choosing the platform that best suits their needs, and information about the platforms is scattered and sometimes missing, which makes the choice harder. This work focuses on helping companies and organizations choose a big data streaming platform to analyze and process their data flow. We describe the most popular platforms: Apache Flink, Apache Kafka, Apache Samza, Apache Spark and Apache Storm. To strengthen the knowledge about these platforms, we also examine their architectures, advantages and limitations. Finally, a comparison among big data streaming platforms will be provided, using as...

Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

2014

Master alternative Big Data technologies that can do what Hadoop can't: real-time analytics and iterative machine learning. When most technical professionals think of Big Data analytics today, they think of Hadoop. But there are many cutting-edge applications that Hadoop isn't well suited for, especially real-time analytics and contexts requiring the use of iterative machine learning algorithms. Fortunately, several powerful new technologies have been developed specifically for use cases such as these. Big Data Analytics Beyond Hadoop is the first guide specifically designed to help you take the next steps beyond Hadoop. Dr. Vijay Srinivas Agneeswaran introduces the breakthrough Berkeley Data Analysis Stack (BDAS) in detail, including its motivation, design, architecture, Mesos cluster management, performance, and more. He presents realistic use cases and up-to-date example code for: Spark, the next generation in-memory computing technology from UC Berkeley; Storm, the parall...

Real-time stream processing for Big Data

it - Information Technology, 2016

With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities and sensor data on their environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and tackle data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics. In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most po...

Future of Big Data beyond Batch Processing

In recent years, big data has been generated from a variety of sources, and there is an enormous demand for storing, managing, processing, and querying it. The MapReduce framework and its open source implementation, Hadoop, have proven to be the de facto solution for processing large amounts of data in parallel, and are intrinsically designed for batch processing and high-throughput jobs. Although Hadoop has proven to be the de facto solution for batch jobs, there is growing demand for non-batch applications such as real-time queries, interactive jobs, and big data streams. Since Hadoop is not suitable for these non-batch jobs, new solutions have been proposed to meet these challenges. In this paper, we discuss the strengths, features, and shortcomings of the standard MapReduce framework and its open source implementation, Hadoop. In addition, we discuss the significant extensions of MapReduce. Further, we consider two categories of these solutions: real-time processing and stream processing of big data. For each category, we include paradigms and strengths.

Stream processing platforms for analyzing big dynamic data

it - Information Technology, 2016

Nowadays, data is produced in every aspect of our lives, leading to a massive amount of information generated every second. However, this vast amount is often too large to be stored, and for many applications the information contained in these data streams is only useful when it is fresh. Batch processing platforms like Hadoop MapReduce do not fit these needs, as they require data to be collected on disk and processed repeatedly. Therefore, modern data processing engines combine the scalability of distributed architectures with the one-pass semantics of traditional stream engines. In this paper, we survey the current state of the art in scalable stream processing from a user perspective. We examine and describe their architecture, execution model, programming interface, and data analysis support, and discuss the challenges and limitations of their APIs. In this connection, we introduce Piglet, an extended Pig Latin language and code generator that compiles (extended) Pig Latin code ...
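
The contrast drawn in this abstract between collecting data for repeated batch processing and one-pass stream semantics can be illustrated with a minimal Python sketch. It is not taken from the paper or from any of the platforms it surveys; the function names and the sensor readings are hypothetical, and a running mean stands in for whatever aggregate a real engine would maintain.

```python
from typing import Iterable, Iterator, List


def batch_mean(stored: List[float]) -> float:
    """Batch style: the whole dataset must be collected before answering."""
    return sum(stored) / len(stored)


def streaming_mean(stream: Iterable[float]) -> Iterator[float]:
    """One-pass style: see each item once and keep only constant-size state."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the running mean
        yield mean  # a fresh result is available after every item


# Hypothetical sensor readings, used only for illustration.
readings = [4.0, 8.0, 6.0, 2.0]

running = list(streaming_mean(iter(readings)))
print(running)               # running mean after each arrival
print(batch_mean(readings))  # single answer once everything is stored
```

The streaming version never stores the readings, which is the property that matters when the stream is too large to keep or only useful while fresh. Its final value agrees with the batch result.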

A New Architecture for Real Time Data Stream Processing

International Journal of Advanced Computer Science and Applications, 2017

Processing a data stream in real time is crucial for several applications; however, processing a large amount of data from different sources, such as sensor networks, web traffic, social media, and video streams, represents a huge challenge. The main problem is that big data systems are based on Hadoop technology, especially MapReduce for processing. The latter is a highly scalable and fault-tolerant framework. It processes large amounts of data in batches and provides insight into older data, but it can only process a limited set of data. MapReduce is not appropriate for real-time stream processing, yet it is very important to process data the moment they arrive in order to respond quickly and support good decision making. Hence the need for a new architecture that allows real-time data processing with high speed and low latency. The major aim of this paper is to give a clear survey of the different open source technologies that exist for real-time data stream processing, including their system architectures. We also propose a new architecture, based mainly on the preceding comparison of real-time processing technologies, powered by machine learning and Storm.

Investigation on Processing of Real-Time Streaming Big Data

International Journal of Engineering & Technology

MapReduce is the most widely used framework for large-scale data processing and is part of the Hadoop big data ecosystem; its processing functions provide efficient, high-quality results. Hadoop is well suited to batch jobs, but there is growing demand for non-batch workloads such as interactive jobs and high-volume data streams. Hadoop is not useful for these non-batch workloads, and new solutions are being proposed to address these challenges. In this paper, these solutions are divided into two categories: real-time processing and stream processing of big data. For each category, the models are discussed along with their strengths and their differences from Hadoop, and the working systems and architectures are presented. To evaluate the new approaches, some experiments are conducted to improve on the results of available Hadoop-based solutions.

A Comparative Study on Streaming Frameworks for Big Data

2018

Recently, increasingly large amounts of data are generated from a variety of sources. Existing data processing technologies are not suitable to cope with the huge amounts of generated data. Yet, many research works focus on streaming in Big Data, a task referring to the processing of massive volumes of structured/unstructured streaming data. Recently proposed streaming frameworks for Big Data applications help to store, analyze and process the continuously captured data. In this paper, we discuss the challenges of Big Data and we survey existing streaming frameworks for Big Data. We also present an experimental evaluation and a comparative study of the most popular streaming platforms.

Architectural Structures and Viewpoints of Streaming Big Data

International Journal of Recent Technology and Engineering (IJRTE), 2019

Streaming big data is one of the shifting trends in technology, moving the data cycle from external behaviour (hardware), that is, low-level language, to the digital level, through virtual memory, to be implemented as datasets. The model proposed in this preliminary study aims to track data from different platforms and considers the important parameters in establishing and regulating the storage machinery, the storage format, and the pre-processing tools. Moreover, the source data is unstructured and must be organized at different levels so that it can be processed forward from the machine level to metadata. The storage type is virtual memory, and an important concern is that its capacity is limited. The tuple is the third-level context: the digital data from each sensor is recorded through a template by counter time.