Big Data and AI Pipeline Framework: Technology Analysis from a Benchmarking Perspective


Increasingly large amounts of data are being generated from a variety of sources, and existing data processing technologies are poorly suited to cope with such volumes. As a result, many research works focus on Big Data, a buzzword referring to the processing of massive volumes of (unstructured) data. Recently proposed frameworks for Big Data applications help to store, analyze and process the data. In this paper, we discuss the challenges of Big Data and survey existing Big Data frameworks. We also present an experimental evaluation and a comparative study of the most popular Big Data frameworks. The survey concludes with a presentation of best practices for using the studied frameworks in several application domains such as machine learning, graph processing and real-world applications.

The rapid development of technology over the past 20 years has led to explosive data growth in various industries, including defense and healthcare. The analysis of the resulting Big Data has recently been addressed by many researchers, because Big Data analysis is today one of the most important and most profitable areas of development in Data Science, and companies that are able to extract valuable knowledge from massive amounts of data in a reasonable time can gain significant advantages. Accordingly, in this survey, we investigate the definition of Big Data and its data sources. We also look at the advantages, challenges, applications, analysis methods and platforms used in Big Data.

A plethora of Big Data Analytics technologies and platforms have been proposed in recent years. However, in 2017, only 53% of companies were adopting such tools. It seems that either the industry is not convinced by the promises of Big Data, or choosing the right technology/platform requires in-depth knowledge of the capabilities of all these tools. Before choosing a technology or platform, organizations have to investigate the needs of their applications/algorithms and the advantages and drawbacks of each technology/platform. In this paper, we aim to help organizations select the technologies/platforms most appropriate to their analytic processes by offering a short review organized by categories of Big Data problems: processing (streaming and batch), storage, data integration, analytics, data governance, and monitoring.

https://www.ijert.org/big-data-an-overview https://www.ijert.org/research/big-data-an-overview-IJERTV3IS070118.pdf Large amounts of data are generated every day by a variety of sources, from cell phone users [1] [5] to military services [2] [13]. This data is so enormous in volume that regular data extraction or querying techniques are useless. Such gigantic data is termed Big Data. The term is very vague in itself: no fixed explanation or measurement is suitable, because the limits of Big Data are very dynamic, and the upper limits of data today might be the lower ones 5-10 years from now. It is very important to understand this term clearly and fully, as it will be of extreme significance in the years to come. This paper aims to explain carefully what Big Data is and why we should know about it. We also provide an overview of the architectures employed and the challenges currently being faced.