Big data analysis Research Papers

With the growing use of information technology in all life domains, hacking has become more negatively effective than ever before. Also, with developing technologies, the number of attacks grows exponentially every few months and attacks become more sophisticated, so that traditional IDSs become inefficient at detecting them. This paper proposes a solution that not only detects new threats with a higher detection rate and a lower false positive rate than already-used IDSs, but can also detect collective and contextual security attacks. We achieve these results by using a networking chatbot: a deep recurrent neural network, Long Short-Term Memory (LSTM), on top of the Apache Spark framework, whose input is flow traffic and traffic aggregation and whose output is a language of two words, normal or abnormal. We propose merging the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection of flow analysis. We propose a model that describes the network abstract ...
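
As a minimal illustration of the output side of such a model (not the paper's exact architecture; the window size, feature count, and hyperparameters below are assumptions), an LSTM binary classifier over windows of flow-feature vectors could look like this:

```python
# Minimal sketch: an LSTM that maps a window of per-flow feature vectors to a
# two-word "language": normal / abnormal. All shapes here are illustrative.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

TIMESTEPS, N_FEATURES = 20, 8   # e.g., 20 consecutive flow records, 8 features each

model = Sequential([
    LSTM(64, input_shape=(TIMESTEPS, N_FEATURES)),  # summarizes the flow window
    Dense(1, activation="sigmoid"),                 # P(abnormal)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Synthetic stand-in for aggregated flow traffic (replace with real flow data).
X = np.random.rand(1000, TIMESTEPS, N_FEATURES)
y = np.random.randint(0, 2, size=(1000,))
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2)

# The "vocabulary" of the two-word output language.
labels = np.where(model.predict(X[:5]) > 0.5, "abnormal", "normal")
print(labels.ravel())
```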

Ahead of the final sprint to win the next presidential election, which will be won partly on the Web, one clear winner has already emerged at the heart of the electoral campaign: the immoderate use of American Data Intelligence platforms by all the political teams in the running.

Big Data Visualization Tools: A Survey of the State of the Art and Challenges Ahead

Direct mail marketing will continue to be successful when tools are used to optimize audience selection. Audience selection for direct mail campaigns should be smarter and more personalized given the growing presence of online marketing. Road Scholar has been offering learning adventures in all 50 states and 150 countries worldwide to adults aged 50+ since 1975. Direct mail has been and continues to be an important channel for Road Scholar due to its older audience, but as a not-for-profit we need to use our budget wisely. Historically, audience selections for our catalog campaigns relied on limited segmentation and were not targeted. In 2015, we implemented advanced predictive modeling as an additional tool in the audience selection process via JMP Pro. We developed, validated, and pre-assessed predictive response models in JMP Pro to identify key independent variables predictive of the desired response using historical data. Post-campaign analyses show that advanced predictive modeling identified high-performing customers in the top deciles. As a result, we have been able to take a more targeted approach while still increasing response. The results demonstrate that direct mail campaigns are still effective when advanced tools are used to optimize audience selection.
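
A hedged sketch of the decile approach described above; the actual models were built in JMP Pro, and the predictor names and synthetic response below are hypothetical:

```python
# Score customers with a response model, then report response rate by decile.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "recency_months": rng.integers(1, 60, 5000),
    "prior_trips":    rng.integers(0, 10, 5000),
})
# Synthetic response: more recent, more frequent customers respond more often.
p = 1 / (1 + np.exp(0.05 * df.recency_months - 0.4 * df.prior_trips))
df["responded"] = rng.random(5000) < p

features = ["recency_months", "prior_trips"]
model = LogisticRegression().fit(df[features], df["responded"])
df["score"] = model.predict_proba(df[features])[:, 1]

# Decile report: decile 1 holds the highest-scoring customers.
df["decile"] = pd.qcut(df["score"].rank(method="first"), 10,
                       labels=list(range(10, 0, -1)))
print(df.groupby("decile", observed=True)["responded"].mean())
```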

Big Data Analytics is a way of extracting value from huge volumes of information, and it drives new market opportunities and maximizes customer retention. The rapid rise of the Internet and the digital economy has fuelled an exponential growth in demand for data storage and analytics, and IT departments are facing a tremendous challenge in protecting and analyzing these increased volumes of information. The reason organizations are collecting and storing more data than ever before is that their business depends on it. The type of information being created is no longer the traditional, database-driven data referred to as structured data; rather, it is data that includes documents, images, audio, video, and social media content, known as unstructured data or Big Data. This paper primarily focuses on discussing the various technologies that work together as a Big Data Analytics system that can help predict future volumes, gain insights, take proactive actions, and give way to better strategic decision-making.

In an airport system, revenues are divided into aeronautical and non-aeronautical, where non-aeronautical revenues are related to services characterized by commercial facilities. Considering the non-aeronautical revenues, two store segments are studied: (i) retail and (ii) food and beverage. Factors such as customer satisfaction, passenger feelings, purchasing motivations, customer experience, waiting time, wayfinding and new technologies can directly

To tackle the increasing challenges of agricultural production, the complex agricultural ecosystems need to be better understood. This can happen by means of modern digital technologies that continuously monitor the physical environment, producing large quantities of data at an unprecedented pace. The analysis of this (big) data would enable farmers and companies to extract value from it, improving their productivity. Although big data analysis is leading to advances in various industries, it has not yet been widely applied in agriculture. The objective of this paper is to review current studies and research works in agriculture which employ the recent practice of big data analysis to solve various relevant problems. Thirty-four different studies are presented, examining the problem they address, the proposed solution, the tools, algorithms and data used, the nature and dimensions of the big data employed, the scale of use, as well as the overall impact. In conclusion, our review highlights the large opportunities of big data analysis in agriculture towards smarter farming, showing that the availability of hardware and software, of techniques and methods for big data analysis, and the increasing openness of big data sources shall encourage more academic research, public sector initiatives and business ventures in the agricultural sector. This practice is still at an early development stage and many barriers need to be overcome.

Data exploration and visualization systems are of great importance in the Big Data era. Exploring and visualizing very large datasets has become a major research challenge, of which scalability is a vital requirement. In this survey, we describe the major prerequisites and challenges that should be addressed by modern exploration and visualization systems. Considering these challenges, we present how state-of-the-art approaches from the Database and Information Visualization communities attempt to handle them. Finally, we survey the systems developed by the Semantic Web community in the context of the Web of Linked Data, and discuss to what extent these satisfy the contemporary requirements.

Big Data has become a research hotspot in academia and industry, and it is affecting people's daily lives, work habits and ways of thinking. However, at present, big data faces many security risks in the process of collection, storage and use. The leakage of privacy caused by big data poses serious problems for users; moreover, incorrect or false big data will lead to wrong or invalid analysis results. This paper analyzes the technical challenges of implementing big data security and privacy protection, and describes some key solutions to address the issues related to big data security and privacy. It is pointed out that, while introducing security issues of its own, big data is also an effective means of solving information security problems, and it brings new opportunities for the development of information security.

The size of data is increasing day by day with the use of social sites. Big Data is a concept for managing and mining large sets of data. Today the concept of Big Data is widely used to mine the insight data of organizations as well as outside data. There are many techniques and technologies used in Big Data mining to extract useful information from distributed systems, and they are more powerful at extracting information than traditional data mining techniques. One of the best-known technologies used in Big Data mining is Hadoop. It has many advantages over traditional data mining techniques, but it also has open issues, such as visualization and privacy.
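
As an illustration of the kind of distributed extraction Hadoop enables (a generic word-count sketch, not from the paper; the job-submission command in the comment is indicative only), a Hadoop Streaming mapper and reducer can be written as plain Python scripts:

```python
# Hadoop Streaming sketch: Hadoop runs the mapper over distributed input,
# sorts by key, and feeds grouped lines to the reducer. Run with, e.g.:
#   hadoop jar hadoop-streaming.jar -files wc.py \
#       -mapper "python3 wc.py map" -reducer "python3 wc.py reduce" \
#       -input /data -output /counts
import sys

def mapper(stream):
    for line in stream:
        for word in line.split():
            print(f"{word}\t1")

def reducer(stream):
    current, count = None, 0
    for line in stream:
        word, n = line.rsplit("\t", 1)
        if word != current and current is not None:
            print(f"{current}\t{count}")   # emit total for the finished key
            count = 0
        current = word
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```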

Data volume grows explosively with the proliferation of powerful smartphones and innovative mobile applications. The ability to accurately and extensively monitor and analyze these data is necessary. Much concern in cellular data analysis relates to human beings and their behaviours. Due to the potential value that lies behind these massive data, different approaches have been proposed for understanding the corresponding patterns. To that end, analyzing people's activities, e.g., counting them at fixed locations and tracking them by generating origin-destination matrices, is crucial. The former can be used to determine the utilization of assets like roads and city attractions. The latter is valuable when planning transport infrastructure. Such insights allow a government to predict the adoption of new roads, new public transport routes, modifications of existing infrastructure, and the detection of congestion zones, resulting in more efficient designs and improvements. Smartphone data exploration can help research in various fields, e.g., urban planning, transportation, health care, and business marketing. It can also help organizations in decision making, policy implementation, monitoring, and evaluation at all levels. This work aims to review the methods and techniques that have been implemented to discover knowledge from mobile phone data. We classify these existing methods and present a taxonomy of the related work by discussing their pros and cons.
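
A minimal sketch of the origin-destination idea mentioned above, assuming a simplified record layout (user, timestamp, zone) in place of real mobile-phone data:

```python
# Build an origin-destination (OD) matrix: each user's first and last observed
# zone of the day are taken as the trip's origin and destination.
import pandas as pd

records = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c", "c"],
    "time": pd.to_datetime(["08:00", "09:00", "08:10", "08:50", "07:30", "18:00"]),
    "zone": ["Z1", "Z3", "Z1", "Z2", "Z2", "Z3"],
})

records = records.sort_values("time")
trips = records.groupby("user")["zone"].agg(origin="first", destination="last")

od_matrix = pd.crosstab(trips["origin"], trips["destination"])
print(od_matrix)   # counts of users moving from each origin zone to each destination
```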

A larger amount of data gives a better output, but working with it can become a challenge due to processing limitations. Nowadays companies are starting to realize the importance of using more data in order to support decisions about their strategies. Case studies have shown that "more data usually beats better algorithms", and with this insight companies realized that they can choose to invest in processing larger sets of data rather than in expensive algorithms. During the last decade, big data analysis has seen exponential growth and will certainly continue to witness remarkable developments due to the emergence of new interactive multimedia applications and highly integrated systems, driven by the rapid growth in data services and microelectronic devices. Up to now, most current mobile systems have been targeted mainly at voice communications with low transmission rates.

This article explores how society can benefit from big data analytics. Specifically, it presents big data applications in different fields, e.g. in business industries, marketing, humanitarian aid, healthcare, and engineering; fields that can take advantage of technological innovations to improve the efficiency of their services.

Intensive farming has been linked to significant degradation of land, water and air. A common body of knowledge is needed to allow effective monitoring of cropping systems, fertilization and water demands, and the impacts of climate change, with a focus on sustainability and protection of the physical environment. In this paper, we describe AgriBigCAT, an online software platform that uses geo-physical information from diverse sources, employing geospatial and big data analysis together with web technologies, in order to estimate the impact of the agricultural sector on the environment, considering land, water, biodiversity and natural areas requiring protection, such as forests and wetlands. This platform can assist both farmers' decision-making processes and administrative planning and policy making, with the ultimate objective of meeting the challenge of increasing food production at a lower environmental impact.

Big data analytics uses efficient analytic techniques to discover hidden patterns, correlations, and other insights from big data. It brings significant cost advantages, enhances the performance of decision making, and creates new products to meet customers' needs. This method has various applications in plant science, bioinformatics, healthcare, etc. It can be improved with various techniques such as machine learning, intelligent tools, and network analysis. This chapter describes applications of big data analytics in biological systems. These applications can be conducted in systems biology by using cloud-based databases (e.g., NoSQL). The chapter explains the improvement of big data technology in the plant community with machine learning. Furthermore, it presents various tools for applying big data analytics in bioinformatics systems. Medical signals and genomics are two major fields in healthcare environments that would be improved by this type of analytical method. Finally, the chapter discusses several use cases of healthcare information systems.

Prediction of novel or potential lead molecules for a therapeutic drug target without adverse effects is a challenging task in the drug design, discovery, and development process. The systematic integration of multi-omics data from various data and knowledge bases through computational techniques makes it possible to identify potential lead molecules and study their therapeutic properties. Over the last decades, several drug discoveries using multi-omics and large-scale dataset integration methods have been demonstrated with successful results. In this paper, we present different types of computational approaches for the prediction of potential lead molecules through the systems-level integration of multi-omics datasets.

The 13th International Conference on Database Management Systems (DMS 2022) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Database Management Systems. The goal of this conference is to bring together researchers and practitioners from academia and industry to focus on understanding modern developments in this field and establishing new collaborations in these areas. Authors are solicited to contribute to this conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of database management systems.

The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian Networks. Most of the currently available feature-selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect, SES subsumes and extends previous feature selection algorithms, like the max-min parents-and-children algorithm. SES is implemented in a homonymous function included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data-analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.
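
Since SES itself ships in the R package MXM, the sketch below illustrates only the LASSO baseline it is contrasted with, using scikit-learn; the correlated-feature setup shows why returning a single subset can be arbitrary when statistically equivalent subsets exist:

```python
# LASSO returns one sparse feature subset even when several are interchangeable.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
# Feature 1 is a near-copy of feature 0: either one forms an equally
# predictive signature together with feature 2.
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=200)
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("LASSO picks a single subset:", selected)  # typically only one of {0, 1}
```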

Data exploration and visualization systems are of great importance in the Big Data era, in which the volume and heterogeneity of available information make it difficult for humans to manually explore and analyse data. Most traditional systems operate in an offline way, limited to accessing preprocessed (static) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily handled with conventional techniques. However, the Big Data era has brought the availability of a great amount and variety of big datasets that are dynamic in nature; most of them offer API or query endpoints for online access, or the data is received in a stream fashion. Therefore, modern systems must address the challenge of on-the-fly scalable visualizations over large dynamic sets of data, offering efficient exploration techniques, as well as mechanisms for information abstraction and summarization. Further, they must take into account different user-defined exploration scenarios and user preferences. In this work, we present a generic model for personalized multilevel exploration and analysis over large dynamic sets of numeric and temporal data. Our model is built on top of a lightweight tree-based structure which can be efficiently constructed on-the-fly for a given set of data. This tree structure aggregates input objects into a hierarchical multiscale model. We define two versions of this structure that adopt different data organization approaches, well-suited to the exploration and analysis context. In the proposed structure, statistical computations can be efficiently performed on-the-fly. Considering different exploration scenarios over large datasets, the proposed model enables efficient multilevel exploration, offering incremental construction and prefetching via user interaction, and dynamic adaptation of the hierarchies based on user preferences. A thorough theoretical analysis is presented, illustrating the efficiency of the proposed model. The proposed model is realized in a web-based prototype tool, called SynopsViz, that offers multilevel visual exploration and analysis over Linked Data datasets. Finally, we provide a performance evaluation and an empirical user study employing real datasets.
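
A toy sketch of the multiscale-aggregation idea (not the SynopsViz implementation): precompute aggregates at several resolutions so that each exploration level reads counts rather than raw data; the level depths below are arbitrary:

```python
# Multilevel aggregation over numeric data: coarse bins on top, finer below.
import numpy as np

def build_levels(values, depths=(4, 16, 64)):
    """Return per-level histograms over a shared value range."""
    lo, hi = values.min(), values.max()
    levels = []
    for n_bins in depths:
        counts, edges = np.histogram(values, bins=n_bins, range=(lo, hi))
        levels.append({"edges": edges, "counts": counts})
    return levels

data = np.random.randn(100_000)
hierarchy = build_levels(data)

# Drill-down: a zoom moves from the 4-bin overview to the 16- and 64-bin levels,
# each answered from precomputed counts.
for level in hierarchy:
    print(len(level["counts"]), "bins, total", level["counts"].sum())
```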

In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has consequently exponentially increased the amounts of heterogeneous data generated. Processing and analysing the massive amounts of data produced is cumbersome and is gradually moving from the classical 'batch' processing (extract, transform, load, ETL) technique to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicates the real-time analysis of essential data and integration with big data platforms, and entails reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of the data types, into Apache Kafka topics using Kafka Connect APIs for processing by the Kafka stream processing engine. The stream processing engine executes the predictive numerical models and algorithms represented in event processing (EP) languages for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. To conclude the study, a performance evaluation of the distributed stream processing middleware infrastructure is carried out to determine the real-time effectiveness of the framework.
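
A hedged sketch of the consumption path described above, using the kafka-python client rather than the Kafka Streams engine; the topic name, record schema, the EDI stand-in, and the threshold are all assumptions:

```python
# Consume weather records from a Kafka topic and flag drought when a
# placeholder drought index drops below a threshold (requires a running broker).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "weather-observations",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

def drought_index(precip_history):
    # Placeholder for the real EDI computation: compares the latest
    # precipitation against the window's mean.
    mean = sum(precip_history) / len(precip_history)
    return (precip_history[-1] - mean) / (mean + 1e-9)

window = []
for msg in consumer:
    window.append(msg.value["precip_mm"])
    window = window[-365:]                      # keep roughly a year of dailies
    if len(window) > 30 and drought_index(window) < -0.7:
        print("drought signal at", msg.value.get("station"))
```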

Big Data is the most upcoming field of research and can be used to interpret many trends in data. In my research I have analysed the effect of some attributes on the grades of students in the subject Maths at a Portuguese senior secondary school, with the help of regression analysis in Excel. The analysis draws out various significant relationships in the data and various patterns that could be inculcated in the curriculum of schools. The regression analysis consists of t-statistics, p-statistics and an ANOVA test, determined at the 95% confidence level.

I. INTRODUCTION. Big Data is a pool of data which becomes useful only when patterns are drawn and knowledge and information are extracted. A student's lifestyle has different aspects which affect their grades. This is an area of utmost interest: it can be used to study trends in the student population and to decide on changes to the course curriculum that would make students more efficient and interactive. Big data is a vast topic of research; I have proposed a way to analyse the big data of students in secondary education at two Portuguese schools. The data collected includes various grades, classified as G1 and G2 (half-yearly grades) and G3 (final grade), together with the students' personal devotion of time to various day-to-day activities. G1 and G2 are strongly correlated with G3, as the final grade is the cumulative outcome of the first two. Other aspects of student life are graded and analysed, which supports multiple regression analysis; some aspects show a strong relation with the grades, as explained in the paper. The regression technique used here helps in verifying the relations of different attributes with each other: it consists of a dependent variable and several independent variables whose correlations are tested for the data analysis. It is a basic technique of data mining, implemented by various tools like Excel, R, and Python. The data sets can be further studied with the help of graphs, specifically scatter graphs.
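
A sketch of the multiple regression described above (the original analysis was run in Excel); the column names follow the UCI student-performance dataset the text refers to, and the sample rows are synthetic:

```python
# Fit G3 on G1, G2, and study time; the summary reports the t-statistics,
# p-values, and ANOVA-style F-test the text mentions.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "G1":        [10, 8, 14, 15, 6, 12, 17, 9],
    "G2":        [11, 7, 14, 16, 5, 12, 18, 10],
    "studytime": [2, 1, 3, 4, 1, 2, 4, 2],
    "G3":        [11, 6, 15, 16, 4, 13, 18, 10],
})

X = sm.add_constant(df[["G1", "G2", "studytime"]])
model = sm.OLS(df["G3"], X).fit()
print(model.summary())
```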

This paper presents a data acquisition process for solar energy generation and then analyzes the dynamics of its data stream, mainly employing open software solutions such as Python, MySQL, and R. For the sequence of hourly power generations during the period from January 2016 to March 2017, a variety of queries are issued to obtain the number of valid reports as well as the average, maximum, and total amount of electricity generation in 7 solar panels. The query results on an all-time, monthly, and daily basis show that the panel-by-panel difference is not significant in a university-scale microgrid, the maximum gap being 7.1% even in the exceptional case. In addition, for the time series of daily energy generations, we develop a neural network-based trace and prediction model. Due to the time-lagging effect in forecasting, the average prediction error for the next hours or days reaches 27.6%. The data stream is still being accumulated and the accuracy will be enhanced by more intensive machine learning.

1. INTRODUCTION. Increasing environmental contamination is one of the most urgent problems we are facing these days; in particular, the industrialization of China makes air quality worse and worse. A great deal of air pollution comes from burning fossil fuels to obtain electricity [1]. Renewable energy, such as wind and sunlight, is the most promising solution to this problem, as it can generate energy without greenhouse gas emissions. However, its intermittent nature prevents it from being seamlessly integrated into the current energy grid or entirely replacing legacy energy generation mechanisms. Indispensably, renewable energy integration needs electricity reserve units to cope with the time disparity between energy generation and consumption. Their efficient management is the key not only to blending more renewable energy into our power systems but also to reducing the cost of excessive reserves [2]. A grid can make an energy generation plan according to the forecast of how much renewable energy will be available on the next day or in the next few hours, in addition to the traditional demand forecast [3], [4]. Generally, the prediction of energy availability can be done based on historical statistics or on relevant spatial and temporal parameters [5]. In the example of solar energy, irradiance is the most important quantity. As for historical data analysis, most modern renewable energy generators are able to capture their operation status to report to a central manager or store for further analysis [6]. Those datasets allow us to conduct diverse analyses to better understand the operation of facilities and to build a prediction model. Particularly, solar energy generation is deeply dependent on climate conditions; hence, we can enhance the accuracy of prediction models by integrating diverse data streams. Here, the prediction model will be
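
A sketch of the per-panel queries described above; the table layout and column names are assumptions, with a few synthetic rows standing in for the MySQL store:

```python
# Per-panel report count, average, maximum, and total generation.
import pandas as pd

gen = pd.DataFrame({
    "panel": ["P1", "P1", "P2", "P2", "P3"],
    "hour":  pd.to_datetime(["2016-01-01 09:00", "2016-01-01 10:00",
                             "2016-01-01 09:00", "2016-01-01 10:00",
                             "2016-01-01 09:00"]),
    "kwh":   [1.2, 1.8, 1.1, 1.9, 1.4],
})

report = gen.groupby("panel")["kwh"].agg(reports="count", avg="mean",
                                         max="max", total="sum")
print(report)

# Illustrative SQL equivalent against the MySQL store:
#   SELECT panel, COUNT(*), AVG(kwh), MAX(kwh), SUM(kwh)
#   FROM generation GROUP BY panel;
```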

In the contemporary world it is essential to manage the exponential accumulation of data. Apart from the actual data, data formats are also growing. Explosive data growth by itself does not accurately describe how data is changing: the format and structure of data are changing. Rather than being neatly formatted, cleaned, and normalized data in a corporate database, the data is coming in as raw, unstructured text. The exploration in this paper could serve as a benchmarking and management tool for overhauling Big Data management practices and processing. The experimental results of our investigation suggest that maintainability is improved.

In recent years, CGM (Consumer Generated Media) sites, such as YouTube and nicovideo.jp for videos and syosetu.com for novels, have become very popular. A lot of content is posted to CGM sites every day, and a large number of users enjoy the posted content. Recently, some articles have pointed to a decreasing diversity of content: newly posted content may be similar to previously posted content. The authors are concerned that decreasing diversity of content leads to less energetic cultural activity. In this paper, the authors propose two quantitative metrics of content diversity and apply them to the content on syosetu.com. They focus on the keywords given to each novel by its author, and calculate the entropy and similarity of those keywords. As a result, they observe an increase in similarity, which indicates a decrease in the diversity of content.
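
A small sketch of the two diversity metrics described: Shannon entropy of the keyword distribution and pairwise Jaccard similarity of keyword sets (the paper's exact formulations may differ, and the keywords below are made up):

```python
# Lower entropy and higher mean pairwise similarity both signal less diversity.
import math
from collections import Counter
from itertools import combinations

works = [{"isekai", "fantasy"}, {"isekai", "romance"}, {"isekai", "fantasy"}]

# Shannon entropy of keyword frequencies across all works.
freq = Counter(k for w in works for k in w)
total = sum(freq.values())
entropy = -sum((n / total) * math.log2(n / total) for n in freq.values())

# Mean Jaccard similarity between pairs of works.
sims = [len(a & b) / len(a | b) for a, b in combinations(works, 2)]
print(f"entropy={entropy:.3f}, mean similarity={sum(sims)/len(sims):.3f}")
```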

ALGORITHMS ARE TRADE SECRETS 01000100010000010101010001000001
Companies are trying to centralize control of their collected big data. Governments are centralizing control of big data. With Digital Object Architecture, which is almost mandatory, we are under surveillance all the time, every time and right now.
“There is no privacy defense strategy here: we are getting access for free and they are getting our data for free. Are we going to pay attention to big-data collection?”
Keywords: big-data analytics, big data technologies, cyber security, data mining and processing, big data visualization, big data security, machine learning big data, applications of big data, cyber control of internet, centralized control, digital object architecture, cyber surveillance, mandatory addressing system.

Against the backdrop of the General Data Protection Regulation (GDPR) taking effect in the European Union (EU), a debate emerged about the role of citizens and their relationship with data. European city authorities claim that (smart) citizens are as important to a successful smart city program as data and technology are, and that those citizens must be convinced of the benefits and security of such initiatives. This paper examines how the city of Barcelona is marking a transition from the conventional, hegemonic smart city approach to a new paradigm—the experimental city. Through (i) a literature review, (ii) carrying out twenty in-depth interviews with key stakeholders, and (iii) actively participating in three symposiums in Barcelona from September 2017 to March 2018, this paper elucidates how (smart) citizens are increasingly considered decision-makers rather than data providers. This paper considers (i) the implications of the technopolitics of data ownership and, as a result, (ii) the ongoing implementation of the Digital Plan 2017–2020, its three experimental strategies, and the related seven strategic initiatives. This paper concludes that, from the policy perspective, smartness may not be appealing in Barcelona, although the experimental approach has yet to be entirely established as a paradigm. To obtain the full article in open access: http://www.mdpi.com/2071-1050/10/9/3252 To cite this article: Calzada, I. (2018), (Smart) Citizens from Data Providers to Decision-Makers? The Case Study of Barcelona. Sustainability 10(9): 3252. DOI: 10.3390/su10093252. Special Issue: Big Data Research for Social Sciences and Social Impact.

The well-known three Vs of Big Data (Volume, Variety, and Velocity) are increasingly placing pressure on organizations that need to manage this data as well as extract value from this data deluge for Predictive Analytics and Decision-Making. Big Data technologies, services, and tools such as Hadoop, MapReduce, Hive and NoSQL/NewSQL databases and Data Integration techniques, In-Memory approaches, and Cloud technologies have emerged to help meet the challenges posed by the flood of Web, Social Media, Internet of Things (IoT) and machine-to-machine (M2M) data flowing into organizations.

We present a novel platform for the interactive visualization of very large graphs. The platform enables the user to interact with the visualized graph in a way that is very similar to the exploration of maps at multiple levels. Our approach involves an offline preprocessing phase that builds the layout of the graph by assigning coordinates to its nodes with respect to a Euclidean plane. The respective points are indexed with a spatial data structure, i.e., an R-tree, and stored in a database. Multiple abstraction layers of the graph based on various criteria are also created offline, and they are indexed similarly so that the user can explore the dataset at different levels of granularity, depending on her particular needs. Then, our system translates user operations into simple and very efficient spatial operations (i.e., window queries) in the backend. This technique allows for a fine-grained access to very large graphs with extremely low latency and memory requirements and without compromising the functionality of the tool. Our web-based prototype supports three main operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword search on the graph metadata.
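
A hedged sketch of the backend idea: layout coordinates indexed in an R-tree so that a pan or zoom becomes a window query. The Python rtree package stands in for the paper's spatial index, and the coordinates are made up:

```python
# Index graph-node positions produced by an offline layout step, then answer a
# viewport (window) query with an R-tree intersection.
from rtree import index

layout = {1: (0.5, 0.5), 2: (3.0, 1.0), 3: (7.5, 8.0)}   # node_id -> (x, y)

idx = index.Index()
for node_id, (x, y) in layout.items():
    idx.insert(node_id, (x, y, x, y))        # points stored as degenerate boxes

# The user's viewport after a pan/zoom: fetch only the visible nodes.
viewport = (0.0, 0.0, 4.0, 4.0)              # (minx, miny, maxx, maxy)
visible = sorted(idx.intersection(viewport))
print("nodes to render:", visible)           # node 3 falls outside the window
```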

Purpose: To identify the potential of Big Data Analysis (BDA) as a source of competitive advantage in improving the management effectiveness and marketing competences of tourism enterprises and regions.
Research method: In the first stage, a literature review and bibliometric analysis were carried out to establish the significance of BDA for the tourism market and the scale of interest in this topic among tourism market researchers. In the second stage, the foresight method was applied, i.e., a set of tools enabling the construction of a development scenario for (in this case) management methods. Among the qualitative methods, an expert panel, structural analysis and constraint analysis were used, and in the area of identifying key actions, a tree of conditions and a study of breakthrough technologies were applied.
Findings: The possibilities and conditions for applying BDA were identified in the areas of creating new tourist experiences (experience tourism), strengthening relationships with customers, increasing their participation in promoting tourist attractions and in shaping the tourism offer (co-creation, involvement), individualizing the offer (tailor-made offers), improving the effectiveness of enterprises, and increasing the attractiveness and effectiveness of the promotional activities of regions, among others by providing methods for forecasting and qualitative analysis of tourism demand.
Research and inference limitations: In foresight research the most important result is the awareness of prospects and preparation for change; the accuracy of the forecast is a secondary issue, which is sometimes treated as a factor limiting the usefulness of this method. For the presented topic, a natural extension would be to identify the assumptions of tourism development that would make it possible to overcome the limitations of applying BDA in the realities of the Polish tourism market.
Practical implications: In the coming years, new technology solutions based on BDA will form the basis for the functioning of the entire tourism industry (forecasting demand and delivering tourism products of unique quality). The results provide guidance for tourism market entities and for supporting industries (e.g., new technologies) or industries linked by the same customer (e.g., cultural institutions).
Originality: The presented results constitute the first study of this type on the Polish market, and the participation of a diverse group of experts determines its novel form, which has only one counterpart in the available foreign research.
Type of paper: Analytical, problem-oriented paper.

We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms which exploit the proposed principle. Experiments on three large real-world data sets demonstrate that the proposed methods vastly outperform the existing alternatives.
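
For illustration of the problem setting only (not the authors' novel algorithms), the sketch below estimates a running quantile under a fixed memory budget via uniform reservoir sampling; as the paper notes for such approaches, this adapts poorly to non-stationary streams:

```python
# Running-quantile estimation with bounded memory (Algorithm R reservoir).
import random

class ReservoirQuantile:
    def __init__(self, capacity, q):
        self.capacity, self.q = capacity, q
        self.buf, self.seen = [], 0

    def update(self, x):
        self.seen += 1
        if len(self.buf) < self.capacity:
            self.buf.append(x)
        else:
            j = random.randrange(self.seen)   # keep x with prob capacity/seen
            if j < self.capacity:
                self.buf[j] = x

    def estimate(self):
        s = sorted(self.buf)
        return s[int(self.q * (len(s) - 1))]

est = ReservoirQuantile(capacity=100, q=0.95)
for _ in range(100_000):
    est.update(random.gauss(0, 1))
print("~95th percentile:", round(est.estimate(), 3))
```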

In on-demand multimedia streaming systems, streaming techniques are usually combined with proxy caching to obtain better performance. The patch streaming technique has no start-up latency inherent to it, but requires extra bandwidth to deliver the media data in patch streams. This paper proposes a proxy caching technique which aims at reducing the bandwidth cost of the patch streaming technique. The proposed approach determines media prefixes with high patching cost and caches the appropriate media prefix at the proxy/local server. Herein the scheme is evaluated using a synthetically generated media access workload and its performance is compared with that of the popularity and prefix-aware interval caching scheme (the prefix part) and with that of patch streaming with no caching. The bandwidth saving, hit ratio and concurrent number of clients are used to compare the performance, and the proposed scheme is found to perform better for different caching capacities of the proxy server.

The world is a ground for disasters almost daily. These incidents of mass destruction, whether natural calamities or man-made catastrophes, cause huge losses of money, property and lives due to a lack of planning on the part of governments and management agencies. Therefore, steps are required towards the prevention of these situations by predetermining the causes of disasters and providing quick rescue measures once a disaster occurs. Wireless ad hoc sensor networks are playing a vital role in wireless data transmission infrastructure and can be very helpful in these situations. Wireless sensor networks utilize technologies which can raise an alert for an immediate rescue operation to begin whenever disaster strikes. Through this paper our aim is to review technological solutions for managing disasters using wireless sensor networks (WSN) via disaster detection and alerting systems, and search and rescue operations. We ha...