Predictive Analytics Research Papers - Academia.edu

The Predictive Airliner is an airline that utilizes the latest technology to deliver an exceptional personalized experience to each and every passenger it flies. Today, technologies such as AI, machine learning, augmented reality, IoT, real-time stream processing, social media, streaming analytics and wearables are altering the Customer Experience (CX) landscape, and airlines need to jump aboard this fast-moving technology or run the risk of being left out in the cold. The Predictive Airliner reveals how these and other technologies can help shape the customer journey. The book details how the five types of analytics (descriptive, diagnostic, predictive, prescriptive, and edge analytics) affect not only the customer journey, but also just about every operational function within an airline. An IoT-connected airline can make its operations smart. Data collected at multiple company and customer touch points can be utilized to increase customer satisfaction, as well as make the airline more profitable. The book lays out a blueprint for airlines to build a better overall operation. By utilizing AI, machine learning, and deep learning, airlines can monitor the health of their airplanes, ensure employee satisfaction, and deliver an award-winning customer experience every time. Analytical processes like decision trees, k-means clustering, logistic regression and neural networks are explained in detail, with specific use cases detailing how they are used profitably in the aviation industry. Edge analytics, sentiment analysis, clickstream analysis, and location analysis are seen through a customer intelligence lens to ensure passengers are treated in a personalized way that will not only increase loyalty but turn passengers into apostles for the airlines they choose to fly. Connected devices can help with inventory optimization, supply chain management, labor management, and waste management, as well as keep the airline's data centers green and its energy use smart. Social media is no longer a vanity platform; rather, it is a place to connect with current customers as well as court new ones. It is also a powerful branding channel that can be utilized both to understand an airline's position in the market and to benchmark that position against competitors. The Predictive Airliner reveals how airlines can utilize this channel in a multitude of ways to connect with customers, as well as to help in moments of crisis. Today, technology moves at break-neck speed and offers the potential of anticipatory capabilities, but it also comes with a confusing variety of technological terms: Big Data, Cognitive Computing, CX, Data Lakes, Hadoop, Kafka, Personalization, Spark, and so on. The Predictive Airliner helps airline executives make sense of it all, so that they can cut through the confusing clutter of technological jargon and understand why a Spark-based real-time stream processing pipeline might be preferable to a TIBCO StreamBase one, or to none at all. The final chapter explains how an airline can utilize the concept of the customer journey as a roadmap to increase customer satisfaction. This book will help airline executives break through the technological clutter so that they can deliver an unrivaled customer experience to each and every passenger who steps aboard their planes.

Forecasting is an important activity in economics, finance, marketing and various other domains like environmental and social sciences. There are several methods for making forecasts, but they all fall into two categories: causal methods and time series methods. In many cases, predictive algorithms implementing time series are good candidates for forecasting. In this paper we run a comparative study of three of these algorithms: Linear Regression, Support Vector Machines and Multilayer Perceptron, in order to determine their performance in terms of implementing time series for predictive systems. To assess the performance of these algorithms, we conducted experiments over four representative datasets. The results show that linear regression produced the best forecasts, while the other two algorithms also performed well.
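
A minimal sketch of this kind of comparison, assuming a lag-window setup on a synthetic series; the paper's four datasets and exact configurations are not given here, so the scikit-learn model settings and the MAE metric are illustrative assumptions:

```python
# Compare three regressors as one-step-ahead time series forecasters.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
series = np.sin(np.arange(500) * 0.1) + rng.normal(0, 0.1, 500)

LAGS = 8  # each sample: the previous 8 observations -> next value
X = np.array([series[i:i + LAGS] for i in range(len(series) - LAGS)])
y = series[LAGS:]
split = int(0.8 * len(X))  # chronological split, no shuffling
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

models = {
    "Linear Regression": LinearRegression(),
    "SVM (RBF)": SVR(C=1.0, epsilon=0.01),
    "Multilayer Perceptron": MLPRegressor(hidden_layer_sizes=(32,),
                                          max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: MAE = {mean_absolute_error(y_te, model.predict(X_te)):.4f}")
```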

We conduct a lottery experiment to assess the predictive importance of simple choice process metrics (SCPMs) in forecasting risky 50/50 gambling decisions using different types of machine learning algorithms as well as traditional choice modeling approaches. The SCPMs are recorded during a fixed pre-decision phase and are derived from tracking subjects’ eye movements, pupil sizes, skin conductance, and cardiovascular and respiratory signals. Our study demonstrates that SCPMs provide relevant information for predicting gambling decisions, but we do not find forecasting accuracy to be substantially affected by adding SCPMs to standard choice data. Instead, our results show that forecasting accuracy highly depends on differences in subject-specific risk preferences and is largely driven by including information on lottery design variables. As a key result, we find evidence for dynamic changes in the predictive importance of psychophysiological responses that appear to be linked to habituation and resource-depletion effects. Subjects’ willingness to gamble and choice-revealing arousal signals both decrease as the experiment progresses. Moreover, our findings highlight the importance of accounting for previous lottery payoff characteristics when investigating the role of emotions and cognitive bias in repeated decision-making scenarios.

Machine learning enables computers to learn from large amounts of data without specific programming. Beyond its commercial applications, companies are starting to recognize the importance and possibilities of machine learning for transforming their data assets into business value. This study explores the integration of machine learning into core business processes, enabling predictive analytics that can increase business value and provide competitive advantage. It proposes a machine learning algorithm based on regression analysis for a business solution in a large enterprise in Macedonia, predicting a real-valued outcome from a given array of business inputs. The results show that most of the machine learning predictions for the desired process output deviated by 0 to 15% from the actual employees' decisions. This verifies the appropriateness of the chosen approach, with predictive accuracy that can be meaningful in practice. As a machine learning case study in a business context, it contains valuable information that can help companies understand the significance of machine learning for enterprise computing. It also points out some potential pitfalls of machine learning misuse.

Keywords: MIMO antenna; UWB antenna; Model predictive control; Non-uniform microstrip line; Printed monopole antenna

Abstract: A novel ultra-wideband (UWB) printed monopole multiple-input multiple-output (MIMO) antenna with a non-uniform transmission line designed using nonlinear model predictive control (NMPC) is presented. The proposed antenna is superior to conventional antennas in terms of dimensions, gain, and efficiency while maintaining the impedance bandwidth. To improve the results, a non-uniform transmission line has been used for impedance matching between the radiating patch element and the coaxial cable. For the design of the non-uniform transmission line, its impedance profile has been expanded using cosine terms. Given the differential equation for the variation of the impedance along the transmission line and its transformation into state-space form, NMPC has been employed to design the line and determine the cosine expansion coefficients. Two base antennas were simulated and fabricated in a MIMO configuration. The surface area of the proposed MIMO antenna is 0.99 λg², where the guided wavelength λg is obtained at the center frequency of the 3.16 GHz to 10.6 GHz range, and its mutual coupling, peak gain, channel capacity loss (CCL), total active reflection coefficient (TARC), mean effective gain (MEG), diversity gain (DG), and envelope correlation coefficient (ECC) are acceptable. The simulation and measurement results are in good agreement, and the proposed antenna is suitable for MIMO applications. © 2020 The Authors. Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
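
The abstract does not give the expansion itself; as a hedged illustration, a common way to parameterize such a non-uniform line of length L is a truncated cosine series over the log-impedance profile, whose coefficients a_n the NMPC design would then select:

```latex
% Hypothetical parameterization of the non-uniform line impedance Z(z);
% the exact form used in the paper is not stated in the abstract.
\[
  \ln Z(z) \;=\; \ln Z_0 \;+\; \sum_{n=1}^{N} a_n \cos\!\left(\frac{n\pi z}{L}\right),
  \qquad 0 \le z \le L
\]
```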

In 2013, I proposed using drones to extinguish future forest fires. Admittedly, both the idea and the technology were crude, and proposing an idea like this is not the same as implementing a solution. The first few minutes, between the time a fire starts and the point at which it burns out of control, form a containment window during which only a few gallons of water or a few pounds of fire retardant are needed to put the evil genie back in its bottle. Using a fleet of surveillance drones equipped with special infrared cameras, fires can be spotted during the earliest moments of the containment window, signaling a fleet of extinguisher drones to douse the blaze before anything serious happens. Drones specifically designed for extinguishing forest fires have the potential to eliminate virtually 100% of the devastating fires that dominate newspaper headlines every summer. But is that what we want?

Iverson, Lee, and Wagenmakers (2009) claimed that Killeen's statistic p_rep overestimated the “true probability of replication.” We show that Iverson et al. confused the probability of replication of an observed direction of effect with a probability of coincidence: the probability that two future experiments will return the same sign. We emphasize throughout that p_rep is intended to evaluate the probability of a replication outcome after observations, not to estimate a parameter. Hence, the usual criteria for judging estimators (unbiasedness, minimum variance) are not appropriate for probabilities such as p and p_rep.
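
For reference, one common formulation of Killeen's statistic: under a flat prior on the effect, a replicate estimate carries twice the sampling variance of the observed standardized effect z_obs, giving

```latex
\[
  p_{\mathrm{rep}} \;=\; \Phi\!\left(\frac{z_{\mathrm{obs}}}{\sqrt{2}}\right),
\]
% where \Phi is the standard normal CDF; p_rep is then the posterior
% predictive probability that a replication returns the same sign of effect.
```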

This is a case study that Qualex Asia conducted using sports betting website data. To highlight the application of QlikView and SAP's InfiniteInsight with a real source of data, Qualex Asia analyzed transactional information from a boutique global sports book predominantly accepting wagers via the internet. The majority of its business is on soccer and American team sports. The raw data set contains approximately four (4) years of single wagers (i.e. no multi/accumulator bets), comprising some 2.2M unique transactions across almost 300K unique events and 37.5K unique customers. The total turnover is $4.2B (AUD).
The analysis achieves a significant reduction in data, allowing the sports book to use a much more succinct representation of its clientele to make strategic decisions across different areas of the business. The results of this analysis would be beneficial to the odds makers, risk managers, the marketing department, and senior management.

ABSTRACT Reducing the amount of lost and damaged perishable goods (food and medicines) during transportation and storage represents a substantial global challenge, which implies the implementation of cold chain monitoring at all levels of the supply chain. The main objective of this work is to develop an IoT-enabled infrastructure that supports the development of smart-transportation applications. The proposed infrastructure contains an emulator, which can be used to emulate the conditions inside a truck during the transportation process, and some CEP (complex event processing) based tools for processing the data generated by the sensors.
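
A minimal sketch of the kind of rule such CEP-based tools might evaluate over the emulated truck telemetry; the threshold, window size, and stream format are illustrative assumptions, not the paper's implementation:

```python
# Fire a cold-chain event when the temperature limit is exceeded for
# several consecutive readings in the sensor stream.
from collections import deque

LIMIT_C = 8.0   # assumed maximum safe temperature
WINDOW = 3      # consecutive breaches required to fire an event

def detect_breaches(readings):
    """Yield (index, temps) events for sustained temperature excursions."""
    recent = deque(maxlen=WINDOW)
    for i, temp in enumerate(readings):
        recent.append(temp)
        if len(recent) == WINDOW and all(t > LIMIT_C for t in recent):
            yield i, list(recent)
            recent.clear()  # avoid re-firing on the same excursion

stream = [5.1, 5.3, 8.4, 8.9, 9.2, 6.0, 5.8]  # simulated sensor feed
for idx, temps in detect_breaches(stream):
    print(f"cold-chain breach ending at reading {idx}: {temps}")
```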

It is estimated that approximately $700 billion is lost to fraud, waste, and abuse in the US healthcare system. Medicaid has been a particularly susceptible target for fraud in recent years, with a distributed management model, limited cross-program communications, and a difficult-to-track patient population of low-income adults, their children, and people with certain disabilities. For effective fraud detection, one has to look at the data beyond the transaction level. This paper builds upon Sparrow's fraud type classifications and the Medicaid environment to develop a Medicaid multidimensional schema and provide a set of multidimensional data models and analysis techniques that help predict the likelihood of fraudulent activities. These data views address the most prevalent known fraud types and should prove useful in discovering the unknown unknowns. The model is evaluated by functional testing against known fraud cases.

This paper discusses a couple of basic methodological problems inherent in predictive modelling as used today in mapping the location of Stone Age settlements based solely on landscape topography/bathymetry. It argues that the modelling approach employed is based on elements adopted from a type of landscape ecology that was abandoned more than 20 years ago, because it was unable to produce reasonable results, and that it can be difficult to develop prediction methodology based on the present understanding of landscape ecology as being extremely complex and dynamic. Furthermore, it maintains that the modelling approach currently employed in Stone Age archaeology is based on assumptions about prehistoric resource-strategic behaviour that are simplistic and out of tune with what we now know. It therefore questions whether it is possible to develop a precise and efficient predictive procedure for modelling the locations of Stone Age sites.

Objectives: Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. Materials and methods: The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, resulting in models that are easier to interpret through fewer high-level diagnoses, with comparable prediction accuracy. Results: The results revealed that the Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve, AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided models that are more interpretable in terms of high-level diagnoses. Additionally, the interpretations of the models are in accordance with existing medical understanding of pediatric readmission. The best performing models have similar performance, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso model is 0.35 bits higher compared to the Tree-Lasso model.
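
A brief sketch of the plain Lasso (L1-regularized) logistic baseline the paper compares against; the Tree-Lasso variant, which ties coefficients along the ICD-9-CM hierarchy, has no off-the-shelf scikit-learn equivalent, and the sparse binary "diagnosis code" data here is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = (rng.random((5000, 200)) < 0.05).astype(float)  # sparse binary "codes"
w = np.zeros(200)
w[:10] = 1.5                                        # a few informative codes
y = (rng.random(5000) < 1 / (1 + np.exp(-(X @ w - 1.2)))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, lasso.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.3f}, nonzero coefficients = {(lasso.coef_ != 0).sum()}")
```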

Across industries, there is a growing need for increased operational reliability, availability, and maintainability of equipment, which comprises diagnosis and prognosis of particular problems. As systems evolve daily to new levels of sophistication, maintenance of those systems needs a critical approach. Hence, more industries are trying to adopt predictive maintenance policies in their critical areas of operation. With the advantages of predictive analytics and machine learning techniques, predictive maintenance is gaining momentum in different industries. In this paper, a framework of predictive reliability modelling is discussed, which is a part of predictive maintenance. With the help of various machine learning models, a predictive reliability model has been built. The paper also presents a model to calculate Remaining Useful Life (RUL) by incorporating a Recurrent Neural Network (RNN).
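
A minimal sketch of an RNN-based RUL regressor of the kind described, assuming sensor histories have been windowed into fixed-length sequences; the layer sizes, window length, and stand-in labels are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

WINDOW, N_FEATURES = 30, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, WINDOW, N_FEATURES)).astype("float32")
y = rng.uniform(0, 150, size=1000).astype("float32")  # stand-in RUL labels

model = keras.Sequential([
    keras.layers.Input(shape=(WINDOW, N_FEATURES)),
    keras.layers.LSTM(64),             # recurrent layer summarizing the window
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),             # predicted remaining useful life
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
print("sample RUL prediction:", float(model.predict(X[:1], verbose=0)[0, 0]))
```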

The film industry is one of the biggest contributors to the entertainment industry, and it is characterized by the unpredictability of its successes and failures. This research looks into the details of movie watching by splitting the work into three main components. The first section explores the variables that influence the frequency of movie watching; the second develops a model to predict success or failure. Finally, social network sentiment analysis is carried out through data mining to capture audience sentiment and its impact on a movie's success or failure. The research looks at the success or failure of a movie in a more holistic manner than previous works on movie success prediction, which grade a movie's performance over only a few variables.

Breast cancer has inarguably been the most prominent disease amongst women, as well as the next most dangerous after lung cancer. Early diagnosis and prevention are of paramount importance. Several methods such as micro-array analysis and network analysis have been proffered, but they are somewhat expensive and time consuming. There is a need to develop an automated system based on machine learning techniques to detect breast cancer early. Benign and malignant tumors were classified using Logistic Regression (LRO), Bayes Network (BNK), Multilayer Perceptron (MLP), Sequential Minimal Optimization (SMO), J48, Naive Bayes (NBS) and Instance Based Learner (IBK) algorithms, which were implemented in the Waikato Environment for Knowledge Analysis (WEKA). The breast cancer database for this study was collected from the University of Wisconsin Hospitals, published on the University of California, Irvine (UCI) website. The five most critical performance metrics when selecting an algorithm in model building i...

Nowadays, online social media is an online discourse where people contribute to create content, share it, bookmark it, and network at an impressive rate. The fastest-messaging and easiest-to-use social medium today is Twitter. The messages on Twitter include reviews and opinions on certain topics such as movies, books, products, politics, and so on. Based on this condition, this research attempts to use Twitter messages to review a movie by using opinion mining or sentiment analysis. Opinion mining refers to the application of natural language processing, computational linguistics, and text mining to identify or classify whether a movie is good or not based on message opinion. A Support Vector Machine (SVM) is a supervised learning method that analyzes data and recognizes the patterns used for classification. This research concerns binary classification into two classes: positive and negative. The positive class indicates good message opinion; the negative class indicates bad message opinion of certain movies. This judgment is based on the accuracy level of the SVM, with a validation process using 10-fold cross validation and a confusion matrix. A hybrid Particle Swarm Optimization (PSO) is used to improve the selection of the best parameters in order to solve the dual optimization problem. The result shows an improvement in accuracy from 71.87% to 77%.
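
A compact sketch of the PSO-plus-SVM idea: particles search over (C, gamma) and the fitness is 10-fold cross-validated accuracy, matching the paper's validation setup; the synthetic dataset stands in for TF-IDF tweet vectors, and all swarm settings are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

def fitness(p):  # 10-fold CV accuracy at log10-scaled (C, gamma)
    return cross_val_score(SVC(C=10**p[0], gamma=10**p[1]), X, y, cv=10).mean()

rng = np.random.default_rng(0)
lo, hi = [-1, -4], [3, 0]                       # search bounds in log10 space
pos = rng.uniform(lo, hi, size=(8, 2))          # 8 particles
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()].copy()

for _ in range(5):                              # a few PSO iterations
    r1, r2 = rng.random((2, 8, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fitness(p) for p in pos])
    m = f > pbest_f                             # personal-best updates
    pbest[m], pbest_f[m] = pos[m], f[m]
    gbest = pbest[pbest_f.argmax()].copy()

print(f"best CV accuracy {pbest_f.max():.3f} at log10(C, gamma) = {gbest}")
```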

Predictive policing generally refers to police work that utilises strategies, algorithmic technologies, and big data to generate near-future predictions about the people and places deemed likely to be involved in or experience crime. Claimed benefits of predictive policing centre on the technology’s ability to enable pre-emptive police work by automating police decisions. The goal is that officers will rely on computer software and smartphone applications to instruct them about where and whom to police, just as Uber drivers rely on similar technologies to instruct them about where to pick up passengers. Unfortunately, little is known about the experiences of the in-field users of predictive technologies. This article helps fill this gap by addressing the under-researched area of how police officers engage with predictive technologies. As such, data is presented that outlines the findings of a qualitative study with UK police organisations involved in designing and trialling predictive policing software. Research findings show that many police officers have a detailed awareness of the limitations of predictive technologies, specifically those brought about by errors and biases in input data. This awareness has led many officers to develop a sceptical attitude towards predictive technologies and, in a few cases, these officers have expressed a reluctance to use them. Based on these findings, this paper argues that claims about predictive software’s ability to neutralise the subjectivity of police work overlook the ongoing struggles of police officers to assert their agency and mediate the extent to which predictions will be trusted and utilised.

The report finds the prospects of the gaming community by assessing gamers based upon age, ethnicity, occupation, annual household income, and educational qualifications. The data collected as part of this survey would require further investigation and remedial action by management, as certain variables do not contribute to the overall variance percentage in the factor analysis.

Introduction: In volume 3, number 2, pages 48-49, we explained some screening characteristics of a diagnostic test in an educational manuscript entitled "Simple definition and calculation of accuracy, sensitivity and specificity" (1). The present article aims to review other screening performance characteristics, including positive and negative predictive values (PPV and NPV). PPV and NPV describe the true positive and true negative results of a diagnostic test, respectively (2). In other words, if a subject receives a certain diagnosis by a test, predictive values describe how likely it is for the diagnosis to be correct.
Definitions:
Patient = positive for disease
Healthy = negative for disease
True positive (TP) = the number of cases correctly identified as patient
False positive (FP) = the number of cases incorrectly identified as patient
True negative (TN) = the number of cases correctly identified as healthy
False negative (FN) = the number of cases incorrectly identified as healthy
Positive predictive value: the proportion of cases giving positive test results who are actually patients (3). It is the ratio of patients truly diagnosed as positive to all those who had positive test results (including healthy subjects who were incorrectly diagnosed as patient). This characteristic can predict how likely it is for someone to truly be a patient, in the case of a positive test result.
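
In symbols, using the counts defined above:

```latex
\[
  \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
  \mathrm{NPV} = \frac{TN}{TN + FN}
\]
```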

Acute Myocardial Infarction (heart attack), a cardiovascular disease (CVD) that leads to Ischemic Heart Disease (IHD), is one of the major killers worldwide. A proficient approach is proposed in this paper that can predict the chances of heart attack when a person experiences chest pain or equivalent symptoms. We have developed a prototype by integrating clinical data collected from patients admitted to different hospitals after Acute Myocardial Infarction (AMI). Twenty-five attributes related to heart attack symptoms are collected and analyzed, where chest pain, palpitation, breathlessness, syncope with nausea, sweating, and vomiting are the prominent symptoms of a person having a heart attack. The data mining techniques of decision tree and random forest are used to analyze the heart attack dataset: classification of the more common symptoms related to heart attack is done using the C4.5 decision tree algorithm, and a random forest is applied alongside to improve the accuracy of the classification result for heart attack prediction. A guiding system to judge whether chest pain indicates a heart attack may help the many people who tend to neglect chest pain and later end up in the catastrophe of a heart attack.
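
A short sketch of the two-stage idea: a single interpretable decision tree for classification, then a random forest to improve accuracy. Note that scikit-learn's tree is CART rather than the paper's C4.5, and the binary "symptom" data here is a synthetic stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(800, 25))  # 25 binary symptom attributes
# Toy label: driven by the first four "prominent" symptoms plus noise.
y = (X[:, :4].sum(axis=1) + rng.random(800) > 2.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in (DecisionTreeClassifier(max_depth=5, random_state=0),
            RandomForestClassifier(n_estimators=200, random_state=0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, accuracy_score(y_te, clf.predict(X_te)))
```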

Digital textbook analytics are a new method of collecting student-generated data in order to build predictive models of student success. Previous research using self-report or laboratory measures of reading shows that engagement with the textbook is related to student learning outcomes. We hypothesized that an engagement index based on digital textbook usage data would predict student course grades. Linear regression analyses were conducted using data from 233 students to determine whether digital textbook usage metrics predicted final course grades. A calculated linear index of textbook usage metrics was significantly predictive of final course grades and was a stronger predictor of course outcomes than previous academic achievement. However, time spent reading, one of the variables that make up the index, was even more strongly predictive of course outcomes. Additionally, students in the top 10th percentile in number of highlights had significantly higher course grades than those in the lower 90th percentile. These findings suggest that digital textbook analytics are an effective early warning system to identify students at risk of academic failure. These data can be collected unobtrusively and automatically, and provide stronger prediction of outcomes than prior academic achievement (which to this point has been the single strongest predictor of student success).
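
A minimal sketch of the comparison described: regress grades on usage metrics versus prior achievement. All variable names and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 233
reading_time = rng.gamma(2.0, 3.0, n)         # hours spent reading
highlights = rng.poisson(15, n).astype(float)
prior_gpa = rng.normal(3.0, 0.5, n)
grade = (40 + 3.5 * reading_time + 0.4 * highlights + 4 * prior_gpa
         + rng.normal(0, 8, n))

usage = np.column_stack([reading_time, highlights])
r2_usage = LinearRegression().fit(usage, grade).score(usage, grade)
r2_prior = LinearRegression().fit(prior_gpa[:, None], grade).score(
    prior_gpa[:, None], grade)
print(f"R^2 usage index: {r2_usage:.2f} vs prior achievement: {r2_prior:.2f}")
```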

The term big data refers to data that exceeds the processing or analyzing capacity of existing database management systems. The inability of existing DBMSs to handle big data is due to its large volume, high velocity, uncertain veracity, heterogeneous variety, and non-atomic values. Nowadays, healthcare plays a vital role in everyone's life. It has become a very large and open platform for everyone to do all kinds of research work without affecting human life. When it comes to disease, there are many types found all over the world. But among them, AIDS (acquired immunodeficiency syndrome) is a disease that spreads quickly and can easily turn life to death. There are many studies underway to create drugs to cure this deadly disease, but so far there has been no success. In cases such as this, big data is applied for a better result, which will have a good impact on society.

Big data has emerged as an important area of interest for the study and research of practitioners and academicians. The exponential growth of data is fuelled by the exponential growth of internet and digital devices. Advancement in technology has made it economically feasible to store and analyse huge amounts of data. Big data is a juxtaposition of structured, semi-structured and unstructured real-time data originating from a variety of sources. Predictive analytics provides the methodology for tapping intelligence from large data sets. Many visionary companies, such as Google and Amazon, have realised the potential of big data and analytics in gaining competitive advantage. These techniques provide several opportunities, such as discovering patterns or better optimisation algorithms. Managing and analysing big data also presents a few challenges, namely the size, quality, reliability and completeness of data. This paper provides an extensive review of the literature on big data and predictive analytics. It gives the reader the fundamental concepts of this emerging field. Finally, we conclude with the findings of our study and outline future research directions in this field.

Archaeological excavations are currently active at Orgeres - La Thuile in the Aosta Valley (NW Italy). The efforts of both academic players and local policy makers are directed at an inclusive synergy, where the excavations are the focus around which research, teaching, local business and tourism can revolve to generate mutual benefits. The declared research goal is to look for traces suggesting useful keys for reading the ancient occupations of the area. In spite of the well-known applications of Geomatics in archaeology, in this work the authors explicitly intend to make clear the high potential residing in geographical data, generally released for free by institutional players through geoportals, that were not specifically intended for archaeological use. In the case study presented here, some suggestions are given about the possibility of jointly using medium-density point clouds from ALS, aerial hyperspectral images, and high-resolution true-colour RGB aerial digital orthoimages. The results presented are preliminary.

An active machine learning technique for monitoring voltage stability in transmission systems is presented. It has been shown that machine learning algorithms may be used to supplement the traditional simulation approach, but they suffer from the difficulties of online machine learning model updates and offline training data preparation. We propose an active learning solution that enhances existing machine learning applications by actively interacting with the online prediction and offline training process. The technique identifies operating points where machine learning predictions based on power system measurements contradict actual system conditions. By creating the training set around the identified operating points, it is possible to improve the capability of machine learning tools to predict future power system states. The technique also accelerates the offline training process by reducing the amount of simulation on a detailed power system model around operating points where correct predictions are made. Experiments show a significant advantage in training time, prediction time, and the number of measurements that need to be queried to achieve high prediction accuracy.
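
A toy sketch of the query loop described, assuming an inexpensive oracle that stands in for the detailed simulation/measurement feedback; the model choice and batch sizes are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 10))               # candidate operating points
oracle = lambda X: (X[:, 0] + 0.5 * X[:, 1] > 0)   # stand-in for simulation

labeled = rng.choice(len(X_pool), 50, replace=False).tolist()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_pool[labeled], oracle(X_pool[labeled]))

for _ in range(5):                                 # active learning rounds
    pred = model.predict(X_pool)
    wrong = [i for i in np.flatnonzero(pred != oracle(X_pool))
             if i not in labeled]                  # contradiction points
    if not wrong:
        break
    labeled += list(rng.choice(wrong, min(25, len(wrong)), replace=False))
    model.fit(X_pool[labeled], oracle(X_pool[labeled]))
    acc = (model.predict(X_pool) == oracle(X_pool)).mean()
    print(f"labeled={len(labeled)}, pool accuracy={acc:.3f}")
```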

The Canvas Learning Management System (LMS) is used in thousands of universities across the United States and internationally, with a strong and growing presence in K-12 and higher education markets. Analyzing the development of the Canvas LMS, we examine 1) 'frictionless' data transitions that bridge K-12, higher education, and workforce data, 2) integration of third-party applications and interoperability or data-sharing across platforms, 3) privacy and security vulnerabilities, and 4) predictive analytics and dataveillance. We conclude that institutions of higher education are currently ill-equipped to protect students and faculty required to use the Canvas Instructure LMS from data harvesting or exploitation. We challenge inevitability narratives and call for greater public awareness concerning the use of predictive analytics, the impacts of algorithmic bias, the need for algorithmic transparency, and the enactment of ethical and legal protections for users who are required to use such software platforms.

In this paper, we present a Cluster-Based Approach (CBA) that utilizes the support vector machine (SVM) and an artificial neural network (ANN) to estimate and predict the daily horizontal global solar radiation. In the proposed CBA-ANN-SVM approach, we first conduct a clustering analysis and divide the global solar radiation data into clusters according to calendar month. Our approach aims at maximizing the homogeneity of data within the clusters and the heterogeneity between the clusters. The proposed CBA-ANN-SVM approach is validated and its precision is compared with standalone ANN and SVM techniques. The mean absolute percentage error (MAPE) for the proposed approach was lower than those of ANN and SVM.
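
A minimal sketch of the cluster-then-model pipeline under stated assumptions: synthetic daily radiation, k-means over monthly mean profiles (three season-like clusters), one SVR per cluster, scored by MAPE; the paper's actual data, cluster count, and model settings are not reproduced here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
day = np.arange(730)
month = (day // 30) % 12
radiation = 5 + 2 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 0.3, 730)

# Cluster months by their mean radiation level into 3 season-like groups.
profile = np.array([[radiation[month == m].mean()] for m in range(12)])
cluster_of_month = KMeans(n_clusters=3, n_init=10,
                          random_state=0).fit_predict(profile)

# Fit one SVR per cluster on simple seasonal features and report MAPE.
X = np.column_stack([np.sin(2 * np.pi * day / 365),
                     np.cos(2 * np.pi * day / 365)])
for c in range(3):
    idx = np.isin(month, np.flatnonzero(cluster_of_month == c))
    model = SVR().fit(X[idx], radiation[idx])
    mape = mean_absolute_percentage_error(radiation[idx], model.predict(X[idx]))
    print(f"cluster {c}: {idx.sum()} days, MAPE = {mape:.3f}")
```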

Purpose: The research aims to characterize the newest solutions, especially with respect to cybersecurity aspects of Business Intelligence analytics based on the processing of large sets of information with the use of sentiment analysis and Big Data. Design/Methodology/Approach: The working hypothesis is that current regulations and security solutions for Business Intelligence analytics based on the processing of large sets of information with the use of sentiment analysis and Big Data are under extreme pressure to meet ever-growing challenges. There are ever more demands from legal regulators as well as from the market, and that creates many problems with data protection. The article uses legal and comparative analysis as well as structural and functional analysis. Additionally, the interpretation method is also present. Findings: The article indicates that the aforementioned issues, given the growing importance of the internet, including the Internet of Things and the Internet of Everything, are becoming ever more important and cannot do without an appropriate level of cybersecurity, since the data collected is of great importance. The trends inherent to Industry 4.0 require more effort and customer orientation from business. A growing population and access to the Internet demand larger scales of business operations. Practical Implications: As a result of conducting the research, it is possible to identify threats and present recommendations for the cybersecurity of Business Intelligence. Originality/Value: This is a complete study of the cybersecurity of Business Intelligence analytics based on the processing of large sets of information with the use of sentiment analysis and Big Data.

The global coronavirus pandemic and lockdown have had negative impacts on individuals' mental health and well-being. The crisis has generated symptoms of depression in many, which may last even after the lockdown is over. To provide support to individuals in terms of counseling and psychiatric treatment, it is necessary to identify such depressive symptoms in a timely fashion. To address this problem, an artificial intelligence-based system is proposed to assess changes, if any, in the mental health of an individual as a function of time, starting from the pre-lockdown period (in India, from 20 April 2020). A Mental Health Analyzer has been implemented to automatically detect whether an individual is trending toward a state of depression based on his or her tweets over time. The deep learning models of Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM have been implemented and compared for the emotion classification task, specifically to detect the emotions of sadness, fear, anger, and joy present in a person's tweets. The system identifies the emotion of sadness present in tweets to detect depression. An ensemble maximizing model using CNN, LSTM, and Bidirectional LSTM is proposed to maximize the recall metric and improve performance on the depression detection task. The implemented system was tested using the dataset provided for the SemEval-2018 semantic evaluation tasks; it achieves better results than previous models for the task of emotion classification and, further, can detect depression when tested on real Twitter data.
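
A small sketch of the recall-maximizing ensemble idea: flag "sadness" if any model fires, via an elementwise maximum of predicted probabilities, trading precision for recall. The probability arrays here are synthetic stand-ins for the outputs of trained CNN / LSTM / BiLSTM classifiers:

```python
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)          # 1 = tweet expresses sadness

def model_probs():
    # Positives get a high-but-noisy score, negatives a low one.
    return rng.random(500) * (0.4 + 0.6 * y_true)

p_cnn, p_lstm, p_bilstm = model_probs(), model_probs(), model_probs()
p_ens = np.max([p_cnn, p_lstm, p_bilstm], axis=0)   # union-like vote

for name, p in [("CNN", p_cnn), ("LSTM", p_lstm),
                ("BiLSTM", p_bilstm), ("max-ensemble", p_ens)]:
    print(f"{name}: recall = {recall_score(y_true, p > 0.5):.3f}")
```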

Knowledge management and predictive analytics are considered to be unusual partners in today's technology. However, they can be very good tools for solving current problems in valuing data. Predictive analytics has become one of the forecasting tools that is of huge help in information management. Its application in IT project development risk management is very important, where a lot of raw data is involved in risk analysis and prediction. The use of IT project risk management supported by knowledge management (KM) will help increase the success rate of IT projects. Knowledge management will bring additional value to the data needed. This paper presents the use of KM and predictive analytics in rating the success of projects by predicting the risks that might arise during project development. It explores how KM and predictive analytics can identify risks in IT project development and give recommendations for evaluating the risks that could affect the successful completion of IT projects.

Film industries all around the globe are fast creating hundreds of films, capturing the attention of people of all ages. Every movie producer is eager to learn which films are likely to succeed or fail at the box office, so the film business places a premium on predicting a film's success early on. The difficult task of forecasting a movie's gross income may thus be eased with the use of new processing power and the historical data available in movie databases. In this study, we look at the hidden patterns in the elements that contribute to the popularity of movies.

This paper collates multidisciplinary perspectives on the use of predictive analytics in government services. It moves away from the hyped narratives of “AI” or “digital”, and the broad usage of the notion of “ethics”, to focus on highlighting the possible risks of the use of prediction algorithms in public administration. Guidelines for AI use in public bodies are currently available; however, there is little evidence that these are being followed or that they are being written into new mandatory regulations. The use of algorithms is not just an issue of whether they are fair and safe to use, but of whether they abide by the law and whether they actually work. Particularly in public services, there are many things to consider before implementing predictive analytics algorithms, as flawed use in this context can lead to harmful consequences for citizens, individually and collectively, and for public sector workers. All stages of the implementation process of algorithms are discussed, from the specification of the problem and model design through to the context of their use and the outcomes. Evidence is drawn from case studies of use in child welfare services, the US justice system, and UK public examination grading in 2020. The paper argues that the risks and drawbacks of such technological approaches need to be more comprehensively understood, and testing done in the operational setting, before implementing them. The paper concludes that while algorithms may be useful in some contexts and help to solve problems, those aimed at predicting real life seem to have a long way to go to being safe and trusted for use. As “ethics” are located in time, place and social norms, the authors suggest that in the context of public administration, laws on human rights, statutory administrative functions, and data protection, all within the principles of the rule of law, provide the basis for appraising the use of algorithms, with maladministration being the primary concern rather than a breach of “ethics”.

The use of algorithmic prediction in insurance is regarded as the beginning of a new era, because it promises to personalise insurance policies and premiums on the basis of individual behaviour and level of risk. The core idea is that the price of the policy would no longer refer to the calculated uncertainty of a pool of policyholders, with the consequence that everyone would have to pay only for her real exposure to risk. For insurance, however, uncertainty is not only a problem – shared uncertainty is a resource. The availability of individual risk information could undermine the principle of risk-pooling and risk-spreading on which insurance is based. The article examines this disruptive change first by exploring the possible consequences of the use of predictive algorithms to set insurance premiums. Will it endanger the principle of mutualisation of risks, producing new forms of discrimination and exclusion from coverage? In a second step, we analyse how the relationship betwee...

The development of Indonesia's ICT environment has made the mobile video-on-demand (VOD) platform one of the emerging lifestyles. With advanced smartphone technology, mobile phone subscribers are able to enjoy high-resolution mobile VOD services with a greater user experience. The purpose of this study is to profile and predict potential customers of one of the VOD platforms, Netflix, for personalizing marketing targets. Using a machine learning predictive analytics methodology, customer profile and behavior data are divided into 3 clusters using the K-Means model before being tested with several supervised models to obtain the best model for each cluster. Feature importance analysis is conducted to support marketing insight for product-offering follow-ups to each targeted potential customer. Significant variables affecting Netflix buyers and non-buyers within the 3 different clusters are defined clearly, along with the number of potential customers that can be targeted as Netflix's future subscri...

The main process of data mining is to collect, extract, and store valuable information, and nowadays many enterprises do this actively. In advanced analytics, predictive analytics is the branch mainly used to make predictions about unknown future events. Predictive analytics uses various techniques from machine learning, statistics, data mining, modeling, and artificial intelligence to analyze current data and make predictions about the future. The two main objectives of predictive analytics are regression and classification. It is composed of various analytical and statistical techniques used to develop models that predict future occurrences, probabilities, or events. Predictive analytics deals with both continuous and discontinuous changes. It provides a predictive score for each individual (healthcare patient, product SKU, customer, component, machine, or other organizational unit) to determine or influence organizational processes that apply across huge numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and government operations including law enforcement.

The Boston Police Department has limited resources and cannot easily patrol and guard the entire area under its jurisdiction. This situation limits its immediate intervention capacity to the patrolled area: forecasting dangerous criminal misconduct in an urban area through data analytics techniques would give Boston Police better visibility of future potential crimes. For this project we focus on predicting the time and location of criminal offences, with the ultimate goal of providing the Boston Police Department with a model that offers meaningful insights into the crime patterns of the city.
Our data includes the time, location and type of crime, weather conditions at the time, and socioeconomic data for the area where it happened; these are all sourced from public entities such as the City of Boston portal, the NOAA, and the US Census. To simplify the original dataset, we reduced the number of possible locations by splitting Boston into a grid defining ~500 areas; secondly, we summarized the 30+ different crime types into violent crimes, property crimes and other crimes, and we computed a series of time series variables accounting for the frequency of crime in each area in the previous 24 hours, week, month, and four months.
Because of the inherently different causes that might determine a violent crime versus a crime against property, we decided to build two separate models for forecasting the two types of crime. In both cases, given the binary nature of the dependent variable (the occurrence of violent or property crimes for a given date and location), we developed a logistic regression and a random forest model. Specifically, given the variable transformations described above, the model is intended to be a mix of time series features and “fundamentals” (income), similar to what is done with financial models. We found that areas that have experienced violent crimes in the past are more likely to be affected by the same problem tomorrow. Also, the colder and rainier it is outside, the fewer violent crimes there will be on the streets of Boston. Finally, since average income is positively correlated and median income is negatively correlated with the probability of both property and violent crime, this suggests that, as is intuitively true, the difference between the two, representing income inequality, causes higher levels of crime.
Our logistic model succeeds in delivering a basic forecasting tool that would allow Boston Police to be present, and therefore possibly deter and prevent, in 71.1% of areas in which a violent crime is happening, by patrolling only 30% of the city. Similarly, by patrolling 30% of the town, the Police would be in the same area where a property crime is happening 64.9% of the time. We also tried fitting a Random Forest model on the data, but it yielded poorer results. Beyond the question of what percentage of the city of Boston the Police are able to patrol, we can interpret the ROC curve as an optimization tool: given limited resources, the police department should give priority to patrolling the locations featuring the highest probability of crime on a given day. In this sense our model gives law enforcement agencies an edge against micro-criminality in the city of Boston, as well as a tool to counterbalance possible budget cuts by optimizing Police presence on the ground.
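
A toy sketch of the patrol-allocation readout behind those figures: rank area-days by predicted crime probability and measure the share of actual crimes captured in the top 30%. The scores here are synthetic stand-ins for the fitted logistic model's output:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                        # area-day observations
y = rng.integers(0, 2, n)                         # 1 = a crime occurred
scores = np.clip(y * 0.4 + rng.random(n), 0, 1)   # stand-in predictions

k = int(0.30 * n)                                 # patrol the top 30% of areas
top = np.argsort(scores)[::-1][:k]
capture_rate = y[top].sum() / y.sum()
print(f"crimes covered while patrolling 30% of areas: {capture_rate:.1%}")
```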

In database marketing, data mining has been used extensively to find the optimal customer targets so as to maximize return on investment. In particular, using marketing campaign data, models are typically developed to identify characteristics of customers who are most likely to respond. While these models are helpful in identifying the likely responders, they may be targeting customers who would have taken the desirable action (or not) regardless of whether they receive the campaign contact (e.g. mail, call). Based on many years of business experience, we identify the appropriate business objective and its associated mathematical objective function. We point out that the current approach is not directly designed to solve the appropriate business objective. We then propose a new methodology to identify the customers whose decisions will be positively influenced by campaigns. The proposed methodology is easy to implement and can be used in conjunction with most commonly used supervised learning algorithms. An example using simulated data is used to illustrate the proposed methodology. This paper may provide the database marketing industry with a simple but significant methodological improvement and open a new area for further research and development.
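
The paper's own methodology is only summarized above; as a hedged illustration of the same goal, the common two-model uplift approach on simulated data targets customers by the difference between response probabilities with and without contact:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=(n, 3))                    # customer features
treated = rng.integers(0, 2, n)                # 1 = received campaign contact
# Simulated behavior: feature 0 drives baseline response; feature 1 drives
# how much the contact itself changes the decision (the "persuadables").
p = 1 / (1 + np.exp(-(x[:, 0] + treated * x[:, 1])))
responded = (rng.random(n) < p).astype(int)

# Two-model uplift: separate response models for contacted / not contacted.
m_t = LogisticRegression().fit(x[treated == 1], responded[treated == 1])
m_c = LogisticRegression().fit(x[treated == 0], responded[treated == 0])
uplift = m_t.predict_proba(x)[:, 1] - m_c.predict_proba(x)[:, 1]

top = np.argsort(uplift)[::-1][: n // 10]      # top decile by predicted lift
print("mean influence driver (feature 1) in targeted decile:",
      x[top, 1].mean().round(2), "vs population:", x[:, 1].mean().round(2))
```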