Data Handling Research Papers - Academia.edu
- by Nikos Korfiatis and one co-author
- Programming Languages, Information Retrieval, Semantics, Metadata
We present the Convergence Processor, an innovative component that integrates a high-performance 32-bit RISC core, a custom IP core optimised for header processing, and other blocks for the specific communication interfaces required to deliver broadband residential applications. The component is a System-on-Chip that supports real-time processing of packets and protocol data units from various networking interfaces.
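To give a concrete, deliberately simplified picture of the header-processing workload such a component targets, here is a minimal Python sketch that parses the fixed part of an IPv4 header from a raw packet. The field layout follows RFC 791; the function and its name are illustrative assumptions, not part of the Convergence Processor itself.

```python
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Parse the fixed 20-byte portion of an IPv4 header (RFC 791)."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (version_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "ihl": version_ihl & 0x0F,   # header length in 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": proto,           # e.g. 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }
```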
In high-energy physics, with the search for ever smaller signals in ever larger data sets, it has become essential to extract a maximum of the available information from the data. Multivariate classification methods based on machine learning techniques have ...
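As a generic illustration of the multivariate classification workflow mentioned (not the authors' actual toolkit or data), the sketch below trains a boosted classifier to separate a small synthetic "signal" from a larger "background" sample; scikit-learn and the synthetic features are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for event features: a small signal in a large background.
background = rng.normal(0.0, 1.0, size=(5000, 4))
signal = rng.normal(0.5, 1.0, size=(500, 4))
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(len(background)), np.ones(len(signal))])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```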
The aim is to consider the political and ethical challenges involved in conducting ethnographic managerial/organisational behaviour research within the highly regulated health and social care context, in light of the emergence of more stringent "ethical approval" policies and requirements set by Local Research Ethics Committees in the United Kingdom. In the attempt and requirement to protect "vulnerable" employees, this paper presents an unintended paradox of consequences that arose when participants voluntarily revealed themselves. The authors briefly review the literature on research ethics and outline the ethical regulations currently in force within the British National Health Service. Within an ethnographic case study exploring the psychological contract, the authors consider the issues that arose during one stage of data collection: a qualitative questionnaire survey with 13 participants, including members of the lead author's team.
Since the explicit collaboration of biological and physical scientists with archaeologists started in the late 1930s, the discourse on the nature of this collaboration has been intense. The question of the relative roles of the specialist scientist and the archaeologist in the collaboration, and of the training and experience of both in the use of scientific techniques of recording and analysis, remains unresolved, as we indicate from our experience in the Catalhoyuk Archaeological Project in Turkey. In this paper, we expand this discourse to examine the more recent collaboration of archaeologists with computer graphics specialists, as archaeologists increasingly incorporate cutting-edge and not-so-cutting-edge digital technologies into their practice.
The simulation of vehicle dynamics has a wide array of applications in the development of vehicle technologies. This study deals with the methodological aspect of the problem of assessing the validity of a simulation, using the double lane change maneuver as the experimental data source. The maneuver time history is analyzed, problems in handling the obtained measurements and possibilities for assessing the maneuver are examined, and techniques to split and align the data are presented and compared. Methodologies to handle the experimental and simulation data are introduced. The presented methods can be utilized to achieve more time- and cost-efficient simulation projects with increased model confidence.
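One common way to align an experimental and a simulated time history is cross-correlation; the sketch below is an illustrative assumption about how such an alignment step might look, not the paper's exact method.

```python
import numpy as np

def align_by_cross_correlation(reference: np.ndarray, measured: np.ndarray) -> int:
    """Return the lag (in samples) that best aligns `measured` to `reference`."""
    ref = reference - reference.mean()
    mea = measured - measured.mean()
    corr = np.correlate(mea, ref, mode="full")
    return corr.argmax() - (len(ref) - 1)

# Example: a trace delayed by 25 samples relative to the reference.
t = np.linspace(0, 10, 1000)
reference = np.sin(t)
measured = np.roll(reference, 25)
print(align_by_cross_correlation(reference, measured))  # approximately 25
```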
This work investigates the possibility of extending the linguistic notion of «Explorative Data Analysis», which aims to investigate linguistic data experimentally in order to obtain useful information, into a spatial installation, using birth reports as an example. This prototypical attempt to translate data into the physical realm raises further questions and perceptual considerations that contribute to a more explorative access to the data. Furthermore, the process of data handling, together with the political implications of material choices, is addressed.
The COVID-19 outbreak affected and compelled people from all walks of life to self-quarantine in their homes in order to prevent the virus from spreading. As a result of adhering to exceedingly strict guidelines, many people developed mental illnesses. Because educational institutions were closed at the time, students remained at home and practiced self-quarantine, so it is necessary to identify the students who developed mental illnesses during that period. To develop AiPsych, a mobile-application-based artificial psychiatrist, we train supervised and deep learning algorithms to predict the mental illness of students during the COVID-19 situation. Our experiments reveal that supervised learning outperforms deep learning, with the Support Vector Machine (SVM) achieving 97% accuracy for mental illness prediction. Random Forest (RF) achieves the best accuracy of 91% for the recovery suggestion prediction. Our Android application can be used by parents, educational institutions, or the government to obtain the predicted mental illness status of a student and take appropriate measures to address the situation.
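As a hedged sketch of the kind of supervised pipeline described, the code below trains an SVM classifier; the placeholder features stand in for questionnaire responses and are not the AiPsych data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for survey responses (not the study's dataset).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale features, then fit an RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```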
Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain and address it by harnessing the power of ...
The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate-mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, including background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determination, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra, as well as mass spectral tree libraries that combine multiple-stage mass spectra, is outlined.
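To make one of the listed data-handling steps concrete, the sketch below estimates a charge state from the m/z spacing of adjacent isotope peaks (spacing of roughly 1.00335/z); the peak values and the helper's name are illustrative, not taken from the review.

```python
def estimate_charge_state(isotope_mz: list[float], isotope_spacing: float = 1.00335) -> int:
    """Estimate charge z from the average m/z gap between adjacent isotope peaks."""
    gaps = [b - a for a, b in zip(isotope_mz, isotope_mz[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(isotope_spacing / mean_gap)

# Illustrative doubly charged isotope cluster (peaks spaced by ~0.5017 m/z).
print(estimate_charge_state([500.2500, 500.7517, 501.2534]))  # -> 2
```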
- by Akemi Chatfield and one co-author
- Management, Information Management, Government, Media
Forged documents, specifically passports, driving licences and VISA stickers, are used for fraud purposes including robbery, theft and many more, so detecting forged characters in documents is an important and challenging task in digital forensic imaging. Forged character detection faces two big challenges. First, data for forged character detection is extremely difficult to obtain, for several reasons including limited access to data, unlabeled data, or work being done on private data. Second, deep learning (DL) algorithms require labeled data, which poses a further challenge because labeling is tedious, time-consuming, expensive and requires domain expertise. To address these issues, in this paper we propose a novel algorithm that generates three datasets, namely forged characters detection for passport (FCD-P), forged characters detection for driving licence (FCD-D) and forged characters detection for VISA stickers (FCD-V). To the best of our knowledge, we are the first to release these datasets. The proposed algorithm starts by reading plain document images and simulating forgeries on five different countries' passports, driving licences and VISA stickers; it then records the bounding boxes of the forged characters as labels. Furthermore, considering real-world scenarios, we performed selected data augmentation accordingly. Each dataset consists of 15,000 images of size 950 x 550 pixels. For further research purposes, we release our algorithm code and datasets.
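A rough sketch of the generation pipeline described above (read a plain document image, simulate tampering with one character region, and record its bounding box as the label) might look like the following; OpenCV, the region size and the output format are assumptions, not the released FCD-P/FCD-D/FCD-V code.

```python
import json
import random
import cv2  # OpenCV, assumed here for image I/O and drawing

def simulate_forgery(image_path: str, out_image: str, out_label: str) -> None:
    """Overwrite one character-sized region and save its bounding box as the label."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # Pick a small region standing in for a character cell (illustrative only).
    bw, bh = 30, 40
    x = random.randint(0, w - bw)
    y = random.randint(0, h - bh)
    # Simulated tampering: blank the region and print a different character.
    cv2.rectangle(img, (x, y), (x + bw, y + bh), (255, 255, 255), thickness=-1)
    cv2.putText(img, random.choice("0123456789"), (x, y + bh - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 0), 2)
    cv2.imwrite(out_image, img)
    # Keep the bounding box as the ground-truth label for the forged character.
    with open(out_label, "w") as f:
        json.dump({"bbox": [x, y, bw, bh], "label": "forged"}, f)
```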
The performance of industrial power system studies can be significantly improved in both speed and reliability by the application of a similar format for all standard studies. The major calculations and drafting work are performed using packaged computer programs that provide results in accordance with industry standards. The collection of the initial data for the study is performed using computer-generated
SQL is the (more or less) standardised language used by the majority of commercial database management systems. However, it is seriously flawed, as has been documented in detail by Date, Darwen, Pascal, and others. One of the most serious problems with SQL is the way it handles missing data. It uses a special value, 'NULL', to represent data items whose value is not known; this can have a variety of meanings in different circumstances (such as 'inapplicable' or 'unknown'). The SQL language also allows an 'unknown' truth value in logical expressions. The resulting incomplete three-valued logic leads to inconsistencies in data handling within relational database management systems. Relational database theorists advocate that a strict two-valued logic (true/false) be used instead, with the use of NULL prohibited, and justify this stance by asserting that it is a true representation of the 'real world'.
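The kind of inconsistency alluded to here can be demonstrated directly. The sketch below uses Python's built-in sqlite3 module purely as an illustration of SQL's three-valued logic, in which a comparison with NULL is neither true nor false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("alice", 100), ("bob", None), ("carol", 0)])

# 'bonus = 0' and 'bonus <> 0' both evaluate to UNKNOWN for bob's NULL bonus,
# so he appears in neither result: the two queries do not partition the table.
print(conn.execute("SELECT name FROM staff WHERE bonus = 0").fetchall())   # [('carol',)]
print(conn.execute("SELECT name FROM staff WHERE bonus <> 0").fetchall())  # [('alice',)]
```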
A foundational problem in kernel-based semi-supervised learning is the design of suitable kernels that properly reflect the underlying data manifold. One of the best-known semi-supervised kernel learning approaches is spectral kernel learning, which usually tunes the spectrum of the graph Laplacian empirically or by optimizing some generalized performance measure. In this study, we propose a novel approach to spectral kernel learning based on the maximum margin criterion, which is theoretically justified as a more essential semi-supervised kernel learning measure than others such as kernel target alignment. We have conducted extensive experiments on public data sets, which show the promising performance of our scheme.
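As a hedged sketch of the general family of methods named (building a kernel by re-weighting the spectrum of a graph Laplacian), the code below applies a simple exponential decay to the Laplacian eigenvalues; the decay function is an illustrative assumption, not the paper's maximum-margin optimisation.

```python
import numpy as np

def spectral_kernel(adjacency: np.ndarray, decay: float = 1.0) -> np.ndarray:
    """Build a kernel K = U f(Lambda) U^T from the normalised graph Laplacian."""
    degree = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degree, 1e-12)))
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Illustrative spectral transform: smaller Laplacian eigenvalues
    # (smoother eigenvectors) receive larger kernel weights.
    weights = np.exp(-decay * eigvals)
    return eigvecs @ np.diag(weights) @ eigvecs.T

# Tiny example graph: a path of 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = spectral_kernel(A)
print(K.shape)  # (4, 4), symmetric positive semi-definite
```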
With the recent surge of enthusiasm for Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite the ever-growing achievements, the biggest obstacle in most AI systems is their lack of interpretability. This deficiency of transparency limits their application in different domains, including credit scoring. Credit scoring systems help financial experts make better decisions regarding whether or not to accept a loan application, so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. A recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides the different explanations (i.e. global, local feature-based and local instance-based) required by different people in different situations. Evaluation through functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent as well as correct, effective, easy to understand, sufficiently detailed and trustworthy.
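As a minimal sketch of the modelling half of such a system (gradient boosting for classification plus per-feature attributions), the code below combines XGBoost with SHAP on placeholder data; the 360-degree explanation framework itself is the paper's contribution and is not reproduced here.

```python
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder credit data standing in for HELOC / Lending Club features.
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgboost.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# Local, feature-based attributions for one applicant; averaging |values| over
# the test set gives a simple global view of feature importance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print("local attribution, first applicant:", np.round(shap_values[0], 3))
print("global importance:", np.round(np.abs(shap_values).mean(axis=0), 3))
```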