Preprocessing Research Papers - Academia.edu

Devanagari script is widely used in India for writing official forms and documents, especially in banking applications, where the amount in words on a cheque is written in it. Offline recognition of handwritten Devanagari script has great application in the automatic processing of handwritten bank cheque images, the documentation of various official records, and the digitization of government and non-government documents. In this paper, we attempt automatic recognition of handwritten Devanagari script using various algorithms.

With its growing consumer market, India is one of the most favorable places to open a start-up. The various factors that influence a start-up's success change over time, and their effects have to be analyzed properly in order to keep pace with this ever-evolving, fast-changing world of trends. In this paper, we aim to study the most common patterns of funding in the Indian start-up industry and to analyze the current status of these start-ups. Interactive graphs are employed to gain insight from the analysis of the data set. The start-ups are categorized broadly into 8 industry verticals; graphs then show the number of start-ups opened in each category and how many of them were unsuccessful or have closed. This paper presents the resulting analysis to give insight into present investment trends in various industries and the success rate of start-ups opened in those industries.
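
A minimal sketch of the counting step described above, not the authors' code: grouping a start-up data set by industry vertical and plotting opened versus closed counts. The CSV path and the column names ("vertical", "status") are assumptions.

```python
# Hypothetical data set: one row per start-up, with an industry
# vertical and an operating status such as "operating" or "closed".
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("startups.csv")  # assumed file and schema

# Count start-ups per (vertical, status) pair and pivot to columns.
counts = (
    df.groupby(["vertical", "status"])
      .size()
      .unstack(fill_value=0)
)

counts.plot(kind="bar", stacked=True)
plt.ylabel("Number of start-ups")
plt.title("Start-ups opened and closed per industry vertical")
plt.tight_layout()
plt.show()
```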

Research in image processing involves many active areas; among these, recognition of handwritten characters holds much promise and is a challenging one. The idea is to enable the computer to reliably recognize handwritten input. In this paper, a new method that uses structural features and a Support Vector Machine (SVM) classifier for recognition of handwritten Kannada characters is presented. Average recognition accuracies of 89.84% and 85.14% for handwritten Kannada vowels and consonants, respectively, are obtained with the proposed method, in spite of inherent variations.
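
A minimal sketch of the classification stage, assuming precomputed feature vectors rather than the paper's actual structural features: training an SVM with scikit-learn. The random data, class count, and kernel settings are placeholders.

```python
# Placeholder feature vectors stand in for the paper's structural
# features; y holds character-class labels (e.g. 13 Kannada vowels).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))       # assumed feature vectors
y = rng.integers(0, 13, size=500)    # assumed vowel class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # kernel is an assumption
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```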

Data mining is the process of extracting useful patterns and models from a huge dataset. These models and patterns play an effective role in decision-making tasks. Data mining depends largely on the quality of the data. Raw data is usually susceptible to missing values, noise, incompleteness, inconsistency, and outliers, so it is important for these data to be processed before being mined. Preprocessing is an essential step to enhance data quality and efficiency. Data preprocessing is one of the most important data mining steps; it deals with the preparation and transformation of the dataset while seeking to make knowledge discovery more efficient. Preprocessing includes several techniques such as cleaning, integration, transformation, and reduction. This study gives a detailed description of the data preprocessing techniques used for data mining.
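
A minimal sketch, not from the paper, of the cleaning, transformation, and reduction techniques it surveys, using pandas. The file name and column names are hypothetical.

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")  # hypothetical input

# Cleaning: fill missing numeric values with the column median and
# drop rows whose key attribute is missing.
df["income"] = df["income"].fillna(df["income"].median())
df = df.dropna(subset=["customer_id"])

# Cleaning: clip outliers to the 1st/99th percentiles.
lo, hi = df["income"].quantile([0.01, 0.99])
df["income"] = df["income"].clip(lo, hi)

# Transformation: min-max normalization to [0, 1].
df["income_norm"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min()
)

# Reduction: keep only the attributes relevant to mining.
df = df[["customer_id", "income_norm", "age"]]
```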

Recruitment in different sectors, especially job recruitment in organizations, has been a major concern due to nepotism, tribalism, and bias on the part of interviewing panels, which in turn affects the optimal effectiveness of the organization. Applications for jobs in various capacities in different companies and organizations have been on the increase in recent years. This growing number of applicants poses a great challenge to recruiting organizations, which need to screen this teeming pool of applicants efficiently and effectively. This paper presents an applicant recruitment system using fuzzy logic. It focuses on the selection of suitable applicants, ranking them from the most suitable downward, using the Ministry of Works, Bayelsa State, Nigeria, as a case study. We adopted the Structured Systems Analysis and Design Methodology, and the PHP (Hypertext Preprocessor) programming language was used for the implementation of the system. The developed system is a flexible and reliable automated personnel recruitment system that enables personnel managers to quickly assess and recruit the most suitable applicants for a given job, based on weighted fuzzy scores over criteria attached to the job applied for and the qualifications needed for the position.
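
A minimal sketch of weighted fuzzy scoring in the spirit described above, not the authors' PHP system: the criteria, weights, and membership breakpoints are all assumptions for illustration.

```python
# Hypothetical fuzzy membership functions and weights; the paper's
# actual criteria and scores are not reproduced here.
def fuzzify_years(years: float) -> float:
    """Membership in 'experienced': 0 at 0 years, 1 at 10+ years."""
    return max(0.0, min(1.0, years / 10.0))

def fuzzify_grade(grade: float) -> float:
    """Membership in 'well qualified' on a 0-5 qualification grade."""
    return max(0.0, min(1.0, grade / 5.0))

WEIGHTS = {"experience": 0.6, "qualification": 0.4}  # assumed weights

def score(applicant: dict) -> float:
    return (WEIGHTS["experience"] * fuzzify_years(applicant["years"])
            + WEIGHTS["qualification"] * fuzzify_grade(applicant["grade"]))

applicants = [
    {"name": "A", "years": 7, "grade": 4.0},
    {"name": "B", "years": 2, "grade": 5.0},
]
# Rank from most to least suitable by fuzzy score.
for a in sorted(applicants, key=score, reverse=True):
    print(a["name"], round(score(a), 3))
```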

With the recent boosted enthusiasm in Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite the ever-growing achievements, the biggest obstacle in most AI systems is their lack of interpretability. This deficiency of transparency limits their application in different domains, including credit scoring. Credit scoring systems help financial experts make better decisions regarding whether or not to accept a loan application, so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. A recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides the different explanations (i.e. global, local feature-based and local instance-based) required by different people in different situations. Evaluation through functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple, consistent, correct, effective, easy to understand, sufficiently detailed and trustworthy.
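
A minimal sketch, not the authors' pipeline: an XGBoost classifier with a crude global explanation via gain-based feature importances. The synthetic features and labels are placeholders; HELOC/LC loading, imbalance handling, and the 360-degree explanation framework are omitted.

```python
# Placeholder applicant features; y = 1 marks a defaulting loan.
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      eval_metric="logloss")
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

# Global view: rank features by learned importance.
for i in np.argsort(model.feature_importances_)[::-1][:5]:
    print(f"feature {i}: importance {model.feature_importances_[i]:.3f}")
```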

This paper describes our own implementation of a regular expression preprocessor written in PHP. It extends regular expression functionality by allowing users to define named segments, including custom character classes, matching groups, etc. The preprocessor allows for writing complex regular expressions that are simpler to maintain. In addition, this paper presents a use case illustrating the practical utilisation of the preprocessor, together with a comparison of expressions written with and without user-defined segments.
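
The paper's implementation is in PHP; the following Python sketch only illustrates the underlying idea. The `%{name}` placeholder syntax and the segment table are assumptions, not the paper's notation.

```python
# Expand named segments into their definitions before compiling,
# so long expressions can be assembled from reusable, named pieces.
import re

SEGMENTS = {                      # hypothetical user-defined segments
    "digit2": r"\d{2}",
    "sep":    r"[-/.]",
    "year":   r"(?:19|20)\d{2}",
}

def preprocess(pattern: str) -> str:
    """Replace every %{name} placeholder with its segment definition."""
    return re.sub(r"%\{(\w+)\}",
                  lambda m: SEGMENTS[m.group(1)],
                  pattern)

date_re = re.compile(preprocess(r"^%{digit2}%{sep}%{digit2}%{sep}%{year}$"))
print(bool(date_re.match("31-12-2023")))  # True
```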

Big data is an assemblage of large and complex data that is difficult to process with traditional DBMS tools. The scale, diversity, and complexity of this huge data demand new analytics techniques to extract useful and hidden value from it. Data must be prepared before mining starts, as real data is often not suitable for mining, and poor quality leads to poor results. This paper presents the needs, various problems, and solutions involved in the preprocessing of big data.

Anterior surface heart ischemia is one of the most prominent heart diseases. Chest cardiac mapping using multi-electrode systems for chest leads increases the diagnostic power over the traditional chest-lead ECG. Noise is one of the most apparent problems in cardiac mapping, as it decreases the fidelity of the signals. In this paper we propose a new technique for signal de-noising and for the presentation of chest cardiac maps. Using the 3D wavelet transform, we apply sensitivity analysis to the wavelets of the Daubechies (db) family to find the most suitable wavelet for each chest lead at each position. We compute the signal-to-noise ratio (SNR) for each chest lead at each position to measure the quality of the de-noising techniques, focusing on the most important chest leads. By applying db4, db8, and db11 at selected chest-lead positions, we are able to achieve optimal de-noising. The resulting cardiac maps have proven to be of diagnostic value for assessing the bio-potential state of anterior surface ischemia of the heart.
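
The paper applies a 3D wavelet transform; the following 1-D sketch with PyWavelets only illustrates the threshold-and-compare idea of ranking Daubechies wavelets by SNR. The synthetic signal, noise level, and universal-threshold rule are assumptions.

```python
# De-noise a synthetic ECG-like signal with several Daubechies
# wavelets and compare them by SNR against the clean reference.
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
noisy = clean + 0.2 * rng.normal(size=t.size)   # placeholder chest lead

def denoise(signal, wavelet, level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # noise estimate
    thr = sigma * np.sqrt(2 * np.log(signal.size))    # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: signal.size]

def snr_db(reference, estimate):
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / np.sum(noise**2))

for name in ("db4", "db8", "db11"):
    print(name, round(snr_db(clean, denoise(noisy, name)), 2), "dB")
```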

Document Analysis and Recognition (DAR) aims to automatically extract the information in a document and also addresses human comprehension. The automatic processing of degraded historical documents is an application of the document image analysis field, which is confronted with many difficulties due to storage conditions and the complexity of the script. The main interest in enhancing historical documents is to remove undesirable artifacts that appear in the background and to highlight the foreground, so as to enable automatic recognition of documents with high accuracy. This paper addresses the pre-processing and segmentation of ancient scripts, as an initial step in automating the task of an epigraphist in reading and deciphering inscriptions. Pre-processing involves enhancement of degraded ancient document images, achieved through four different spatial filtering methods for smoothing or sharpening, namely the Median, Gaussian blur, Mean, and Bilateral filters, with different mask sizes. This is followed by binarization of the enhanced image, using Otsu's thresholding algorithm, to highlight the foreground information. In the second phase, segmentation is carried out using Drop Fall and Water Reservoir approaches to obtain sampled characters, which can be used in later stages of OCR. The system showed good results when tested on nearly 150 samples of degraded epigraphic images of varying quality, giving the best enhanced output with a 4x4 mask for the Median filter, a 2x2 mask for Gaussian blur, and 4x4 masks for the Mean and Bilateral filters. The system can effectively sample characters from enhanced images, giving segmentation rates of 85%-90% for both the Drop Fall and Water Reservoir techniques.
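
A minimal sketch of the filtering-plus-Otsu pipeline with OpenCV, not the authors' system: OpenCV's median and Gaussian filters require odd kernel sizes, so odd masks stand in for the paper's even ones, and the input path is hypothetical.

```python
# Spatial filtering for enhancement, then Otsu binarization to
# separate foreground strokes from the degraded background.
import cv2

img = cv2.imread("inscription.png", cv2.IMREAD_GRAYSCALE)  # assumed path

median    = cv2.medianBlur(img, 3)               # median smoothing
gaussian  = cv2.GaussianBlur(img, (3, 3), 0)     # Gaussian blur
mean      = cv2.blur(img, (4, 4))                # 4x4 mean filter
bilateral = cv2.bilateralFilter(img, 5, 75, 75)  # edge-preserving smoothing

# Otsu's method picks the binarization threshold automatically.
_, binary = cv2.threshold(median, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("binarized.png", binary)
```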

In the age of digital networks, every high-efficiency, high-profit activity has to harmonize with the internet. Business behaviors and activities are always the precursor to high efficiency and high profit; consequently, each business activity has to adjust in order to integrate with the internet. Business extension and promotion carried out over the internet is generally called Electronic Commerce (E-commerce). The quality of web-based customer service is the capability of a firm's website to provide individual heed and attention. In today's scenario, personalization has become a vital business problem in various e-commerce applications, ranging across dynamic web content presentations. In our paper, an iterative technique partitions customers by directly combining the transactional data of various consumers, forming a distinct behavior profile for each group, and the best customers are acquired by applying the IE (Iterative Evolution), ID (Iterative Diminution), and II (Iterative Intermingle) algorithms. The quality of the clustering is improved via Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). In this paper these two algorithms are compared, and it is found that the iterative technique combined with Particle Swarm Optimization (PSO) is better than Ant Colony Optimization (ACO); the results show that the PSO algorithm outperforms the ACO-based methods. Finally, quality is superior, response time is improved, and cost-wise performance, accuracy, and efficiency are all increased.
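
A minimal sketch of PSO applied to clustering quality, not the paper's IE/ID/II algorithms: each particle encodes k cluster centroids, and the swarm minimizes within-cluster squared error. The data, swarm size, and inertia/acceleration constants are assumptions.

```python
# Particle Swarm Optimization of cluster centroids on toy 2-D data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2)) + rng.integers(0, 3, size=(200, 1)) * 4.0
K, N_PART, ITERS = 3, 20, 100
W, C1, C2 = 0.7, 1.5, 1.5          # inertia / cognitive / social weights

def sse(centroids):
    """Within-cluster sum of squared distances to nearest centroid."""
    d = np.linalg.norm(data[:, None, :] - centroids[None], axis=2)
    return np.sum(d.min(axis=1) ** 2)

pos = rng.normal(size=(N_PART, K, 2)) * 4.0    # each particle = K centroids
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([sse(p) for p in pos])
g = pbest[pbest_val.argmin()].copy()           # global best

for _ in range(ITERS):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (g - pos)
    pos += vel
    vals = np.array([sse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    g = pbest[pbest_val.argmin()].copy()

print("best within-cluster SSE:", round(pbest_val.min(), 2))
```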

This file provides an introductory guide to working with the EEGLAB toolbox.

Handwriting is one of the most important means of daily communication. Although the problem of handwriting recognition has been studied for more than 60 years, there are still many open issues, especially in the task of unconstrained handwritten sentence recognition. This paper focuses on an automatic system that recognizes continuous English sentences drawn through mouse-based gestures in real time, based on an Artificial Neural Network. The proposed network is trained using the traditional backpropagation algorithm, which provides the system with great learning ability and has proven highly successful for training feed-forward Artificial Neural Networks. The designed algorithm is capable of translating not only discrete gesture moves but also continuous gestures through the mouse. In this paper we use an efficient neural network approach for recognizing English sentences drawn with a mouse. This approach shows an efficient way...
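
A minimal sketch of a feed-forward network trained with plain backpropagation, not the paper's architecture: the layer sizes, learning rate, and the XOR toy task stand in for the paper's gesture features and sentence labels.

```python
# Tiny 2-8-1 network trained by backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)        # toy target

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(5000):
    # Forward pass through hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: squared-error loss, sigmoid derivatives.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # approaches [0, 1, 1, 0]
```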

Brain tumor segmentation aims to separate the different tumor tissues, such as active cells, necrotic core, and edema, from the normal brain tissues of White Matter (WM), Gray Matter (GM), and Cerebrospinal Fluid (CSF). MRI-based brain tumor segmentation studies have attracted more and more attention in recent years due to the non-invasive imaging and good soft-tissue contrast of Magnetic Resonance Imaging (MRI). After almost two decades of development, innovative approaches applying computer-aided techniques for segmenting brain tumors are becoming more and more mature and are coming closer to routine clinical application. The purpose of this paper is to provide a comprehensive overview of MRI-based brain tumor segmentation methods. First, a brief introduction to brain tumors and their imaging modalities is given. Then, the preprocessing operations and the state-of-the-art methods of MRI-based brain tumor segmentation are introduced. Moreover, the evaluation and validation of the results of MRI-based brain tumor segmentation are discussed. Finally, an objective assessment is presented, and future developments and trends for MRI-based brain tumor segmentation methods are addressed.

Text mining is the process of extracting interesting and non-trivial knowledge or information from unstructured text data. Text mining is a multidisciplinary field which draws on data mining, machine learning, information retrieval, computational linguistics and statistics. Important text mining processes are information extraction, information retrieval, natural language processing, text classification, content analysis and text clustering. All these processes require the pre-processing step to be completed before performing their intended task. Pre-processing significantly reduces the size of the input text documents; the actions involved in this step are sentence boundary determination, language-specific stop-word elimination, tokenization and stemming. Among these, the most essential and important action is tokenization, which divides the textual information into individual words. There are many open source tools available for performing tokenization. The main objective of this work is to analyze the performance of seven open source tokenization tools. For this comparative analysis, we have taken Nlpdotnet Tokenizer, Mila Tokenizer, NLTK Word Tokenize, TextBlob Word Tokenize, MBSP Word Tokenize, Pattern Word Tokenize and Word Tokenization with Python NLTK. Based on the results, we observed that the Nlpdotnet Tokenizer performs better than the other tools.
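
A minimal sketch, not the paper's benchmark, showing two of the surveyed tools tokenizing the same sentence: NLTK's word tokenizer and TextBlob's. The sample sentence is an assumption; `nltk.download` fetches the tokenizer models on first use.

```python
import nltk
from textblob import TextBlob

nltk.download("punkt", quiet=True)  # tokenizer models, first run only

text = "Pre-processing reduces the size of input text documents."

print(nltk.word_tokenize(text))   # NLTK word tokenizer
print(TextBlob(text).words)       # TextBlob word tokenizer (drops punctuation)
```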

Intelligent multi-agent systems have great potential for use in different applications and research areas. One of the important issues in applying intelligent multi-agent systems in real-world and virtual environments is developing a framework that supports a machine learning model able to reflect the whole complexity of the real world. In this paper, we propose a framework for an intelligent-agent-based neural network classification model to close the gap between two applicable flows: intelligent multi-agent technology and learning models drawn from real environments. We consider the new Supervised Multilayer Feed-Forward Neural Network (SMFFNN) model as the intelligent classifier for the learning model in the framework. The framework obtains information from the respective environment, and its behavior can be recognized through the learned weights. Therefore, the SMFFNN model that lies within the framework gives greater benefit in finding suitable information and realistic weights from the environment, resulting in better recognition. The framework is applicable to different domains; as a potential case study, a clinical organization and its domain are considered for the proposed framework.
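
A minimal sketch of the learning component only, with scikit-learn's MLPClassifier standing in for the SMFFNN model (an assumption, not the authors' implementation): the clinical-style features and labels are synthetic placeholders.

```python
# Supervised multilayer feed-forward classification on placeholder
# clinical features, as a stand-in for the framework's SMFFNN stage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))               # assumed clinical features
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # assumed diagnosis label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500,
                    random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```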