Support Vector Machines (SVMs) Research Papers

Objective: The objective of this paper is to highlight the state-of-the-art machine learning (ML) techniques in computational docking. The use of smart computational methods in the life cycle of drug design is a relatively recent development that has gained much popularity and interest over the last few years. Central to this methodology is the notion of computational docking, which is the process of predicting the best pose (orientation + conformation) of a small molecule (drug candidate) when bound to a larger target receptor molecule (protein) in order to form a stable complex molecule. In computational docking, a large number of binding poses are evaluated and ranked using a scoring function. The scoring function is a mathematical predictive model that produces a score representing the binding free energy, and hence the stability, of the resulting complex molecule. Generally, such a function should produce a set of plausible ligands ranked according to their binding stability, along with their binding poses. In more practical terms, an effective scoring function should produce promising drug candidates which can then be synthesized and physically screened using a high-throughput screening process. Therefore, the key to computer-aided drug design is the design of an efficient, highly accurate scoring function (using ML techniques).
Methods: The methods presented in this paper are specifically based on ML techniques. Although many traditional techniques have been proposed, their performance was generally poor. Only in the last few years has ML technology been applied to the design of scoring functions, and the results have been very promising.
Material: The ML-based techniques are based on various molecular features extracted from the abundance of protein-ligand information in public molecular databases, e.g., Protein Data Bank bind (PDBbind).
Results: In this paper, we present this paradigm shift, elaborating on the main constituent elements of the ML approach to molecular docking along with the state-of-the-art research in this area. For instance, the best random forest (RF)-based scoring function (Li, 2014) on PDBbind v2007 achieves a Pearson correlation coefficient between the predicted and experimentally determined binding affinities of 0.803, while the best conventional scoring function achieves 0.644 (Cheng, 2009). In terms of ranking power, the best RF-based scoring function (Ashtawy, 2012) ranks the ligands correctly based on their experimentally determined binding affinities with an accuracy of 62.5% and identifies the top-binding ligand with an accuracy of 78.1%.
Conclusions: We conclude with open questions and potential future research directions that can be pursued in smart computational docking: using molecular features of different natures (geometrical, energy terms, pharmacophore), applying advanced ML techniques (e.g., deep learning), and combining more than one ML model.
Keywords:
machine learning, random forest, support vector machine, drug discovery, computational docking, scoring function, virtual screening, complex binding affinity, ligands ranking accuracy, force field interaction, pharmacophore fingerprint.
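
The Pearson correlation coefficient quoted in the Results of this abstract compares predicted with experimentally determined binding affinities. A minimal sketch of that computation follows, using only the Python standard library. The function name `pearson_r` and the sample affinity values are illustrative assumptions, not data or code from the paper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured binding affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical affinities, for illustration only (not taken from PDBbind).
predicted = [5.1, 6.3, 7.0, 4.2, 8.1]
measured = [5.0, 6.0, 7.5, 4.0, 8.4]
print(round(pearson_r(predicted, measured), 3))
```

A value close to 1 means the scoring function orders and scales complexes much as the experiments do; a value near 0 means its scores carry little linear information about the measured affinity.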

Classification is one of the most predominant tasks in a wide range of applications such as sentiment analysis in text, voice recognition, image recognition, genetic engineering, and data classification. Though many efficient classification algorithms have been introduced in the past few decades, the drastic increase in the amount of data generated across industry and academia has created a demand for classification algorithms with very high accuracy and robustness. This paper presents a new approach to enhance classifier accuracy by combining a Support Vector Machine (a classification algorithm) with the K-Means clustering algorithm and, finally, using K-Nearest Neighbours to make the optimal choice on the classification problem. Experiments have shown that this new methodology increases classification accuracy and thus serves the intended purpose.

A Cyber Supply Chain (CSC) system is complex and involves different subsystems performing various tasks. Security in the supply chain is challenging because of the inherent vulnerabilities and threats from any part of the system, which can be exploited at any point within the supply chain. This can cause a severe disruption to overall business continuity. Therefore, it is of paramount importance to understand and predict the threats so that an organization can undertake the necessary control measures for supply chain security. Cyber Threat Intelligence (CTI) provides an intelligence analysis to make unknown threats known, using various properties including threat actor skill and motivation, Tactics, Techniques, and Procedures (TTP), and Indicators of Compromise (IoC). This paper aims to analyse and predict threats to improve cyber supply chain security. We have applied CTI with Machine Learning (ML) techniques to analyse and predict the threats based on the CTI properties. This allows the inherent CSC vulnerabilities to be identified so that appropriate control actions can be undertaken for overall cybersecurity improvement. To demonstrate the applicability of our approach, CTI data is gathered and a number of ML algorithms, i.e., Logistic Regression (LG), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT), are used to develop predictive analytics using the Microsoft Malware Prediction dataset. Vulnerabilities and Indicators of Compromise (IoC) are treated as output parameters. The prediction results reveal that spyware/ransomware and spear phishing are the most predictable threats in CSC. We have also recommended relevant controls to tackle these threats. We advocate using CTI data with the ML predictive model for overall CSC cybersecurity improvement.

In this paper, we explain our traffic detection technique using OpenCV, neural networks, and TensorFlow, and how it successfully detects and identifies vehicles and other roadside attributes such as pedestrians, signs, and lane markings for a thorough analysis of a road surveillance camera image. Our pre-trained SVM model is highly efficient and accurate in performing the desired task. The paper should be of interest to readers in the areas of image processing, neural networks, and machine learning.

The leading cause of death among women in developed nations is breast cancer. Breast cancer has been identified as one of the deadliest types of cancer prevalent among women globally, and there has been a dramatic increase in breast cancer cases among women recently. Machine learning algorithms are effective tools that have found application in the field of medical imaging for early detection and diagnosis of cancer. This paper investigates the performance of eight (8) machine learning algorithms that have been applied for timely detection of breast cancer. Diagnosing breast cancer involves making a distinction between benign and malignant breast lumps. Our experimental results indicated that the Support Vector Machine (SVM) has the best performance in terms of classification accuracy (97.07%) and the lowest error rate compared to Radial Basis Function (96.49%), Simple Linear Logistic Regression Model (96.78%), Naïve Bayes (96.48%), k-Nearest Neighbour (96.34%), AdaBoost (96...

The alarming spread of COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2, or SARS-CoV-2) led scientists to make tremendous efforts to reduce the effects of the pandemic. COVID-19, first identified in 2019, was declared a pandemic and has affected millions of people. Infected people may experience headache, body pain, and sometimes difficulty in breathing. For older people, the symptoms can be worse. It can also cause death because of its severe effect on some parts of the human body, particularly in those who have chronic diseases such as diabetes. Machine learning algorithms are applied to patients diagnosed with the coronavirus to estimate, at an early stage, the severity of the disease depending on their chronic diseases. Chronic diseases can raise the severity of COVID-19, which is what this paper demonstrates. This paper applies different machine learning techniques such as random forest, decision tree, linear regression, binary search, and k-nearest neighbor to a dataset of Mexican patients to determine the impact of lifelong illnesses on the severity of the symptoms of the virus. In addition, the paper demonstrates that in some cases, especially for older people, the virus can cause death.

Tuberculosis (TB) is a threat that hinders the socioeconomic development of countries burdened with TB cases. 75% of TB cases are documented in the productive age group of 15-54 years. The definitive diagnostic methods are time-consuming and expensive, and lack the sensitivity to recognize all TB cases at all stages. The development of Computer-Aided Detection (CAD) systems will facilitate mass screening. In this work, we experimented with a spatial pyramid of Speeded-Up Robust Features (SURF) for diagnosing TB. Although dense information representing the lung anatomy would be expected to generalize well, the empirical results suggest otherwise. The SURF descriptors are extracted from grid windows of several sizes and concatenated together. The SVM classifier with a sigmoid kernel achieved an AUC score of 89% with a grid size of 64 pixels, compared to only 73% with the concatenated spatial pyramid features.
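
The AUC score used above to compare the two feature settings can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the sample labels and scores are illustrative assumptions and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision scores for four images (1 = TB, 0 = normal).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the number of samples, which is fine for small screening sets; a rank-based formulation gives the same value more efficiently on large ones.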

Worm gearboxes (WGs) are used in many machines for industrial purposes. Therefore, it is important to detect and prevent faults in WGs for efficient operation of the machines. In this study, vibration data, sound data, and thermal images collected from a test rig developed to detect faults in WGs were analysed. To discriminate faults caused by several environmental factors, a classification methodology is designed. A feature extraction step is applied to each data set, and the data are then classified with the k-nearest neighbour (k-NN) and support vector machine (SVM) algorithms. The fault classification performances of the k-NN and SVM algorithms are presented comparatively. Although the accuracy rates are close, SVM is observed to perform better than k-NN for fault classification problems in WGs.
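
For readers unfamiliar with k-NN, one of the two classifiers compared in this study, here is a minimal standard-library sketch of majority-vote k-NN with Euclidean distance. The two-dimensional feature vectors and the "normal"/"fault" labels are hypothetical placeholders; the abstract does not specify the extracted features or the value of k.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort by Euclidean distance; the index breaks ties and avoids comparing labels.
    nearest = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))[:k]
    votes = Counter(train_y[i] for _, i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features (e.g. two summary statistics of a vibration signal).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["normal", "normal", "normal", "fault", "fault", "fault"]
print(knn_predict(train_X, train_y, (0.2, 0.1)))
```

Unlike an SVM, k-NN has no training phase: all the work happens at prediction time, which is one reason the two methods can behave differently on the same features.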

Human capital is the key factor in maintaining the competitiveness of an organization, which requires having enough of the right people with the right skills. With advances in technology, machine learning techniques can be used to identify the right employee for the right task by classifying their performance achievement. The Support Vector Machine (SVM) is a powerful supervised machine learning technique for classification because it uses the kernel trick, with the ability to build expert knowledge of the problem via a kernel engineering process. In this study, the Sequential Minimal Optimization (SMO) algorithm from the SVM technique is the chosen method because of its capability to solve most convex optimization problems. This study consists of four phases: data collection, data preparation, model development, and model evaluation. In the experimental phase, selected academician performance achievement data from Malaysian higher institutions have been used as the training dataset, based on 10-fold cross-validation. Several experiments were carried out using different sets of training and testing datasets to evaluate the accuracy of the model. As a result, the accuracy of the proposed model is considered acceptable but needs further enhancement. For future work, to enhance the accuracy of the proposed model, a comparative study should be conducted using other SVM algorithms such as Grid Search and Gabriel graph algorithms that focus on reducing the size of a training set.
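
The 10-fold cross-validation mentioned in this abstract splits the data into ten parts, each serving once as the test set while the other nine are used for training. A minimal sketch of the index bookkeeping follows. The generator name `k_fold_indices` is an assumption for illustration, and the sketch does not shuffle the data, which a real experiment would normally do first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Example: 25 hypothetical records split into 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```

The reported accuracy is then typically the mean of the per-fold accuracies, which gives a less optimistic estimate than evaluating on the training data.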

This study employed a Support Vector Machine (SVM) for the classification and prediction of fire outbreaks, based on a fire outbreak dataset captured with the Fire Outbreak Data Capture Device (FODCD). The FODCD was developed to capture the environmental parameter values used in this work. It comprised a DHT11 temperature sensor, an MQ-2 smoke sensor, an LM393 flame sensor, and an ESP8266 Wi-Fi module, connected to an Arduino Nano v3.0 board. 700 data points were captured using the FODCD, with 60% of the dataset used for training and 20% each used for testing and validation. The SVM model was evaluated using the True Positive Rate (TPR), False Positive Rate (FPR), Accuracy, Error Rate (ER), Precision, and Recall performance metrics. The performance results show that the SVM algorithm can predict cases of fire outbreak with an accuracy of 80% and a minimal error rate of 0.2%. This system was able to predict cases of fire outbreak with a higher degree of accuracy. The results indicate that using sensors to capture real-world data, together with a machine learning algorithm such as the support vector machine, gives a better result for the problem of fire management.
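
The six metrics listed in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The function names and the counts are hypothetical and are not the study's results. Under these definitions the error rate is the complement of accuracy, so an accuracy of 0.80 corresponds to an error rate of 0.20.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "tpr": _ratio(tp, tp + fn),        # true positive rate, same as recall
        "fpr": _ratio(fp, fp + tn),        # false positive rate
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


# Hypothetical counts for a 140-sample test split (20% of 700 points).
metrics = binary_metrics(tp=60, fp=15, tn=52, fn=13)
print(metrics["accuracy"], metrics["error_rate"])
```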

As a primary defense technique, intrusion detection is becoming increasingly significant, since network security is one of the most critical issues in the world. We present an adaptive collaborative intrusion detection method to improve the safety of a network. A self-adaptive and collaborative intrusion detection model is built by applying the Environments-classes, agents, roles, groups, and objects (E-CARGO) model. The objects, roles, agents, and groups are designed using decision trees (DTs) and support vector machines (SVMs), and adaptive scheduling mechanisms are set up. The KDD CUP 1999 data set is used to verify the effectiveness of the method. The experimental results demonstrate the feasibility and efficiency of the proposed collaborative and adaptive intrusion detection method. The proposed method is also shown to outperform methods that use a set of single-type SVMs in terms of detection precision and recall rates.

The ability of Minkowski Functionals to characterize local structure in different biological tissue types has been demonstrated in a variety of medical image processing tasks. We introduce anisotropic Minkowski Functionals (AMFs) as a novel variant that captures the inherent anisotropy of the underlying gray-level structures. To quantify the anisotropy characterized by our approach, we further introduce a method to compute a quantitative measure motivated by a technique utilized in MR diffusion tensor imaging, namely fractional anisotropy. We showcase the applicability of our method in the research context of characterizing the local structure properties of trabecular bone micro-architecture in the proximal femur as visualized on multi-detector CT. To this end, AMFs were computed locally for each pixel of ROIs extracted from the head, neck and trochanter regions. Fractional anisotropy was then used to quantify the local anisotropy of the trabecular structures found in these ROIs and to compare its distribution in different anatomical regions. Our results suggest a significantly greater concentration of anisotropic trabecular structures in the head and neck regions when compared to the trochanter region (p < 10^-4). We also evaluated the ability of such AMFs to predict bone strength in the femoral head of proximal femur specimens obtained from 50 donors. Our results suggest that such AMFs, when used in conjunction with multi-regression models, can outperform more conventional features such as BMD in predicting failure load. We conclude that such anisotropic Minkowski Functionals can capture valuable information regarding directional attributes of local structure, which may be useful in a wide scope of biomedical imaging applications.

MRI is the most important technique for detecting tumors in various parts of the body. This paper surveys various data mining methods used for the classification of MRI images. A new hybrid technique for brain tumor classification, based on the support vector machine (SVM) and fuzzy c-means (FCM), is studied. In this algorithm, the image is first enhanced using techniques such as contrast improvement and mid-range stretch. FCM clustering is then used to segment the image and detect the suspicious region in the brain MRI image.

Data mining is the process of detecting patterns in large amounts of data, as part of knowledge discovery. Classification is a form of data analysis that extracts a model describing important data classes. One of the outstanding classification methods in data mining is support vector machine (SVM) classification. It is capable of predicting results and is often more effective than other classification methods. The SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of regression, classification, and clustering problems in diverse domains such as gene expression and web text mining. In this study, we propose a new model for classifying the iris data set using an SVM classifier, with a genetic algorithm to optimize the C and gamma parameters of a linear SVM. In addition, the principal component analysis (PCA) algorithm was used for feature reduction.
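
As general SVM background, and not as a description of this paper's specific setup, the two tuned parameters enter the standard formulation as follows. C weights the slack penalty in the soft-margin objective, while gamma is usually the width parameter of the RBF kernel and does not appear in the plain linear kernel. The abstract does not say which kernel parameterization was actually used, so the pairing below is an assumption for illustration.

```latex
% Soft-margin SVM primal problem: C trades margin width against training errors.
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0

% Linear kernel (no gamma) versus RBF kernel (gamma controls the kernel width).
K_{\text{linear}}(x,x') = x^{\top}x', \qquad
K_{\text{RBF}}(x,x') = \exp\bigl(-\gamma\,\lVert x-x'\rVert^{2}\bigr)
```

A larger C penalizes misclassified training points more heavily, and a larger gamma makes each training point's influence more local. A search procedure such as a genetic algorithm explores this two-dimensional space for the pair with the best validation accuracy.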

Cancer has become a leading cause of death worldwide. Examining medical images to discover tumors and their types requires specialized experience in understanding such images. Machine learning techniques are needed to analyse these images with high accuracy and speed and to avoid errors caused by a lack of experience. In this paper, the authors study the support vector machine (SVM), a machine learning technique used to classify brain images. SVM is used to analyse brain images and distinguish benign from malignant tumors using MATLAB. The results of the experiments show the accuracy of the system in classifying the tumor types (benign, malignant) found in medical brain images. The study is restricted to images containing only these two types of tumors. In the future, pre-processing procedures will be added to the brain images prior to classification.

The quality of fresh banana fruit is a main concern for consumers and fruit companies. Effective and fast classification of a banana's maturity stage is the most decisive factor in determining its quality. It is necessary to design and implement image processing tools for correct ripening-stage classification of incoming fresh banana bunches. Ripeness in banana fruit generally affects the eating quality and the market price of the fruit. In this paper, an automatic computer vision system is proposed to identify the ripening stages of bananas. First, a four-class homemade database is prepared. Second, an artificial neural network-based framework that uses color, development of brown spots, and Tamura statistical texture features is employed to classify and grade the banana ripening stage. The results and performance of the proposed system are compared with various techniques such as SVM, naive Bayes, KNN, decision tree, and discriminant analysis classifiers. The results reveal that the proposed system has the highest overall recognition rate, 97.75%, among the compared techniques.

Generating accurate and timely internal and external audit reports may seem difficult for some auditors because of limited time or expertise in matching the correct clauses of the standard with the textual statement of findings. To overcome this gap, this paper presents the design of text classification models using a support vector machine (SVM) and a long short-term memory (LSTM) neural network to automatically classify audit findings and standard requirements according to text patterns. Specifically, the study explored the optimization of datasets, holdout percentage, and the vocabulary of learned words called NumWords, then analyzed their capability to predict the training accuracy and timeliness of the proposed text classification models. The study found that SVM (96.74%) and LSTM (97.54%) were on par with each other in terms of the best training accuracy, although SVM (67.96±17.93 seconds [s]) was found to be significantly faster than LSTM (136.67±96.42 s) at any dataset size. The study proposed optimization formulas that highlight dataset and holdout as predictors of accuracy, and dataset and NumWords as predictors of timeliness. In terms of actual implementation, both classification models were able to accurately classify 20 out of 20 sample audit findings, at 1 and 3 s, respectively. Hence, the choice between the two algorithms depends on the dataset size, learned words, holdout percentage, and workstation speed. This paper is part of a series that explores the use of Artificial Intelligence (AI) techniques in optimizing the performance of QMS in the context of a state university.

This article introduces an efficient approach to detect and identify unhealthy tomato leaves using image processing techniques. The proposed approach consists of three main phases: pre-processing, feature extraction, and classification. Since texture is one of the most important features describing a tomato leaf, the proposed system uses the Gray-Level Co-occurrence Matrix (GLCM) to detect and identify the state of the tomato leaf, that is, whether it is healthy or infected. The Support Vector Machine (SVM) algorithm with different kernel functions is used for the classification phase. A dataset of 800 images of healthy and infected tomato leaves was used for both the training and testing stages. N-fold cross-validation is used to evaluate the performance of the presented approach. Experimental results showed that the proposed classification approach obtained a classification accuracy of 99.83% using a linear kernel function.

The context of this work is the development of a personality recognition system using machine learning techniques. Identifying personality traits from a face image is helpful in many situations, such as identifying criminal behavior in criminology, assessing students' learning attitudes in the education sector, and recruiting employees. Identifying personality traits from a face image has rarely been studied. In this research, it is approached with three separate methods: ANN, SVM, and deep learning. The face area of an image is identified by a color segmentation algorithm, and the extracted image is then input to the personality recognition process. Features of the face are identified manually and input to the ANN and SVM. Each personality trait is valued from 1 to 9. In the second attempt, m-SVM is used because the outputs are multi-valued. ANN gave better results than m-SVM. In the third attempt, we propose a methodology to identify personality traits using deep learning.

The suspended sediment load (SSL) is one of the major hydrological processes affecting the sustainability of river planning and management. Moreover, sediments have a significant impact on dam operation and reservoir capacity. To this end, reliable and applicable models are required to compute and classify the SSL in rivers. The application of machine learning models has become common to solve complex problems such as SSL modeling. The present research investigated the ability of several models to classify the SSL data. This investigation aims to explore a new version of machine learning classifiers for SSL classification at Johor River, Malaysia. Extreme gradient boosting, random forest, support vector machine, multi-layer perceptron and k-nearest neighbors classifiers have been used to classify the SSL data. The sediment values are divided into multiple discrete ranges, where each range can be considered as one category or class. This study illustrates two different scenarios related to the number of categories, which are five and 10 categories, with two time scales, daily and weekly. The performance of the proposed models was evaluated by several statistical indicators. Overall, the proposed models achieved excellent classification of the SSL data under various scenarios.
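
Turning a continuous quantity such as sediment load into class labels, as described above, amounts to choosing range boundaries and mapping each value to the range it falls in. The sketch below uses equal-width ranges. That choice, the function names, and the sample values are assumptions for illustration, since the abstract does not state how the ranges were defined.

```python
from bisect import bisect_right


def equal_width_edges(values, n_classes):
    """Interior bin edges that split [min, max] into n_classes equal-width ranges."""
    lo, hi = min(values), max(values)
    if n_classes < 1 or hi == lo:
        raise ValueError("need n_classes >= 1 and non-constant values")
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(1, n_classes)]


def to_class(value, edges):
    """Class index in 0..len(edges); a value equal to an edge goes to the upper range."""
    return bisect_right(edges, value)


# Hypothetical sediment-load readings, split into five categories.
readings = [0, 10, 20, 30, 40, 50, 100]
edges = equal_width_edges(readings, 5)
print([to_class(v, edges) for v in readings])
```

Equal-width ranges can leave some classes nearly empty when the data are skewed; quantile-based edges are a common alternative when balanced classes matter.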

The rising number of road accidents is a worldwide problem. These disastrous pileups are a main cause of increasing death rates and casualties worldwide. The many grim consequences of these disasters create a need for effective analysis and prediction of road accidents with the aid of powerful analysis tools and techniques. In this project, we analyse all the possible aspects of accidents happening in India and predict the number of accidents in the forthcoming years using data mining techniques in Python. Various methods such as PCA, SVM, and classification and clustering algorithms are used in the process. With the help of predictive analysis and time forecasting methods, the number of road accidents likely to occur in the upcoming years is predicted with greater accuracy for different states. An Android application is developed that alerts travelers to nearby potholes and bumps on the roads so that they can take precautionary measures while travelling. This analysis and prediction can help the RTA to plan accordingly and reduce the number of road accidents by adopting safety measures for the welfare of society.

Sign language is a tool used by deaf people for communication. They use different gestures to express their thoughts and communicate with people. Each gesture or movement of the hand has a specially assigned meaning. Previous studies on hand gestures used the Kinect camera, detection gloves, and the Leap Motion controller to improve the accuracy of detecting the implied meanings. This study aims to develop a model that increases the detection accuracy rate, using a customized camera that addresses the background and lighting conditions. Moreover, the study will provide learning for people who are not familiar with sign languages. The model was tested on 28 gestures, together with the forming of words, and achieved an accuracy rate of 94.49%.

Text classification using SVM and naive Bayes machine learning

Recommender systems can be implemented in several fields, ranging from e-commerce to security, in the form of personalized services. They assist both consumers and manufacturers by suggesting items to consumers that might not be in demand until the recommendations are given. Users input their preferred hotel features (e.g., pool, gym, restaurant). After the user logs in, the content-based filtering SVM algorithm analyzes the hotel features and generates recommendations of the classes of hotel that offer them. Classification with Support Vector Machines (SVMs) is proposed by storing the user's preferences across several activities and their corresponding characteristics, from which the vectors are built. The collaborative filtering generates filtered hotels based on the user's earlier involvement (reviews, rooms, evaluations, and prices) in some relationship with a statistical method. The classification algorithms considered include K-nearest neighbor, Naïve Bayes, Random Forest, and Support Vector Machine. The SVM was chosen because previous research by Duvvur shows it to be more accurate than the other models.

Plant diseases cause major production and economic losses. To enhance crop production, it is important that plant diseases are identified early so that effective control actions can be taken. This paper discusses the various bacterial and fungal capsicum diseases and how to identify and classify them using image processing techniques. Capsicum is susceptible to various bacterial, fungal and viral diseases, whose symptoms can be distinguished by inspecting the stem, leaf or fruit of the capsicum. The proposed method automatically identifies capsicum diseases and classifies whether the capsicum or its leaf is normal or diseased, i.e., has either a bacterial or a fungal disease. The infected area of the capsicum is extracted by the k-means clustering technique, after which texture features, i.e., GLCM features, are extracted for this infected area. Using these features, the various bacterial and fungal capsicum diseases can be classified with a support vector machine (SVM). Different classifiers, namely Tree, Linear Discriminant, KNN and SVM, are used for training and classification; of these, KNN and SVM give better results for our application. The system is tested on 62 images of healthy and diseased capsicum and its leaves; with SVM these images are classified into healthy and diseased with an accuracy of 100%.

Due to sharp increases in data dimensions, working on every data mining or machine learning (ML) task requires more efficient techniques to get the desired results. Therefore, in recent years, researchers have proposed and developed many methods and techniques to reduce the high dimensions of data and to attain the required accuracy. To improve the accuracy of learning features as well as to decrease the training time, dimensionality reduction is used as a pre-processing step, which can eliminate irrelevant data, noise, and redundant features. Dimensionality reduction (DR) has been performed based on two main methods, which are feature selection (FS) and feature extraction (FE). FS is considered an important method because data is generated continuously at an ever-increasing rate; some serious dimensionality problems can be reduced with this method, such as decreasing redundancy effectively, eliminating irrelevant data, and improving result comprehensibility. Moreover, FE deals with the problem of finding the most distinctive, informative, and reduced set of features to improve the efficiency of both the processing and storage of data. This paper offers a comprehensive approach to FS and FE in the scope of DR. Moreover, the details of each paper, such as the algorithms/approaches used, datasets, classifiers, and achieved results, are comprehensively analyzed and summarized. In addition, a systematic discussion of all of the reviewed methods has been carried out to highlight authors' trends, to determine the method(s) that significantly reduced computational time, and to select the most accurate classifiers. As a result, the different types of both methods have been discussed and the findings analyzed.

The fundamental cause of death among women in developed nations of the world is breast cancer. Breast cancer has been identified as one of the most deadly types of cancer prevalent among women globally. There has been a dramatic increase in breast cancer cases among women recently. Machine learning algorithms are effective tools that have found application in the field of medical imaging for early detection and diagnosis of cancer. This paper investigates the performance of eight (8) machine learning algorithms that have been applied for timely detection of breast cancer. Diagnosing breast cancer involves making a distinction between benign and malignant breast lumps. Our experimental results indicated that the Support Vector Machine (SVM) has the best performance in terms of classification accuracy (97.07%) and the lowest error rate compared to Radial Basis Function (96.49%), Simple Linear Logistic Regression Model (96.78%), Naïve Bayes (96.48%), k-Nearest Neighbour (96.34%), AdaBoost (96.19%), Fuzzy Unordered Rule Induction Algorithm (96.78%) and Decision Tree-J48 (96.48%). All experiments are conducted using the WEKA data mining and machine learning simulation environment.

Any abnormal activity can be assumed to be an anomalous intrusion. Several techniques and algorithms for anomaly detection have been discussed in the literature. In most cases, true positive and false positive parameters have been used to compare their performance. However, depending on the application, a wrong true positive or a wrong false positive may have severe detrimental effects. This necessitates the inclusion of cost-sensitive parameters in the performance evaluation. Moreover, the most common testing dataset, KDD-CUP-99, is very large, which in turn requires a certain amount of pre-processing. Our work in this paper starts by setting out the necessity of cost-sensitive analysis with some real-life examples. After discussing KDD-CUP-99, an approach is proposed for feature elimination and then feature selection, to reduce the feature set to the more relevant features directly and the size of KDD-CUP-99 indirectly. From the reported literature, general methods for anomaly detection that perform best for different types of attacks are selected. These different classifiers are combined to form an ensemble. A cost-opportunistic technique is suggested to allocate relative weights to the classifiers in the ensemble for generating the final result. A cost-sensitivity analysis of true positive and false positive results is carried out, and a method is proposed to select the elements of the cost-sensitivity metrics to further improve the results and achieve better overall performance. The impact on the performance trade-off due to incorporating cost sensitivity is discussed. Keywords: Intrusion detection system (IDS), True Positive (TP), False Positive (FP), Support Vector Machine (SVM).
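To make the idea of a weighted, cost-sensitive ensemble concrete, here is a minimal sketch. It is not the paper's cost-opportunistic technique, whose details the abstract does not give. The weights, the votes, and the two cost values are hypothetical, and the decision rule (flag an attack when the expected cost of missing it exceeds the expected cost of a false alarm) is a standard expected-cost rule chosen for illustration.

```python
def ensemble_score(votes, weights):
    """Weighted fraction of classifiers voting 'attack' (1) rather than 'normal' (0)."""
    return sum(w * v for v, w in zip(votes, weights)) / sum(weights)


def cost_sensitive_decision(score, cost_fn, cost_fp):
    """Return 1 (attack) if missing an attack is expected to cost more than a false alarm.

    score   : ensemble score in [0, 1], treated here as a rough attack probability
    cost_fn : assumed cost of a missed attack (false negative)
    cost_fp : assumed cost of a false alarm (false positive)
    """
    return 1 if score * cost_fn > (1.0 - score) * cost_fp else 0


# Three hypothetical classifiers; only the second one flags this connection.
votes = [0, 1, 0]
weights = [2, 1, 1]  # hypothetical relative weights
score = ensemble_score(votes, weights)

# With equal costs the minority vote is ignored; with a costly miss it is not.
decision_equal_cost = cost_sensitive_decision(score, cost_fn=1, cost_fp=1)
decision_costly_miss = cost_sensitive_decision(score, cost_fn=10, cost_fp=1)
```

The same score leads to different decisions once the costs differ, which is the trade-off the abstract refers to.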

Daily solar radiation is an important variable in many models. In this paper, the accuracy and performance of three soft computing techniques, i.e., adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN) and support vector machine (SVM), were assessed for predicting daily horizontal global solar radiation from measured meteorological variables in the Yucatán Peninsula, México. Model performance was assessed with statistical indicators such as root mean squared error (RMSE), mean absolute error (MAE) and coefficient of determination (R²). The performance assessment indicates that the SVM technique, which requires daily maximum and minimum air temperature, extraterrestrial solar radiation and rainfall as inputs, performs better than the other techniques and may be a promising alternative to the usual approaches for predicting solar radiation.

DTC-SVM control techniques applied to the asynchronous machine

Mapping of patterns and spatial distribution of land-use/cover (LULC) has long been based on remotely sensed data. In the recent past, efforts to improve the reliability of LULC maps have seen a proliferation of image classification techniques. Despite these efforts, derived LULC maps are still often judged to be of insufficient quality for operational applications, due to disagreement between generated maps and reference data. In this study we sought to pursue two objectives: first, to test the new-generation multispectral RapidEye imagery classification output using machine-learning random forest (RF) and support vector machines (SVM) classifiers in a heterogeneous coastal landscape; and second, to determine the importance of different RapidEye bands on classification output. Accuracy of the derived thematic maps was assessed by computing confusion matrices of the classifiers' cover maps with respective independent validation data sets. An overall classification accuracy of 93.07% with a kappa value of 0.92, and 91.80% with a kappa value of 0.92, was achieved using RF and SVM, respectively. In this study, RF and SVM classifiers performed comparatively similarly, as demonstrated by the results of McNemar's test (Z = 1.15). An evaluation of different RapidEye bands using the two classifiers showed that incorporation of the red-edge band has a significant effect on the overall classification accuracy in vegetation cover types. Consequently, pursuit of high classification accuracy using high-spatial resolution imagery on complex landscapes remains paramount.
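The accuracy measures named in this abstract can be computed from a confusion matrix as sketched below. The matrix and the discordant counts are invented for illustration and are not the study's data. The McNemar statistic uses the common form z = (f12 - f21) / sqrt(f12 + f21) without continuity correction, which is an assumption about the variant the authors used.

```python
import math


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    p_chance = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


def mcnemar_z(f12, f21):
    """McNemar z statistic from the two discordant counts.

    f12: samples classifier 1 got right and classifier 2 got wrong
    f21: samples classifier 2 got right and classifier 1 got wrong
    """
    return (f12 - f21) / math.sqrt(f12 + f21)


# Hypothetical 2-class confusion matrix (rows: reference, columns: predicted).
cm = [[45, 5],
      [10, 40]]
acc = overall_accuracy(cm)
kappa = cohen_kappa(cm)
z = mcnemar_z(16, 9)  # hypothetical discordant counts
```

At the 5% level, |z| below 1.96 is read as no significant difference between the two classifiers, which is how the reported Z = 1.15 supports the statement that RF and SVM performed similarly.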

Taking its cue from the natural producer-scrounger process in animals, where a producer searches for food and the scrounger looks for opportunities to join, a new population-based optimization technique called Group Search Optimization (GSO) has come to the fore in the recent past. Among classification algorithms, the Support Vector Machine (SVM), as a novel method, has found applications in many areas. Its accuracy is highly dependent upon the kernel parameters and the selection of a relevant feature subset. This paper proposes the use of a hybrid GSO-SVM for feature selection, which can select relevant feature subsets from the classification dataset and also optimize the kernel parameters of the SVM classifier so as to achieve maximum classification accuracy. Elimination of the insignificant and useless inputs leads to a simplification of the classification problem, thereby producing faster and more accurate systems. The aim is to achieve maximum detection accuracy and to minimize computational complexity. The GSO-SVM is thus useful for parameter determination and feature selection in the SVM. The quality and effectiveness of the proposed methodology have been evaluated on standard machine learning datasets.

Support Vector Data Description (SVDD) is a variant of Support Vector Machines (SVM) used for one-class classification. It is designed particularly for outlier detection and is hence the focus of our paper. In this paper we solve the SVDD optimization problem using gradient descent (primal problem) and minibatch gradient ascent (dual problem). We compare its performance with that of a stochastic Support Vector Machine (SVM), and test the algorithm on a generated synthetic dataset using kernel and multiple kernel methods.
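As a rough illustration of solving the SVDD primal by gradient descent, the sketch below fits the smallest soft ball around a set of points by minimizing R² + C · Σ max(0, ‖x − a‖² − R²) over the centre a and squared radius R². It is not the authors' implementation: it uses no kernel, a fixed step size, full-batch subgradient steps, and a tiny hand-made dataset, all of which are assumptions made to keep the example short.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def fit_svdd(points, C=1.0, lr=0.01, n_iter=2000):
    """Subgradient descent on the linear (kernel-free) SVDD primal objective."""
    dim = len(points[0])
    n = len(points)
    center = [sum(p[d] for p in points) / n for d in range(dim)]  # start at the mean
    r2 = 0.0  # squared radius
    for _ in range(n_iter):
        # Points currently outside the ball contribute to the hinge term.
        violators = [p for p in points if sq_dist(p, center) > r2]
        grad_r2 = 1.0 - C * len(violators)
        grad_center = [
            C * sum(2.0 * (center[d] - p[d]) for p in violators)
            for d in range(dim)
        ]
        r2 = max(0.0, r2 - lr * grad_r2)  # keep the squared radius non-negative
        center = [center[d] - lr * grad_center[d] for d in range(dim)]
    return center, r2


def is_outlier(point, center, r2):
    """A point is flagged when it lies outside the learned ball."""
    return sq_dist(point, center) > r2


# Hypothetical one-class training data clustered around the origin.
train = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
         (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5)]
center, r2 = fit_svdd(train)
far_point_flagged = is_outlier((5.0, 5.0), center, r2)
near_point_flagged = is_outlier((0.2, 0.1), center, r2)
```

With C = 1 every training point is pulled inside the ball, so the squared radius settles near the largest squared distance from the centre; smaller values of C would let some training points remain outside.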

Classification of Alzheimer's disease (AD) is of great benefit for health care applications. AD is the most common form of dementia. This paper presents a new methodology based on an invariant interest point descriptor for Alzheimer's disease classification. The descriptor depends on the normalized Hu Moment Invariants (NHMI). The proposed approach deals with raw Magnetic Resonance Imaging (MRI) scans of Alzheimer's disease. Seven Hu moments are computed to extract the images' features. These moments are then normalized, giving new, more powerful features that greatly improve the performance of the classification system. The moments are invariant, which is what makes the Hu moments algorithm robust for feature extraction. The classification process is implemented using two different classifiers, the K-Nearest Neighbors algorithm (KNN) and Linear Support Vector Machines (SVM), and their performances are compared. The results are evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The best classification accuracy is 91.4% for the KNN classifier and 100% for the SVM classifier.
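The invariance property mentioned above can be checked on a toy image. The sketch below computes only the first two of the seven Hu moments, using the standard definitions based on normalized central moments. The paper's additional normalization step (NHMI) is not described in the abstract and is not reproduced here, and the 6×6 binary image is a made-up example, not MRI data.

```python
def hu_first_two(image):
    """Return the first two Hu moment invariants of a 2-D intensity grid."""
    # Raw moments up to order 1 give the mass and the centroid.
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    cx, cy = m10 / m00, m01 / m00

    # Second-order central moments (translation invariant).
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v

    # Scale normalization: eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), here p + q = 2.
    norm = m00 ** 2
    eta20, eta02, eta11 = mu20 / norm, mu02 / norm, mu11 / norm

    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4.0 * eta11 ** 2
    return phi1, phi2


# Hypothetical L-shaped blob, the same blob shifted, and the blob rotated by 90 degrees.
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
rotated = [list(r) for r in zip(*shape[::-1])]

hu_shape = hu_first_two(shape)
hu_shifted = hu_first_two(shifted)
hu_rotated = hu_first_two(rotated)
```

All three images should yield the same pair of values up to floating-point error, which is why such moments are attractive as features for a KNN or SVM classifier.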

An algorithm for automated, image-based segregation of cashew kernels into different categories is the need of the hour to drive up the productivity of the Indian cashew industry. The aim of this study is to find a supervised learning model that will accurately recognize and classify the cashew kernel into different grades. Various image processing techniques are used to preprocess the cashew image dataset. K-means clustering is used to perform colour image segmentation. Feature selection is performed first using neighbourhood component analysis, followed by stepwise regression. Two multi-class classification methods are implemented. Support Vector Machines (SVM) with ‘one-vs-one’ classification and Adaptive Directed Acyclic Graph (ADAG) learning model showed satisfactory results. However, even higher accuracy is obtained by using the Random Forest classification model. Random Forests are easy to train, which makes them good for high dimensional data, with a large number of training examples. The main contribution of this work is developing a robust and efficient computer vision system that can grade cashew kernels on the industrial scale with high accuracy and without compromising much on the speed of computation.

A reliable assessment of the spatial distribution of above ground forest biomass is needed for various applications ranging from carbon and bioenergy policies to sustainable forest management. Remotely sensed images have become one of the primary sources for rapidly assessing above ground biomass (AGB) from local to global scales. Among various remote sensing techniques, radar remote sensing has shown promising results for forest ecosystem studies due to its unique sensitivity towards the dielectric, geometrical and structural properties of vegetation cover. In this study, we have investigated the use of multi-frequency multi-polarized synthetic aperture radar (SAR) data from different sensors, i.e., ALOS-2/PALSAR-2 and Sentinel-1/SAR, for AGB estimation. Field inventory data from about 53 plots of 1 ha each, collected from the Nongkhyllem wildlife sanctuary and reserve forest in the state of Meghalaya, India, have been utilized for analysis. As anticipated from previous studies, the L-band backscatter has shown higher sensitivity towards AGB than the C-band. Moreover, cross polarization has shown higher sensitivity towards AGB than like polarization. An attempt has been made to utilize the multi-polarized backscatter for AGB retrieval using a Support Vector Machine (SVM). An attempt has also been made to train and validate the SVM using C-band VH backscatter and L-band HV backscatter for AGB estimation. This model achieved R and RMSE of 0.96 and 21.30 ton/ha respectively. The study results indicate that multi-frequency cross-polarized backscatter can significantly improve AGB retrieval accuracy compared with single-band multi-polarized backscatter.

Plant leaves play an exceptional role in crop production since the energy flow is through the leaves. This is why recognition of plant leaf disease is needed. In the proposed paper, an image-processing-based methodology is employed to identify plant diseases from the symptoms on the leaves. There are five steps in the present work. The work starts with the acquisition of images. Image pre-processing to improve the quality of the photographs follows, which includes colour conversion, noise removal, and histogram equalization. The images are segmented using the Mean Shift clustering algorithm. Colour, shape, and texture features are then extracted. In the final stage, disease classification is accomplished using a Support Vector Machine (SVM).

Accurate and fast islanding detection of distributed generation is highly important for its successful operation in distribution networks. Up to now, various islanding detection techniques based on communication, passive, active and hybrid methods have been proposed. However, each technique suffers from certain demerits that cause inaccuracies in islanding detection. Computational intelligence based techniques, due to their robustness and flexibility in dealing with complex nonlinear systems, are an option that might solve this problem. This paper aims to provide a comprehensive review of computational intelligence based techniques applied for islanding detection of distributed generation. Moreover, the paper compares the accuracies of computational intelligence based techniques with those of existing techniques to provide useful information for industries and utility researchers to determine the best method for their respective systems.

In recent years, there has been an incredible improvement in battery technology because of the emergence of electric vehicles (EVs) and hybrid electric vehicles (HEVs). However, State-of-Charge (SoC) estimation remains a challenge in battery engineering. SoC is the ratio of the available capacity to the maximum possible charge that can be stored in a battery. SoC estimation is of prime importance for battery safety and maintenance. This paper presents SoC estimation using an optimised SVM technique. The Support Vector Machine (SVM) is a kind of learning machine based on statistical learning theory. An accurate SoC estimate can improve the performance of the battery and raise the safety of EVs. It can not only protect the battery and prevent overcharging or over-discharging, but also extend the battery life. Therefore, the aim of this study is the correct sampling of voltage, current and temperature signals. In this project, an SVM optimized by Particle Swarm Optimization (PSO) is used to boost SoC estimation accuracy.

Abnormal activity recognition is a challenging task in surveillance videos. In this paper, we propose an approach for abnormal activity recognition based on a graph formulation of video activities and a graph kernel support vector machine. The interaction of the entities in a video is formulated as a graph of geometric relations among space-time interest points. The vertices of the graph are spatio-temporal interest points, and an edge represents the relation between appearance and dynamics around the interest points. Once the activity is represented as a graph, we use a binary support vector machine with a graph kernel to classify the activities into normal or abnormal classes. These graph kernels provide robustness to slight topological deformations when comparing two graphs, which may occur due to the presence of noise in the data. We demonstrate the efficacy of the proposed method on the publicly available standard datasets UCSDped1, UCSDped2 and UMN. Our experiments demonstrate a high recognition rate, and the method outperforms the state-of-the-art algorithms.

Image retrieval is still an active research topic in the computer vision field. Several techniques exist to retrieve visual data from large databases. Bag-of-Visual-Words (BoVW) is a visual feature descriptor that can be used successfully in Content-based Image Retrieval (CBIR) applications. In this paper, we present an image retrieval system that uses local feature descriptors and the BoVW model to retrieve similar images from standard databases efficiently and accurately. The proposed system uses the SIFT and SURF techniques as local descriptors to produce image signatures that are invariant to rotation and scale. It also uses K-Means as a clustering algorithm to build a visual vocabulary from the feature descriptors obtained by the local descriptor techniques. To efficiently retrieve more images relevant to the query, the SVM algorithm is used. The performance of the proposed system is evaluated by calculating both precision and recall. The experimental results reveal that this system performs well on two different standard datasets.
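For readers unfamiliar with the two evaluation measures, a minimal sketch of precision and recall for a single retrieval query follows. The image identifiers are hypothetical, and the abstract does not state how many results per query the authors evaluated.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: identifiers returned by the retrieval system
    relevant : identifiers that are truly relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query result: four images returned, three images truly relevant.
retrieved = ["img_a", "img_b", "img_c", "img_d"]
relevant = ["img_a", "img_c", "img_e"]
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four returned images are relevant (precision 0.5) and two of the three relevant images were found (recall about 0.67).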

The focus of the research study was the analysis of a diabetes dataset and how well diabetes can be predicted from it with different machine learning algorithms. We used the original dataset from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset can be used to predict whether or not a patient has diabetes, based on certain diagnostics. For the analysis we used Amazon Web Services: the AWS S3 service to store our dataset, and Amazon SageMaker to perform the analysis. We applied three classification models to the dataset: Logistic Regression, K-nearest Neighbors and Support Vector Machines. For each of the models we also measured performance. We compared all the results, and according to them, Support Vector Machines has the best performance. Insights and recommendations are provided.