Features Selection Research Papers - Academia.edu

With the recent increase in interest in opinion mining across different research communities, there is an evolving body of work on Arabic Sentiment Analysis. Few polarity-annotated datasets are available for this language, so most existing works use these datasets to test the best-known supervised algorithms for their objectives. Naïve Bayes and SVM are the best-performing algorithms reported in the Arabic sentiment analysis literature. The work described in this paper shows that using a genetic algorithm to select features and enhancing the quality of the training dataset significantly improve the accuracy of the learning algorithm. We use the LABR dataset of book reviews and compare our results with those of LABR's authors.
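
As a rough illustration of genetic-algorithm feature selection, the sketch below evolves boolean masks over a feature set. It is not the paper's method: the operators (truncation selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the `toy_fitness` function with its `USEFUL` indices are all assumptions made for the example. In practice the fitness would be something like the validation accuracy of a classifier trained on the selected features.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Evolve a boolean feature mask that maximises `fitness(mask)`."""
    rng = random.Random(seed)
    # Random initial population of feature masks.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

# Hypothetical stand-in for classifier accuracy: reward "useful" features,
# penalise the rest.
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in USEFUL)
    noise = sum(1 for i, keep in enumerate(mask) if keep and i not in USEFUL)
    return hits - 0.5 * noise

best = genetic_feature_selection(8, toy_fitness)
```

Because the best mask is copied unchanged into each new generation, the returned mask is never worse than the best mask in the initial population.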

Sentiment analysis is one of the most fundamental areas of natural language processing. It deals with the representation of information to determine the intent behind the source of a text. That intent may be a form of appreciation (positive) or criticism (negative). This paper offers a comparison between the results achieved by applying classification algorithms with various classifiers, for instance K-nearest neighbor and multinomial naive Bayes. These techniques are used to assess whether a given opinion is a positive or a negative comment. The data was gathered from polarity movie datasets, and a comparison with previously available results has been made for a thorough evaluation. This paper investigates the influence of the word-level count vectorizer and term frequency-inverse document frequency (TF-IDF) on movie sentiment analysis. We concluded that the multinomial Naive Bayes (MNB) classifier generates more accurate results using ...
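
To make the difference between the two text representations concrete, here is a minimal sketch that builds word-level count vectors and TF-IDF vectors for a toy corpus. The whitespace tokenizer, the IDF variant `ln(N / df) + 1`, and the sample reviews are assumptions for illustration; the abstract does not state which weighting or preprocessing the authors used.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Return (vocabulary, list of raw term-count vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) with idf = ln(N / df) + 1."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) + 1.0 for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]

docs = ["good good film", "bad film"]  # hypothetical toy reviews
vocab, counts = count_vectors(docs)
_, weighted = tfidf_vectors(docs)
```

In this toy corpus, "film" appears in every review and keeps a weight equal to its count, while "good" and "bad" each appear in only one review and are scaled up. That reweighting is the effect a classifier such as MNB sees when switching from counts to TF-IDF.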

Handwriting or character recognition is one of the most challenging topics and the oldest application of pattern recognition. Until now, research progress and innovation have come either from enhancing the preprocessing steps or from advancing the language modelling process. In this paper, we present a combined method for Hiragana handwriting recognition through the exploration of morphological processing and feature extraction. Our approach is tested in online mode and uses two different classifiers for evaluation, the Support Vector Machine (SVM) and the K-Nearest Neighbour (KNN). The outcome of this investigation indicates the best combination and structure of the method, which achieves a recognition accuracy of 90.3 percent on online test data not used in training.

This article proposes an automatic approach, based on non-verbal speech features, aimed at discriminating between depressed and non-depressed speakers. The experiments have been performed over one of the largest corpora collected for such a task in the literature (62 patients diagnosed with depression and 54 healthy control subjects), especially when it comes to data where the depressed speakers have been diagnosed as such by professional psychiatrists. The results show that the discrimination can be performed with an accuracy of over 75%, and the error analysis shows that the chances of correct classification do not change according to gender, depression-related pathology diagnosed by the psychiatrists or length of the pharmacological treatment (if any). Furthermore, for every depressed subject, the corpus includes a control subject that matches age, education level and gender. This ensures that the approach actually discriminates between depressed and non-depressed speakers and does not simply capture differences resulting from other factors.

In the era of data explosion, speech emotion has crucial commercial significance. Emotion recognition in speech encompasses a gamut of techniques, from mechanical recording of the audio signal to complex modeling of extracted patterns. The most challenging part of this research area is classifying the emotion of speech purely from the physical characteristics of the audio signal, independent of the language spoken. This paper focuses on the predictive modeling of audio speech data, based on extracting the most viable feature set and using these features to predict the emotion of unknown speech data. We have used two of the most widely used classifiers, a variant of CART and Naïve Bayes, to model the interplay of crucial features such as Root Mean Square (RMS), Zero Crossing Rate (ZCR), Pitch and Brightness of the audio signal to determine the emotion of speech. To carry out a comparative analysis of the proposed classifiers, a set of experiments on real speech data is conducted. Results clearly indicate that the decision-tree-based classifier works well on accuracy, whereas Naïve Bayes works fairly well on generality.
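
Two of the features named above, RMS and ZCR, have simple textbook definitions that can be sketched directly. The code below assumes a frame is a plain list of float samples and treats zero as non-negative when counting sign changes; the paper's actual framing, windowing, and feature extraction toolchain are not described in the abstract.

```python
import math

def rms(frame):
    """Root mean square energy of one frame of audio samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# Hypothetical frames: a rapidly alternating signal and a constant one.
noisy = [1.0, -1.0, 1.0, -1.0]
flat = [0.5, 0.5, 0.5, 0.5]

features_noisy = (rms(noisy), zero_crossing_rate(noisy))
features_flat = (rms(flat), zero_crossing_rate(flat))
```

The alternating frame has a high ZCR and the constant frame has none, which is why ZCR is often used as a cheap indicator of how noisy or high-frequency a segment is, while RMS tracks its loudness.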

Identifying bandwidth-heavy Internet traffic is important for network administrators who need to throttle high-bandwidth application traffic. Flow-feature-based classification has previously been proposed as a promising method for identifying Internet traffic from statistical packet features. The selection of statistical features plays an important role in accurate and timely classification. In this work, we investigate the impact of the packet inter-arrival time feature on online P2P classification in terms of accuracy, Kappa statistic and time. Simulations were conducted using available traces from the University of Brescia, the University of Aalborg and the University of Cambridge. Experimental results show that including inter-arrival time (IAT) as an online feature increases simulation time and decreases classification accuracy and the Kappa statistic.

1. INTRODUCTION

Today, peer-to-peer (P2P) is an architecture for sharing a wide range of media on the Internet. P2P traffic represents about 27% to 60% of total Internet traffic, depending on geographic location [1], [2]. The high volume of P2P traffic is due to file sharing, video streaming, online gaming and other activities that a client-server architecture cannot accomplish as fast or as efficiently as the P2P architecture. The rapid growth of P2P traffic volume over the years has resulted in deteriorated network performance and congestion due to the high bandwidth consumption of P2P applications [3]. Therefore, traffic identification is required to improve traffic management. Traffic from first-generation P2P applications was relatively easy to identify because those applications used fixed port numbers. However, current P2P applications are able to circumvent port-based identification by using anonymous port numbers or port disguise [4], [2]. Methods that rely on inspecting application payload signatures have also been proposed [5]. For privacy and practicality reasons, however, this method is ineffective.
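
The inter-arrival time feature and the Kappa statistic used in this study can be illustrated with a short sketch. It assumes that IAT means the gap between consecutive packet timestamps within one flow and that "Kappa statistic" refers to Cohen's kappa; the timestamps and class labels are made up for the example.

```python
from collections import Counter

def inter_arrival_times(timestamps):
    """Gaps between consecutive packet arrival times of one flow."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def cohen_kappa(actual, predicted):
    """Agreement between two labelings, corrected for chance agreement.

    Undefined (division by zero) when the expected agreement equals 1,
    for example when both labelings use a single identical class.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    expected = sum(
        actual_freq[c] * predicted_freq[c] for c in actual_freq
    ) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical packet timestamps (seconds) and flow labels.
iat = inter_arrival_times([0.0, 0.5, 2.0])
actual = ["p2p", "p2p", "web", "web"]
predicted = ["p2p", "web", "web", "web"]
kappa = cohen_kappa(actual, predicted)
```

Here three of four flows are labeled correctly (75% accuracy), but half of that agreement would be expected by chance, so kappa comes out lower than accuracy. This is why the two metrics are usually reported together.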
The ineffectiveness of the port-based and payload-based methods prompted the use of flow statistics as features for traffic identification. These strategies offer more flexibility in detecting P2P traffic than signature-based and port-based methods. Several techniques proposed over the last two decades have focused on the identification accuracy attainable with various machine learning (ML) algorithms. However, the effect of distinct sets of statistical features has not been researched in depth. Work in [6] has reported that