Naïve Bayes Research Papers - Academia.edu
Dimensionality refers to the number of terms in a web page. When classifying web pages, high dimensionality causes problems. The main objective of reducing the dimensionality of web pages is to improve classifier performance. Processing time and accuracy are the two parameters that influence the performance of a classifier. To reduce processing time, less informative and redundant terms have to be removed from web pages. This research describes a hybrid approach to dimensionality reduction in web page classification using rough sets and the naïve Bayesian method. Feature selection and dimensionality reduction methods are used to reduce the dimensionality: information gain is used for feature selection, and the rough-set-based Quick Reduct algorithm is used for dimensionality reduction. The naïve Bayesian method is used to classify web pages into predefined categories. Each web page is assigned to the category with the maximum posterior probability. Words remaining ...
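As a rough illustration of two steps named in this abstract, information-gain feature selection followed by maximum-posterior naïve Bayes classification, the sketch below uses only the Python standard library. It is not the authors' implementation: the toy documents, the function names, the choice of the top two terms, and the add-one smoothing are all assumptions made for the example, and the rough-set Quick Reduct step is omitted.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder

def train_nb(docs, labels, vocab):
    """Collect class and term counts for multinomial naive Bayes over `vocab`."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        term_counts[y].update(t for t in doc if t in vocab)
    return class_counts, term_counts, vocab

def classify(model, doc):
    """Return the class with the maximum log posterior (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())

    def log_posterior(c):
        total = sum(term_counts[c].values()) + len(vocab)
        score = math.log(class_counts[c] / n_docs)
        for t in doc:
            if t in vocab:
                score += math.log((term_counts[c][t] + 1) / total)
        return score

    return max(class_counts, key=log_posterior)

# Hypothetical toy "web pages", already tokenized into terms.
docs = [["goal", "match", "the"], ["team", "goal", "the"],
        ["stock", "market", "the"], ["market", "bank", "the"]]
labels = ["sports", "sports", "finance", "finance"]

# Rank terms by information gain and keep the two most informative ones.
terms = sorted({t for d in docs for t in d})
ranked = sorted(terms, key=lambda t: information_gain(docs, labels, t), reverse=True)
vocab = set(ranked[:2])

model = train_nb(docs, labels, vocab)
prediction = classify(model, ["goal", "today"])
```

In this toy data the term "the" occurs in every page, so its information gain is zero and it is dropped, which mirrors the removal of uninformative terms described above.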
The term non-permanent employee first appeared in legislation in Act Number 13 Year 2003 concerning Manpower. The Act brought clarity about staffing status, with the result that the salaries employees receive do not match their workload. Therefore, this study aims to determine which employees are eligible to earn the same income at one of the private universities in Palembang, based on the university's strategic plan (class, employment status, membership, and education permit), using the Naïve Bayes method. The results show that the highest prediction accuracy for non-permanent and permanent employees is 83.33%, while the lowest is 50%.
Twitter is a popular and widely used social networking platform because it lets users express their thoughts and opinions about any item and post comments or messages to the whole world. Sentiment analysis techniques are used to study and analyze these reviews or opinions. Sentiment analysis is an NLP technique that classifies opinions into different sentiments such as positive, negative, and neutral. In this paper, we take an airline dataset from Twitter and perform sentiment analysis on it using machine learning algorithms such as SVM, Naïve Bayes, and Random Forest. Sentiments are expressed in three categories: positive, negative, and neutral. Our dataset contains 11533 tweets and is not balanced. The performance of the various machine learning algorithms is discussed in this paper.
In this paper, a hybrid method is introduced to improve the classification performance of naïve Bayes (NB) on mixed datasets and multi-class problems. The proposed method relies on a similarity measure that is applied to the portions that NB does not classify correctly. Since the data contains multi-valued short text with rare words that limit NB performance, we employ an adapted selective classifier based on similarities (CSBS) to overcome the NB limitations and include the rare words in the computation. This is achieved by transforming the formula from the product of the probabilities of the categorical variable into their sum weighted by the numerical variable. The proposed algorithm has been tested on card payment transaction data that contains the transaction label, the multi-valued short text, and the transaction amount. Based on K-fold cross validation, the evaluation results confirm that the proposed method achieves better precision, recall, and F-score than the NB and CSBS classifiers separately. In addition, converting the product form to a sum gives rare words more opportunity to contribute to the text classification, which is another advantage of the proposed method.
- by TELKOMNIKA JOURNAL and +1
- Naïve Bayes, CSBS, multi-classification
In this study, we present a new classifier that combines the distance-based K-Nearest Neighbor algorithm and the statistics-based Naïve Bayes classifier. It has the strengths of both while avoiding their weaknesses. The accuracy of the proposed algorithm is evaluated on standard datasets from the machine-learning repository of the University of California and compared with some state-of-the-art algorithms. The experiments show that in most cases the proposed algorithm outperforms the others to some extent. Finally, we apply the algorithm to predict the profitability positions of some financial institutions of Bangladesh using data provided by the central bank.
Ensemble learning is a family of machine learning methods that can improve classification performance. Standalone classifiers often perform poorly, which is why combining them with ensemble methods can improve their performance scores. In this study, three ensemble methods (bagging, AdaBoost, and voting) are combined with support vector machine, Naïve Bayes, and decision tree classifiers and compared with the standalone classifiers. On a dataset of 1670 Twitter mentions about tourist attractions, the ensemble methods did not show a specific improvement in accuracy and precision, since they generated the same result as the standalone decision tree. The bagging method showed a significant improvement in recall, F-measure, and area under the curve (AUC). Overall, the standalone decision tree and the decision tree with AdaBoost have the highest accuracy and precision, while the support vector machine with bagging has the highest recall, F-measure, and AUC.
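The voting method mentioned in this abstract can be sketched as hard (majority) voting over the labels predicted by several classifiers. The snippet below is only an illustration under assumptions: the three prediction lists are made-up stand-ins for the outputs of an SVM, a Naïve Bayes model, and a decision tree, and `majority_vote` is a hypothetical helper, not code from the study.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists by hard (majority) voting.

    Each inner list holds one classifier's predictions for the same samples,
    in the same order. With an odd number of voters and two labels there are
    no ties; otherwise a tie goes to the label counted first.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions for four tweets from three standalone classifiers.
svm_preds = ["pos", "neg", "pos", "neg"]
nb_preds = ["pos", "pos", "neg", "neg"]
tree_preds = ["neg", "pos", "pos", "neg"]

ensemble = majority_vote([svm_preds, nb_preds, tree_preds])
```

Bagging and AdaBoost differ from this in that they train many copies of one base learner on resampled or reweighted data before combining them.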
Cyberbullying is a form of bullying in which technology is used as a medium to harass someone. With the recent growth of the web and social media platforms, the number of users is also growing, and the primary users of social networks are mostly adolescents and young adults. As much as these platforms are used for getting new information and for entertainment, bullies increasingly use them to attack vulnerable victims. Because of the increase in cyberbullying, there is a need to develop an appropriate method for the detection and prevention of cyberbullying. A growing body of work is emerging on automated approaches to cyberbullying detection. These approaches use machine learning and natural language processing techniques to identify the characteristics of a cyberbullying exchange and automatically detect cyberbullying by matching textual data. The primary goal of this project is to detect cyberbullying by combining both image and textual information. Test cases are used to characterize the dataset and detect the bullying. Machine learning techniques are used to efficiently predict and detect cyberbullying.
The profitability promoted by Google on its well-known video distribution platform YouTube has attracted an increasing number of users. However, this success has also attracted a large number of malicious users who aim to self-promote their videos or circulate viruses and malware. Because YouTube offers limited tools for comment moderation, spam increases very rapidly, which is why owners disable the comment section. It is very difficult to establish classification methods for automatic spam filtering, since the messages are very short and often full of slang, symbols, and abbreviations. In this paper, we evaluate several top-performing classification techniques for detecting and analyzing spam comments. The statistical analysis of the results indicates that, at a 99.9% confidence level, decision trees, logistic regression, Bernoulli Naive Bayes, random forests, and linear and Gaussian SVMs are statistically equivalent in maximum rate. It is therefore very important to find a way to detect these comments on videos and report them before they are viewed by innocent users.
Lung cancer is a malignant lung tumour characterised by the uncontrolled growth of cells in the lung tissue. Lung cancer is the most common cancer diagnosed worldwide, and it causes more deaths than any other kind of cancer. Early diagnosis and care are very useful for the survival of cancer patients. Different image processing and soft computing methods may be used to identify cancer cells in medical images. Classification depends on the features extracted from the images, so to produce better classification results the focus is on the feature extraction level. The extracted features are then given to machine learning algorithms to identify patterns that provide useful insight into which combinations of features are most likely to indicate an abnormality. The prediction of lung cancer is analysed using various machine learning classification algorithms such as Naive Bayes, Support Vector Machine, Artificial Neural Network, and Logistic Regression. The key aim of this paper is to diagnose lung cancer early by examining the performance of existing classification algorithms.
- by IJCSMC Journal and +1
- Computer Science, Algorithms, Information Technology, Technology
Today we are accustomed to many mobile services such as communication and e-commerce. Among mobile communication services, SMS (Short Message Service) is very common and important, and is used for both personal and professional purposes. Some messages are spam attacks that trap users into revealing personal information or lure them into purchasing products from unauthorized websites. SMS APIs make it very easy for companies to send information, services, or alerts to their customers, and the same services can also be used to send spam messages. In this system we therefore use machine learning to detect spam SMS. We import the dataset from the UCI repository and implement the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Neural Network (NN) classifiers, evaluated with accuracy, precision, recall, and F-score. We compare the performance of these algorithms, present the experimental results with visualization techniques, and analyze which algorithm is best for spam SMS detection.
Sentiment analysis is an opinion mining process in which the opinion expressed in a piece of text is computationally analyzed and categorized to obtain an unbiased understanding of the writer's opinion on a specific topic. In this paper, sentiment analysis is performed on the attitudes of Twitter users towards the Citizenship Amendment Act (CAA), which came into effect in India on January 10th, 2020. The CAA was chosen because it had garnered mixed opinions from different sections of the Indian population, so there was no clear understanding of the overall public sentiment towards it. It had also led to protests and riots in various parts of India, which the Government struggled to handle because they were unexpected.
The main reason behind the spread of fake news is the many fake and hyperpartisan sites present on the Internet. These fake sites try to manipulate the truth, which creates misunderstanding in society. It is therefore important to detect fake news and make people aware of the truth. This paper gives an insight into how to detect fake news using machine learning and deep learning techniques. On observing our data, we categorized it into five attributes, namely Title, Text, Subject, Date, and Labels. To develop an efficient fake news detection system, each feature and its degree of impact on the system must be taken into consideration. This paper attempts to provide a detailed analysis of detecting fake news using various models such as LSTM, ANN, Naïve Bayes, SVM, Logistic Regression, XGBoost, and BERT.
The battle between viral diseases and humans has continued throughout history. According to evolutionary theory, every entity in the world works for its own survival, even the smallest viruses. Viral infections are therefore evolving rapidly and impose a substantial burden on humans in terms of morbidity and mortality. Even though we now have many advanced techniques for the diagnosis, prevention, and treatment of infectious diseases, the arrival of new diseases still poses a critical and emerging challenge to the global population. A recent example is the novel coronavirus disease, COVID-19, which was first found in Wuhan, China, and quickly became a global pandemic. There was no medicine available to cure patients of this novel viral disease. In this difficult situation, doctors and drug specialists manually recommended existing medicines based on the symptoms occurring in patients. During this process many infected people died due to a lack of proper medicine. Therefore, in this work we have implemented a disease prediction system based on the various symptoms of a disease. In addition, we propose an idea that can help the pharmaceutical industry develop medicine for any viral disease using machine learning techniques. This technique analyses the symptoms and predicts the most suitable medicine for a new disease. Moreover, the method also predicts the required composition of chemical elements that pharmaceutical companies can use to develop the new medicine under the supervision of drug experts.
The Internet of Things (IoT) has led to several security threats and challenges within society. Regardless of the benefits it has brought to society, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types that IoT networks face. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks on the UNSW-NB15 benchmark dataset, which contains normal network traffic and malicious traffic instances. The experimental results reveal that the RF and KNN classifiers give the best performance, with an accuracy of 100% (without noise injection) and 99% (with 10% noise filtering), while the Naïve Bayes classifier gives the worst performance, with an accuracy of 95.35% without noise and 82.77% with 10% noise. Other evaluation metrics, such as precision and recall, also show the effectiveness of the RF and KNN classifiers over Naïve Bayes.
This paper presents a novel approach to solving the outstanding under-reach problem encountered by distance relay protection schemes in 3rd zone protection coverage for midpoint STATCOM-compensated transmission lines. The proposed transmission line model is developed in Matlab, and features are extracted using discrete wavelet multiresolution analysis. Features extracted from the standard deviation and entropy energy content of SLG transient fault currents at locations beyond the integrated STATCOM are used to build machine learning models in WEKA. The Naïve Bayes classifier model performs best, with robust prediction and detection of faults and quick convergence even with little training data. The proposed classifier reached a performance of 100% for the relay algorithm modification that eliminates the under-reach problem in 3rd zone protection coverage.
Emotions are universal and important in life, and they shape human behavior. Detecting emotion plays an important role in many respects because it can be applied in fields such as decision making, predicting human emotional states, reviewing product quality, tracking support on political issues, and recognizing depressive disorders. Emotions can be identified from textual data, because text is used to communicate and convey information. One social media platform used to exchange information is Twitter, which contains information about people's attitudes and emotional states. Therefore, emotion detection is performed on Twitter to determine a person's emotion using the Naive Bayes method and a combination of features. This study uses several Naive Bayes classification models, namely Bernoulli Naive Bayes for binary data and Multinomial Naive Bayes for discrete data, ...
A spam filter is a program that identifies unwanted emails and prevents those messages from reaching a user's inbox. This study focuses on how the algorithms can be applied to a set of e-mails consisting of both ham and spam. First, the working principle and implementation steps of stop-word removal, TF-IDF, and a stemming algorithm on NVIDIA's Tesla P100 GPU are discussed, and the findings are verified by running the Naïve Bayes algorithm. After training and testing on a spam e-mail dataset taken from Kaggle using the proposed method, we obtained a training accuracy of 99.67% and a testing accuracy of about 99.03% on the multicore GPU, which also shortened training and testing time. This is an improvement in training and testing accuracy of around 0.22% and 0.18%, respectively, over applying only Naïve Bayes (the conventional method) to the same dataset, where training and testing accuracy were 99.45% and 98.85%. The training time on the GPU is 1.361 seconds, about 1.49X faster than the 2.029 seconds on the CPU, and the testing time on the GPU is 1.978 seconds, about 1.15X faster than the 2.280 seconds on the CPU.
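The speedup and accuracy-gain figures quoted in this abstract follow directly from the reported timings and accuracies. The short check below recomputes them; the variable names are illustrative, and the numbers are the ones stated in the abstract.

```python
# Timings (seconds) and accuracies (%) as reported in the abstract.
cpu_train, gpu_train = 2.029, 1.361
cpu_test, gpu_test = 2.280, 1.978

# Speedup is CPU time divided by GPU time.
train_speedup = cpu_train / gpu_train
test_speedup = cpu_test / gpu_test

# Accuracy gain of the proposed method over plain Naive Bayes, in percentage points.
train_gain = 99.67 - 99.45
test_gain = 99.03 - 98.85

print(f"training speedup: {train_speedup:.2f}X, testing speedup: {test_speedup:.2f}X")
print(f"accuracy gain: {train_gain:.2f} (training), {test_gain:.2f} (testing)")
```

Note that the accuracy differences are percentage points (99.67 minus 99.45), not relative improvements.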
Diabetes is one of the most common non-communicable diseases in the world. Diabetes affects the ability to produce the hormone insulin, so complications may occur if it remains unidentified and untreated. This contributes significantly to increased morbidity, mortality, and admission rates of patients in both developed and developing countries. When the disease is not detected early, it leads to complications. Medical records of the cases were reviewed retrospectively, and anthropometric and biochemical information was collected. From this data, four ML classification algorithms, Decision Tree (J48), Naive Bayes, PART rule induction, and JRIP, were used to predict diabetes. Precision, recall, F-measure, Receiver Operating Characteristic (ROC) scores, and the confusion matrix were calculated to determine the performance of the various algorithms. Performance was also measured by sensitivity and specificity. The algorithms have high classification accuracy and are generally comparable in predicting diabetic and non-diabetic patients. Among the algorithms tested, the Decision Tree (J48) classifier scored the highest accuracy and was the best predictor, with a classification accuracy of 92.74%.
- by IJOEST Journal
- Data Mining, Diabetes, J48, Naïve Bayes
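The evaluation measures listed in the diabetes abstract above (precision, recall or sensitivity, specificity, F-measure, and accuracy) can all be derived from a binary confusion matrix. The sketch below shows the standard formulas; the counts are invented for illustration and are not results from the study, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    recall = tp / (tp + fn)           # also called sensitivity
    specificity = tn / (tn + fp)      # share of actual negatives correctly rejected
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Hypothetical counts: 45 true positives, 5 false positives,
# 15 false negatives, 35 true negatives.
metrics = binary_metrics(tp=45, fp=5, fn=15, tn=35)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can hide poor detection of the minority class.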
In this paper, we propose different forecasting models for university education using the K-means, K-nearest neighbor, neural network, and naïve Bayes algorithms, applied to the specific exams of engineering, licensed and scientific mathematical thinking in Saber Pro of Colombia. ICFES Saber Pro is an exam required for graduation for all students in undergraduate higher-education programs. The Colombian government regulated this exam in 2009 in decree 3963, with the intention of verifying the development of competencies, the level of knowledge, and the quality of programs and institutions. The objective is to turn data into information, search for patterns, select the best variables, and harness the potential of the data (on average 650,000 data points per semester). The study found the following combination of features: women have greater participation (68%) in Mathematics, Engineering, and Teaching careers; the urban area continues to be the preferred place to apply for higher studies (94%); Internet use increased by 50% in the last year; and the support of the family nucleus remains relevant to the education of children.
Data mining techniques can help bridge the knowledge gap in the higher education system. Data mining methodology helps improve efficiency in educational institutions. Data mining approaches such as classification, association rule mining, clustering, and prediction are used to improve students' achievement. They help in managing the student life cycle and assist in course selection. Classification is an important data mining task and can be applied very effectively to educational data. This study focuses on the application of classification techniques in educational data mining. A comparative study was conducted on predicting a student's academic performance based on social variables, previous exam grades, and other attributes related to student performance. The J48, Naïve Bayes, Bayes Net, Backpropagation Network, and Radial Basis Function Network classification techniques were considered for the experiment. The results revealed that the Radial Basis Function Network correctly classified 100% of instances, which is higher than the other classifiers.
In clinical sciences, prediction of heart disease is one of the most difficult tasks. Nowadays, heart disease is a significant cause of morbidity and mortality in modern society. Heart disease is a term that covers a large number of ailments related to the heart. Clinical diagnosis is an important but complicated task that must be performed accurately and efficiently. Although significant progress has been made in the diagnosis and treatment of heart disease, further investigation is required. The availability of large amounts of clinical data creates the need for powerful data analysis tools to extract useful information. Heart disease diagnosis is one of the applications where data mining and machine learning tools have demonstrated success. This study used the machine learning algorithms KNN, Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machine, J48, and Decision Tree in the WEKA software to identify which method provides maximum performance and accuracy. Using these algorithms with WEKA, we built an ensemble (Vote) hybrid model by combining the individual methods. Our research aims to assess the effectiveness of various machine learning algorithms in diagnosing heart disease and to find the most suitable algorithm for heart conditions.
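As a rough illustration of combining individual classifiers into a voting ensemble, the sketch below applies simple majority voting in plain Python. The abstract does not say which combination rule was used with WEKA's Vote, so majority voting is an assumption here, and the function names and example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often.

    Ties are broken in favor of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(classifiers, sample):
    """Ask every base classifier for a label and combine the answers."""
    return majority_vote([clf(sample) for clf in classifiers])

# Hypothetical stand-ins for trained base models: each one maps a
# sample to a class label. Real models (KNN, naive Bayes, ...) would
# be plugged in here instead.
base_models = [
    lambda sample: "disease",
    lambda sample: "healthy",
    lambda sample: "disease",
]

combined = ensemble_predict(base_models, sample={})
```

Majority voting only needs each base model's predicted label; averaging predicted probabilities is another common rule and would need each model to expose class probabilities.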
The main reason behind the spread of fake news is the large number of fake and hyperpartisan sites on the Internet. These fake sites try to manipulate the truth, which creates misunderstanding in society. Therefore, it is important to detect fake news and make people aware of the truth. This paper gives an insight into how to detect fake news using machine learning and deep learning techniques. On observing our data, we categorized it into five attributes, namely Title, Text, Subject, Date, and Labels. To develop an efficient fake news detection system, each feature and its degree of impact on the system must be taken into consideration. This paper attempts to provide a detailed analysis of fake news detection using various models such as LSTM, ANN, Naïve Bayes, SVM, Logistic Regression, XGBoost, and BERT.
This paper implements a Naïve Bayes classifier approach to Word Sense Disambiguation (WSD), using WordNet as a complete lexicon of English words and Senseval-3 to evaluate the WSD approach. The data set consists of 10 nouns and 5 verbs. To meet the goal of the study, the approach was implemented in Java. The reported result is an accuracy of 62.86% on Senseval-3.
- by IAEME Publication
- NLP, Wordnet, Accuracy, Senseval
Recent years have witnessed an upsurge in the number of published documents. Organizations are showing increased interest in text classification for effective use of this information. Manual procedures for text classification can work for a handful of documents, but they lose credibility as the number of documents increases, besides being laborious and time-consuming. Text mining techniques facilitate assigning text strings to categories, making the classification process fast, accurate, and hence reliable. This paper classifies chemistry documents using machine learning and statistical methods. The text classification procedure is described in order: data preparation, followed by processing, transformation, and application of classification techniques, culminating in validation of the results.
India is an agriculturally dependent country, and its economic standing relies heavily on agriculture. Agricultural yield is influenced by organic, economic, and seasonal factors. Estimating the country's agricultural output is a major challenge, particularly given the current population situation. Production is extremely unstable because of sudden changes in environmental factors such as weather and a lack of groundwater resources. The major goal is to collect data that can be stored and analyzed for crop yield prediction. Machine learning techniques for agricultural yield prediction are implemented, which helps farmers select the most appropriate crop. In addition, this study tries to contribute to agriculture by improving the accuracy of crop production prediction. A statistical model is constructed using machine learning techniques and sufficient optimizations to produce accurate and precise decisions. The results of this research will assist farmers in selecting the best crops to cultivate based on characteristics such as season and available land, with the least amount of risk.
We present an effective and fast method for static hand gesture recognition. This method is based on classifying the different gestures according to geometric-based invariants which are obtained from image data after segmentation; thus, unlike many other recognition methods, this method is not dependent on skin color. Gestures are extracted from each frame of the video, with a static background. The segmentation is done by dynamic extraction of background pixels according to the histogram of each image. Gestures are classified using a weighted K-Nearest Neighbors algorithm combined with a naive Bayes approach to estimate the probability of each gesture type. When this method was tested in the domain of the JAST human-robot dialog system, it classified more than 93% of the gestures correctly into one of three classes.
The term non-permanent employee first appeared in the rule of law, namely in Act Number 13 Year 2003 concerning Manpower. This Act has an impact on the emergence of clarity about staffing status so that the salaries obtained by employees do not match their workload. Therefore, this study aims to determine which employees are eligible to earn the same income at one of the private universities in Palembang, based on the university's strategic plan, namely class, employment status, membership, and education permit, using the Naïve Bayes method. The results showed that the highest accuracy of predictive conclusions for non-permanent and permanent employees is 83.33%, while the lowest accuracy value is 50%.
Standardization and wider use of Electronic Health Records (EHR) create opportunities for better understanding patterns of illness and care within and across medical systems. In healthcare systems, hidden event signatures support decisions about a patient's diagnosis, prognosis, and management. The temporal history of event codes embedded in patients' records can be investigated for frequently occurring sequences of event codes across patients. There is a framework that enables the representation, retrieval, and mining of high-order latent event structure and relationships within single and multiple event sequences. There is a wealth of hidden information present in large databases, and different data mining techniques can be used to retrieve it. This paper presents a classifier approach for the detection of diabetes and shows how Naive Bayes can be used for classification. In this system, medical data is grouped into five categories, namely low, average, high, very high, and critical, and treatment is given according to the predicted category. The system predicts the class label of an unknown sample. Hence two basic functions, namely classification (training) and prediction (testing), are performed. The algorithm and database used affect the accuracy of the system. It can answer complex queries for diagnosing diabetes and thus assist healthcare practitioners in making intelligent clinical decisions, which traditional decision support systems cannot. Over the last decade, many information visualization techniques have been developed to support the exploration of large data sets. Various interactive visual data mining tools are available for visual data analysis. It is possible to perform clinical assessment for visual interactive knowledge discovery in large electronic health record databases. In this paper, we propose that it is possible to develop a data visualization tool for interactive knowledge discovery.
Twitter is a popular and commonly used social networking platform because it permits users to express their thoughts and opinions about any item, and allows them to post comments or messages all around the world. Sentiment analysis techniques are used to study and analyze these reviews or opinions. Sentiment analysis is an NLP technique used to classify opinions into different sentiments such as positive, negative, and neutral. In this paper, we take an airline dataset from Twitter and perform sentiment analysis on it using machine learning algorithms such as SVM, Naïve Bayes, and Random Forest. Sentiments are expressed in three categories: positive, negative, and neutral. Our dataset contains 11533 tweets and is not balanced. The performance of the various machine learning algorithms is discussed in this paper.
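To make the three-class setup concrete, here is a minimal sketch of a multinomial naïve Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the paper's implementation: the toy tweets, the whitespace tokenization, and the function names `train_nb` and `predict_nb` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count documents per class and words per class.

    `docs` is a list of (tokens, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # log prior: share of training documents in this class
        score = math.log(class_counts[label] / total_docs)
        # add-one smoothing: every vocabulary word gets one extra count
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, not taken from the airline dataset in the paper.
TOY_TWEETS = [
    ("great flight friendly crew".split(), "positive"),
    ("love the quick boarding".split(), "positive"),
    ("flight delayed lost luggage".split(), "negative"),
    ("terrible service rude crew".split(), "negative"),
    ("flight departs at noon".split(), "neutral"),
    ("what gate is boarding".split(), "neutral"),
]

model = train_nb(TOY_TWEETS)
label = predict_nb(model, "rude crew lost luggage".split())
```

Because the real dataset is described as unbalanced, the class prior term would matter more there than in this balanced toy set, where all three priors are equal.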
Prediction of substantial rainfall is a significant issue for meteorological departments, as it is closely connected with the economy and human life. Rainfall is a cause of natural disasters such as floods and droughts, which people across the globe experience regularly. The accuracy of rainfall forecasting is of great importance for countries like India, whose economy is largely dependent on agriculture. The proposed framework uses a machine learning approach that helps predict rainfall efficiently using the naïve Bayes technique.
Globally, heart diseases are the number one cause of death. About 80% of deaths occurred in low- and middle-income countries. If current trends are allowed to continue, by 2030 an estimated 23.6 million people will die from cardiovascular disease (mainly from heart attacks and strokes). The healthcare industry gathers enormous amounts of heart disease data which, unfortunately, are not “mined” to discover hidden information for effective decision making. The reduction of blood and oxygen supply to the heart leads to heart disease. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. This research paper intends to provide a survey of current techniques of knowledge discovery in databases using data mining, which will be useful for medical practitioners in making effective decisions. The objective of this research work is to predict the presence of heart disease more accurately with a reduced number of attributes. Originally,...
The Internet of Things (IoT) has led to several security threats and challenges within society. Regardless of the benefits it has brought to society, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types facing IoT networks. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks on the UNSW-NB15 benchmark dataset. UNSW-NB15 contains instances of normal network traffic and malicious traffic. The experimental results reveal that RF and KNN classifiers give the best performance with an accuracy of 100% (without nois...
The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, while Indonesian has received scant attention. Therefore, this research aims to predict users' personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profiles. We predict personality based on the Big Five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality dimensions exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.
This paper describes the information properties of museum specimen labels and machine learning tools to automatically extract Darwin Core (DwC) and other metadata from these labels processed through Optical Character Recognition (OCR). The DwC is a metadata profile describing the core set of access points for search and retrieval of natural history collections and observation databases. Using the HERBIS Learning System (HLS), we extract 74 independent elements from these labels. The automated text extraction tools are provided as a web service so that users can reference digital images of specimens and receive back an extended Darwin Core XML representation of the content of the label. This automated extraction task is made more difficult by the high variability of museum label formats, OCR errors, and the open-class nature of some elements. In this paper we introduce our overall system architecture and variability-robust solutions, including the application of Hidden Markov and Naiv...
Abstract: In the last few years, as Internet usage has become the main artery of daily life, the problem of spam has become very serious for the Internet community. Spam pages form a real threat to all types of users. This threat has proved to evolve continuously without any ...