Natural Language Processing (NLP) Research Papers

This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment the broken expressions and utterance endings, absent from their dictionaries, that often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting such lines with subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize the utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual anime character, and show that they capture linguistic speech patterns specific to each attribute. Additionally, a classification experiment shows that a model using subword units outperformed one using the conventional method.
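A minimal sketch of the pipeline this abstract describes, assuming a corpus of per-character lines; the SentencePiece and scikit-learn choices and the load_character_lines() loader are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: learn subword units on character lines, then weight them
# with TF-IDF per character to surface characteristic speech patterns.
import io
import sentencepiece as spm
from sklearn.feature_extraction.text import TfidfVectorizer

lines_by_character = load_character_lines()  # hypothetical: dict[name, list[str]]
all_lines = [l for ls in lines_by_character.values() for l in ls]

# 1) Train a subword model; unlike a dictionary-based analyzer such as
#    MeCab, it can segment out-of-dictionary utterance endings.
model = io.BytesIO()
spm.SentencePieceTrainer.train(sentence_iterator=iter(all_lines),
                               model_writer=model, vocab_size=1000)
sp = spm.SentencePieceProcessor(model_proto=model.getvalue())

# 2) One "document" per character: all of its lines as space-joined subwords.
docs = [" ".join(" ".join(sp.encode(line, out_type=str)) for line in ls)
        for ls in lines_by_character.values()]
tfidf = TfidfVectorizer(token_pattern=r"\S+")
X = tfidf.fit_transform(docs)

# 3) The highest-weighted subwords per character approximate its
#    characteristic expressions.
terms = tfidf.get_feature_names_out()
for name, row in zip(lines_by_character, X.toarray()):
    top = sorted(zip(terms, row), key=lambda t: -t[1])[:10]
    print(name, [t for t, _ in top])
```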

For many years, assisted instruction has been used to bring the power of the computer to bear on the educational process. Nowadays, instead of assisted instruction, artificial intelligence methods are applied to develop intelligent computer-assisted instruction (ICAI) systems, an attempt to create computerized tutors. These tutors shape their teaching techniques to fit the learning patterns of individual students. This research paper is based on the literature available on this emerging topic. It gives detailed information regarding artificial intelligence and machine learning, so that information professionals can use them more effectively in academic work and research. The advantages and disadvantages of artificial intelligence and machine learning are also given.

Smart contracts are software systems that partially automate, monitor, and control the execution of legal contracts. The requirements of such systems consist of a formal specification of the legal contract whose execution is to be monitored and controlled. Legal contracts are always available as text expressed in natural language, and we have been working on the translation of such text documents into formal specifications. Our translation process consists of four steps: (a) semantic annotation of the text, identifying obligations, powers, contracting parties, and assets; (b) identification of relationships among the concepts identified in (a); (c) generation of a domain model for terms used in the contract, as well as identification of parameters and local variables for the contract; and (d) generation of formal expressions that formalize the constituents of obligations and powers. This paper reports on the status of the project and the results that have been achieved.

Analysis and modeling of crime text report data has important applications, including refinement of crime classifications, clustering of documents, and feature extraction for spatiotemporal forecasts. Having better neural network representations of crime text data may facilitate all of these tasks. This paper evaluates the ability of generative adversarial network models to represent crime text data and generate realistic crime reports. We compare four state-of-the-art GAN algorithms quantitatively, in terms of metrics such as coherence, embedding similarity, and negative log-likelihood, and qualitatively, based on inspection of the generated text. We discuss current challenges with crime text representation and directions for future research.

'El Diario de Juárez' is a local newspaper in a city of 1.5 million Spanish-speaking inhabitants that publishes texts citizens read on both a website and an RSS (Really Simple Syndication) service. This research applies natural-language-processing and machine-learning algorithms to the news provided by the RSS service in order to classify each item by whether or not it is about a traffic incident, with the final intention of notifying citizens where such accidents occur. The classification process explores the bag-of-words technique with five learners (Classification and Regression Tree (CART), Naïve Bayes, kNN, Random Forest, and Support Vector Machine (SVM)) on a class-imbalanced benchmark; this challenging issue is dealt with via five sampling algorithms: synthetic minority oversampling technique (SMOTE), borderline SMOTE, adaptive synthetic sampling, random oversampling, and random undersampling. Consequently, our final classifier reaches a sensitivity of 0.86 and an area under the precision-recall curve of 0.86, which is acceptable performance considering the complexity of analyzing unstructured texts in Spanish.
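A hedged sketch of this kind of pipeline: bag-of-words features, SMOTE handling the class imbalance inside the pipeline, and one of the five learners (SVM). The load_rss_news() loader is a hypothetical placeholder, and the exact preprocessing surely differs from the paper's.

```python
# Sketch: bag-of-words + SMOTE + SVM, scored by sensitivity (recall) and
# area under the precision-recall curve, as in the abstract above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import average_precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only during fitting

texts, labels = load_rss_news()  # hypothetical: 1 = traffic incident, 0 = other
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, stratify=labels,
                                          random_state=0)
clf = Pipeline([
    ("bow", CountVectorizer()),        # bag-of-words features
    ("smote", SMOTE(random_state=0)),  # oversample the minority class
    ("svm", SVC(probability=True)),    # one of the five learners compared
])
clf.fit(X_tr, y_tr)
print("sensitivity:", recall_score(y_te, clf.predict(X_te)))
print("AUPRC:", average_precision_score(y_te, clf.predict_proba(X_te)[:, 1]))
```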

This paper describes a study of the work done to date in the field of text searching, a subdivision of Natural Language Processing (NLP). The work in this project includes the study and analysis of some of the algorithms devised under this topic, finding their faults or loopholes, and trying to increase their efficiency, taking forward the range of work done on them. Experiments were performed on several text search algorithms, namely the Knuth-Morris-Pratt algorithm, the naïve search algorithm, and the Boyer-Moore algorithm, by providing text input of various sizes and analyzing their behavior on these variable inputs. The results of this analysis state that the Boyer-Moore algorithm worked more efficiently than the rest when dealing with larger data sets, while the Knuth-Morris-Pratt algorithm works well on larger alphabets. These algorithms do have drawbacks...
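For reference, textbook implementations of two of the algorithms compared above (the naïve scan and Knuth-Morris-Pratt); these are standard versions, not the paper's code.

```python
# Illustrative sketch: naïve search checks every alignment (O(n*m) worst
# case); KMP reuses a prefix-function table for O(n + m) matching.

def naive_search(text, pattern):
    """Return all start indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]

def kmp_search(text, pattern):
    """Knuth-Morris-Pratt search using the failure (prefix) table."""
    m = len(pattern)
    fail = [0] * m
    k = 0
    for i in range(1, m):                 # build the prefix table
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, ch in enumerate(text):         # scan the text once
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            hits.append(i - m + 1)
            k = fail[k - 1]
    return hits

assert naive_search("ababcabab", "abab") == kmp_search("ababcabab", "abab") == [0, 5]
```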

Routing protocols in wireless sensor networks are used to select the best route according to different criteria, such as shortest-route selection, congestion control, efficient data delivery, minimal traffic delay, and reliability. This paper presents the architecture of wireless sensor networks, their design issues, and the different routing protocols available for them. These protocols are classified into four groups: data-centric, hierarchical, location-based, and QoS-based. Under each category, their strengths and limitations are discussed. Some open research issues concerning energy consumption, network stability period, and network lifetime are also discussed.

This work studies how we can obtain feature-level ratings of mobile products from customer reviews and review votes to support decision-making, both for new customers and for manufacturers. Such a rating system gives a more comprehensive picture of the product than a product-level rating system offers. While product-level ratings are too generic, feature-level ratings are specific: we know exactly what is good or bad about the product. There has always been a need to know which features fall short and which are doing well according to the customer's perception. It keeps both the manufacturer and the customer well informed when deciding, respectively, how to improve the product and whether to buy it. Different customers are interested in different features, so feature-level ratings can make buying decisions personalized. We analyze customer reviews, and the associated review votes, collected from an online shopping site (Amazon) for various mobile products. Specifically, we carry out a feature-focused sentiment analysis for this purpose. Our analysis yields ratings for 108 features of 4000+ mobiles sold online, helping manufacturers decide how to improve their products and making personalized buying decisions possible for buyers. Our analysis has applications in recommender systems, consumer research, and so on.

With natural language processing and machine learning, researchers are identifying patient emotional and medical needs that are not being met by clinicians and patient advocacy groups. This study leveraged artificial intelligence and natural language processing to algorithmically analyze over 500,000 public, anonymous patient comments posted online in order to understand which needs of chronic patients go unmet. The analysis harnesses the power of the machine to move past unconscious assumptions about what patients want and to augment traditional focus groups, in which a small group of people often represents the voice of a population. It demonstrates the utility of large-scale text mining for delivering real value to chronic patients and ensuring every voice is heard.
The study highlights the collective voices of online patient communities across 10 different diseases, with 6 out of 8 of patients' top unmet needs being more emotional than medical. Patients want to understand how to live with their diseases, adapt daily routines to accommodate side effects, and connect with other patients to share their journeys. Current systems fail to humanize patients. Emotional needs remain largely unaddressed, while medical needs are the focus of current efforts and innovation. There remains a large opportunity for health care stakeholders to come together to treat patients as people rather than as sets of symptoms and to provide greater support, awareness, and education. By doing so, we can define real patient value through the patient's lens, addressing both their emotional and medical needs. Only then can we innovate, build, and deliver the solutions that will ultimately help patients live the lives that they choose.

The development of information technologies is growing steadily. With the development of the latest software technologies and the application of artificial intelligence and machine learning methods embedded in computers, the expectation is that in the near future computers will be able to solve problems themselves, as people do. Artificial intelligence emulates human behavior on computers. Rather than executing instructions one by one, as programmed, machine learning employs prior experience/data in the process of training the system. In this state-of-the-art paper, common AI methods such as machine learning, pattern recognition, and natural language processing (NLP) are discussed. The standard architecture of an NLP system and the levels needed for understanding NLP are also given. Lastly, statistical NLP processing and multi-word expressions are described.

In this paper, we introduce EVALution 1.0, a dataset designed for the training and evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, and meronymy). The dataset is enriched with a large amount of additional information (i.e., relation domain, word frequency, word POS, word semantic field, etc.) that can be used for either filtering the pairs or performing an in-depth analysis of the results. The tuples were extracted from a combination of ConceptNet 5.0 and WordNet 4.0, and subsequently filtered through automatic methods and crowdsourcing to ensure their quality. The dataset is freely downloadable. An extension in RDF format, including scripts for data processing, is under development.

Industrial safety remains a basic concern in many nations. Industrial accidents cause human suffering and also result in immense financial losses and environmental impacts. To prevent such accidents in the future, examination of the risk control plan is essential. In every industry, casualty and accident reports are available for past accidents. This design research paper proposes mining accident reports using NLP.

Research on explanation is currently of intense interest, as documented in the DARPA 2021 investments reported by the US Department of Defense. An emerging theme in explanation-techniques research is their application to the improvement of human-system interfaces for autonomous anti-drone (C-UAV) defense systems. In the present paper, a novel proposal for explanatory discourse using relations, based on natural language processing technology, is briefly described. The proposal is based on the use of relations pertaining to the possible malicious actions of an intruding alien drone swarm and the defense decisions proposed by an autonomous anti-drone system. The aim of such an interface is to facilitate the supervision that a user must exercise over an autonomous defense system in order to minimize the risk of wrong mitigation actions and unnecessary expenditure of ammunition.

People with a low literacy level have problems understanding complex texts, and legal texts can be especially challenging. Automatic Text Simplification (TS) can help make legal text more accessible. However, most TS research is based on Wikipedia and newspaper articles. To be able to apply automatic TS to legal text, we have to understand what constitutes simple legal text. Therefore, we examine the English translation of South Korean legislation and its official simplification. Subsequently, we apply state-of-the-art TS models to the legislation text. The models simplify the text only quantitatively and fail to retain the context of the original text.

Conference Program, February 3-4, 2022

Driven by the visions of Data Science, recent years have seen a paradigm shift in Natural Language Processing (NLP). NLP has set milestones in text processing and proved to be the preferred choice for researchers in the healthcare domain. The objective of this paper is to identify the potential of NLP, and especially how NLP is used to support the knowledge management process in the healthcare domain, making data a critical and trusted component in improving health outcomes. The paper provides a comprehensive survey of state-of-the-art NLP research with a particular focus on how knowledge is created, captured, shared, and applied in the healthcare domain. Our findings first identify the NLP techniques that support the knowledge extraction and knowledge capture processes in healthcare. Second, we propose a conceptual model for the knowledge extraction process through NLP. Finally, we discuss a set of issues, challenges, and proposed future research areas.

Classification of a large unlabelled document collection is a difficult task, and natural language processing can be a powerful tool for it. Document classification is the task of assigning predefined classes to unlabelled documents. In this paper, a new way of classifying Bangla news documents using a deep recurrent neural network is proposed. First, the collected news data were preprocessed. After designing the model architecture, the training data were fitted to the model. Finally, model performance was evaluated by calculating accuracy and F1-score on the test dataset. The deep recurrent neural network with BiLSTM achieved 98.33% accuracy, which is higher than other well-known classification algorithms for Bangla text classification.
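A minimal Keras sketch of a BiLSTM news classifier of the kind described; the load_bangla_news() loader and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Sketch: tokenize news texts, embed them, and classify with a BiLSTM.
import tensorflow as tf
from tensorflow.keras import layers

texts, labels = load_bangla_news()   # hypothetical: list[str], integer class ids
num_classes = max(labels) + 1

vectorize = layers.TextVectorization(max_tokens=50_000,
                                     output_sequence_length=200)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,                                   # raw string -> token ids
    layers.Embedding(input_dim=50_000, output_dim=128),
    layers.Bidirectional(layers.LSTM(64)),       # BiLSTM encoder
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(tf.constant(texts), tf.constant(labels),
          validation_split=0.1, epochs=5)
```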

Semantic analysis is an essential component of the NLP approach. It indicates, in the appropriate format, the context of a sentence or paragraph. Semantics is the study of meaning in language. The vocabulary used conveys the importance of the subject because of the interrelationship between linguistic classes. In this article, work on semantic interpretation in the area of Natural Language Processing is surveyed. The findings report the best accuracies achieved by the surveyed papers, most of which relied on the sentiment analysis approach, and show that the prediction error is minimal.

Handwriting is a human way of communicating using written media. Nowadays, communication technology is changing rapidly, and handwriting offers an attractive and efficient way to interact with a computer, enhancing human-computer interaction. This research work focuses on Yoruba handwriting sentence recognition using the Cheriet algorithm. Many handwriting recognition systems have been developed, but these captured only Yoruba handwritten words, so there is a need for a Yoruba handwriting sentence recognition system. Four stages were adopted for the recognition process: data acquisition, image preprocessing, feature extraction, and recognition. Fifty handwriting samples were acquired from literate indigenous writers and subjected to image preprocessing to enhance the quality of the digitized images. The features of the preprocessed images were extracted using the SURF algorithm, and the extracted feature vectors were passed to the Cheriet algorithm for recognition. The recognition system was evaluated based on recognition rate, and a 92.8% recognition rate was achieved.

The study focuses on building an informational Russian-language chatbot, which aims to answer neurotypical and atypical people's questions about the inclusion of people with autism spectrum disorder and, in particular, Asperger syndrome. Assuming that a lack of awareness about the inclusion process and the characteristics of people with special needs might cause communication difficulties or even conflicts between pupils, university and college students, or co-workers, a chatbot that is based on reliable sources, provides information informally, and allows asking uncomfortable questions could perhaps reduce stress levels during inclusion. The paper describes two conceptual models of the chatbot: the first is based on traditional language modeling with GPT-2, and the second is based on BERT applied to question answering. The training data were collected from informational websites about ASD, with the agreement of their administrators. For training BERT for question answering, the dataset was transformed into the structure of the Stanford Question Answering Dataset (SQuAD). F1-score and perplexity metrics were used to evaluate the systems. The study shows the effectiveness of building conceptual models for tracking weaknesses and making significant adjustments at the design stage.
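A minimal sketch of the second conceptual model (BERT-style extractive question answering over text gathered from the informational sites); the multilingual checkpoint named here is an assumption, not necessarily the one the authors fine-tuned.

```python
# Sketch: SQuAD-style extractive QA via the Hugging Face pipeline API;
# the model choice is an illustrative assumption.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")
context = "..."  # text collected from the informational websites about ASD
answer = qa(question="What is Asperger syndrome?", context=context)
print(answer["answer"], answer["score"])
```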

Conversational bots have become a common user interface for many software services and come in very handy when human-level interaction with a system is needed. These conversational chatbots provide cost-effective and reliable support, especially in the field of health management, offering a wide range of options, from setting reminders to scheduling appointments with doctors and even taking care of daily basic needs. Users are normally not aware of all the treatments or symptoms related to a particular disease, and for small problems they have to go to the hospital in person for a checkup, which is time-consuming; handling telephone calls for complaints is also quite hectic. Such problems can be addressed by a medical chatbot that gives proper guidance for healthy living.

Named Entity Recognition (NER) plays a significant role in Information Extraction (IE). For English, NER systems have achieved excellent performance, but for the Indonesian language the systems still need a lot of improvement. To create a reliable NER system using a machine learning approach, a massive dataset to train the classifier is a must. Several studies have proposed methods for automatically building a dataset for Indonesian NER, using Indonesian Wikipedia articles as the source of the dataset and DBpedia as the reference for determining entity types automatically. The objective of our research is to improve the quality of the automatically tagged dataset. We propose a new method of using DBpedia as the reference for named entities, creating rules that expand the DBpedia entity corpus for the categories person, place, and organization. The resulting training dataset is used with the Stanford NER tool to build an Indonesian NER classifier. The evaluation shows that our method improves recall significantly but has lower precision compared to previous research.

Intelligent conversational agent development using artificial intelligence or machine learning techniques is an interesting problem in the field of Natural Language Processing. With the rise of deep learning, earlier models were quickly replaced by end-to-end trainable neural networks.

Word sense disambiguation (WSD) is a core problem in many language processing tasks and was recognized at the beginning of scientific interest in machine translation and artificial intelligence. In this paper, we introduce the possibility of using a Support Vector Machine (SVM) classifier to solve the word sense disambiguation problem in a supervised manner, after using the Levenshtein distance algorithm to measure the matching distance between words, on lexical samples of five Arabic words. The performance of the proposed technique is compared to supervised and unsupervised machine learning algorithms, namely the Naïve Bayes Classifier (NBC) and Latent Semantic Analysis (LSA) with K-means clustering, representing the baseline and state-of-the-art algorithms for WSD.
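For reference, the standard dynamic-programming form of the Levenshtein distance that the abstract names as its matching measure; a textbook sketch, not the paper's implementation.

```python
# Levenshtein distance: minimum number of insertions, deletions, and
# substitutions turning string a into string b, via row-by-row DP.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```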

Parkinson's disease is a neurodegenerative disorder caused by a loss of nerve cells in the brain, specifically in the substantia nigra. This part of the brain is responsible for creating dopamine, a chemical transmitter and hormone used by neurons in a paracrine connection. This hormone is correlated with feelings of pleasure, but it serves further purposes such as learning, memory, and, most relevantly to Parkinson's disease, motor control. Today there is no effective way of monitoring Parkinson's disease in a clinical setting: so far, the only ways to detect its signs are to study the patient's ancestry in relation to Parkinson's and to perform qualitative tests monitored by a physician. Through this project, "MobiTest", I aim to digitize these tests to monitor the progression of Parkinson's disease. The main goal is to ensure that constant monitoring no longer requires the assistance of a physician and to bring monitoring to those suffering from Parkinson's disease all over the world, onto their smartphones. MobiTest uses statistical algorithms and support vector classifiers to quantify drawings of spirals into a score on the UPDRS (Unified Parkinson's Disease Rating Scale) from 1 to 5, where 1 is least severe and 5 is most severe. MobiTest also provides charts in which users can identify trends in their scores that can be sent to physicians. Apart from progression monitoring, MobiTest includes a specialized keyboard that helps the caretakers of speech-impaired Parkinson's patients look up commonly used words. This keyboard uses Bayesian inference in a progressive NLP prediction algorithm to make suggestions for fast and efficient lookup.

With the rapid growth of cyber attacks, sharing of cyber threat intelligence (CTI) becomes essential to identify and respond to attacks in a timely and cost-effective manner. However, given the lack of standard languages and automated analytics for cyber threat information, analyzing the complex and unstructured text of CTI reports is extremely time- and labor-consuming. Without addressing this challenge, CTI sharing will be highly impractical, and attack uncertainty and time-to-defend will continue to increase. Considering the high volume and speed of CTI sharing, our aim in this paper is to develop automated and context-aware analytics of cyber threat intelligence that accurately learn attack patterns (TTPs) from commonly available CTI sources in order to implement timely cyber defense actions. Our paper has three key contributions. First, it presents a novel threat-action ontology that is sufficiently rich to understand the specifications and context of malicious actions. Second, we developed a novel text mining approach that combines enhanced techniques of Natural Language Processing (NLP) and Information Retrieval (IR) to extract threat actions based on semantic (rather than syntactic) relationships. Third, our CTI analysis can construct a complete attack pattern by mapping each threat action to the appropriate techniques, tactics, and kill chain phases, and translating it into any threat-sharing standard, such as STIX 2.1. Our CTI analytic techniques were implemented in a tool, called TTPDrill, and evaluated using a randomly selected set of Symantec Threat Reports. Our evaluation tests show that TTPDrill achieves more than 82% precision and recall in a variety of measures, which is very reasonable for this problem domain.

In the last decade, sentiment analysis has been widely applied in many domains, including business, social networks, and education. Particularly in the education domain, where dealing with and processing students' opinions is a complicated task due to the nature of the language used by students and the large volume of information, the application of sentiment analysis is growing yet remains challenging. Several literature reviews reveal the state of the application of sentiment analysis in this domain from different perspectives and contexts. However, the body of literature lacks a review that systematically classifies the research and results of applying natural language processing (NLP), deep learning (DL), and machine learning (ML) solutions to sentiment analysis in the education domain. In this article, we present the results of a systematic mapping study to structure the published information available. We used a stepwise PRISMA framework to guide the search proc...

Natural Language Processing can be combined with neural network methods such as the Convolutional Neural Network (CNN), a deep learning method that carries out a repetitive learning process to obtain the best representation of each word in the text. A CNN works by finding the pattern of a word among the other words in the input matrix. The learning process in several convolution layers is carried out in parallel and in sequence; thus, each word is independent of the other words around it. Twitter is a data source that interests researchers as an object of study. However, the text in tweets contains much non-formal language, abbreviations, and everyday expressions, making the information in it harder to identify than in formal text. In this research, the Natural Language Processing method is implemented using the CNN algorithm to classify information related to the emergency-response phase. The classification model was trained using two types of datasets, namely a crawled dataset of 1967 texts and a dataset of 853 tweet sentences from Twitter, and tested using 89 further tweets. From 3 iterations with 10 training epochs per iteration, an accuracy of 98% and a loss of 4% were obtained. Thus, it can be concluded that the algorithm functions optimally in identifying the information.
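A rough Keras sketch of a CNN tweet classifier along these lines; the load_emergency_tweets() loader and hyperparameters are assumptions for illustration, not the paper's setup.

```python
# Sketch: embed tweets, detect local word patterns with a 1D convolution,
# and classify emergency-response relevance.
import tensorflow as tf
from tensorflow.keras import layers

tweets, labels = load_emergency_tweets()   # hypothetical: list[str], 0/1

vectorize = layers.TextVectorization(max_tokens=20_000,
                                     output_sequence_length=50)
vectorize.adapt(tweets)

model = tf.keras.Sequential([
    vectorize,
    layers.Embedding(20_000, 100),              # word representations
    layers.Conv1D(128, 5, activation="relu"),   # pattern of a word among neighbors
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(tweets), tf.constant(labels), epochs=10)
```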

Sentiment analysis plays an important role in most human activities and has a significant impact on our behaviour. With the development and use of web technology, there is a huge amount of data representing users' opinions in many areas, such as politics and business. This paper applies Naïve Bayes (NB) to analyse opinions by exploring categories in a text and classifying it into the right class (Reform, Conservative, or Revolutionary). It investigates the effect of using two feature-extraction methods, Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF), with Naïve Bayes classifiers (Gaussian, Multinomial, Complement, and Bernoulli) on the accuracy of classifying Arabic articles. Precision, recall, F1-score, and the number of correct predictions were used to evaluate the performance of the applied classifiers. The results reveal that using TF with TF-IDF improved the accuracy to 96.77%, and the Complement classifier was deemed the most suitable for our model.
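A hedged sketch of this comparison using scikit-learn; the load_arabic_articles() loader is a placeholder, and GaussianNB is omitted here because it requires dense input.

```python
# Sketch: compare TF vs. TF-IDF features across Naïve Bayes variants.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, ComplementNB, MultinomialNB

articles, classes = load_arabic_articles()  # hypothetical: texts + class labels

for feat_name, vec in [("TF", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    X = vec.fit_transform(articles)
    for clf in (MultinomialNB(), ComplementNB(), BernoulliNB()):
        acc = cross_val_score(clf, X, classes, cv=5).mean()
        print(feat_name, type(clf).__name__, round(acc, 4))
```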

The objective of this project is to carry out an automatic study of the opinions found in a sample of tweets, short messages limited to 140 characters from the well-known social media platform Twitter. The sample provided comes from the international SemEval competition (task 2B), which took place at the NAACL-HLT conference, renowned in the field of natural language processing.

The information resulting from the use of an organization's products and services is a valuable resource for business analytics; therefore, systems to analyze customer reviews are necessary. This article is about categorizing and predicting customer sentiments, and it proposes a new framework for doing so. The customer reviews were collected from an international hotel, processed, and then fed into various machine learning algorithms: support vector machine (SVM), artificial neural network (ANN), Naïve Bayes (NB), decision tree (DT), C4.5, and k-nearest neighbor (K-NN). Among these algorithms, the DT provided the best results. In addition, the most important factors influencing a great customer experience were extracted with the help of the DT. Finally, very interesting results were observed regarding the effect of the number of features on the performance of the machine learning algorithms.
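A sketch of the decision-tree step and the feature-importance readout the abstract credits to the DT; the load_hotel_reviews() loader and feature settings are illustrative.

```python
# Sketch: classify review sentiment with a decision tree, then read off the
# terms the tree relies on as candidate experience factors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

reviews, sentiment = load_hotel_reviews()  # hypothetical: texts, pos/neg labels
vec = TfidfVectorizer(max_features=2000)
X = vec.fit_transform(reviews)

tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, sentiment)
terms = vec.get_feature_names_out()
ranked = sorted(zip(terms, tree.feature_importances_), key=lambda t: -t[1])
print(ranked[:15])  # most influential terms in the tree's decisions
```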

This article examines a very important issue in natural language processing: natural language understanding and its transfer from high-resource languages to low-resource languages. A study of the cross-lingual XNLI dataset is carried out, whose contribution is crucial for the easier development of cross-lingual models. Then, well-known cross-lingual state-of-the-art models such as mBERT, RoBERTa, XLM-R, and XLNet are presented, and their advantages and capabilities are recorded. Finally, some thoughts are expressed on improving the understanding of natural language by computer systems in the near future with the help of the fields of Artificial Intelligence, Machine Learning, and Computational Linguistics.

With rapid digitalization, organizations produce a lot of data as part of their day-to-day operations. These data are stored either on legacy platforms or in cloud storage. The largest volume of data stored in the cloud is unstructured, and much hidden information is kept inside it. Our objective is to read those unstructured data elements, such as PDF or text files and audio or video files, to extract meaning that can be used by the enterprise. Utilizing cloud-based big data platforms, we use Spark engines to read these unstructured data, securely store them, and transform them into a structured format, which becomes the input to natural language processing (NLP).
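A minimal PySpark sketch of the ingestion step described: read raw files from cloud storage into a structured DataFrame ready for NLP. The bucket paths are placeholders.

```python
# Sketch: read unstructured text files with Spark, decode to strings, and
# persist as structured Parquet for downstream NLP.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("unstructured-ingest").getOrCreate()

# Each file becomes one row: (path, modificationTime, length, content).
raw = spark.read.format("binaryFile").load("s3a://example-bucket/docs/*.txt")
docs = raw.select(
    F.col("path"),
    F.decode(F.col("content"), "UTF-8").alias("text"),  # bytes -> string
)
docs.write.mode("overwrite").parquet("s3a://example-bucket/structured/docs")
```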

In a financially volatile market, as in the case of the asset market, it is important to have a very precise prediction of upcoming trends in order to take advantage of market changes. This requires highly advanced machine learning algorithms that factor in human sentiment. The value of an asset is highly sensitive to factors such as market news and human behaviour, which can instantly increase or decrease the asset price. The problem therefore becomes that of buying or selling an asset at an exchange at the right moment to generate profit. This aspect has attracted researchers for years and has led to the creation of various algorithms that attempt to predict outcomes in this nonlinear, volatile market. This wave of artificial intelligence algorithms, mainly machine learning, is still being applied to the problem. It is also well known that news articles and related information have a great impact on asset prices and trends. Hence, we aim to combine these two approaches in an attempt to understand the relationship between the attributes, in order to yield predictions with high accuracy.

The system provides a background on depression and the use of social media platforms for prediction and analysis using machine learning and deep learning algorithms. The system monitors the social media activities of each person and predicts mental health factors such as depression, anxiety, and stress. It also uses real-time online social media data, exploring the parallels between users' mental health and the content they post on their social media handles. The system could be used by mentors such as teachers and doctors to acquire a weekly analysis of a person's stress levels, thereby helping to provide consultation accordingly.
Keywords: Social computing, healthcare, artificial intelligence, deep learning, data analysis, natural language processing, convolutional neural network

With the rapid growth of Twitter in recent years, there has been a tremendous increase in the number of tweets generated by users. Twitter allows users to make use of hashtags to facilitate effective categorization and retrieval of tweets. Despite the usefulness of hashtags, a major fraction of tweets do not contain hashtags. Several methods have been proposed to recommend hashtags based on lexical and topical features of tweets. However, semantic features and data sparsity in tweet representation have rarely been addressed by existing methods. In this paper, we propose a novel method for hashtag recommendation that resolves the data sparseness problem by exploiting the most relevant tweet information from external knowledge sources. In addition to lexical features and topical features, the proposed method incorporates the semantic features based on word-embeddings and user influence feature based on users' influential position. To gain the advantage of various hashtag recommendation methods based on different features, our proposed method aggregates these methods using learning-to-rank and generates top-ranked hashtags. Experimental results show that the proposed method significantly outperforms the current state-of-the-art methods.

This system parses details from a resume using natural language processing, finds the keywords, assigns the resume to a sector based on those keywords, and finally shows the most relevant resumes to the manager based on keyword matching. Initially, the user submits a resume to the web platform. The parser extracts all the crucial information from the resume and auto-fills a form for the user to review. Once the user confirms, the resume is displayed to employers. The user also gets their resume in both JSON format and a GUI view. Parsed data include name, email address, social profiles, years of work experience, work experiences, years of education, education experiences, publications, certifications, volunteer experiences, keywords, and finally the cluster of the resume; computer science and human resources are examples of clusters.
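An illustrative sketch of the keyword-matching and clustering idea; the sector keyword sets, regexes, and scoring are assumptions, not the system's actual rules.

```python
# Sketch: extract an email address and assign a resume to the best-matching
# cluster by counting keyword overlaps.
import re

SECTOR_KEYWORDS = {                       # assumed example clusters
    "computer science": {"python", "java", "algorithms", "sql"},
    "human resources": {"recruiting", "payroll", "onboarding", "training"},
}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w+")

def parse_resume(text: str) -> dict:
    tokens = set(re.findall(r"[a-z+#.]+", text.lower()))
    scores = {name: len(tokens & kws) for name, kws in SECTOR_KEYWORDS.items()}
    match = EMAIL_RE.search(text)
    return {
        "email": match.group(0) if match else None,
        "cluster": max(scores, key=scores.get),
        "scores": scores,
    }

print(parse_resume("Jane Doe, jane@example.com. Skills: Python, SQL, payroll."))
```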

Depression is one of the leading mental health problems; it is a cause of psychological disability and an economic burden to a country. Around 1.5 million Thai people suffer from depression, and its prevalence has been growing fast. Although it is a serious psychological problem, fewer than half of those who have this emotional problem gain access to mental health services. This could be a result of many factors, including a lack of awareness about the disease. One solution would be a tool with which depression could be detected easily and early. This would help people be aware of their emotional states and seek help from professional services. Given that Facebook is the most popular social network platform in Thailand, it could be a large-scale resource for developing a depression detection tool. This research employs Natural Language Processing (NLP) techniques to develop a depression detection algorithm for the Thai language on Facebook, where people share opinions, feelings, and life events. Results from 35 Facebook users indicated that Facebook behaviours could predict depression level.

The world is intrigued by data, and huge capital is invested in devising means to apply statistics and extract analytics from data sources. However, when we examine studies of applicant tracking systems, which retrieve valuable information from candidates' CVs and job descriptions, they are mostly rule-based and hardly manage to employ contemporary techniques, even though the structure of these documents is almost identical despite their varied contents. Accordingly, in this paper, we implement an NLP pipeline for the extraction of structured information from a wide variety of textual documents. As a reference, we consider textual documents used in applicant tracking systems, such as CVs (Curricula Vitae) and job vacancy information. The proposed NLP pipeline is built with several NLP techniques, such as document classification, document segmentation, and text extraction. Initially, for the classification of textual documents, Support Vector Machine (SVM) and XGBoost algorithms are implemented. Different segments of the identified document are categorized using NLP techniques such as chunking, regex matching, and POS tagging. Relevant information from every segment is further extracted using techniques like Named Entity Recognition (NER), regex matching, and pool parsing. Extracting such structured information from textual documents can help to gain insights and use them in document maintenance, document scoring, matching, and auto-filling forms.
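A small sketch of the entity-extraction step named above, using spaCy's pretrained NER; the model and example segment are assumptions, since the paper's exact toolchain isn't specified beyond the technique names.

```python
# Sketch: pull person/organization/location/date entities from a resume segment.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed
segment = "Jane Doe worked at Acme Corp in Berlin from 2018 to 2022."
for ent in nlp(segment).ents:
    print(ent.text, ent.label_)     # e.g. PERSON, ORG, GPE, DATE
```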

In this paper, we focus on cross-modal (visual and textual) e-commerce search within the fashion domain. Particularly, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an interest in specific visual product characteristics, we retrieve relevant images that exhibit the required visual attributes. To this end, we introduce a new dataset that consists of 53,689 images coupled with textual descriptions. The images contain fashion garments that display a great variety of visual attributes, such as different shapes, colors, and textures, described in natural language. Unlike previous datasets, the text provides a rough and noisy description of the item in the image. We extensively analyze this dataset in the context of cross-modal e-commerce search. We investigate two state-of-the-art latent variable models to bridge between textual and visual data: bilingual latent Dirichlet allocation and canonical correlation analysis. We use state-of-the-art visual and textual features and report promising results.

Text detection in images is an important research area that has attracted the attention of many researchers. With growing image databases, retrieving information from them in less time and with maximum precision becomes more important. The low-level features extracted from images and videos include measures of color, texture, and shape; although these features can easily be obtained, they are not accurate enough to capture the content. Today, text retrieval in images is an important part of content retrieval. Detection of vehicle plates, text in advertisements or video frames, book titles, and addresses on mailing envelopes are among text mining applications. In the current article, a method combining image processing and neural networks for text detection in images is presented. In this approach, text locations in the image are first detected using an MLP (Multi-Layer Perceptron) neural network, and the text in the images is then extracted. Finally, the performance of the MLP and LVQ (Learning Vector Quantization) neural networks for text detection is compared. The results show that the MLP neural network, with the algorithm implemented in this work, has better performance.
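A reduced sketch of the text/non-text classification idea using an MLP; real systems use richer features, and the load_labeled_patches() loader is hypothetical.

```python
# Sketch: classify small image patches as text vs. non-text with an MLP,
# using flattened grayscale pixels as stand-in features.
import numpy as np
from sklearn.neural_network import MLPClassifier

patches, is_text = load_labeled_patches()  # hypothetical: (N, 16, 16) uint8, 0/1
X = np.asarray(patches).reshape(len(patches), -1) / 255.0  # flatten + scale
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(X, is_text)
print("train accuracy:", clf.score(X, is_text))
```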

Clouds provide a powerful computing platform that enables individuals and organizations to perform a variety of tasks, such as using online storage space, adopting business applications, developing customized computer software, and creating a "realistic" network environment. In recent years, the number of people using cloud services has dramatically increased, and a lot of data has been stored in cloud computing environments. In the meantime, data breaches of cloud services are also increasing every year, due to hackers who are always trying to exploit security vulnerabilities in the architecture of the cloud. In this paper, three cloud service models are compared, and cloud security risks and threats are investigated based on the nature of the cloud service models. Real-world cloud attacks are included to demonstrate the techniques that hackers have used against cloud computing systems. In addition, countermeasures to cloud security breaches are presented.

Ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts, for evidence of the recorded past. However, these texts, known as inscriptions, are often damaged over the centuries, and illegible parts of the text must be restored by specialists, known as epigraphists. This work presents PYTHIA, the first ancient-text restoration model that recovers missing characters from a damaged text input using deep neural networks. Its architecture is carefully designed to handle long-term context information and to deal efficiently with missing or corrupted character and word representations. To train it, we wrote a non-trivial pipeline to convert PHI, the largest digital corpus of ancient Greek inscriptions, into machine-actionable text, which we call PHI-ML. On PHI-ML, PYTHIA's predictions achieve a 30.1% character error rate, compared to 57.3% for human epigraphists. Moreover, in 73.5% of cases the ground-truth sequence was among PYTHIA's Top-20 hypotheses, which effectively demonstrates the impact of this assistive method on the field of digital epigraphy and sets the state of the art in ancient text restoration.
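For reference, the character error rate (CER) reported above is conventionally the Levenshtein distance between the predicted and ground-truth character sequences divided by the ground-truth length; a standard sketch, independent of PYTHIA's code.

```python
# CER: edit distance between prediction and truth, normalized by truth length.
def cer(pred: str, truth: str) -> float:
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(pred, 1):
        curr = [i]
        for j, t in enumerate(truth, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[-1] + 1,              # insertion
                            prev[j - 1] + (p != t)))   # substitution
        prev = curr
    return prev[-1] / max(len(truth), 1)

print(cer("restred", "restored"))  # 1 edit / 8 chars = 0.125
```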

As the pandemic goes on, mental health has become a vital aspect of our daily lives. Individuals still do not feel comfortable speaking to others about their psychological state and frequently tend to keep their issues to themselves; this typically results in a build-up of stress and thus hampers their productivity at work. As these cases increase, we address them by introducing a therapy chatbot that can assist the person and talk with them about their mental state. The user gets to share their feelings without the fear of being judged, thus reducing the number of deaths due to depression.