Improved Text mining for bulk data using Deep learning approach

Text Document Clustering using Hashing Deep Learning Method

International Journal of Soft Computing

Internet exploration is the use of data mining techniques to discover patterns of information and relationships in internet data. The aim of this study is to find an efficient algorithm for web news mining by analysing web news data with clustering and classification procedures based on deep learning, to evaluate how well web news mining algorithms perform compared to other technologies, and to assess the reliability of the internet news databases used as data mining tools. In this study, we used an effective hashing algorithm (Hash) to collect and classify data, achieving Internet news classification with close to 95.66% accuracy.
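The abstract does not say how its hashing step works, so the following is only a minimal sketch of one common way hashing is used in text clustering and classification: the hashing trick, which maps tokens into a fixed number of buckets. The function name `hashed_features`, the bucket count and the choice of MD5 are illustrative assumptions, not details from the paper.

```python
import hashlib


def hashed_features(text, n_buckets=16):
    """Map a text to a fixed-length count vector with the hashing trick.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    vec = [0] * n_buckets
    for token in text.lower().split():
        # MD5 gives a hash that is stable across runs, unlike the
        # built-in hash(), which is randomized per process for strings.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % n_buckets] += 1
    return vec


if __name__ == "__main__":
    print(hashed_features("Web news mining with deep learning"))
```

Vectors of this kind have the same length for every document, so they can be passed to any clustering or classification model. Different tokens may collide in one bucket, which is the price of the fixed size.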

Text mining: identification of similarity of text documents using hybrid similarity model

Iran Journal of Computer Science, 2022

The volume of data accessible on the internet has increased dramatically, and it will keep growing exponentially as more data exhaust devices are connected to the network. Part of these data consists of documents from various sources. As the data from digital sources increase, it becomes difficult to identify the relevant information needed for further use. The goal of this research is to present a hybrid similarity algorithm that uses text summarization techniques to identify documents that are similar both semantically and contextually. Some of these methods aim to quantify the corpus's polysemy quotient using multi-layer deep learning and prebuilt Natural Language Processing (NLP) models to determine document similarity. In comparison with other conventional algorithms, the experimental results of our model showed an accuracy of 76.25%.

Text Mining using Deep Learning Article Review

International Journal of Scientific & Engineering Research, 2018

Deep learning offers efficient and accurate learning methods, and it has returned to the forefront of research following rapid developments in hardware. Text learning, whether supervised or unsupervised, also remains an open research area. This paper aims to give researchers working on deep learning for supervised or unsupervised text learning comprehensive knowledge of the domain. It presents an overview of important articles from the last five years and discusses the methods they used and their conclusions. The review covers relevant research on the use of deep learning in text mining, found through Google Scholar and restricted to articles issued between 2013 and 2018.

Document Classification using LSTM Neural Network

2017

Document classification is one of the most important topics in computer science, as the number of electronic documents increases rapidly each day. Document classification is also known as document categorization. Classification is training on known labels to predict unknown labels; it is the process of assigning a particular document to predefined categories. In this paper, we apply machine learning methods to the classification of documents. Recurrent neural networks, of which LSTM is one of the most successful, have been developed for controlling robots, natural language text compression, automatic speech recognition, time series prediction, handwriting recognition and many more tasks. LSTM can also be used for document classification. Document classification includes text processing, feature extraction, feature vector construction and label prediction or final classification. Furthermore, we first try some data processing on 20 Newsgroup Dataset, and then we extract a feat...

SURVEY ON DOCUMENT CLASSIFICATION USING DEEP LEARNING TECHNIQUES

IJCSIS Vol 17 No 4 April Issue, 2019

Document classification aims to assign at least one class or category to a document, making it easier to find the relevant information at the right time and to filter and route documents directly to users. Documents can be classified according to their attributes or subjects. Document classification methods include concept mining, tf-idf, support vector machines (SVM), the Naive Bayes classifier, artificial neural networks, instantaneously trained neural networks, k-nearest neighbor algorithms, natural language processing and other methodologies. The main advantages of deep learning over other techniques are that it can outperform them when the data size is large, reduces the need for feature engineering and performs well on complex problems. The main objective of the survey is to analyse the various deep learning techniques for document classification. Keywords: Shallow neural network, Convolutional neural network, Recurrent neural network, Recurrent convolutional neural network, Bi-directional neural network.
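Of the methods listed in this abstract, tf-idf is the easiest to show in a few lines. The sketch below assumes one common variant (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency); the survey does not say which variant it has in mind, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for row in tf_idf(["deep learning text", "deep learning image"]):
        print(row)
```

Under this variant, a term that occurs in every document gets weight zero, so only terms that distinguish documents contribute to the feature vector handed to a classifier.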

Calculation and Performance Evaluation of Text Similarity Based on Strong Classification Features

Applied Mathematics and Nonlinear Sciences

Based on a strong classification feature recognition algorithm, this paper studies an algorithm for calculating text semantic similarity and evaluates its performance. To make the algorithm general, a semantic function library based on a semantic recognition code is designed as the comparison object. It drives two fuzzy neuron deep convolution machine learning modules, and between these two machine learning stages a rigid algorithm based on Fourier transform frequency domain features is applied. Finally, a more complex general machine learning algorithm is realized by using external data fuzzy and de-fuzzy algorithms before and after the algorithm module, which is also a technical innovation of this paper. Through the performance evaluation based on the subjective evaluation of volunteers, it is found that the system focuses on the text semantic similarity evaluation of the Chinese language, and achieves a c...

Textual data dimensionality reduction - a deep learning approach

Multimedia Tools and Applications, 2018

The growth of the Internet has produced a high volume of natural language textual data. Such data can be sparse and may contain uninformative features, which increase the dimensions of the data. This high dimensionality, in turn, decreases the efficiency of text mining tasks such as clustering. Transforming the high dimensional data into a lower dimension is an important pre-processing step before applying clustering. In this paper, a dimensionality reduction method based on a deep autoencoder neural network, named DRDAE, is proposed to provide optimized and robust features for text clustering. DRDAE selects a less correlated and salient feature space from the high dimensional feature space. To evaluate the proposed algorithm, k-means is used to cluster text documents. The proposed method is tested on five benchmark text datasets. Simulation results demonstrate that the proposed algorithm clearly outperforms other conventional dimensionality reduction methods in the literature in terms of the RI measure.
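The abstract reports its clustering results as an "RI measure" without expanding the abbreviation. Assuming RI stands for the Rand index, which is the usual meaning in clustering evaluation, it could be computed as in the sketch below. The function name `rand_index` and the toy labels are illustrative and do not come from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two
    items in the same cluster, or both put them in different clusters.
    Requires at least two items.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    kmeans_output = [1, 1, 0, 0]  # same grouping, different cluster ids
    print(rand_index(truth, kmeans_output))
```

The index depends only on which items are grouped together, not on the cluster ids, so it fits k-means output where the numbering of clusters is arbitrary.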

A Survey of Numerous Text Similarity Approach

International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2023

One of the most common NLP use cases is text similarity. Every domain comes with a variety of use cases. The most common uses of text similarity include finding related articles/news/genres, efficient use of search engines, classification of related issues on any topic, etc. It serves as a framework for many text analytics use cases. Methods to solve text similarity use cases have been around for a while, but the main drawbacks of the old methods are loss of dependency information, difficulty remembering long conversations, exploding gradient problems, etc. Recent advanced deep learning-based models pay attention to both contiguous and distant words, making their learning ability more rigorous. This white paper focuses on various text similarity techniques that can be used in everyday life to solve these use cases.
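As a point of comparison for the older methods mentioned above, here is a minimal sketch of one classic technique, cosine similarity over bag-of-words counts. It is chosen for illustration only and the survey may cover different methods; the function name `cosine_similarity` is an assumption.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    if norm_sq == 0:
        return 0.0  # at least one text has no tokens
    return dot / math.sqrt(norm_sq)


if __name__ == "__main__":
    # Same words in a different order: the two texts look identical
    # to a bag-of-words model.
    print(cosine_similarity("dog bites man", "man bites dog"))
    print(cosine_similarity("dog bites man", "stock prices fall"))
```

Because the word counts ignore order, two sentences with opposite meanings can receive the maximum score, which is one concrete form of the lost dependency information that the abstract attributes to older methods.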

EFFECTIVE LINEAR-TIME DOCUMENT CLUSTERING IN TEXT MINING USING WEB DOCUMENT CATEGORIZATION

Among data mining techniques, clustering is one of the most important and traditional concepts, and also an unsupervised learning paradigm. The similarity of a document pair can be measured by matching concepts. Finding or extracting the most relevant concept from the documents is a challenging task. To address this issue, in this paper we introduce a multi-viewpoint based similarity measure. Our proposed method uses multiple points of reference between document pairs to extract more relevant matching concepts, rather than extracting only ideas based on a similarity measure. Using multiple viewpoints gathers more information about a particular topic from many different but relevant sources or concepts. This strategy works well with smaller documents but is especially effective with longer documents. By gathering more relevant concepts from the documents with multiple points of reference, document organization and retrieval can make better use of the documents held in storage and make retrieval of ideas and relevant tasks or concepts much easier and faster. Experimental results show that our proposed method efficiently extracts more relevant concepts.