anish basnet - Academia.edu
Uploads
Papers by anish basnet
Computational Intelligence and Neuroscience, 2021
COVID-19 has claimed several human lives to this date. People are dying not only because of physical infection by the virus but also because of mental illness, which is linked to people’s sentiments and psychologies. People’s written texts and posts scattered across the web could help us understand their psychology and the state they are in during this pandemic. In this paper, we analyze people’s sentiment based on the classification of tweets collected from the social media platform Twitter in Nepal. For this, we first propose to use three different feature extraction methods—fastText-based (ft), domain-specific (ds), and domain-agnostic (da)—for the representation of tweets. Among these three methods, two (“ds” and “da”) are novel methods introduced in this study. Second, we propose three different convolutional neural networks (CNNs) to implement the proposed features. Last, we ensemble these three CNN models using an ensemble CNN, which works in an end-to-end manner, to achieve the en...
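The abstract describes three CNN branches, one per tweet representation, fused into a single end-to-end ensemble. The following is a minimal sketch of that idea, not the authors' exact architecture: the layer sizes, sequence length, embedding dimensions, and number of sentiment classes below are illustrative assumptions.

```python
# Sketch of an end-to-end ensemble of three CNN branches, one per tweet
# representation: fastText-based (ft), domain-specific (ds), domain-agnostic (da).
# All sizes below are assumed, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN = 50                              # assumed maximum tweet length (tokens)
FT_DIM, DS_DIM, DA_DIM = 300, 100, 100    # assumed embedding dimensions
NUM_CLASSES = 3                           # e.g., positive / neutral / negative

def cnn_branch(input_dim, name):
    """One CNN branch mapping a sequence of token vectors to a feature vector."""
    inp = layers.Input(shape=(SEQ_LEN, input_dim), name=f"{name}_input")
    x = layers.Conv1D(128, kernel_size=3, activation="relu")(inp)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    return inp, x

ft_in, ft_feat = cnn_branch(FT_DIM, "ft")
ds_in, ds_feat = cnn_branch(DS_DIM, "ds")
da_in, da_feat = cnn_branch(DA_DIM, "da")

# Fuse the three branches and classify jointly (end-to-end ensemble).
merged = layers.Concatenate()([ft_feat, ds_feat, da_feat])
out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = Model(inputs=[ft_in, ds_in, da_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because the fusion happens inside one model, the three representations are trained jointly rather than averaged after the fact, which is one reasonable reading of "ensemble CNN ... in an end-to-end manner."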
Our dataset, a Nepali news dataset, contains 17 categories: Art, Bank, Blog, Business, Diaspora, Entertainment, Filmy, Health, Hollywood-bollywood, Koseli, Literature, Music, National, Opinion, Society, Sports, and World.
PeerJ Computer Science
Document representation with outlier tokens degrades classification performance due to the uncertain orientation of such tokens. Most existing document representation methods in different languages, including Nepali, ignore strategies to filter such tokens out of documents before learning their representations. In this article, we propose a novel document representation method based on a supervised codebook to represent Nepali documents, where our codebook contains only semantic tokens, without outliers. Our codebook is domain-specific, as it is based on tokens in a given corpus that have higher similarities with the class labels in the corpus. Our method adopts a simple yet prominent representation method for each word, called probability-based word embedding. To show the efficacy of our method, we evaluate its performance in the document classification task using a Support Vector Machine and validate it against widely used document representation methods such as Bag of Wor...
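A minimal sketch of the overall pipeline follows, under two stated assumptions that may differ from the paper: (1) the supervised codebook keeps only tokens that occur strongly with a single class, as a simple proxy for "similarity with class labels"; (2) the probability-based embedding of a token is its vector of class-conditional probabilities P(class | token) estimated from training counts.

```python
# Supervised codebook + probability-based embeddings + SVM (illustrative sketch).
from collections import Counter, defaultdict
import numpy as np
from sklearn.svm import LinearSVC

def build_codebook(docs, labels, min_count=5, purity=0.6):
    """Keep tokens whose dominant-class probability exceeds `purity`."""
    token_class = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            token_class[t][y] += 1
    classes = sorted(set(labels))
    codebook = {}
    for t, counts in token_class.items():
        total = sum(counts.values())
        if total < min_count:
            continue
        probs = np.array([counts[c] / total for c in classes])
        if probs.max() >= purity:          # filter out "outlier" tokens
            codebook[t] = probs            # probability-based embedding
    return codebook

def represent(tokens, codebook, n_classes):
    """Average the probability embeddings of codebook tokens in a document."""
    vecs = [codebook[t] for t in tokens if t in codebook]
    return np.mean(vecs, axis=0) if vecs else np.zeros(n_classes)

# Usage with a toy tokenized corpus (labels and tokens are hypothetical).
train_docs = [["cricket", "match"], ["bank", "loan"], ["cricket", "win"]]
train_y = ["sports", "economy", "sports"]
cb = build_codebook(train_docs, train_y, min_count=1, purity=0.5)
X = np.vstack([represent(d, cb, n_classes=2) for d in train_docs])
clf = LinearSVC().fit(X, train_y)
```

The key design point this sketch illustrates is that outlier tokens never enter the codebook, so they cannot distort the learned document vectors.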
Neural Information Processing
Existing image feature extraction methods are primarily based on the content and structure information of images and rarely consider contextual semantic information. For some types of images, such as scenes and objects, the annotations and descriptions of such images available on the web may provide reliable contextual semantic information for feature extraction. In this paper, we introduce novel semantic features of an image based on the annotations and descriptions of its similar images available on the web. Specifically, we propose a new method consisting of two consecutive steps to extract our semantic features. For each image in the training set, we initially search the top k most similar images on the internet and extract their annotations/descriptions (e.g., tags or keywords). The annotation information is then used to design a filter bank for each image category and generate filter words (a codebook). Finally, each image is represented by the histogram of the occurrences of filter words across all categories. We evaluate the performance of the proposed features on scene image classification using three commonly used scene image datasets (MIT-67, Scene15, and Event8). Our method typically produces a lower feature dimension than existing feature extraction methods. Experimental results show that the proposed features yield better classification accuracies than vision-based and tag-based features, and results comparable to deep-learning-based features.
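Below is a minimal sketch of the tag-based "filter words" representation, assuming the top-k similar images and their tags have already been retrieved for each image (the web-search step is outside the snippet). Selecting the most frequent tags per category is an illustrative simplification of the paper's filter-bank design.

```python
# Per-category filter words + histogram representation (illustrative sketch).
from collections import Counter
import numpy as np

def build_filter_words(train_tags, train_labels, words_per_category=50):
    """Pick the most frequent tags among training images of each category."""
    per_cat = {}
    for tags, cat in zip(train_tags, train_labels):
        per_cat.setdefault(cat, Counter()).update(tags)
    codebook = []
    for cat, counter in sorted(per_cat.items()):
        codebook.extend(w for w, _ in counter.most_common(words_per_category))
    # Keep order stable and drop duplicates shared across categories.
    return list(dict.fromkeys(codebook))

def tag_histogram(image_tags, codebook):
    """Represent one image by occurrence counts of the filter words in its tags."""
    counts = Counter(image_tags)
    return np.array([counts[w] for w in codebook], dtype=float)

# Usage: train_tags holds the retrieved tag lists (one per training image),
# train_labels the corresponding scene categories (values are hypothetical).
train_tags = [["kitchen", "stove", "sink"], ["forest", "trees", "hike"]]
train_labels = ["kitchen", "forest"]
codebook = build_filter_words(train_tags, train_labels, words_per_category=2)
feature = tag_histogram(["trees", "hike", "trail"], codebook)
```

The resulting feature length equals the codebook size, which is consistent with the claim that this representation is typically lower-dimensional than content-based alternatives.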
2020 International Joint Conference on Neural Networks (IJCNN)
Thesis Chapters by anish basnet
Tribhuvan University, 2019
“Model Evaluation of Embedding Based Document Classification” is a project that helps us classify multiple documents based on their templates. The main purpose of this project is to provide a document classifier model based on the content of different documents. The system takes documents in the form of PDFs, images, emails, social media posts, etc. Based on this document set, we classify each document according to its template. We first extract the full content of each document and then create a feature vector to classify the documents. Templates are the target classes into which we classify each document.
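The workflow described above (extract content, build a feature vector, classify into a template class) can be illustrated with a simple baseline, assuming the content has already been extracted to plain text (e.g., by OCR or a PDF parser, not shown). The TF-IDF plus linear SVM pipeline below is only one possible choice of feature vector and classifier; the thesis evaluates embedding-based models, so treat this as a baseline sketch, not the thesis method.

```python
# Baseline template classifier over extracted document text (illustrative only).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Each training example is (extracted_text, template_label); values are hypothetical.
train_texts = ["invoice number total amount due", "dear hiring manager resume attached"]
train_templates = ["invoice", "cover_letter"]

pipeline = Pipeline([
    ("features", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("classifier", LinearSVC()),
])
pipeline.fit(train_texts, train_templates)

predicted_template = pipeline.predict(["total amount due on this invoice"])[0]
```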