Mohsen Kahani | Ferdowsi University of Mashhad (original) (raw)

Papers by Mohsen Kahani

PeerJ Computer Science

Today, several attempts to manage question answering (QA) have been made in three separate areas:... more Today, several attempts to manage question answering (QA) have been made in three separate areas: (1) knowledge-based (KB), (2) text-based and (3) hybrid, which takes advantage of both prior areas in extracting the response. On the other hand, in question answering on a large number of sources, source prediction to ensure scalability is very important. In this paper, a method for source prediction is presented in hybrid QA, involving several KB sources and a text source. In a few hybrid methods for source selection, including only one KB source in addition to the textual source, prioritization or heuristics have been used that have not been evaluated so far. Most methods available in source selection services are based on general metadata or triple instances. These methods are not suitable due to the unstructured source in hybrid QA. In this research, we need data details to predict the source. In addition, unlike KB federated methods that are based on triple instances, we use the b...

Social Network Analysis and Mining

Nowadays, social media play an essential role in spreading the news with low cost and high speed ... more Nowadays, social media play an essential role in spreading the news with low cost and high speed in publishing, and easy availability. As anybody can publish some news, many fake materials are also published in this media. Multi-class detection of fake news (more than just false or true) have gained more attention, as some news are half-true. In addition, early detection of fake news has an important role, before their impacts on the society become great. This paper investigates an early detection of fake news using multi-class classi cation. This is achieved by extracting the content features from the news, such as sentiment and semantic features. The proposed model employs ve classi ers (Random Forest, Support Vector Machine, Decision Tree, LightGBM, and XGBoost) as primary classi ers. Furthermore, the AdaBoost is used for the meta-learning algorithm to develop a stacking generalization model concerning fake news detection. Stacking generalization is an ensemble learning method that uses all data produced by the rst-level algorithms. We trained our model with Politifact data in the evaluation, and the model performance was evaluated by their Accuracy, Precision, Recall, and F1 score. Performance improvements are visible over previous models, in both binary and multi-class classi cations.

Much research has been conducted extracting a response from either text sources or a knowledge ba... more Much research has been conducted extracting a response from either text sources or a knowledge base (KB). The challenge becomes more complicated when the goal is to answer a question with the help of both text and KB. In these hybrid systems, we address the following challenges: i) excessive growth of search space, ii) extraction of the answer from both KB and text, iii) extracting the path to reach to the answer, and vi) the scalability in terms of the volume of documents explored. A heterogeneous graph is utilized to tackle the first challenge guided by question decomposition. The second challenge is met with the usage of the idea behind an existing text-based method, and its customization for graph development. Based on this method for multi-hop questions, an approach is proposed for the extraction of answer explanation to address the third challenge. Since the basic method uses a dense vector for scalability, the final challenge is also addressed in the proposed hybrid method. E...

2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), 2021

Nowadays, news plays a significant role in everyday life. Due to the increasing usage of social m... more Nowadays, news plays a significant role in everyday life. Due to the increasing usage of social media and the dissemination of news by people who have access to social media, there is a problem that the validation of the news may be questioned, and people may publish fake news for their benefit. Automatic fake news detection is a complex issue. It is necessary to have up-to-date and reliable data to build an efficient model for detection. However, there are very few such datasets available for researchers. In this paper, we proposed a new fake news dataset extracted from three famous and reliable fact-checking websites. Because of the different labels used in each site, an algorithm was developed to integrated these 37 labels into five unified labels. Some experiments were conducted to show the usability and validity of the dataset.

Iranian Journal of Information Processing & Management, 2018

While ontology development is beneficial, it is very costly and time consuming. In order to reduc... more While ontology development is beneficial, it is very costly and time consuming. In order to reduce this cost as well as increasing the accuracy and quality of ontology development, researchers have proposed different methodologies. The goal of these methodologies is to present a systematic manual or semi-automated development of ontologies, while each differs and has its strengths and weaknesses. In this paper, after reviewing current methodologies, we present a new integrated collaborative methodology for ontology development, and compare it with the existing ones. This new system, called Ontirandoc, has been used in two ontology development projects and its accuracy has been evaluated.

In this paper we propose a multi-agent architecture for web information retrieval using fuzzy log... more In this paper we propose a multi-agent architecture for web information retrieval using fuzzy logic based result fusion mechanism. The model is designed in JADE framework and takes advantage of JXTA agent communication method to allow agent communication through firewalls and network address translators. This approach enables developers to build and deploy P2P applications through a unified medium to manage agent-based document retrieval from multiple sources.

Electrical Engineering (ICEE), Iranian Conference on, 2018

Despite the attraction of considerable interest from researchers and industries, the community st... more Despite the attraction of considerable interest from researchers and industries, the community still faces the problem of building reliable and efficient IDSs, capable of detecting intrusion with high accuracy and low time consuming. In this paper, we are investigating a hybrid scheme that combines advantages of deep learning methods and support vector machine to improve the accuracy and efficiency. Initially, a method of deep learning, such as stacked Auto-encoder (SAE) network, is utilized to reduce the dimensionality of the feature sets and gain the latent features. This is followed by a support vector machine (SVM) for binary classification of the events into normal or attacks. Our method is implemented and evaluated using ISCX IDS UNB dataset. Experimental result indicated that our combined method outperforms SVM alone in terms of both accuracy and run-time efficiency.

International Journal of Information Security, 2020

This paper successfully tackles the problem of processing a vast amount of security related data ... more This paper successfully tackles the problem of processing a vast amount of security related data for the task of network intrusion detection. It employs Apache Spark, as a big data processing tool, for processing a large size of network traffic data. Also, we propose a hybrid scheme that combines the advantages of deep network and machine learning methods. Initially, stacked autoencoder network is used for latent feature extraction, which is followed by several classification-based intrusion detection methods, such as support vector machine, random forest, decision trees, and naive Bayes which are used for fast and efficient detection of intrusion in massive network traffic data. A real time UNB ISCX 2012 dataset is used to validate our proposed method and the performance is evaluated in terms of accuracy, f-measure, sensitivity, precision and time.

2020 IEEE 24th International Enterprise Distributed Object Computing Conference (EDOC), 2020

Business process monitoring techniques have been investigated in depth over the last decade to en... more Business process monitoring techniques have been investigated in depth over the last decade to enable organizations to deliver process insight. Recently, a new stream of work in predictive business process monitoring leveraged deep learning techniques to unlock the potential business value locked in process execution event logs. These works use Recurrent Neural Networks, such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), and suffer from misinformation and accuracy as they use the last hidden state (as the context vector) for the purpose of predicting the next event. On the other hand, in operational processes, traces may be very long, which makes the above methods inappropriate for analyzing them. In addition, in predicting the next events in a running case, some of the previous events should be given a higher priority. To address these shortcomings, in this paper, we present a novel approach inspired by the notion of attention mechanism, utilized in Natural Language Processing and, particularly, in Neural Machine Translation. Our proposed approach uses all hidden states to accurately predict future behavior and the outcome of individual activities. Experimental evaluation of real-world event logs revealed that the use of attention mechanisms in the proposed approach leads to a more accurate prediction.

ArXiv, 2020

The vision of the Linked Open Data (LOD) initiative is to provide a distributed model for publish... more The vision of the Linked Open Data (LOD) initiative is to provide a distributed model for publishing and meaningfully interlinking open data. The realization of this goal depends strongly on the quality of the data that is published as a part of the LOD. This paper focuses on the systematic quality assessment of datasets prior to publication on the LOD cloud. To this end, we identify important quality deficiencies that need to be avoided and/or resolved prior to the publication of a dataset. We then propose a set of metrics to measure these quality deficiencies in a dataset. This way, we enable the assessment and identification of undesirable quality characteristics of a dataset through our proposed metrics. This will help publishers to filter out low-quality data based on the quality assessment results, which in turn enables data consumers to make better and more informed decisions when using the open datasets.

Iranian Journal of Information Processing & Management, 2017

The aim of this study is to find out the bibliographic relationships between the metadata records... more The aim of this study is to find out the bibliographic relationships between the metadata records in the National Library and Archives of Iran (NLAI) according to FRBR model, in order to represent the knowledge network of Iranian-Islamic publications. To achieve this objective, content analysis method was used. The study population includes metadata records for books in NLAI for four bibliographic families including The Quran, Nahj al-Balagha, Shahnameh, and Masnavi (a total of 28213 records) that were accessible through the NLAI OPAC. In this study, the data gathering methods were structured (systematic) observation and documentary method. A checklist is used for data gathering, and a matrix is used to display the analyzed data. The results of the study showed that

By expanding bank services and increment in access requests for information resources, web server... more By expanding bank services and increment in access requests for information resources, web servers perform weak about response time. So it is necessary to use simulation and mathematical models in order to analyzing complicated systems, optimizing and managing web server systems. In this paper, services which are offered through internet are introduced and analyzed as a queue. Web servers are one of the main and effective parts in quality of internet services. Therefore, the operation of web servers is analyzed and optimized by concept of queue and simulation. One of the significant points in this paper is analyzing a problem taken from real world in internet field and also performing a new analysis from user’s requests in web servers. The main purpose in this paper is to persuade users and answering theme as soon as possible. In this paper, after introducing the problem’s structure, simulation and queue concept are used for analyzing, optimizing and managing web servers. Finally, a...

ArXiv, 2018

Question Answering (QA) systems provide easy access to the vast amount of knowledge without havin... more Question Answering (QA) systems provide easy access to the vast amount of knowledge without having to know the underlying complex structure of the knowledge. The research community has provided ad hoc solutions to the key QA tasks, including named entity recognition and disambiguation, relation extraction and query building. Furthermore, some have integrated and composed these components to implement many tasks automatically and efficiently. However, in general, the existing solutions are limited to simple and short questions and still do not address complex questions composed of several sub-questions. Exploiting the answer to complex questions is further challenged if it requires integrating knowledge from unstructured data sources, i.e., textual corpus, as well as structured data sources, i.e., knowledge graphs. In this paper, an approach (HCqa) is introduced for dealing with complex questions requiring federating knowledge from a hybrid of heterogeneous data sources (structured a...

Cross-organizational mining is a new research field in the process mining domain, which focuses o... more Cross-organizational mining is a new research field in the process mining domain, which focuses on the analysis and mining of processes in multiple organizations. Suitable access to collections of business process variants is necessary for researchers to evaluate their work in this research domain. To the best of our knowledge, no complete collection of process variants or any process variants/log generator tool exists for this purpose. In this paper, we propose an algorithm for generating random process variants for a given process model and a supporting toolset built on top of the PLG toolset. For this purpose, we classify different factors that can serve as variation points. Then, using the structure tree based representation of an input process, we present an algorithm for applying variation points based on a user-defined variation rate. The developed tool is publicly available for researchers to use.

Learning accessible any where and any time is the dream of human beings that can come into realit... more Learning accessible any where and any time is the dream of human beings that can come into reality by Mobile eLearning. There is a need for resource and information sharing among learners and tutors. That is why Mobile Grid is used as an infrastructure for such an environment. Mobile Grid learning environment is mobile and contextual information changes, so by context awareness and using it, learners can offered most probably the exact information they need. Modeling various types of contextual information, reasoning about it and managing it, can be done by storing them in Semantic Web formats. Central grid management causes many problems. Mobile grid management is more complicated because it has a dynamic nature and available resources and information changes frequently. A distributed grid management system for e-Learning Mobile Grid is proposed in this paper.

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), 2019

With the development and popularity of social networks, many human beings prefer to share their e... more With the development and popularity of social networks, many human beings prefer to share their experiences on these networks. There are various methods proposed by the researcher which utilized user-generated content in the location-based social networks (LBSN) and recommend locations to users. However, there is a high sparsity in the user check-in information makes it tough to recommend the appropriate and accurate location to the user. To fix this issue, we put forward a proposal as a framework which utilizes a wide range of information available in these networks, each of which has its own type and provides appropriate recommendation. For this purpose, we encode the information as a number of entities and its attributes in the form of a heterogeneous graph, then graph embedding methods are used to embed all nodes in unified semantic representation space. As a result, we are able to model relations between users and venues in an efficient way and ameliorate the accuracy of the proposed method that recommends a place to a user. Our method is implemented and evaluated using Foursquare dataset, and the evaluation results depict that our work, boost performance in terms of precision, recall, and f-measure compared to the baseline work.

2017 2nd Conference on Swarm Intelligence and Evolutionary Computation (CSIEC), 2017

the amount of ontologies and semantic annotations available on the Web is constantly growing and ... more the amount of ontologies and semantic annotations available on the Web is constantly growing and heterogeneous data raises new challenges for the data mining community. Yet there are still many problems causing users extra problems in discovering knowledge or even failing to obtain the real and useful knowledge they need. In this paper, we survey some semantic data mining methods specifically focusing on association rules. However, there are few works that have focused in mining semantic web data itself. For extracting rules in semantic data, we present an intelligent data mining approach incorporated with domain. The paper contributes a new algorithm for discovery of new type of patterns from semantic data. This new type of patterns is appropriate for some data such as stock market. We take advantage of the knowledge encoded in the ontology and MICF measure to inference in three steps to prune the search space and generated rules to derive appropriate rules from thousands of rules. Some experiments performed on stock market data and show the usefulness and efficiency of the approach.

Lecture Notes in Computer Science, 2017

In this paper, we address the problem of predicting future interests of users with regards to a s... more In this paper, we address the problem of predicting future interests of users with regards to a set of unobserved topics in microblogging services which enables forward planning based on potential future interests. Existing works in the literature that operate based on a known interest space cannot be directly applied to solve this problem. Such methods require at least a minimum user interaction with the topic to perform prediction. To tackle this problem, we integrate the semantic information derived from the Wikipedia category structure and the temporal evolution of user's interests into our prediction model. More specifically, to capture the temporal behaviour of the topics and user's interests, we consider discrete intervals and build user's topic profile in each time interval separately. Then, we generalize users' interests that have been observed over several time intervals by transferring them over the Wikipedia category structure. Our approach not only allows us to generalize users' interests but also enables us to transfer users' interests across different time intervals that do not necessarily have the same set of topics. Our experiments illustrate the superiority of our model compared to the state of the art.

Journal of Big Data, 2021

Clustering algorithm analysis, including time and space complexity analysis, has always been disc... more Clustering algorithm analysis, including time and space complexity analysis, has always been discussed in the literature. The emergence of big data has also created a lot of challenges for this issue. Because of high complexity and execution time, traditional clustering techniques cannot be used for such an amount of data. This problem has been addressed in this research. To present the clustering algorithm using a bee colony algorithm and high-speed read/write performance, Map-Reduce architecture is used. Using this architecture allows the proposed method to cluster any volume of data, and there is no limit to the amount of data. The presented algorithm has good performance and high precision. The simulation results on 3 datasets show that the presented algorithm is more efficient than other big data clustering methods. Also, the results of our algorithm execution time on huge datasets are much better than other big data clustering approaches.