Sattar Hashemi | Shiraz University

Papers by Sattar Hashemi

A Distributed Clustering Approach for Heterogeneous Environments Using Fuzzy Rough Set Theory

International Journal of Information Science and Management, 2020

The vast majority of data mining algorithms have been designed to work on centralized data; unfortunately, almost all of today's data sets are distributed both geographically and conceptually. Because of privacy concerns and computation cost, centralizing distributed data sets before analyzing them is impractical. In this paper, we present a framework for clustering distributed data that takes privacy and computation cost into account. To do so, we remove uncertain instances and send only the labels of the remaining instances to the central location. To remove the uncertain instances, we develop a new instance weighting method based on fuzzy and rough set theory. Results on well-known data sets verify the effectiveness of the proposed method compared to previous work.

Etection by Stackelberg Game

Many data mining applications, ranging from spam filtering to intrusion detection, are faced with active adversaries. An adversary deliberately manipulates data in order to reduce the classifier's accuracy, so in all these applications initially successful classifiers degrade easily. In this paper we model the interaction between the adversary and the classifier as a two-person sequential non-cooperative Stackelberg game and analyze the payoff when there is a leader and a follower. We then model the interaction as an optimization problem and solve it with an evolutionary strategy. Our experimental results are promising: they show that our approach improves the accuracy of spam detection on several real-world data sets.

A Generalized Kernel-Based Random K-Samplesets Method for Transfer Learning

Iranian Journal of Science and Technology-Transactions of Electrical Engineering, 2015

Transfer learning allows knowledge to be transferred from the source (training dataset) to the target (test dataset) domain. Feature selection for transfer learning (f-MMD) is a simple and effective transfer learning method that tackles the domain shift problem. f-MMD performs well on small datasets, but it suffers from two major issues: i) its computational efficiency and predictive performance are challenged by application domains with large numbers of examples and features, and ii) it considers the domain shift problem in a fully unsupervised manner. In this paper, we propose a new approach that breaks down the large initial set of samples into a number of small random subsets, called samplesets. Moreover, we present a feature weighting and instance clustering approach, which categorizes the original feature samplesets into variant and invariant features. In the domain shift problem, invariant features have a vital role in transferring knowledge across domain...

Unsupervised Feature Selection Based on the Distribution of Features Attributed to Imbalanced Data Sets

Because dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression, among many others. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original meaning of the features. However, most current feature selection methods do not perform well on imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method for imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have...

Correction to: Binary domain adaptation with independence maximization

International Journal of Machine Learning and Cybernetics

Deep Dive into Ransomware Threat Hunting and Intelligence at Fog Layer

Ransomware, malware designed to encrypt data for ransom payments, is a threat to fog layer nodes, as such nodes typically contain a considerable amount of sensitive data. The capability to efficiently hunt abnormalities relating to ransomware activities is crucial for the timely detection of ransomware. In this paper, we present our Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) to distinguish ransomware from goodware and identify their families. Specifically, DRTHIS utilizes Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), two deep learning techniques, for classification using the softmax algorithm. We then use 220 Locky, 220 Cerber and 220 TeslaCrypt ransomware samples, and 219 goodware samples, to train DRTHIS. Findings from our evaluations demonstrate that the proposed system achieves an F-measure of 99.6% with a true positive rate of 97.2% in the classification of ransomware instances. Additionally, we demonstrate that DRTHIS is capable of detect...

Unsupervised Domain Adaptation Based on Correlation Maximization

Exploiting kernel-based feature weighting and instance clustering to transfer knowledge across domains

Turkish Journal of Electrical Engineering & Computer Sciences

Online Prediction via Continuous Artificial Prediction Markets

IEEE Intelligent Systems, 2017

Ordered Classifier Chains for Multi-label Classification

Journal of Machine Intelligence, 2016

Detection of Metamorphic Malware based on HMM: A Hierarchical Approach

International Journal of Intelligent Systems and Applications, 2016

Scaling up the hybrid Particle Swarm Optimization algorithm for nominal data-sets

Intelligent Data Analysis, 2015

Multi-Document Summarization Using Graph-Based Iterative Ranking Algorithms and Information Theoretical Distortion Measures

A statistical approach for clustering in streaming data

Artificial Intelligence Research, 2013

An enriched game-theoretic framework for multi-objective clustering

Applied Soft Computing, 2013

The framework of multi-objective clustering can serve as a competent technique for present-day problems ranging from decision-making processes to machine learning and pattern recognition. Multi-objective clustering basically aims at placing similar objects into the same groups based on some conflicting objectives, which substantially supports the use of game theory to reach a resolution. Based on these understandings, this paper proposes Enriched Game Theory K-means, called EGTKMeans, as a novel multi-objective clustering technique based on the notion of game theory. EGTKMeans is specially designed to optimize two intrinsically conflicting objectives, namely compaction and equi-partitioning. The key contributions of the proposed approach are threefold. First, it formulates an elegant and novel payoff definition that considers both objectives with equal priority. The presented payoff function incorporates a desirable fairness into the final clustering results. Second, EGTKMeans performs better by utilizing the advantages of mixed strategies as well as those of pure ones, considering the existence of a mixed Nash equilibrium in every game. Last but not least, EGTKMeans approaches the optimal solution in a very promising manner by optimizing both objectives simultaneously. The experimental results suggest that the proposed approach significantly outperforms rival methods across real-world and synthetic data sets with reasonable time complexity.

Unsupervised feature selection using feature similarity

IEEE Transactions on Pattern Analysis …, 2002

In this article, we describe an unsupervised feature selection algorithm suitable for data sets that are large in both dimension and size. The method is based on measuring similarity between features, whereby redundancy therein is removed. This does not need any search and, ...

A Game-Theoretic Framework to Identify Top-K Teams in Social Networks

International Conference on Knowledge Discovery and Information Retrieval, 2012

Community Detection in Social Networks Using Information Diffusion

Advances in Social …, 2012

Discovering overlapping communities in social networks: A novel game-theoretic approach

AI Communication Journal, 2013

HDM-Analyser: a hybrid analysis approach based on data mining techniques for malware detection

Today’s security threats, such as malware, are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. Various approaches have been introduced to deal with them. One is signature-based detection, an effective and widely used method for detecting malware; however, it has a substantial problem in detecting new instances. In other words, it is useful only for the second malware attack. Because of the rapid proliferation of malware and the considerable human effort needed to extract signatures, this approach is tedious; thus, an intelligent malware detection system is required to deal with new malware threats. Most intelligent detection systems utilise data mining techniques to distinguish malware from benign programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to build at least one learning model. This phase is called “Malware Analysis” which play...
