Sattar Hashemi | Shiraz University
Papers by Sattar Hashemi
International Journal of Information Science and Management, 2020
The vast majority of data mining algorithms have been designed to work on centralized data; unfortunately, however, almost all of today's data sets are distributed both geographically and conceptually. Due to privacy and computation cost, centralizing distributed data sets before analyzing them is impractical. In this paper, we present a framework for clustering distributed data which takes into account privacy and computation cost. To do that, we remove uncertain instances and send only the labels of the other instances to the central location. To remove the uncertain instances, we develop a new instance weighting method based on fuzzy and rough set theory. The results achieved on well-known data sets verify the effectiveness of the proposed method compared to previous works.
Many data mining applications, ranging from spam filtering to intrusion detection, are faced with active adversaries. Adversaries deliberately manipulate data in order to reduce the classifier's accuracy; in all these applications, initially successful classifiers degrade easily. In this paper, we model the interaction between the adversary and the classifier as a two-person sequential non-cooperative Stackelberg game and analyze the payoff when there is a leader and a follower. We then model the interaction as an optimization problem and solve it with an evolutionary strategy. Our experimental results are promising: they show that our approach improves the accuracy of spam detection on several real-world data sets.
Iranian Journal of Science and Technology-Transactions of Electrical Engineering, 2015
Transfer learning allows knowledge to be transferred from the source (training dataset) to the target (test dataset) domain. Feature selection for transfer learning (f-MMD) is a simple and effective transfer learning method, which tackles the domain shift problem. f-MMD performs well on small-sized datasets, but it suffers from two major issues: i) the computational efficiency and predictive performance of f-MMD are challenged by application domains with a large number of examples and features, and ii) f-MMD considers the domain shift problem in a fully unsupervised manner. In this paper, we propose a new approach to break down the large initial set of samples into a number of small-sized random subsets, called samplesets. Moreover, we present a feature weighting and instance clustering approach, which categorizes the original feature samplesets into the variant and invariant features. In the domain shift problem, invariant features have a vital role in transferring knowledge across domain...
Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data in order to simplify the computational analysis in various applications such as text categorization, signal processing, image retrieval and gene expression, among many others. Among feature reduction techniques, feature selection is one of the most popular methods because it preserves the original meaning of features. However, most current feature selection methods do not perform well when applied to imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method designed for imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have...
International Journal of Machine Learning and Cybernetics
Ransomware, a type of malware designed to encrypt data for ransom payments, is a threat to fog layer nodes, as such nodes typically contain a considerable amount of sensitive data. The capability to efficiently hunt abnormalities relating to ransomware activities is crucial for the timely detection of ransomware. In this paper, we present our Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) to distinguish ransomware from goodware and identify their families. Specifically, DRTHIS utilizes Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), two deep learning techniques, for classification using the softmax algorithm. We then use 220 Locky, 220 Cerber and 220 TeslaCrypt ransomware samples, and 219 goodware samples, to train DRTHIS. Findings from our evaluations demonstrate that the proposed system achieves an F-measure of 99.6% with a true positive rate of 97.2% in the classification of ransomware instances. Additionally, we demonstrate that DRTHIS is capable of detect...
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
IEEE Intelligent Systems, 2017
Journal of Machine Intelligence, 2016
International Journal of Intelligent Systems and Applications, 2016
Intelligent Data Analysis, 2015
Artificial Intelligence Research, 2013
Applied Soft Computing, 2013
The framework of multi-objective clustering can serve as a competent technique for present-day problems ranging from decision making to machine learning and pattern recognition. Multi-objective clustering basically aims at placing similar objects into the same groups based on some conflicting objectives, which substantially supports the use of game theory to come to a resolution. Based on these understandings, this paper suggests Enriched Game Theory K-means, called EGTKMeans, as a novel multi-objective clustering technique based on the notion of game theory. EGTKMeans is specially designed to optimize two intrinsically conflicting objectives, namely compaction and equi-partitioning. The key contributions of the proposed approach are threefold. First, it formulates an elegant and novel payoff definition which considers both objectives with equal priority. The presented payoff function incorporates a desirable fairness into the final clustering results. Second, EGTKMeans performs better by utilizing the advantages of mixed strategies as well as those of pure ones, considering the existence of a mixed Nash equilibrium in every game. Last but not least, EGTKMeans approaches the optimal solution in a very promising manner by optimizing both objectives simultaneously. The experimental results suggest that the proposed approach significantly outperforms other rival methods across real-world and synthetic data sets with reasonable time complexity.
IEEE Transactions on Pattern Analysis …, 2002
In this article, we describe an unsupervised feature selection algorithm suitable for data sets large in both dimension and size. The method is based on measuring similarity between features whereby redundancy therein is removed. This does not need any search and, ...
International Conference on Knowledge Discovery and Information Retrieval, 2012
Advances in Social …, 2012
AI COMMUNICATION JOURNAL, 2013
Today’s security threats like malware are more sophisticated and targeted than ever, and they are growing at an unprecedented rate. To deal with them, various approaches have been introduced. One of them is signature-based detection, which is an effective and widely used method for detecting malware; however, it has a substantial problem in detecting new instances. In other words, it is useful only for the second attack of a given malware. Due to the rapid proliferation of malware and the desperate need for human effort to extract some kinds of signature, this approach is a tedious solution; thus, an intelligent malware detection system is required to deal with new malware threats. Most intelligent detection systems utilise some data mining techniques in order to distinguish malware from benign programs. One of the pivotal phases of these systems is extracting features from malware samples and benign ones in order to build at least one learning model. This phase is called “Malware Analysis” which play...