Mostafa Salama | British University in Egypt (BUE)
Papers by Mostafa Salama
Al-Azhar Bulletin of Science- Basic Science Sector, 2013
Several volatile substances, like geosmin, which is responsible for the characteristic wet-earth odor, are produced and distributed in terrestrial and aquatic ecosystems, especially in soil. Members of this group produce clinically useful antitumor drugs such as anthracyclines, antimetabolites, arzinophilin, mitomycins and vancomycins, which act through DNA cleavage by topoisomerase I or II inhibition, inhibition of mitochondrial permeabilization, and inhibition of tumor-induced angiogenesis. The work comprises the isolation and purification of Actinobacter from different sources and the determination of their biological activities (antimicrobial/anti-tumor); the optimization of production media (natural and synthetic) for the promising bioactive compounds (antimicrobial/anti-tumor); and, finally, the determination of the anti-tumor activity of the bioactive compound(s) of Actinobacter against different tumors.
Proceedings of the 7th International Conference on Software and Information Engineering, 2018
Quantifying the amount and content of information transfer between neural populations is crucial to understand brain dynamics and cognitive functions. Most data-driven methods exploit the notion of Wiener-Granger causality, a statistical concept based on temporal prediction. Transfer Entropy and Directed Information formalise this notion by means of information theoretical quantities and can capture any (linear and nonlinear) time-lagged conditional dependencies, thus quantifying the amount of information flow between neural signals. Nevertheless, none of these metrics can reveal what type of information is exchanged. To address this issue, we developed a new measure called Feature-specific Information Transfer (FIT) that is able to quantify both the amount and content of information transfer between neuronal signals. We tested the novel metric on simulated data and showed that it successfully captures feature-specific information transfer in different communication scenarios, including feedforward communication, external confounding inputs and synergistic interactions. Moreover, the FIT measure displayed sensitivity to modulations in temporal parameters of information transfer and signal-to-noise ratios, and correctly inferred the directionality of transfer between signals. We then tested FIT's ability to track feature-specific information flow in neurophysiological data. First, we analysed human electroencephalographic data acquired during a face detection task and confirmed current hypotheses suggesting that information about the presence of an eye in a face image flows from the contralateral to the ipsilateral hemisphere with respect to the position of the eye. Second, we analysed multi-unit activity data recorded from the thalamus and cortex of the rat brain, and showed that the FIT measure successfully detected bottom-up information transfer about visual or somatosensory stimuli in the corresponding neural pathway. Third, we analysed cortical high-gamma activity estimated from human magnetoencephalographic data during visuomotor mapping, and confirmed the notion that visuomotor-related information flows from superior parietal to premotor areas. Altogether, our work suggests that the FIT measure has the potential to uncover previously hidden feature-specific information transfer in neural data and provide a better understanding of brain communication.
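The FIT definition itself is not reproduced in this listing, so as a hedged illustration of the Wiener-Granger idea the abstract builds on, here is a minimal plug-in transfer-entropy estimator; the signals, bin count, and coupling below are invented for the demonstration.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of TE(X -> Y) in bits for two 1-D signals."""
    # Discretize both signals into equal-width bins.
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    n = len(yd) - 1
    triples = Counter(zip(yd[1:], yd[:-1], xd[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(yd[:-1], xd[:-1]))
    pairs_yy = Counter(zip(yd[1:], yd[:-1]))
    singles_y = Counter(yd[:-1])
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_yx[(y0, x0)]            # p(y_{t+1} | y_t, x_t)
        p_cond_past = pairs_yy[(y1, y0)] / singles_y[y0]  # p(y_{t+1} | y_t)
        te += p_joint * np.log2(p_cond_full / p_cond_past)
    return te

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(5000)  # y is driven by past x
print(transfer_entropy(x, y), transfer_entropy(y, x))  # expect TE(x->y) >> TE(y->x)
```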
Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2019
Process Mining is a research field that aims to develop new techniques to discover, monitor and improve real processes by extracting knowledge from event logs. This relatively young research discipline has demonstrated efficacy in various applications, especially in domains where dynamic behavior needs to be related to process models. Process Model Discovery is presumably the most important task in Process Mining, since the discovered models can be used as an objective starting point for any further process analysis. There are various quality dimensions the model should consider during discovery, such as Replay-Fitness, Precision, Generalization, and Simplicity. It becomes evident that Process Model Discovery, in its current setting, is a Multi-Objective Optimization Problem. However, most existing techniques do not approach it as one. Therefore, in this work we propose the use of one of the most robust and widely used Multi-Objective Optimizers for Process Model Discovery, the NSGA-II algorithm. Experimental results on a real-life event log show that the proposed technique outperforms existing techniques in various aspects. This work also tries to establish a benchmarking system for comparing results of Multi-Objective Optimization based Process Model Discovery techniques.
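The paper's encoding of process models is not given here, so this sketch shows only the non-dominated sorting step at the heart of NSGA-II, applied to hypothetical candidate models scored on the four quality dimensions named above; the candidate names and scores are made up.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (>= everywhere, > somewhere)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(population):
    """Partition (name, scores) candidates into successive Pareto fronts."""
    remaining = list(population)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q[1], p[1]) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

# Hypothetical candidates: (replay-fitness, precision, generalization, simplicity)
candidates = [
    ("model_a", (0.95, 0.60, 0.70, 0.80)),
    ("model_b", (0.90, 0.85, 0.65, 0.75)),
    ("model_c", (0.80, 0.80, 0.60, 0.70)),  # dominated by model_b
]
for i, front in enumerate(non_dominated_fronts(candidates)):
    print("front", i, [name for name, _ in front])
```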
The International Conference on Electrical Engineering, 2006
Mobile agents and digital coins represent two growing technologies in E-Commerce systems. However, mobile agent systems suffer from agent cloning and the inability to detect the user who cloned the agent, while digital coins suffer from the problem of spending the same coin more than once, i.e. double spending. Merging these two technologies into one scheme may solve these problems. Lam-Wei's scheme [1] proves that the double-spending detect-and-accuse algorithm in e-cash can be used to detect and accuse cloning offenders. The present paper follows the opposite approach: it implements a digital coin as a mobile agent using cryptographic concepts. The proposed scheme implements an e-cash system that prevents the double-spending problem.
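As a toy illustration of the double-spending detection the scheme targets (not the paper's cryptographic mobile-agent protocol), a bank-side registry of deposited coin serials is enough to flag and attribute a second deposit:

```python
class ToyBank:
    def __init__(self):
        self.deposited = {}  # coin serial -> spender who first deposited it

    def deposit(self, serial, spender):
        if serial in self.deposited:
            return (f"double spend of {serial}: first by "
                    f"{self.deposited[serial]}, again by {spender}")
        self.deposited[serial] = spender
        return f"coin {serial} accepted from {spender}"

bank = ToyBank()
print(bank.deposit("coin-42", "alice"))
print(bank.deposit("coin-42", "bob"))  # detected on re-deposit
```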
International Journal of Online and Biomedical Engineering (iJOE), 2019
Multivariate feature selection techniques search for the optimal feature subset to reduce the dimensionality, and hence the complexity, of a classification task. Statistical feature selection techniques measure the mutual correlation between features as well as the correlation of each feature to the target feature. However, adding a feature to a feature subset can deteriorate the classification accuracy even though this feature positively correlates to the target class. Although most existing feature ranking/selection techniques consider the interdependency between features, the nature of the interaction between features in relation to the classification problem is still not well investigated. This study proposes a forward feature selection technique that calculates a novel measure, Partnership-Gain, to select a subset of features whose partnership constructively correlates to the target feature classification. Comparative analysis with other well-known techniques shows that ...
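The Partnership-Gain measure is not defined in this excerpt, so the following sketch shows only the greedy forward-selection skeleton it plugs into, with cross-validated accuracy as a stand-in subset score; the dataset and model choices are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
for _ in range(5):  # pick up to five features
    # Score each candidate jointly with the features chosen so far.
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=3).mean()
              for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:  # no constructive partner left
        break
    best_score = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)
print(selected, round(best_score, 3))
```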
International Journal of Emerging Technologies in Learning (iJET), 2019
This work enhances the analysis of student performance at the higher education level. The model categorizes features according to their relation to the teaching style and to student activities on an electronic learning system. Several new features are proposed and calculated in each of these two categories/dimensions. The approach applies an extra level of machine learning that analyses the data based on a set of dimensions, where each dimension includes a set of features. Prediction analysis is applied to each dimension separately using a different classifier. Fitting the best classifier to each dimension improves the local analysis accuracy and thus the overall global accuracy. The accuracy of predicting student performance is enhanced to 87%. The study also allows the detection of correlations between features in different dimensions. Furthermore, a study is applied on the scanned text documents for extracting and utilizing the features that r...
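A minimal sketch of the per-dimension idea, assuming synthetic data and hypothetical group boundaries: one classifier per feature group, with a second-level model combining the per-dimension predictions into the global one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 8))
y = (X[:, 0] + X[:, 5] > 0).astype(int)          # depends on both groups
groups = {"teaching_style": slice(0, 4), "student_activity": slice(4, 8)}

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Fit the best-suited model per dimension (one choice per group here).
models = {
    "teaching_style": LogisticRegression().fit(X_tr[:, groups["teaching_style"]], y_tr),
    "student_activity": RandomForestClassifier(random_state=0).fit(
        X_tr[:, groups["student_activity"]], y_tr),
}
# Stack per-dimension probabilities and learn a global combiner.
meta_tr = np.column_stack([m.predict_proba(X_tr[:, groups[g]])[:, 1]
                           for g, m in models.items()])
meta_te = np.column_stack([m.predict_proba(X_te[:, groups[g]])[:, 1]
                           for g, m in models.items()])
combiner = LogisticRegression().fit(meta_tr, y_tr)
print("global accuracy:", combiner.score(meta_te, y_te))
```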
International Journal of Interactive Mobile Technologies (iJIM), 2017
In this work, we present a framework to integrate the functionalities of Enterprise Resource Planning (ERP) with the university academic and management functionalities at the British University in Egypt, in order to produce a University Resource Planning (URP) system that facilitates the integration between educational and management processes within higher educational institutions. Our ERP system should be enabled to: automate admissions, eliminating manual processes and saving significant staff time by enabling prospective students to apply online through a self-service portal; provide one-stop student access, allowing students to enroll, register, and pay for courses through the portal; simplify records management, since with a single system for all data needs, and a single digital record for each student, any department on campus can find the student information it needs; and engage faculty, giving faculty the means to enter and update grades and have personalized access to timely, accurate, and i...
International Journal of Recent Contributions from Engineering, Science & IT (iJES), 2016
Corporate social performance (CSP) has recently become no less important than corporate financial performance (CFP). Debate still exists about the nature of the relationship between CSP and CFP: whether the correlation is positive, negative or neutral. The objective of this study is to explore the relationship between corporate social responsibility (CSR) reports and CFP. The study uses accounting-based and market-based quantitative measures to quantify the financial performance of seven organizations listed on the Egyptian Stock Exchange in 2007-2014. It then uses information retrieval techniques to quantify the contribution of each of the three dimensions of the corporate social responsibility report (environmental, social and economic). Finally, the two sets of variables are viewed together in a model to detect the correlations between them. This model is applied to seven firms that generate social responsibility reports. The results show ...
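As a hedged illustration of the final step, assuming hypothetical yearly scores, correlating one CSR dimension's text-derived weight with an accounting measure reduces to a Pearson coefficient:

```python
import numpy as np

years = np.arange(2007, 2015)
env_score = np.array([0.12, 0.15, 0.14, 0.18, 0.20, 0.22, 0.21, 0.25])  # hypothetical
roa       = np.array([0.05, 0.06, 0.04, 0.07, 0.08, 0.09, 0.08, 0.10])  # hypothetical ROA
r = np.corrcoef(env_score, roa)[0, 1]  # Pearson correlation
print(f"environmental dimension vs ROA over {years[0]}-{years[-1]}: r = {r:.2f}")
```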
MATEC Web of Conferences, 2016
Real-life problems handled by machine learning deal with various forms of values in the data set attributes, such as continuous and discrete forms. Discretization is an important step in the pre-processing stage, as most attribute selection techniques assume the discreteness of the input values. This step can change the internal structure of the input attribute values with respect to the classification problem, so the quality of this step directly impacts the quality of the selected features. This work discusses the problems in current discretization techniques and proposes an attribute evaluation and selection technique that avoids them. Attributes are evaluated in their continuous form directly, without biasing their internal structure, and the computational complexity is improved by eliminating the discretization step. The basic insight of the proposed approach relies on the inverse relationship between class label distribution overlap and the relative information content of a given attribute. To estimate the validity of this assumption, a series of data sets were examined using several standard approaches, including our own implementation, and the approaches were ranked with respect to overall classification accuracy. The results, at least with respect to the test data sets deployed in this study, indicate that the proposed approach outperformed the other methods selected for evaluation. These results will be examined over a wider range of continuous-attribute data sets from nonmedical domains in order to investigate their robustness.
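A minimal sketch of the stated insight, assuming histogram overlap as the concrete overlap measure (the paper's exact estimator is not given here): score each continuous attribute by the overlap of its class-conditional distributions, with smaller overlap indicating higher information content.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def overlap_score(values, labels, bins=20):
    """Average pairwise overlap of class-conditional histograms (lower = better)."""
    edges = np.histogram_bin_edges(values, bins)
    hists = [np.histogram(values[labels == c], bins=edges)[0]
             for c in np.unique(labels)]
    hists = [h / h.sum() for h in hists]
    # Pairwise overlap = sum of bin-wise minima, averaged over class pairs.
    overlaps = [np.minimum(hists[i], hists[j]).sum()
                for i in range(len(hists)) for j in range(i + 1, len(hists))]
    return np.mean(overlaps)

scores = {f"attr_{k}": overlap_score(X[:, k], y) for k in range(X.shape[1])}
for name, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(name, round(s, 3))  # petal attributes should rank best (least overlap)
```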
EURASIP Journal on Bioinformatics and Systems Biology, 2016
Viral evolution remains a main obstacle to the effectiveness of antiviral treatments. The ability to predict this evolution will help in the early detection of drug-resistant strains and will potentially facilitate the design of more efficient antiviral treatments. Various tools have been utilized in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, secondary and tertiary structure evolution prediction, and sequence error correction. This work proposes a novel machine learning technique for predicting the possible point mutations that appear in alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence, and shows that a nucleotide in an RNA sequence changes based on the other nucleotides in the sequence. A neural network technique is utilized to predict new strains, then a rough set theory based algorithm is introduced to extract these point mutation patterns. The algorithm is applied on a number of aligned time-series RNA isolates of the Newcastle virus. Two different data sets from two sources are used to validate these techniques. The results show that the accuracy of this technique in predicting the nucleotides in the new generation is as high as 75%. The mutation rules are visualized to analyse the correlation between different nucleotides in the same RNA sequence.
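A hedged sketch of the prediction task on toy data (the paper's network architecture and rough-set step are not reproduced): predict each nucleotide from its flanking bases with a small neural network, on synthetic sequences built with a first-order dependency so the context is actually predictive.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
alphabet = ["A", "C", "G", "U"]

def toy_sequence(n):
    # First-order chain: each base tends to repeat its predecessor.
    idx = [int(rng.integers(0, 4))]
    for _ in range(n - 1):
        idx.append(idx[-1] if rng.random() < 0.7 else int(rng.integers(0, 4)))
    return [alphabet[i] for i in idx]

seqs = [toy_sequence(30) for _ in range(200)]   # toy "aligned isolates"
ctx, target = [], []
for s in seqs:
    for i in range(1, len(s) - 1):
        ctx.append([s[i - 1], s[i + 1]])        # flanking bases predict the center
        target.append(s[i])

enc = OneHotEncoder().fit(ctx)
X_tr, X_te, y_tr, y_te = train_test_split(enc.transform(ctx), target, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print("held-out accuracy:", net.score(X_te, y_te))  # well above the 0.25 baseline
```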
Procedia Computer Science, 2015
Data sets dealing with the same medical problem, such as coronary artery disease (CAD), may show different results when the same machine learning technique is applied. The classification accuracy and the selected important features depend mainly on the efficiency of the medical diagnosis and analysis. The aim of this work is to integrate the results of machine learning analysis applied on different data sets targeting CAD. This avoids the missing, incorrect, and inconsistent data problems that may appear during data collection. A fast decision tree and a pruned C4.5 tree are applied, and the resulting trees extracted from the different data sets are compared. Features common to these data sets are extracted and used in later analysis of the same disease in any data set. The results show that the classification accuracy on the collected dataset is 78.06%, higher than the average classification accuracy over the separate datasets, which is 75.48%.
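A minimal sketch of the cross-dataset comparison, assuming two synthetic CAD-like datasets with shared column names and sklearn's CART as a stand-in for C4.5: train a shallow tree on each and intersect the features the trees actually split on.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
features = ["age", "chol", "bp", "max_hr", "st_dep"]

def make_dataset(n):
    X = rng.standard_normal((n, len(features)))
    y = (X[:, 0] + X[:, 2] > 0).astype(int)   # toy label driven by age + bp
    return X, y

used = []
for n in (300, 400):                          # two "collected" datasets
    X, y = make_dataset(n)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    # tree_.feature holds split feature indices; negatives mark leaves.
    used.append({features[i] for i in np.unique(tree.tree_.feature) if i >= 0})
print("features used by both trees:", used[0] & used[1])
```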
International Journal of Service Science, Management, Engineering, and Technology, 2014
Euclidean calculations represent a cornerstone of many machine learning techniques, such as Fuzzy C-Means (FCM) and Support Vector Machines (SVM). The FCM technique calculates the Euclidean distance between data points, and the SVM technique calculates the dot product of two points in Euclidean space. These calculations do not consider the degree of relevance of the selected features to the target class labels. This paper proposes a modification of the Euclidean space calculation for the FCM and SVM techniques based on feature rankings extracted from evaluating the features. The authors treat the ranking as a membership value of each feature in a fuzzification of the Euclidean calculations, rather than using the crisp concept of feature selection, which selects some features and ignores others. Experimental results show that applying fuzzy membership values to the Euclidean calculations in the FCM and SVM techniques gives better accuracy than the ord...
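A minimal sketch of the core idea, with mutual information standing in for whatever feature-evaluation ranking is used: weight each feature's contribution to the Euclidean distance by its relevance membership instead of crisply keeping or dropping it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
w = mutual_info_classif(X, y, random_state=0)
w = w / w.sum()                                # normalized feature memberships

def weighted_euclidean(a, b, weights):
    """Euclidean distance with per-feature fuzzy membership weights."""
    return np.sqrt(np.sum(weights * (a - b) ** 2))

print("plain   :", np.linalg.norm(X[0] - X[100]))
print("weighted:", weighted_euclidean(X[0], X[100], w))
```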
Intrusion detection systems have been around for quite some time to protect systems from inside and outside threats. Researchers and scientists are concerned with how to enhance intrusion detection performance, so as to deal with real-time attacks and detect them quickly for a rapid response. One way to improve performance is to use a minimal number of features to define a model that can accurately discriminate normal from anomalous behaviour. Many feature selection techniques exist to reduce feature sets or extract new features from them. In this paper, we propose an approach for generating anomaly detectors using a genetic algorithm in conjunction with several feature selection techniques, including principal component analysis, sequential floating search, and correlation-based feature selection. The genetic algorithm was applied with a deterministic crowding niching technique to generate a set of detectors from a single run. The results show that the sequential floating techniques combined with the genetic algorithm give the best results compared to the others tested, especially sequential floating forward selection, with a detection accuracy of 92.86% on the training set and 85.38% on the test set.
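A minimal sketch of deterministic crowding on bit-string feature masks, with a toy fitness in place of the paper's detector evaluation: each offspring competes only with its more similar parent, which preserves multiple niches (detectors) within a single run.

```python
import random
random.seed(0)
N_FEATURES, POP, GENERATIONS = 10, 20, 50
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]                 # toy "good mask"

def fitness(mask):
    return sum(m == t for m, t in zip(mask, target))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    random.shuffle(pop)
    for i in range(0, POP, 2):
        p1, p2 = pop[i], pop[i + 1]
        cut = random.randrange(1, N_FEATURES)           # one-point crossover
        c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for c in (c1, c2):                              # single bit-flip mutation
            c[random.randrange(N_FEATURES)] ^= 1
        # Deterministic crowding: each child replaces its closer parent if fitter.
        if hamming(c1, p1) + hamming(c2, p2) <= hamming(c1, p2) + hamming(c2, p1):
            pairs = [(c1, i), (c2, i + 1)]
        else:
            pairs = [(c1, i + 1), (c2, i)]
        for child, idx in pairs:
            if fitness(child) >= fitness(pop[idx]):
                pop[idx] = child
best = max(pop, key=fitness)
print("best detector mask:", best, "fitness:", fitness(best))
```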
Representation and visualization of continuous data using Formal Concept Analysis (FCA) has become an important requirement in real-life fields. To apply the FCA model to numerical data, a scaling or discretization/binarization procedure should be applied as a preprocessing stage. The scaling procedure increases the computational complexity of FCA, while the binarization process distorts the internal structure of the input data set. The proposed approach uses a binarization procedure prior to applying the FCA model, and then applies a validation process to the generated lattice to measure or ensure its degree of accuracy. The introduced approach is based on evaluating each attribute according to the objects of its extent set. To prove the validity of the introduced approach, the technique is applied on two data sets in the medical field, the Indian Diabetes and the Breast Cancer data sets. Both data sets show the generation of a valid lattice.
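A minimal sketch of the FCA step on an already-binarized context (the context values are hypothetical): every attribute subset's extent, paired with that extent's intent closure, yields the formal concepts of the lattice.

```python
from itertools import combinations

objects = {"o1": {"a", "b"}, "o2": {"a", "c"}, "o3": {"a", "b", "c"}}
attributes = {"a", "b", "c"}

def extent(intent_set):
    """Objects possessing every attribute in intent_set."""
    return {o for o, attrs in objects.items() if intent_set <= attrs}

def intent(extent_set):
    """Attributes shared by every object in extent_set."""
    attrs = set(attributes)
    for o in extent_set:
        attrs &= objects[o]
    return attrs if extent_set else set(attributes)

concepts = set()
for r in range(len(attributes) + 1):
    for subset in combinations(sorted(attributes), r):
        e = extent(set(subset))
        concepts.add((frozenset(e), frozenset(intent(e))))  # closure pair
for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(set(e) or "{}", "->", set(i) or "{}")
```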
Communications in Computer and Information Science, 2012
The risk of the hepatitis-C virus is considered a challenge in the field of medicine. Applying feature reduction techniques and generating rules based on the selected features are considered important steps in data mining. Medical experts need to analyze the generated rules to find out whether these rules are important in real-life cases. This paper presents an application of rough set analysis to discover the dependency between attributes and to generate a set of reducts consisting of a minimal number of attributes. The experimental results obtained show that the overall accuracy offered by rough sets is high.
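A minimal sketch of the rough-set dependency degree used when searching for reducts, on a hypothetical decision table rather than the paper's hepatitis-C data: gamma(B) is the fraction of objects whose B-equivalence class is pure in the decision attribute.

```python
from collections import defaultdict

table = [  # (condition attribute values, decision)
    ((1, 0), "sick"), ((1, 0), "sick"), ((0, 1), "healthy"),
    ((0, 1), "sick"), ((1, 1), "healthy"),
]

def dependency(attr_idx):
    """gamma(B): fraction of objects whose B-equivalence class is decision-pure."""
    classes = defaultdict(set)
    for cond, dec in table:
        classes[tuple(cond[i] for i in attr_idx)].add(dec)
    pure = sum(1 for cond, _ in table
               if len(classes[tuple(cond[i] for i in attr_idx)]) == 1)
    return pure / len(table)

# Attribute 1 alone carries partial dependency; {0, 1} carries more.
print(dependency([0]), dependency([1]), dependency([0, 1]))  # 0.0 0.4 0.6
```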
Computational Social Networks, 2012
The continuous, self-growing nature of social networks makes it hard to define a line of safety around them. Users in social networks interact not only with the Web but also with trusted groups that may contain enemies. There are different kinds of attacks on these networks, including damaging computer systems and stealing information about users. These attacks affect not only individuals but also the organizations they belong to. Protection from these attacks should be performed by the users and the security experts of the network. Advice should be provided to users of these social networks, and security experts should make sure that the contents transmitted through the network do not contain malicious or harmful data. This chapter presents an overview of social network security and privacy issues, and illustrates the various security risks and the tasks applied to minimize those risks. In addition, this chapter explains some of the common strategies that attackers often use and some possible countermeasures against such issues.
Computational Social Networks, 2012
Computational social science is a newly emerging field that overlaps with mathematics, psychology, computer science, sociology, and management. Social computing is concerned with the intersection of social behavior and computational systems. It supports any sort of social behavior in or through computational systems, and is based on creating or recreating social conventions and social contexts through the use of software and technology. Thus, blogs, email, instant messaging, social network services, wikis, social bookmarking, and other instances of what is often called social software illustrate ideas from social computing. Social network analysis is the study of relationships among social entities and is becoming an important tool for investigators; however, all the necessary information is often distributed over a number of websites. Interest in this field is blossoming as traditional practitioners in the social and behavioral sciences are joined by researchers from statistics, graph theory, machine learning,
2009 International Conference on Information Management and Engineering, 2009
Processing the huge amount of collected network data to identify network intrusions requires high computational cost. Reducing the features in the collected data may therefore solve the problem. We propose an approach for obtaining the optimal number of features to build an efficient model for an intrusion detection system (IDS). Two feature selection algorithms were involved to generate two feature sets. These two features
ELECTROPHORESIS, 2006
Entamoeba histolytica is a pathogenic protozoan parasite, which causes amoebic colitis, dysentery and liver abscesses in humans. Since the cyst and small trophozoite stages of this parasite are indistinguishable by light microscopy from Entamoeba dispar (which is nonpathogenic), specific diagnosis is compromised. To overcome this limitation, a PCR-coupled SSCP approach, utilising a sequence difference of 4.6% in a short region (approximately 173-174 bp) of the small subunit of nuclear ribosomal DNA, was evaluated for the differentiation of the two species of Entamoeba. Including a range of well-defined control DNA samples (n = 67) to evaluate the specificity of the PCR, 45 DNA samples representing E. histolytica and E. dispar from human faecal samples were tested by SSCP, and unequivocal delineation between the species was achieved. This SSCP approach should provide a practical tool for the specific diagnosis of E. histolytica in humans and for investigating its epidemiology.
Critical Care Medicine, 2006