Lyamine Hedjazi | University Pierre and Marie Curie
Papers by Lyamine Hedjazi
Information Sciences, 2015
The present paper describes a new feature weighting method based on a membership margin. Distinctive properties of the proposed method include its ability to handle problems characterized by mixed-type data (quantitative, qualitative and interval) as well as a very large number of features. The key idea is to map all features of different types simultaneously into a common space, the membership space. Once all features are represented in this homogeneous space, feature weighting can be performed in a unified way. This weighting approach is integrated here within a fuzzy classifier through the concept of weighted fuzzy rules in order to improve its performance: each antecedent fuzzy set in a fuzzy if–then rule is weighted to characterize the importance of its proposition, and therefore of the corresponding feature. The weights are estimated by maximizing the membership margin, which yields a fuzzy weight for each feature in the membership space. Experiments on low- and high-dimensional real-world datasets demonstrate that the proposed approach can significantly improve the performance of the fuzzy rule-based classifier as well as of other state-of-the-art classifiers, and can even outperform classical feature weighting approaches. In particular, we show that this approach can yield meaningful results in two real-world applications: cancer prognosis and industrial process diagnosis.
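The abstract names membership margin maximization but not the exact objective, so the following is only a simplified illustration of the idea, not the authors' algorithm. It assumes each sample has already been mapped to per-feature membership degrees for every class, defines the margin of a sample on feature f as its membership to the true class minus its membership to the strongest competing class, and maximizes the summed linear margin under a unit L2 norm and non-negativity constraint. That linear problem has a closed-form solution: the weights are proportional to the positive part of the summed margins. The function name `membership_margin_weights` and its input layout are hypothetical.

```python
import math


def membership_margin_weights(memberships, labels):
    """Illustrative feature weights from membership margins (hypothetical helper).

    memberships[n][f][c] is the membership degree in [0, 1] of sample n to
    class c computed from feature f alone (assumed to be precomputed).
    labels[n] is the index of the true class of sample n (at least 2 classes).
    Returns non-negative weights with unit L2 norm, or all zeros if no
    feature has a positive summed margin.
    """
    n_features = len(memberships[0])
    n_classes = len(memberships[0][0])
    totals = [0.0] * n_features  # summed per-feature margins over all samples

    for mu, y in zip(memberships, labels):
        # Strongest competitor: the wrong class with the highest total membership.
        rival = max(
            (c for c in range(n_classes) if c != y),
            key=lambda c: sum(mu[f][c] for f in range(n_features)),
        )
        for f in range(n_features):
            totals[f] += mu[f][y] - mu[f][rival]

    # Maximizing sum_n w . z_n subject to ||w||_2 = 1 and w >= 0 is solved by
    # w proportional to the positive part of the summed margin vector.
    positive = [max(t, 0.0) for t in totals]
    norm = math.sqrt(sum(p * p for p in positive))
    if norm == 0.0:
        return [0.0] * n_features
    return [p / norm for p in positive]
```

Features whose memberships consistently favor the true class receive large weights, while features that on balance favor a wrong class are clipped to zero, which is how a margin criterion can also act as feature selection.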
Clinical factors, such as patient age and histopathological state, are still the basis of day-to-day decisions in cancer management. However, with high-throughput technologies, gene expression profiling and proteomic sequences have recently come into widespread use for the management of cancer and other diseases. This work aims to assess the value of combining both types of data to improve breast cancer prognosis. Integrating both types of information, however, raises two challenges: high dimensionality and heterogeneity of the data. The first is due to the large number of irrelevant genes in microarray data, whereas the second is related to the presence of mixed-type data (quantitative, qualitative and interval) in the clinical data. In this paper, an efficient fuzzy feature selection algorithm is used to address both challenges simultaneously. The results demonstrate the effectiveness of the proposed approach.
Analytical Chemistry, Jan 2, 2015
High-throughput ¹H nuclear magnetic resonance (NMR) is an increasingly popular and robust approach for qualitative and quantitative metabolic profiling, which can be used in conjunction with genomic techniques to discover novel genetic associations through metabotype quantitative trait locus (mQTL) mapping. There is therefore a crucial need for specialized tools enabling accurate detection and unbiased interpretation of genetically determined metabolic signals. Here we introduce and implement a combined chemoinformatic approach for the objective and systematic analysis of untargeted ¹H NMR-based metabolic profiles in quantitative genetic contexts. The R/Bioconductor mQTL.NMR package was designed to (i) perform a series of preprocessing steps that restore spectral dependency in collinear NMR data sets to reduce the multiple testing burden, (ii) carry out robust and accurate mQTL mapping in human cohorts as well as in rodent models, (iii) statistically enhance structural assi...
Computer Aided Chemical Engineering, 2011
Classification techniques have recently shown their usefulness for complex process diagnosis. Besides not requiring a physical model of the process, they make it possible to study the sensor location problem. Preliminary studies in the domain of chemical process diagnosis were the starting point for extending this application to the medical diagnosis framework. Despite the...
49th IEEE Conference on Decision and Control (CDC), 2010
Process monitoring and fault diagnosis are of great importance for the operational safety and efficiency of complex industrial plants. The present article proposes a novel methodology to address the sensor location problem for fault detection. First, all process situations are identified by a fuzzy learning algorithm using measurements generated from the whole available set of sensors. Then, a fuzzy...
Pattern Recognition Letters, 2011
In this paper we propose a feature selection method for symbolic interval data based on a similarity margin. In this method, each class is parameterized by an interval prototype obtained through an appropriate learning process. A similarity measure is defined to estimate the similarity between an interval feature value and each class prototype, and a similarity margin concept is then introduced. Heuristic search is avoided by optimizing an objective function that evaluates the importance (weight) of each interval feature within this similarity margin framework. The experimental results show that the proposed method selects meaningful features for interval data. In particular, it yields a significant improvement in classification on three real-world datasets.
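To make the similarity margin concrete, here is a small sketch in which every name and formula is an assumption rather than the paper's definition: the class prototype is taken as the mean lower bound and mean upper bound of the class's intervals, interval similarity is the overlap length divided by the length of the covering hull (a Jaccard-style ratio), and the margin is the similarity to the own-class prototype minus the highest similarity to any other class prototype. A positive margin means the interval value sits closer to its own class than to any rival.

```python
def interval_prototype(intervals):
    """Assumed class prototype: mean lower bound and mean upper bound."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n, sum(hi for _, hi in intervals) / n)


def interval_similarity(x, p):
    """Jaccard-style similarity in [0, 1]: overlap length over hull length."""
    overlap = max(0.0, min(x[1], p[1]) - max(x[0], p[0]))
    hull = max(x[1], p[1]) - min(x[0], p[0])
    if hull == 0.0:  # both intervals are the same single point
        return 1.0
    return overlap / hull


def similarity_margin(x, label, prototypes):
    """Similarity to the own-class prototype minus the best rival prototype."""
    own = interval_similarity(x, prototypes[label])
    rival = max(interval_similarity(x, p) for c, p in prototypes.items() if c != label)
    return own - rival
```

Under this particular measure, disjoint intervals always score 0 regardless of how far apart they are; a measure that also includes a distance term would grade them, which is one reason the paper's own definition may differ.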
Journal of Computational Biology, 2013
Microarray profiling has recently raised the hope of gaining new insights into breast cancer biology and thereby improving the performance of current prognostic tools. However, it also poses several serious challenges to classical data analysis techniques, related to the characteristics of the resulting data: mainly high dimensionality and a low signal-to-noise ratio.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2012
Human knowledge about the variables of a monitored process is usually incomplete. To deal with this partial knowledge, representations other than the quantitative one (qualitative, symbolic interval) are used to describe process variables. The development of automatic reasoning mechanisms about the process must therefore cope with multiple data representations. In this paper, a unified principle for reasoning about heterogeneous data is introduced. This principle is based on simultaneously mapping data from initially heterogeneous spaces into a single homogeneous space, using a relative measure defined by appropriate characteristic functions. Once the heterogeneous data are represented in a unified space, a single processing scheme can serve various analysis purposes using simple reasoning mechanisms. The principle is applied here within a fuzzy logic framework to demonstrate its effectiveness: we show that simple fuzzy reasoning mechanisms can reason in a unified way about heterogeneous data in three well-known machine learning problems.
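The unifying principle can be sketched as one characteristic function per data type, each returning a degree in [0, 1] for a given class, so that all features end up in the same membership space. The three functions below are assumptions chosen for illustration and are not necessarily those used in the paper: a fuzzy binomial function rho^z * (1 - rho)^(1 - z) on the range-normalized value z for quantitative features, the within-class frequency of the observed modality for qualitative features, and an overlap-over-hull ratio against a class prototype interval for interval features. All function names and the parameter layout are hypothetical.

```python
def quantitative_membership(x, lo, hi, rho):
    """Fuzzy binomial membership of a numeric value (assumed characteristic function).

    x is rescaled to [0, 1] with the feature range [lo, hi]; rho is the class
    mean of rescaled training values, assumed to lie strictly inside (0, 1).
    """
    z = (x - lo) / (hi - lo)
    return rho ** z * (1.0 - rho) ** (1.0 - z)


def qualitative_membership(value, class_values):
    """Relative frequency of a modality among the class's training values."""
    return class_values.count(value) / len(class_values)


def interval_membership(x, proto):
    """Overlap of interval x with the class prototype interval, relative to their hull."""
    overlap = max(0.0, min(x[1], proto[1]) - max(x[0], proto[0]))
    hull = max(x[1], proto[1]) - min(x[0], proto[0])
    return 1.0 if hull == 0.0 else overlap / hull


def to_membership_space(sample, feature_kinds, class_params):
    """Map one mixed-type sample to membership degrees (one per feature) for one class."""
    degrees = []
    for kind, value, params in zip(feature_kinds, sample, class_params):
        if kind == "quantitative":
            degrees.append(quantitative_membership(value, *params))  # params = (lo, hi, rho)
        elif kind == "qualitative":
            degrees.append(qualitative_membership(value, params))  # params = class values
        elif kind == "interval":
            degrees.append(interval_membership(value, params))  # params = prototype
        else:
            raise ValueError(f"unknown feature kind: {kind}")
    return degrees
```

For example, with class parameters `(0.0, 10.0, 0.8)`, `["high", "high", "low", "high"]` and `(2.0, 4.0)`, the mixed sample `[10.0, "high", (3.0, 4.0)]` maps to `[0.8, 0.75, 0.5]`. Calling the mapping once per class gives the homogeneous representation on which a single reasoning or weighting step can then operate.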
BMC Medical Genomics, 2015
Background: Personalized medicine has become a priority in breast cancer patient management. In addition to the routinely used clinicopathological characteristics, clinicians will have to face an increasing amount of data derived from tumor molecular profiling. The aims of this study were to develop a new gene selection method based on a fuzzy logic selection and classification algorithm, and to validate the resulting gene signatures on breast cancer patient cohorts.
Methods: We analyzed data from four published gene expression datasets for breast carcinomas. We identified the best discriminating genes by comparing molecular expression profiles between histologic grade 1 and grade 3 tumors in each training dataset. The most pertinent probes were selected and used to define fuzzy molecular grade 1-like (good prognosis) and fuzzy molecular grade 3-like (poor prognosis) profiles. To evaluate the prognostic performance of the fuzzy grade signatures in breast cancer tumors, a Kaplan-Meier analysis was conducted to compare the relapse-free survival derived from histologic grade and from fuzzy molecular grade classification.
Results: We applied fuzzy logic selection to breast cancer databases and obtained four new gene signatures. Analysis of the public training sets showed good performance of these gene signatures for grade (sensitivity 90% to 95%, specificity 67% to 93%). To validate these gene signatures, we designed probes on custom microarrays and tested them on 150 invasive breast carcinomas. Good performance was obtained, with an error rate of less than 10%. For one gene signature, 88 of the 92 histologic grade 3 (n = 74) and grade 1 (n = 18) tumors (96%) were correctly assigned. Interestingly, histologic grade 2 tumors (n = 58) were split between the two molecular grade categories.
Conclusion: We confirmed fuzzy logic selection as a new tool for identifying gene signatures with good reliability and increased classification power. This method, based on artificial intelligence algorithms, was successfully applied to the molecular grade classification of breast cancers, allowing histologic grade 2 tumors to be classified as grade 1-like or grade 3-like in order to refine patient prognosis. It opens the way to further work on identifying new biomarker combinations for other applications, such as prediction of treatment response.
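The relapse-free survival comparison described in the Methods relies on the Kaplan-Meier estimator. The sketch below implements the standard product-limit formula S(t) = product over relapse times t_i <= t of (1 - d_i / n_i), where d_i is the number of relapses at t_i and n_i the number of patients still at risk just before t_i. It is a generic, dependency-free illustration, not the study's analysis code, and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function (illustrative sketch).

    times: follow-up times; events: 1 if a relapse was observed, 0 if censored.
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group every patient whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses > 0:
            # Conditional probability of remaining relapse-free past t.
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        # Relapsed and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

For invented toy follow-up times `[1, 2, 2, 3, 4]` with events `[1, 1, 0, 1, 0]`, the estimate steps down to about 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored patients at times 2 and 4 shrink the risk set without producing a step. Computing one such curve per grade group (histologic or fuzzy molecular) gives the curves that a Kaplan-Meier comparison sets side by side.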