Lyamine Hedjazi | University Pierre and Marie Curie

Papers by Lyamine Hedjazi

Membership-margin based feature selection for mixed type and high-dimensional data: Theory and applications

Information Sciences, 2015

This paper describes a new feature weighting method based on a membership margin. Distinctive properties of the proposed method include its ability to handle problems characterized by mixed-type data (quantitative, qualitative and interval) as well as a very large number of features. The key idea is to map all features of different types simultaneously into a common space, the membership space. Once all features are represented in a homogeneous space, feature weighting can be performed in a unified way. The weighting approach is integrated here within a fuzzy classifier through a weighted fuzzy rule concept in order to improve its performance. Each antecedent fuzzy set in the fuzzy if–then rule is weighted to characterize the importance of each proposition and therefore of its corresponding feature. Weight estimation is based on membership margin maximization, which yields a fuzzy weight for each feature in the membership space. Experiments on low- and high-dimensional real-world datasets demonstrate that the proposed approach can significantly improve the performance of the fuzzy rule-based classifier as well as other state-of-the-art classifiers, and can even outperform classical feature weighting approaches. In particular, we show that this approach yields meaningful results on two real-world applications: cancer prognosis and industrial process diagnosis.
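The idea of mapping mixed-type features into a common membership space and then weighting them by a margin can be illustrated with a minimal sketch. This is not the paper's algorithm: the membership functions (Gaussian for quantitative values, class frequency for qualitative values, overlap ratio against a mean interval for interval values), the averaged margin, and the names `fit_membership` and `membership_margin_weights` are all assumptions chosen for illustration, and the weights are computed in closed form rather than by the margin maximization the abstract describes.

```python
import math
from collections import Counter


def fit_membership(column, labels, kind):
    """Return mu(value, cls) in [0, 1] for one feature (illustrative choices only)."""
    classes = sorted(set(labels))
    by_class = {c: [v for v, y in zip(column, labels) if y == c] for c in classes}

    if kind == "quantitative":
        # Assumed: Gaussian-shaped membership around the class mean.
        stats = {}
        for c, vals in by_class.items():
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            stats[c] = (mean, max(var, 1e-9))
        return lambda v, c: math.exp(-((v - stats[c][0]) ** 2) / (2 * stats[c][1]))

    if kind == "qualitative":
        # Assumed: membership is the relative frequency of the modality in the class.
        freq = {c: Counter(vals) for c, vals in by_class.items()}
        return lambda v, c: freq[c][v] / len(by_class[c])

    if kind == "interval":
        # Assumed: class prototype is the mean interval; membership is the
        # overlap length divided by the length of the joint hull.
        proto = {
            c: (sum(lo for lo, _ in vals) / len(vals),
                sum(hi for _, hi in vals) / len(vals))
            for c, vals in by_class.items()
        }

        def overlap(v, c):
            lo, hi = v
            p_lo, p_hi = proto[c]
            inter = max(0.0, min(hi, p_hi) - max(lo, p_lo))
            hull = max(hi, p_hi) - min(lo, p_lo)
            return inter / hull if hull > 0 else 1.0

        return overlap

    raise ValueError(f"unknown feature kind: {kind}")


def membership_margin_weights(columns, kinds, labels):
    """Weight each feature by its mean (own-class minus best rival) membership."""
    classes = sorted(set(labels))
    margins = []
    for column, kind in zip(columns, kinds):
        mu = fit_membership(column, labels, kind)
        total = 0.0
        for v, y in zip(column, labels):
            rival = max(mu(v, c) for c in classes if c != y)
            total += mu(v, y) - rival
        # Negative average margins are clipped to zero (a simplification).
        margins.append(max(total / len(labels), 0.0))
    norm = sum(margins)
    return [m / norm if norm > 0 else 0.0 for m in margins]


# Toy data: two informative features and one uninformative qualitative feature.
labels = ["a", "a", "a", "b", "b", "b"]
columns = [
    [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],                      # quantitative
    ["x", "y", "x", "x", "y", "x"],                      # qualitative
    [(0.0, 1.0), (0.1, 1.1), (0.0, 0.9),
     (3.0, 4.0), (3.2, 4.1), (2.9, 4.0)],                # interval
]
kinds = ["quantitative", "qualitative", "interval"]
weights = membership_margin_weights(columns, kinds, labels)
print(weights)
```

Because every feature type is reduced to a membership degree in [0, 1], the same margin computation applies to all three columns, which is the point of the common space. In this toy data the qualitative feature has identical modality frequencies in both classes, so its margin and weight are zero.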

Improved Breast Cancer Prognosis based on a Hybrid Marker Selection Approach

Clinical factors, such as patient age and histopathological state, are still the basis of day-to-day decisions in cancer management. However, with high-throughput technology, gene expression profiling and proteomic sequences have recently come into widespread use for the management of cancer and other diseases. In this work we aim to assess the importance of using both types of data to improve breast cancer prognosis. Two challenges arise when integrating both types of information: the high dimensionality and the heterogeneity of the data. The first is due to the large number of irrelevant genes in microarray data, whereas the second is related to the presence of mixed-type data (quantitative, qualitative and interval) in the clinical data. In this paper, an efficient fuzzy feature selection algorithm is used to address both challenges simultaneously. The results obtained demonstrate the effectiveness of the proposed approach.

Prognosis of Breast Cancer based on a Fuzzy Classification Method

R80 - Oral: Élaboration de signatures de cancers par apprentissage de données issues de puces ADN [Development of cancer signatures by learning from DNA microarray data]

mQTL.NMR: An Integrated Suite for Genetic Mapping of Quantitative Variations of ¹H NMR-Based Metabolic Profiles

Analytical Chemistry, Jan 2, 2015

High-throughput ¹H nuclear magnetic resonance (NMR) is an increasingly popular and robust approach for qualitative and quantitative metabolic profiling, which can be used in conjunction with genomic techniques to discover novel genetic associations through metabotype quantitative trait locus (mQTL) mapping. There is therefore a crucial need to develop specialized tools for accurate detection and unbiased interpretation of genetically determined metabolic signals. Here we introduce and implement a combined chemoinformatic approach for objective and systematic analysis of untargeted ¹H NMR-based metabolic profiles in quantitative genetic contexts. The R/Bioconductor mQTL.NMR package was designed to (i) perform a series of preprocessing steps restoring spectral dependency in collinear NMR data sets to reduce the multiple testing burden, (ii) carry out robust and accurate mQTL mapping in human cohorts as well as in rodent models, (iii) statistically enhance structural assi...

From Chemical Process Diagnosis to Cancer Prognosis

Computer Aided Chemical Engineering, 2011

Classification techniques have recently shown their usefulness for complex process diagnosis. Besides requiring no physical model of the process, they make it possible to study the sensor location problem. Earlier preliminary studies in the domain of chemical process diagnosis were the starting point for extending their application to the medical diagnosis framework. Despite the

Sensor placement and fault detection using an efficient fuzzy feature selection approach

49th IEEE Conference on Decision and Control (CDC), 2010

Process monitoring and fault diagnosis are of great importance for the operational safety and efficiency of complex industrial plants. This article proposes a novel methodology to address the sensor location problem for fault detection. First, all process situations are identified by a fuzzy learning algorithm using measurements generated from the whole available set of sensors. Then, a fuzzy

Similarity-margin based feature selection for symbolic interval data

Pattern Recognition Letters, 2011

In this paper we propose a feature selection method for symbolic interval data based on a similarity margin. In this method, each class is parameterized by an interval prototype obtained through an appropriate learning process. A similarity measure is defined to estimate the similarity between an interval feature value and each class prototype, and a similarity margin concept is then introduced. Heuristic search is avoided by optimizing an objective function that evaluates the importance (weight) of each interval feature within the similarity margin framework. The experimental results show that the proposed method selects meaningful features for interval data. In particular, the proposed method yields a significant improvement on the classification task for three real-world datasets.

Symbolic Data Analysis to Defy Low Signal-to-Noise Ratio in Microarray Data for Breast Cancer Prognosis

Journal of Computational Biology, 2013

Microarray profiling has recently raised the hope of gaining new insights into breast cancer biology and thereby improving the performance of current prognostic tools. However, it also poses several serious challenges to classical data analysis techniques related to the characteristics of the resulting data, mainly high dimensionality and low signal-to-noise ratio.

TOWARDS A UNIFIED PRINCIPLE FOR REASONING ABOUT HETEROGENEOUS DATA: A FUZZY LOGIC FRAMEWORK

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2012

Human knowledge about monitoring process variables is usually incomplete. To deal with this partial knowledge, many types of representation other than the quantitative one (qualitative, symbolic interval) are used to describe process variables. The development of automatic reasoning mechanisms about the process is therefore faced with the problem of multiple data representations. In this paper, a unified principle for reasoning about heterogeneous data is introduced. The principle is based on a simultaneous mapping of data from initially heterogeneous spaces into a single homogeneous space, based on a relative measure using appropriate characteristic functions. Once the heterogeneous data are represented in a unified space, a single processing scheme can serve various analysis purposes using simple reasoning mechanisms. The principle is applied here within a fuzzy logic framework to demonstrate its effectiveness. We show that simple fuzzy reasoning mechanisms can be used to reason in a unified way about heterogeneous data in three well-known machine learning problems.

Fuzzy logic selection as a new reliable tool to identify molecular grade signatures in breast cancer – the INNODIAG study

BMC Medical Genomics, 2015

Background: Personalized medicine has become a priority in breast cancer patient management. In addition to the routinely used clinicopathological characteristics, clinicians will have to deal with an increasing amount of data derived from tumor molecular profiling. The aims of this study were to develop a new gene selection method based on a fuzzy logic selection and classification algorithm, and to validate the resulting gene signatures on breast cancer patient cohorts.
Methods: We analyzed data from four published gene expression datasets for breast carcinomas. We identified the best discriminating genes by comparing molecular expression profiles between histologic grade 1 and grade 3 tumors for each of the training datasets. The most pertinent probes were selected and used to define fuzzy molecular grade 1-like (good prognosis) and fuzzy molecular grade 3-like (poor prognosis) profiles. To evaluate the prognostic performance of the fuzzy grade signatures in breast cancer tumors, a Kaplan-Meier analysis was conducted to compare the relapse-free survival deduced from histologic grade and from fuzzy molecular grade classification.
Results: We applied fuzzy logic selection to breast cancer databases and obtained four new gene signatures. Analysis on the public training sets showed good performance of these gene signatures for grade (sensitivity from 90% to 95%, specificity from 67% to 93%). To validate these gene signatures, we designed probes on custom microarrays and tested them on 150 invasive breast carcinomas. Good performance was obtained, with an error rate of less than 10%. For one gene signature, among 74 histologic grade 3 and 18 grade 1 tumors, 88 cases (96%) were correctly assigned. Interestingly, histologic grade 2 tumors (n = 58) were split between these two molecular grade categories.
Conclusion: We confirmed the use of fuzzy logic selection as a new tool to identify gene signatures with good reliability and increased classification power. This method, based on artificial intelligence algorithms, was successfully applied to molecular grade classification of breast cancers, allowing histologic grade 2 tumors to be classified as grade 1-like or grade 3-like to improve patient prognosis. It opens the way to further development for the identification of new biomarker combinations in other applications, such as prediction of treatment response.
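The validation figures quoted in the Results can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are illustrative, and the per-signature error rate it derives is a reading of those counts, not a figure reported by the authors.

```python
# Counts taken from the abstract's Results paragraph.
grade3_tumors = 74
grade1_tumors = 18
grade2_tumors = 58
correctly_assigned = 88

# Only grade 1 and grade 3 tumors have a reference label to compare against.
evaluated = grade1_tumors + grade3_tumors
accuracy = correctly_assigned / evaluated
error_rate = 1 - accuracy

# All three histologic grades together should make up the validation cohort.
cohort = grade1_tumors + grade2_tumors + grade3_tumors

print(f"evaluated tumors: {evaluated}")
print(f"accuracy: {accuracy:.1%}, error rate: {error_rate:.1%}")
print(f"cohort size: {cohort}")
```

The counts are internally consistent: 88 of 92 rounds to the quoted 96%, the implied error rate for this signature is below the stated 10% bound, and the three grade groups add up to the 150 carcinomas tested.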
