Mourad Elloumi - Academia.edu
Papers by Mourad Elloumi
Journal of Software, 2018
Journal of Computer Science & Systems Biology, Jun 2, 2015
Alexandria Engineering Journal
Deep Learning for Biomedical Data Analysis, 2021
Chest X-ray images are the primary source for detecting tuberculosis and other thoracic diseases. Lung nodules can also be identified from chest X-ray images. In the past decade, automatic disease detection from chest X-ray datasets has gained importance, and researchers have proposed a variety of techniques for tuberculosis screening, thoracic disease detection and lung nodule detection. With the availability of massive public datasets of chest X-ray images, Deep Learning (DL) techniques have played a key role in lung disease detection. This book chapter covers the DL techniques used to detect lung diseases from chest X-ray datasets. It describes the public chest X-ray datasets available for thoracic disease detection, tuberculosis screening and lung nodule detection. It also lists the most commonly used performance metrics for evaluating disease detection techniques.
hierarchical n-grams extraction approach for
World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 2015
Online store where you can buy Bioinformatics Research and Development · Second International Conference, BIRD 2008, Vienna, Austria, July 7-9, 2008 Proceedings, priced at 91.77 €, by Elloumi, Mourad | Küng, Josef | Linial, Michal | Murphy, Robert | Schneider, Kristan | Toma, Cristian; a store for medicine books and biology books (bioinformatics)
AIMS Medical Science, 2017
Identifying the interactions of polymorphisms with other genetic or environmental factors in order to detect multifactorial diseases has become both a challenge and an objective for geneticists. Unlike for monogenic Mendelian diseases, classical methods have not proven very efficient at identifying these interactions, especially given the exponential increase in the number of genetic interactions and in the number of genotype combinations. Several kinds of methods have been proposed for detecting susceptibility variants, among them metaheuristics and statistical methods. With metaheuristics, we focus on feature selection, and more precisely on determining the genes that increase susceptibility to the disease, especially as these methods are better suited to describing complex data. Statistical methods are divided into two groups: linkage studies and association studies. These two are generally used one after the other since they are complementary. The linkage study is used first because its objective is to localize the chromosomal regions containing the gene(s) involved in the disease. Then, in a second step, the association study is set up to specify the location of the gene precisely. In this paper, we present a survey of metaheuristics and statistical methods used in human genetics, and specifically for multifactorial diseases, in order to help geneticists find interactions between genes and the environmental factors involved in those diseases.
One way to address the errors introduced by third-generation sequencing technology is to exploit the high coverage and high quality of the short reads generated by second-generation sequencing technology. This paper presents a new approach for error correction and de novo assembly of long reads. We present MiRCA, a hybrid approach based on sequence alignments that detects and corrects errors in MinION long reads using Illumina short reads. With this new error correction approach, we were able to produce an effective and quick de novo assembly. Experiments on the Saccharomyces cerevisiae and Escherichia coli genomes show that MiRCA is much better than the available tools. MiRCA is tested on Linux platforms and freely available at https://github.com/Mkchouk/MiRCA.
Journal of Integrative Bioinformatics, 2021
Diseases can be tied to changes at the molecular level within affected cells. These changes can concern transcription, translation, or any other mechanism involved in gene expression, such as post-transcriptional regulation. Instrumentation for measuring such molecular changes is readily available and produces large amounts of data. For example, DNA and RNA sequencing, and protein quantitation and sequencing, can be achieved via next-generation sequencing and mass spectrometry, respectively. One current challenge is the analysis and integration of the resulting large, heterogeneous datasets. Bioinformatics is the field of study that produces algorithms and integrative approaches for such data analyses. The primary aim in algorithmic bioinformatics is, however, the development of algorithms and not their application. Typically, novel algorithms are introduced with a proof of principle, and they are applied to some data for that purpose, but usually not comprehensively. Users' own data might differ slightly from the data used in the proof of principle, inducing further data analysis challenges. Additionally, applying such algorithms to their own data may be difficult for researchers from the biomedical domain. The 1st International Applied Bioinformatics Conference was conceived to bring together representatives from all the research fields involved in order to increase knowledge transfer. First planned for 2020 and then deferred to 2021 due to the pandemic caused by the Coronavirus [1], the conference was held online. Despite its virtual nature, the conference attracted considerable attention. We received many good manuscripts and invited a few authors to submit full versions to this special issue. The range of topics was extensive, but many submissions concerned the interface of bioinformatics and its application.
The selected papers for this special issue also discuss various topics such as sequence alignment and gene network reconstruction. The first paper in this special issue concerns a challenging issue in bioinformatics, the use of pangenomes instead of single reference genomes, and offers a fast variation-aware read mapping algorithm [2]. Mapping is also vital for investigating gene expression, which is essential for the second manuscript. It discusses how microRNA and mRNA expression profiles can be investigated [3]. From this, modular networks are inferred, describing post-transcriptional regulatory networks. Such networks are challenging to visualize, which is the focus of the third paper [4]. The work summarizes the state of the art in bicluster visualization and is also based on gene expression data. Next, we move from transcriptomics to metabolomics. A disparity filter was applied to perform network analysis for colorectal cancer as a proof of principle [5]. The final two manuscripts focus more on practical application in cancer. First, the prostate, ovary, testes, and embryo
International Journal of Biomedical Data Mining, 2018
Biclustering algorithms have matured from their initial applications in bioinformatics, evolving towards different approaches and bicluster definitions, which sometimes makes it hard for the analyst to determine which of the available algorithms best fits her problem. As a way of benchmarking these algorithms, several quality measures have been proposed in the literature. Such measures cover numerical aspects related to accuracy, recovery power or the capability of retrieving previous biomedical knowledge. However, biclustering apparently remains an uncommon option for biomedical analysis. Here we review the impact of biclustering algorithms in biomedicine and bioinformatics with the aim of measuring and understanding non-numerical aspects of biclustering algorithms, focusing on citation-based statistics that can be relevant for their application in the domain. To achieve this, we analyzed the citation impact of several clustering and biclustering algorithms, and we propose a methodology that can cover this aspect of biclustering usage.
Journal of Parasitic Diseases, 2018
Cutaneous leishmaniasis (CL) is a major disease in many parts of the world. Since no vaccine has been developed, treatment is the best way to control it. In most areas, antimonial resistance, whose mechanisms are not completely understood, has been reported. The main aim of this study is to assess the gene expression of J-binding protein 1 and J-binding protein 2 in clinical Leishmania major isolates. Patients with CL from central and north Iran were considered for this study. The samples were transferred in RNAlater solution and stored at -20°C. RNA extraction and cDNA synthesis were performed. The gene expression analysis was done with SYBR Green real-time PCR using the ΔΔCt method. Written informed consent forms were filled out by patients, and then information forms were filled out based on the Helsinki Declaration. Statistical analysis was done with SPSS (16.0; SPSS Inc, Chicago) using the independent t test, the Shapiro-Wilk test, and Pearson's and Spearman's rank correlation coefficients. P < 0.05 was considered significant. The gene expression of JBP1 and JBP2 had no relation with sex or age. JBP1 gene expression was high in sensitive isolates obtained from the north of the country. JBP2 gene expression was significant in sensitive and no-response antimonial isolates from the north, but no significant differences were detected between sensitive and resistant isolates from central Iran. Differential gene expression of JBP1 and JBP2 in various clinical resistance isolates from different geographical areas points to multifactorial ways of developing resistance in different isolates.
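The ΔΔCt (DDCT) analysis mentioned in the abstract above is commonly computed as a relative fold change, 2^(-ΔΔCt). The following is only a minimal sketch of that standard calculation, not the authors' analysis code. It assumes roughly 100% amplification efficiency, and the function name, the parameter names and the Ct values in the usage line are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method (assumes ~100% PCR efficiency).

    All arguments are hypothetical threshold-cycle (Ct) values:
    target gene and reference gene, in the sample and in the calibrator.
    """
    # Normalize the target gene to the reference gene in each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample against the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle of difference corresponds to a twofold change.
    return 2.0 ** (-delta_delta_ct)


# Illustrative values only: the target crosses the threshold two cycles earlier
# (relative to the reference gene) in the sample than in the calibrator.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```

A value above 1 indicates higher relative expression in the sample than in the calibrator, and a value below 1 indicates lower expression.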
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016
One way to address the errors introduced by third-generation sequencing technology is to exploit the high coverage and high quality of the short reads generated by second-generation sequencing technology. This paper presents a new approach for error correction and de novo assembly of long reads. We present MiRCA, a hybrid approach based on sequence alignments that detects and corrects errors in MinION long reads using Illumina short reads. With this new error correction approach, we were able to produce an effective and quick de novo assembly. Experiments on the Saccharomyces cerevisiae and Escherichia coli genomes show that MiRCA is much better than the available tools. MiRCA is tested on Linux platforms and freely available at https://github.com/Mkchouk/MiRCA.
Journal of Engineering Technology, 2016
Lecture Notes in Computer Science, 2016
In this paper, we conduct an experimental study to compare the performance of different data mining classification algorithms for predicting osteoporosis in Tunisian postmenopausal women. This study aims to identify the best algorithms with the optimum classification parameter values and to determine the most important risk factors that have a significant impact on the occurrence of osteoporosis. The results show that the Support Vector Machine (SVM) classifier and the Artificial Neural Network (ANN) classifier give the best classification performance when dealing with the three bone statuses (normal, osteopenia, osteoporosis). On the other hand, the decision tree classifier C4.5 makes it possible to extract the most important risk factors for the occurrence of osteoporosis. The selected risk factors are validated by biologists.
2015 26th International Workshop on Database and Expert Systems Applications (DEXA), 2015
An attractive way to perform biclustering of genes and conditions is to adopt the notion of fuzzy sets, which is useful for discovering overlapping biclusters. Fuzzy clustering is well known as a robust and efficient way to reduce computation cost while obtaining better results. However, this approach has not been explored very well. In this paper, we propose a new algorithm, called Refine Bicluster, for biclustering of microarray data using the fuzzy approach. This algorithm adopts the strategy of one bicluster at a time, assigning to each data matrix element, i.e. to each gene and each condition, a membership in the bicluster. The biclustering problem, in which one would maximize the size of the bicluster and minimize the residual, is treated as the optimization of a proper functional. Applied to continuous synthetic datasets, our algorithm outperforms other biclustering algorithms for microarray data.
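The abstract above does not define the residual it minimizes. One widely used choice in the biclustering literature is the mean squared residue of Cheng and Church, and the sketch below assumes that definition. It is an illustration of that measure on a crisp (non-fuzzy) submatrix, not the Refine Bicluster algorithm itself, and the function name and the example matrices are hypothetical.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of equal-length rows.

    Assumed definition (Cheng and Church): the mean over all cells of
    (a_ij - row_mean_i - col_mean_j + overall_mean) ** 2.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift form a coherent bicluster,
# so the residue is zero up to floating-point error.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
# A checkerboard pattern is not coherent and has a positive residue.
incoherent = [[1.0, 0.0], [0.0, 1.0]]
print(mean_squared_residue(coherent), mean_squared_residue(incoherent))
```

Under this definition, a lower value means the selected genes behave more coherently across the selected conditions, which is why it is traded off against bicluster size.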
2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015
The new MinION sequencer from Oxford Nanopore Technologies is characterized by its small size and is powered from the USB 3.0 port of a laptop computer. This sequencer produces long reads at a low production cost and with high throughput. However, long reads generated by the MinION sequencer have a high error rate (about 25% [1]), which degrades the quality of the results obtained by analyzing these long reads. One solution for correcting long reads is to use the high coverage and high quality of the short reads generated by second-generation sequencing technology. Here, we present MiRCA (MinIon Reads Correction Algorithm), a hybrid pipeline that detects and corrects errors in MinION long reads using preassembled Illumina MiSeq short reads, and we use the Overlap-Layout-Consensus (OLC) approach to assemble the corrected reads. MiRCA is able to correct deletion, insertion and substitution errors by forming a multiple sequence alignment, and it does not require a large amount of memory. We use the Saccharomyces cerevisiae W303 genome and the Escherichia coli K-12 MG1655 bacterial genome to test the efficiency of our pipeline.
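To illustrate how a multiple sequence alignment can expose deletion, insertion and substitution errors, the sketch below takes a column-wise majority vote over already-aligned rows. It is a generic illustration under stated assumptions, not MiRCA's actual correction procedure: the rows are assumed to be pre-aligned strings of equal length with '-' marking gaps, and the function name and the example reads are hypothetical.

```python
from collections import Counter


def consensus_from_alignment(rows):
    """Majority-vote consensus over pre-aligned rows ('-' marks a gap)."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*rows):
        # Most frequent symbol in this alignment column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A majority gap means the column is treated as an insertion and dropped.
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical example: one noisy long read (first row) with a substitution
# and a deletion, aligned against two agreeing short reads.
aligned = ["ACCT-GA", "ACGTTGA", "ACGTTGA"]
print(consensus_from_alignment(aligned))
```

A plain majority vote treats every row equally. A real pipeline would more plausibly weight the higher-quality short reads above the long read, which this sketch does not attempt.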
Biological Knowledge Discovery Handbook, 2013
Abbass, J., Nebel, J.-C. and Mansour, N. (2013) Ab initio protein structure prediction: methods and challenges. In: Elloumi, M. and Zomaya, A. Y. (eds.) Biological knowledge discovery handbook: preprocessing, mining and postprocessing of biological data. New Jersey, US: Wiley-Blackwell. (Computational techniques and engineering) ISBN 9781118132739 (In Press)