Lina Soualmia - Academia.edu

Papers by Lina Soualmia

CIM-IND : Un système multilingue pour l’extraction d’information dans les textes cliniques (CIM-IND: A multilingual system for information extraction from clinical texts)

HAL (Le Centre pour la Communication Scientifique Directe), Nov 22, 2017

Formalisation de la terminologie LOINC et évaluation de ses avantages pour la classification des tests de laboratoire (Formalization of the LOINC terminology and evaluation of its benefits for classifying laboratory tests)

HAL (Le Centre pour la Communication Scientifique Directe), Jul 3, 2017

79,000 clinical and laboratory tests. Each test has 6 dimensions (plus 4 optional ones). Tests are defined textually. The tests have no hierarchical organization.

Exploitation de documents médicaux par les techniques d’embedding : application au typage automatique de documents (Exploiting medical documents with embedding techniques: application to automatic document typing)

HAL (Le Centre pour la Communication Scientifique Directe), Jun 29, 2020

Combining DEVS simulation and ontological modeling for hierarchical analysis of the SARS-CoV-2 replication

Simulation, Jun 19, 2023

This article presents a hybrid and hierarchical model in which two modeling and simulation approaches, discrete event system specification (DEVS) simulation and semantic technologies, are used together to help analyze a major healthcare problem, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The complexity of the SARS-CoV-2 replication process, the range of hierarchical scales over which it interacts with cellular components (extending from genomic and transcriptomic to proteomic and metabolomic scales), and the intricate way in which these scales are interwoven make it very challenging to understand. It is therefore crucial to model the different scales of the replication process, taking into account all interactions with the infected cell. By combining the advantages of DEVS simulation and ontological modeling, we propose a hierarchical ontology-based DEVS simulation model of SARS-CoV-2 viral replication at both the micro-molecular (proteomic and metabolomic) and macro-molecular (genomic and transcriptomic) scales. First, we demonstrate the usefulness of combining DEVS simulation and semantic technologies in a common modeling framework to address the complexity of SARS-CoV-2 viral replication at different scales. Second, modeling and simulating the SARS-CoV-2 replication process at different levels provides valuable information on the different stages of the virus's life cycle and lays the foundation for a system to anticipate future mutations selected by the virus.

Inclusive Digital Health

Yearbook of Medical Informatics, Aug 1, 2022

Objectives: To introduce the 2022 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction to and overview of the 2022 IMIA Yearbook, whose special topic is "Inclusive Digital Health: Addressing Equity, Literacy, and Bias for Resilient Health Systems". The special topic, survey papers, section editor synopses, and some best papers are discussed. The sections' changes in the Yearbook Editorial Committee are also described. Results: As shown in the previous edition, health informatics in the context of a global pandemic has led to the development of ways to collect, standardize, disseminate, and reuse data worldwide. The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the need for timely, reliable, open, and globally available information to support decision making. It has also highlighted the need to address social inequities and disparities in access to care across communities. This edition of the Yearbook acknowledges that much work has been done in recent years to study health equity in the various fields of health informatics research. Conclusion: There is a strong desire to better consider disparities between populations, in particular to avoid inducing biases in Artificial Intelligence algorithms. Telemedicine and m-health must be more inclusive of people with disabilities or living in isolated geographical areas.

OntoBioStat: Supporting Causal Diagram Design and Analysis

IOS Press eBooks, May 25, 2022

Suitable causal inference in biostatistics is best achieved through knowledge representation with causal diagrams or directed acyclic graphs. However, necessary and sufficient causes are not easily represented. Since existing ontologies do not fill this gap, we designed OntoBioStat to support covariate selection based on causal relation representations. This study details OntoBioStat's automatic construction of ontological causal diagrams and its inferences, which are enabled by Semantic Web Rule Language rules and axioms. First, users make statements that specify the outcome, exposure, covariates, and causal relations. Then, reasoning enables automatic construction using generic instances of the Meta_Variable and Necessary_Variable classes. Finally, inferred classes highlight potential biases, such as confounder-like variables. An ontological causal diagram built with OntoBioStat was compared with a standard causal diagram (built without OntoBioStat) in a theoretical study. Confounding and bias were not completely identified by the standard causal diagram, and it provided erroneous covariate sets. Further research is needed to make OntoBioStat more usable.
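
As a rough illustration of what flagging a "confounder-like" variable in a causal diagram can mean, the following Python sketch marks any variable that has a directed path to the exposure and a directed path to the outcome that does not pass through the exposure. This is a simplified graph criterion chosen for illustration only; it is not OntoBioStat's Semantic Web Rule Language rule set, and the function names and the example graph are hypothetical.

```python
def descendants(graph, node, blocked=frozenset()):
    """Return all nodes reachable from `node` via directed edges,
    never entering any node listed in `blocked`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def confounder_like(graph, exposure, outcome):
    """Variables that cause the exposure and also cause the outcome
    through at least one path that avoids the exposure (simplified criterion)."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    flagged = set()
    for variable in nodes - {exposure, outcome}:
        causes_exposure = exposure in descendants(graph, variable)
        causes_outcome_otherwise = outcome in descendants(
            graph, variable, blocked=frozenset({exposure})
        )
        if causes_exposure and causes_outcome_otherwise:
            flagged.add(variable)
    return flagged


# Hypothetical diagram: edges point from cause to effect.
example_graph = {
    "age": ["smoking", "cancer"],
    "gene": ["smoking"],
    "smoking": ["cancer"],
}
flagged = confounder_like(example_graph, exposure="smoking", outcome="cancer")
```

In this toy diagram, "age" is flagged because it affects both the exposure and the outcome, while "gene" is not, since it reaches the outcome only through the exposure.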

Ontological Models Supporting Covariates Selection in Observational Studies

IOS Press eBooks, May 27, 2021

In the context of causal inference, biostatisticians use causal diagrams to select covariates in order to build multivariate models. These diagrams represent dataset variables and their relations but have some limitations (representing interactions, bidirectional causal relations). The MetBrAYN project aims to build an ontology-based process to tackle these issues. The knowledge acquired by the biostatistician during a methodological consultation for a research question will be represented in a general ontology. To aggregate various forms of knowledge, the ontology will act as a wrapper. Ontology-based causal diagrams will be built semiautomatically. Founded on inference rules, the overall system will help biostatisticians to curate it and to visualize recommended covariates for their research question.

Ontological Representation of Causal Relations for a Deep Understanding of Associations Between Variables in Epidemiology

NoSQL technology in order to support Semantic Health Search Engine

HAL (Le Centre pour la Communication Scientifique Directe), Apr 24, 2018

Une technologie NoSQL au service de moteur de recherche en santé (A NoSQL technology in support of health search engines)

HAL (Le Centre pour la Communication Scientifique Directe), Nov 22, 2017

Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study

JMIR Medical Informatics, Dec 20, 2019

Background: The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information. Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context. Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities able to express data requirements in terms of the whole set of interconnected conceptual entities that compose health information. Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine-related or data-related limitations that could explain the results for each criterion were also observed. Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It makes the information retrieval process more generic and allows the semantic description of health information to be fully exploited. Despite their semantic annotation, clinical narratives remained the major search challenge for the system. A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system, combined with its generic entity-centered vision, enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of those unstructured data.

Assisting Data Retrieval with a Drug Knowledge Graph

IOS Press eBooks, Jan 14, 2022

The Normandy health data warehouse EDSaN integrates medication orders from the University Hospital of Rouen (France). This study describes the design and evaluation of an information retrieval system founded on a complex, semantically augmented knowledge graph dedicated to EDSaN drug prescriptions. The system is intended to help health professionals select drugs during the search process. A manual evaluation of the relevance of the returned drugs showed encouraging results, as expected. A deeper analysis is needed to improve the ranking method and will be performed in future work.

The MeSH-Gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for Semantic Similarity

HAL (Le Centre pour la Communication Scientifique Directe), Aug 21, 2019

Eliciting semantic similarity between concepts remains a challenging task. Recent approaches founded on embedding vectors have gained popularity because they capture semantic relationships efficiently. The underlying idea is that two words with close meanings occur in similar contexts. In this study, we propose a new neural network model named "MeSH-gram", which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead of words. Trained on the publicly available PubMed/MEDLINE corpus, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and several context window sizes. A deeper comparison is then performed with twenty existing models. All the Spearman's rank correlations obtained between human scores and computed similarities show that MeSH-gram (i) outperforms the skip-gram model and (ii) is comparable to the best methods, which, however, need more computation and external resources.
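
To make the skip-gram idea concrete, the Python sketch below generates (target, context) training pairs from a sequence of descriptors instead of a sequence of words. It assumes, purely for illustration, that the descriptors attached to one citation are treated like the tokens of a sentence; the abstract does not state how MeSH-gram defines contexts, and the function name and the example descriptors are hypothetical.

```python
def skipgram_pairs(tokens: list[str], window: int) -> list[tuple[str, str]]:
    """Return every (target, context) pair in which the context token
    lies at most `window` positions away from the target token."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical descriptor list for a single citation.
descriptors = ["Humans", "Diabetes Mellitus", "Insulin"]
pairs = skipgram_pairs(descriptors, window=1)
```

A skip-gram model is then trained to predict the context element of each pair from its target element; widening `window` yields more pairs per citation.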

Cimind: A phonetic-based tool for multilingual named entity recognition in biomedical texts

Journal of Biomedical Informatics, Jun 1, 2019

Background. Extracting concepts from biomedical texts is key to supporting many advanced applications such as biomedical information retrieval. However, in clinical notes, Named Entity Recognition (NER) has to deal with various types of errors such as spelling errors, grammatical errors, truncated sentences, and non-standard abbreviations. Moreover, in numerous countries, NER is challenged by the fact that many of the available resources were originally developed for, and are only suitable for, English texts. This paper presents the Cimind system, a multilingual system dedicated to named entity recognition in medical texts based on a phonetic similarity measure. Methods. Cimind performs entity recognition by combining phonetic recognition, using the DM phonetic algorithm to deal with spelling errors, with string similarity measures. Terms in a controlled vocabulary are identified in three main steps: normalization, candidate selection by phonetic similarity, and candidate ranking. Results. Cimind was evaluated in the 2016 and 2017 editions of the CLEF eHealth challenge in the CépiDC/CDC tasks. In 2017, it obtained on each corpus the following results:
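
The three steps named in the Methods (normalization, candidate selection by phonetic similarity, candidate ranking) can be sketched in Python as below. This is only an illustration under stated assumptions: the toy `phonetic_key` function is a stand-in and is not the DM phonetic algorithm used by Cimind, ranking uses the standard library's `difflib.SequenceMatcher` as one possible string similarity measure, and all function names and the example vocabulary are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Step 1: lowercase, strip accents, replace punctuation with spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return cleaned.strip()


def phonetic_key(text: str) -> str:
    """Toy phonetic key (assumption, NOT the DM algorithm): keep each word's
    first letter, drop later vowels and 'h', collapse repeated letters."""
    keys = []
    for word in normalize(text).split():
        key = word[0]
        for c in word[1:]:
            if c in "aeiouyh":
                continue
            if c != key[-1]:
                key += c
        keys.append(key)
    return " ".join(keys)


def match_terms(mention: str, vocabulary: list[str]) -> list[str]:
    """Step 2: keep vocabulary terms sharing the mention's phonetic key.
    Step 3: rank those candidates by string similarity to the mention."""
    key = phonetic_key(mention)
    norm = normalize(mention)
    candidates = [term for term in vocabulary if phonetic_key(term) == key]
    return sorted(
        candidates,
        key=lambda term: SequenceMatcher(None, norm, normalize(term)).ratio(),
        reverse=True,
    )


# Hypothetical controlled vocabulary mixing French and English terms.
vocabulary = ["diabète", "diabetes", "pneumonia", "pneumonie"]
ranked = match_terms("pnemonia", vocabulary)
```

Here the misspelled mention "pnemonia" shares a phonetic key with both "pneumonia" and "pneumonie", and the string similarity step ranks "pneumonia" first.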

OMIC-Onto : une Ressource pour l’Indexation et la Recherche d’Outils Omiques (OMIC-Onto: a resource for indexing and retrieving omics tools)

HAL (Le Centre pour la Communication Scientifique Directe), Jul 6, 2015

Tracing and analyzing COVID-19 dissemination using knowledge graphs

Procedia Computer Science, 2022

Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study (Preprint)

Intégration de données cliniques et omiques pour la recherche d'information dans le Dossier Patient Informatisé (Integrating clinical and omics data for information retrieval in the electronic health record)

HAL (Le Centre pour la Communication Scientifique Directe), Jul 1, 2015

In this article we describe the generic data model Information Retrieval for Omic and Clinical Sciences (IROmiCS), which we propose for managing the main types of omics data (expression data, DNA methylation data, and genomic variants). We also describe the query language we developed, which is based on the IROmiCS model and dedicated to querying clinical and omics data. To validate the proposed data model and its associated query language, experimental omics data were integrated into the model, along with reference data such as the NCBI Gene database, UniProt/Swiss-Prot, and the Gene Ontology. Several types of queries targeting clinical and omics data were run on the integrated data. A graphical interface helps clinicians and researchers visualize the integrated data. The search tool was able to process symbolic, textual, numerical, and chronological data.

Multi-terminology cross-lingual modeling in the Health Terminology/Ontology Portal

AMIA, 2012

The Health Terminology/Ontology Portal (HeTOP) is a repository dedicated to health professionals and students. It provides access to 32 health terminologies (including MeSH, ICD-10, etc.) available in 23 different languages (English, French, German, Russian, etc.). Several methods and technologies have been developed to create this multi-terminology server, which serves both human users and computers. HeTOP is a valuable tool for indexing, as well as for teaching and for performing audits in terminology management.

Évaluation de la Qualité des Liens Sémantiques entre Vocabulaires Contrôlés (Evaluating the quality of semantic links between controlled vocabularies)

Research paper thumbnail of CIM-IND : Un système multilingue pour l’extraction d’information dans les textes cliniques

HAL (Le Centre pour la Communication Scientifique Directe), Nov 22, 2017

International audienc

Research paper thumbnail of Formalisation de la terminologie LOINC et évaluation de ses avantages pour la classification des tests de laboratoire

HAL (Le Centre pour la Communication Scientifique Directe), Jul 3, 2017

79 000 tests cliniques et de laboratoire 1 test = 6 dimensions [+4 optionnelles] Définition textu... more 79 000 tests cliniques et de laboratoire 1 test = 6 dimensions [+4 optionnelles] Définition textuelle des tests Absence d'organisation hiérarchique des tests 5

Research paper thumbnail of Exploitation de documents médicaux par les techniques d’embedding : application au typage automatique de documents

HAL (Le Centre pour la Communication Scientifique Directe), Jun 29, 2020

Research paper thumbnail of Combining DEVS simulation and ontological modeling for hierarchical analysis of the SARS-CoV-2 replication

Simulation, Jun 19, 2023

This article presents an hybrid and hierarchical model in which two modeling and simulation appro... more This article presents an hybrid and hierarchical model in which two modeling and simulation approaches, discrete event system specification simulation (DEVS) and semantic technologies, were used together in order to help in the analysis of a major healthcare problem, the severe acute respiratory syndrome-coronavirus 2 (SARS-CoV-2). Indeed, the complexity of the SARS-CoV-2 replication process, and the range of hierarchical scales over which it interacts with cellular components (extending from genomic and transcriptomic to proteomic and metabolomic scales), and the intricate way in which they are interwoven, make its understanding very challenging. It is therefore crucial to model the different scales of the replication process, by taking into account all interactions with the infected cell. By combining the advantages of both DEVS simulation and ontological modeling, we propose a hierarchical ontology-based DEVS simulation model of the SARS-CoV-2 viral replication at both the micro-molecular (proteomic and metabolomic) and macro-molecular (genomic and transcriptomic) scales. First, we demonstrate the usefulness of combining DEVS simulation and semantic technologies in a common modeling framework to face the complexity of the SARS-CoV-2 viral replication at different scales. Second, the modeling and simulation of the SARS-CoV-2 replication process on different levels provide valuable information on the different stages of the virus’s life cycle and lays the foundation for a system to anticipate future mutations selected by the virus.

Research paper thumbnail of Inclusive Digital Health

Yearbook of medical informatics, Aug 1, 2022

Objectives: To introduce the 2022 International Medical Informatics Association (IMIA) Yearbook b... more Objectives: To introduce the 2022 International Medical Informatics Association (IMIA) Yearbook by the editors. Methods: The editorial provides an introduction and overview to the 2022 IMIA Yearbook whose special topic is "Inclusive Digital Health: Addressing Equity, Literacy, and Bias for Resilient Health Systems". The special topic, survey papers, section editor synopses and some best papers are discussed. The sections' changes in the Yearbook Editorial Committee are also described. Results: As shown in the previous edition, health informatics in the context of a global pandemic has led to the development of ways to collect, standardize, disseminate and reuse data worldwide. The Corona Virus Disease 2019 (COVID-19) pandemic has demonstrated the need for timely, reliable, open, and globally available information to support decision making. It has also highlighted the need to address social inequities and disparities in access to care across communities. This edition of the Yearbook acknowledges the fact that much work has been done to study health equity in recent years in the various fields of health informatics research. Conclusion: There is a strong desire to better consider disparities between populations to avoid biases being induced in Artificial Intelligence algorithms in particular. Telemedicine and m-health must be more inclusive for people with disabilities or living in isolated geographical areas.

Research paper thumbnail of OntoBioStat: Supporting Causal Diagram Design and Analysis

IOS Press eBooks, May 25, 2022

Suitable causal inference in biostatistics can be best achieved by knowledge representation thank... more Suitable causal inference in biostatistics can be best achieved by knowledge representation thanks to causal diagrams or directed acyclic graphs. However, necessary and sufficient causes are not easily represented. Since existing ontologies do not fill this gap, we designed OntoBioStat in order to enable covariate selection support based on causal relation representations. OntoBioStat automatic ontological causal diagram construction and inferences are detailed in this study. OntoBioStat inferences are allowed by Semantic Web Rule Language rules and axioms. First, statements made by the users include outcome, exposure, covariate, and causal relation specification. Then, reasoning enable automatic construction using generic instances of Meta_Variable and Necessary_Variable classes. Finally, inferred classes highlighted potential bias such as confounder-like. Ontological causal diagram built with OntoBioStat was compared to a standard causal diagram (without OntoBioStat) in a theoretical study. It was found that confounding and bias were not completely identified by the standard causal diagram, and erroneous covariate sets were provided. Further research is needed in order to make OntoBioStat more usable.

Research paper thumbnail of Ontological Models Supporting Covariates Selection in Observational Studies

IOS Press eBooks, May 27, 2021

In the context of causal inference, biostatisticians use causal diagrams to select covariates in ... more In the context of causal inference, biostatisticians use causal diagrams to select covariates in order to build multivariate models. These diagrams represent datasets variables and their relations but have some limitations (representing interactions, bidirectional causal relations). The MetBrAYN project aims at building an ontological-based process to tackle these issues. The knowledge acquired by the biostatistician during a methodological consultation for a research question will be represented in a general ontology. In order to aggregate various forms of knowledge the ontology will act as a wrapper. Ontology-based causal diagrams will be semiautomatically built. Founded on inference rules, the global system will help biostatisticians to curate it and to visualize recommended covariates for their research question.

Research paper thumbnail of Ontological Representation of Causal Relations for a Deep Understanding of Associations Between Variables in Epidemiology

Research paper thumbnail of NoSQL technology in order to support Semantic Health Search Engine

HAL (Le Centre pour la Communication Scientifique Directe), Apr 24, 2018

Research paper thumbnail of Une technologie NoSQL au service de moteur de recherche en santé

HAL (Le Centre pour la Communication Scientifique Directe), Nov 22, 2017

Research paper thumbnail of Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study

JMIR medical informatics, Dec 20, 2019

Background: The huge amount of clinical, administrative, and demographic data recorded and mainta... more Background: The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information. Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context. Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performances. The system adopts an entity-centered vision that provides generic search capabilities able to express data requirements in terms of the whole set of interconnected conceptual entities that compose health information. Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine-related or data-related limitations that could explain the results for each criterion were also observed. Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It enables more genericity in the information retrieval process. It also allows to fully exploit the semantic description of health information. 
Despite their semantic annotation, searching within clinical narratives remained the major challenge of the system. A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system combined with its generic entity-centered vision enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of those unstructured data.

Research paper thumbnail of Assisting Data Retrieval with a Drug Knowledge Graph

IOS Press eBooks, Jan 14, 2022

The Normandy health data warehouse EDSaN integrates the medication orders from the University Hos... more The Normandy health data warehouse EDSaN integrates the medication orders from the University Hospital of Rouen (France). This study aims at describing the design and the evaluation of an information retrieval system founded on a complex and semantically augmented knowledge graph dedicated to EDSaN drugs' prescriptions. The system is intended to help the selection of drugs in the search process by health professionals. The manual evaluation of the relevance of the returned drugs showed encouraging results as expected. A deeper analysis in order to improve the ranking method is needed and will be performed in a future work.

Research paper thumbnail of The MeSH-Gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for Semantic Similarity

HAL (Le Centre pour la Communication Scientifique Directe), Aug 21, 2019

Eliciting semantic similarity between concepts remains a challenging task. Recent approaches foun... more Eliciting semantic similarity between concepts remains a challenging task. Recent approaches founded on embedding vectors have gained in popularity as they risen to efficiently capture semantic relationships. The underlying idea is that two words that have close meaning gather similar contexts. In this study, we propose a new neural network model named "MeSH-gram" which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead words. Trained on publicly available PubMed/MEDLINE corpus, MesSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and at several windows' contexts. A deeper comparison is performed with twenty existing models. All the obtained results of Spearman's rank correlations between human scores and computed similarities show that MeSH-gram (i) outperforms the skip-gram model, and (ii) is comparable to the best methods but that need more computation and external resources.

Research paper thumbnail of Cimind: A phonetic-based tool for multilingual named entity recognition in biomedical texts

Journal of Biomedical Informatics, Jun 1, 2019

Background. Extracting concepts from biomedical texts is a key to support many advanced applicati... more Background. Extracting concepts from biomedical texts is a key to support many advanced applications such as biomedical information retrieval. However, in clinical notes Named Entity Recognition (NER) has to deal with various types of errors such as spelling errors, grammatical errors, truncated sentences, and non-standard abbreviations. Moreover, in numerous countries, NER is challenged by the availability of many resources originally developed and only suitable for English texts. This paper presents the Cimind system, a multilingual system dedicated to named entity recognition in medical texts based on a phonetic similarity measure. Methods. Cimind performs entity recognition by combining phonetic recognition using the DM phonetic algorithm to deal with spelling errors and string similarity measures. Three main steps are processed to identify terms in a controlled vocabulary: normalization, candidate selection by phonetic similarity and candidate ranking. Results. Cimind was evaluated in the 2016 and 2017 editions of the CLEF eHealth challenge in the CépiDC/CDC tasks. In 2017, it obtained on each corpus the following results:

OMIC-Onto : une Ressource pour l’Indexation et la Recherche d’Outils Omiques

HAL (Le Centre pour la Communication Scientifique Directe), Jul 6, 2015

Tracing and analyzing COVID-19 dissemination using knowledge graphs

Procedia Computer Science, 2022

Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study (Preprint)

Background: The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information. Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context. Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities able to express data requirements in terms of the whole set of interconnected conceptual entities that compose health information. Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine-related or data-related limitations that could explain the results for each criterion were also observed. Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It makes the information retrieval process more generic. It also allows the semantic description of health information to be fully exploited.
Despite their semantic annotation, searching within clinical narratives remained the major challenge of the system. A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system, combined with its generic entity-centered vision, enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of those unstructured data.

Intégration de données cliniques et omiques pour la recherche d'information dans le Dossier Patient Informatisé

HAL (Le Centre pour la Communication Scientifique Directe), Jul 1, 2015

In this article, we describe the generic data model Information Retrieval for Omic and Clinical Sciences (IROmiCS), which we propose for managing the main types of omic data (expression data, DNA methylation data, and genomic variants). We also describe the query language we developed, which is based on the IROmiCS model and dedicated to querying clinical and omic data. To validate the proposed data model and its associated query language, experimental omic data were integrated into the model, along with reference data such as the NCBI Gene database, Uniprot/Swissprot, and the Gene Ontology. Several types of queries targeting clinical and omic data were run on the integrated data. A graphical interface makes it easier for clinicians and researchers to visualize the integrated data. The search tool was able to handle symbolic, textual, numerical, and chronological data.

Multi-terminology cross-lingual modeling in the Health Terminology/Ontology Portal

AMIA, 2012

The Health Terminology/Ontology Portal (HeTOP) is a repository dedicated to health professionals and students. It provides access to 32 health terminologies (including MeSH, ICD-10, etc.) available in 23 different languages (English, French, German, Russian, etc.). Several methods and technologies have been developed to create this multi-terminology server, dedicated to both users and computers. HeTOP is a valuable tool to help in indexing, as well as for teaching and performing audits in terminology management.

Évaluation de la Qualité des Liens Sémantiques entre Vocabulaires Contrôlés