Bruno Bachimont | Université de Technologie de Compiègne
Papers by Bruno Bachimont
Methods of Information in Medicine, 1998
Although medical language processing (MLP) has achieved some success, the actual use and dissemination of data extracted from free text by MLP systems is still very limited. We claim that the adoption of an ‘enriched document’ paradigm (or ‘document-centered’ view) can help to address this issue. We present this paradigm and explain how it can be implemented, then discuss its expected benefits both for end-users and MLP researchers.
Direction de la Recherche et de l’Expérimentation, Institut National de l’Audiovisuel, 4, Avenue de l’Europe
Digital technology reconfigures the organization and status of archives. Immersed in the eternal present of the technological youth required for their consultation, digital archives potentially no longer bear the marks of time, even though they show the past. They gain a new appetence, based on the communication uses of the moment. But how, then, can we give them their sense of archive and restore their own temporality? The challenge is to allow what we call 'historical empathy' without falling into psychological anachronism. We argue here that the mediatization of digital audiovisual archives must allow us to feel concerned, with the concessions no doubt necessary to the technology and aesthetics of the moment, while perceiving the strangeness of the contents and the definitively bygone aspect of this past. A particular critical hermeneutic must therefore be built, in which mediation shows a past that technology displays in a permanent and persistent contemporaneity.
Over recent centuries, the preservation and transmission of musical works rested on a stable, three-part cultural substrate: the score, an organology classifying the instruments, and an instrument-making craft able to build them. The emergence of electronic and digital systems has upended the world of composition, destabilizing its centuries-old traditions and endangering the preservation of contemporary music. Given the need to update works to cope with obsolescence, we consider it essential to study composition processes and to extract from them the relevant information needed to recover some of the composer's original intentions. Our approach is based on modeling production processes and creating a language able to represent these composition practices. This language will form the basis of an environment for capturing and navigating within a ...
Methods of Information in Medicine, 1995
Medical natural language understanding basically aims at representing the contents of medical texts in a formal, conceptual representation. The understanding process itself increasingly relies on a body of domain knowledge, generally expressed in the same conceptual formalism. The design of such a conceptual representation is a key knowledge acquisition issue. When representing knowledge, the most important point is to ensure that the formal exploitation of the knowledge representation conforms to its meaning in the domain. In this paper, we examine some methodological and theoretical principles to enforce this conformity. These principles result from our experience in Menelas, a medical language understanding project.
Intellectica. Revue de l'Association pour la Recherche Cognitive, 2010
La présence de l'archive : réinventer et justifier. Bruno Bachimont. Abstract. Do we have memory because we keep recollections intact, or do we have recollections because we constantly exercise our memory, reactivating and reinventing them? Fidelity to the past then rests either on the integrity of recollections or on the quality of the exercise of memory. This old question has become a burning issue in the new context created by digital technologies: on the one hand, a hubris of digitization that leads to preserving and storing everything; on the other, a digital practice in which every piece of content is reinvented as soon as it is consulted. This article returns to the conditions of possibility of memory and favors a dynamic conception in which memory is a permanent exercise of seizing objects to turn them into traces, thereby reinventing the past through their testimony. What is at stake, then, is the argued and reasoned critique of their reinvention.
Proceedings of the 16th conference on Computational linguistics, 1996
We address here the treatment of metonymic expressions from a knowledge representation perspective, that is, in the context of a text understanding system which aims to build a conceptual representation from texts according to a domain model expressed in a knowledge representation formalism. We focus in this paper on the part of the semantic analyser which deals with semantic composition. We explain how we use the domain model to handle metonymy dynamically, and more generally, to underlie semantic composition, using the knowledge descriptions attached to each concept of our ontology as a kind of concept-level, multiple-role qualia structure. We rely for this on a heuristic path search algorithm that exploits the graphic aspects of the conceptual graphs formalism. The methods described have been implemented and applied on French texts in the medical domain.
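The idea of resolving a metonymy by searching for a path in the domain model can be illustrated with a small sketch. This is not the authors' algorithm: it substitutes a plain uniform-cost search for their heuristic, and the concepts, relation names, and costs in `DOMAIN_MODEL` are hypothetical, chosen only to show how an expression such as "angioplasty of the artery" might be expanded into a chain of explicit relations.

```python
import heapq

# Hypothetical toy domain model: concept -> [(relation, neighbour, cost), ...].
# Concepts, relations, and costs are illustrative assumptions, not Menelas data.
DOMAIN_MODEL = {
    "Angioplasty": [("acts_on", "Stenosis", 1), ("performed_on", "Patient", 3)],
    "Stenosis": [("located_in", "Artery", 1)],
    "Artery": [("part_of", "Heart", 1), ("belongs_to", "Patient", 2)],
    "Heart": [("belongs_to", "Patient", 1)],
    "Patient": [],
}


def cheapest_path(model, source, target):
    """Uniform-cost search over the model.

    Returns (total_cost, [(relation, concept), ...]) for the cheapest chain of
    relations linking source to target, or None when no chain exists.
    """
    frontier = [(0, source, [])]  # priority queue ordered by accumulated cost
    settled = {}                  # concept -> cheapest cost already expanded
    while frontier:
        cost, concept, path = heapq.heappop(frontier)
        if concept == target:
            return cost, path
        if concept in settled and settled[concept] <= cost:
            continue
        settled[concept] = cost
        for relation, neighbour, step in model.get(concept, []):
            heapq.heappush(
                frontier, (cost + step, neighbour, path + [(relation, neighbour)])
            )
    return None


result = cheapest_path(DOMAIN_MODEL, "Angioplasty", "Artery")
```

In this toy model the search links "Angioplasty" to "Artery" through the intermediate concept "Stenosis", which is the kind of implicit step a metonymic expression leaves out.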
Journées Francophones d'Ingénierie des Connaissances, 2007
Content circulating on networks is regulated by law and by technical management systems called DRM (Digital Rights Management). These systems include a semantic component for describing content usage licenses: the REL (Rights Expression Language). Each DRM system has its own rights expression language (ODRL for OMA's DRM, XrML for Windows...). Because DRM systems are increasingly used to distribute content on the Internet, content holders are forced to declare as many licenses as there are DRM systems in use. Indeed, a rights holder who wants to offer content to the public on both PC and mobile phone must pay the content distributor to generate two distinct licenses (in ODRL for mobile phones and in XrML for PCs). Hence the need to define a "generic" representation of licenses. In this article we present our "express once, translate many" approach, which allows licenses to be expressed generically using an REL based on an ontology of licenses documented by the rules of copyright law (droit d'auteur).
Lecture Notes in Computer Science, 2005
Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 1998
The patient record is a repository for knowledge about a patient. Work in Artificial Intelligence and knowledge representation has shown the intrinsic difficulty of formalizing knowledge for computer processing. It is therefore not a surprise that most attempts at computerizing the patient record have only had a limited degree of success or applicability. We claim that this is due to the fact that medicine is an empirical domain, and thus fundamentally resists formalization. Therefore, the only way medical knowledge can be fully expressed is through natural language, which is indeed what clinicians actually use. We proposed and designed an electronic medical record which adheres to this hypothesis and where structured documents play a prominent role.
Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium, 1997
The Menelas project aimed to produce a normalized conceptual representation from natural language patient discharge summaries. Because of the complex and detailed nature of conceptual representations, evaluating the quality of output of such a system is difficult. We present the method designed to measure the quality of Menelas output, and its application to the state of the French Menelas prototype as of the end of the project. We examine this method in the framework recently proposed by Friedman and Hripcsak. We also propose two conditions which make it possible to reduce the evaluation preparation workload.
Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care, 1995
The overall goal of MENELAS is to provide better access to the information contained in natural language patient discharge summaries (PDSs), through the design and implementation of a prototype able to analyse medical texts. The approach taken by MENELAS is based on the following key principles: (i) to maximise the usefulness of natural language analysis and the usability of its results, the output of natural language analysis must be a normalised conceptual representation of medical information; and (ii) to maximise the reuse of resources, language analysis should be domain-independent and conceptual representation should be language-independent. This paper discusses the results obtained and the issues raised when implementing these principles during the project.
Proceedings of the 11th international conference on Artificial intelligence and law - ICAIL '07, 2007
Digital content distributed over the internet is regulated by law and by technical management systems. The latter include a semantic component that describes licenses, i.e. rights of use which are granted to the user. These elements of Digital Rights Management (DRM) systems are called Rights Expression Languages (REL); they gather the terms and relations needed to build licenses. Some are based on an ontology of online licenses, not necessarily related to applicable law and various legal systems, and cannot interoperate. As a consequence, there is a need for a more generic way to express licenses. Here, generic means that rightholders should only need to express the license they need once, and semi-automatic tools should then translate this license so it can be browsed by any specific system. This requires modeling concept semantics so that a license expressed in generic terms can be translated into more specific terms that comply with the standards used by distribution systems. This work comes as part of larger studies on legal ontologies, legal systems and RELs.
Proceedings of the 2007 international workshop on Semantically aware document processing and indexing - SADPI '07, 2007
Documentaliste-Sciences de l'Information, 2011
Terminology, 2005
In this paper, we present an experiment dealing with corpus-based construction of “differential ontologies”, which are organised according to semantic similarity and differential features. We argue that knowledge-rich defining contexts can be useful to help an ontology modeller in his task. We present a method, based on lexico-syntactic patterns, to spot such contexts in a corpus, then identify the terms they relate (definiendum and genus or “characteristics”) and the semantic relation that links them. We also show how potential co-hyponyms can be detected on the basis of shared words in their definiens. We evaluate the extracted defining sentences, semantic relations and co-hyponyms on a test corpus focusing on childhood and on an evaluation corpus about dietetics (both corpora are French). Definition extraction obtains 50% precision and recall of approximately 40%. Semantic relation identification reaches an average of 48% precision, and co-hyponyms 23.5%. We discuss the results o...
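As a rough illustration of spotting a defining context with a lexico-syntactic pattern and scoring the output, here is a minimal sketch. The single English pattern in `DEFINITION` is a hypothetical stand-in (the paper's patterns target French corpora and are not reproduced here), and `extract` and `precision_recall` are illustrative helper names, not part of the authors' tooling.

```python
import re

# Hypothetical pattern: "<definiendum> is a/an <genus> that|which ..."
# It only captures a single-word genus, which is enough for the illustration.
DEFINITION = re.compile(
    r"^(?:An?\s+|The\s+)?(?P<definiendum>[\w -]+?)\s+is\s+an?\s+"
    r"(?P<genus>[\w-]+)\s+(?:that|which)\b",
    re.IGNORECASE,
)


def extract(sentence):
    """Return (definiendum, genus) if the sentence fits the pattern, else None."""
    match = DEFINITION.match(sentence)
    if match is None:
        return None
    return match.group("definiendum").lower(), match.group("genus").lower()


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


pair = extract("A vitamin is a nutrient that the body needs in small amounts.")
```

Precision is the share of extracted pairs that are correct, and recall is the share of gold-standard pairs that were found, which is how figures such as those reported in the abstract are usually computed.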
Actes de TALN, 2004
In order to build an ontology, a modeler needs to objectify semantic information about the main terms of his domain. Corpus exploration tools can help locate this kind of information, and previous work has focused on the identification of hypernym pairs. We propose here to rely on defining statements to extract from a corpus the information needed for the three axes of an ontology's backbone: the vertical axis, related to hypernymy; the horizontal axis, related to co-hyponymy; and the transversal axis, linked to domain-specific relations. After a survey of previous work on the extraction of defining statements in NLP, we describe the method we developed, then present its evaluation and the first results. The extraction of defining statements reaches 10% to 69% precision depending on the pattern, and that of lexical items ranges from 31% to 56% depending on the reference adopted.