M Musen | Stanford University
Papers by M Musen
Lecture Notes in Computer Science, 2010
As knowledge bases move into the landscape of larger ontologies with terabytes of related data, we must work on optimizing the performance of our tools. It is tempting to buy bigger machines, or to fill rooms with armies of smaller ones, to address the scalability problem. Yet careful analysis and evaluation of the characteristics of our data, using metrics, often leads to dramatic improvements in performance. First, are current scalable systems scalable enough? We found that for large or deep ontologies (some with as many as 500,000 classes) it is hard to say, because benchmarks obscure the load-time costs of materialization. To expose those costs, we therefore synthesized a set of more representative ontologies. Second, in designing for scalability, how do we manage knowledge over time? By optimizing for data distribution and ontology evolution, we reduced the population time, including materialization, for the NCBO Resource Index (a knowledge base of 16.4 billion annotations linking 2.4 million terms from 200 ontologies to 3.5 million data elements) from one week to less than one hour for one of the large datasets, on the same machine.
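A small sketch can show why materialization can dominate load time for deep ontologies. For illustration only, it assumes that materialization means expanding every direct annotation of a data element with a term into additional rows, one for each of that term's is-a ancestors. That expansion lets a query on a general class also find elements annotated with its descendants. The function names and the toy hierarchy are hypothetical; this is not the Resource Index implementation.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Parents = Dict[str, List[str]]  # hypothetical child -> direct is-a parents map


def ancestors(term: str, parents: Parents, cache: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """All transitive is-a ancestors of `term` (excluding the term itself)."""
    if term not in cache:
        seen: Set[str] = set()
        stack = list(parents.get(term, ()))
        while stack:  # iterative DFS, so very deep hierarchies do not hit the recursion limit
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(parents.get(parent, ()))
        cache[term] = frozenset(seen)
    return cache[term]


def materialize(direct: Iterable[Tuple[str, str]], parents: Parents) -> Set[Tuple[str, str]]:
    """Expand each direct (element, term) annotation with one row per ancestor of the term."""
    cache: Dict[str, FrozenSet[str]] = {}
    rows: Set[Tuple[str, str]] = set()
    for element, term in direct:
        rows.add((element, term))
        for ancestor in ancestors(term, parents, cache):
            rows.add((element, ancestor))
    return rows


parents = {"C": ["B"], "D": ["B"], "B": ["A"]}
rows = materialize([("doc1", "C"), ("doc2", "D")], parents)
print(len(rows))  # 6: two direct annotations become six materialized rows
```

The per-term cache avoids re-walking the hierarchy when many elements share a term. It cannot shrink the output, though: a single annotation on the leaf of a 1,000-term is-a chain still yields 1,000 rows. That output growth is the kind of cost that benchmarks built on shallow ontologies can hide.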
Methods of information in medicine, 2011
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2010
Ontologies have become a critical component of many applications in biomedical informatics. However, the landscape of ontology tools today is largely fragmented, with independent tools for ontology editing, publishing, and peer review. Users develop an ontology in an ontology editor, such as Protégé; publish it on a Web server or in an ontology library, such as BioPortal, to share it with the community; and use the tools provided by the library, or mailing lists and bug trackers, to collect feedback from users. In this paper, we present a set of tools that bring ontology editing and publishing closer together in an integrated platform for the entire ontology lifecycle. This integration streamlines the workflow for collaborative development and increases integration between the ontologies themselves through the reuse of terms.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2008
Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, the different semantic-similarity approaches have not been compared on a single ontology. Such a comparison can offer insight into the validity of the different approaches. We compared 3 approaches to semantic-similarity metrics (those relying on expert opinion, on the ontology only, and on information content) using 4 metrics applied to SNOMED-CT. We found poor agreement between the information-content-based metrics and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that ontology-only metrics may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.
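The abstract does not name its four metrics. As a hedged illustration of the two families it contrasts, the sketch below implements two textbook measures:

- An ontology-only measure: the inverse of the shortest is-a path length between two concepts.
- Resnik's information-content measure: the information content, -log p, of the most informative common ancestor. Here p is estimated from concept frequencies propagated up the hierarchy.

The toy hierarchy and frequency counts are hypothetical, not SNOMED-CT data.

```python
import math
from collections import deque
from typing import Dict, List, Set

Parents = Dict[str, List[str]]  # hypothetical child -> direct is-a parents map


def with_ancestors(term: str, parents: Parents) -> Set[str]:
    """The term plus all of its transitive is-a ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def path_similarity(a: str, b: str, parents: Parents) -> float:
    """Ontology-only metric: 1 / (1 + shortest path length between a and b over is-a edges)."""
    neighbors: Dict[str, Set[str]] = {}
    for child, ps in parents.items():  # treat is-a edges as undirected
        for p in ps:
            neighbors.setdefault(child, set()).add(p)
            neighbors.setdefault(p, set()).add(child)
    dist = {a: 0}
    queue = deque([a])
    while queue:  # breadth-first search finds the shortest path
        node = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist[node])
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0  # a and b are not connected


def resnik_similarity(a: str, b: str, parents: Parents, counts: Dict[str, int]) -> float:
    """Information-content metric: max IC(c) = log(total / freq(c)) over common ancestors c."""
    freq: Dict[str, int] = {}
    for term, n in counts.items():  # a concept's frequency includes all of its descendants
        for c in with_ancestors(term, parents):
            freq[c] = freq.get(c, 0) + n
    total = sum(counts.values())  # assumes a single root, so p(root) = 1
    common = with_ancestors(a, parents) & with_ancestors(b, parents)
    scores = [math.log(total / freq[c]) for c in common if freq.get(c, 0) > 0]
    return max(scores, default=0.0)


parents = {
    "heart_disease": ["disorder"], "lung_disease": ["disorder"],
    "infarction": ["heart_disease"], "angina": ["heart_disease"], "asthma": ["lung_disease"],
}
counts = {"infarction": 30, "angina": 10, "asthma": 60}  # hypothetical corpus frequencies
print(path_similarity("infarction", "angina", parents))            # ≈ 0.333 (path length 2)
print(resnik_similarity("infarction", "angina", parents, counts))  # log(100/40) ≈ 0.916
```

The path measure needs nothing but the graph. The Resnik measure also depends on an external frequency source, so the two can diverge when usage frequencies do not mirror the hierarchy's shape.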
Studies in health technology and informatics, 2007
Reuse of ontologies is important for achieving better interoperability among health systems and for relieving knowledge engineers of the burden of developing ontologies from scratch. Most of the work that aims to facilitate ontology reuse has either focused on building ontology libraries that are simple repositories of ontologies or has led to keyword-based tools for searching among ontologies. To our knowledge, there are no operational methodologies that allow users to evaluate and compare ontologies in order to choose the most appropriate ontology for their task. In this paper, we present Knowledge Zone, a Web-based portal that allows users to submit their ontologies, associate metadata with them, search for existing ontologies, find ontology rankings based on user reviews, post their own reviews, and rate reviews.
Evidence report/technology assessment (Summary), 2002
This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. Endorsement by the Agency for Healthcare Research and Quality (AHRQ) or the U.S. Department of Health and Human Services (DHHS) of such derivative products may not be stated or implied.
Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 1999
We present a methodology and database mediator tool for integrating modern knowledge-based systems, such as the Stanford EON architecture for automated guideline-based decision support, with legacy databases, such as the Veterans Health Information Systems & Technology Architecture (VISTA) systems, which are used nationwide. Specifically, we discuss designs for database integration in ATHENA, a system for hypertension care based on EON, at the VA Palo Alto Health Care System. We describe a new database mediator that affords the EON system both physical and logical data independence from the legacy VA database. We found that, to achieve our design goals, the mediator requires two separate mapping levels and must itself include a knowledge-based component.
Machine Intelligence and Pattern Recognition, 1988
As tools for the construction of expert systems have become commonly available, workers in artificial intelligence (AI) have begun to pay increasing attention to the problems of building and maintaining large knowledge bases.
MMWR. Morbidity and mortality weekly report, Jan 26, 2005
Syndromic surveillance offers the potential to rapidly detect outbreaks resulting from terrorism. Despite considerable experience with implementing syndromic surveillance, limited evidence exists to describe the performance of syndromic surveillance systems in detecting outbreaks. The objective was to describe a model for simulating cases that might result from exposure to inhalational anthrax, and then to use the model to evaluate the ability of syndromic surveillance to detect an outbreak of inhalational anthrax after an aerosol release. Disease progression and health-care use were simulated for persons infected with anthrax. Simulated cases were then superimposed on authentic surveillance data to create test data sets. A temporal outbreak-detection algorithm was applied to each test data set, and the sensitivity and timeliness of outbreak detection by syndromic surveillance were calculated. The earliest detection using a temporal algorithm was 2 days after a release. Earlier detection tended to occu...
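The abstract does not name its temporal detection algorithm. The evaluation design it describes has three steps: superimpose simulated cases on surveillance counts, run a temporal detector, and measure the time to first alarm. To illustrate that design, the sketch below swaps in a generic moving-baseline threshold rule that raises an alarm when a day's count exceeds the mean of the preceding seven days plus three standard deviations. The window, threshold, standard-deviation floor, and all counts are hypothetical, not values from the study.

```python
import statistics
from typing import List, Optional


def detect(counts: List[float], window: int = 7, k: float = 3.0) -> List[int]:
    """Days whose count exceeds mean + k * SD of the preceding `window` days."""
    alarms = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mu = statistics.mean(baseline)
        sd = max(statistics.pstdev(baseline), 1.0)  # floor the SD so a very flat baseline is not over-sensitive
        if counts[t] > mu + k * sd:
            alarms.append(t)
    return alarms


def detection_delay(baseline: List[float], simulated: List[float], release_day: int,
                    window: int = 7, k: float = 3.0) -> Optional[int]:
    """Superimpose simulated cases on baseline counts; days from release to first alarm, or None."""
    combined = [b + s for b, s in zip(baseline, simulated)]
    for t in detect(combined, window, k):
        if t >= release_day:
            return t - release_day
    return None  # missed outbreak: counts against sensitivity


# Hypothetical daily syndrome counts and a hypothetical simulated outbreak released on day 7.
baseline = [20, 22, 19, 21, 20, 23, 21, 20, 22, 21, 19, 20, 22, 21]
simulated = [0] * 7 + [0, 1, 3, 30, 60, 60, 60]
print(detect(baseline))                         # [] (no false alarms on the baseline alone)
print(detection_delay(baseline, simulated, 7))  # 3
```

Repeating `detection_delay` over many simulated releases gives sensitivity (the fraction of runs that do not return None) and timeliness (the distribution of delays).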
Proceedings of the International Conference on Web Intelligence, Mining and Semantics - WIMS '11, 2011
Enhancing Semantic Web technologies with the ability to express uncertainty and imprecision is a widely discussed topic. While SWRL can provide additional expressivity to OWL-based ontologies, it does not provide any way to handle uncertainty or imprecision. There is a pressing need for a standards-based, simple, and functioning solution. We describe an extension of SWRL called SWRL-F that we believe can
Terminologies and ontologies play a central role in the life sciences, where they structure biomedical data and make it interoperable [2]. Using ontologies to index and integrate data resources is a way of making knowledge more valuable by facilitating search and data mining. However, the discoveries that could be made are often limited by the availability and processing of data in only one language, most often English, the language for which the most ontologies and tools exist. As part of the Semantic Indexing of French Biomedical Resources project (Indexation sémantique de ressources biomédicales francophones, SIFR, http://www.lirmm.fr/sifr), we focus on managing multilingualism in the BioPortal platform (http://bioportal.bioontology.org) of the National Center for Biomedical Ontology (NCBO). BioPortal [7] makes it possible to access, visualize, search, and comment on more than 350 ontologies or
With the growing popularity of large-scale collaborative biomedical ontology-engineering projects, such as the creation of the 11th revision of the International Classification of Diseases, new methods and insights are needed to help project and community managers cope with the constantly growing complexity of such projects. In this paper, we present a novel application of Markov chains to the change logs of collaborative ontology-engineering projects to extract and analyze sequential patterns. By leveraging Markov chain models of varying orders, this method also allows us to investigate memory and structure in human activity patterns when collaboratively creating an ontology. We describe all the steps necessary for applying the methodology to collaborative ontology-engineering projects and provide first results for the 11th revision of the International Classification of Diseases. Furthermore, we show that the collected sequential patterns provide actionable information for community and project managers to monitor, coordinate, and dynamically adapt to the natural development processes that occur when collaboratively engineering an ontology. We hope that the adaption of the presented methodology will spur a new line of ontology-development tools and evaluation techniques that concentrate on the interactive nature of the collaborative ontology-engineering process.
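To make the method concrete, here is a minimal sketch of the core estimation step. It assumes each change log has been reduced to a per-user sequence of action labels. The sketch counts how often each action follows each preceding run of `order` actions, then normalizes those counts into maximum-likelihood transition probabilities. The action labels and sequences are hypothetical. The sketch also omits any comparison between orders (for example, with a penalized likelihood criterion), which an analysis of varying orders would need.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

State = Tuple[str, ...]


def fit_markov_chain(sequences: Sequence[Sequence[str]], order: int = 1) -> Dict[State, Dict[str, float]]:
    """Maximum-likelihood estimate of P(next action | previous `order` actions)."""
    counts: Dict[State, Counter] = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    return {
        state: {action: n / sum(nexts.values()) for action, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical per-user sequences of change-log action types.
logs = [
    ["edit", "edit", "comment", "edit"],
    ["create", "edit", "edit", "comment"],
]
first = fit_markov_chain(logs, order=1)
print(first[("edit",)])           # {'edit': 0.5, 'comment': 0.5}
second = fit_markov_chain(logs, order=2)
print(second[("edit", "edit")])   # {'comment': 1.0}
```

A higher order captures more "memory" in editing behavior, but the number of states grows exponentially with the order, so sparse logs quickly yield unreliable estimates.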