Lawrence Cavedon | RMIT University
Papers by Lawrence Cavedon
Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia - SLAM '15, 2015
During clinical handover, clinicians exchange information about patients and the state of clinical management. To improve care safety and quality, both handover and its documentation have been standardized. Speech recognition and entity extraction offer a way to help health service providers follow these standards by implementing the handover process as a structured form, whose headings guide the handover narrative, and the documentation process as proofing and sign-off of the automatically filled-out form. In this paper, we evaluate such systems. The form comprises the sections Handover nurse, Patient introduction, My shift, Medication, Appointments, and Future care, divided into 49 mutually exclusive headings to be filled out with speech-recognized and extracted entities. Our system correctly recognizes 10,244 out of 14,095 spoken words and, despite its 6,692 erroneous words, its error percentage is significantly smaller than for systems submitted to the CLEF eHealth Evaluation Lab 2015. In the extraction of the 35 entities with training data (i.e., 14 headings were not present in the 101 expert-annotated training documents, 8,487 words in total), the system correctly extracts 2,375 out of 3,793 words in 50 test documents after calibration on 3,937 words in 50 validation documents. This translates to over 90% F1 in extracting the patient's age, current bed, current room, and given name, and over 70% F1 for the patient's admission reason/diagnosis and last name. F1 for filtering out irrelevant information is 78%. We have made the data for 201 handover cases publicly available, together with processing results and code, and have proposed the extraction task for CLEF eHealth 2016.
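As a rough illustration of the form-filling step described above, here is a minimal sketch of routing extracted entities into mutually exclusive form headings. This is not the paper's system: the heading names are a hypothetical subset of the 49, and the upstream entity extractor is assumed.

```python
from collections import defaultdict

# Hypothetical subset of the 49 headings, grouped by form section.
FORM_SECTIONS = {
    "Patient introduction": ["given_name", "last_name", "age",
                             "current_bed", "current_room"],
    "My shift": ["admission_reason"],
}

def fill_form(tagged_entities):
    """Route (heading, text) pairs from an upstream entity extractor
    into the structured handover form; entities with unknown headings
    are treated as irrelevant and filtered out."""
    known = {h for hs in FORM_SECTIONS.values() for h in hs}
    form = defaultdict(list)
    for heading, text in tagged_entities:
        if heading in known:
            form[heading].append(text)
    return dict(form)

# Example output of a (hypothetical) extractor over one spoken handover:
entities = [("given_name", "Mary"), ("age", "84"),
            ("current_bed", "2"), ("irrelevant", "um okay so")]
print(fill_form(entities))
# {'given_name': ['Mary'], 'age': ['84'], 'current_bed': ['2']}
```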
Proceedings of the 2010 Workshop on Companionable Dialogue Systems, 2010
@Book{CDS:2010,
  editor    = {Yorick Wilks and Bj\"{o}rn Gamb\"{a}ck and Morena Danieli},
  title     = {Proceedings of the 2010 Workshop on Companionable Dialogue Systems},
  month     = {July},
  year      = {2010},
  address   = {Uppsala, Sweden},
  publisher = {Association for Computational Linguistics}
}
Lung cancer is a leading cause of death in developed countries. This paper presents a text mining system using Support Vector Machines for detecting lung cancer admissions, and evaluates its performance using different clinical data sources. We use radiology reports as an initial data source and add other sources, such as pathology reports, patient demographic information, and hospital admission information. Results show that mining over linked data sources significantly improves classification performance, with a maximum F-score improvement of 0.057.
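A minimal sketch of what classification over linked sources can look like, assuming one row per admission with columns for each source; the column names, feature choices, and toy data are illustrative, not the paper's pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Separate feature extraction per linked source, concatenated into
# one feature space for the SVM.
features = ColumnTransformer([
    ("radiology", TfidfVectorizer(), "radiology_report"),
    ("pathology", TfidfVectorizer(), "pathology_report"),
    ("demographics", OneHotEncoder(handle_unknown="ignore"),
     ["sex", "age_band"]),
])
clf = Pipeline([("features", features), ("svm", LinearSVC())])

# Toy data: one row per admission with a binary lung-cancer label.
df = pd.DataFrame({
    "radiology_report": ["mass in right upper lobe", "clear lung fields"],
    "pathology_report": ["adenocarcinoma confirmed", "no malignancy"],
    "sex": ["F", "M"], "age_band": ["60-69", "40-49"],
    "label": [1, 0],
})
clf.fit(df.drop(columns="label"), df["label"])
```

Dropping transformers from the ColumnTransformer one at a time is one way to reproduce the single-source versus linked-source comparison the abstract reports.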
Searching and selecting articles to be included in systematic reviews is a real challenge for healthcare agencies responsible for publishing these reviews. The current practice of manually reviewing all papers returned by complex hand-crafted Boolean queries is labour-intensive and difficult to maintain. We demonstrate a two-stage searching system that takes advantage of ranked queries and support-vector machine text classification to assist in the retrieval of relevant articles, and to restrict results to higher-quality documents. Our proposed approach shows significant work saved in the systematic review process over a keyword-based retrieval baseline.
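The two stages can be sketched as ranked retrieval followed by a quality filter. The following is a toy illustration under assumed data and labels, not the demonstrated system:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC

# Toy collection standing in for a bibliographic database.
abstracts = [
    "randomised controlled trial of statins for cardiovascular risk",
    "case report of a single statin patient",
    "cooking with olive oil",
]
vec = TfidfVectorizer().fit(abstracts)

# Stage-2 classifier, trained on labelled examples of higher-quality
# study designs (labels here are invented for illustration).
quality = LinearSVC().fit(vec.transform(abstracts), [1, 0, 0])

def two_stage_search(query, top_k=2):
    # Stage 1: ranked retrieval against the review topic.
    scores = cosine_similarity(vec.transform([query]),
                               vec.transform(abstracts)).ravel()
    ranked = np.argsort(scores)[::-1][:top_k]
    # Stage 2: keep only articles the quality classifier accepts.
    keep = quality.predict(vec.transform([abstracts[i] for i in ranked]))
    return [abstracts[i] for i, k in zip(ranked, keep) if k == 1]

print(two_stage_search("statin trial cardiovascular"))
```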
Nature Precedings, 2008
The Human Variome Project (HVP; www.humanvariomeproject.org) was initiated at a meeting in June 2006, which addressed the problems of collecting genetic information and generated 96 recommendations (http://www.nature.com/ng/journal/v39/n4/full/ng0407-423.html) to overcome them, with a focus on Mendelian disease. Since that meeting, a considerable number of projects have been added to those that had already been ongoing for a number of years. A planning meeting is to be held May 25-29, 2008 in Spain (http://www.humanvariomeproject.org/HVP2008/). The HVP has been given a dramatic boost by the preparedness and action of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT; www.insight-group.org), which, in order to improve its own informatics systems for dealing with inherited colon cancer, set up a pilot system for collecting and databasing mutation and phenotype information, i.e. acting as a pilot for the HVP. This is then intended to be transferred to other gene...
2010 11th International Conference on Control Automation Robotics & Vision, 2010
The Thinking Head project is a multidisciplinary approach to building intelligent agents for human-machine interaction. The Thinking Head Framework evolved out of the project; it facilitates loose coupling between the various components and forms the central nervous system of a multimodal perception-action system. The paper presents the overall architecture, its components, and the attention system, and concludes with a preliminary behavioral experiment that studies the intelligibility of the audiovisual speech output produced by the Embodied Conversational Agent (ECA) that is part of the system. These results provide the baseline for future evaluations of the system as the project progresses through multiple evaluate-and-refine cycles.
Human Mutation, 2009
This article is a US government work and, as such, is in the public domain in the United States of America.
Database, 2013
This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full-text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome.
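The exact versus relaxed boundary-matching distinction in the agreement figures can be made concrete with a small sketch. This is a generic illustration of span matching, not the paper's evaluation code, and the example spans are invented:

```python
def span_f1(gold, pred, exact=True):
    """F-score over entity spans: exact offsets, or relaxed so any
    overlapping pair of same-type spans counts as a match."""
    def match(g, p):
        if exact:
            return g == p
        # Relaxed: same entity type and overlapping character offsets.
        return g[0] == p[0] and g[1] < p[2] and p[1] < g[2]
    tp = sum(any(match(g, p) for p in pred) for g in gold)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Spans as (type, start, end); the mutation span differs only at its
# right boundary, so it matches only under relaxed matching.
gold = [("mutation", 10, 18), ("disease", 30, 46)]
pred = [("mutation", 10, 20), ("disease", 30, 46)]
print(span_f1(gold, pred, exact=True))   # 0.5
print(span_f1(gold, pred, exact=False))  # 1.0
```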
BMC Medical Informatics and Decision Making, 2010
Background: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is Boolean search; however, this leaves users with the task of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. Methods: We explore the effectiveness of ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using methodologically defined queries, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. Results: Our results show that ranked retrieval by itself is not viable for this search task, which requires high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. Conclusions: Outcomes of the experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time savings over the current search process in systematic reviewing.
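The hybrid idea (recall governed by the Boolean query, with ranking layered on top) can be sketched as follows; this is a minimal TF-IDF illustration under invented data, not the paper's retrieval system:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_boolean_results(topic_text, docs, boolean_hits):
    """Rank only the documents matched by the Boolean query, so
    reviewers see the likeliest-relevant hits first while recall
    is still determined by the Boolean query itself."""
    vec = TfidfVectorizer().fit(docs)
    hits = sorted(boolean_hits)
    scores = cosine_similarity(
        vec.transform([topic_text]),
        vec.transform([docs[i] for i in hits])).ravel()
    return [hits[i] for i in np.argsort(scores)[::-1]]

docs = ["beta blockers in heart failure trial",
        "gardening tips for spring",
        "randomised trial of beta blockers"]
# boolean_hits stands in for the result set of a hand-crafted query.
print(rank_boolean_results("beta blocker heart failure", docs, {0, 2}))
```

Because early results give an immediate read on quality, a poorly performing query can be abandoned and refined without scanning its entire result set.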
BMC Bioinformatics, 2011
Aim: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. Method: We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification. Results: For the classification tasks over all labels, our systems achieved micro-averaged f-scores of 80.9% and 66.9% over datasets of structured and unstructured abstracts respectively, using sequential features. In labeling only the key sentences, our systems produced f-scores of 89.3% and 74.0% over structured and unstructured abstracts respectively, using the same sequential features. The results over an external dataset were lower (f-scores of 63.1% for all labels, and 83.8% for key sentences). Conclusi...
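A CRF over sentence sequences can be sketched as below. The sklearn-crfsuite package is an assumption (the paper does not name its toolkit), and the features shown are illustrative stand-ins for the lexical/structural/sequential feature groups:

```python
import sklearn_crfsuite

def sent_features(abstract, i):
    """Features for the i-th sentence: crude lexical cues plus
    position in the abstract, so the CRF can exploit sequential
    structure when labelling each sentence."""
    sent = abstract[i].lower()
    feats = {
        "position": i / len(abstract),        # structural feature
        "first_word": sent.split()[0],
        "has_number": any(c.isdigit() for c in sent),
    }
    for tok in set(sent.split()):             # bag-of-words features
        feats["word=" + tok] = True
    return feats

# One toy training abstract, sentence-labelled with EBM categories.
abstracts = [["We randomised 120 patients to drug A or placebo.",
              "Mortality at 12 months was reduced."]]
labels = [["Intervention", "Outcome"]]

X = [[sent_features(a, i) for i in range(len(a))] for a in abstracts]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```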
Efficient wide-coverage parsing is integral to large-scale NLP applications. Unfortunately, parsers for linguistically motivated formalisms, e.g. HPSG and TAG, are often too inefficient for these applications. This paper describes two modifications to the standard CKY chart parsing algorithm used in the Clark and Curran (2006) Combinatory Categorial Grammar (CCG) parser. The first modification extends the tight integration of the supertagger and parser, so that individual supertags can be added to the chart, which is then repaired rather than rebuilt. The second modification adds constraints to the chart that restrict which constituents can combine. Parsing speed is improved by 30-35% without a significant accuracy penalty, and with a small increase in coverage, when both of these modifications are used.
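To make the second modification concrete, here is a toy CKY recognizer (not the Clark and Curran parser) showing where span constraints plug in: chart cells outside the allowed set are never filled, so fewer constituent combinations are attempted. The grammar and sentence are invented:

```python
from collections import defaultdict

# Toy binary grammar in CNF: rhs pair -> set of lhs categories.
GRAMMAR = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}, "cat": {"N"}}

def cky(words, allowed_spans=None):
    """CKY recognition; `allowed_spans` (a set of (i, j) pairs)
    mimics chart constraints restricting which constituents may
    be built. None means unconstrained parsing."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if allowed_spans is not None and (i, j) not in allowed_spans:
                continue  # constrained cell: skip all combinations here
            for k in range(i + 1, j):
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        chart[(i, j)] |= GRAMMAR.get((b, c), set())
    return "S" in chart[(0, n)]

words = "the dog saw the cat".split()
print(cky(words))                            # True: parse succeeds
print(cky(words, allowed_spans={(0, 2)}))    # False: S cell never built
```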
We describe an architecture for practical multi-application, multi-device spoken-language dialogue systems, based on the information-state update approach. Our system provides representation-neutral core components of a powerful dialogue system, while enabling: scripted domain-specific extensions to routines such as dialogue move modeling and reference resolution; easy substitution of specific semantic representations and associated routines; and clean interfaces to external components for language understanding (i.e. speech recognition and parsing) and generation, and to domain-specific knowledge sources. This infrastructure forms the basis of a "plug and play" dialogue management capability, whereby new dialogue-enabled devices can be dynamically introduced to the system. The plug-and-play infrastructure is an important aspect of an environment for dialogue control of in-car devices.
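A toy sketch of the plug-and-play idea follows; it is not the described architecture, and the class, device, and move names are invented. Devices announce the dialogue moves they handle, and each move applies a device-specific update to the information state:

```python
class DialogueManager:
    """Minimal information-state-update core: devices register the
    dialogue moves they handle, and can join at runtime."""
    def __init__(self):
        self.state = {}      # the information state
        self.handlers = {}   # move name -> (device name, callback)

    def register_device(self, name, moves):
        # "Plug and play": a new device announces its moves dynamically.
        for move, callback in moves.items():
            self.handlers[move] = (name, callback)

    def apply_move(self, move, args):
        device, callback = self.handlers[move]
        update = callback(self.state, args)  # device-specific update rule
        self.state.update(update)
        return f"{device} handled {move}: {update}"

dm = DialogueManager()
dm.register_device("mp3_player", {
    "play_track": lambda state, args: {"now_playing": args["track"]},
})
print(dm.apply_move("play_track", {"track": "track 4"}))
```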
We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.
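A feature extractor in the spirit of the abstract might combine bag-of-words with dialogue-structure and inter-utterance cues, as sketched below; the feature names are illustrative, not the paper's feature set:

```python
def utterance_features(dialogue, i, prev_label=None):
    """Features for the i-th utterance: bag-of-words plus dialogue
    structure (speaker, position) and inter-utterance dependency
    (previous speaker, lexical overlap with the prior turn)."""
    speaker, text = dialogue[i]
    feats = {"speaker=" + speaker: True,
             "position": i / len(dialogue)}
    for tok in set(text.lower().split()):
        feats["word=" + tok] = True
    if i > 0:
        prev_speaker, prev_text = dialogue[i - 1]
        feats["prev_speaker=" + prev_speaker] = True
        overlap = set(text.lower().split()) & set(prev_text.lower().split())
        feats["overlap_with_prev"] = len(overlap) > 0
    if prev_label is not None:
        feats["prev_act=" + prev_label] = True  # sequential dependency
    return feats

chat = [("customer", "Do you ship to Canada?"),
        ("agent", "Yes, we ship to Canada.")]
print(utterance_features(chat, 1, prev_label="QUESTION"))
```

With features like these, a CRF additionally models dependencies between adjacent dialogue-act labels, which is what gives the sequential learners their edge in the reported results.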
We present an approach to dialogue management and interpretation that evaluates and selects amongst candidate dialogue moves based on features at multiple levels. Multiple interpretation methods can be combined, multiple speech recognition and parsing hypotheses tested, and multiple candidate dialogue moves considered to choose the highest scoring hypothesis overall. We integrate hypotheses generated from shallow slot-filling methods and from relatively deep parsing, using pragmatic information. We show that this gives more robust performance than using either approach alone, allowing n-best list reordering to correct errors in speech recognition or parsing. Index Terms: dialogue management, robust interpretation
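The core selection step reduces to scoring each candidate move over multi-level features and taking the argmax. A minimal sketch, with invented feature names and weights (the paper does not specify a scoring function of this form):

```python
def pick_best_move(candidates, weights):
    """Score each candidate dialogue move by a weighted combination
    of multi-level features (ASR confidence, parse quality, pragmatic
    fit with context) and return the argmax, in effect reordering
    the n-best list."""
    def score(c):
        return sum(weights[k] * c[k] for k in weights)
    return max(candidates, key=score)

# Candidate interpretations of the same user turn, produced by
# different interpreters (slot-filling vs. deep parsing).
candidates = [
    {"move": "play(track=4)", "asr_conf": 0.62,
     "parse_score": 0.9, "pragmatic_fit": 0.8},
    {"move": "pause()", "asr_conf": 0.71,
     "parse_score": 0.3, "pragmatic_fit": 0.2},
]
weights = {"asr_conf": 1.0, "parse_score": 1.0, "pragmatic_fit": 2.0}
print(pick_best_move(candidates, weights)["move"])  # play(track=4)
```

Note how the pragmatically better hypothesis wins despite its lower ASR confidence, which is the error-correcting behaviour the abstract describes.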
The Thinking Head project aims to develop (i) a new-generation Talking Thinking Head that embodies human attributes and improves human-machine interaction; and (ii) a plug-and-play research platform for users to test software in an interactive real-time environment. Here, project progress is discussed in terms of the four teams: 1. Head Building – (i) plug-and-play architecture, (ii) Thinking Media Framework, and (iii) animation; 2. Human-Head Interaction (HHI) – (i) Wizard of Oz studies, and (ii) joint attention by human and head; 3. Evaluation; and 4. Performance – (i) the Beijing Head and (ii) the Pedestal Head. Directions for future research are outlined as appropriate. Index Terms: talking heads, human-computer interaction, evaluation, performance
This paper targets the content selection problem of generating appropriate information in the domain of in-car navigation. It describes an algorithm that models the driver's knowledge about roads and routes and uses this knowledge to tailor turn-by-turn instructions from a commercial routing service into instructions more suitable to the individual driver's background. This content selection component is one part of a domain-independent generation system within a general-purpose dialogue system toolkit. We claim that this type of adaptive generation facilitates more efficient and driver-friendly navigation.
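One simple form such content selection could take is collapsing steps along familiar roads while keeping full detail for unfamiliar ones. This sketch is an assumption about the mechanism, not the paper's algorithm; the road names and data layout are invented:

```python
def tailor_instructions(route_steps, known_roads):
    """Content selection: collapse consecutive turn-by-turn steps
    along roads the driver already knows into a single reference,
    keeping full detail only for unfamiliar stretches."""
    tailored = []
    for road, instruction in route_steps:
        if road in known_roads:
            ref = f"take {road} as usual"
            if not tailored or tailored[-1] != ref:
                tailored.append(ref)  # one summary per familiar stretch
        else:
            tailored.append(instruction)
    return tailored

steps = [("Main St", "turn left onto Main St"),
         ("Main St", "continue 2 km on Main St"),
         ("Oak Ave", "turn right onto Oak Ave")]
print(tailor_instructions(steps, known_roads={"Main St"}))
# ['take Main St as usual', 'turn right onto Oak Ave']
```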
Summary: Much clinical data available in electronic health records (EHRs) are in text format. Developing text processing and mining techniques for such data is necessary to realize its full value, supporting data-driven analysis, decision-making, and discovery. This abstract outlines our vision and describes case studies of text processing and text mining applications over EHRs that address challenges in health and healthcare.
We show how topic models are useful for interpreting and understanding MeSH, the Medical Subject Headings applied to articles in MEDLINE. We show how our resampled author model captures some of the advantages of both the topic model and the author-topic model. We demonstrate how the topic modeling approach can provide an alternative and complementary view of the relationship between MeSH headings that could be informative and helpful for people searching MEDLINE.
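For readers unfamiliar with the setup, a generic LDA topic model over abstracts can be fitted as below; this is a standard gensim sketch with invented toy data, not the paper's resampled author model:

```python
from gensim import corpora, models

# Toy stand-ins for MEDLINE abstracts grouped under two MeSH headings.
texts = [
    "lung cancer tumour radiotherapy survival".split(),
    "tumour biopsy cancer staging survival".split(),
    "influenza vaccine immune response antibody".split(),
    "vaccine antibody trial immune influenza".split(),
]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Topics learned this way can then be compared against the MeSH
# headings assigned to the same articles.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      random_state=0, passes=20)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```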