Doug Talbert | Tennessee Technological University

Papers by Doug Talbert

Group Bias and the Complexity/Accuracy Tradeoff in Machine Learning-Based Trauma Triage Models

The International FLAIRS Conference Proceedings

Trauma triage occurs in suboptimal environments for making consequential decisions. Published triage studies demonstrate the extremes of the complexity/accuracy tradeoff, either studying simple models with poor accuracy or very complex models with accuracies nearing published goals. Using a Level I Trauma Center’s registry cases (n=50,644), this study describes, uses, and derives observations from a methodology to more thoroughly examine this tradeoff. This or similar methods can provide the insight needed for practitioners to balance understandability with accuracy. Additionally, this study incorporates an evaluation of group-based fairness into this tradeoff analysis to provide an additional dimension of insight into model selection. The experiments allow us to draw several conclusions regarding the machine learning models in the domain of trauma triage and demonstrate the value of our tradeoff analysis to provide insight into choices regarding model complexity, model accuracy, an...

Analysis of Student Data for Retention Using Data Mining Techniques

At most universities, administrators and counselors are trying to devise sound methodologies to help increase student retention rates. Due to the vast amount of student data that is available, sorting through the data to extract useful knowledge is a daunting task. However, the data may be helpful in predicting future student trends, particularly as it relates to retention. In this paper, we describe data mining and machine learning techniques that can be used to predict future enrollment. In our experiments, we attempt to apply these techniques to the retention of Computer Science students, a major that traditionally has significant turnover in the first year of study. Specific algorithms are selected to classify the data in an attempt to extract relevant information. While traditional methods may focus on known issues with retention, we emphasize the importance of factors that may only be noticeable through the application of data mining. The goal of this research is to determin...

Detection of Anomalous Activity in Diabetic Patients Using Graph-Based Approach

Every year, billions of dollars are lost due to fraud in the U.S. health care system. Health care claims are complex as they involve multiple parties including service providers, insurance subscribers, and insurance carriers. Medicare is susceptible to fraud because of this complexity. To build a comprehensive fraud detection system, one must take into consideration all of the financial practices involved among the associated parties. This paper is focused on graph-based analysis of CMS-provided Medicare claims data to look for anomalies in the relationships and transactions among patients, service providers, claims, physicians, diagnoses, and procedures. In our experiments, we create graphs from in-patient, outpatient, and carrier claims data of the beneficiary. We then demonstrate the potential effectiveness of applying graph-based anomaly detection to the problem of discovering anomalies and potential fraud scenarios.

A Data Model to Represent Clinically Oriented Drug Dosing Rules

The Uncertainty of Counterfactuals in Deep Learning

The International FLAIRS Conference Proceedings

Counterfactuals have become a useful tool for explainable Artificial Intelligence (XAI). Counterfactuals provide various perturbations to a data instance to yield an alternate classification from a machine learning model. Several algorithms have been designed to generate counterfactuals using deep neural networks; however, despite their growing use in many mission-critical fields, there has been no investigation to date as to the epistemic uncertainty of generated counterfactuals. This could result in the use of risk-prone explanations in these fields. In this work, we use several data sets to compare the epistemic uncertainty of original instances to that of counterfactuals generated from those instances. As part of our analysis, we also measure the extent to which counterfactuals can be considered anomalies in those data sets. We find that counterfactual uncertainty is higher in three of the four data sets tested. Moreover, our experiments suggest a possible connection between reco...

Ensemble-Based Online Machine Learning Algorithms for Network Intrusion Detection Systems Using Streaming Data

Information

As new cyberattacks are launched against systems and networks on a daily basis, the ability of network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research in applying machine learning algorithms that can operate on streams of data, trained online or “live” on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of “ensembles”, where a collection of machine learning algorithms is combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks a proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this p...

Consistent trade-offs in fungal trait expression across broad spatial scales

Interactive Knowledge Frontier Discovery with COBWEB-KFD

The Florida AI Research Society Conference, 2010

OPT-KD: An Algorithm for Optimizing Kd-Trees

Proceedings of the Sixteenth International Conference on Machine Learning, 1999

Developing a methodology to improve the allocation of specialized health resources for acutely injured persons

AMIA Annual Symposium Proceedings, Feb 1, 2003

Predicting cancer type with dimensionality-reduced gene expression micro-array data

AMIA Annual Symposium Proceedings, Feb 1, 2005

Predictive modeling based on gene expression data is complicated by the high dimensionality (number of genes) of microarray data given the number of available samples. We investigate a method for reducing the dimensionality of the data using singular value decomposition.

An incremental pharmacy informatics model for use in a rural hospital

AMIA Annual Symposium Proceedings, Feb 1, 2005

We present an implementation model for pharmaceutical computerized decision support (CDS) that enables a hospital to incrementally target specific “high value” projects as needs are identified and support is secured. Our model, which we are currently implementing in a rural medical center, allows the hospital and its staff to quickly reap some benefits from CDS in spite of resource limitations.

Method and system for clinical action support

The dGrail toolkit for iterative deterministic record linkage

AMIA Annual Symposium Proceedings, Feb 1, 2007

The distributed Graph rule automata for iterative linkage (dGrail) toolkit is a software package for deterministic record linkage. This toolkit allows for iterative development of linked record sets. While intended for the generation of gold standard record sets, the toolkit is applicable to any offline deterministic record linkage task. The dGrail toolkit embodies a flexible rule engine allowing the user to implement a wide variety of record matching rules, including those found in the literature.

A Grassroots Resource Locator System for Rural Healthcare Providers

AMIA Annual Symposium Proceedings, Feb 1, 2005

We present a model for a health resource locator to help rural primary healthcare providers care for patients. We identify some unique needs of rural providers, argue that a grassroots effort, driven by the community, is the optimal way to address some of those needs, and propose a centralized Internet-based system to drive the whole process.

A comparison of a decision tree induction algorithm with the ACS guidelines for trauma triage

AMIA Annual Symposium Proceedings, Feb 1, 2007

Triage is a key component of trauma care. Unfortunately, mistriage rates remain high. Machine learning techniques have the potential to improve triage. Our experiment showed that while decision tree induction was as accurate as the most widely accepted trauma triage guidelines, the two performed differently with respect to over- and undertriage.

Design Challenges for Web-Based Database Maintenance

Center (VUMC) is the migration of data storage and maintenance from proprietary information systems to a vendor-independent external environment. Benefits include easier access to the data by other information systems and data persistence that is independent of a specific vendor. One manner in which we are achieving this goal is through the use of the Web as a distributed user interface for maintaining locally developed relational databases. We chose to use the Web for many reasons including user familiarity, low distribution cost, and minimal development time. The aim of our project is to describe some design challenges of Web-based database maintenance and how we chose to address these challenges. At one level, goals for any database maintenance system should be the same: hiding data complexity, limiting access to sensitive data, adapting easily to changes in the data model, ensuring data integrity, and providing an intuitive user interface. However, the manner in which these goals are achieved in a Web-based architecture may vary from project to project. The project characteristics that influence these design decisions fall into two broad categories: data-related issues and user-related issues. Data-related issues include complexity, sensitivity, stability, and criticality. User-related issues include

Embedding Drug Dispensing Logic to Provide a More Intelligent Clinician Order Entry/Pharmacy Interface

Analytic and adaptive techniques for improving nearest-neighbor search performance in k-dimensional trees

Thesis (Ph. D. in Computer Science)--Vanderbilt University, 2001. Includes bibliographical references (leaves 140-145).

User Communication and Problem Tracking: A Multi-faceted Approach to Rapid Application Development
