Doug Talbert | Tennessee Technological University
Papers by Doug Talbert
The International FLAIRS Conference Proceedings
Trauma triage occurs in suboptimal environments for making consequential decisions. Published triage studies demonstrate the extremes of the complexity/accuracy tradeoff, either studying simple models with poor accuracy or very complex models with accuracies nearing published goals. Using a Level I Trauma Center’s registry cases (n=50,644), this study describes, uses, and derives observations from a methodology to more thoroughly examine this tradeoff. This or similar methods can provide the insight needed for practitioners to balance understandability with accuracy. Additionally, this study incorporates an evaluation of group-based fairness into this tradeoff analysis to provide an additional dimension of insight into model selection. The experiments allow us to draw several conclusions regarding the machine learning models in the domain of trauma triage and demonstrate the value of our tradeoff analysis to provide insight into choices regarding model complexity, model accuracy, an...
At most universities, administrators and counselors are trying to devise sound methodologies to help increase student retention rates. Due to the vast amount of student data that is available, sorting through the data to extract useful knowledge is a daunting task. However, the data may be helpful in predicting future student trends - particularly as it relates to retention. In this paper, we describe data mining and machine learning techniques that can be used to predict future enrollment. In our experiments, we attempt to apply these techniques to the retention of Computer Science students - a major that traditionally has significant turnover in the first year of study. Specific algorithms are selected to classify the data in an attempt to extract relevant information. While traditional methods may focus on known issues with retention, we emphasize the importance of factors that may only be noticeable through the application of data mining. The goal of this research is to determin...
Every year, billions of dollars are lost due to fraud in the U.S. health care system. Health care claims are complex as they involve multiple parties including service providers, insurance subscribers, and insurance carriers. Medicare is susceptible to fraud because of this complexity. To build a comprehensive fraud detection system, one must take into consideration all of the financial practices involved among the associated parties. This paper is focused on graph-based analysis of CMS-provided Medicare claims data to look for anomalies in the relationships and transactions among patients, service providers, claims, physicians, diagnoses, and procedures. In our experiments, we create graphs from in-patient, outpatient, and carrier claims data of the beneficiary. We then demonstrate the potential effectiveness of applying graph-based anomaly detection to the problem of discovering anomalies and potential fraud scenarios.
The International FLAIRS Conference Proceedings
Counterfactuals have become a useful tool for explainable Artificial Intelligence (XAI). Counterfactuals provide various perturbations to a data instance to yield an alternate classification from a machine learning model. Several algorithms have been designed to generate counterfactuals using deep neural networks; however, despite their growing use in many mission-critical fields, there has been no investigation to date as to the epistemic uncertainty of generated counterfactuals. This could result in the use of risk-prone explanations in these fields. In this work, we use several data sets to compare the epistemic uncertainty of original instances to that of counterfactuals generated from those instances. As part of our analysis, we also measure the extent to which counterfactuals can be considered anomalies in those data sets. We find that counterfactual uncertainty is higher in three of the four datasets tested. Moreover, our experiments suggest a possible connection between reco...
Information
As new cyberattacks are launched against systems and networks on a daily basis, the ability of network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research in applying machine learning algorithms that can operate on streams of data, trained online or “live” on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of “ensembles”, where a collection of machine learning algorithms is combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this p...
The Florida AI Research Society Conference, 2010
Proceedings of the Sixteenth International Conference on Machine Learning, 1999
AMIA Annual Symposium Proceedings, Feb 1, 2003
AMIA Annual Symposium Proceedings, Feb 1, 2005
Predictive modeling based on gene expression data is complicated by the high dimensionality (number of genes) of microarray data given the number of available samples. We investigate a method for reducing the dimensionality of the data using singular value decomposition.
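As a rough illustration of the idea in the abstract above, the following sketch reduces a samples-by-genes matrix to a few dimensions with a truncated singular value decomposition. It is not the authors' method: the function name `reduce_with_svd`, the mean-centering step, the choice of `k`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X (samples) onto the top-k right singular vectors.

    Centering each column (gene) first is an assumption of this sketch.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy-size SVD: Xc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]            # shape (k, n_genes)
    Z = Xc @ components.T          # shape (n_samples, k)
    return Z, components, mean

# Synthetic stand-in for microarray data: 20 samples, 500 "genes"
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
Z, components, mean = reduce_with_svd(X, k=5)
print(Z.shape)
```

A new sample `x` would be mapped into the same reduced space with `(x - mean) @ components.T`, so a downstream predictive model sees `k` features instead of one per gene.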
AMIA Annual Symposium Proceedings, Feb 1, 2005
We present an implementation model for pharmaceutical computerized decision support (CDS) that enables a hospital to incrementally target specific “high value” projects as needs are identified and support is secured. Our model, which we are currently implementing in a rural medical center, allows the hospital and its staff to quickly reap some benefits from CDS in spite of resource limitations.
AMIA Annual Symposium Proceedings, Feb 1, 2007
The distributed Graph rule automata for iterative linkage (dGrail) toolkit is a software package for deterministic record linkage. This toolkit allows for iterative development of linked record sets. While intended for the generation of gold standard record sets, the toolkit is applicable to any offline deterministic record linkage task. The dGrail toolkit embodies a flexible rule engine allowing the user to implement a wide variety of record matching rules, including those found in the literature.
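To make the general technique named in the abstract above concrete, here is a minimal sketch of deterministic, rule-based record linkage applied in successive passes. It does not reproduce dGrail or its API: the record fields, the two rules (`exact_ssn`, `name_and_dob`), and the `link` function are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]

def norm(value: str) -> str:
    """Normalize a field for comparison (trim whitespace, ignore case)."""
    return value.strip().lower()

def exact_ssn(a: Record, b: Record) -> bool:
    # Hypothetical rule: non-empty identifiers must match exactly
    return bool(a.get("ssn")) and a.get("ssn") == b.get("ssn")

def name_and_dob(a: Record, b: Record) -> bool:
    # Hypothetical rule: normalized first and last name plus date of birth
    return (norm(a["first"]) == norm(b["first"])
            and norm(a["last"]) == norm(b["last"])
            and a["dob"] == b["dob"])

def link(left: List[Record], right: List[Record],
         rules: List[Rule]) -> List[Tuple[int, int, int]]:
    """Apply rules in order; each pass links only unambiguous (single-candidate) pairs.

    Returns (left_index, right_index, rule_index) triples.
    """
    links: List[Tuple[int, int, int]] = []
    unmatched_left = list(range(len(left)))
    unmatched_right = set(range(len(right)))
    for rule_no, rule in enumerate(rules):
        still_unmatched = []
        for i in unmatched_left:
            candidates = [j for j in sorted(unmatched_right)
                          if rule(left[i], right[j])]
            if len(candidates) == 1:
                links.append((i, candidates[0], rule_no))
                unmatched_right.discard(candidates[0])
            else:
                still_unmatched.append(i)  # left for a later rule or manual review
        unmatched_left = still_unmatched
    return links

left = [
    {"first": "Ann", "last": "Lee", "dob": "1970-01-02", "ssn": "111"},
    {"first": "Bob", "last": "Ray", "dob": "1980-05-06", "ssn": ""},
]
right = [
    {"first": "bob ", "last": "RAY", "dob": "1980-05-06", "ssn": ""},
    {"first": "Anne", "last": "Lee", "dob": "1970-01-02", "ssn": "111"},
]
result = link(left, right, [exact_ssn, name_and_dob])
print(result)  # [(0, 1, 0), (1, 0, 1)]
```

Ordering the rules from strictest to loosest and recording which rule produced each link keeps every match explainable, which matters when the linked set is meant to serve as a gold standard.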
AMIA Annual Symposium Proceedings, Feb 1, 2005
We present a model for a health resource locator to help rural primary healthcare providers care for patients. We identify some unique needs of rural providers, argue that a grassroots effort, driven by the community, is the optimal way to address some of those needs, and propose a centralized Internet-based system to drive the whole process.
AMIA Annual Symposium Proceedings, Feb 1, 2007
Triage is a key component of trauma care. Unfortunately, mistriage rates remain high. Machine learning techniques have the potential to improve triage. Our experiment showed that while decision tree induction was as accurate as the most widely accepted trauma triage guidelines, the two performed differently with respect to over- and undertriage.
Center (VUMC) is the migration of data storage and maintenance from proprietary information systems to a vendor-independent external environment. Benefits include easier access to the data by other information systems and data persistence that is independent of a specific vendor. One manner in which we are achieving this goal is through the use of the Web as a distributed user interface for maintaining locally developed relational databases. We chose to use the Web for many reasons including user familiarity, low distribution cost, and minimal development time. The aim of our project is to describe some design challenges of Web-based database maintenance and how we chose to address these challenges. At one level, goals for any database maintenance system should be the same: hiding data complexity, limiting access to sensitive data, adapting easily to changes in the data model, ensuring data integrity, and providing an intuitive user interface. However, the manner in which these goals are achieved in a Web-based architecture may vary from project to project. The project characteristics that influence these design decisions fall into two broad categories: data-related issues and user-related issues. Data-related issues include complexity, sensitivity, stability, and criticality. User-related issues include
Thesis (Ph. D. in Computer Science)--Vanderbilt University, 2001. Includes bibliographical references (leaves 140-145).