David Hand - Academia.edu
Papers by David Hand
Computational Statistics & Data Analysis, 2005
Computers & Mathematics with Applications, 1986
Statistical Science, 2006
The American Statistician, 2003
The American Statistician, 1998
The American Statistician, 1992
Abstract: Choice between two treatments, A and B, is sometimes based on the probability that A will be more effective (score higher, say) than B. Ideally, to estimate this probability a sample of subjects would receive both A and B and the proportion of (A - B) differences which are positive would be used as the estimate. Often, however, both treatments cannot be given to each subject, and inference is based on a trial using two independent samples. Unfortunately, probability structures exist for which P(A - B > 0) for two independent samples is not equal to P(A - B > 0) for matched samples. The two-independent-sample Wilcoxon test statistic addresses the former probability and hence cannot be used to answer the question, “Is the probability that A will do better than B greater than 1/2?” unless further assumptions are made.
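The gap between the two probabilities can be illustrated with a small exact calculation. The sketch below is not taken from the paper: the joint distribution `joint` and the function names are hypothetical, chosen only so that the matched and independent-sample values of P(A - B > 0) fall on opposite sides of 1/2.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (A, B) scores for a matched subject.
# Each pair occurs with probability 1/3; this is an assumption for illustration.
joint = {
    (2, 1): Fraction(1, 3),
    (3, 2): Fraction(1, 3),
    (1, 3): Fraction(1, 3),
}


def prob_matched(joint):
    """P(A - B > 0) when both treatments are given to the same subject."""
    return sum(p for (a, b), p in joint.items() if a - b > 0)


def marginals(joint):
    """Marginal distributions of A and of B implied by the joint distribution."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return pa, pb


def prob_independent(joint):
    """P(A - B > 0) when A and B are drawn independently from their marginals."""
    pa, pb = marginals(joint)
    return sum(pa[a] * pb[b] for a, b in product(pa, pb) if a - b > 0)


matched = prob_matched(joint)
independent = prob_independent(joint)
print(matched, independent)  # prints "2/3 1/3"
```

In this constructed case A and B have identical marginal distributions, yet A beats B in two thirds of matched pairs and in only one third of independent pairings, so a statistic built from two independent samples would point the wrong way on the matched-pairs question.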
Statistics and Computing, 2003
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest
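As a rough illustration of this limitation (not the method of the paper), the sketch below computes the share of variance captured by the first linear principal component for two hypothetical 2-D data sets: points on a straight line and points on a circle. Both are intrinsically one-dimensional, but only the line is a linear subspace. The helper names and the data sets are assumptions made for the example, and the closed-form eigenvalue formula applies only to the 2x2 case.

```python
import math


def covariance_2d(points):
    """Population covariance entries (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return sxx, sxy, syy


def explained_by_first_pc(points):
    """Fraction of total variance along the first linear principal component."""
    sxx, sxy, syy = covariance_2d(points)
    # Largest eigenvalue of the symmetric 2x2 covariance matrix, in closed form.
    half_gap = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2) / 2
    largest = (sxx + syy) / 2 + half_gap
    return largest / (sxx + syy)


# Hypothetical data: a straight line (linear 1-D subspace) ...
line = [(t, 2 * t) for t in range(-5, 6)]
# ... and a circle (1-D, parameterised by angle, but not linear).
n = 200
circle = [
    (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

ratio_line = explained_by_first_pc(line)
ratio_circle = explained_by_first_pc(circle)
print(round(ratio_line, 3), round(ratio_circle, 3))
```

For the line a single linear component carries essentially all of the variance, whereas for the circle it carries only about half, even though one parameter (the angle) describes every point.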
American Journal of Mathematical and Management Sciences, 2000
Advances in Data Analysis and Classification, 2007
Advances in Data Analysis and Classification, 2009
Modern technology has allowed real-time data collection in a variety of domains, ranging from environmental monitoring to healthcare. Consequently, there is a growing need for algorithms capable of performing inferential tasks in an online manner, continuously revising their estimates to reflect the current status of the underlying process. In particular, we are interested in constructing online and temporally adaptive classifiers
Advances in Data Analysis and Classification, 2009
A great many comparative performance assessments of classification rules have been undertaken, ranging from small ones involving just one or two methods, to large ones involving many tens of methods. We are undertaking a meta-analytic study of these studies, attempting to distil some overall conclusions. This paper describes just one of our observations. The dataset analysed in this paper contains
Data Mining and Knowledge Discovery, 2008
Journal of the …, 2007
... Acknowledgements. The work of Piotr Juszczak and Dave Weston described here was supported by the EPSRC under grant number EP/C532589/1: ThinkCrime: Statistical and machine learning tools for plastic card and other personal fraud detection. ...
Journal of Classification, 1996
Statistical Science, 2002
The Knowledge Engineering Review, 1984
Statistical expert systems are attracting increasing attention as a possible way to alleviate the shortage of expert consultant statisticians. This paper summarises the requirements of such systems, showing how the demands of data analysis are different from those of other fields, and describes some recent work.