R. W. Oldford
Research overview
Statistical reasoning, exploratory data analysis, data visualization, and the development of interactive computational environments that support these activities make up the broad areas of my research interests.
I am also interested in the philosophical structure of statistical reasoning, a topic related to the computational structure and methods of interactive data analysis but somewhat removed from software implementation.
And I am always interested in applications of statistical methods in the natural and computational sciences.
Interests
- Data Visualization
- Data Analysis
- Statistical applications
Education
- PhD in Statistics, University of Toronto
- M.Sc. in Statistics, University of Toronto
- B.Math., double major in Statistics and in Combinatorics & Optimization, University of Waterloo
Featured videos
- Hillary Clinton’s emails, and the Shiny server built to explore them: shiny.math.uwaterloo.ca/sas/clinton/
- The (not so) humble (interactive) histogram
- 3D interactive scatterplots in loon
See my YouTube or BitChute channels for more.
Posters
with various students
Software Projects
Loon.ggplot
Turn ggplot2 graphic data structures into interactive loon plots
PairViz
Ordering visualizations using Graph Traversal
qqtest
Self Calibrating Quantile-Quantile Plots for Visual Testing
zenplots
Zigzag Expanded Navigation Plots
Loon
Exploratory interactive data visualization.
Featured Publications
May 2018 Significance, 15(3)
About "her emails"
Patterns in Secretary Clinton’s emails, and a website that allows anyone to explore those patterns interactively.
May 2018 SIAM Journal on Optimization, 28(1)
Euclidean distance matrix completion and point configurations from the minimal spanning tree
The paper introduces a special case of the Euclidean distance matrix completion problem that is of interest in statistical data analysis: only the minimal spanning tree distances are given, and the matrix completion must preserve the minimal spanning tree. A guided random search algorithm is shown to outperform more standard optimization methods, which also impose peculiar and generally unwanted geometric structure on the point configurations produced by their completions.
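The constraint at the heart of this problem can be made concrete. The sketch below is an illustration only, not the guided random search from the paper: it checks whether a candidate completed distance matrix keeps the given tree distances and still has the given tree as its minimal spanning tree (found with Prim's algorithm). The function names `mst_edges` and `preserves_mst` are hypothetical, the check does not test whether the matrix is Euclidean, and with tied distances the MST need not be unique, so a `False` can then reflect tie-breaking.

```python
import math

def mst_edges(D):
    """Prim's algorithm on a dense symmetric distance matrix D (list of lists).
    Returns the MST edges as a set of (i, j) tuples with i < j."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        # add the cheapest vertex not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        # update cheapest links through the newly added vertex
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v] = D[u][v]
                parent[v] = u
    return edges

def preserves_mst(D, tree):
    """True if D keeps every given tree distance and its MST is exactly that tree.
    `tree` maps (i, j) with i < j to the given MST distance (hypothetical format)."""
    kept = all(math.isclose(D[i][j], d) for (i, j), d in tree.items())
    return kept and mst_edges(D) == set(tree)
```

For four points on a line at 0, 1, 2, 3 the path 0-1-2-3 is the MST, so the completed matrix passes; shrinking the (0, 3) distance below 1 makes it fail, because the MST then changes.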
October 2017 Electronic Imaging 2018, Computational Imaging, XVI
Illuminant estimation using ensembles of multivariate regression trees
In this paper, we show that a simple and accurate ensemble model can be learned by (i) using multivariate regression trees to take into account that the chromaticity components of the illuminant are correlated and constrained, and (ii) fitting each tree by directly minimizing a loss function of interest—such as recovery angular error or reproduction angular error—rather than indirectly using the squared-error loss function as a surrogate. We show empirically that overall our method leads to improved performance on diverse image sets.
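The two losses named above have commonly used definitions in the color constancy literature: recovery angular error is the angle between the estimated and true illuminant RGB vectors, and reproduction angular error is the angle between a white surface as reproduced under the estimate (true illuminant divided channel by channel by the estimate) and the achromatic vector (1, 1, 1). A minimal sketch assuming those definitions follows; the paper's exact conventions may differ, the function names are hypothetical, and the regression tree ensemble itself is not shown.

```python
import math

def _angle_deg(a, b):
    """Angle in degrees between two 3-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # clamp so rounding just outside [-1, 1] cannot break acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def recovery_angular_error(estimate, truth):
    """Angle between estimated and true illuminant RGB vectors (assumed definition)."""
    return _angle_deg(estimate, truth)

def reproduction_angular_error(estimate, truth):
    """Angle between the reproduced white (truth / estimate, per channel)
    and the achromatic direction (1, 1, 1) (assumed definition)."""
    reproduced_white = [t / e for t, e in zip(truth, estimate)]
    return _angle_deg(reproduced_white, [1.0, 1.0, 1.0])
```

Both errors are zero whenever the estimate is a positive multiple of the true illuminant, but they generally differ otherwise: for estimate (1, 2, 1) and truth (1, 1, 1) the recovery error is about 19.5 degrees while the reproduction error is about 15.8 degrees, which is why the choice of loss matters when fitting directly to it.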
March 2016 The American Statistician, 70(1), pp. 74-90
Self-Calibrating Quantile–Quantile Plots
Quantile–quantile plots, or qqplots, are an important visual tool for many applications, but their interpretation requires some care and often more experience. This apparent subjectivity is unnecessary. By drawing on the computational and display facilities now widely available, qqplots are easily enriched to help with their interpretation. An overview of quantile functions and quantile–quantile plots is presented against the backdrop of their early historical development. Strengths and shortcomings of the traditional display are described. A new enhanced qqplot, the self-calibrating qqplot, is introduced and demonstrated on a variety of examples, both synthetic and real. Real examples include normal qqplots, log-normal plots, half-normal plots for factorial experiments, qqplots for the average and standard deviation in process improvement applications, detection of multivariate outliers, and the comparison of empirical distributions. Self-calibration is achieved by visually incorporating sampling variation into the qqplot display in a variety of ways. The new qqplot is available through the function qqtest in the R package of the same name.
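The idea of showing sampling variation alongside a qqplot can be sketched in a few lines. The code below computes normal qqplot coordinates and a crude pointwise envelope from simulated normal samples, then counts observations falling outside it. It is an assumption-laden illustration, not the algorithm implemented in qqtest: plotting positions (i - 0.5)/n, standardization by the sample mean and standard deviation, and a min/max envelope are all choices made here, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev

def standardize(xs):
    """Centre by the sample mean and scale by the sample standard deviation."""
    m, s = fmean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def normal_qq_points(sample):
    """(theoretical, observed) pairs for a normal qqplot of the standardized
    sample, using plotting positions (i - 0.5) / n (an assumed convention)."""
    n = len(sample)
    z = NormalDist()  # standard normal
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(standardize(sample))))

def simulated_envelope(n, nreps=999, seed=1):
    """Pointwise min and max of each ordered value over `nreps` standardized
    normal samples of size n: a crude picture of sampling variation."""
    rng = random.Random(seed)
    sims = [sorted(standardize([rng.gauss(0.0, 1.0) for _ in range(n)]))
            for _ in range(nreps)]
    lower = [min(col) for col in zip(*sims)]
    upper = [max(col) for col in zip(*sims)]
    return lower, upper

def points_outside(sample, nreps=999, seed=1):
    """Count ordered observations that fall outside the simulated envelope."""
    lower, upper = simulated_envelope(len(sample), nreps, seed)
    observed = [y for _, y in normal_qq_points(sample)]
    return sum(not (lo <= y <= hi) for y, lo, hi in zip(observed, lower, upper))
```

A sample of 29 zeros and one value of 100 lands well outside the envelope at both ends, whereas a genuinely normal sample rarely has any point outside. Drawing the simulated quantiles on the plot, rather than only counting, is what lets the eye calibrate itself.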
December 2011 Computational Statistics, 26(4)
Graphs as navigational infrastructure for high dimensional data spaces
We propose using graph theoretic results to develop an infrastructure that tracks movement from a display of one set of variables to another. The illustrative example throughout is the real-time morphing of one scatterplot into another. Hurley and Oldford (J Comput Graph Stat 2010) made extensive use of the graph whose nodes are variables and whose edges indicate a paired relationship between them. The present paper introduces several new graphs derivable from this one whose traversals can be described as particular movements through high dimensional spaces. These are connected to known results in graph theory, and those results are applied to the problem of visualizing high-dimensional data.
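One concrete instance of such a traversal: take the complete graph whose nodes are the p variables and whose edges are the p(p - 1)/2 pairwise scatterplots. When p is odd every node has even degree p - 1, so an Eulerian circuit exists; walking it visits every scatterplot exactly once, and consecutive plots share a variable, so each transition can be shown as a rotation in the three-dimensional space of the variables involved. The sketch below uses Hierholzer's algorithm; it is not code from PairViz, the names are hypothetical, and even p (where extra edges or a path would be needed) is simply rejected.

```python
def complete_graph(p):
    """Adjacency sets of the complete graph on variables 0..p-1;
    edge {i, j} stands for the scatterplot of variable i against variable j."""
    return {v: set(range(p)) - {v} for v in range(p)}

def eulerian_circuit(adj, start=0):
    """Hierholzer's algorithm: a closed walk using every edge exactly once.
    Assumes a connected graph; raises ValueError if some degree is odd."""
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        raise ValueError("every vertex needs even degree for an Eulerian circuit")
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # unused edges
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            w = remaining[v].pop()        # follow any unused edge {v, w}
            remaining[w].discard(v)       # ...and mark it used from both ends
            stack.append(w)
        else:
            circuit.append(stack.pop())   # no unused edges left: emit v
    return circuit[::-1]
```

For p = 5 the result is a closed sequence of 11 variable indices whose 10 consecutive pairs are exactly the 10 distinct scatterplots; for p = 4 the function raises `ValueError`, since every node then has odd degree 3.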
August 2000 Statistical Science, 15(3)
Scientific method, statistical method, and the speed of light
What is “statistical method”? Is it the same as “scientific method”? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light, carried out by A. A. Michelson in 1879. Our answer to the second question is negative. To explain why, a history of research on the speed of light up to the time of Michelson’s study is presented. The larger history and the details of a single study allow us to place the method of statistics within the larger context of science.
Students
Current Grad Students
Zehao Xu
Ph.D. student in Statistics
Size proportional Venn and Euler diagrams in 2 and 3 dimensions, vennplot(…) in R, data visualization systems
Former PhD students
Adam Rahman
Data Scientist
Preserving Measured Structure During Generation and Reduction of Multivariate Point Configurations, scagnostics distributions, data reduction, simulation
Adrian Waddell
Statistician
Interactive Visualization and Exploration of High-Dimensional Data, data visualization, loon
Greg Anglin
Research Advisor (Statistics)
A Statistical Programming Environment for Modelling Counting Processes, and An object-oriented array manipulation protocol in a statistical programming environment, Statistical computing environments, event history analysis
Ruth Urner
Assistant Professor
Learning with non-Standard Supervision, theoretical machine learning, clustering, strong and weak learners
Wu Zhou
Senior Researcher, Data Scientist
A new framework for clustering, and A review and implementation of some approaches to metric clustering, ensemble cluster analysis, data mining, and machine learning
Former Masters students
Alex (Xian) Wang
Data Scientist
Interactive Micromaps in R with loon, spatial data visualization and interactive analysis, loon.micromaps
Amanda Murdoch
Senior Analyst
Tracking Eye Movement When Observing Statistical Graphics, data analysis, experimental design, statistical modelling, EyeTrackR
Derek (Daoxiang) Wang
Middle Office and Valuation Senior Analyst
A Visualizing Tool for Conditional Independence, Financial analysis and copula modelling
Erin McLeish
Ph.D. student in Computer Science
Visual Empirical Regions of Influence (VERI) Clustering, Assessment and Alternatives, computational geometry and graph-based clustering
Glenn Lee
Game Mathematician
Eikosograms and Their Software Implementation, Categorical data visualization and analysis, Gaming probability
Greg Anglin
Research Advisor (Statistics)
A Statistical Programming Environment for Modelling Counting Processes, and An object-oriented array manipulation protocol in a statistical programming environment, Statistical computing environments, event history analysis
Hanna Kazhamiaka
Data Scientist
An Experiment in Visual Clustering Using Star Glyph Displays, data visualization, statistical modelling and machine learning
Hugh Chipman
Professor
The Use of Projection and Sectioning in the Graphic Analysis of Multidimensional Data, Statistics
Hudson (Hui) Zhao
Implementing Surfaces in OpenGL … Calls from Macintosh Common Lisp, data visualization, OpenGL
(Jack) Jiahua Liu
Master of Divinity student
Glyphs and pixel-oriented glyphs for data visualization in R
Jim Adams
Senior Advisor for Pricing and Contracting
A Study of Alaskan King Crabs, Paralithodes camtschatica (Tilesius), Near Kodiak Island, Alaska, 1960-1986, Statistical Data Analysis, Biostatistics
Lijie (Justine) Fu
Lead Software Engineer
Implementation of Three-dimensional Scagnostics, data visualization, geometric graphs, scagnostics3D
Michael Lewis
(Deceased)
Constraint-based programming in statistics
Nan (Tina) Zhao
A Preliminary Statistical Analysis on Risk Factors for Dementia and CIND from the Canadian Study of Health and Aging
Natasha Wiebe
Research Manager
Colour Parameterization in a Multiparametric Image Interface, interactive data visualization, biostatistics
Paul Poirier
Visualizing surfaces, Common Lisp implementation of hidden-line 3D rotating surfaces
Qing Li
Probability of Carrying a Mutation of Colorectal Cancer Gene hMSH2/hMLH1 Based on Family History
Tracey (Xin) Chen
Senior Fraud Analyst
Visual Patterns with CCmaps and Magnification Algorithms, data visualization, spatial statistics, local map distortion
Weicong (Vivi) Ma
Data Engineer
On the Utility of Adding An Abstract Domain and Attribute Paths to SQL, Data Base Theory and Engineering
Wenqing Liu
Data Scientist
TREC, tree reduced ensemble clustering, data science, clustering, machine learning
Wu Zhou
Senior Researcher, Data Scientist
A new framework for clustering, and A review and implementation of some approaches to metric clustering, ensemble cluster analysis, data mining, and machine learning
Kathy (Xiaomei) Yu
Marketing Campaign Design and Analytics Manager
R package tidytable, a visualization tool for multi-way tables, data visualization, automated table analyses
Zehao Xu
Ph.D. student in Statistics
Size proportional Venn and Euler diagrams in 2 and 3 dimensions, vennplot(…) in R, data visualization systems