David Steel - Academia.edu

Papers by David Steel

Design and Analysis of Surveys Repeated over Time

Handbook of Statistics, 2009

This lecture will review the major issues associated with the design and analysis of repeated surveys. The interaction between the design of a repeated survey and the methods used for estimation and analysis will be examined. The choice of rotation pattern will be considered in terms of the impact on the estimation of levels and changes. Composite and other forms of estimators will be reviewed and the interaction between design and estimation explored. Estimation of seasonally adjusted and trend estimates from repeated surveys will also be considered.

GENERATING INTER-CORRELATED OBSERVATIONS UNDER A SPECIFIED SPATIAL MODEL

The only requirement for generating this random process is that its variance-covariance matrix be defined. Since the random process is defined in three-dimensional space, we can use a spatial model. One spatial model that defines the random process takes the form of a variogram, which is a function of the distance between pairs of observations. The variance-covariance matrix may be determined in relation to two other properties, the correlogram and the covariogram.
The simulation process starts by generating random points within a region of a particular shape. The locations are uniformly distributed within the region. Let V be the variance-covariance matrix of the random process Y[L]. The random process Y[L] may be defined by the semivariogram model γ(d_ij), where d_ij is the Cartesian distance between two different individuals within the domain D with boundary B. Distribution-based approaches can then be applied to generate random observations using the Cholesky decomposition.
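
The procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the region is taken to be a unit cube, the semivariogram is assumed to be an exponential model with hypothetical `sill` and `range_` parameters, the process is assumed second-order stationary (so that the covariance is the sill minus the semivariogram), and the observations are assumed Gaussian. The function names are invented for this example.

```python
import numpy as np


def exponential_semivariogram(d, sill=1.0, range_=0.3):
    """Semivariogram gamma(d) of an exponential model (an assumed choice)."""
    return sill * (1.0 - np.exp(-d / range_))


def simulate_spatial_process(n, sill=1.0, range_=0.3, mean=0.0, seed=0):
    """Draw n spatially correlated observations via a Cholesky factor of V."""
    rng = np.random.default_rng(seed)

    # Locations uniformly distributed in the unit cube (assumed region shape).
    locations = rng.uniform(0.0, 1.0, size=(n, 3))

    # Pairwise distances d_ij between all locations.
    diff = locations[:, None, :] - locations[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    # Under second-order stationarity: C(d) = sill - gamma(d).
    V = sill - exponential_semivariogram(d, sill, range_)

    # V = L L^T, so y = mean + L z has covariance V when z ~ N(0, I).
    # A tiny diagonal jitter guards against numerical non-positive-definiteness.
    L = np.linalg.cholesky(V + 1e-10 * np.eye(n))
    z = rng.standard_normal(n)
    y = mean + L @ z
    return locations, V, y


if __name__ == "__main__":
    locations, V, y = simulate_spatial_process(200)
    print(locations.shape, V.shape, y.shape)
```

Other variogram families (spherical, Gaussian) could be substituted for the exponential model, provided the resulting matrix V remains positive definite.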

Identification of food groups for use in a self-administered, computer-assisted diet history interview for use in Australia

To develop a set of food groups for use in a self-administered, computer-assisted diet history interview for use in Australia by combining foods into groups so as to minimize database error in the macronutrient values for the groups. The program needs to appropriately balance the level of detail used with the load on respondents and errors associated with categorization of foods into groups.

Exploring Small Area Population Structure with Census Data

Methodology and Epistemology of Multilevel Analysis, 2003

Analysis Combining Survey and Geographically Aggregated Data

Wiley Series in Survey Methodology, 2003

Two-stage sample design with small clusters

Key Words: sample design, household surveys, telephone surveys.

In Search of Spatial Structures

In Search of Spatial Structures (Using multilevel structures to understand the MAUP)

Accounting for the uncertainty of information on clustering in the design of a clustered sample

An important decision that has to be made in developing the design of a cluster or multi-stage sampling scheme is the number of units to select at each stage of selection. For a two-stage design we need to decide the number of units to select from each Primary Sampling Unit (PSU) in the sample. A common approach is to estimate the costs and the variance components associated with each stage of selection and determine an optimal design. This is usually done for estimates of the means or totals of one or a small number of variables. In practice the measure of intra-cluster homogeneity, which is the ratio of the variance components, needs to be estimated from a pilot study or historical data. There may be considerable uncertainty about the intra-cluster correlation. The parameter can be close to zero, and the estimate may not even differ significantly from zero; however, a design based on zero intra-cluster correlation would be highly clustered and sensitive to any failure of this assumption. This paper considers the effect of uncertainty about the intra-cluster correlation and other relevant population parameters on sample design. We develop an approach to assess this uncertainty using a Bayesian bootstrap method.
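
The sensitivity to the intra-cluster correlation mentioned in this abstract can be illustrated with the classical two-stage cost-variance trade-off. The sketch below is not the paper's Bayesian bootstrap method; it assumes the textbook linear cost model C = c1·n + c2·n·m (with hypothetical per-PSU cost `c1` and per-unit cost `c2`) and the usual design effect 1 + (m − 1)ρ, under which the optimal number of units per PSU is m_opt = sqrt((c1/c2)·(1 − ρ)/ρ). The function names and cost values are invented for this example.

```python
import math


def optimal_units_per_psu(c1, c2, rho):
    """Optimal units per PSU under a linear cost model (textbook formula).

    c1: cost of including one PSU (hypothetical parameter)
    c2: cost of including one unit within a PSU (hypothetical parameter)
    rho: intra-cluster correlation, strictly between 0 and 1
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.sqrt((c1 / c2) * (1.0 - rho) / rho)


def design_effect(m, rho):
    """Variance inflation from selecting m units per PSU relative to SRS."""
    return 1.0 + (m - 1.0) * rho


if __name__ == "__main__":
    # As rho approaches zero the nominally optimal design becomes
    # very highly clustered.
    for rho in (0.001, 0.01, 0.05, 0.2):
        m = optimal_units_per_psu(c1=10.0, c2=1.0, rho=rho)
        print(f"rho={rho:<6} m_opt={m:8.1f}")
```

Because m_opt grows without bound as ρ approaches zero, a small error in an estimated ρ near zero can change the nominally optimal design substantially, which is the kind of uncertainty the abstract describes.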

Centre for Statistical and Survey Methodology

Semiparametric regression is a fusion between parametric regression and nonparametric regression and the title of a book that we published on the topic in early 2003. We review developments in the field during the five year period since the book was written. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.

Adjusting for Aggregation Effects in Ecological Regression

Investigation of relative risk estimates from studies of the same population with contrasting response rates and designs

CONDITIONAL AND UNCONDITIONAL MODELS IN MODEL-ASSISTED ESTIMATION OF FINITE POPULATION TOTALS

Pak. J. Statist. 2011 Vol. 27 (4), 529-541

Measuring and analyzing the within group homogeneity of multi-category variables

Many variables have within-group homogeneity (similarity of values for the individual units that comprise the groups). Measures of within-group homogeneity are useful for the sample design and statistical analysis of datasets for populations that contain groups, such as individuals in geographical areas. Homogeneity measures can easily be defined for continuous or dichotomous variables. Here, we propose a homogeneity measure for a multi-category variable, and show how this measure can be calculated without access to individual-level data. We apply the measure to data from the UK census, show how it can be related to the homogeneity of particular linear combinations of the categories, called Canonical Grouping Variables (CGVs), and explain how these are interpreted.

Conditional and Unconditional Models in Model-Assisted Estimation of Finite Population Totals

Multiple membership models for social network and group dependencies

... In the ego-net approach, we break the network into n ego-nets, where the egos then are the 'groups' and their alters are group members. ... Social Networks 24 (1), 21–47. URL http://linkinghub.elsevier.com/retrieve/pii/S0378873301000491 Snijders, TAB, Baerveldt, C., ...

National Institute for Applied Statistics Research Australia

Potential gains from using unit level cost information in a model-assisted framework

In developing the sample design for a survey we attempt to produce a good design for the funds available. Information on costs can be used to develop sample designs that minimise the sampling variance of an estimator of a total for a fixed cost. Improvements in survey management systems mean that it is now sometimes possible to estimate the cost of including each unit in the sample. This paper develops relatively simple approaches to determine whether the potential gains arising from using this unit-level cost information are likely to be of practical use. It is shown that the key factor is the coefficient of variation of the costs relative to the coefficient of variation of the relative error on the estimated cost coefficients.

Adjusting for Aggregation Effects in Ecological Regression

Targets of Inference and Methods of Analysis in Multi-Level Populations
