David Steel - Academia.edu
Papers by David Steel
Handbook of Statistics, 2009
This lecture will review the major issues associated with the design and analysis of repeated surveys. The interaction between the design of a repeated survey and the methods used for estimation and analysis will be examined. The choice of rotation pattern will be considered in terms of the impact on the estimation of levels and changes. Composite and other forms of estimators will be reviewed and the interaction between design and estimation explored. Estimation of seasonally adjusted and trend estimates from repeated surveys will also be considered.
The only requirement for generating this random process is to define its variance-covariance matrix. Since the random process is defined in three-dimensional space, we can use a spatial model. One spatial model that defines the random process takes the form of a variogram, which is a function of the distance between pairs of observations. The variance-covariance matrix may be determined in relation to two other properties, the correlogram and the covariogram. The simulation process starts by generating random points within a region of a particular shape. The locations are uniformly distributed within the region. Let V be the variance-covariance matrix of the random process Y[L]. The random process Y[L] may be defined by the semivariogram model γ(d_ij), where d_ij is the Cartesian distance between two different individuals within domain D with boundary B. Distribution-based approaches can then be applied to generate random observations using Choleski decomposition.
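The simulation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the unit cube standing in for the region D, the exponential covariance model, and the names `exponential_covariance`, `simulate_field`, `sill` and `range_` are all assumptions introduced here. The only relation relied on is the standard one for a second-order stationary process, γ(d) = C(0) - C(d), which links the semivariogram to the covariogram.

```python
import numpy as np


def exponential_covariance(d, sill=1.0, range_=0.3):
    """Assumed covariogram C(d) = sill * exp(-d / range_).

    The matching semivariogram is gamma(d) = sill - C(d).
    """
    return sill * np.exp(-d / range_)


def simulate_field(n=200, sill=1.0, range_=0.3, seed=0):
    """Simulate one Gaussian realisation at n uniformly placed 3-D points."""
    rng = np.random.default_rng(seed)

    # Step 1: locations uniformly distributed in the region
    # (the unit cube is an assumed stand-in for the domain D).
    locs = rng.uniform(0.0, 1.0, size=(n, 3))

    # Step 2: pairwise Cartesian (Euclidean) distances d_ij.
    diff = locs[:, None, :] - locs[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 3: variance-covariance matrix V from the assumed model.
    V = exponential_covariance(d, sill, range_)

    # Step 4: Cholesky factor V = L L^T; a tiny diagonal jitter guards
    # against round-off making V numerically non-positive-definite.
    L = np.linalg.cholesky(V + 1e-10 * np.eye(n))

    # Step 5: y = L z with z ~ N(0, I) has covariance L L^T = V.
    z = rng.standard_normal(n)
    y = L @ z
    return locs, V, y


if __name__ == "__main__":
    locs, V, y = simulate_field()
    print(locs.shape, V.shape, y.shape)
```

Multiplying independent standard normal draws by the Cholesky factor is what gives the simulated values the target covariance; a different variogram model would only change step 3.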
To develop a set of food groups for use in a self-administered, computer-assisted diet history interview for use in Australia by combining foods into groups so as to minimize database error in the macronutrient values for the groups. The program needs to appropriately balance the level of detail used with the load on respondents and errors associated with categorization of foods into groups.
Methodology and Epistemology of Multilevel Analysis, 2003
Wiley Series in Survey Methodology, 2003
Key Words: sample design, household surveys, telephone surveys.
An important decision that has to be made in developing the design of a cluster or multi-stage sampling scheme is the number of units to select at each stage of selection. For a two-stage design we need to decide the number of units to select from each Primary Sampling Unit (PSU) in the sample. A common approach is to estimate the costs and the variance components associated with each stage of selection and determine an optimal design. This is usually done for estimates of the means or totals of one or a small number of variables. In practice the measure of intra-cluster homogeneity, which is the ratio of the variance components, needs to be estimated from a pilot study or historical data. There may be considerable uncertainty about the intra-cluster correlation. The parameter can be close to zero and the estimate may not even differ significantly from zero; however, a design based on zero intra-cluster correlation would be highly clustered and sensitive to any failure of this assumption. This paper considers the effect of uncertainty about the intra-cluster correlation and other relevant population parameters on sample design. We develop an approach to assess this uncertainty using a Bayesian bootstrap method.
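To illustrate why a design becomes highly clustered as the intra-cluster correlation approaches zero, the sketch below uses the classical textbook result for a two-stage design. It is not the Bayesian bootstrap approach of the paper. The assumptions are a linear cost model C = c1·n + c2·n·m (n PSUs, m units per PSU) and a variance of the sample mean proportional to [1 + (m - 1)ρ]/(nm); under these the cost-optimal subsample size is m = sqrt((c1/c2)·(1 - ρ)/ρ). The function names and the cost values in the loop are hypothetical.

```python
import math


def optimal_subsample_size(c1, c2, rho):
    """Textbook optimum units per PSU under the assumed cost and variance model.

    c1: cost of adding one PSU, c2: cost of adding one unit within a PSU,
    rho: intra-cluster correlation, required to lie strictly in (0, 1).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.sqrt((c1 / c2) * (1.0 - rho) / rho)


def design_effect(m, rho):
    """Variance inflation relative to simple random sampling: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1.0) * rho


if __name__ == "__main__":
    # Hypothetical costs; the optimum grows without bound as rho -> 0.
    for rho in (0.2, 0.05, 0.01, 0.001):
        m = optimal_subsample_size(c1=10.0, c2=1.0, rho=rho)
        print(f"rho={rho}: m_opt={m:.1f}, deff={design_effect(m, rho):.2f}")
```

Because the optimum scales with sqrt((1 - ρ)/ρ), small errors in an estimate of ρ near zero translate into large changes in the recommended number of units per PSU, which is the sensitivity the abstract describes.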
Semiparametric regression is a fusion between parametric regression and nonparametric regression and the title of a book that we published on the topic in early 2003. We review developments in the field during the five-year period since the book was written. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
Many variables have within-group homogeneity (similarity of values for the individual units that comprise the groups). Measures of within-group homogeneity are useful for the sample design and statistical analysis of datasets for populations that contain groups, such as individuals in geographical areas. Homogeneity measures can easily be defined for continuous or dichotomous variables. Here, we propose a homogeneity measure for a multi-category variable, and show how this measure can be calculated without access to individual-level data. We apply the measure to data from the UK census, show how it can be related to the homogeneity of particular linear combinations of the categories, called Canonical Grouping Variables (CGVs), and explain how these are interpreted.
... In the ego-net approach, we break the network into n ego-nets, where the egos then are the 'groups' and their alters are group members. ... Social Networks 24 (1), 21–47. URL http://linkinghub.elsevier.com/retrieve/pii/S0378873301000491 Snijders, TAB, Baerveldt, C., ...
In developing the sample design for a survey we attempt to produce a good design for the funds available. Information on costs can be used to develop sample designs that minimise the sampling variance of an estimator of a total for fixed cost. Improvements in survey management systems mean that it is now sometimes possible to estimate the cost of including each unit in the sample. This paper develops relatively simple approaches to determine whether the potential gains arising from using this unit-level cost information are likely to be of practical use. It is shown that the key factor is the coefficient of variation of the costs relative to the coefficient of variation of the relative error on the estimated cost coefficients.