Susana Rubin - Academia.edu
Papers by Susana Rubin
Proceedings of the Amer. Statist. Assoc. Section on …, 2006
In business surveys, data are typically skewed, and the standard approach to small area estimation (SAE) based on linear mixed models leads to inefficient estimates. In this paper, we discuss SAE techniques for skewed data that are linear following a suitable ...
Journal of Survey Statistics and Methodology, 2016
The pseudo-Empirical Best Linear Unbiased Predictor (pseudo-EBLUP) was previously developed for simple means under the basic nested error regression model with constant error variances. In this paper, we extend the estimator to the pseudo-EBLUP of weighted means under the one-fold nested error regression model with heteroscedastic errors. The extended pseudo-EBLUP estimator takes into account the survey weights and the economic weights that make up the weighted mean. We obtain a second-order approximation to the mean squared error (MSE) of the extended pseudo-EBLUP estimator and an estimator of the MSE that is unbiased, also up to the second order. We illustrate the methodology using a synthetic population based on a sample from the Canadian Survey of Employment, Payrolls and Hours (SEPH). We compare the extended pseudo-EBLUP with other model-based and direct cross-sectional domain estimators in terms of design-based MSE, and compare the model-based MSE estimates with the Monte Carlo design-based MSE.
Statistics & Risk Modeling, 1986
Journal of Multivariate Analysis, 2011
We are concerned about inference on a parameter of a stochastic model with an estimator using data from a complex sample. Classical sampling theory concerns ...
We study the feasibility of producing relevant statistics to measure the impact of globalization on the Canadian economy. In this project we require means of key economic variables for the wholesale industry at the level of “Trade Group” by “Province” by “Globalization Indicators”. We consider small area estimation using area-level models with random effects and use penalized splines to accommodate departures from linearity. We propose a new bootstrap method to estimate the mean squared errors of the small area estimators. We illustrate the methodology with data from a particular trade group.
We use the weighted sample partial likelihood score (SPLS) function to fit the proportional hazards regression model to survey data with complex sampling designs. The sample maximum partial likelihood estimator is the solution of the sample partial likelihood score function. Many authors have applied this method to fit survival survey data. Binder (1992) dealt with inference on the descriptive census population parameter, that is, design-based inference on the maximum partial likelihood estimate that could be calculated had a census been taken on the finite population. Lin (2000) gave a formal justification of Binder’s method under the super-population approach and dealt with inference on the model parameter. Neither Binder nor Lin provided conditions for the respective asymptotic results to hold. Rubin-Bleuer (2003) uses Lin’s (2000) setup of the super-population approach and develops counting process methodology for a joint design-model space to obtain, under stated sufficient model a...
An approximation to the Sample Partial Likelihood Score (SPLS) in design probability was first proposed by Binder (1992) and then considered by Lin (2000), in order to derive asymptotic theory for the parameters of the proportional hazards regression model. However, they did not provide conditions under which this approximation holds. In this paper I use Lin’s (2000) setup of the super-population approach and develop counting process methodology for a joint design-model space to obtain, under sufficient model and design conditions, a rigorous proof of the approximation to the SPLS proposed by Binder (1992).
The Survey of Employment, Payrolls and Hours provides monthly estimates of payroll, employment, paid hours and earnings. The proposed new design uses survey and payroll deduction (administrative) data to produce generalized regression (GREG) estimators for average weekly earnings, which are approximately unbiased and have a controlled coefficient of variation (CV) for pre-determined strata. As in most surveys, there are many domains of interest with small sample sizes for which the GREG estimators might have unacceptably large measures of error. In this paper, we extend the pseudo-EBLUP method of You and Rao (2002) to the case of unequal error variances, and we study the relative performance of several cross-sectional small domain estimators using real and synthetic populations.
We are concerned about inference on a parameter of a stochastic model with an estimator using data from a complex sample. Classical sampling theory concerns inferences for finite population parameters. Hájek (1960), Krewski and Rao (1981), Binder (1983) and others studied and obtained results on the asymptotic properties of the sample estimator under simple random sampling and some complex designs. On the other hand, Hartley and Sielken (1975), Fuller (1975), Francisco and Fuller (1991) and others studied the properties of the sample estimator with respect to a model parameter, sometimes called a superpopulation parameter. They obtained asymptotic results for regression sample estimators using data from certain complex sampling designs. Underlying their setups was the notion of a "superpopulation" defined on a probability space (Ω, F, P), with the finite population considered a realization of it for an outcome ω ∈ Ω. The observed sample would be the second phase...
We establish a mathematical framework that formally validates the two-phase "superpopulation viewpoint" proposed by Hartley and Sielken (1975), by defining a product probability space which includes both the design space and the model space. We develop a general methodology that combines finite population sampling theory and the classical theory of infinite population sampling to account for the underlying processes that produce the data. Key results in this article are: the sample estimator and the model statistic are asymptotically independent; if a sequence converges in design law, it also converges in the law of the product space; and the distribution theory of the sample estimating equation estimator around a super-population parameter. We also study the interplay between dependence and independence of random variables when viewed in the design space, the product space and the model space, and apply it to show formally that under a "simple random sample without replacement" design we can "ignore" the design and work in the realm of the model space, but that under "simple random sample with replacement" we cannot ignore the design.