A Large-Scale Validation Study of Measurement Errors in Longitudinal Survey Data
Related papers
The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?
Journal of Labor Economics, 1991
This paper examines the properties and prevalence of measurement error in longitudinal earnings data. The analysis compares Current Population Survey data to administrative Social Security payroll tax records for a sample of heads of households over two years. In contrast to the typically assumed properties of measurement error, the results indicate that errors are serially correlated over the two years and negatively correlated with true earnings (i.e., mean reverting). Moreover, reported earnings are more reliable for females than for males. Overall, the ratio of the variance of the signal to the total variance is .82 for men and .92 for women. These ratios fall to .65 and .81 when the data are specified in first differences. The estimates suggest that longitudinal earnings data may be more reliable than previously believed.
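A minimal sketch of the quantities discussed in the abstract above, assuming a matched two-year panel in which the administrative record is treated as the true value. The functions `signal_share`, `error_diagnostics`, and `simulate_panel` are hypothetical names, and the simulated parameters are illustrative only, not estimates from the paper. The sketch computes the signal-to-total variance ratio in levels and in first differences, the serial correlation of the errors, and the slope of the error on true earnings (negative under mean reversion).

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    # Population covariance; _cov(xs, xs) is the population variance.
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def signal_share(true_vals, reported_vals):
    # Var(true) / Var(reported): the signal-to-total variance ratio.
    return _cov(true_vals, true_vals) / _cov(reported_vals, reported_vals)


def error_diagnostics(true1, true2, rep1, rep2):
    """Diagnostics for a two-year matched panel (survey report vs. admin record)."""
    e1 = [r - t for r, t in zip(rep1, true1)]
    e2 = [r - t for r, t in zip(rep2, true2)]
    d_true = [b - a for a, b in zip(true1, true2)]
    d_rep = [b - a for a, b in zip(rep1, rep2)]
    return {
        "share_levels": signal_share(true1, rep1),
        "share_first_diff": signal_share(d_true, d_rep),
        # Positive value: errors are serially correlated across the two years.
        "error_serial_corr": _cov(e1, e2) / (_cov(e1, e1) * _cov(e2, e2)) ** 0.5,
        # Negative slope of error on truth: errors are mean reverting.
        "mean_reversion_slope": _cov(e1, true1) / _cov(true1, true1),
    }


def simulate_panel(n=20000, seed=0):
    """Hypothetical data-generating process; all parameters are illustrative."""
    rng = random.Random(seed)
    t1, t2, r1, r2 = [], [], [], []
    for _ in range(n):
        permanent = rng.gauss(10.0, 0.6)        # persistent log-earnings component
        persistent_err = rng.gauss(0.0, 0.2)    # error component shared by both years
        for t_list, r_list in ((t1, r1), (t2, r2)):
            true_val = permanent + rng.gauss(0.0, 0.2)
            err = -0.05 * (true_val - 10.0) + persistent_err + rng.gauss(0.0, 0.2)
            t_list.append(true_val)
            r_list.append(true_val + err)
    return t1, t2, r1, r2


diag = error_diagnostics(*simulate_panel())
print({k: round(v, 3) for k, v in diag.items()})
```

In this simulation, differencing removes the persistent component of true earnings, which carries most of the signal variance, while the year-specific error survives, so the ratio drops in first differences, as it does in the abstract. Note that with mean-reverting errors, Var(true)/Var(reported) can in principle exceed 1, so the degree of mean reversion matters for interpreting this ratio.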
Estimating Measurement Error in Annual Job Earnings: A Comparison of Survey and Administrative Data
Review of Economics and Statistics, 2013
We propose a new methodology that does not assume a prior specification of the statistical properties of the measurement errors and treats all sources as noisy measures of some underlying true value. The unobservable true value can be represented as a weighted average of all available measures, using weights that must be specified a priori unless there has been a truth audit. The Census Bureau's Survey of Income and Program Participation (SIPP) survey jobs are linked to Social Security Administration earnings data, creating two potential annual earnings observations. The reliability statistics for both sources are quite similar except for cases where the SIPP used imputations for some missing monthly earnings reports.
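A minimal sketch of the weighted-average idea described above, under stated assumptions: the unobserved true value is built as `T = w * survey + (1 - w) * admin` with a weight fixed a priori, and, as a stand-in reliability statistic, each source's reliability is taken to be its squared correlation with `T`. That statistic, the function names, and the example figures are hypothetical; the paper's own reliability measures may be defined differently.

```python
from math import sqrt


def _corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def combine_measures(survey, admin, w_survey):
    """Unobserved true value as a weighted average: T = w*survey + (1 - w)*admin."""
    if not 0.0 <= w_survey <= 1.0:
        raise ValueError("w_survey must lie in [0, 1]")
    return [w_survey * s + (1.0 - w_survey) * a for s, a in zip(survey, admin)]


def reliability_stats(survey, admin, w_survey):
    # Assumed statistic: squared correlation of each measure with the constructed T.
    truth = combine_measures(survey, admin, w_survey)
    return {"survey": _corr(survey, truth) ** 2, "admin": _corr(admin, truth) ** 2}


survey = [21.0, 35.5, 18.2, 50.1, 27.3]  # hypothetical SIPP annual earnings ($000)
admin = [20.4, 36.0, 19.9, 48.7, 26.1]   # hypothetical SSA earnings for the same jobs
for w in (0.0, 0.5, 1.0):
    print(w, reliability_stats(survey, admin, w))
```

With `w = 1` the survey is perfectly reliable by construction, and with `w = 0` the administrative record is; the results are only as meaningful as the chosen weights, which is why they must be fixed a priori unless a truth audit is available.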
2000
This paper analyzes income misreporting propensities and magnitudes using the 1992 SIPP longitudinal file matched to Social Security Summary Earnings Records, focusing specifically on wage and salary earnings and self-employment earnings. Our findings suggest that the 1992 SIPP accurately estimates the net number of earnings recipients but tends to underestimate the amounts received. The misreporting pattern reveals that respondents at the lowest end of the income distribution tend to overreport earnings, while those at the higher end of the earnings distribution are more likely to underreport earnings. Additionally, demographic characteristics can be used in a predictive model of misreporting, but they explain a larger fraction of the variation in overreporting than in underreporting.
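A minimal sketch of how matched survey/administrative pairs could be classified as over-, under-, or accurately reported and then tabulated across groups ranked by administrative earnings. The 5% relative tolerance band, the function names, and the test pairs are hypothetical choices for illustration, not the paper's definitions.

```python
from collections import Counter


def classify_report(survey, admin, tol=0.05):
    """Label one matched report; tol is a hypothetical relative tolerance band."""
    if admin == 0:
        return "accurate" if survey == 0 else "over"
    rel = (survey - admin) / admin
    if rel > tol:
        return "over"
    if rel < -tol:
        return "under"
    return "accurate"


def misreporting_by_group(pairs, n_groups=4, tol=0.05):
    """Shares of over/under/accurate reports within groups ranked by admin earnings."""
    if len(pairs) < n_groups:
        raise ValueError("need at least one pair per group")
    ranked = sorted(pairs, key=lambda p: p[1])
    size = len(ranked)
    table = []
    for g in range(n_groups):
        chunk = ranked[g * size // n_groups:(g + 1) * size // n_groups]
        counts = Counter(classify_report(s, a, tol) for s, a in chunk)
        table.append({k: counts[k] / len(chunk) for k in ("over", "under", "accurate")})
    return table


def net_recipient_gap(pairs):
    # Survey recipients minus administrative recipients; 0 means the net count matches.
    return sum(s > 0 for s, _ in pairs) - sum(a > 0 for _, a in pairs)
```

Because the net recipient count can match even when individual reports disagree, a survey can get the number of recipients right while still misstating amounts, which is the combination the abstract reports.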
Unit nonresponse errors in income surveys: a case study
Quality & Quantity, 2011
A survey on the economic and social conditions of households in the city of Modena was carried out in 2002 and in 2006 (two waves) by the CAPP (Centre for Analyses of Public Policies). In the first wave of 2002, each designated sampling unit (i.e., the family) had three units as reserves. If the designated unit refused to be interviewed, the interviewer contacted the three reserves, one after the other, until obtaining either one respondent or four non-participant units. At the end of the survey, four categories of units were distinguished: interviewees, refusals, noncontacts, and unused reserves. All units were matched with their corresponding records in the databases of the Ministry of Finance for 2002 and the Census of 2001. The resulting data set permitted the analysis of unit (total) nonresponse. The distribution of fiscal income showed different shapes for the four categories, implying selective participation of the families. Interviewees showed a positive income bias of about €600, holding other factors constant. The significant factors affecting nonresponse were selected via backward elimination in a logit model and with the lasso method. Participation increased with fiscal income, age, and education level (secondary school and university degree), while it decreased among entrepreneurs, independent workers, managers, and medium-to-low skilled workers.
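The reserve-substitution protocol above can be sketched as a small procedure that contacts the designated family and then each reserve in order, stopping at the first interview, and assigns every unit to one of the four outcome categories. This is a sketch under assumptions: `run_contact_sequence` and the `outcome` callback are hypothetical, and a noncontact of the designated family is assumed to trigger the move to reserves just as a refusal does, which the four-category outcome suggests.

```python
from typing import Callable, Dict, List


def run_contact_sequence(designated: str, reserves: List[str],
                         outcome: Callable[[str], str]) -> Dict[str, str]:
    """Contact the designated family, then each reserve in order, until one interview.

    outcome(unit) is a hypothetical callback returning "interview", "refusal",
    or "noncontact" for a contacted unit.
    """
    queue = [designated] + list(reserves)
    status: Dict[str, str] = {}
    for i, unit in enumerate(queue):
        result = outcome(unit)
        if result == "interview":
            status[unit] = "interviewee"
            # Reserves after the first respondent are never contacted.
            for rest in queue[i + 1:]:
                status[rest] = "unused_reserve"
            return status
        if result not in ("refusal", "noncontact"):
            raise ValueError(f"unexpected outcome {result!r} for {unit}")
        status[unit] = result
    # Every unit (designated family plus its reserves) was a non-participant.
    return status


outcomes = {"F1": "refusal", "F1-r1": "noncontact", "F1-r2": "interview", "F1-r3": "interview"}
print(run_contact_sequence("F1", ["F1-r1", "F1-r2", "F1-r3"], outcomes.__getitem__))
```

Recording the status of every unit, including unused reserves, is what makes the later comparison against fiscal records possible: each category can be matched and its income distribution compared.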
Some Consequences of Measurement Error in Survey Data
American Journal of Political Science, 1974
socialization data, this paper first presents some estimates of the extent of measurement error in several standard face sheet items. After the presence of measurement error is demonstrated, two techniques involving multiple indicators and observations over time are employed to estimate the effects of measurement error on bivariate correlation coefficients, with party identification providing the substantive vehicle of the analysis. In general, the analysis suggests that random measurement error may have a major impact on our coefficients and thereby result in misleading inferences.

The advent of data archives such as the Inter-University Consortium for Political Research has been a boon to researchers wishing to engage in secondary analysis. However, reliance on data collected by others has a number of limitations, some quite obvious and others less so. In the former category is the likelihood that important variables were omitted in the data collection or that key concepts were not operationalized in a way suitable for the secondary analyst. But a more subtle problem of secondary analysis is that the investigator often has little feel for the quality of the data, that is, for the extent and nature of the measurement error in the data. Hence, this paper will present some estimates of the amount of measurement error for some standard face sheet items in two survey data sets collected by a social science institute renowned for its quality control procedures. Then the effects of measurement error on correlation coefficients will be evaluated by a multiple-indicator approach and an observations-over-time strategy, both of which involve the use of path analysis techniques. By measurement error is meant any deviation from the true value of a …
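Two standard calculations illustrate how reliability estimates feed into corrected correlations. Spearman's correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities, and Heise's three-wave model is one well-known path-analytic way to estimate reliability from observations over time. The paper's exact procedures may differ; the function names and example correlations here are hypothetical, and the Heise estimate assumes constant reliability across waves, uncorrelated errors, and a lag-1 process for the true scores.

```python
from math import sqrt


def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation: r_true = r_obs / sqrt(rel_x * rel_y)."""
    return r_obs / sqrt(rel_x * rel_y)


def heise_reliability(r12, r23, r13):
    """Heise's three-wave reliability estimate, r12 * r23 / r13.

    Assumes constant reliability across waves, uncorrelated errors, and a lag-1
    (first-order Markov) process for the true scores.
    """
    return r12 * r23 / r13


def heise_stability(r12, r23, r13):
    # Implied true-score stability between waves 1 and 3: r13**2 / (r12 * r23).
    return r13 ** 2 / (r12 * r23)


# Hypothetical example: reliability 0.8 and true wave-to-wave stability 0.9 imply
# observed correlations r12 = r23 = 0.8 * 0.9 = 0.72 and r13 = 0.8 * 0.81 = 0.648.
print(heise_reliability(0.72, 0.72, 0.648), heise_stability(0.72, 0.72, 0.648))
print(disattenuate(0.4, 0.8, 0.5))
```

The worked example shows why random error matters for inference: an observed correlation of .4 between two items with reliabilities .8 and .5 corresponds to a true-score correlation of about .63.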
2007
Earnings nonresponse is currently about 30% in the CPS-ORG and 20% in the March CPS. Census imputes missing earnings by assigning nonrespondents the reported earnings of matched donors. Even if nonresponse is random, a wage equation suffers severe "match bias" on coefficients attached to attributes not used as imputation match criteria (union status, industry, foreign-born, etc.) and to imperfectly matched criteria (schooling, age, etc.). Bollinger-Hirsch (JOLE, July 2006) show that if nonresponse is conditionally missing at random (i.e., ignorable nonresponse), then unbiased wage equation estimation can be achieved in several ways, most easily by deleting imputed earners from the sample. The focus of this paper is twofold. First, we examine whether nonresponse is ignorable in the CPS-ORG and March CPS. Second, we assess the effect of "proxy" respondents on earnings, since roughly half of CPS records are based on self-responses and half on responses from another household member. Earnings nonresponse varies little with respect to most earnings attributes, but is noticeably highest among those in the top percentiles of the predicted earnings distribution. Wage effects due to nonresponse and proxy reports are estimated using selection models and longitudinal analysis. Based on reasonable instruments to identify selection, we conclude that there is negative selection into response among men, but relatively little selection among women. Wage equation slope coefficients are affected little by selection (as compared to OLS results for respondent-only samples), but because of intercept shifts, our preliminary evidence suggests that men's (but not women's) wages are understated due to response bias by as much as 10%. By contrast, longitudinal results from the ORG suggest moderate response bias for women as well as men. The OLS estimate of proxy effects on reported earnings indicates a modest negative effect (about 2-3%).
What this masks, however, is large negative correlations between non-spousal proxy reports and earnings, combined with spousal proxy reports of earnings roughly equivalent to self-reports. The panel analysis indicates that much of the (non-spousal) proxy effect on earnings seen in cross-sectional analysis is due to worker heterogeneity or fixed effects (including selection into response, which is correlated with proxy reports). In short, proxy reports by spouses and non-spouses are only about 1-2% lower than are self-reports from these same individuals, but workers whose earnings are reported by non-spousal household members tend to have even lower wages for reasons not reflected by measured characteristics.
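The "match bias" argument in this abstract can be illustrated with a small simulation: when imputation donors are matched on schooling but not on union status, imputed earners receive wages unrelated to their own union status, which pulls the estimated union wage gap toward zero even under random nonresponse. This is a sketch under assumptions: it uses a simplified cell hot deck rather than Census's actual procedure, the 30% nonresponse rate echoes the CPS-ORG figure above, and the 0.15 log-point union premium and all function names are hypothetical.

```python
import random


def union_gap(rows):
    """Mean log wage of union members minus mean log wage of nonmembers."""
    union = [w for w, is_union in rows if is_union]
    nonunion = [w for w, is_union in rows if not is_union]
    return sum(union) / len(union) - sum(nonunion) / len(nonunion)


def simulate_match_bias(n=20000, nonresponse=0.3, gap=0.15, seed=1):
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        school = rng.choice(["hs", "college"])   # the only match criterion (assumed)
        is_union = rng.random() < 0.2
        wage = 2.5 + (0.4 if school == "college" else 0.0)
        wage += (gap if is_union else 0.0) + rng.gauss(0.0, 0.3)
        responded = rng.random() >= nonresponse  # random (ignorable) nonresponse
        people.append((school, is_union, wage, responded))

    # Donor pools by schooling cell; union status is ignored when matching.
    donors = {"hs": [], "college": []}
    for school, _, wage, responded in people:
        if responded:
            donors[school].append(wage)

    full, respondents_only = [], []
    for school, is_union, wage, responded in people:
        if responded:
            full.append((wage, is_union))
            respondents_only.append((wage, is_union))
        else:
            # Cell hot deck: copy a random donor's wage from the same schooling cell.
            full.append((rng.choice(donors[school]), is_union))
    return union_gap(full), union_gap(respondents_only)


full_gap, resp_gap = simulate_match_bias()
print(round(full_gap, 3), round(resp_gap, 3))
```

In this setup the union gap among imputed earners is zero in expectation, so the full-sample gap shrinks toward roughly (1 - 0.3) x 0.15, while dropping imputed earners recovers a gap close to 0.15, consistent with the deletion remedy the abstract attributes to Bollinger and Hirsch under ignorable nonresponse.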