The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?
Related papers
Estimating Measurement Error in Annual Job Earnings: A Comparison of Survey and Administrative Data
Review of Economics and Statistics, 2013
We propose a new methodology that does not assume a prior specification of the statistical properties of the measurement errors and treats all sources as noisy measures of some underlying true value. The unobservable true value can be represented as a weighted average of all available measures, using weights that must be specified a priori unless there has been a truth audit. Jobs reported in the Census Bureau's Survey of Income and Program Participation (SIPP) are linked to Social Security Administration earnings data, creating two potential annual earnings observations. The reliability statistics for the two sources are quite similar, except for cases where the SIPP imputed some missing monthly earnings reports.
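The point that a weighted-average "true value" needs a priori weights unless a truth audit exists can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' estimator: it assumes independent, classical noise in each source (an assumption the paper explicitly avoids), and the noise levels and the helpers `simulate` and `composite` are invented for illustration. Because the simulation knows the true value, it can compute each measure's reliability directly, which is exactly what real linked data cannot do without an audit.

```python
import random

def pearson(x, y):
    """Pearson correlation computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=20000, sd_true=0.6, sd_survey=0.3, sd_admin=0.2, seed=1):
    """Hypothetical data: true log earnings plus independent noise in each source."""
    rng = random.Random(seed)
    truth = [rng.gauss(10.0, sd_true) for _ in range(n)]
    survey = [t + rng.gauss(0.0, sd_survey) for t in truth]
    admin = [t + rng.gauss(0.0, sd_admin) for t in truth]
    return truth, survey, admin

def composite(survey, admin, w_survey):
    """Weighted average of the two measures; the weight is fixed a priori."""
    return [w_survey * s + (1.0 - w_survey) * a for s, a in zip(survey, admin)]

truth, survey, admin = simulate()
# Reliability = squared correlation with the truth. It is computable here only
# because the simulation supplies the true value (a built-in "truth audit").
rel_survey = pearson(survey, truth) ** 2
rel_admin = pearson(admin, truth) ** 2
rel_equal = pearson(composite(survey, admin, 0.5), truth) ** 2
print(f"survey {rel_survey:.3f}  admin {rel_admin:.3f}  50/50 composite {rel_equal:.3f}")
```

With these assumed noise levels the population reliabilities are 0.36/0.45 = 0.80 for the survey, 0.36/0.40 = 0.90 for the administrative record, and about 0.92 for the equal-weight average, so the simulated figures should land close to those values. A different weight would give a different composite, which is why the weights have to be chosen before looking at the data.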
Sources of Measurement Errors in Earnings Data: New Estimates of Intergenerational Elasticities
2008
Using intergenerational data with a substantial part of the life-cycle earnings of children and almost the entire life-cycle earnings of their fathers, we present new estimates of intergenerational mobility in Norway. Extending the fathers' earnings window from 5 to 30 years increases the estimated elasticities, whereas varying the fathers' age at observation has the opposite effect. Our …
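One textbook mechanism behind the window-length result is attenuation bias from transitory fluctuations in fathers' annual earnings: averaging over more years shrinks the transitory noise in the regressor. The sketch below is a stylized model, not the paper's specification; it assumes iid transitory shocks, ignores life-cycle (age-at-observation) effects, and all parameter values and function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulated_elasticity(window, n=20000, beta=0.3, sd_perm=0.5,
                         sd_trans=0.5, sd_child=0.3, seed=7):
    """Regress child log earnings on the father's log earnings averaged over
    `window` years; each year = permanent component + iid transitory shock."""
    rng = random.Random(seed)
    father_avg, child = [], []
    for _ in range(n):
        perm = rng.gauss(0.0, sd_perm)
        years = [perm + rng.gauss(0.0, sd_trans) for _ in range(window)]
        father_avg.append(sum(years) / window)
        child.append(beta * perm + rng.gauss(0.0, sd_child))
    return ols_slope(father_avg, child)

def implied_elasticity(window, beta=0.3, sd_perm=0.5, sd_trans=0.5):
    """Probability limit: beta * Var(perm) / (Var(perm) + Var(trans) / window)."""
    v_perm, v_trans = sd_perm ** 2, sd_trans ** 2
    return beta * v_perm / (v_perm + v_trans / window)

for T in (1, 5, 30):
    print(T, round(implied_elasticity(T), 3), round(simulated_elasticity(T), 3))
```

Under these assumed values the true elasticity is 0.3, but a single year of fathers' earnings yields about 0.15, a 5-year average about 0.25, and a 30-year average about 0.29.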
2000
This paper analyzes income misreporting propensities and magnitudes using the 1992 SIPP longitudinal file matched to Social Security Summary Earnings Records. Specifically, we focus on wage and salary and self-employment earnings. Our findings suggest that the 1992 SIPP accurately estimates the net number of earnings recipients but tends to underestimate the amounts received. The misreporting pattern reveals that respondents at the lowest end of the income distribution tend to overreport earnings, while those at the higher end of the earnings distribution are more likely to underreport earnings. Additionally, we show that demographic characteristics can be used in a predictive model of misreporting, but they explain a larger fraction of the variation in overreporting than in underreporting.
2007
Earnings nonresponse is currently about 30% in the CPS-ORG and 20% in the March CPS. Census imputes missing earnings by assigning nonrespondents the reported earnings of matched donors. Even if nonresponse is random, wage equations suffer severe "match bias" in the coefficients on attributes not used as imputation match criteria (union status, industry, foreign-born, etc.) and on imperfectly matched criteria (schooling, age, etc.). Bollinger-Hirsch (JOLE, July 2006) show that if nonresponse is conditionally missing at random (i.e., ignorable nonresponse), then unbiased wage equation estimation can be achieved in several ways, most easily by deleting imputed earners from the sample. The focus of this paper is twofold. First, we examine whether or not nonresponse is ignorable in the CPS-ORG and March CPS. Second, we assess the effect of "proxy" respondents on earnings, since roughly half of CPS records are based on self-responses and half on responses from another household member. Earnings nonresponse varies little with respect to most earnings attributes but is noticeably highest among those in the top percentiles of the predicted earnings distribution. Wage effects due to nonresponse and proxy reports are estimated using selection models and longitudinal analysis. Based on reasonable instruments to identify selection, we conclude that there is negative selection into response among men but relatively little selection among women. Wage equation slope coefficients are little affected by selection (compared with OLS results for respondent-only samples), but because of intercept shifts, our preliminary evidence suggests that men's (but not women's) wages are understated due to response bias by as much as 10%. By contrast, longitudinal results from the ORG suggest moderate response bias for women as well as men. OLS estimates of proxy effects on reported earnings indicate a modest negative effect (about 2-3%).
What this masks, however, is large negative correlations between non-spousal proxy reports and earnings, combined with spousal proxy reports of earnings roughly equivalent to self-reports. The panel analysis indicates that much of the (non-spousal) proxy effect on earnings seen in cross-sectional analysis is due to worker heterogeneity or fixed effects (including selection into response, which is correlated with proxy reports). In short, proxy reports by spouses and non-spouses are only about 1-2% lower than self-reports from these same individuals, but workers whose earnings are reported by non-spousal household members tend to have even lower wages for reasons not reflected by measured characteristics.
Modeling Earnings Measurement Error: A Multiple Imputation Approach
The Review of Economics and Statistics, 1996
Recent survey validation studies suggest that measurement error in earnings data is pervasive and violates classical measurement error assumptions, and therefore may bias estimation of cross-section and longitudinal earnings models. We model the structure of earnings measurement error using data from the Panel Study of Income Dynamics Validation Study (PSIDVS). We then use Rubin's (1987) multiple imputation techniques to estimate consistent earnings equations under nonclassical earnings measurement error in the PSID. Our technique is readily generalized, and the empirical results demonstrate the potential importance of correcting for measurement error in earnings and related data, particularly during recessions.
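The final step of a multiple imputation analysis pools the estimates from each completed data set using Rubin's (1987) combining rules. The sketch below implements those standard rules for a scalar parameter; it does not reproduce how the PSIDVS-based imputation model itself is built, and the function name `rubin_combine` and the input numbers are hypothetical.

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine m completed-data analyses with Rubin's (1987) rules.

    estimates: point estimates from each imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 divisor)
    t = w + (1.0 + 1.0 / m) * b           # total variance
    # Rubin's reference degrees of freedom for the t distribution
    df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2 if b > 0 else float("inf")
    return q_bar, t, df

print(rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5))
```

For these five made-up coefficients, each with variance 0.04, the between-imputation variance is 0.025, so the pooled estimate is 1.0 and the total variance is 0.04 + 1.2 × 0.025 = 0.07. The extra 0.03 is the uncertainty that a single imputation would ignore.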
Journal of the Royal Statistical Society: Series A (Statistics in Society), 2011
This paper compares earnings data from the British Household Panel Survey (BHPS) with those collected in the Family Resources Survey (FRS), contrasting two points in time (1995/96 and 2003/04) and thereby allowing us to assess the possible extent of differential attrition in the BHPS data. We perform non-parametric tests of equality at the centre of the distributions and over the whole earnings distributions. We then apply multivariate regression methods to establish whether the earnings data yield different results in three typical uses of earnings data. The two surveys have fairly similar earnings data in the first comparison year, while sizable differences emerge in the later comparison. This finding points to an important role for attrition and 'vintage' effects.
SSRN Electronic Journal, 2011
A Large-Scale Validation Study of Measurement Errors in Longitudinal Survey Data
Social Science Research Network, 2006
In this paper, we analyze measurement and classification errors in several key variables, including earnings and educational attainment, in a matched sample of survey and administrative longitudinal data. The data, spanning 1994-2001 and covering all sectors of the Danish economy, are much more comprehensive than those usually seen in validation studies. Measurement errors in earnings are found to be much larger than reported in previous studies limited to a single firm. Individuals who attrite from the panel report their earnings significantly less accurately than individuals who are observed throughout the entire sampling period. Furthermore, females are found to report their earnings significantly more precisely than males, part-time workers report significantly less accurately than full-time workers, and low-income workers report significantly less accurately than workers with relatively higher income. Classification errors in categorical variables are found to be of about the same magnitude as previously found in the literature. We analyze whether response error in one variable makes it more likely that the same respondent will report other variables with error, but we do not find support for this hypothesis.
Impact of Non-Classical Measurement Error on Measures of Earnings Inequality and Mobility
Boston College and Social Security Administration. …, 2006
Measures of inequality and mobility based on self-reported earnings reflect attributes of both the joint distribution of earnings and the joint distribution of measurement error and earnings. While classical measurement error would increase measures of inequality and mobility, there is substantial evidence that measurement error in earnings is mean reverting. In this paper we present the analytical links between mean reversion and other sources of non-classical measurement error and measures of inequality and mobility. The empirical importance of non-classical measurement error is explored using the Survey of Income and Program Participation matched to tax records.
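The contrast between classical and mean-reverting error shows up directly in the variance of log earnings, a common inequality measure. The sketch assumes a simple linear error model, u = λ(y - E[y]) + v with v independent of y, which is a stylized special case rather than the paper's full framework; it does not cover mobility measures, and λ, the variance values, and the function names are hypothetical.

```python
import random
import statistics

def measured_variance(var_true, lam=0.0, var_v=0.0):
    """Variance of measured log earnings y* = y + u, with u = lam * (y - E[y]) + v
    and v independent of y. lam = 0 is classical error; lam < 0 is mean reversion."""
    return (1.0 + lam) ** 2 * var_true + var_v

def simulated_variance(var_true, lam, var_v, n=50000, seed=3):
    """Monte Carlo check of measured_variance under normal y and v."""
    rng = random.Random(seed)
    mean = 10.0
    measured = []
    for _ in range(n):
        y = rng.gauss(mean, var_true ** 0.5)
        u = lam * (y - mean) + rng.gauss(0.0, var_v ** 0.5)
        measured.append(y + u)
    return statistics.pvariance(measured)

var_true = 0.5
print("true variance          ", var_true)
print("classical error        ", measured_variance(var_true, lam=0.0, var_v=0.05))
print("mean-reverting error   ", measured_variance(var_true, lam=-0.2, var_v=0.05))
print("mean-reverting (sim.)  ", round(simulated_variance(var_true, -0.2, 0.05), 3))
```

With these assumed values, classical error raises the measured variance from 0.5 to 0.55, while mean reversion with λ = -0.2 lowers it to 0.64 × 0.5 + 0.05 = 0.37, so measured inequality can fall below the true level even though every report contains error.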
Male Earnings Volatility in LEHD before, during, and after the Great Recession
arXiv: General Economics, 2020
This paper is part of a coordinated collection of papers on prime-age male earnings volatility. Each paper produces a similar set of statistics for the same reference population using a different primary data source. Our primary data source is the Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) infrastructure files. Using LEHD data from 1998 to 2016, we create a well-defined population frame to facilitate accurate estimation of temporal changes comparable to designed longitudinal samples of people. We show that earnings volatility, excluding increases during recessions, has declined over the analysis period, a finding robust to various sensitivity analyses. Although we find volatility is declining, the effect is not homogeneous, particularly for workers with tenuous labor force attachment for whom volatility is increasing. These “not stable” workers have earnings volatility approximately 30 times larger than stable workers, but more important for earnings volatility ...