Novel methods to construct a representative sample for surveying California's unhoused population: the California Statewide Study of People Experiencing Homelessness
Paul Wesson et al. Am J Epidemiol. 2025.
Abstract
Existing literature on people experiencing homelessness (PEH) draws on nonrepresentative samples from service providers, populations with comorbidities, or areas with disproportionately high amounts of sheltered homelessness, leading to bias. Nearly one-third of PEH in the United States, and more than half of unsheltered PEH, live in California. We designed a rigorous state-representative survey of PEH to investigate the antecedents of homelessness, understand their health, and inform policy solutions. The multistage design randomized at 3 levels: county, venue, and individual. Stratifying the state into 8 regions, we sampled 1 county per region to reflect statewide demographics. Within counties, we sampled venues to match the expected proportions of sheltered and unsheltered residents. Within venues, interviewers randomly sampled individuals. We adjusted for nonresponse and poststratified to benchmarks. In parallel, respondent-driven sampling used social networks to reach subpopulations that might otherwise have been undersampled. Our community-engaged study yielded 3200 quantitative surveys, and we purposively sampled 365 participants for qualitative interviews. Our demographic estimates match those of the Point-in-Time (PIT) count, with the added strength of statistical inference. To our knowledge, this is the first large representative study of PEH beyond a single county to draw inference about a large population without depending on service utilization. Our methods may inform future efforts to understand homelessness.
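The abstract names the weighting steps (multistage selection, nonresponse adjustment, poststratification) without giving formulas. The block below is a minimal sketch, assuming a standard textbook approach: a base weight equal to the inverse of the product of the stage-wise selection probabilities, a weighting-class nonresponse adjustment, and poststratification so that weighted totals match benchmark counts. The function names, the `cls`/`cell` keys, and all numbers are hypothetical; the study's actual weighting procedure may differ.

```python
from collections import defaultdict


def design_weight(p_county, p_venue_given_county, p_person_given_venue):
    """Base weight: inverse of the product of stage-wise selection probabilities."""
    p = p_county * p_venue_given_county * p_person_given_venue
    if not 0 < p <= 1:
        raise ValueError("overall selection probability must lie in (0, 1]")
    return 1.0 / p


def nonresponse_adjust(records):
    """Weighting-class adjustment: within each class ('cls'), respondents
    absorb the weight of sampled nonrespondents. Nonrespondents are dropped."""
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for r in records:
        sampled[r["cls"]] += r["w"]
        if r["responded"]:
            responded[r["cls"]] += r["w"]
    return [
        {**r, "w": r["w"] * sampled[r["cls"]] / responded[r["cls"]]}
        for r in records
        if r["responded"]
    ]


def poststratify(records, benchmarks):
    """Rescale weights so they sum to the benchmark total in each cell ('cell')."""
    cell_total = defaultdict(float)
    for r in records:
        cell_total[r["cell"]] += r["w"]
    return [
        {**r, "w": r["w"] * benchmarks[r["cell"]] / cell_total[r["cell"]]}
        for r in records
    ]


# Toy example with made-up probabilities and benchmark counts (not study data).
w = design_weight(0.5, 0.25, 0.125)  # 1 / 0.015625 = 64.0
sample = [
    {"id": 1, "w": w, "cls": "A", "cell": "unsheltered", "responded": True},
    {"id": 2, "w": w, "cls": "A", "cell": "unsheltered", "responded": False},
    {"id": 3, "w": w, "cls": "B", "cell": "sheltered", "responded": True},
]
adjusted = poststratify(
    nonresponse_adjust(sample), {"unsheltered": 300.0, "sheltered": 100.0}
)
```

In this toy run, participant 1 absorbs the weight of nonrespondent 2 in class A (weight 128.0), and poststratification then scales each cell to its benchmark (300.0 and 100.0). Keeping the three steps as separate functions makes it easy to inspect how much each adjustment moves the weights.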
Keywords: homelessness; housing; respondent-driven sampling; sampling methods; venue-based sampling.
© The Author(s) 2024. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Conflict of interest statement
None declared.
Figures
Figure 1
Map of California counties, stratified by geographic regions.
Figure 2
(A) Venue-based sampling (VBS) prediction model variable importance plot. We used a random forest to regress the selection probabilities of participants in the venue-based sample onto potential predictors of selection. Variables identified by the random forest as important predictors were then included in a prediction model to estimate venue-based selection probabilities for participants not sampled through venues (ie, participants sampled through respondent-driven sampling). (B) Respondent-driven sampling (RDS) prediction model variable importance plot. A random forest was used to regress the binary outcome of selection by RDS, among all study participants, onto potential predictors of selection. Variables identified by the random forest as important predictors were then included in a prediction model to estimate RDS selection probabilities for participants not sampled through RDS (ie, participants sampled through venue-based sampling). (C) RDS predicted probability distribution. Using the prediction model built from the random forest variable importance analysis in panel B, RDS selection probabilities were estimated for all participants. The green distribution shows predicted RDS selection probabilities for participants recruited through RDS; the orange distribution shows those for participants recruited through venue-based sampling. (D) Overall inclusion probability distribution. The green distribution shows overall inclusion probabilities for participants recruited through RDS; the orange distribution shows those for participants recruited through venue-based sampling. Abbreviations: AUC, area under the curve; RMSE, root mean squared error.
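Panel D requires combining each participant's VBS and RDS selection probabilities (one observed, one predicted) into a single inclusion probability. The caption does not state the combination rule. The block below is a minimal sketch, assuming a common dual-frame simplification in which selection through the two arms is treated as independent, so the overall inclusion probability is 1 - (1 - p_vbs)(1 - p_rds). The `Participant` class, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str      # "VBS" or "RDS": records which arm recruited the person (not used in the formula)
    p_vbs: float  # design-based for VBS recruits; model-predicted for RDS recruits
    p_rds: float  # model-predicted RDS selection probability


def overall_inclusion(p: Participant) -> float:
    """Probability of entering the sample through at least one arm, ASSUMING
    (hypothetically) that VBS and RDS selection are independent events."""
    for prob in (p.p_vbs, p.p_rds):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - (1.0 - p.p_vbs) * (1.0 - p.p_rds)


def inclusion_weight(p: Participant) -> float:
    """Inverse-probability weight from the combined inclusion probability."""
    pi = overall_inclusion(p)
    if pi == 0.0:
        raise ValueError("zero inclusion probability: weight undefined")
    return 1.0 / pi


# Toy values, not study data.
vbs_recruit = Participant(arm="VBS", p_vbs=0.5, p_rds=0.25)
rds_recruit = Participant(arm="RDS", p_vbs=0.25, p_rds=0.5)
```

Both toy participants end up with an inclusion probability of 0.625 and a weight of about 1.6. If the two arms are not independent (for example, venue attendees are more likely to be recruited by peers), this formula misstates the true inclusion probability, which is one reason a study might prefer a different combination rule.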
Figure 3
California Statewide Study of People Experiencing Homelessness participant flow diagram. Abbreviations: RDS, respondent-driven sampling; VBS, venue-based sampling.
Figure 4
Respondent-driven sampling recruitment tree for the California Statewide Study of People Experiencing Homelessness. Each node represents a unique study participant. Nodes are color-coded by county (N = 8 counties). Lines connecting nodes, from top to bottom, indicate recruitment patterns (ie, who recruited whom).
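A recruitment tree like the one in Figure 4 can be stored as recruiter-to-recruit edges. The block below is a minimal sketch, assuming a simple edge-list format with hypothetical participant IDs and county labels (the study's coupon-tracking data structure is not described here). It computes each participant's recruitment wave by breadth-first search from the seeds, then the deepest wave reached in each county.

```python
from collections import defaultdict, deque


def recruitment_waves(edges, seeds):
    """Return {participant_id: wave}; seeds are wave 0 and each recruit sits
    one wave below its recruiter. `edges` are (recruiter, recruit) pairs."""
    children = defaultdict(list)
    for recruiter, recruit in edges:
        children[recruiter].append(recruit)
    waves = {}
    queue = deque()
    for s in seeds:
        waves[s] = 0
        queue.append(s)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child in waves:
                raise ValueError(f"participant {child} recruited more than once")
            waves[child] = waves[node] + 1
            queue.append(child)
    return waves


def max_wave_by_county(waves, county_of):
    """Deepest recruitment wave observed among participants in each county."""
    depth = {}
    for pid, wave in waves.items():
        county = county_of[pid]
        depth[county] = max(depth.get(county, 0), wave)
    return depth


# Hypothetical toy tree: two seeds in two counties.
edges = [("S1", "A"), ("A", "B"), ("S2", "C")]
waves = recruitment_waves(edges, ["S1", "S2"])
county_of = {"S1": "X", "A": "X", "B": "X", "S2": "Y", "C": "Y"}
depths = max_wave_by_county(waves, county_of)
```

Breadth-first traversal assigns waves in recruitment order and flags any participant who appears as a recruit twice, which in a coupon-based design would usually indicate a data-entry error.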
Grants and funding
- K01 AI145572/AI/NIAID NIH HHS/United States
- K24 AG046372/AG/NIA NIH HHS/United States