Deeper Spatial Statistical Insights into Small Geographic Area Data Uncertainty
Daniel A Griffith et al. Int J Environ Res Public Health. 2020.
Abstract
Small areas refer to small geographic areas, a more literal meaning of the phrase, as well as small domains (e.g., small sub-populations), a more figurative meaning of the phrase. With post-stratification, even with big data, either case can encounter the problem of small local sample sizes, which tend to inflate local uncertainty and undermine otherwise sound statistical analyses. This condition is the opposite of that afflicting statistical significance in the context of big data. These two definitions can also occur jointly, such as during the standardization of data: small geographic units may contain small populations, which in turn have small counts in various age cohorts. Accordingly, big spatial data can become not-so-big spatial data after post-stratification by geography and, for example, by age cohorts. This situation can be ameliorated to some degree by the large volume and high velocity of big spatial data. However, the variety of any big spatial data may well exacerbate this situation, compromising veracity in terms of bias, noise, and abnormalities in these data. The purpose of this paper is to establish deeper insights into big spatial data with regard to their uncertainty through one of the hallmarks of georeferenced data, namely spatial autocorrelation, coupled with small geographic areas. Impacts of interest concern the nature, degree, and mixture of spatial autocorrelation. The cancer data employed (from Florida for 2001–2010) represent a data category that is beginning to enter the realm of big spatial data; its volume, velocity, and variety are increasing through the widespread use of digital medical records.
Keywords: big data; big spatial data; cancer; small area; small geographic area.
Conflict of interest statement
The authors declare no conflict of interest.
Figures
Figure 1
Histograms for random samples from a continuous uniform distribution. Sample size n: left—10,000; middle—100,000; right—1,000,000. Bin size: top—0.1; middle—0.01; bottom—0.001.
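The sample sizes and bin widths in this caption can be turned into a quick calculation of how sparse each bin becomes. The sketch below is an illustration only, not the authors' code: it assumes draws from Uniform(0, 1) and equal-width bins, so each bin count is binomial, and the function names `bin_count_summary` and `simulate_bin_counts` are hypothetical.

```python
import math
import random
from collections import Counter


def bin_count_summary(n, width):
    """Expected count and relative standard deviation of one bin's count
    when n draws from Uniform(0, 1) fall into equal-width bins.

    Assumption: each bin count is Binomial(n, p) with p equal to the bin width.
    """
    p = width
    expected = n * p                        # binomial mean
    rel_sd = math.sqrt((1 - p) / (n * p))   # binomial sd divided by the mean
    return expected, rel_sd


def simulate_bin_counts(n, width, seed=0):
    """Draw n uniform values and count how many land in each bin."""
    rng = random.Random(seed)
    n_bins = round(1 / width)
    counts = Counter(
        min(int(rng.random() * n_bins), n_bins - 1) for _ in range(n)
    )
    return [counts.get(b, 0) for b in range(n_bins)]


if __name__ == "__main__":
    # Sample sizes and bin widths taken from the Figure 1 caption.
    for n in (10_000, 100_000, 1_000_000):
        for width in (0.1, 0.01, 0.001):
            expected, rel_sd = bin_count_summary(n, width)
            print(f"n={n:>9,}  width={width:<5}  "
                  f"expected per bin={expected:>9,.0f}  relative sd={rel_sd:.3f}")
```

Under these assumptions, the smallest sample with the finest bins (n = 10,000, width 0.001) leaves about 10 observations per bin with a relative standard deviation near 0.32, which is the small-local-sample effect the abstract attributes to post-stratification.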
Figure 2
The State of Florida, and the locations of the six studied metropolitan statistical areas (MSAs).
Figure 3
Scatterplots of reference population distributions, ages 20+. Left (a): paired by age cohorts; black denotes male, grey denotes female. Right (b): ordered by age cohorts; blue denotes World, red denotes US, and green denotes Florida (FL).
Figure 4
Scatterplots of crude cancer rates (vertical axis) versus age- and sex-adjusted cancer rates (horizontal axis); blue denotes World, red denotes US, and green denotes Florida (FL). (A) Top to bottom: cancer type (female breast, colorectal, lung & bronchus, melanoma skin, urinary bladder). Left to right: MSA (Jacksonville, Miami, Orlando). (B) Top to bottom: cancer type (female breast, colorectal, lung & bronchus, melanoma skin, urinary bladder). Left to right: MSA (Pensacola, Tallahassee, Tampa).
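The contrast between crude and adjusted rates in this figure can be illustrated with a minimal sketch of direct standardization. This is an assumption: the captions do not say which adjustment method the authors used, the function names are hypothetical, and every number below is made up for illustration rather than taken from the Florida data.

```python
def crude_rate(cases, population, per=100_000):
    """Total cases over total population, scaled to a rate per `per` people."""
    return per * sum(cases) / sum(population)


def directly_standardized_rate(cases, population, reference, per=100_000):
    """Weight each stratum-specific rate by that stratum's share of a
    reference population (direct standardization).

    Assumes every stratum has a non-zero local population.
    """
    if not (len(cases) == len(population) == len(reference)):
        raise ValueError("all inputs need one entry per stratum")
    total_ref = sum(reference)
    return per * sum(
        (c / p) * (r / total_ref)
        for c, p, r in zip(cases, population, reference)
    )


if __name__ == "__main__":
    # Hypothetical small area with three age cohorts (made-up values).
    cases = [2, 10, 40]
    population = [5_000, 4_000, 1_000]
    reference = [50, 30, 20]  # made-up reference population weights

    print("crude:", crude_rate(cases, population))
    print("standardized:", directly_standardized_rate(cases, population, reference))
```

Each stratum-specific rate divides by the local cohort population, so a small cohort in a small geographic unit makes that term, and therefore the adjusted rate, unstable.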