Quantifying the Relative Importance of Predictors in Multiple Linear Regression Analyses for Public Health Studies
Related papers
Journal of Exposure Analysis and Environmental Epidemiology, 2003
Classification and regression tree methods represent a potentially powerful means of identifying patterns in exposure data that may otherwise be overlooked. Here, regression tree models are developed to identify associations between blood concentrations of benzene and lead and over 300 variables of disparate type (numerical and categorical), often with observations that are missing or below the quantitation limit. Benzene and lead were selected from among all the environmental agents measured in the NHEXAS Region V study because they are ubiquitous and because they serve as paradigms for volatile organic compounds (VOCs) and heavy metals, two classes of environmental agents with very different properties. Two sets of regression models were developed. In the first set, only environmental and dietary measurements were employed as predictor variables; in the second set, these were supplemented with demographic and time-activity data. In both sets, the blood concentrations of the environmental agents were regressed on the predictor variables. Jackknife cross-validation was employed to detect overfitting of the models to the data. Blood concentrations of benzene were found to be associated with: (a) indoor air concentrations of benzene; (b) the duration of time spent indoors with someone who was smoking; and (c) the number of cigarettes smoked by the subject. Together, these associations suggest that tobacco smoke is a major source of exposure to benzene. Blood concentrations of lead were found to be associated with: (a) house dust concentrations of lead; (b) the duration of time spent working in a closed workshop; and (c) the year in which the subject moved into the residence. An unexpected finding was that the regression trees identified time-activity data as better predictors of the blood concentrations than the measurements in environmental and dietary media.
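The regression-tree and jackknife steps described above can be illustrated with a minimal Python sketch. It assumes numeric predictors only, grows a small tree by choosing the split that most reduces the sum of squared errors, and scores the tree by leave-one-out (jackknife) refitting against a mean-only model. The data, the two predictor columns, and the `max_depth`/`min_leaf` settings are hypothetical; this is not the authors' software, and it omits the handling of categorical, missing, and below-quantitation values that the study required.

```python
# Minimal regression-tree sketch with jackknife (leave-one-out) cross-validation.
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean: the impurity of a node."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, ys, min_leaf):
    """Return the (feature, threshold) pair that most reduces SSE, or None."""
    best, best_cost = None, sse(ys)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best


def fit_tree(rows, ys, max_depth=2, min_leaf=2):
    """Grow a tree of nested tuples; each leaf stores the mean response."""
    split = best_split(rows, ys, min_leaf) if max_depth > 0 else None
    if split is None:
        return ("leaf", fmean(ys))
    f, t = split
    go_left = [r[f] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_ys = [y for y, g in zip(ys, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_ys = [y for y, g in zip(ys, go_left) if not g]
    return ("node", f, t,
            fit_tree(left_rows, left_ys, max_depth - 1, min_leaf),
            fit_tree(right_rows, right_ys, max_depth - 1, min_leaf))


def predict(tree, row):
    """Follow splits from the root to a leaf and return the leaf mean."""
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree[1]


def jackknife_mse(rows, ys, **tree_kwargs):
    """Refit with each observation held out; average the squared prediction errors."""
    errors = []
    for i in range(len(ys)):
        tree = fit_tree(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:], **tree_kwargs)
        errors.append((predict(tree, rows[i]) - ys[i]) ** 2)
    return fmean(errors)


# Hypothetical data: (indoor air benzene, hours indoors with a smoker) -> blood benzene.
rows = [(0.5, 0), (0.7, 0), (0.6, 1), (0.8, 0), (1.5, 3), (1.7, 4), (1.4, 5), (1.6, 3)]
ys = [0.10, 0.12, 0.11, 0.13, 0.40, 0.42, 0.45, 0.41]

tree_error = jackknife_mse(rows, ys, max_depth=2)
null_error = jackknife_mse(rows, ys, max_depth=0)  # depth 0 = predict the mean
print(f"jackknife MSE: tree {tree_error:.4f}, mean-only {null_error:.4f}")
```

A jackknife error close to the mean-only error would suggest the tree is fitting noise; in this toy example the held-out errors stay small because the data were constructed with a clean split.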
Environmental Research, 1999
Division of Health Studies, Agency for Toxic Substances and Disease Registry, Atlanta, Georgia. Received July 14, 1998.
We used previously collected data from the Agency for Toxic Substances and Disease Registry's (ATSDR) multisite lead and cadmium study. That study and the accompanying final report were supported in part by funds from the Comprehensive Environmental Response, Compensation, and Liability Act trust fund through grant numbers H75/ATH790082, H75/ATH590119, and H75/ATH790118, and through technical assistance provided by ATSDR, U.S. Department of Health and Human Services, to Illinois, Kansas, Missouri, and Pennsylvania. All participants in the ATSDR multisite lead and cadmium study completed a participant consent form approved by the Centers for Disease Control and Prevention (CDC) and by the human subjects review committee of each of the respective state health departments. Permission to collect blood and urine specimens for heavy metal exposure measurement and biomedical testing was obtained from the parent or guardian of each participant.
… predicted values and their corresponding prediction intervals varied by covariate level. The model shows that increased soil lead level is associated with elevated blood lead levels in children, but that predictions based on this regression model are subject to high levels of uncertainty and variability.
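To show why the width of a prediction interval depends on where the prediction is made, here is a minimal Python sketch of a single-predictor least-squares line with a 95% prediction interval. It is a simplification under stated assumptions: the soil and blood lead values are hypothetical, the study's model included covariates (and possibly transformed variables) that are omitted here, and the caller supplies the Student t quantile (2.447 for 6 degrees of freedom) rather than computing it.

```python
# Ordinary least-squares line with a prediction interval for a new observation.
import math
from statistics import fmean
from typing import NamedTuple


class LineFit(NamedTuple):
    intercept: float
    slope: float
    resid_sd: float  # residual standard deviation, n - 2 degrees of freedom
    x_mean: float
    sxx: float       # sum of squared deviations of x
    n: int


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return LineFit(intercept, slope, math.sqrt(rss / (n - 2)), xbar, sxx, n)


def prediction_interval(fit, x0, t_crit):
    """Return (lower, predicted, upper) for a new observation at x0.

    t_crit is the two-sided Student t quantile with fit.n - 2 degrees of
    freedom, supplied by the caller (e.g. from a t table).
    """
    yhat = fit.intercept + fit.slope * x0
    half = t_crit * fit.resid_sd * math.sqrt(
        1 + 1 / fit.n + (x0 - fit.x_mean) ** 2 / fit.sxx)
    return yhat - half, yhat, yhat + half


# Hypothetical data: soil lead (mg/kg) and child blood lead (ug/dL).
soil = [50, 100, 200, 300, 400, 600, 800, 1000]
blood = [3.1, 3.8, 4.2, 5.0, 5.9, 6.8, 8.1, 9.4]
fit = fit_line(soil, blood)
T_975_DF6 = 2.447  # t quantile for a 95% interval with 8 - 2 = 6 degrees of freedom
for x0 in (fit.x_mean, 1500):
    lo, yhat, hi = prediction_interval(fit, x0, T_975_DF6)
    print(f"soil {x0:7.1f}: {yhat:.2f} ug/dL (95% PI {lo:.2f} to {hi:.2f})")
```

The `(x0 - x_mean)**2 / sxx` term widens the interval for soil lead values far from the sample mean, while the leading `1` reflects child-to-child variability that remains even when the line itself is well estimated.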
2019
This research case study uses epidemiological data for lead to explore possible limitations of current regression models in extrapolating to the low-dose region of the dose-response curve when unrecognized and uncontrolled confounding is present. As described by Wilson and Wilson (2016), such confounding may arise “when the measured association between an exposure variable and an outcome is distorted by an effect of a third variable (called a confounding variable or confounder)”. Wilson and Wilson (2016) report that uncontrolled confounding may be contributing to overestimation of the effects of lead reported by Lanphear et al. (2005), specifically identifying confounding by parental education, intelligence, or household management.
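The mechanism Wilson and Wilson describe can be illustrated with a small simulation in Python. A hypothetical confounder `z` (standing in for something like parental education or household management) lowers the exposure `x` and raises the outcome `y`. The true exposure effect `beta`, the confounder effect `gamma`, and all units are arbitrary assumptions; nothing here re-analyzes the Lanphear et al. (2005) data.

```python
# Simulated confounding: a third variable z lowers exposure x and raises outcome y.
import random
from statistics import fmean


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x (with intercept)."""
    b = slope(x, y)
    a = fmean(y) - b * fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=10_000, beta=-0.5, gamma=2.0, seed=1):
    """Hypothetical data: true exposure effect beta, confounder effect gamma."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]    # confounder
    x = [-zi + rng.gauss(0, 1) for zi in z]    # exposure falls as z rises
    y = [beta * xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z


x, y, z = simulate()
crude = slope(x, y)  # ignores z
# Frisch-Waugh-Lovell: the x coefficient in y ~ x + z equals the slope of
# (y residualized on z) regressed on (x residualized on z).
adjusted = slope(residuals(x, z), residuals(y, z))
print(f"crude slope {crude:.2f}, adjusted slope {adjusted:.2f}, true -0.50")
```

Omitting `z` adds a bias of `gamma * cov(x, z) / var(x) = 2 * (-1/2) = -1`, so the crude slope lands near -1.5 while the adjusted slope recovers roughly -0.5: the unadjusted analysis overstates the size of the exposure effect.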
The effect of confounding variables in studies of lead exposure and IQ
Critical Reviews in Toxicology, 2020
Methods proposed to address confounding variables frequently do not adequately distinguish confounding from covariation. A confounder is a variable that correlates both with the outcome and with the major exposure variable. Accurate treatment of confounding is crucial to low-dose extrapolation of the effects of chemical exposures based on epidemiology studies. This study explores the limitations of current regression models in extrapolating to the low-dose region of the dose-response curve due to unrecognized and uncontrolled confounding, using epidemiological data for lead. Based on the data reported in analyses by Lanphear and colleagues and Crump and colleagues, and drawing on other studies, Wilson and Wilson considered maternal IQ, HOME score, SES, parental education, birthweight, smoking, and race as characteristic variables that may have interaction effects. This analysis identifies confounding variables based on the seven longitudinal cohorts in the analyses conducted by Lanphear and colleagues and by Crump and colleagues, and confirms that maternal IQ, HOME score, maternal education, and maternal marital status at birth are "Highly Likely" confounders, while race is a "Likely" confounder. The cohort data were reanalyzed using the methods presented by Crump and colleagues while also considering the interactions among the identified confounding variables. This analysis determined that confounders influence IQ estimates in a quantifiable way that may exceed, or at least obscure, previously reported effects of blood lead on IQ at blood lead levels below 5 μg/dL; however, limitations in the datasets make predictions from the low-dose dose-response analysis questionable.
International Journal of Environmental Research and Public Health, 2013
In their recent article [1], Chari et al. call attention to the important subject of setting National Ambient Air Quality Standards (NAAQS) to provide requisite protection for public health, including the health of sensitive groups, as specified under the Clean Air Act (73 FR 66965) [2]. The authors focus on consideration of susceptibility to inform policy choices, using lead (Pb)-related neurocognitive effects and children from low socioeconomic status (SES) families in the context of alternative Pb standard levels. Our comments focus on the authors' analysis of the scientific evidence and not on policy. We agree with the authors that the health effects evidence for Pb indicates a role (or roles) for SES-related factors in influencing childhood Pb exposure and associated health effects. We disagree, however, with the authors' interpretation of the literature on the influence of SES on the shape of the concentration-response (C-R) relationship between children's blood Pb and IQ (e.g., steepness of the slope). We further address aspects of the scientific evidence that are important to the consideration of sensitive populations in the context of the Pb NAAQS, and how the U.S. Environmental Protection Agency (EPA) considered this evidence in setting the Pb NAAQS in 2008.
The role of SES as a confounder and/or effect modifier of the associations between Pb exposure and health effects is complicated [3,4]. Lower SES is independently associated with an adverse impact on neurocognitive development [5], and is often associated with higher Pb exposure and higher blood Pb concentration [3]. Consequently, SES is commonly treated as a potential confounder. Several studies have, however, examined SES as a potential modifier of the association of childhood Pb exposure with
Predicting Blood-Lead Levels Among U.S. Children at the Census Tract Level
Efforts to prevent childhood lead exposure are hindered by difficulty in predicting where exposure is concentrated in the absence of childhood blood-lead data. To help fill that gap, we created and validated a regression model to estimate childhood lead exposure in every census tract in the United States. Publicly available factors that were the most predictive of childhood blood-lead concentration were identified by a literature review and an evaluation of childhood blood-lead level (BLL) records from a public health surveillance program in Michigan (543,295 records for the years 1999–2009). The predictive power of the regression model was validated through a comparison to blood-lead surveillance program data from Massachusetts (833,951 records for the years 2000–2009), Texas (838,368 records for the years 1999–2009), and National Health and Nutrition Examination Survey (NHANES) datasets. The regression model identified percentage of pre-1960 housing, percentage of population below poverty line, and percentage of population that is non-Hispanic black as the most predictive factors, with year, season, type of blood sample, and age of child as important covariates. The model based on Michigan data predicted geometric mean (GM) blood-lead concentrations within Michigan census tracts with an R² of 0.69, in Massachusetts with an R² of 0.28, and in Texas with an R² of 0.20, and it represents a substantial improvement over the application of the NHANES national estimate to predict local childhood BLLs. Applying the model to 1- and 2-year-olds combined across the United States showed that the nationally aggregated predictions matched the NHANES blood-lead distributions within 10% of the GM and within 10% of the 95th percentile of the national distributions. Such estimates may help focus childhood lead poisoning prevention efforts.
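The validation arithmetic described above (tract geometric means compared against model predictions using R²) can be sketched in Python. The records, tract IDs, and predicted values are hypothetical, and R² is computed here as 1 − SS_res/SS_tot; the paper may have used a different definition, such as the squared correlation between predicted and observed GMs, which gives a different value when predictions are biased.

```python
# Tract-level geometric means of blood lead and R^2 of predicted vs observed GMs.
import math
from statistics import fmean


def tract_geometric_means(records):
    """records: iterable of (tract_id, blood_lead_ug_dl); values must be > 0."""
    logs = {}
    for tract, bll in records:
        logs.setdefault(tract, []).append(math.log(bll))
    # Geometric mean = exp(mean of the logged values).
    return {tract: math.exp(fmean(v)) for tract, v in logs.items()}


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, for out-of-sample predictions."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical surveillance records and model predictions for three tracts.
records = [("A", 1.0), ("A", 4.0), ("B", 2.0), ("B", 8.0), ("C", 3.0), ("C", 3.0)]
predicted = {"A": 2.2, "B": 3.7, "C": 3.1}
observed_gm = tract_geometric_means(records)
tracts = sorted(observed_gm)
r2 = r_squared([observed_gm[t] for t in tracts], [predicted[t] for t in tracts])
print({t: round(observed_gm[t], 2) for t in tracts}, f"R^2 = {r2:.2f}")
```

Here the observed GMs are 2, 4, and 3 μg/dL, so R² = 1 − (0.04 + 0.09 + 0.01)/2 = 0.93.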
Science of The Total Environment, 2011
Previous studies identified a curvilinear association between aggregated blood lead (BL) and soil lead (SL) data in New Orleans census tracts. In this study we investigate the relationships between SL (mg/kg), age of child, and BL (μg/dL) of 55,551 children in 280 census tracts in metropolitan New Orleans, 2000 to 2005. Analyses include random effects regression models predicting BL levels of children (μg/dL) and random effects logistic regression models predicting the odds of BL in children exceeding 15, 10, 7, 5, and 3 μg/dL as a function of age and SL exposure. Economic benefits of SL reduction scenarios are estimated. A unit increase in median SL^0.5 significantly increases the BL level in children (b = 0.214, p < 0.01), and a unit change in Age^0.5 significantly increases child BL (b = 0.401, p < 0.01). A unit change in Age^0.5 increases the odds of a child's BL exceeding 10 μg/dL by a multiplicative factor of 1.23 (95% CI 1.21 to 1.25), and a unit (mg/kg) addition of SL increases the odds of child BL > 10 μg/dL by a factor of 1.13 (95% CI 1.12 to 1.14). Extrapolating from the regression results, we find that a shift in the SL regulatory standard from 400 to 100 mg/kg provides each child with an economic benefit ranging from $4710 to $12,624 ($US 2000). Children's BL is a curvilinear function of both age and level of exposure to neighborhood SL. Therefore, a change in the SL regulatory standard from 400 to 100 mg/kg provides children with substantial economic benefit.
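Because the model enters soil lead as SL^0.5, the same absolute reduction in soil lead has different effects at different starting levels. The Python sketch below uses the reported coefficients b = 0.214 (per unit SL^0.5) and odds multiplier 1.23 (per unit Age^0.5), under the assumptions that blood lead is modeled on its original μg/dL scale, that other terms are held fixed, and that random effects can be ignored for this illustration. It does not reproduce the economic estimates.

```python
# Curvilinear soil-lead effect implied by a model that is linear in SL**0.5.
import math

B_SQRT_SL = 0.214  # reported change in child BL (ug/dL) per unit of SL**0.5
OR_AGE = 1.23      # reported odds multiplier (BL > 10 ug/dL) per unit of Age**0.5


def delta_blood_lead(sl_from, sl_to, b=B_SQRT_SL):
    """Predicted BL change (ug/dL) when soil lead moves from sl_from to sl_to (mg/kg),
    assuming BL is linear in sqrt(SL) with all other terms held fixed."""
    return b * (math.sqrt(sl_to) - math.sqrt(sl_from))


def odds_multiplier(per_unit_or, units):
    """Logistic model: a change of `units` on the predictor scale multiplies the odds by OR**units."""
    return per_unit_or ** units


# The same 300 mg/kg reduction matters more at lower soil lead levels.
print(f"400 -> 100 mg/kg:   {delta_blood_lead(400, 100):+.2f} ug/dL")
print(f"1300 -> 1000 mg/kg: {delta_blood_lead(1300, 1000):+.2f} ug/dL")
print(f"+2 units of Age**0.5: odds x {odds_multiplier(OR_AGE, 2):.2f}")
```

Lowering soil lead from 400 to 100 mg/kg moves SL^0.5 from 20 to 10, a predicted change of 0.214 × (10 − 20) = −2.14 μg/dL, while the same 300 mg/kg reduction from 1300 to 1000 mg/kg changes SL^0.5 by only about 4.43 units (about −0.95 μg/dL).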
Assessing the relationship between environmental lead concentrations and adult blood lead levels
Risk Analysis, 1994
This paper presents a model for predicting blood lead levels in adults who are exposed to elevated environmental levels of lead. The model assumes a baseline blood lead level based on average blood lead levels for adults described in two recent U.S. studies. The baseline blood lead level in adults arises primarily from exposure to lead in diet. Media-specific ingestion and absorption parameters are assessed for the adult population, and a biokinetic slope factor that relates uptake of lead into the body to blood lead levels is estimated. These parameters are applied to predict blood lead levels for adults exposed to a hypothetical site with elevated lead levels in soil, dust, and air. Blood lead levels ranging from approximately 3 to 57 μg/dL are predicted, depending on the exposure scenarios and assumptions.
Gradient Corporation
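The model structure described above (a diet-driven baseline plus a biokinetic slope factor applied to absorbed uptake from each medium) can be sketched as a short Python function, assuming the site-related increment is simply added to the baseline and that uptake adds linearly across soil, dust, and air. Every parameter value in the example is hypothetical and chosen only to keep the arithmetic easy to follow; none are taken from the paper.

```python
# Adult blood lead = baseline + BKSF * absorbed lead uptake summed over media.
def predicted_blood_lead(baseline_ug_dl, bksf, media):
    """Predict adult blood lead (ug/dL).

    baseline_ug_dl: background blood lead (ug/dL), mainly from diet.
    bksf: biokinetic slope factor, ug/dL blood lead per ug/day absorbed.
    media: mapping name -> (concentration, intake rate, absorption fraction),
    with units chosen so concentration * intake rate is ug Pb/day.
    """
    uptake = sum(conc * intake * absorbed for conc, intake, absorbed in media.values())
    return baseline_ug_dl + bksf * uptake


# Hypothetical site scenario; every number here is illustrative only.
site_media = {
    "soil": (1000.0, 0.05, 0.12),  # ug/g soil, g/day ingested, fraction absorbed
    "dust": (500.0, 0.02, 0.12),   # ug/g dust, g/day ingested, fraction absorbed
    "air": (0.5, 20.0, 0.3),       # ug/m^3, m^3/day inhaled, fraction absorbed
}
print(f"{predicted_blood_lead(2.0, 0.4, site_media):.2f} ug/dL")
```

With these illustrative inputs, uptake is 6.0 + 1.2 + 3.0 = 10.2 μg/day, so the prediction is 2.0 + 0.4 × 10.2 = 6.08 μg/dL.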