Introduction to widely used regression models in medical research using R, STATA, and SPSS: A tutorial

Companion to Applied Regression (Version 2.0-10, Date 2011/04/06)

2011

Calculates type-II or type-III analysis-of-variance tables for model objects produced by lm, glm, multinom (in the nnet package), polr (in the MASS package), coxph (in the survival package), coxme (in the coxme package), svyglm (in the survey package), rlm (in the MASS package), lmer (in the lme4 package), lme (in the nlme package), and (by the default method) for most models with a linear predictor and asymptotically normal coefficients (see details below). For linear models, F-tests are calculated; for generalized linear models, likelihood-ratio chi-square, Wald chi-square, or F-tests are calculated; for multinomial logit and proportional-odds logit models, likelihood-ratio tests are calculated. Various test statistics are provided for multivariate linear models produced by lm or manova. Partial-likelihood-ratio tests or Wald tests are provided for Cox models. Wald chi-square tests are provided for fixed effects in linear and generalized linear mixed-effects models. Wald chi-square or F-tests are provided in the default case.
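To make the idea of a type-II F-test concrete, here is a minimal Python sketch for a linear model with main effects only. It is not how the package described above computes its tables; it simply compares the residual sum of squares of the full model with that of the model lacking one term. The function names (`solve_linear_system`, `rss`, `type2_f_table`) and the data are hypothetical, and each term is assumed to be a single numeric column (one degree of freedom).

```python
def solve_linear_system(a, b):
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of the least-squares fit of y on the given columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    fitted = [sum(beta[j] * columns[j][i] for j in range(k)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def type2_f_table(terms, y):
    """Return {term: (sum of squares, F)} for a main-effects-only linear model.

    Each term's sum of squares is the drop in RSS when the term is added
    to a model that already contains the intercept and all other terms.
    """
    n = len(y)
    ones = [1.0] * n
    full = [ones] + list(terms.values())
    rss_full = rss(full, y)
    df_resid = n - len(full)
    table = {}
    for name in terms:
        reduced = [ones] + [col for other, col in terms.items() if other != name]
        ss = rss(reduced, y) - rss_full
        table[name] = (ss, ss / (rss_full / df_resid))  # 1 df per term
    return table


# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [2.1, 2.9, 4.8, 4.2, 6.9, 9.1, 6.8, 9.4]
for term, (ss, f_value) in type2_f_table({"x1": x1, "x2": x2}, y).items():
    print(term, round(ss, 3), round(f_value, 3))
```

In a model without interactions, type-II and type-III tests coincide, so this sketch does not show how the two differ once interaction terms are present.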

Companion to Applied Regression (Version 2.0-2, Date 2010/07/30)

2010

Calculates type-II or type-III analysis-of-variance tables for model objects produced by lm, glm, multinom (in the nnet package), polr (in the MASS package), coxph (in the survival package), coxme (in the coxme package), svyglm (in the survey package), rlm (in the MASS package), lmer (in the lme4 package), lme (in the nlme package), and (by the default method) for most models with a linear predictor and asymptotically normal coefficients (see details below). For linear models, F-tests are calculated; for generalized linear models, likelihood-ratio chi-square, Wald chi-square, or F-tests are calculated; for multinomial logit and proportional-odds logit models, likelihood-ratio tests are calculated. Various test statistics are provided for multivariate linear models produced by lm or manova. Partial-likelihood-ratio tests or Wald tests are provided for Cox models. Wald chi-square tests are provided for fixed effects in linear and generalized linear mixed-effects models. Wald chi-square or F-tests are provided in the default case.

A Review of the Logistic Regression Model with Emphasis on Medical Research

Journal of Data Analysis and Information Processing

This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 that employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selection of dependent and independent variables, model fitting, reporting, and interpretation were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was small enough to call into question the accuracy of the regression model. Also, most studies did not report validation analysis, regression diagnostics, or goodness-of-fit measures, which are the measures that authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
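The concepts named in this abstract (odds, odds ratio, logit transformation, logistic curve, events per variable) can be illustrated with a short Python sketch. The function names and the numbers are hypothetical and are not taken from the reviewed study; events per variable follows the abstract's description as outcome events divided by predictor variables.

```python
import math


def odds(p):
    """Odds of an event that occurs with probability p (0 < p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Logit transformation: the natural log of the odds."""
    return math.log(odds(p))


def inv_logit(z):
    """Logistic curve: maps a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds in the exposed group to the odds in the unexposed group."""
    return odds(p_exposed) / odds(p_unexposed)


def events_per_variable(n_events, n_predictors):
    """Number of outcome events divided by the number of predictor variables."""
    return n_events / n_predictors


# Hypothetical numbers, for illustration only.
print(odds(0.8))                      # odds when the event probability is 0.8
print(logit(0.8))                     # the same quantity on the log-odds scale
print(inv_logit(logit(0.8)))          # round trip back to a probability
print(odds_ratio(0.8, 0.5))           # exposed vs. unexposed
print(events_per_variable(40, 8))     # 40 events, 8 candidate predictors
```

A fitted logistic regression coefficient is a difference in log odds, so exponentiating it gives an odds ratio of the kind computed by `odds_ratio`.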

The development of a statistical computer software resource for medical research

2000

This thesis is the result of my own work. The material contained in this thesis has not been presented, nor is currently being presented, either wholly or in part for any other degree or other qualification. Electronic enclosures This thesis is presented in written and electronic (computer software) forms. The software of this thesis (StatsDirect) can be accessed, from either the enclosed CD-ROM or Internet, as described in Appendix 1. The CD-ROM also contains video clips that relate to the Discussion and Conclusions chapter.

Regression as the Univariate General Linear Model: Examining Test Statistics, p values, Effect Sizes, and Descriptive Statistics Using R

General Linear Model Journal

This paper presents regression as the univariate general linear model (GLM). Building on the work of Cohen (1968), McNeil (1974), and Zientek and Thompson (2009), the paper uses descriptive statistics to build a small, simulated dataset that readers can use to verify that multiple linear regression (MLR) subsumes the univariate parametric analyses in the GLM. Unlike other related works, we provide R syntax that demonstrates how MLR produces equivalent test statistics, p values, effect sizes, and descriptive statistics when compared to the univariate analyses that MLR subsumes. The paper diverges from Zientek and Thompson by presenting an expanded hierarchy for MLR and demonstrating why only the case of the chi-square test of independence where the criterion variable is dichotomous, and not the general case, is subsumed by MLR. Readers will find an accessible treatment of the GLM as well as R syntax, which they can use to report descriptive statistics, p values, and effect sizes associated with the univariate parametric statistics in the GLM.
In 1968, Cohen presented multiple linear regression (MLR) as the univariate general linear model (GLM). Since that time, Cohen's work has been extended to consider canonical correlation as the multivariate GLM (see Knapp, 1978) and structural equation modeling as an even more general case of the GLM (see Bagozzi, Fornell, & Larcker, 1981). As noted by Graham (2008), "The vast majority of parametric statistical procedures in common use are part of [a single analytic family called] the General Linear Model (GLM), including the t test, analysis of variance (ANOVA), multiple regression, descriptive discriminant analysis (DDA), multivariate analysis of variance (MANOVA), canonical correlation analysis (CCA), and structural equation modeling (SEM). Moreover, these procedures are hierarchical in that some procedures are special cases of others" (p. 485).
Beyond the hierarchical nature of the GLM, the subsumed analyses share three characteristics. Analyses in the GLM implicitly or explicitly are correlational in nature, yield variance-accounted-for effect sizes, and produce scores on latent variables that are derived by applying weights to measured variables (Thompson, 2006, p. 360). Although the characteristics of the GLM seem to be straightforward, graduate students and emerging scholars are likely to benefit from being able to verify the hierarchical nature of the GLM through illustrations that compare univariate statistical analyses to MLR analyses. Not only has active learning been shown to be beneficial when learning statistics (White, 2015), but research (e.g., Henson, Hull, & Williams, 2010) also indicates that many graduate students and emerging scholars may have insufficient quantitative proficiency. Therefore, we offer an illustration of MLR as the univariate GLM that considers the similarities and differences in the test statistics, p values, effect sizes, and descriptive statistics generated. Namely, we consider ANCOVA, ANOVA, r, repeated measures ANOVA (RM ANOVA), independent samples t-test, paired-samples t-test, and single-sample t-test. Our interest in developing this work is similar to that of other methodologists who seek to "improve statistical practice, and thereby, improve the quality of the knowledge produced by the legions of researchers around the world who use these techniques on a daily basis" (Osborne, 2013, p. 1). We also make five unique contributions to the literature. First, we demonstrate MLR as the univariate GLM for parametric analyses using R, which is a free statistical programming language that is gaining popularity in social science research and that is compatible with Unix, Windows, and Mac operating systems (R Development Core Team, 2017). Prior contributions (e.g., Zientek & Thompson, 2009) have used commercial statistical software packages (e.g., SPSS).
Second, we demonstrate that the hierarchical I
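One instance of the claim that MLR subsumes the univariate analyses can be checked by hand: the pooled-variance independent samples t statistic equals the t statistic for the slope when the outcome is regressed on a 0/1 group indicator. The paper provides R syntax; the sketch below is a Python stand-in with hypothetical function names and made-up data, not the authors' code or dataset.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent samples t statistic with pooled variance (group_b minus group_a)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def dummy_regression_t(group_a, group_b):
    """t statistic for the slope when y is regressed on a 0/1 group indicator."""
    x = [0.0] * len(group_a) + [1.0] * len(group_b)
    y = list(group_a) + list(group_b)
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(resid_ss / (n - 2) / sxx)
    return slope / slope_se


# Hypothetical scores, for illustration only.
group_a = [4.1, 5.0, 5.6, 6.2, 4.8]
group_b = [6.0, 6.9, 7.4, 5.8, 7.1, 6.5]
print(pooled_t(group_a, group_b))
print(dummy_regression_t(group_a, group_b))
```

The two printed values agree up to floating-point error because the fitted slope is the difference in group means and the residual variance is the pooled within-group variance.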

The ABC of linear regression analysis: What every author and editor should know

Regression analysis is a widely used statistical technique to build a model from a set of data on two or more variables. Linear regression is based on linear correlation, and assumes that change in one variable is accompanied by a proportional change in another variable. Simple linear regression, or bivariate regression, is used for predicting the value of one variable from another variable (predictor); however, multiple linear regression, which enables us to analyse more than one predictor or variable, is more commonly used. This paper explains both simple and multiple linear regressions illustrated with an example of analysis and also discusses some common errors in presenting the results of regression, including inappropriate titles, causal language, inappropriate conclusions, and misinterpretation.
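As a small illustration of simple linear regression as described here, the following Python sketch fits a line by ordinary least squares and reports the share of variance explained. The function name `fit_simple_regression` and the data are hypothetical and do not come from the paper's worked example.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - resid_ss / total_ss


# Hypothetical predictor and outcome values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 4.1, 6.2, 7.8, 10.1, 12.2]
slope, intercept, r_squared = fit_simple_regression(x, y)
print(slope, intercept, r_squared)
print(intercept + slope * 7.0)  # predicted outcome at x = 7
```

The slope describes an association in the observed data; as the abstract warns, reporting it in causal language is a common error.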

%ggBaseline: a SAS macro for analyzing and reporting baseline characteristics automatically in medical research

Annals of Translational Medicine, 2018

Demographic tables are widely used to report baseline characteristics in medical research. However, the traditional copy-paste production method is time-consuming and frequently generates typing errors. Currently available statistical tools are still far from ideal because they are difficult to understand and lack flexibility. A user-friendly, dynamic, and flexible tool is needed for researchers to automate the creation of demographic tables. In this paper, we introduce a SAS macro, %ggBaseline, that automatically analyzes and reports baseline characteristics and produces publication-quality demographic tables. The macro provides optional parameters that allow for the full customization of the desired demographic tables. Since %ggBaseline allows for the quick creation of reproducible and fully customizable tables, it can benefit academics, clinical trials, and medical research studies by making the presentation and formatting of results faster and more efficient.

A Statistical Trainer: Regression Models in Biostatistics

PsycCRITIQUES, 2005

A Review of Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, by Eric Vittinghoff, David Glidden, Stephen Shiboski, and Charles McCulloch. New York: Springer, 2005. 340 pp. ISBN 0-387-20275-7. $79.95