%svy_logistic_regression: A generic SAS macro for simple and multiple logistic regression and creating quality publication-ready tables using survey or non-survey data

%svy_freqs: A generic SAS macro for cross-tabulation between a factor and a by-group variable given a third variable and creating publication-quality tables using data from complex surveys

2019

Introduction: In epidemiological studies, cross-tabulations are a simple but important tool for understanding the distribution of socio-demographic characteristics among study participants. They become more useful when comparisons are presented using a by-group variable such as a key demographic characteristic or an outcome status; for instance, sex or the presence or absence of a disease. Most available statistical analysis software can easily perform cross-tabulations; however, the output must be processed further before it is ready for review and use in a publication. In addition, three-way cross-tabulations of complex survey data, such as those required to show the distribution of disease prevalence across multiple factors and a by-group variable, are not easily implemented directly using the standard procedures of commonly used statistical software. Methods: We developed a generic SAS macro, %svy_freqs, to create quality publication-ready tables from cross-tabulations between a factor and a by-group variable given a third variable using survey or non-survey data. The SAS macro also performs classical two-way cross-tabulations and refines output into publication-quality tables. It provides extra features not available in existing procedures, such as the ability to incorporate parameters for survey design and replication-based variance estimation methods, validation checks for input parameters, transparent formatting of character variable values into numeric ones, and generalizability. Results: We demonstrate the application of the SAS macro in the analysis of data from the 2013-2014 National Health and Nutrition Examination Survey (NHANES), a complex survey also made available for use under a CC0 license.

LOGITSE: A SAS ® Macro for Logistic Regression Modeling in Complex Surveys

Traditional formulae for standard errors and subsequent statistical significance tests implemented in various popular statistical packages are based on the premise that the data are a simple random sample (SRS) of observations from a superpopulation. Equivalently, the observations are assumed to be independent and identically distributed (IID).

%ggBaseline: a SAS macro for analyzing and reporting baseline characteristics automatically in medical research

Annals of Translational Medicine, 2018

Demographic tables are widely used to report baseline characteristics in medical research. However, the traditional copy-paste production method is time-consuming and frequently generates typing errors. Currently available statistical tools are still far from ideal because they are difficult to understand and lack flexibility. A user-friendly, dynamic, and flexible tool is needed for researchers to automate the creation of demographic tables. In this paper, we introduce a SAS macro, %ggBaseline, that automatically analyzes and reports baseline characteristics and produces publication-quality demographic tables. The macro provides optional parameters that allow for the full customization of desired demographic tables. Since %ggBaseline allows for the quick creation of reproducible and fully customizable tables, it can benefit academics, clinical trials and medical research studies by making the presentation and formatting of results faster and more efficient.

SAS Global Forum 2008 Statistics and Data Analysis Paper 369-2008 How to use SAS ® to fit Multiple Logistic Regression Models

2010

When response outcomes are continuous, error terms in models are normally distributed and a standard normal distribution function is adequate. The logistic distribution function, which is very similar to the normal distribution function, is required when the response variable is binary. Parameters of a logistic response function are often estimated using the method of maximum likelihood (ML). One of the problems with ML estimation is that no closed-form solution exists for the values of the parameters that maximize the log-likelihood function. Hence, computer-intensive numerical search procedures (e.g., Newton-Raphson) are required to find ML estimates of the parameters. This paper is a step-by-step guide to developing a multiple logistic regression model for data sets with a binary response variable using PROC LOGISTIC in SAS®. Since PROC LOGISTIC requires uniform coding and does not accommodate missing data, data need to be corrected for missing values and for outliers, those can red...
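The numerical search mentioned above can be illustrated with a minimal Newton-Raphson sketch in Python. This is not the paper's code and not what PROC LOGISTIC does internally. It assumes a single predictor plus an intercept, and the function name `fit_logistic_newton` and the toy data are hypothetical.

```python
import math

def fit_logistic_newton(x, y, tol=1e-10, max_iter=50):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score vector (gradient of the log-likelihood)
        # and the entries of the 2x2 information matrix X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve (X'WX) * delta = score using the 2x2 inverse.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical toy data: a binary exposure with 1/4 events when x = 0
# and 3/4 events when x = 1 (not separable, so the MLE is finite).
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1)
```

With a single binary predictor the model is saturated, so the estimates can be checked by hand: the intercept should equal the log odds in the unexposed group, log(1/3), and the slope should equal the log odds ratio, log(9).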

Prevalence ratio estimation via logistic regression: a tool in R

Anais da Academia Brasileira de Ciências, 2021

The interpretation of odds ratios (OR) as prevalence ratios (PR) in cross-sectional studies has been criticized since this equivalence holds only under specific circumstances. The logistic regression model is a very well-known statistical tool for the analysis of binary outcomes and is frequently used to obtain adjusted OR. Here, we introduce prLogistic for the R statistical computing environment, which can be obtained from The Comprehensive R Archive Network, https://cran.r-project.org/package=prLogistic. The package prLogistic was built to assist the estimation of PR via logistic regression models adjusted by delta method and bootstrap for analysis of independent and correlated binary data. Two applications are presented to illustrate its use for analysis of independent observations and data from clustered studies.
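A small worked example may help show why the OR cannot generally be read as a PR. The sketch below is in Python rather than R and does not use or reproduce prLogistic. It computes crude (unadjusted) measures from a 2x2 table; the function names and counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a, b: exposed with/without outcome; c, d: unexposed with/without outcome."""
    return (a * d) / (b * c)

def prevalence_ratio(a, b, c, d):
    """Ratio of outcome prevalence in the exposed to that in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Common outcome: 60% prevalence in the exposed, 30% in the unexposed.
common = (60, 40, 30, 70)
# Rare outcome: 2% prevalence in the exposed, 1% in the unexposed.
rare = (2, 98, 1, 99)

for label, table in (("common", common), ("rare", rare)):
    print(label, odds_ratio(*table), prevalence_ratio(*table))
```

In both tables the prevalence is doubled in the exposed group, yet the OR is 3.5 for the common outcome and close to 2 only for the rare one. This is the usual circumstance under which the OR approximates the PR.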

%svy_freqs: A Generic SAS Macro for Creating Publication-Quality Three-Way Cross-Tabulations

Journal of open research software, 2021

Cross-tabulations are a simple but important tool for understanding the distribution of socio-demographic characteristics among participants in epidemiological studies. We developed a generic SAS macro, %svy_freqs, to create publication-quality tables from cross-tabulations between a factor and a by-group variable given a third variable using survey or non-survey data. The macro also performs two-way cross-tabulations and provides extra features not available in existing procedures, such as the ability to incorporate parameters for survey design and replication-based variance estimation methods, validation checks for input parameters, transparent formatting of variable values from character into numeric, and generalizability. We demonstrate the macro using the 2013-2014 National Health and Nutrition Examination Survey (NHANES), a complex survey designed to assess the health and nutritional status of adults and children in the United States.

Three algorithms and SAS macros for estimating power and sample size for logistic models with one or more independent variables of interest in the presence of covariates

Source Code for Biology and Medicine, 2014

Background: When designing studies, researchers commonly propose to measure several independent variables in a regression model, a subset of which are identified as the main variables of interest while the rest are retained in the model as covariates or confounders. Power for linear regression in this setting can be calculated using SAS PROC POWER. There is a void in estimating power for logistic regression models in the same setting. Methods: Currently, an approach that calculates power for only one variable of interest in the presence of other covariates for logistic regression is in common use and works well for this special case. In this paper we propose three related algorithms, along with corresponding SAS macros, that extend power estimation to one or more primary variables of interest in the presence of some confounders. Results: The three proposed empirical algorithms employ the likelihood ratio test to provide a user with a power estimate for a given sample size, a quick sample size estimate for a given power, or an approximate power curve for a range of sample sizes. A user can specify odds ratios for a combination of binary, uniform and standard normal independent variables of interest and/or remaining covariates/confounders in the model, along with a correlation between variables. Conclusions: These user-friendly algorithms and macro tools are a promising solution that can fill the void for estimation of power for logistic regression when multiple independent variables are of interest, in the presence of additional covariates in the model.

On the Examination of the Reliability of Statistical Software for Estimating Logistic Regression Models

2015

The numerical reliability of software packages was examined for the logistic regression model. The software tested included SAS 9.3, MATLAB R2012a, R 3.1.0, Stata/IC 13.1 and LIMDEP 10.5. Thirty benchmark datasets were created by simulating different conditional binary choice processes. To obtain certified values of parameter estimates and standard errors for the nonlinear logistic regression models used, this study followed the procedures that the National Institute of Standards and Technology uses to generate certified values. The logarithm of the relative error was used as a measure of accuracy to examine the numerical reliability of these packages.
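The accuracy measure named above can be sketched as follows. The abstract does not give the formula, so this assumes the definition commonly used in software-reliability studies: the negative base-10 logarithm of the relative error, with the absolute error used when the certified value is zero. The function name and numbers are hypothetical.

```python
import math

def log_relative_error(estimate, certified):
    """Approximate number of correct significant digits in `estimate`."""
    if estimate == certified:
        return math.inf  # exact agreement: no finite error to take the log of
    if certified == 0:
        # Relative error is undefined at zero, so fall back to absolute error.
        return -math.log10(abs(estimate))
    return -math.log10(abs(estimate - certified) / abs(certified))

# Hypothetical example: an estimate of 0.999 against a certified value of 1.0
# is off by one part in a thousand, giving roughly three correct digits.
lre = log_relative_error(0.999, 1.0)
print(lre)
```

A larger value means closer agreement with the certified value.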

A Review of the Logistic Regression Model with Emphasis on Medical Research

Journal of Data Analysis and Information Processing

This study explored and reviewed the logistic regression (LR) model, a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, with emphasis on medical research. Thirty-seven research articles published between 2000 and 2018 that employed logistic regression as the main statistical tool, as well as six textbooks on logistic regression, were reviewed. Logistic regression concepts such as odds, odds ratio, logit transformation, logistic curve, assumptions, selecting dependent and independent variables, model fitting, reporting and interpreting were presented. Upon perusing the literature, considerable deficiencies were found in both the use and reporting of LR. For many studies, the ratio of the number of outcome events to predictor variables (events per variable) was sufficiently small to call into question the accuracy of the regression model. Also, most studies did not report on validation analysis, regression diagnostics or goodness-of-fit measures, which authenticate the robustness of the LR model. Here, we demonstrate a good example of the application of the LR model using data obtained on a cohort of pregnant women and the factors that influence their decision to opt for caesarean delivery or vaginal birth. It is recommended that researchers be more rigorous and pay greater attention to guidelines concerning the use and reporting of LR models.
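The events-per-variable ratio mentioned above is simple to compute. The sketch below assumes the common convention of counting events in the less frequent outcome category, which the abstract does not spell out. The function name and counts are hypothetical, and the threshold of 10 is a widely cited rule of thumb rather than a figure from the review.

```python
def events_per_variable(n_events, n_nonevents, n_predictors):
    """Events in the less frequent outcome category per predictor in the model."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_nonevents) / n_predictors

# Hypothetical study: 40 events, 160 non-events, 8 predictors in the model.
epv = events_per_variable(40, 160, 8)
print(epv)  # 5.0, below the often-quoted rule of thumb of 10
```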