Detecting disease-causing genes by LASSO-Patternsearch algorithm

Weiliang Shi et al. BMC Proc. 2007.

Abstract

The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. Because both the case-control status and the identities of the trait loci are known, these data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls. LASSO-Patternsearch is a new multi-step algorithm, with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screen step within the framework of parametric logistic regression. The patterns that survive the screen step are then further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model is built on the patterns that survive the LASSO step. In our analysis of the Genetic Analysis Workshop 15 Problem 3 data, we identified most of the associated SNPs and relevant covariates. When the model was used as a classifier, very competitive error rates were obtained.
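The three steps described above (screen, LASSO selection, final parametric fit) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the screening test, the thresholds, or the optimizer, so the likelihood-ratio cutoff `lr_cut`, the penalty weight `l1`, the proximal gradient solver, and the synthetic binary "patterns" below are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() finite and probabilities strictly inside (0, 1).
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, l1=0.0, n_iter=2000):
    """Logistic regression by proximal gradient descent.

    Minimizes mean log-loss + l1 * sum(|slopes|); the intercept is unpenalized.
    With l1=0 this is an ordinary (unpenalized) parametric fit.
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])
    step = 4.0 * n / np.linalg.norm(Xb, 2) ** 2  # 1 / Lipschitz bound
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        beta = beta - step * (Xb.T @ (sigmoid(Xb @ beta) - y) / n)
        # Soft-thresholding sets small slopes exactly to zero.
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * l1, 0.0)
    return beta

def log_lik(X, y, beta):
    p = sigmoid(beta[0] + X @ beta[1:])
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def three_step_sketch(X, y, lr_cut=3.84, l1=0.02):
    """Screen -> LASSO -> refit. lr_cut and l1 are illustrative assumptions."""
    n, p = X.shape
    ybar = y.mean()
    null_ll = n * (ybar * np.log(ybar) + (1 - ybar) * np.log(1 - ybar))
    # Step 1: screen each pattern with a one-variable logistic regression,
    # keeping those whose likelihood-ratio statistic exceeds lr_cut.
    screened = [j for j in range(p)
                if 2 * (log_lik(X[:, [j]], y, fit_logistic(X[:, [j]], y)) - null_ll) > lr_cut]
    # Step 2: LASSO-penalized logistic regression on the screened patterns.
    beta_l1 = fit_logistic(X[:, screened], y, l1=l1)
    kept = [screened[k] for k in np.flatnonzero(beta_l1[1:])]
    # Step 3: unpenalized parametric logistic regression on the survivors.
    beta_final = fit_logistic(X[:, kept], y)
    return screened, kept, beta_final

# Synthetic data (assumed for illustration): 20 binary patterns,
# of which only patterns 0 and 3 affect case/control status.
rng = np.random.default_rng(0)
n, p = 600, 20
X = rng.integers(0, 2, size=(n, p)).astype(float)
y = (rng.random(n) < sigmoid(-1.0 + 2.0 * X[:, 0] + 2.0 * X[:, 3])).astype(float)

screened, kept, beta_final = three_step_sketch(X, y)
print("survived screen:", screened)
print("survived LASSO: ", kept)
print("final coefficients (intercept first):", np.round(beta_final, 2))
```

The final refit matters because the LASSO shrinks the coefficients it retains toward zero; refitting without the penalty on the surviving patterns gives coefficient estimates that are not biased by that shrinkage.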
