Population Structure and Relatedness Inference using the GENESIS Package

Overview

GENESIS provides statistical methodology for analyzing genetic data from samples with population structure and/or familial relatedness. This vignette describes how to use GENESIS to infer population structure and to estimate relatedness measures such as kinship coefficients, identity by descent (IBD) sharing probabilities, and inbreeding coefficients. GENESIS uses PC-AiR for population structure inference that is robust to known or cryptic relatedness, and it uses PC-Relate for accurate relatedness estimation in the presence of population structure, admixture, and departures from Hardy-Weinberg equilibrium.

Data

Reading in Genotype Data

The functions in the GENESIS package can read genotype data from a GenotypeData class object as created by the GWASTools package. Through the use of GWASTools, a GenotypeData class object can easily be created from an R matrix of SNP genotype data, a GDS file, or PLINK files (after conversion to a GDS file).

Example R code for creating a GenotypeData object is presented below. Much more detail can be found in the GWASTools package reference manual.

GENESIS can also work with genotype data from sequencing, starting with a VCF file. For examples using this format, see the vignette “Analyzing Sequence Data using the GENESIS Package”.

R Matrix

geno <- MatrixGenotypeReader(genotype = genotype, snpID = snpID, 
                             chromosome = chromosome, position = position, 
                             scanID = scanID)
genoData <- GenotypeData(geno)
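
The call above assumes that genotype, snpID, chromosome, position, and scanID already exist in the workspace. As a minimal sketch, the block below simulates toy inputs in the shape GWASTools expects (SNPs in rows, samples in columns, genotypes coded as 0, 1, or 2). The sizes, seed, and values are made up for illustration and have nothing to do with the HapMap example used later; the GWASTools calls run only if that package is installed.

```r
# Toy inputs for MatrixGenotypeReader (all values are simulated, illustration only)
set.seed(1)
n_snp  <- 10L   # hypothetical number of SNPs
n_scan <- 5L    # hypothetical number of samples

# Genotype matrix: one row per SNP, one column per sample, coded 0/1/2
genotype <- matrix(sample(c(0L, 1L, 2L), n_snp * n_scan, replace = TRUE),
                   nrow = n_snp, ncol = n_scan)

snpID      <- seq_len(n_snp)                        # unique integer SNP IDs
chromosome <- rep(1L, n_snp)                        # integer chromosome codes
position   <- 1000L + 100L * (seq_len(n_snp) - 1L)  # integer base-pair positions
scanID     <- seq_len(n_scan)                       # unique integer sample IDs

# Build the GenotypeData object only when GWASTools is available
if (requireNamespace("GWASTools", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GWASTools))
  geno <- MatrixGenotypeReader(genotype = genotype, snpID = snpID,
                               chromosome = chromosome, position = position,
                               scanID = scanID)
  genoData <- GenotypeData(geno)
  # Report the dimensions seen by GWASTools
  cat("SNPs:", nsnp(genoData), " samples:", nscan(genoData), "\n")
}
```

In a real analysis the matrix and annotation vectors would come from your own data; the ID vectors must be unique and must match the matrix dimensions.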

GDS files

geno <- GdsGenotypeReader(filename = "genotype.gds")
genoData <- GenotypeData(geno)

The SNPRelate package provides the snpgdsBED2GDS function to convert binary PLINK files into a GDS file.

snpgdsBED2GDS(bed.fn = "genotype.bed", 
              bim.fn = "genotype.bim", 
              fam.fn = "genotype.fam", 
              out.gdsfn = "genotype.gds")

Once the PLINK files have been converted to a GDS file, then a GenotypeData object can be created as described above.
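
The two steps (convert, then read) can be chained as in the sketch below. The file prefix "genotype" is the same placeholder used above, and the existence checks are an addition for illustration: the block does nothing unless the three PLINK files and both packages are present.

```r
# Convert PLINK files to GDS, then open the result as a GenotypeData object.
plink_prefix <- "genotype"   # placeholder prefix, as in the example above
plink_files  <- paste0(plink_prefix, c(".bed", ".bim", ".fam"))
gds_out      <- paste0(plink_prefix, ".gds")

have_inputs <- all(file.exists(plink_files))
have_pkgs   <- requireNamespace("SNPRelate", quietly = TRUE) &&
               requireNamespace("GWASTools", quietly = TRUE)

if (have_inputs && have_pkgs) {
  suppressPackageStartupMessages(library(GWASTools))
  # Step 1: write the GDS file
  SNPRelate::snpgdsBED2GDS(bed.fn = plink_files[1],
                           bim.fn = plink_files[2],
                           fam.fn = plink_files[3],
                           out.gdsfn = gds_out)
  # Step 2: read the GDS file as described above
  geno <- GdsGenotypeReader(filename = gds_out)
  genoData <- GenotypeData(geno)
  # ... analysis would go here ...
  close(genoData)   # release the file handle when finished
} else {
  message("Skipping conversion: PLINK files or required packages not found.")
}
```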

HapMap Data

To demonstrate PC-AiR and PC-Relate analyses with the GENESIS package, we analyze SNP data from the Mexican Americans in Los Angeles, California (MXL) and African American individuals in the southwestern USA (ASW) population samples of HapMap 3. Mexican Americans and African Americans have diverse ancestral backgrounds, and familial relatives are present in these data. Genotype data at a subset of 20K autosomal SNPs for 173 individuals are provided as a GDS file.

gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
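
As a quick check, the example file can be opened with the GDS reader shown earlier and its dimensions compared with the sample and SNP counts stated above. This is a sketch only: the object names HapMap_geno and HapMap_genoData are chosen here for illustration, and the block is skipped when GENESIS or GWASTools is not installed (system.file returns an empty string in that case).

```r
# Locate the example GDS file shipped with GENESIS ("" if the package is missing)
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package = "GENESIS")

if (nzchar(gdsfile) && requireNamespace("GWASTools", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GWASTools))
  HapMap_geno     <- GdsGenotypeReader(filename = gdsfile)
  HapMap_genoData <- GenotypeData(HapMap_geno)
  # Print the number of samples and SNPs found in the file
  cat("samples:", nscan(HapMap_genoData),
      " SNPs:", nsnp(HapMap_genoData), "\n")
  close(HapMap_genoData)   # close the file once inspection is done
} else {
  message("GENESIS example data or GWASTools not available; skipping.")
}
```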
