PyPop: A Software Framework for Population Genomics: Analyzing Large-Scale Multi-Locus Genotype Data
Related papers
GEVALT: an integrated software tool for genotype analysis
BMC bioinformatics, 2007
Genotype information generated by individual and international efforts carries the promise of revolutionizing disease studies and the association of phenotypes with alleles and haplotypes. Given the enormous amounts of public genotype data, tools for analyzing, interpreting and visualizing these data sets are of critical importance to researchers. In past work we developed algorithms for genotype phasing and tag SNP selection, which were shown to be quick and accurate. Until now, both algorithms were available only as batch executables. Here we present GEVALT (GEnotype Visualization and ALgorithmic Tool), a software package designed to simplify and expedite the process of genotype analysis by providing a common interface to several tasks relating to such analysis. GEVALT combines the strong visual abilities of Haploview with our quick and powerful algorithms for genotype phasing (GERBIL), tag SNP selection (STAMPA) and permutation testing for evaluating significance of assoc...
SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel
BMC Research Notes, 2009
Background: Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer easy access to the results, which may require conversion between data formats. First-hand SNP data are often entered or saved in MS-Excel format, but this software lacks genetics- and epidemiology-related functions. A general tool for basic genetic and epidemiological analysis and data conversion in MS-Excel is needed.
iHAP--integrated haplotype analysis pipeline for characterizing the haplotype structure of genes
BMC bioinformatics, 2006
The advent of genotype data from large-scale efforts that catalog the genetic variants of different populations has given rise to new avenues for multifactorial disease association studies. Recent work shows that genotype data from the International HapMap Project have a high degree of transferability to the wider population. This implies that the design of genotyping studies on local populations may be facilitated through inferences drawn from information contained in HapMap populations. To facilitate analysis of HapMap data for characterizing the haplotype structure of genes or any chromosomal regions, we have developed an integrated web-based resource, iHAP. In addition to incorporating genotype and haplotype data from the International HapMap Project and gene information from the UCSC Genome Browser Database, iHAP also provides capabilities for inferring haplotype blocks and selecting tag SNPs that are representative of haplotype patterns. These include block partitioning algor...
snp.plotter: an R-based SNP/haplotype association and linkage disequilibrium plotting package
Bioinformatics, 2007
snp.plotter is a newly developed R package which produces high-quality plots of results from genetic association studies. The main features of the package include options to display a linkage disequilibrium (LD) plot below the P-value plot using either the r² or D′ LD metric, to set the X-axis to equal spacing or to use the physical map of markers, and to specify plot labels, colors, symbols and LD heatmap color scheme. snp.plotter can plot single SNP and/or haplotype data and simultaneously plot multiple sets of results. R is a free software environment for statistical computing and graphics available for most platforms. The proposed package provides a simple way to convey both association and LD information in a single appealing graphic for genetic association studies. Availability: Downloadable R package and example datasets are available at http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html and http://www.r-project.org Contact: nicodemusk@mail.nih.gov
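As a rough illustration of the two LD metrics named in the abstract above, the following Python sketch computes D, D′ and r² for a pair of biallelic loci from haplotype and allele frequencies. It uses the standard textbook definitions and is not taken from snp.plotter; the function name `ld_metrics` and its parameters are hypothetical, chosen only for this example.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (hypothetical parameter name)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies; depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    # D' is reported here as an absolute value between 0 and 1.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = (d * d) / denom if denom > 0 else 0.0

    return d, d_prime, r_squared


if __name__ == "__main__":
    # Two loci in complete LD with equal allele frequencies.
    print(ld_metrics(0.5, 0.5, 0.5))
```

When either locus is monomorphic, both denominators are zero and the sketch returns 0.0 for D′ and r²; a real tool might prefer to report such pairs as undefined.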
Immunogenetics, 2006
There is presently much interest in utilizing patterns of linkage disequilibrium (LD) to further genetic association studies. This is particularly pertinent in the class III region of the human major histocompatibility complex (MHC), which has been extensively studied as a disease susceptibility locus in a number of ethnic groups. To date, however, few studies of LD in the MHC have considered non-Caucasian populations. With the advent of large-scale haplotyping of the human genome, the question of utilizing LD patterns across populations has come to the fore. We have previously used LD mapping to direct an MHC class III association study in a UK Caucasian population. As an extension of this, we sought to determine to what extent the pattern of LD observed in that study could be used to conduct a similar study in a West African Gambian population. We found that broad patterns of LD were similar in the two populations, resulting in similar candidate region delineations, but at a higher resolution, marker-specific patterns of LD and population-dependent allele frequencies confounded the choice of regional tagging SNPs. Our results have implications for the applicability of large-scale haplotype maps such as the HapMap to complex regions like the MHC.
Genome …, 2010
We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.
SimHap GUI: An intuitive graphical user interface for genetic association analysis
BMC Bioinformatics, 2008
Background: Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that allows anyone but a professional statistician to effectively utilise the tool.
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Immunogenetics, 2008
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
An analysis pipeline for genome-wide association studies
Cancer informatics, 2008
We developed an efficient pipeline to analyze genome-wide association study single nucleotide polymorphism scan results. Perl scripts were used to convert genotypes called using the BRLMM algorithm into a modified PB format. We computed summary statistics characteristic of our case and control populations including allele counts, missing values, heterozygosity, measures of compliance with Hardy-Weinberg equilibrium, and several population difference statistics. In addition, we computed association tests, including exact tests of association for genotypes, alleles, the Cochran-Armitage linear trend test, and dominant, recessive, and overdominant models at every single nucleotide polymorphism (SNP). Pairwise linkage disequilibrium statistics were also computed using the command-line version of HaploView, which was made possible by a reformatting script. Additional Perl scripts permit loading the results into a MySQL database conjoined with a Generic Genome Browser (gbro...
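The per-SNP summary statistics listed in the abstract above (allele counts, heterozygosity, and a measure of compliance with Hardy-Weinberg equilibrium) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which Hardy-Weinberg measure the pipeline used, so a Pearson chi-square goodness-of-fit statistic is substituted here, and the function name `snp_summary` and its dictionary keys are hypothetical.

```python
def snp_summary(n_aa, n_ab, n_bb):
    """Summarize one biallelic SNP from its three genotype counts.

    n_aa, n_ab, n_bb : counts of AA, AB and BB genotypes (hypothetical names).
    Returns a dict with allele counts, observed heterozygosity and a
    chi-square statistic for departure from Hardy-Weinberg proportions.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")

    # Each individual carries two alleles.
    count_a = 2 * n_aa + n_ab
    count_b = 2 * n_bb + n_ab

    p = count_a / (2 * n)  # frequency of allele A
    q = 1.0 - p            # frequency of allele B

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square (assumed here; 1 degree of freedom for a biallelic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    return {
        "count_a": count_a,
        "count_b": count_b,
        "obs_het": n_ab / n,
        "exp_het": 2 * p * q,
        "hwe_chi2": chi2,
    }


if __name__ == "__main__":
    print(snp_summary(25, 50, 25))  # genotype counts in exact HW proportions
```

For SNPs with small expected counts, an exact test is generally preferred over the chi-square approximation used in this sketch.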