SEQscoring: a tool to facilitate the interpretation of data generated with next generation sequencing technologies (original) (raw)
Related papers
SeqAnt: A web service to rapidly identify and annotate DNA sequence variations
BMC Bioinformatics, 2010
Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results: SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data completely ranged from 0.17 seconds to 28 minutes 49.8 seconds.
SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations
Nucleic acids research, 2015
SNiPlay is a web-based tool for detection, management and analysis of genetic variants including both single nucleotide polymorphisms (SNPs) and InDels. Version 3 now extends functionalities in order to easily manage and exploit SNPs derived from next generation sequencing technologies, such as GBS (genotyping by sequencing), WGRS (whole gre-sequencing) and RNA-Seq technologies. Based on the standard VCF (variant call format) format, the application offers an intuitive interface for filtering and comparing polymorphisms using user-defined sets of individuals and then establishing a reliable genotyping data matrix for further analyses. Namely, in addition to the various scaled-up analyses allowed by the application (genomic annotation of SNP, diversity analysis, haplotype reconstruction and network, linkage disequilibrium), SNiPlay3 proposes new modules for GWAS (genome-wide association studies), population stratification, distance tree analysis and visualization of SNP density. Addi...
2017
SummarySeqsLab is a platform that helps researchers to easily annotate and interpret genetic variants derived from a large quantity of personal genomes. It provides an integrated interface to annotate the variants based on curated databases as well as in silico estimation on the effects of the variants. SeqsLab adopts the scalable cluster computing framework, Spark, and incorporates several customized algorithms to speed up the process of variant annotation and interpretation. The key features of SeqsLab include efficient annotation on large structural variations, diverse combinations of variant filters, easy incorporation with a vast amount of public databases, and scalable architecture of analyzing hundreds of human whole genomes simultaneously.Availability and ImplementationSeqsLab is implemented with JAVA. The generated annotation will then be stored in Elasticsearch for real-time query and exploratory analysis. SeqsLab can be accessed by web browsers and is freely available at ...
Nucleic Acids Research, 2012
The massive use of Next-Generation Sequencing (NGS) technologies is uncovering an unexpected amount of variability. The functional characterization of such variability, particularly in the most common form of variation found, the Single Nucleotide Variants (SNVs), has become a priority that needs to be addressed in a systematic way. VARIANT (VARIant ANalyis Tool) reports information on the variants found that include consequence type and annotations taken from different databases and repositories (SNPs and variants from dbSNP and 1000 genomes, and disease-related variants from the Genome-Wide Association Study (GWAS) catalog, Online Mendelian Inheritance in Man (OMIM), Catalog of Somatic Mutations in Cancer (COSMIC) mutations, etc). VARIANT also produces a rich variety of annotations that include information on the regulatory (transcription factor or miRNAbinding sites, etc.) or structural roles, or on the selective pressures on the sites affected by the variation. This information allows extending the conventional reports beyond the coding regions and expands the knowledge on the contribution of non-coding or synonymous variants to the phenotype studied. Contrarily to other tools, VARIANT uses a remote database and operates through efficient RESTful Web Services that optimize search and transaction operations. In this way, local problems of installation, update or disk size limitations are overcome without the need of sacrifice speed (thousands of variants are processed per minute). VARIANT is available at: http://variant. bioinfo.cipf.es.
BMC Bioinformatics
Background: Bulked segregant analysis (BSA), coupled with next-generation sequencing, allows the rapid identification of both qualitative and quantitative trait loci (QTL), and this technique is referred to as BSA-Seq here. The current SNP index method and G-statistic method for BSA-Seq data analysis require relatively high sequencing coverage to detect significant single nucleotide polymorphism (SNP)-trait associations, which leads to high sequencing cost. Results: We developed a simple and effective algorithm for BSA-Seq data analysis and implemented it in Python; the program was named PyBSASeq. Using PyBSASeq, the significant SNPs (sSNPs), SNPs likely associated with the trait, were identified via Fisher's exact test, and then the ratio of the sSNPs to total SNPs in a chromosomal interval was used to detect the genomic regions that condition the trait of interest. The results obtained this way are similar to those generated via the current methods, but with more than five times higher sensitivity. This approach was termed the significant SNP method here. Conclusions: The significant SNP method allows the detection of SNP-trait associations at much lower sequencing coverage than the current methods, leading to~80% lower sequencing cost and making BSA-Seq more accessible to the research community and more applicable to the species with a large genome.
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Nature Genetics, 2011
Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass 1000 Genomes Project datasets. of, implemented, and performed analytic approaches. M.A.D., E.B., R.E.P., K.V.G., G.d.A., A.M.K., M.J.D. wrote the manuscript. M.A.D., M.H., A.M. developed Picard and GATK infrastructure underlying the tools implemented here. M.A.D, S.B.G, D.A., M. J. D. lead the team.
Unraveling genomic variation from next generation sequencing data
BioData Mining, 2013
Elucidating the content of a DNA sequence is critical to deeper understand and decode the genetic information for any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. Few of these areas, which get shaped by the new technological advances, involve evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAs), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and the interpretation of the vast amount of data that gets produced is a not an easy or a trivial task and still remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make the knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which get produced by them. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data with emphasis in structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field.
Bioinformatics, 2020
Summary With the advance of next-generation sequencing technologies and reductions in the costs of these techniques, bulked segregant analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish true causal mutations from background mutations. In this study, we developed the BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the linked region (the region linked to the causal mutation) and an interactive Shiny web application for visualizing the results. We deeply sequenced a sorghum male-sterile parental line (ms8) to capture the majority of background mutations in our bulked F2 data. We applied the workflow to 11 bulked sorghum F2 populations and 1 r...
QTLseqr: An R Package for Bulk Segregant Analysis with Next‐Generation Sequencing
The Plant Genome, 2018
Next-Generation Sequencing Bulk Segregant Analysis (NGS-BSA) is efficient in detecting quantitative trait loci (QTL). Despite the popularity of NGS-BSA and the R statistical platform, no R packages are currently available for NGS-BSA. We present QTLseqr, an R package for NGS-BSA that identifies QTL using two statistical approaches: QTL-seq and G'. These approaches use a simulation method and a tricube smoothed G statistic, respectively, to identify and assess statistical significance of QTL. QTLseqr can import and filter SNP data, calculate SNP distributions, relative allele frequencies, G' values, and log 10 (p-values), enabling identification and plotting of QTL.