pybedtools 0.10.0 documentation

Overview


The BEDTools suite of programs is widely used for genomic interval manipulation or “genome algebra”. pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.

See full online documentation, including installation instructions, at http://daler.github.io/pybedtools/.

Why pybedtools?

Here is an example to get the names of genes that are <5 kb away from intergenic SNPs:

from pybedtools import BedTool

snps = BedTool('snps.bed.gz')  # [1]
genes = BedTool('hg19.gff')    # [1]

intergenic_snps = snps.subtract(genes)                        # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True)  # [2, 3]

for gene in nearby:             # [4]
    if int(gene[-1]) < 5000:    # [4]
        print(gene.name)        # [4]

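Useful features shown here include:

[1] support for all BEDTools-supported formats (here, gzipped BED and GFF)
[2] wrapping of all BEDTools programs and arguments (here, subtract and closest, passing the -d flag to closest)
[3] streaming results (like Unix pipes, here specified by stream=True)
[4] iterating over results while accessing feature data by index or by attribute access (here, [-1] and .name)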

In contrast, here is the same analysis using shell scripting. Note that this requires knowledge of Perl, bash, and awk. The run time is identical to the pybedtools version above:

snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps

snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))

intersectBed -a $snps -b $genes -v > $intergenic_snps

closestBed -a $genes -b $intergenic_snps -d \
| awk '($'$distance_field' < 5000){print $9;}' \
| perl -ne 'm/[ID|Name|gene_id]=(.*?);/; print "$1\n"'

rm $intergenic_snps
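The column arithmetic in the script above (nine GFF columns, then the SNP columns, then one distance column) can be sketched in plain Python. This is only an illustration of the bookkeeping that gene[-1] and gene.name hide: the helper name parse_closest_line and the sample line are hypothetical, and the regular expression uses an alternation group rather than the bracket expression in the Perl one-liner.

```python
import re

GFF_FIELDS = 9  # a GFF line has nine tab-separated columns


def parse_closest_line(line, snp_fields):
    """Return (gene_name, distance) from one line of `closestBed -d`-style output.

    Hypothetical helper for illustration; `snp_fields` is the number of
    columns in the SNP file, which the shell script measures with awk's NF.
    """
    fields = line.rstrip("\n").split("\t")
    # 1-based column holding the distance, as in the shell script
    distance_field = GFF_FIELDS + snp_fields + 1
    distance = int(fields[distance_field - 1])
    # Column 9 of the GFF part holds the attributes (awk's $9)
    attributes = fields[GFF_FIELDS - 1]
    match = re.search(r"(?:ID|Name|gene_id)=(.*?);", attributes)
    name = match.group(1) if match else None
    return name, distance


# Made-up example line: 9 GFF columns, 4 BED columns, 1 distance column
line = "\t".join([
    "chr1", "src", "gene", "50001", "60000", ".", "+", ".",
    "ID=geneB;Name=geneB;",
    "chr1", "62000", "62001", "snp2",
    "2001",
])
name, distance = parse_closest_line(line, snp_fields=4)
if distance < 5000:
    print(name)
```

With pybedtools, the same two lookups are the last-index access and the name attribute shown in the Python example earlier, so no column counting is needed.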

See the Shell script comparison in the docs for more details on this comparison, or keep reading the full documentation at http://daler.github.io/pybedtools.
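To make concrete what both versions of the analysis compute, here is a minimal pure-Python sketch of the logic on a few made-up intervals. It is not how BEDTools or pybedtools implement these operations: the Interval type, the helper names, and the simplified distance definition (number of bases between two features, 0 if they overlap) are all assumptions chosen for this illustration, and the brute-force search would be far too slow for real genomic data.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Bases between a and b; 0 if they overlap; None across chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, a.start - b.end, b.start - a.end)


def genes_near_intergenic_snps(genes, snps, max_dist=5000):
    # Step 1: keep only SNPs that overlap no gene (the "intergenic" SNPs)
    intergenic = [s for s in snps if not any(overlaps(s, g) for g in genes)]
    # Step 2: for each gene, find the distance to its closest intergenic SNP
    names = []
    for g in genes:
        dists = [d for d in (gap(g, s) for s in intergenic) if d is not None]
        # Step 3: report genes whose closest SNP is under the cutoff
        if dists and min(dists) < max_dist:
            names.append(g.name)
    return names


# Made-up data for illustration only
genes = [
    Interval("chr1", 1000, 2000, "geneA"),
    Interval("chr1", 50000, 60000, "geneB"),
    Interval("chr2", 1000, 2000, "geneC"),
]
snps = [
    Interval("chr1", 1500, 1501, "snp1"),    # inside geneA, so not intergenic
    Interval("chr1", 62000, 62001, "snp2"),  # 2000 bases past the end of geneB
    Interval("chr2", 90000, 90001, "snp3"),  # far from geneC
]

print(genes_near_intergenic_snps(genes, snps))  # ['geneB']
```

The three steps correspond to subtract, closest with d=True, and the final loop in the pybedtools example; BEDTools performs them on sorted or indexed data rather than by comparing every pair.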

As of 2022, pybedtools is released under the MIT license; see LICENSE.txt for more info.

Note

If you use pybedtools in your work, please cite the pybedtools manuscript and the BEDTools manuscript:

Dale RK, Pedersen BS, and Quinlan AR. 2011. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27(24):3423–3424.

Quinlan AR and Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842.

Getting started

The documentation is separated into 4 main parts, depending on the depth you’d like to cover:
