GitHub - konstantint/PyIntervalTree: A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range containment.

PyIntervalTree

NB: This package is deprecated. Please use the intervaltree package instead (available via GitHub or PyPI).

The genome-related functionality has been extracted to the intervaltree-bio package (GitHub, PyPI).

No future versions of this package are planned. Do not file issues.

A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range envelopment.

This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound.
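To make the half-open convention concrete, here is a minimal sketch in plain Python of the three query types over a list of (begin, end) tuples. It is not this package's implementation: the helper names query_point, query_overlap, and query_enveloped are hypothetical, and the linear scan only stands in for what a tree would answer faster.

```python
# Intervals are half-open: begin is included, end is excluded.
intervals = [(0, 5), (5, 10), (3, 8)]

def query_point(ivs, p):
    """Intervals that contain point p (begin <= p < end)."""
    return [(b, e) for b, e in ivs if b <= p < e]

def query_overlap(ivs, begin, end):
    """Intervals that share at least one point with [begin, end)."""
    return [(b, e) for b, e in ivs if b < end and begin < e]

def query_enveloped(ivs, begin, end):
    """Intervals that lie completely inside [begin, end)."""
    return [(b, e) for b, e in ivs if begin <= b and e <= end]

print(query_point(intervals, 5))         # (0, 5) is excluded: 5 is its upper bound
print(query_overlap(intervals, 8, 12))   # (3, 8) is excluded: it ends where the query begins
print(query_enveloped(intervals, 0, 8))  # (5, 10) is excluded: it extends past 8
```

Because the upper bound is excluded, adjacent intervals such as (0, 5) and (5, 10) never overlap, which is convenient when tagging consecutive spans of text or time.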

Installation

The easiest way to install most Python packages is via easy_install or pip:

$ pip install PyIntervalTree

Features

Examples

t = IntervalTree(ivs)

Usage with Genomic Data

Interval trees are widely used in bioinformatics, where intervals correspond to genes or other features along the genome. Such intervals are commonly stored in BED-format files. To simplify working with such data, the package intervaltree.bio provides a GenomeIntervalTree class.

GenomeIntervalTree is essentially a dict of IntervalTree objects, indexed by chromosome name:

gtree = GenomeIntervalTree()
gtree['chr1'].addi(10000, 20000)

There is a convenience function for adding intervals:

gtree.addi('chr2', 20000, 30000)
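The "dict of trees" idea can be illustrated with a small stand-in built on collections.defaultdict. This is only a sketch under assumptions: the class name GenomeIntervals is hypothetical, and a plain list takes the place of each per-chromosome IntervalTree, so it shows the indexing scheme rather than the package's actual code.

```python
from collections import defaultdict

class GenomeIntervals(defaultdict):
    """Hypothetical stand-in: maps chromosome name -> list of (begin, end, data)."""

    def __init__(self):
        # Missing chromosomes get an empty container on first access.
        super().__init__(list)

    def addi(self, chrom, begin, end, data=None):
        # Convenience wrapper: look up (or create) the chromosome's container, then add.
        self[chrom].append((begin, end, data))

g = GenomeIntervals()
g['chr1'].append((10000, 20000, None))  # direct access by chromosome name
g.addi('chr2', 20000, 30000)            # same effect via the convenience method
print(sorted(g))
```

Creating containers lazily means callers never need to register a chromosome before adding intervals to it.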

You can create a GenomeIntervalTree instance from a BED file:

test_url = 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsBroadDnd41Ezh239875UniPk.narrowPeak.gz'
data = zlib.decompress(urlopen(test_url).read(), 16+zlib.MAX_WBITS)
gtree = GenomeIntervalTree.from_bed(StringIO(data))
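For readers who want to see what such a load involves without a network connection, the following sketch decompresses gzip data with zlib and groups BED records by chromosome. The sample records and the by_chrom dictionary are made up for illustration, and the parsing is an assumption about what a BED reader does in general, not a copy of from_bed. It relies only on the first three BED columns being chromosome, start, and end, with the end position excluded.

```python
import gzip
import zlib
from collections import defaultdict

# Hypothetical sample: two BED records, gzip-compressed in memory instead of downloaded.
bed_text = "chr1\t100\t200\tpeak1\nchr2\t50\t75\tpeak2\n"
compressed = gzip.compress(bed_text.encode("ascii"))

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer.
raw = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)

by_chrom = defaultdict(list)
for line in raw.decode("ascii").splitlines():
    if not line.strip():
        continue  # skip blank lines
    fields = line.split("\t")
    chrom, begin, end = fields[0], int(fields[1]), int(fields[2])
    # Any remaining columns are kept as the interval's payload.
    by_chrom[chrom].append((begin, end, fields[3:]))

print(dict(by_chrom))
```

Since BED end coordinates are exclusive, the parsed values can be used directly as half-open intervals without adjusting either bound.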

In addition, special functions are offered to read in UCSC tables of gene positions:

You may add methods for parsing your own tabular files with genomic intervals; see the documentation for GenomeIntervalTree.from_table.

Based on