API Reference — cooler 0.10.3 documentation (original) (raw)

Quick reference

Cooler class

cooler.Cooler A convenient interface to a cooler data collection.
cooler.Cooler.binsize Resolution in base pairs if uniform else None
cooler.Cooler.chromnames List of reference sequence names
cooler.Cooler.chromsizes Ordered mapping of reference sequences to their lengths in bp
cooler.Cooler.bins(**kwargs) Bin table selector
cooler.Cooler.pixels([join]) Pixel table selector
cooler.Cooler.matrix([field, balance, ...]) Contact matrix selector
cooler.Cooler.open([mode]) Open the HDF5 group containing the Cooler with h5py
cooler.Cooler.info File information and metadata
cooler.Cooler.offset(region) Bin ID containing the left end of a genomic region
cooler.Cooler.extent(region) Bin IDs containing the left and right ends of a genomic region

Creation/reduction

cooler.create_cooler(cool_uri, bins, pixels) Create a cooler from bins and pixels at the specified URI.
cooler.merge_coolers(output_uri, input_uris, ...) Merge multiple coolers with identical axes.
cooler.coarsen_cooler(base_uri, output_uri, ...) Coarsen a cooler to a lower resolution by an integer factor k.
cooler.zoomify_cooler(base_uris, outfile, ...) Generate multiple cooler resolutions by recursive coarsening.
cooler.create_scool(cool_uri, bins, ...[, ...]) Create a single-cell (scool) file.

Manipulation

cooler.annotate(pixels, bins[, replace]) Add bin annotations to a data frame of pixels.
cooler.balance_cooler(clr, *[, cis_only, ...]) Iterative correction or matrix balancing of a sparse Hi-C contact map in Cooler HDF5 format.
cooler.rename_chroms(clr, rename_dict[, h5opts]) Substitute existing chromosome/contig names for new ones.

File operations

cooler.fileops.is_cooler(uri) Determine if a URI string references a cooler data collection.
cooler.fileops.is_multires_file(filepath[, ...]) Determine if a file is a multi-res cooler file.
cooler.fileops.list_coolers(filepath) List group paths to all cooler data collections in a file.
cooler.fileops.cp(src_uri, dst_uri[, overwrite]) Copy a group or dataset from one file to another or within the same file.
cooler.fileops.mv(src_uri, dst_uri[, overwrite]) Rename a group or dataset within the same file.
cooler.fileops.ln(src_uri, dst_uri[, soft, ...]) Create a hard link to a group or dataset in the same file.

cooler

class cooler.Cooler[source]

A convenient interface to a cooler data collection.

Parameters:

Notes

If store is a file path, the file will be opened temporarily in when performing operations. This allows Cooler objects to be serialized for multiprocess and distributed computations.

Metadata is accessible as a dictionary through the infoproperty.

Table selectors, created using chroms(), bins(), andpixels(), perform range queries over table rows, returning pd.DataFrame and pd.Series.

A matrix selector, created using matrix(), performs 2D matrix range queries, returning numpy.ndarray orscipy.sparse.coo_matrix.

__init__(store, root=None, **kwargs)[source]

Parameters:

bins(**kwargs)[source]

Bin table selector

Returns:

Table selector

Return type:

RangeSelector1D

property binsize_: int | None_

Resolution in base pairs if uniform else None

property chromnames_: list[str]_

List of reference sequence names

chroms(**kwargs)[source]

Chromosome table selector

Returns:

Table selector

Return type:

RangeSelector1D

property chromsizes_: Series_

Ordered mapping of reference sequences to their lengths in bp

extent(region)[source]

Bin IDs containing the left and right ends of a genomic region

Parameters:

region (str or tuple) – Genomic range

Returns:

2-tuple of ints

Return type:

tuple[int, int]

Examples

c.extent('chr3')
(1311, 2131)

property info_: dict_

File information and metadata

Returns:

dict

matrix(field=None, balance=True, sparse=False, as_pixels=False, join=False, ignore_index=True, divisive_weights=None, chunksize=10000000)[source]

Contact matrix selector

Parameters:

Returns:

Matrix selector

Return type:

RangeSelector2D

Notes

If as_pixels=True, only data explicitly stored in the pixel table will be returned: if the cooler’s storage mode is symmetric-upper, lower triangular elements will not be generated. Ifas_pixels=False, those missing non-zero elements will automatically be filled in.

offset(region)[source]

Bin ID containing the left end of a genomic region

Parameters:

region (str or tuple) – Genomic range

Returns:

int

Return type:

int

Examples

c.offset('chr3')
1311

open(mode='r', **kwargs)[source]

Open the HDF5 group containing the Cooler with h5py

Functions as a context manager. Any open_kws passed during construction are ignored.

Parameters:

mode (str , optional [ default: 'r' ]) –

Return type:

Group

Notes

For other parameters, see h5py.File.

pixels(join=False, **kwargs)[source]

Pixel table selector

Parameters:

join (bool , optional) – Whether to expand bin ID columns into chrom, start, and end columns. Default is False.

Returns:

Table selector

Return type:

RangeSelector1D

property storage_mode_: str_

Indicates whether ordinary sparse matrix encoding is used ("square") or whether a symmetric matrix is encoded by storing only the upper triangular elements ("symmetric-upper").

cooler.annotate(pixels, bins, replace=False)[source]

Add bin annotations to a data frame of pixels.

This is done by performing a relational “join” against the bin IDs of a table that describes properties of the genomic bins. New columns will be appended on the left of the output data frame.

Changed in version 0.8.0: The default value of replace changed to False.

Parameters:

Returns:

DataFrame

Return type:

DataFrame

cooler.create_cooler(cool_uri, bins, pixels, columns=None, dtypes=None, metadata=None, assembly=None, ordered=False, symmetric_upper=True, mode='w', mergebuf=20000000, delete_temp=True, temp_dir=None, max_merge=200, boundscheck=True, dupcheck=True, triucheck=True, ensure_sorted=False, h5opts=None, lock=None)[source]

Create a cooler from bins and pixels at the specified URI.

Because the number of pixels is often very large, the input pixels are normally provided as an iterable (e.g., an iterator or generator) of DataFrame chunks that fit in memory.

Added in version 0.8.0.

Parameters:

Return type:

None

Notes

If the pixel chunks are provided in the correct order required for the output to be properly sorted, then the cooler can be created in a single step by setting ordered=True.

If not, the cooler is created in two steps via an external sort mechanism. In the first pass, the sequence of pixel chunks are processed and sorted in memory and saved to temporary coolers. In the second pass, the temporary coolers are merged into the output file. This way the individual chunks do not need to be provided in any particular order. When ordered=False, the following options for the merge step are available: mergebuf,delete_temp, temp_dir, max_merge.

Each chunk of pixels will go through a validation pipeline, which can be customized with the following options: boundscheck, triucheck,dupcheck, ensure_sorted.

cooler.merge_coolers(output_uri, input_uris, mergebuf, columns=None, dtypes=None, agg=None, **kwargs)[source]

Merge multiple coolers with identical axes.

The merged cooler is stored at output_uri.

Added in version 0.8.0.

Parameters:

Return type:

None

Notes

The default output file mode is ‘w’. If appending output to an existing file, pass mode=’a’.

cooler.coarsen_cooler(base_uri, output_uri, factor, chunksize, nproc=1, columns=None, dtypes=None, agg=None, **kwargs)[source]

Coarsen a cooler to a lower resolution by an integer factor k.

This is done by pooling _k_-by-k neighborhoods of pixels and aggregating. Each chromosomal block is coarsened individually. Result is a coarsened cooler stored at output_uri.

Added in version 0.8.0.

Parameters:

Return type:

None

cooler.zoomify_cooler(base_uris, outfile, resolutions, chunksize, nproc=1, columns=None, dtypes=None, agg=None, **kwargs)[source]

Generate multiple cooler resolutions by recursive coarsening.

Result is a “zoomified” or “multires” cool file stored at outfileusing the MCOOL v2 layout, where coolers are stored under a hierarchy of the form resolutions/<r> for each resolution r.

Added in version 0.8.0.

Parameters:

Return type:

None

cooler.balance_cooler(clr, *, cis_only=False, trans_only=False, ignore_diags=2, mad_max=5, min_nnz=10, min_count=0, blacklist=None, rescale_marginals=True, x0=None, tol=1e-05, max_iters=200, chunksize=10000000, map=<class 'map'>, use_lock=False, store=False, store_name='weight')[source]

Iterative correction or matrix balancing of a sparse Hi-C contact map in Cooler HDF5 format.

Parameters:

Returns:

Return type:

tuple[ndarray, dict]

cooler.rename_chroms(clr, rename_dict, h5opts=None)[source]

Substitute existing chromosome/contig names for new ones. They will be written to the file and the Cooler object will be refreshed.

Parameters:

Return type:

None

cooler.create_scool(cool_uri, bins, cell_name_pixels_dict, columns=None, dtypes=None, metadata=None, assembly=None, ordered=False, symmetric_upper=True, mode='w', mergebuf=20000000, delete_temp=True, temp_dir=None, max_merge=200, boundscheck=True, dupcheck=True, triucheck=True, ensure_sorted=False, h5opts=None, lock=None, **kwargs)[source]

Create a single-cell (scool) file.

For each cell store a cooler matrix under /cells, where all matrices have the same dimensions.

Each cell is a regular cooler data collection, so the input must be a bin table and pixel table for each cell. The pixel tables are provided as a dictionary where the key is a unique cell name. The bin tables can be provided as a dict with the same keys or a single common bin table can be given.

Added in version 0.8.9.

Parameters:

Return type:

None

Notes

If the pixel chunks are provided in the correct order required for the output to be properly sorted, then the cooler can be created in a single step by setting ordered=True.

If not, the cooler is created in two steps via an external sort mechanism. In the first pass, the sequence of pixel chunks are processed and sorted in memory and saved to temporary coolers. In the second pass, the temporary coolers are merged into the output file. This way the individual chunks do not need to be provided in any particular order. When ordered=False, the following options for the merge step are available: mergebuf,delete_temp, temp_dir, max_merge.

Each chunk of pixels will go through a validation pipeline, which can be customized with the following options: boundscheck, triucheck,dupcheck, ensure_sorted.


cooler.create

cooler.create.sanitize_pixels(bins, **kwargs)[source]

Builds a function to sanitize an already-binned genomic data with genomic bin assignments.

Parameters:

Returns:

callable – Function of one argument that takes a raw dataframe and returns a sanitized dataframe.

Return type:

_Callable_[[_DataFrame_], _DataFrame_]

cooler.create.sanitize_records(bins, schema=None, **kwargs)[source]

Builds a funtion to sanitize and assign bin IDs to a data frame of paired genomic positions based on a provided genomic bin segmentation.

Parameters:

Returns:

callable – Function of one argument that takes a raw dataframe and returns a sanitized dataframe with bin IDs assigned.

Return type:

_Callable_[[_DataFrame_], _DataFrame_]

cooler.fileops

cooler.fileops.is_cooler(uri)[source]

Determine if a URI string references a cooler data collection. Returns False if the file or group path doesn’t exist.

Parameters:

uri (str)

Return type:

bool

cooler.fileops.is_multires_file(filepath, min_version=1)[source]

Determine if a file is a multi-res cooler file. Returns False if the file doesn’t exist.

Parameters:

Return type:

bool

cooler.fileops.list_coolers(filepath)[source]

List group paths to all cooler data collections in a file.

Parameters:

filepath (str)

Returns:

list – Cooler group paths in the file.

Return type:

list[str]

cooler.fileops.cp(src_uri, dst_uri, overwrite=False)[source]

Copy a group or dataset from one file to another or within the same file.

Parameters:

Return type:

None

cooler.fileops.mv(src_uri, dst_uri, overwrite=False)[source]

Rename a group or dataset within the same file.

Parameters:

Return type:

None

cooler.fileops.ln(src_uri, dst_uri, soft=False, overwrite=False)[source]

Create a hard link to a group or dataset in the same file. Also supports soft links (in the same file) or external links (different files).

Parameters:

Return type:

None

cooler.util

cooler.util.partition(start, stop, step)[source]

Partition an integer interval into equally-sized subintervals. Like builtin range(), but yields pairs of end points.

Examples

for lo, hi in partition(0, 9, 2): print(lo, hi) 0 2 2 4 4 6 6 8 8 9

Parameters:

Return type:

_Iterator_[tuple[int, int]]

cooler.util.fetch_chromsizes(db, **kwargs)[source]

Download chromosome sizes from UCSC as a pandas.Series, indexed by chromosome label.

Parameters:

db (str)

Return type:

Series

cooler.util.read_chromsizes(filepath_or, name_patterns=('^chr[0-9]+$', '^chr[XY]$', '^chrM$'), all_names=False, **kwargs)[source]

Parse a <db>.chrom.sizes or <db>.chromInfo.txt file from the UCSC database, where db is a genome assembly name.

Parameters:

Returns:

pandas.Series – Series of integer bp lengths indexed by sequence name.

Return type:

Series

References

cooler.util.binnify(chromsizes, binsize)[source]

Divide a genome into evenly sized bins.

Parameters:

Returns:

bins (pandas.DataFrame) – Dataframe with columns: chrom, start, end.

Return type:

DataFrame

cooler.util.digest(fasta_records, enzyme)[source]

Divide a genome into restriction fragments.

Parameters:

Returns:

frags (pandas.DataFrame) – Dataframe with columns: chrom, start, end.

Return type:

DataFrame