API Reference — cooler 0.10.3 documentation (original) (raw)

Quick reference

Cooler class

cooler.Cooler	A convenient interface to a cooler data collection.
cooler.Cooler.binsize	Resolution in base pairs if uniform else None
cooler.Cooler.chromnames	List of reference sequence names
cooler.Cooler.chromsizes	Ordered mapping of reference sequences to their lengths in bp
cooler.Cooler.bins(**kwargs)	Bin table selector
cooler.Cooler.pixels([join])	Pixel table selector
cooler.Cooler.matrix([field, balance, ...])	Contact matrix selector
cooler.Cooler.open([mode])	Open the HDF5 group containing the Cooler with h5py
cooler.Cooler.info	File information and metadata
cooler.Cooler.offset(region)	Bin ID containing the left end of a genomic region
cooler.Cooler.extent(region)	Bin IDs containing the left and right ends of a genomic region

Creation/reduction

cooler.create_cooler(cool_uri, bins, pixels)	Create a cooler from bins and pixels at the specified URI.
cooler.merge_coolers(output_uri, input_uris, ...)	Merge multiple coolers with identical axes.
cooler.coarsen_cooler(base_uri, output_uri, ...)	Coarsen a cooler to a lower resolution by an integer factor k.
cooler.zoomify_cooler(base_uris, outfile, ...)	Generate multiple cooler resolutions by recursive coarsening.
cooler.create_scool(cool_uri, bins, ...[, ...])	Create a single-cell (scool) file.

Manipulation

cooler.annotate(pixels, bins[, replace])	Add bin annotations to a data frame of pixels.
cooler.balance_cooler(clr, *[, cis_only, ...])	Iterative correction or matrix balancing of a sparse Hi-C contact map in Cooler HDF5 format.
cooler.rename_chroms(clr, rename_dict[, h5opts])	Substitute existing chromosome/contig names for new ones.

File operations

cooler.fileops.is_cooler(uri)	Determine if a URI string references a cooler data collection.
cooler.fileops.is_multires_file(filepath[, ...])	Determine if a file is a multi-res cooler file.
cooler.fileops.list_coolers(filepath)	List group paths to all cooler data collections in a file.
cooler.fileops.cp(src_uri, dst_uri[, overwrite])	Copy a group or dataset from one file to another or within the same file.
cooler.fileops.mv(src_uri, dst_uri[, overwrite])	Rename a group or dataset within the same file.
cooler.fileops.ln(src_uri, dst_uri[, soft, ...])	Create a hard link to a group or dataset in the same file.

cooler

class cooler.Cooler[source]

A convenient interface to a cooler data collection.

Parameters:

store (str, h5py.File or h5py.Group) – Path to a cooler file, URI string, or open handle to the root HDF5 group of a cooler data collection.
root (str , optional [ deprecated ]) – HDF5 Group path to root of cooler group if store is a file. This option is deprecated. Instead, use a URI string of the form<file_path>::<group_path>.
kwargs (optional) – Options to be passed to h5py.File() upon every access. By default, the file is opened with the default driver and mode=’r’.

Notes

If store is a file path, the file will be opened temporarily in when performing operations. This allows Cooler objects to be serialized for multiprocess and distributed computations.

Metadata is accessible as a dictionary through the infoproperty.

Table selectors, created using chroms(), bins(), andpixels(), perform range queries over table rows, returning pd.DataFrame and pd.Series.

A matrix selector, created using matrix(), performs 2D matrix range queries, returning numpy.ndarray orscipy.sparse.coo_matrix.

__init__(store, root=None, **kwargs)[source]

Parameters:

store (str | Group)
root (str | None)

bins(**kwargs)[source]

Bin table selector

Returns:

Table selector

Return type:

RangeSelector1D

property binsize_: int | None_

Resolution in base pairs if uniform else None

property chromnames_: list[str]_

List of reference sequence names

chroms(**kwargs)[source]

Chromosome table selector

Returns:

Table selector

Return type:

RangeSelector1D

property chromsizes_: Series_

Ordered mapping of reference sequences to their lengths in bp

extent(region)[source]

Bin IDs containing the left and right ends of a genomic region

Parameters:

region (str or tuple) – Genomic range

Returns:

2-tuple of ints

Return type:

tuple[int, int]

Examples

c.extent('chr3')
(1311, 2131)

property info_: dict_

File information and metadata

Returns:

dict

matrix(field=None, balance=True, sparse=False, as_pixels=False, join=False, ignore_index=True, divisive_weights=None, chunksize=10000000)[source]

Contact matrix selector

Parameters:

field (str , optional) – Which column of the pixel table to fill the matrix with. By default, the ‘count’ column is used.
balance (bool , optional) – Whether to apply pre-calculated matrix balancing weights to the selection. Default is True and uses a column named ‘weight’. Alternatively, pass the name of the bin table column containing the desired balancing weights. Set to False to return untransformed counts.
sparse (bool , optional) – Return a scipy.sparse.coo_matrix instead of a dense 2D numpy array.
as_pixels (bool , optional) – Return a DataFrame of the corresponding rows from the pixel table instead of a rectangular sparse matrix. False by default.
join (bool , optional) – If requesting pixels, specifies whether to expand the bin ID columns into (chrom, start, end). Has no effect when requesting a rectangular matrix. Default is True.
ignore_index (bool , optional) – If requesting pixels, don’t populate the index column with the pixel IDs to improve performance. Default is True.
divisive_weights (bool , optional) – Force balancing weights to be interpreted as divisive (True) or multiplicative (False). Weights are always assumed to be multiplicative by default unless named KR, VC or SQRT_VC, in which case they are assumed to be divisive by default.
chunksize (int)

Returns:

Matrix selector

Return type:

RangeSelector2D

Notes

If as_pixels=True, only data explicitly stored in the pixel table will be returned: if the cooler’s storage mode is symmetric-upper, lower triangular elements will not be generated. Ifas_pixels=False, those missing non-zero elements will automatically be filled in.

offset(region)[source]

Bin ID containing the left end of a genomic region

Parameters:

region (str or tuple) – Genomic range

Returns:

int

Return type:

int

Examples

c.offset('chr3')
1311

open(mode='r', **kwargs)[source]

Open the HDF5 group containing the Cooler with h5py

Functions as a context manager. Any open_kws passed during construction are ignored.

Parameters:

mode (str , optional [ default: 'r' ]) –

'r' (readonly)
'r+' or 'a' (read/write)

Return type:

Group

Notes

For other parameters, see h5py.File.

pixels(join=False, **kwargs)[source]

Pixel table selector

Parameters:

join (bool , optional) – Whether to expand bin ID columns into chrom, start, and end columns. Default is False.

Returns:

Table selector

Return type:

RangeSelector1D

property storage_mode_: str_

Indicates whether ordinary sparse matrix encoding is used ("square") or whether a symmetric matrix is encoded by storing only the upper triangular elements ("symmetric-upper").

cooler.annotate(pixels, bins, replace=False)[source]

Add bin annotations to a data frame of pixels.

This is done by performing a relational “join” against the bin IDs of a table that describes properties of the genomic bins. New columns will be appended on the left of the output data frame.

Changed in version 0.8.0: The default value of replace changed to False.

Parameters:

pixels (DataFrame) – A data frame containing columns named bin1_id and/or bin2_id. If columns bin1_id and bin2_id are both present in pixels, the adjoined columns will be suffixed with ‘1’ and ‘2’ accordingly.
bins (DataFrame or DataFrame selector) – Data structure that contains a full description of the genomic bins of the contact matrix, where the index corresponds to bin IDs.
replace (bool , optional) – Remove the original bin1_id and bin2_id columns from the output. Default is False.

Returns:

DataFrame

Return type:

DataFrame

cooler.create_cooler(cool_uri, bins, pixels, columns=None, dtypes=None, metadata=None, assembly=None, ordered=False, symmetric_upper=True, mode='w', mergebuf=20000000, delete_temp=True, temp_dir=None, max_merge=200, boundscheck=True, dupcheck=True, triucheck=True, ensure_sorted=False, h5opts=None, lock=None)[source]

Create a cooler from bins and pixels at the specified URI.

Because the number of pixels is often very large, the input pixels are normally provided as an iterable (e.g., an iterator or generator) of DataFrame chunks that fit in memory.

Added in version 0.8.0.

Parameters:

cool_uri (str) – Path to cooler file or URI string. If the file does not exist, it will be created.
bins (pandas.DataFrame) – Segmentation of the chromosomes into genomic bins as a BED-like DataFrame with columns chrom, start and end. May contain additional columns.
pixels (DataFrame , dictionary , or iterable of either) – A table, given as a dataframe or a column-oriented dict, containing columns labeled bin1_id, bin2_id and count, sorted by (bin1_id, bin2_id). If additional columns are included in the pixel table, their names and dtypes must be specified using thecolumns and dtypes arguments. For larger input data, aniterable can be provided that yields the pixel data as a sequence of chunks. If the input is a dask DataFrame, it will also be processed one chunk at a time.
columns (sequence of str , optional) – Customize which value columns from the input pixels to store in the cooler. Non-standard value columns will be given dtype float64unless overriden using the dtypes argument. If None, we only attempt to store a value column named "count".
dtypes (dict , optional) – Dictionary mapping column names to dtypes. Can be used to override the default dtypes of bin1_id, bin2_id or count or assign dtypes to custom value columns. Non-standard value columns given indtypes must also be provided in the columns argument or they will be ignored.
metadata (dict , optional) – Experiment metadata to store in the file. Must be JSON compatible.
assembly (str , optional) – Name of genome assembly.
ordered (bool , optional [ default: False ]) – If the input chunks of pixels are provided with correct triangularity and in ascending order of (bin1_id, bin2_id), set this toTrue to write the cooler in one step. If False (default), we create the cooler in two steps using an external sort mechanism. See Notes for more details.
symmetric_upper (bool , optional [ default: True ]) – If True, sets the file’s storage-mode property to symmetric-upper: use this only if the input data references the upper triangle of a symmetric matrix! For all other cases, set this option to False.
mode ({'w' , 'a'} , optional [ default: 'w' ]) – Write mode for the output file. ‘a’: if the output file exists, append the new cooler to it. ‘w’: if the output file exists, it will be truncated. Default is ‘w’.
mergebuf (int , optional) – Maximum number of records to buffer in memory at any give time during the merge step.
delete_temp (bool , optional) – Whether to delete temporary files when finished. Useful for debugging. Default is False.
temp_dir (str , optional) – Create temporary files in a specified directory instead of the same directory as the output file. Pass - to use the system default.
max_merge (int , optional) – If merging more than max_merge chunks, do the merge recursively in two passes.
h5opts (dict , optional) – HDF5 dataset filter options to use (compression, shuffling, checksumming, etc.). Default is to use autochunking and GZIP compression, level 6.
lock (multiprocessing.Lock , optional) – Optional lock to control concurrent access to the output file.
ensure_sorted (bool , optional) – Ensure that each input chunk is properly sorted.
boundscheck (bool , optional) – Input validation: Check that all bin IDs lie in the expected range.
dupcheck (bool , optional) – Input validation: Check that no duplicate pixels exist within any chunk.
triucheck (bool , optional) – Input validation: Check that bin1_id <= bin2_id when creating coolers in symmetric-upper mode.

Return type:

None

Notes

If the pixel chunks are provided in the correct order required for the output to be properly sorted, then the cooler can be created in a single step by setting ordered=True.

If not, the cooler is created in two steps via an external sort mechanism. In the first pass, the sequence of pixel chunks are processed and sorted in memory and saved to temporary coolers. In the second pass, the temporary coolers are merged into the output file. This way the individual chunks do not need to be provided in any particular order. When ordered=False, the following options for the merge step are available: mergebuf,delete_temp, temp_dir, max_merge.

Each chunk of pixels will go through a validation pipeline, which can be customized with the following options: boundscheck, triucheck,dupcheck, ensure_sorted.

cooler.merge_coolers(output_uri, input_uris, mergebuf, columns=None, dtypes=None, agg=None, **kwargs)[source]

Merge multiple coolers with identical axes.

The merged cooler is stored at output_uri.

Added in version 0.8.0.

Parameters:

output_uri (str) – Output cooler file path or URI.
input_uris (list of str) – List of input file path or URIs of coolers to combine.
mergebuf (int) – Maximum number of pixels processed at a time.
columns (list of str , optional) – Specify which pixel value columns to include in the aggregation. Default is to use all available value columns.
dtypes (dict , optional) – Specific dtypes to use for value columns. Default is to propagate the current dtypes of the value columns.
agg (dict , optional) – Functions to use for aggregating each value column. Pass the same kind of dict accepted by pandas.DataFrame.groupby.agg. Default is to apply ‘sum’ to every value column.
kwargs – Passed to cooler.create.

Return type:

None

Notes

The default output file mode is ‘w’. If appending output to an existing file, pass mode=’a’.

cooler.coarsen_cooler(base_uri, output_uri, factor, chunksize, nproc=1, columns=None, dtypes=None, agg=None, **kwargs)[source]

Coarsen a cooler to a lower resolution by an integer factor k.

This is done by pooling _k_-by-k neighborhoods of pixels and aggregating. Each chromosomal block is coarsened individually. Result is a coarsened cooler stored at output_uri.

Added in version 0.8.0.

Parameters:

base_uri (str) – Input cooler file path or URI.
output_uri (str) – Input cooler file path or URI.
factor (int) – Coarsening factor.
chunksize (int) – Number of pixels processed at a time per worker.
nproc (int , optional) – Number of workers for batch processing of pixels. Default is 1, i.e. no process pool.
columns (list of str , optional) – Specify which pixel value columns to include in the aggregation. Default is to use all available value columns.
dtypes (dict , optional) – Specific dtypes to use for value columns. Default is to propagate the current dtypes of the value columns.
agg (dict , optional) – Functions to use for aggregating each value column. Pass the same kind of dict accepted by pandas.DataFrame.groupby.agg. Default is to apply ‘sum’ to every value column.
kwargs – Passed to cooler.create.

Return type:

None

cooler.zoomify_cooler(base_uris, outfile, resolutions, chunksize, nproc=1, columns=None, dtypes=None, agg=None, **kwargs)[source]

Generate multiple cooler resolutions by recursive coarsening.

Result is a “zoomified” or “multires” cool file stored at outfileusing the MCOOL v2 layout, where coolers are stored under a hierarchy of the form resolutions/<r> for each resolution r.

Added in version 0.8.0.

Parameters:

base_uris (str or sequence of str) – One or more cooler URIs to use as “base resolutions” for aggregation.
outfile (str) – Output multires cooler (mcool) file path.
resolutions (list of int) – A list of target resolutions to generate.
chunksize (int) – Number of pixels processed at a time per worker.
nproc (int , optional) – Number of workers for batch processing of pixels. Default is 1, i.e. no process pool.
columns (list of str , optional) – Specify which pixel value columns to include in the aggregation. Default is to use only the column named ‘count’ if it exists.
dtypes (dict , optional) – Specific dtypes to use for value columns. Default is to propagate the current dtypes of the value columns.
agg (dict , optional) – Functions to use for aggregating each value column. Pass the same kind of dict accepted by pandas.DataFrame.groupby.agg. Default is to apply ‘sum’ to every value column.
kwargs – Passed to cooler.create.

Return type:

None

cooler.balance_cooler(clr, *, cis_only=False, trans_only=False, ignore_diags=2, mad_max=5, min_nnz=10, min_count=0, blacklist=None, rescale_marginals=True, x0=None, tol=1e-05, max_iters=200, chunksize=10000000, map=<class 'map'>, use_lock=False, store=False, store_name='weight')[source]

Iterative correction or matrix balancing of a sparse Hi-C contact map in Cooler HDF5 format.

Parameters:

clr (cooler.Cooler) – Cooler object
cis_only (bool , optional) – Do iterative correction on intra-chromosomal data only. Inter-chromosomal data is ignored.
trans_only (bool , optional) – Do iterative correction on inter-chromosomal data only. Intra-chromosomal data is ignored.
ignore_diags (int or False , optional) – Drop elements occurring on the first ignore_diags diagonals of the matrix (including the main diagonal).
chunksize (int or None , optional) – Split the contact matrix pixel records into equally sized chunks to save memory and/or parallelize. Set to None to use all the pixels at once.
mad_max (int , optional) – Pre-processing bin-level filter. Drop bins whose log marginal sum is less than mad_max median absolute deviations below the median log marginal sum.
min_nnz (int , optional) – Pre-processing bin-level filter. Drop bins with fewer nonzero elements than this value.
min_count (int , optional) – Pre-processing bin-level filter. Drop bins with lower marginal sum than this value.
blacklist (list or 1D array , optional) – An explicit list of IDs of bad bins to filter out when performing balancing.
rescale_marginals (bool , optional) – Normalize the balancing weights such that the balanced matrix has rows / columns that sum to 1.0. The scale factor is stored in the statsoutput dictionary.
map (callable , optional) – Map function to dispatch the matrix chunks to workers. Default is the builtin map, but alternatives include parallel map implementations from a multiprocessing pool.
x0 (1D array , optional) – Initial weight vector to use. Default is to start with ones(n_bins).
tol (float , optional) – Convergence criterion is the variance of the marginal (row/col) sum vector.
max_iters (int , optional) – Iteration limit.
store (bool , optional) – Whether to store the results in the file when finished. Default is False.
store_name (str , optional) – Name of the column of the bin table to save to. Default name is ‘weight’.
use_lock (bool)

Returns:

bias (1D array, whose shape is the number of bins in h5.) – Vector of bin bias weights to normalize the observed contact map. Dropped bins will be assigned the value NaN. N[i, j] = O[i, j] * bias[i] * bias[j]
stats (dict) – Summary of parameters used to perform balancing and the average magnitude of the corrected matrix’s marginal sum at convergence.

Return type:

tuple[ndarray, dict]

cooler.rename_chroms(clr, rename_dict, h5opts=None)[source]

Substitute existing chromosome/contig names for new ones. They will be written to the file and the Cooler object will be refreshed.

Parameters:

clr (Cooler) – Cooler object that can be opened with write permissions.
rename_dict (dict) – Dictionary of old -> new chromosome names. Any names omitted from the dictionary will be kept as is.
h5opts (dict , optional) – HDF5 filter options.

Return type:

None

cooler.create_scool(cool_uri, bins, cell_name_pixels_dict, columns=None, dtypes=None, metadata=None, assembly=None, ordered=False, symmetric_upper=True, mode='w', mergebuf=20000000, delete_temp=True, temp_dir=None, max_merge=200, boundscheck=True, dupcheck=True, triucheck=True, ensure_sorted=False, h5opts=None, lock=None, **kwargs)[source]

Create a single-cell (scool) file.

For each cell store a cooler matrix under /cells, where all matrices have the same dimensions.

Each cell is a regular cooler data collection, so the input must be a bin table and pixel table for each cell. The pixel tables are provided as a dictionary where the key is a unique cell name. The bin tables can be provided as a dict with the same keys or a single common bin table can be given.

Added in version 0.8.9.

Parameters:

cool_uri (str) – Path to scool file or URI string. If the file does not exist, it will be created.
bins (pandas.DataFrame or Dict[str, DataFrame]) – A single bin table or dictionary of cell names to bins tables. A bin table is a dataframe with columns chrom, start and end. May contain additional columns.
cell_name_pixels_dict (Dict [ str , DataFrame ]) – Cell name as key and pixel table DataFrame as value. A table, given as a dataframe or a column-oriented dict, containing columns labeled bin1_id, bin2_id and count, sorted by (bin1_id, bin2_id). If additional columns are included in the pixel table, their names and dtypes must be specified using thecolumns and dtypes arguments. For larger input data, aniterable can be provided that yields the pixel data as a sequence of chunks. If the input is a dask DataFrame, it will also be processed one chunk at a time.
columns (sequence of str , optional) – Customize which value columns from the input pixels to store in the cooler. Non-standard value columns will be given dtype float64unless overriden using the dtypes argument. If None, we only attempt to store a value column named "count".
dtypes (dict , optional) – Dictionary mapping column names to dtypes. Can be used to override the default dtypes of bin1_id, bin2_id or count or assign dtypes to custom value columns. Non-standard value columns given indtypes must also be provided in the columns argument or they will be ignored.
metadata (dict , optional) – Experiment metadata to store in the file. Must be JSON compatible.
assembly (str , optional) – Name of genome assembly.
ordered (bool , optional [ default: False ]) – If the input chunks of pixels are provided with correct triangularity and in ascending order of (bin1_id, bin2_id), set this toTrue to write the cooler in one step. If False (default), we create the cooler in two steps using an external sort mechanism. See Notes for more details.
symmetric_upper (bool , optional [ default: True ]) – If True, sets the file’s storage-mode property to symmetric-upper: use this only if the input data references the upper triangle of a symmetric matrix! For all other cases, set this option to False.
mode ({'w' , 'a'} , optional [ default: 'w' ]) – Write mode for the output file. ‘a’: if the output file exists, append the new cooler to it. ‘w’: if the output file exists, it will be truncated. Default is ‘w’.
mergebuf (int , optional) – Maximum number of records to buffer in memory at any give time during the merge step.
delete_temp (bool , optional) – Whether to delete temporary files when finished. Useful for debugging. Default is False.
temp_dir (str , optional) – Create temporary files in a specified directory instead of the same directory as the output file. Pass - to use the system default.
max_merge (int , optional) – If merging more than max_merge chunks, do the merge recursively in two passes.
h5opts (dict , optional) – HDF5 dataset filter options to use (compression, shuffling, checksumming, etc.). Default is to use autochunking and GZIP compression, level 6.
lock (multiprocessing.Lock , optional) – Optional lock to control concurrent access to the output file.
ensure_sorted (bool , optional) – Ensure that each input chunk is properly sorted.
boundscheck (bool , optional) – Input validation: Check that all bin IDs lie in the expected range.
dupcheck (bool , optional) – Input validation: Check that no duplicate pixels exist within any chunk.
triucheck (bool , optional) – Input validation: Check that bin1_id <= bin2_id when creating coolers in symmetric-upper mode.

Return type:

None

Notes

If the pixel chunks are provided in the correct order required for the output to be properly sorted, then the cooler can be created in a single step by setting ordered=True.

Each chunk of pixels will go through a validation pipeline, which can be customized with the following options: boundscheck, triucheck,dupcheck, ensure_sorted.

cooler.create

cooler.create.sanitize_pixels(bins, **kwargs)[source]

Builds a function to sanitize an already-binned genomic data with genomic bin assignments.

Parameters:

bins (DataFrame) – Bin table to compare pixel records against.
is_one_based (bool , optional) – Whether the input bin IDs are one-based, rather than zero-based. They will be converted to zero-based.
tril_action ('reflect' , 'drop' , 'raise' or None) – How to handle lower triangle (“tril”) pixels. If set to ‘reflect’ [default], tril pixels will be flipped or “reflected” to their mirror image: “sided” column pairs will have their values swapped. If set to ‘drop’, tril pixels will be discarded. This is useful if your input data is duplexed, i.e. contains mirror duplicates of every record. If set to ‘raise’, an exception will be raised if any tril record is encountered.
bin1_field (str) – Name of the column representing ith (row) axis of the matrix. Default is ‘bin1_id’.
bin2_field (str) – Name of the column representing jth (col) axis of the matrix. Default is ‘bin2_id’.
sided_fields (sequence of str) – Base names of column pairs to swap values between when mirror-reflecting pixels.
suffixes (pair of str) – Suffixes used to identify pairs of sided columns. e.g.: (‘1’, ‘2’), (‘_x’, ‘_y’), etc.
sort (bool) – Whether to sort the output dataframe by bin_id and bin2_id.

Returns:

callable – Function of one argument that takes a raw dataframe and returns a sanitized dataframe.

Return type:

_Callable_[[_DataFrame_], _DataFrame_]

cooler.create.sanitize_records(bins, schema=None, **kwargs)[source]

Builds a funtion to sanitize and assign bin IDs to a data frame of paired genomic positions based on a provided genomic bin segmentation.

Parameters:

bins (DataFrame) – Bin table to compare records against.
schema (str , optional) – Use pre-defined parameters for a particular format. Any options can be overriden via kwargs. If not provided, values for all the options below must be given.
decode_chroms (bool) – Convert string chromosome names to integer IDs based on the order given in the bin table. Set to False if the chromosomes are already given as an enumeration, starting at 0. Records with either chrom ID < 0 are dropped.
is_one_based (bool) – Whether the input anchor coordinates are one-based, rather than zero-based. They will be converted to zero-based.
tril_action ('reflect' , 'drop' , 'raise' or None) – How to handle lower triangle (“tril”) records. If set to ‘reflect’, tril records will be flipped or “reflected” to their mirror image: “sided” column pairs will have their values swapped. If set to ‘drop’, tril records will be discarded. This is useful if your input data is symmetric, i.e. contains mirror duplicates of every record. If set to ‘raise’, an exception will be raised if any tril record is encountered.
chrom_field (str) – Base name of the two chromosome/scaffold/contig columns.
anchor_field (str) – Base name of the positional anchor columns.
sided_fields (sequence of str) – Base names of column pairs to swap values between when mirror-reflecting records.
suffixes (pair of str) – Suffixes used to identify pairs of sided columns. e.g.: (‘1’, ‘2’), (‘_x’, ‘_y’), etc.
sort (bool) – Whether to sort the output dataframe by bin_id and bin2_id.
validate (bool) – Whether to do type- and bounds-checking on the anchor position columns. Raises BadInputError.

Returns:

callable – Function of one argument that takes a raw dataframe and returns a sanitized dataframe with bin IDs assigned.

Return type:

_Callable_[[_DataFrame_], _DataFrame_]

cooler.fileops

cooler.fileops.is_cooler(uri)[source]

Determine if a URI string references a cooler data collection. Returns False if the file or group path doesn’t exist.

Parameters:

uri (str)

Return type:

bool

cooler.fileops.is_multires_file(filepath, min_version=1)[source]

Determine if a file is a multi-res cooler file. Returns False if the file doesn’t exist.

Parameters:

filepath (str)
min_version (int)

Return type:

bool

cooler.fileops.list_coolers(filepath)[source]

List group paths to all cooler data collections in a file.

Parameters:

filepath (str)

Returns:

list – Cooler group paths in the file.

Return type:

list[str]

cooler.fileops.cp(src_uri, dst_uri, overwrite=False)[source]

Copy a group or dataset from one file to another or within the same file.

Parameters:

src_uri (str)
dst_uri (str)
overwrite (bool)

Return type:

None

cooler.fileops.mv(src_uri, dst_uri, overwrite=False)[source]

Rename a group or dataset within the same file.

Parameters:

src_uri (str)
dst_uri (str)
overwrite (bool)

Return type:

None

cooler.fileops.ln(src_uri, dst_uri, soft=False, overwrite=False)[source]

Create a hard link to a group or dataset in the same file. Also supports soft links (in the same file) or external links (different files).

Parameters:

src_uri (str)
dst_uri (str)
soft (bool)
overwrite (bool)

Return type:

None

cooler.util

cooler.util.partition(start, stop, step)[source]

Partition an integer interval into equally-sized subintervals. Like builtin range(), but yields pairs of end points.

Examples

for lo, hi in partition(0, 9, 2): print(lo, hi) 0 2 2 4 4 6 6 8 8 9

Parameters:

start (int)
stop (int)
step (int)

Return type:

_Iterator_[tuple[int, int]]

cooler.util.fetch_chromsizes(db, **kwargs)[source]

Download chromosome sizes from UCSC as a pandas.Series, indexed by chromosome label.

Parameters:

db (str)

Return type:

Series

cooler.util.read_chromsizes(filepath_or, name_patterns=('^chr[0-9]+$', '^chr[XY]$', '^chrM$'), all_names=False, **kwargs)[source]

Parse a <db>.chrom.sizes or <db>.chromInfo.txt file from the UCSC database, where db is a genome assembly name.

Parameters:

filepath_or (str or file-like) – Path or url to text file, or buffer.
name_patterns (sequence , optional) – Sequence of regular expressions to capture desired sequence names. Each corresponding set of records will be sorted in natural order.
all_names (bool , optional) – Whether to return all contigs listed in the file. Default isFalse.

Returns:

pandas.Series – Series of integer bp lengths indexed by sequence name.

Return type:

Series

References

cooler.util.binnify(chromsizes, binsize)[source]

Divide a genome into evenly sized bins.

Parameters:

chromsizes (Series) – pandas Series indexed by chromosome name with chromosome lengths in bp.
binsize (int) – size of bins in bp

Returns:

bins (pandas.DataFrame) – Dataframe with columns: chrom, start, end.

Return type:

DataFrame

cooler.util.digest(fasta_records, enzyme)[source]

Divide a genome into restriction fragments.

Parameters:

fasta_records (OrderedDict) – Dictionary of chromosome names to sequence records.
enzyme (str) – Name of restriction enzyme (e.g., ‘DpnII’).

Returns:

frags (pandas.DataFrame) – Dataframe with columns: chrom, start, end.

Return type:

DataFrame