anndata.experimental.AnnCollection
class anndata.experimental.AnnCollection(adatas, join_obs='inner', join_obsm=None, join_vars=None, label=None, keys=None, index_unique=None, convert=None, harmonize_dtypes=True, indices_strict=True)
Lazily concatenate AnnData objects along the obs axis.
This class doesn’t copy data from the underlying AnnData objects; instead it lazily subsets them using a joint index of observations and variables. It also allows on-the-fly application of prespecified converters to the .obs attributes of the AnnData objects.
Subsetting this object returns an AnnCollectionView, which provides views of .obs, .obsm, .layers, and .X from the underlying AnnData objects.
Parameters:
adatas Sequence[AnnData] | dict[str, AnnData]
The objects to be lazily concatenated. If a Mapping is passed, its keys are used for the keys argument and its values are concatenated.
join_obs Optional[Literal['inner', 'outer']] (default: 'inner')
If “inner” is specified, all .obs attributes from adatas will be inner joined and copied to this object. If “outer” is specified, all .obs attributes from adatas will be outer joined and copied to this object. For “inner” and “outer”, subset objects will access .obs of this object, not the original .obs attributes of adatas. If None, nothing is copied to this object’s .obs, and a subset object will directly access the .obs attributes of adatas (with proper reindexing and dtype conversions). With None, the inner join rule is used to select the columns of .obs of adatas.
join_obsm Optional[Literal['inner']] (default: None)
If “inner” is specified, all .obsm attributes from adatas will be inner joined and copied to this object. Subset objects will access .obsm of this object, not the original .obsm attributes of adatas. If None, nothing is copied to this object’s .obsm, and a subset object will directly access the .obsm attributes of adatas (with proper reindexing and dtype conversions). For both options, the inner join rule for the underlying .obsm attributes is used.
join_vars Optional[Literal['inner']] (default: None)
Specify how to join adatas along the var axis. If None, assumes all adatas have the same variables. If “inner”, the intersection of all variables in adatas will be used.
label str | None (default: None)
Column in .obs to place batch information in. If None, no column is added.
keys Sequence[str] | None (default: None)
Names for each object being added. These values are used as the column values for label, or appended to the index if index_unique is not None. Defaults to incrementing integer labels.
index_unique str | None (default: None)
Whether to make the index unique by using the keys. If provided, this string is used as the delimiter, giving index values of the form “{orig_idx}{index_unique}{key}”. When None, the original indices are kept.
convert Callable | dict[str, Callable | dict[str, Callable]] | None (default: None)
You can pass a function or a Mapping of functions which will be applied to the values of attributes (.obs, .obsm, .layers, .X) or to specific keys of these attributes in the subset object. Specify an attribute and a key (if needed) as keys of the passed Mapping and a function to be applied as a value.
harmonize_dtypes bool (default: True)
If True, all retrieved arrays from subset objects will have the same dtype.
indices_strict bool (default: True)
If True, arrays from the subset objects will always have the same order of indices as in the selection used to subset. This parameter can be set to False if the order in the returned arrays is not important, for example, when using them for stochastic gradient descent. In this case subsetting can be slightly faster.
Examples
>>> from anndata.experimental import AnnCollection
>>> from scanpy.datasets import pbmc68k_reduced, pbmc3k_processed
>>> adata1, adata2 = pbmc68k_reduced(), pbmc3k_processed()
>>> adata1.shape
(700, 765)
>>> adata2.shape
(2638, 1838)
>>> dc = AnnCollection([adata1, adata2], join_vars='inner')
>>> dc
AnnCollection object with n_obs × n_vars = 3338 × 208
  constructed from 2 AnnData objects
    view of obsm: 'X_pca', 'X_umap'
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
>>> batch = dc[100:200]  # AnnCollectionView
>>> batch
AnnCollectionView object with n_obs × n_vars = 100 × 208
    obsm: 'X_pca', 'X_umap'
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
>>> batch.X.shape
(100, 208)
>>> len(batch.obs['louvain'])
100
Attributes
Methods