make_checkerboard

sklearn.datasets.make_checkerboard(shape, n_clusters, *, noise=0.0, minval=10, maxval=100, shuffle=True, random_state=None)

Generate an array with block checkerboard structure for biclustering.

Read more in the User Guide.

Parameters:

shape : tuple of shape (n_rows, n_cols)

The shape of the result.

n_clusters : int or array-like of shape (n_row_clusters, n_column_clusters)

The number of row and column clusters. A single int is used for both the row and the column clusters.

noise : float, default=0.0

The standard deviation of the Gaussian noise added to the array.

minval : float, default=10

Minimum value of a bicluster.

maxval : float, default=100

Maximum value of a bicluster.

shuffle : bool, default=True

Whether to shuffle the rows and columns of the generated array.

random_state : int, RandomState instance or None, default=None

Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.
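The parameters above can be explored with a small sketch. It passes n_clusters as a pair, disables shuffling so the blocks stay contiguous, and compares a noise-free array with a noisy one generated from the same seed. The array size, seed, and noise level are arbitrary choices for illustration, and the comparison assumes that the same seed reproduces the same noise-free base when shuffle=False.

```python
import numpy as np
from sklearn.datasets import make_checkerboard

# n_clusters as a pair: 2 row clusters x 3 column clusters = 6 blocks.
# shuffle=False keeps the rows and columns of each cluster contiguous.
clean, rows, cols = make_checkerboard(
    shape=(200, 200), n_clusters=(2, 3), noise=0.0,
    minval=10, maxval=100, shuffle=False, random_state=0,
)

# Without noise, each block holds one constant drawn between minval and maxval,
# so the array contains at most one distinct value per block.
n_distinct = np.unique(clean).size
print(n_distinct, clean.min() >= 10, clean.max() <= 100)

# Same seed, now with Gaussian noise of standard deviation 5.
noisy, _, _ = make_checkerboard(
    shape=(200, 200), n_clusters=(2, 3), noise=5.0,
    minval=10, maxval=100, shuffle=False, random_state=0,
)

# Assumption: with shuffle=False the same seed yields the same noise-free base,
# so the difference isolates the noise term and its spread approximates `noise`.
residual_std = float(np.std(noisy - clean))
print(round(residual_std, 1))
```

With shuffle=True the rows and columns are permuted after generation, so the block structure is no longer visible by eye and the two arrays above would not be directly comparable.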

Returns:

X : ndarray of shape (n_rows, n_cols)

The generated array.

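rows : ndarray of shape (n_row_clusters * n_column_clusters, X.shape[0])

Boolean indicators of row membership, one indicator row per bicluster: rows[i, r] is True if row r of X belongs to bicluster i.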

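cols : ndarray of shape (n_row_clusters * n_column_clusters, X.shape[1])

Boolean indicators of column membership, one indicator row per bicluster: cols[i, c] is True if column c of X belongs to bicluster i.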
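As a minimal sketch of how the returned indicators can be used, the snippet below selects the submatrix of each bicluster by combining its row and column masks. The helper name bicluster_block is hypothetical and not part of scikit-learn; the shape and seed are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_checkerboard

X, rows, cols = make_checkerboard(
    shape=(30, 20), n_clusters=(3, 2), noise=0.0, random_state=0
)

# One indicator row per bicluster: 3 row clusters x 2 column clusters = 6.
n_biclusters = rows.shape[0]


def bicluster_block(X, rows, cols, k):
    """Return the submatrix of X covered by bicluster k (hypothetical helper)."""
    # np.ix_ turns the two boolean masks into an open mesh of indices.
    return X[np.ix_(rows[k], cols[k])]


blocks = [bicluster_block(X, rows, cols, k) for k in range(n_biclusters)]

# With noise=0.0 every non-empty block should contain a single value.
is_constant = [b.size == 0 or np.ptp(b) == 0 for b in blocks]
print(all(is_constant))

# Each row lies in one row cluster, which is paired with every column cluster,
# so it appears in n_column_clusters biclusters (and symmetrically for columns).
memberships_per_row = rows.sum(axis=0)
memberships_per_col = cols.sum(axis=0)
```

This works whether or not the data were shuffled, because the indicators are permuted together with the rows and columns of X.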

See also

make_biclusters

Generate an array with constant block diagonal structure for biclustering.

References

[1] Kluger, Y., Basri, R., Chang, J. T., & Gerstein, M. (2003). Spectral biclustering of microarray data: coclustering genes and conditions. Genome Research, 13(4), 703-716.

Examples

>>> from sklearn.datasets import make_checkerboard
>>> data, rows, columns = make_checkerboard(shape=(300, 300), n_clusters=10,
...                                         random_state=42)
>>> data.shape
(300, 300)
>>> rows.shape
(100, 300)
>>> columns.shape
(100, 300)
>>> print(rows[0][:5], columns[0][:5])
[False False False  True False] [False False False False False]
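Since the function generates data for biclustering, one possible use is to check how well a biclustering estimator recovers the planted structure. The sketch below is an illustration under assumed settings (cluster counts, noise level, and seeds are arbitrary): it fits sklearn.cluster.SpectralBiclustering to a shuffled checkerboard and compares the result with the true indicators using sklearn.metrics.consensus_score.

```python
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard
from sklearn.metrics import consensus_score

# Planted structure: 4 row clusters x 3 column clusters, with some noise.
n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
    shape=(300, 300), n_clusters=n_clusters, noise=10,
    shuffle=True, random_state=42,
)

# Fit a spectral biclustering model with the same number of clusters.
model = SpectralBiclustering(n_clusters=n_clusters, method="log", random_state=0)
model.fit(data)

# consensus_score compares two sets of biclusters; 1.0 means a perfect match.
score = consensus_score(model.biclusters_, (rows, columns))
print(f"consensus score: {score:.2f}")
```

The score depends on the noise level and the seeds, so no particular value should be expected from this sketch.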