Utilities for Developers

Scikit-learn contains a number of utilities to help with development. These are located in sklearn.utils and fall into several categories. All of the functions and classes described here belong to that module.

Warning

These utilities are meant to be used internally within the scikit-learn package. They are not guaranteed to be stable between versions of scikit-learn. Backports, in particular, will be removed as the scikit-learn dependencies evolve.

Validation Tools#

These are tools used to check and validate input. When you write a function which accepts arrays, matrices, or sparse matrices as arguments, the following should be used when applicable.
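As a rough illustration of this kind of input checking, the sketch below validates an argument with check_array from sklearn.utils before using it. The helper column_means is hypothetical and exists only for this example. The accept_sparse="csr" setting is one possible choice, not a recommendation from this guide.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_array


def column_means(X):
    """Hypothetical helper: per-column means of a validated 2D input."""
    # check_array converts array-likes to a 2D ndarray and, by default,
    # rejects NaN and infinite values. accept_sparse="csr" additionally
    # allows scipy.sparse input (converted to CSR if needed).
    X = check_array(X, accept_sparse="csr")
    # Sparse matrices return a 2D result from .mean, so flatten it.
    return np.asarray(X.mean(axis=0)).ravel()


print(column_means([[1, 2], [3, 4]]))               # nested lists are accepted
print(column_means(sparse.csr_matrix(np.eye(2))))   # so is sparse input

try:
    column_means([[1.0, np.nan]])
except ValueError as exc:
    print("rejected:", exc)
```

Validating once at the top of the function means the rest of the body can assume a well-formed 2D input.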

If your code relies on a random number generator, it should never use functions like numpy.random.random or numpy.random.normal. This approach can lead to repeatability issues in unit tests. Instead, a numpy.random.RandomState object should be used, which is built from a random_state argument passed to the class or function. The function check_random_state, below, can then be used to create a random number generator object.

For example:

    >>> from sklearn.utils import check_random_state
    >>> random_state = 0
    >>> random_state = check_random_state(random_state)
    >>> random_state.rand(4)
    array([0.5488135 , 0.71518937, 0.60276338, 0.54488318])
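The same pattern can be applied inside a function that takes a random_state argument. The following is a minimal sketch, assuming a hypothetical helper named sample_noise that is not part of scikit-learn. It relies on check_random_state accepting None, an integer seed, or an existing RandomState instance.

```python
import numpy as np
from sklearn.utils import check_random_state


def sample_noise(n_samples, random_state=None):
    """Hypothetical helper: draw n_samples standard-normal values."""
    # None      -> NumPy's global RandomState singleton
    # int       -> a new RandomState seeded with that integer
    # instance  -> the same RandomState object, returned unchanged
    rng = check_random_state(random_state)
    return rng.normal(size=n_samples)


# The same integer seed gives the same draws, which keeps tests repeatable.
a = sample_noise(3, random_state=42)
b = sample_noise(3, random_state=42)
print(np.array_equal(a, b))

# An existing generator is passed through rather than copied.
shared = np.random.RandomState(0)
print(check_random_state(shared) is shared)
```

Accepting all three forms lets callers choose between a fixed seed for tests and a shared generator for larger pipelines.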

When developing your own scikit-learn compatible estimator, the following helpers are available.

Efficient Linear Algebra & Array Operations#

Efficient Random Sampling#

Efficient Routines for Sparse Matrices#

The sklearn.utils.sparsefuncs Cython module hosts compiled extensions to efficiently process scipy.sparse data.
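As one illustration, mean_variance_axis from sklearn.utils.sparsefuncs computes per-column means and variances of a CSR or CSC matrix without converting it to a dense array. The small matrix below is made up for this example. The dense comparison is included only to check the result and would defeat the purpose on real data.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.sparsefuncs import mean_variance_axis

# Illustrative data: a small floating-point CSR matrix.
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 0.0, 3.0],
                                [4.0, 0.0, 0.0]]))

# axis=0 computes statistics per column (feature).
means, variances = mean_variance_axis(X, axis=0)

# Sanity check against the dense equivalent (population variance, ddof=0).
dense = X.toarray()
print(np.allclose(means, dense.mean(axis=0)))
print(np.allclose(variances, dense.var(axis=0)))
```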

Graph Routines#

Testing Functions#

Multiclass and multilabel utility functions#

Helper Functions#

Hash Functions#

Warnings and Exceptions#