sklearn.model_selection.ShuffleSplit (scikit-learn 0.20.4 documentation)

Yields indices to split data into training and test sets.
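The splitting idea can be sketched in plain Python. Each split draws a fresh random permutation of the sample indices, takes the first `n_test` as the test set and the next `n_train` as the training set. This is a minimal sketch under that assumption. `shuffle_split` and its integer `n_test`/`n_train` arguments are hypothetical. It uses Python's `random` module rather than NumPy, so it will not reproduce the exact indices scikit-learn prints.

```python
import random
from typing import Iterator, List, Optional, Tuple


def shuffle_split(
    n_samples: int,
    n_splits: int,
    n_test: int,
    n_train: Optional[int] = None,
    seed: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train, test) index lists, one random permutation per split."""
    if n_train is None:
        n_train = n_samples - n_test  # default: every sample not in the test set is used for training
    if n_test + n_train > n_samples:
        raise ValueError("n_train + n_test must not exceed n_samples")
    rng = random.Random(seed)  # seeded so the splits are reproducible
    for _ in range(n_splits):
        perm = list(range(n_samples))
        rng.shuffle(perm)  # independent shuffle for every split
        test = perm[:n_test]  # first n_test shuffled indices form the test set
        train = perm[n_test:n_test + n_train]  # next n_train indices form the training set
        yield train, test


for train, test in shuffle_split(n_samples=6, n_splits=3, n_test=2):
    print("TRAIN:", train, "TEST:", test)
```

Because every split reshuffles from scratch, the splits are independent of one another. Nothing prevents two splits from producing the same test set.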

Note: unlike other cross-validation strategies, random splits do not guarantee that all folds will be different, although distinct folds are still very likely for sizeable datasets.
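A small simulation shows why repeated folds are possible and why they become unlikely as the dataset grows. The sketch below draws each test set independently, using the hypothetical `random_test_sets` helper with `random.sample` standing in for ShuffleSplit's shuffling. It compares these with non-overlapping, k-fold-style test folds from the hypothetical `kfold_test_sets`. With 4 samples and 2 test indices there are only comb(4, 2) = 6 possible test sets, so 10 splits must repeat at least one. With 1000 samples, a repeat is vanishingly unlikely.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def random_test_sets(n_samples, n_splits, n_test, seed=0):
    """Hypothetical helper: draw every test set independently of the others."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        # frozenset so identical test sets compare equal regardless of order
        yield frozenset(rng.sample(range(n_samples), n_test))


def kfold_test_sets(n_samples, n_folds):
    """Hypothetical helper: contiguous, non-overlapping test folds (unshuffled k-fold style)."""
    start = 0
    for i in range(n_folds):
        # spread the remainder over the first folds so sizes differ by at most one
        size = n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        yield frozenset(range(start, start + size))
        start += size


# Tiny dataset: only comb(4, 2) = 6 distinct test sets exist, so 10 draws must repeat one.
small = list(random_test_sets(n_samples=4, n_splits=10, n_test=2))
test_set, count = Counter(small).most_common(1)[0]
print(len(set(small)), "distinct of", len(small), "(at most", comb(4, 2), "possible);",
      "most common", sorted(test_set), "drawn", count, "times")

# Larger dataset: repeats are possible in principle but practically never happen.
large = list(random_test_sets(n_samples=1000, n_splits=10, n_test=250))
print(len(set(large)), "distinct of", len(large))

# k-fold test folds are pairwise disjoint by construction.
folds = list(kfold_test_sets(n_samples=6, n_folds=3))
print(all(a.isdisjoint(b) for a, b in combinations(folds, 2)))  # True
```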

>>> import numpy as np
>>> from sklearn.model_selection import ShuffleSplit
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [3, 4], [5, 6]])
>>> y = np.array([1, 2, 1, 2, 1, 2])
>>> rs = ShuffleSplit(n_splits=5, test_size=.25, random_state=0)
>>> rs.get_n_splits(X)
5
>>> print(rs)
ShuffleSplit(n_splits=5, random_state=0, test_size=0.25, train_size=None)
>>> for train_index, test_index in rs.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...
TRAIN: [1 3 0 4] TEST: [5 2]
TRAIN: [4 0 2 5] TEST: [1 3]
TRAIN: [1 2 4 0] TEST: [3 5]
TRAIN: [3 4 1 0] TEST: [5 2]
TRAIN: [3 5 1 0] TEST: [2 4]
>>> rs = ShuffleSplit(n_splits=5, train_size=0.5, test_size=.25,
...                   random_state=0)
>>> for train_index, test_index in rs.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...
TRAIN: [1 3 0] TEST: [5 2]
TRAIN: [4 0 2] TEST: [1 3]
TRAIN: [1 2 4] TEST: [3 5]
TRAIN: [3 4 1] TEST: [5 2]
TRAIN: [3 5 1] TEST: [2 4]
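The train and test sizes in the two examples above follow from simple arithmetic. With 6 samples, test_size=.25 gives 0.25 * 6 = 1.5, and each test set above has 2 indices. Likewise, train_size=0.5 gives 3 training indices, so one sample is left out of every split in the second example. The helper below is a hypothetical sketch (`split_sizes` is not a scikit-learn function). It assumes the test count is rounded up and the train count rounded down, a rule consistent with these outputs but not stated on this page.

```python
import math
from typing import Optional, Tuple


def split_sizes(n_samples: int, test_size: float,
                train_size: Optional[float] = None) -> Tuple[int, int]:
    """Hypothetical helper: convert fractional sizes into (n_train, n_test) counts.

    Assumption: the test count is rounded up and the train count rounded down.
    """
    n_test = math.ceil(test_size * n_samples)
    if train_size is None:
        n_train = n_samples - n_test  # training set is the complement of the test set
    else:
        n_train = math.floor(train_size * n_samples)
    if n_train + n_test > n_samples:
        raise ValueError("train_size + test_size is too large for n_samples")
    return n_train, n_test


print(split_sizes(6, test_size=0.25))                  # (4, 2): 1.5 rounds up to 2 test samples
print(split_sizes(6, test_size=0.25, train_size=0.5))  # (3, 2): one sample is unused per split
```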