fetch_kddcup99

sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)

Load the kddcup99 dataset (classification).

Download it if necessary.

Read more in the User Guide.

Added in version 0.18.

Parameters:

subset: {‘SA’, ‘SF’, ‘http’, ‘smtp’}, default=None

Which classical subset of kddcup 99 to return. If None, return the entire kddcup 99 dataset.

data_home: str or path-like, default=None

Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

Added in version 0.19.

shuffle: bool, default=False

Whether to shuffle the dataset.

random_state: int, RandomState instance or None, default=None

Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.

percent10: bool, default=True

Whether to load only 10 percent of the data.

download_if_missing: bool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

return_X_y: bool, default=False

If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

Added in version 0.20.

as_frame: bool, default=False

If True, data is returned as a pandas DataFrame and target as a pandas Series, and the returned Bunch also has a frame member.

Added in version 0.24.

n_retries: int, default=3

Number of retries when HTTP errors are encountered.

Added in version 1.5.

delay: float, default=1.0

Number of seconds between retries.

Added in version 1.5.
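As a rough illustration of how download_if_missing, n_retries and delay can be combined, the sketch below checks the local cache first and only falls back to a download when explicitly allowed. The helper name load_kddcup99 and its allow_download flag are hypothetical and not part of scikit-learn; passing n_retries and delay assumes scikit-learn 1.5 or later.

```python
def load_kddcup99(data_home=None, allow_download=False, n_retries=3, delay=1.0):
    """Return the dataset Bunch, or None if it is not cached and
    downloading has not been allowed (hypothetical helper)."""
    # Imported lazily so this module can still be loaded where
    # scikit-learn is not installed.
    from sklearn.datasets import fetch_kddcup99

    try:
        # With download_if_missing=False, a missing cache raises OSError
        # instead of triggering a download.
        return fetch_kddcup99(data_home=data_home, download_if_missing=False)
    except OSError:
        if not allow_download:
            return None

    # Fall back to downloading, retrying on HTTP errors.
    # n_retries and delay were added in scikit-learn 1.5.
    return fetch_kddcup99(
        data_home=data_home,
        download_if_missing=True,
        n_retries=n_retries,
        delay=delay,
    )
```

Calling load_kddcup99() on a machine without a cached copy would return None, while load_kddcup99(allow_download=True) would attempt the download.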

Returns:

data: Bunch

Dictionary-like object, with the following attributes.

data: {ndarray, dataframe} of shape (494021, 41)

The data matrix to learn. If as_frame=True, data will be a pandas DataFrame.

target: {ndarray, series} of shape (494021,)

The classification target (label) for each sample. If as_frame=True, target will be a pandas Series.

frame: dataframe of shape (494021, 42)

Only present when as_frame=True. Contains data and target.

DESCR: str

The full description of the dataset.

feature_names: list

The names of the dataset columns.

target_names: list

The names of the target columns.
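The attributes listed above can be inspected like those of any dictionary-like object. The following is a minimal sketch using a hypothetical summarize helper and a tiny made-up stand-in object, so that it runs without downloading anything; a real Bunch from fetch_kddcup99 should expose the same attribute names.

```python
from types import SimpleNamespace


def summarize(bunch):
    """Collect a few basic facts from a Bunch-like object (hypothetical helper)."""
    return {
        "n_samples": len(bunch.data),
        "n_features": len(bunch.feature_names),
        "n_targets": len(bunch.target),
        # Treat a missing or None frame as "no frame available".
        "has_frame": getattr(bunch, "frame", None) is not None,
    }


# Toy stand-in with invented values; it only mimics the attribute names.
toy = SimpleNamespace(
    data=[[0, b"tcp"], [2, b"udp"]],
    target=[b"normal.", b"smurf."],
    feature_names=["duration", "protocol_type"],
    target_names=["labels"],
    DESCR="toy stand-in",
)

print(summarize(toy))
```

With the default arguments, summarize(fetch_kddcup99()) would be expected to report the 494021 samples and 41 features given in the shapes above.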

(data, target): tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing one feature. The second is an array of shape (n_samples,) containing the target of each sample.

Added in version 0.20.
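When the (data, target) tuple is used for anomaly detection, the labels are often reduced to a normal/abnormal flag. The sketch below assumes that labels are byte strings and that normal traffic is labelled b"normal."; both the helper name to_anomaly_flags and the toy labels are illustrative only.

```python
def to_anomaly_flags(target, normal_label=b"normal."):
    """Map each label to 0 (normal) or 1 (anything else).

    Assumes labels compare equal to `normal_label` for normal samples.
    """
    return [0 if label == normal_label else 1 for label in target]


# Made-up labels standing in for the second element of the tuple.
toy_target = [b"normal.", b"smurf.", b"normal.", b"neptune."]
flags = to_anomaly_flags(toy_target)
print(flags)
```

With real data, the target would come from X, y = fetch_kddcup99(return_X_y=True), and y could be passed to the helper in place of toy_target.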