fetch_kddcup99
sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)
Load the kddcup99 dataset (classification).
Download it if necessary.
Read more in the User Guide.
Added in version 0.18.
Parameters:
subset : {'SA', 'SF', 'http', 'smtp'}, default=None
Which classical subset of kddcup 99 to return. If None, return the entire kddcup 99 dataset.
data_home : str or path-like, default=None
Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
Added in version 0.19.
shuffle : bool, default=False
Whether to shuffle the dataset.
random_state : int, RandomState instance or None, default=None
Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.
percent10 : bool, default=True
Whether to load only 10 percent of the data.
download_if_missing : bool, default=True
If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.
return_X_y : bool, default=False
If True, return (data, target) instead of a Bunch object. See below for more information about the data and target objects.
Added in version 0.20.
as_frame : bool, default=False
If True, return pandas objects for the data and target members of the returned Bunch (a DataFrame and a Series, respectively); the Bunch will also have a frame member.
Added in version 0.24.
n_retries : int, default=3
Number of retries when HTTP errors are encountered.
Added in version 1.5.
delay : float, default=1.0
Number of seconds between retries.
Added in version 1.5.
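As a rough illustration of the download_if_missing behaviour described above, the sketch below asks only for a locally cached copy and treats the documented OSError as "not downloaded yet". The helper name try_cached_kddcup99 and its fetch argument are hypothetical (the latter is only a hook for substituting a stand-in loader); they are not part of scikit-learn.

```python
def try_cached_kddcup99(data_home=None, fetch=None, **kwargs):
    """Return the locally cached dataset, or None if it is not cached.

    `fetch` is a hypothetical injection point for a stand-in loader;
    by default the real sklearn.datasets.fetch_kddcup99 is used.
    """
    if fetch is None:
        # Imported lazily so the helper itself does not require
        # scikit-learn when a stand-in loader is supplied.
        from sklearn.datasets import fetch_kddcup99 as fetch
    try:
        # download_if_missing=False: never touch the network; an OSError
        # is raised when the data is not available locally.
        return fetch(data_home=data_home, download_if_missing=False, **kwargs)
    except OSError:
        return None
```

A call such as try_cached_kddcup99(subset='http') would then return either a Bunch or None. If it returns None, a follow-up call to fetch_kddcup99 with the default download_if_missing=True (and, if desired, larger n_retries or delay values) would be expected to perform the download.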
Returns:
data : Bunch
Dictionary-like object, with the following attributes.
data : {ndarray, dataframe} of shape (494021, 41)
The data matrix to learn. If as_frame=True, data will be a pandas DataFrame.
target : {ndarray, series} of shape (494021,)
The classification target for each sample. If as_frame=True, target will be a pandas Series.
frame : dataframe of shape (494021, 42)
Only present when as_frame=True. Contains data and target.
DESCR : str
The full description of the dataset.
feature_names : list
The names of the dataset columns.
target_names : list
The names of the target columns.
(data, target) : tuple if return_X_y is True
A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target values.
Added in version 0.20.
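To make the (data, target) return form more concrete, here is a minimal sketch that counts samples per class label. The helper names label_counts and summarize_kddcup99 are hypothetical. The bytes handling is an assumption, in case labels arrive as byte strings rather than str in the installed version.

```python
from collections import Counter


def label_counts(target):
    """Count samples per class label.

    Assumption: labels may be bytes or str depending on the installed
    version, so bytes are decoded before counting.
    """
    decoded = (
        t.decode("utf-8") if isinstance(t, bytes) else str(t) for t in target
    )
    return Counter(decoded)


def summarize_kddcup99(**kwargs):
    """Fetch the dataset as (X, y) and return its shape and label counts.

    This downloads the data on first use unless it is already cached.
    """
    from sklearn.datasets import fetch_kddcup99

    X, y = fetch_kddcup99(return_X_y=True, **kwargs)
    return X.shape, label_counts(y)
```

Calling summarize_kddcup99(subset='SA', random_state=0) would, under these assumptions, return the shape of the data matrix together with a Counter of labels; exact counts depend on the subset and are not shown here.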