fetch_lfw_people

sklearn.datasets.fetch_lfw_people(*, data_home=None, funneled=True, resize=0.5, min_faces_per_person=0, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True, return_X_y=False, n_retries=3, delay=1.0)

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

Download it if necessary.

For a usage example of this dataset, see Faces recognition example using eigenfaces and SVMs.

Read more in the User Guide.

Parameters:

data_home : str or path-like, default=None

Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

funneled : bool, default=True

Download and use the funneled variant of the dataset.

resize : float or None, default=0.5

Ratio used to resize each face picture. If None, no resizing is performed.

min_faces_per_person : int, default=0

The extracted dataset will only retain pictures of people who have at least min_faces_per_person different pictures.

color : bool, default=False

Keep the 3 RGB channels instead of averaging them to a single gray-level channel. If color is True, the shape of the data has one more dimension than the shape with color=False.

slice_ : tuple of slice, default=(slice(70, 195), slice(78, 172))

Provide a custom 2D slice (height, width) to extract the ‘interesting’ part of the JPEG files and avoid using statistical correlation from the background.

download_if_missing : bool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

return_X_y : bool, default=False

If True, returns (dataset.data, dataset.target) instead of a Bunch object. See below for more information about the dataset.data and dataset.target objects.

Added in version 0.20.

n_retries : int, default=3

Number of retries when HTTP errors are encountered.

Added in version 1.5.

delay : float, default=1.0

Number of seconds between retries.

Added in version 1.5.
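
The effect of slice_ and resize on the image size can be estimated with a few lines of arithmetic. The sketch below is not part of scikit-learn: expected_image_shape is a hypothetical helper, and it assumes the crop is applied first and the scaled dimensions are then truncated to integers. Under that assumption the default arguments reproduce the 62 x 47 image size and the 2914 features documented for this dataset.

```python
def expected_image_shape(slice_=(slice(70, 195), slice(78, 172)), resize=0.5):
    """Hypothetical helper: estimate the (height, width) of each returned face."""
    h_slice, w_slice = slice_
    # Size of the cropped window, honouring an optional slice step.
    height = (h_slice.stop - h_slice.start) // (h_slice.step or 1)
    width = (w_slice.stop - w_slice.start) // (w_slice.step or 1)
    if resize is not None:
        # Assumption: scaled dimensions are truncated to integers.
        height = int(resize * height)
        width = int(resize * width)
    return height, width


h, w = expected_image_shape()
print(h, w, h * w)  # 62 47 2914
```

Passing resize=None in this sketch leaves the cropped window at its full 125 x 94 size.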

Returns:

dataset : Bunch

Dictionary-like object, with the following attributes.

data : numpy array of shape (13233, 2914)

Each row corresponds to a ravelled face image of original size 62 x 47 pixels. Changing the slice_ or resize parameters will change the shape of the output.

images : numpy array of shape (13233, 62, 47)

Each row is a face image corresponding to one of the 5749 people in the dataset. Changing the slice_ or resize parameters will change the shape of the output.

target : numpy array of shape (13233,)

Labels associated with each face image. The labels range from 0 to 5748 and correspond to the person IDs.

target_names : numpy array of shape (5749,)

Names of all persons in the dataset. Position in array corresponds to the person ID in the target array.

DESCR : str

Description of the Labeled Faces in the Wild (LFW) dataset.

(data, target) : tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an array of shape (n_samples,) containing the target values.

Added in version 0.20.
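
The relationship between data, images, target and target_names can be illustrated without downloading anything. The sketch below uses small synthetic arrays as stand-ins for the real ones; the person names are made up, and the assumption is that each row of data is simply the corresponding image flattened in row-major order.

```python
import numpy as np

# Synthetic stand-ins for the real arrays: 4 grayscale images of 62 x 47 pixels.
rng = np.random.default_rng(0)
images = rng.random((4, 62, 47), dtype=np.float32)

# Assumption: each row of `data` is the matching image, ravelled.
data = images.reshape(len(images), -1)
print(data.shape)  # (4, 2914)

# A row of `data` can be turned back into a 2D image for display.
restored = data[0].reshape(62, 47)

# `target` holds integer person IDs; indexing `target_names` with it
# yields one name per image. These names are placeholders.
target = np.array([1, 0, 1, 2])
target_names = np.array(["Person A", "Person B", "Person C"])
names = target_names[target]
```

With the real dataset, the same indexing (target_names[target]) gives the name of the person shown in each row.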

Examples

>>> from sklearn.datasets import fetch_lfw_people
>>> lfw_people = fetch_lfw_people()
>>> lfw_people.data.shape
(13233, 2914)
>>> lfw_people.target.shape
(13233,)
>>> for name in lfw_people.target_names[:5]:
...     print(name)
AJ Cook
AJ Lamas
Aaron Eckhart
Aaron Guiel
Aaron Patterson
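
Before choosing a value for min_faces_per_person, it can help to count how many images each person has. The sketch below is illustrative only: people_with_at_least is a hypothetical helper, not a scikit-learn function, and it runs here on made-up labels rather than on the downloaded dataset. It shows the selection criterion, not how the loader implements it.

```python
import numpy as np


def people_with_at_least(target, target_names, min_faces):
    """Hypothetical helper: names of people with at least `min_faces` images."""
    # Count occurrences of each integer person ID in `target`.
    counts = np.bincount(target, minlength=len(target_names))
    return target_names[counts >= min_faces]


# Made-up labels standing in for lfw_people.target and lfw_people.target_names.
target = np.array([0, 0, 1, 2, 2, 2])
target_names = np.array(["Person A", "Person B", "Person C"])

selected = people_with_at_least(target, target_names, min_faces=2)
print(selected)
```

The same call should work on the target and target_names arrays of the returned Bunch, since they have the shapes described above.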


Gallery examples