Origin of the means and stds used for preprocessing? · Issue #1439 · pytorch/vision

Does anyone remember how exactly we came about the channel means and stds we use for the preprocessing?

```python
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
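For context, these constants are typically applied at the very end of the preprocessing pipeline, after the image has been converted to a float tensor in [0, 1]. A minimal sketch of the common ImageNet inference pipeline (the exact resize / crop sizes vary between models):

```python
import torch
from torchvision import transforms

# Common ImageNet inference preprocessing: resize, center-crop, convert
# to a float tensor in [0, 1], then normalize each channel with the
# constants in question.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```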

I think the first mention of the preprocessing in this repo is in #39. In that issue @soumith points to https://github.com/pytorch/examples/tree/master/imagenet for reference. If you look at the history of main.py, the commit pytorch/examples@27e2a46 is the first to introduce the values. Unfortunately, it contains no explanation, hence my question.

Specifically, I'm seeking answers to the following questions:

- Which images were used to calculate the values, i.e. the complete train set or only a subset of it?
- Were the images resized and / or cropped before the calculation, and if so, how?

I've tested some combinations and will post my results here.

| Parameters | mean | std |
| --- | --- | --- |
| train set only, no resizing / cropping | [0.4803, 0.4569, 0.4083] | [0.2806, 0.2736, 0.2877] |
| train set only, resize to 256 and center crop to 224 | [0.4845, 0.4541, 0.4025] | [0.2724, 0.2637, 0.2761] |
| train set only, center crop to 224 | [0.4701, 0.4340, 0.3832] | [0.2845, 0.2733, 0.2805] |

While the means match fairly well, the stds differ significantly. One possible explanation is that the std, unlike the mean, is sensitive to whether it is computed per image and then averaged or over all pixels pooled together.
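A toy demonstration of this effect (my own sketch, unrelated to ImageNet): the mean of per-image stds and the std over all pixels pooled together are two different estimators and generally disagree.

```python
import torch

# Ten random "images" with different brightness offsets.
torch.manual_seed(0)
images = [torch.rand(3, 224, 224) + 0.1 * i for i in range(10)]

# Estimator 1: std per image, then averaged.
per_image = torch.stack([img.std() for img in images]).mean()

# Estimator 2: std over all pixels pooled together.
pooled = torch.cat([img.flatten() for img in images]).std()

print(per_image.item(), pooled.item())  # ~0.29 vs. ~0.41 -- they disagree
```

Note that with equal-sized images the two mean estimators coincide, which would be consistent with the means agreeing while the stds do not.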


Update:

The process for obtaining the values of mean and std was roughly equivalent to the following, but the concrete subset that was used is lost:

```python
import torch
from torchvision import datasets, transforms as T

transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.PILToTensor(),
    T.ConvertImageDtype(torch.float),
])
dataset = datasets.ImageNet(".", split="train", transform=transform)

# `subset` is a placeholder: which subset of the train set was actually
# used is exactly the information that is lost.
means = []
stds = []
for img, _ in subset(dataset):
    # Per-channel statistics of a single image (shape 3 x H x W).
    means.append(img.mean(dim=(1, 2)))
    stds.append(img.std(dim=(1, 2)))

# Average the per-image statistics over the subset.
mean = torch.stack(means).mean(dim=0)
std = torch.stack(stds).mean(dim=0)
```
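For comparison, here is a sketch (my addition, not part of the original reconstruction) of the pooled alternative: accumulating per-channel sums and squared sums over every pixel and deriving the global statistics from them. The batch size and worker count are arbitrary choices:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms as T

transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.PILToTensor(),
    T.ConvertImageDtype(torch.float),
])
dataset = datasets.ImageNet(".", split="train", transform=transform)
loader = DataLoader(dataset, batch_size=64, num_workers=8)

# Accumulate per-channel sums and squared sums over all pixels.
num_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for imgs, _ in loader:  # imgs has shape B x 3 x H x W
    num_pixels += imgs.numel() // 3
    channel_sum += imgs.sum(dim=(0, 2, 3))
    channel_sq_sum += (imgs ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / num_pixels
# Population std via E[x^2] - E[x]^2.
std = (channel_sq_sum / num_pixels - mean ** 2).sqrt()
```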

See #1965 for the reproduction experiments.