std and mean for image normalization different from ImageNet · Issue #20 · openai/CLIP

torchvision model-zoo's image normalization is:

mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]

CLIP's is:

mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711]

What's the story behind the difference? Were CLIP's normalization parameters re-calculated on WebImageText?
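For concreteness, here is a minimal dependency-free sketch of the per-channel normalization both sets of constants feed into (the same `(x - mean) / std` that `torchvision.transforms.Normalize` applies); the pixel values and helper function are illustrative, not from the CLIP codebase:

```python
# Per-channel normalization: out = (x - mean) / std, applied to RGB values in [0, 1]
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
CLIP_STD = [0.26862954, 0.26130258, 0.27577711]

def normalize(pixel, mean, std):
    """Normalize one RGB pixel channel-wise (hypothetical helper for illustration)."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

# A pixel equal to the dataset mean maps to zero under its own statistics
print(normalize(IMAGENET_MEAN, IMAGENET_MEAN, IMAGENET_STD))  # → [0.0, 0.0, 0.0]

# The same mid-gray pixel lands at slightly different values under each set,
# since the means are close but the stds differ noticeably (~0.23 vs ~0.27)
gray = [0.5, 0.5, 0.5]
print(normalize(gray, IMAGENET_MEAN, IMAGENET_STD))
print(normalize(gray, CLIP_MEAN, CLIP_STD))
```

The small gap between the two outputs is why feeding ImageNet-normalized inputs to CLIP (or vice versa) usually still "works" but can cost some accuracy.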