Incorrect preprocessing for ImageNet-C evaluation

I see that the ImageNet-C evaluation uses the preprocessing: Resize(256)+CenterCrop(224)+ToTensor().

def load_imagenetc(
        n_examples: Optional[int] = 5000,
        severity: int = 5,
        data_dir: str = './data',
        shuffle: bool = False,
        corruptions: Sequence[str] = CORRUPTIONS,
        prepr: str = 'Res256Crop224'
) -> Tuple[torch.Tensor, torch.Tensor]:
    transforms_test = PREPROCESSINGS[prepr]

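This causes discrepancies with the scores reported in the original papers (DeepAugment, AugMix, Standard RN-50). The ImageNet-C dataset already contains 224x224 images, so Resize(256) upsamples them to 256x256 and CenterCrop(224) then discards the border. The models therefore see a zoomed-in view rather than the images as released. For consistency, only ToTensor() should be used.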
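To make the size of the effect concrete, here is a minimal sketch in plain Python that traces the geometry of Resize(256)+CenterCrop(224) on a 224x224 input. The helper name resize_then_center_crop is hypothetical and not part of the repository. It assumes the usual convention that an integer resize target applies to the shorter side while preserving the aspect ratio.

```python
def resize_then_center_crop(height, width, resize_to=256, crop=224):
    """Trace the geometry of Resize(resize_to) followed by CenterCrop(crop).

    Assumption: the shorter side is scaled to `resize_to` and the aspect
    ratio is preserved, as is conventional for an integer resize target.
    """
    scale = resize_to / min(height, width)
    resized = (round(height * scale), round(width * scale))
    # Side length of the crop window, measured in original-image pixels.
    window = crop / scale
    # Fraction of the original image area that survives the crop.
    kept_fraction = (window * window) / (height * width)
    return resized, scale, window, kept_fraction


if __name__ == "__main__":
    resized, scale, window, kept = resize_then_center_crop(224, 224)
    print(f"resized to {resized}, zoom factor {scale:.3f}")
    print(f"crop covers about {window:.0f}x{window:.0f} original pixels "
          f"({kept:.1%} of the image area)")
```

For a 224x224 ImageNet-C image, the pipeline enlarges the image by a factor of 256/224 and keeps only the central region of roughly 196x196 original pixels, which is about 77% of the area. The evaluated inputs therefore differ from the ones used in the original papers.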

Setting prepr='none' in load_imagenetc should solve the issue (assuming all the models can handle 224x224 images as input).