ziqihuangg/CelebA-Dialog: A large-scale visual-language face dataset with fine-grained annotations (ICCV 2021)

CelebA-Dialog Dataset

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang*, Ziqi Huang*, Xingang Pan, Chen Change Loy and Ziwei Liu
In IEEE International Conference on Computer Vision (ICCV), 2021.

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University.

[Project Page] | [Paper] | [Code] | [Video] | [Web Page]

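CelebA-Dialog is a large-scale visual-language face dataset. Its facial images are annotated with fine-grained labels for 5 attributes, binary labels for 40 attributes, natural language captions, and editing requests.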

The dataset can serve as training and test data for a range of computer vision tasks, including fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, natural-language-based facial recognition and manipulation, and broader multi-modal learning tasks. The dataset was introduced in Talk-to-Edit.

You can download the dataset from the following links:

| Link (HQ) | Size | Files | Format | Description |
|---|---|---|---|---|
| CelebA-Dialog (HQ) | ~4.4 GB | | | 30,000 high-resolution images and corresponding annotations |
| image (HQ) | ~2.7 GB | 30,000 | JPG | images from CelebA-HQ |
| fine-grained label (HQ) | ~600 KB | 1 | TXT | fine-grained labels for 5 attributes |
| binary label (HQ) | ~3.5 MB | 1 | TXT | binary labels for 40 attributes |
| text (HQ) | ~27 MB | 4 | TXT and JSON | natural language captions and editing requests |
| mask (HQ) | ~1.8 GB | | PNG | segmentation masks: (1) binary, (2) colorized |
| identity (HQ) | ~400 KB | 1 | TXT | identity label of each image |

| Link (standard) | Size | Files | Format | Description |
|---|---|---|---|---|
| CelebA-Dialog (standard) | | | | 202,599 original CelebA images and corresponding annotations |
| image (standard) | | | | images from CelebA |
| fine-grained label (standard) | ~4 MB | 1 | TXT | fine-grained labels for 5 attributes |
| binary label (standard) | ~25 MB | 1 | TXT | binary labels for 40 attributes |
| text (standard) | ~14 MB | | TXT and JSON | natural language captions and editing requests |
| identity (standard) | ~3.3 MB | 1 | TXT | identity label of each image |

| Link (mapping) | Size | Files | Format | Description |
|---|---|---|---|---|
| HQ-to-standard mapping | ~1 MB | 1 | TXT | the mapping between the 30,000 CelebA-HQ images and the 202,599 CelebA images |
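The HQ-to-standard mapping file links each CelebA-HQ image to its original CelebA image, which is useful when moving annotations between the two versions. Its column layout is not described here, so the following is only a minimal sketch under an assumption: a whitespace-separated text file whose first line is a header and whose remaining lines read `<hq_index> <celeba_index> <celeba_file_name>`. The helper `parse_hq_to_standard` is hypothetical, not part of the dataset's tooling; adjust the column indices if the downloaded file differs.

```python
from typing import Dict, Iterable


def parse_hq_to_standard(lines: Iterable[str]) -> Dict[int, str]:
    """Map each CelebA-HQ index to the file name of its original CelebA image.

    ASSUMPTION: the first line is a header and every following line reads
    `<hq_index> <celeba_index> <celeba_file_name>`, separated by whitespace.
    """
    it = iter(lines)
    next(it, None)  # skip the (assumed) header line
    mapping: Dict[int, str] = {}
    for line in it:
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore blank or malformed lines
        mapping[int(parts[0])] = parts[2]
    return mapping


if __name__ == "__main__":
    # Toy input in the assumed layout (not real dataset rows).
    demo = ["idx orig_idx orig_file", "0 10 000011.jpg", "1 20 000021.jpg"]
    print(parse_hq_to_standard(demo))  # {0: '000011.jpg', 1: '000021.jpg'}
```

Taking an iterable of lines rather than a path keeps the parser easy to test; with the real file it can be called on an open handle, e.g. `with open(mapping_path) as fh: mapping = parse_hq_to_standard(fh)`, where `mapping_path` is a placeholder.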

Details

Image

Fine-Grained Label

Binary Label

Text

Mask

We preprocessed the facial segmentation masks from CelebAMask-HQ to facilitate future research.

Below is the color-to-label parsing information:

Label list
0: 'background' 1: 'skin' 2: 'nose' 3: 'eye_g' 4: 'l_eye'
5: 'r_eye' 6: 'l_brow' 7: 'r_brow' 8: 'l_ear' 9: 'r_ear'
10: 'mouth' 11: 'u_lip' 12: 'l_lip' 13: 'hair' 14: 'hat'
15: 'ear_r' 16: 'neck_l' 17: 'neck' 18: 'cloth'

from PIL import Image
import numpy as np

segm = Image.open(f)  # f: path to a mask PNG file
segm = np.array(segm)  # shape: [512, 512]
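Once a mask is loaded as a 2-D array, individual facial components can be selected by comparing pixels against the indices in the label list above. This is a minimal sketch assuming the loaded array stores those integer label indices (0 to 18) rather than RGB colors; the helpers `component_mask` and `component_fractions` are hypothetical names used for illustration.

```python
import numpy as np

# Index -> component name, copied from the label list above.
LABELS = ['background', 'skin', 'nose', 'eye_g', 'l_eye', 'r_eye', 'l_brow',
          'r_brow', 'l_ear', 'r_ear', 'mouth', 'u_lip', 'l_lip', 'hair', 'hat',
          'ear_r', 'neck_l', 'neck', 'cloth']
NAME_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def component_mask(segm: np.ndarray, name: str) -> np.ndarray:
    """Boolean mask that is True where `segm` holds the label index of `name`.

    ASSUMPTION: `segm` contains integer label indices, not colorized RGB values.
    """
    return segm == NAME_TO_INDEX[name]


def component_fractions(segm: np.ndarray) -> dict:
    """Fraction of pixels assigned to each label that occurs in `segm`."""
    values, counts = np.unique(segm, return_counts=True)
    total = segm.size
    return {LABELS[int(v)]: float(c) / total for v, c in zip(values, counts)}


if __name__ == "__main__":
    # Toy 4x4 label map: top half 'hair', bottom half 'skin'.
    segm = np.full((4, 4), NAME_TO_INDEX['skin'], dtype=np.uint8)
    segm[:2, :] = NAME_TO_INDEX['hair']
    print(int(component_mask(segm, 'hair').sum()))  # 8
    print(component_fractions(segm))  # {'skin': 0.5, 'hair': 0.5}
```

Comparing against a single index yields a boolean array that can be used directly to crop, blend, or count a region; merging several components (e.g. both eyebrows) is then a matter of combining masks with `|`.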

Identity

Some images show the same person. In total, there are 10,177 identities in the dataset. On average, there are:
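Per-identity image counts can be recomputed from the identity label file. The sketch below assumes each line has the form `<image_name> <identity_id>` with no header (the layout of the original CelebA `identity_CelebA.txt`); that this dataset's identity files follow the same layout is an assumption, and `identity_stats` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def identity_stats(lines: Iterable[str]) -> Tuple[int, float, Counter]:
    """Count images per identity.

    ASSUMPTION: each line reads `<image_name> <identity_id>`; there is no header.
    Returns (number of identities, mean images per identity, per-identity Counter).
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        counts[int(parts[1])] += 1
    num_ids = len(counts)
    mean = sum(counts.values()) / num_ids if num_ids else 0.0
    return num_ids, mean, counts


if __name__ == "__main__":
    # Toy input in the assumed layout (not real dataset rows).
    demo = ["000001.jpg 7", "000002.jpg 7", "000003.jpg 9"]
    n, mean, counts = identity_stats(demo)
    print(n, mean, counts.most_common(1))  # 2 1.5 [(7, 2)]
```

Returning the full `Counter` alongside the mean makes it easy to inspect the distribution, for example the identities with the most or fewest images.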

Agreement

Citation

If you find this dataset useful for your research and use it in your work, please consider citing the following papers:

@inproceedings{CelebA-Dialog,
  title     = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  author    = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2021}
}

@inproceedings{CelebAMask-HQ,
  title     = {MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author    = {Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2020}
}

@inproceedings{CelebA-HQ,
  title     = {Progressive Growing of {GAN}s for Improved Quality, Stability, and Variation},
  author    = {Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle = {International Conference on Learning Representations},
  year      = {2018}
}

@inproceedings{CelebA,
  title     = {Deep Learning Face Attributes in the Wild},
  author    = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
  month     = {December},
  year      = {2015}
}