GitHub - ziqihuangg/CelebA-Dialog: A large-scale visual-language face dataset with fine-grained annotations (ICCV 2021)

CelebA-Dialog Dataset

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang*, Ziqi Huang*, Xingang Pan, Chen Change Loy and Ziwei Liu
In IEEE International Conference on Computer Vision (ICCV), 2021.

From MMLab@NTU, affiliated with S-Lab, Nanyang Technological University.

[Project Page] | [Paper] | [Code] | [Video] | [Web Page]

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
Each image is accompanied by captions describing its attributes and a sample user request.

The dataset can serve as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, natural-language-based facial recognition and manipulation, and broader multi-modality learning tasks. The dataset was proposed in Talk-to-Edit.

Download Links

You can download the dataset using the following links:

"HQ" refers to the images and corresponding annotations for the 30,000 high-resolution images selected following CelebA-HQ.
"standard" refers to the images and corresponding annotations for the original 202,599 CelebA images.

Link (HQ)	Size	Files	Format	Description
CelebA-Dialog (HQ)	~4.4 GB	30,000 high-resolution images and corresponding annotations
├ image (HQ)	~2.7 GB	30,000	JPG	images from CelebA-HQ
├ fine-grained label (HQ)	~600 KB	1	TXT	fine-grained labels for 5 attributes
├ binary label (HQ)	~3.5 MB	1	TXT	binary labels for 40 attributes
├ text (HQ)	~27 MB	4	TXT and JSON	natural language captions and editing requests
├ mask (HQ)	~1.8 GB	PNG	segmentation masks (1) binary (2) colorized
├ identity (HQ)	~400 KB	1	TXT	identity label of each image

Link (standard)	Size	Files	Format	Description
CelebA-Dialog (standard)	202,599 original CelebA images and corresponding annotations
├ image (standard)	images from CelebA
├ fine-grained label (standard)	~4 MB	1	TXT	fine-grained labels for 5 attributes
├ binary label (standard)	~25 MB	1	TXT	binary labels for 40 attributes
├ text (standard)	~14 MB	TXT and JSON	natural language captions and editing requests
├ identity (standard)	~3.3 MB	1	TXT	identity label of each image

Link (mapping)	Size	Files	Format	Description
HQ-to-standard mapping	~1 MB	1	TXT	The mapping between 30,000 CelebA-HQ images and the 202,599 CelebA images

Details

Image

HQ:
- 30,000 face images selected from the CelebA dataset by following CelebA-HQ
- High resolution of 1024 x 1024
standard:
- 202,599 face images from the CelebA dataset

Fine-Grained Label

5 fine-grained attribute annotations per image: Bangs, Eyeglasses, Beard, Smiling, and Age
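
As a rough sketch of how these labels might be consumed, the snippet below parses label lines into a per-image dictionary. The line layout (an image name followed by five integers in the order Bangs, Eyeglasses, Beard, Smiling, Age), the function name and the sample rows are all assumptions made for illustration; check the downloaded TXT file for the actual layout before relying on it.

```python
# Hypothetical layout: "<image name> <Bangs> <Eyeglasses> <Beard> <Smiling> <Age>"
ATTRIBUTES = ["Bangs", "Eyeglasses", "Beard", "Smiling", "Age"]

def parse_fine_grained_labels(lines):
    """Map each image name to a dict of attribute -> integer degree."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != len(ATTRIBUTES) + 1:
            continue  # skip blank lines and lines with an unexpected column count
        name, degrees = parts[0], [int(p) for p in parts[1:]]
        labels[name] = dict(zip(ATTRIBUTES, degrees))
    return labels

# Made-up sample rows, not taken from the dataset
sample = ["0.jpg 0 0 1 2 3", "1.jpg 4 0 0 0 1"]
labels = parse_fine_grained_labels(sample)
print(labels["0.jpg"]["Smiling"])
```

Keeping the attribute order in a single list makes it easy to adjust if the real file orders its columns differently.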

Binary Label

40 binary attribute annotations per image

Text

Textual captions for each image
A user editing request per image

Mask

We preprocess the facial segmentation masks of CelebAMask-HQ to ease future research.

You can directly download the binary masks for individual labels for each image. These are the same as the ones provided in CelebAMask-HQ. (Download link)
We produce the combined colorized mask for each image following the parsing of CelebAMask-HQ. (Download link)

Below is the color-to-label parsing information:

Label list
0: 'background'	1: 'skin'	2: 'nose'	3: 'eye_g'	4: 'l_eye'
5: 'r_eye'	6: 'l_brow'	7: 'r_brow'	8: 'l_ear'	9: 'r_ear'
10: 'mouth'	11: 'u_lip'	12: 'l_lip'	13: 'hair'	14: 'hat'
15: 'ear_r'	16: 'neck_l'	17: 'neck'	18: 'cloth'

from PIL import Image
import numpy as np

# f is the path to a mask PNG file
segm = Image.open(f)
segm = np.array(segm)  # shape: [512, 512]
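
Assuming each pixel of the loaded array holds one of the label indices listed above (an assumption worth verifying on a downloaded mask), a binary mask for a single facial part can be sketched as follows. The helper name `label_mask` and the tiny synthetic array are hypothetical; the array stands in for a real 512 x 512 mask.

```python
import numpy as np

# Label names in index order, copied from the label list above
LABELS = ['background', 'skin', 'nose', 'eye_g', 'l_eye', 'r_eye',
          'l_brow', 'r_brow', 'l_ear', 'r_ear', 'mouth', 'u_lip',
          'l_lip', 'hair', 'hat', 'ear_r', 'neck_l', 'neck', 'cloth']

def label_mask(segm, name):
    """Return a boolean array that is True where segm equals the label's index."""
    return segm == LABELS.index(name)

# Synthetic 2 x 3 stand-in for np.array(Image.open(f))
segm = np.array([[0, 1, 13],
                 [13, 13, 2]], dtype=np.uint8)
hair = label_mask(segm, 'hair')
print(int(hair.sum()))  # number of hair pixels in the synthetic array
```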

Identity

Some images show the same person. There are 10,177 identities in total in the dataset. On average, there are:

around 20 images per identity in CelebA (standard)
around 3 images per identity in CelebA-HQ
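
The figure for the standard set follows from dividing the image count by the identity count: 202,599 / 10,177 ≈ 19.9. A per-identity count could be sketched as below; the two-column "image name, identity id" layout, the function name and the sample rows are assumptions for illustration, not the documented layout of the identity file.

```python
from collections import Counter

def images_per_identity(lines):
    """Count how many image rows share each identity id."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # assumed layout: "<image name> <identity id>"
            counts[parts[1]] += 1
    return counts

# Made-up sample rows, not taken from the dataset
sample = ["000001.jpg 7", "000002.jpg 7", "000003.jpg 9"]
counts = images_per_identity(sample)
average = sum(counts.values()) / len(counts)
print(counts["7"], average)

# Average for the standard set, from the totals quoted above
standard_average = round(202599 / 10177, 1)
print(standard_average)
```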

Agreement

The CelebA-Dialog dataset is available for non-commercial research purposes only.
You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data.
You agree not to further copy, publish or distribute any portion of the CelebA-Dialog dataset. The only exception is that copies of the dataset may be made for internal use at a single site within the same organization.

Citation

If you find this dataset useful for your research and use it in your work, please consider citing the following papers:

@InProceedings{CelebA-Dialog,
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2021}
}

@inproceedings{CelebAMask-HQ,
  title = {MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
  author = {Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

@inproceedings{CelebA-HQ,
  title = {Progressive Growing of {GAN}s for Improved Quality, Stability, and Variation},
  author = {Tero Karras and Timo Aila and Samuli Laine and Jaakko Lehtinen},
  booktitle = {International Conference on Learning Representations},
  year = {2018}
}

@inproceedings{CelebA,
  title = {Deep Learning Face Attributes in the Wild},
  author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
  month = {December},
  year = {2015}
}