IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection (part 8)

Published September 29, 2024 | Version v2

Dataset Open

Description

This is part 8 of the IDNet dataset from our research paper "IDNet: A Novel Identity Document Dataset via Few-Shot and Quality-Driven Synthetic Data Generation". The paper is available at https://ieeexplore.ieee.org/document/10825017

Citation:

@inproceedings{xie2024idnet,
title={IDNet: A Novel Identity Document Dataset via Few-Shot and Quality-Driven Synthetic Data Generation},
author={Xie, Lulu and Wang, Yancheng and Guan, Hong and Nag, Soham and Goel, Rajeev and Swamy, Niranjan and Yang, Yingzhen and Xiao, Chaowei and Prisby, Jonathan and Maciejewski, Ross and others},
booktitle={2024 IEEE International Conference on Big Data (BigData)},
pages={2244--2253},
year={2024},
organization={IEEE}
}

@article{guan2024idnet,
title={IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection},
author={Guan, Hong and Wang, Yancheng and Xie, Lulu and Nag, Soham and Goel, Rajeev and Swamy, Niranjan Erappa Narayana and Yang, Yingzhen and Xiao, Chaowei and Prisby, Jonathan and Maciejewski, Ross and Zou, Jia},
journal={arXiv preprint arXiv:2408.01690},
year={2024}
}

Abstract

Successful analysis and fraud detection of identity documents (e.g., passports, driver's licenses, and identity cards) are critical to preventing identity theft and ensuring security for online platforms. We created a new benchmark dataset of identity documents to facilitate privacy-preserving fraud detection. The dataset contains 597,900 images (about 400 gigabytes) of synthetically generated identity documents, including driver's licenses from 10 U.S. states and passports and ID cards from 10 European countries.

Because the dataset is large, we divided it into multiple parts, all available on Zenodo. Each zip file contains the original synthetic documents, four fraud patterns (text field replacement, face morphing, portrait substitution, mixed fraud patterns), and metadata.
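
Since each part is a large archive, it can be useful to check its contents before extracting anything. The following is a minimal sketch in Python that counts the files under each top-level folder of a downloaded zip. The helper names `count_by_top_level` and `summarize_zip` are hypothetical, and the folder layout inside the archive is not documented here, so the code makes no assumption about folder names and only reports what it finds.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_top_level(names):
    """Count archive entries by their first path component.

    Hypothetical helper: the actual folder names inside the archive
    are not documented here, so nothing is assumed about them.
    """
    counts = Counter()
    for name in names:
        if name.endswith("/"):  # skip explicit directory entries
            continue
        parts = PurePosixPath(name).parts
        # Files stored directly at the archive root are grouped together.
        counts[parts[0] if len(parts) > 1 else "(root)"] += 1
    return counts


def summarize_zip(zip_path):
    """Summarize a zip archive by top-level folder without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return count_by_top_level(archive.namelist())
```

For this part, calling `summarize_zip("DC.zip")` on the downloaded file would return a mapping from each top-level folder to its file count. Only the archive index is read, so the check does not need the disk space that a full extraction would.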

All portrait images in this dataset are synthetic and are provided by https://generated.photos/

Files (48.4 GB)

DC.zip
