ChemAudit: An Open-Source Chemical Structure Validation Suite - NFDI4Chem
Data quality in chemistry remains one of the biggest bottlenecks in cheminformatics, drug discovery, and machine learning for chemistry. Issues such as incorrect structural representations, undefined stereocenters, PAINS-flagged [1] compounds, and inconsistent standardisation can quietly undermine the reliability of downstream models and analyses.
ChemAudit was built to address this. It’s a free, open-source web platform that brings structure validation, standardisation, structural alert screening, and quality scoring together in one clear, user-friendly interface. No command-line experience needed.

What it does:
Runs 15+ validation checks covering parsability, valence, stereochemistry, and representation consistency
Screens against 480+ PAINS patterns and 700+ pharmaceutical alert filters sourced from BMS, Glaxo, Dundee, and other ChEMBL [2] collections
Scores ML-readiness (0–100) by testing 451 molecular descriptors and 7 fingerprint types
Evaluates drug-likeness via Lipinski [3], QED [4], Veber [5], Ghose [6], and Muegge [7] rules
Predicts ADMET properties, including synthetic accessibility, solubility, and CNS penetration
Standardises structures using the ChEMBL pipeline [8] (salt stripping, tautomer canonicalisation, charge normalisation)
Assesses natural product likeness with scaffold analysis
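To make the drug-likeness item above more concrete, here is a minimal sketch of how rule-based filters such as Lipinski and Veber are commonly applied to precomputed descriptors. It is not ChemAudit's code: the `Descriptors` container, the function names, and the example values are hypothetical, and only the widely published thresholds are used (Lipinski: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10; Veber: rotatable bonds ≤ 10, TPSA ≤ 140 Å²).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    logp: float            # calculated octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int   # number of rotatable bonds
    tpsa: float            # topological polar surface area in square angstroms


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the Lipinski rule-of-five criteria that the molecule breaks."""
    checks = {
        "MW <= 500": d.mol_weight <= 500,
        "logP <= 5": d.logp <= 5,
        "H-bond donors <= 5": d.h_donors <= 5,
        "H-bond acceptors <= 10": d.h_acceptors <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


# Illustrative, made-up descriptor values (not taken from any real compound).
small = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1,
                    h_acceptors=4, rotatable_bonds=3, tpsa=63.6)
large = Descriptors(mol_weight=720.0, logp=6.3, h_donors=4,
                    h_acceptors=9, rotatable_bonds=14, tpsa=150.0)

for label, mol in (("small", small), ("large", large)):
    print(label, lipinski_violations(mol), passes_veber(mol))
```

In practice the descriptor values would come from a toolkit such as RDKit rather than being typed in by hand, and the Lipinski rule is commonly read as tolerating at most one violation.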
Built for scale: Batch processing supports up to 1M molecules with real-time WebSocket progress tracking. Results can be exported to CSV, Excel, SDF, JSON, and PDF.
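The batch workflow described above can be pictured as chunked processing with a progress callback and a final export step. The sketch below is an assumption-laden illustration rather than ChemAudit's implementation: `chunked`, `validate`, `run_batch`, and `to_csv` are hypothetical names, the validity check is a placeholder, and a real deployment would push progress updates over a WebSocket instead of calling a local function.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    chunk: List[str] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def validate(smiles: str) -> Dict[str, object]:
    """Placeholder check: only flags empty input.

    A real validator would parse the structure with a cheminformatics toolkit.
    """
    return {"smiles": smiles, "ok": bool(smiles.strip())}


def run_batch(smiles_list: List[str], chunk_size: int,
              on_progress: Callable[[int, int], None]) -> List[Dict[str, object]]:
    """Validate all inputs chunk by chunk, reporting progress after each chunk."""
    total = len(smiles_list)
    results: List[Dict[str, object]] = []
    for chunk in chunked(smiles_list, chunk_size):
        results.extend(validate(s) for s in chunk)
        on_progress(len(results), total)
    return results


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise result rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["smiles", "ok"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


progress = []  # collected (done, total) pairs, standing in for pushed updates
results = run_batch(["CCO", "c1ccccc1", "", "CC(=O)O", "N"], chunk_size=2,
                    on_progress=lambda done, total: progress.append((done, total)))
csv_text = to_csv(results)
print(progress)
print(csv_text)
```

Processing in chunks keeps memory use bounded and gives natural points at which to report progress, which matters once the input grows towards the million-molecule range mentioned above.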
Built on proven tools: RDKit [9], MolVS [10], and the ChEMBL structure pipeline power the backend. React and RDKit.js deliver interactive 2D depictions with atom-level issue highlighting on the frontend.
ChemAudit is designed for database curators, ML researchers, medicinal chemists, and natural products scientists who need reliable, standardised chemical data without the overhead of stitching together disparate CLI tools or licensing commercial software.
Self-hosted and MIT-licensed. Try it, break it, extend it.
Available at:
Feel free to try it out here: https://chemaudit.naturalproducts.net
The code is available on GitHub: https://github.com/Kohulan/ChemAudit