Partial fix to a problem with implicit Hs being written to SMARTS by greglandrum · Pull Request #8893 · rdkit/rdkit

In the current release of the RDKit we have this behavior when converting a molecule created from SMILES to SMARTS:

```python
In [3]: Chem.MolToSmarts(Chem.MolFromSmiles('c1ccc[nH]1'))
Out[3]: '[#6]1:[#6]:[#6]:[#6]:[#7H]:1'
```

This is technically correct, but almost certainly doesn't capture the intent. Because `[#7H]` requires the nitrogen to carry exactly one H, the query matches pyrroles that are substituted at the C atoms, but not pyrroles that are substituted at the N.
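The effect can be seen directly by matching the SMARTS above against a C-substituted and an N-substituted pyrrole. The query without the H count is hand-written here for comparison only; it is not claimed to be the exact output of this PR.

```python
from rdkit import Chem

# SMARTS as produced by MolToSmarts in the release described above
query_with_h = Chem.MolFromSmarts('[#6]1:[#6]:[#6]:[#6]:[#7H]:1')
# hand-written variant with no H-count constraint on the nitrogen
query_no_h = Chem.MolFromSmarts('[#6]1:[#6]:[#6]:[#6]:[#7]:1')

c_substituted = Chem.MolFromSmiles('Cc1ccc[nH]1')  # 2-methylpyrrole
n_substituted = Chem.MolFromSmiles('Cn1cccc1')     # N-methylpyrrole

for name, q in [('with H', query_with_h), ('without H', query_no_h)]:
    print(name, c_substituted.HasSubstructMatch(q), n_substituted.HasSubstructMatch(q))
# with H True False
# without H True True
```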

This PR resolves this: implicit Hs on normal atoms are no longer written to SMARTS.

Note that one case that is still incorrect is the handling of implicit Hs on chiral centers, so even after this change we will still get:

```python
In [5]: Chem.MolToSmarts(Chem.MolFromSmiles('C[C@H](F)Cl'))
Out[5]: '[#6]-[#6@H](-[#9])-[#17]'
```
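The same over-specification applies here: `[#6@H]` requires the stereocenter to carry exactly one H, so the query does not match a molecule in which that H has been replaced. The H also takes part in the neighbor ordering that defines `@`, which is presumably part of why simply dropping it is not enough. A small check, using a bromo-substituted analogue chosen purely for illustration:

```python
from rdkit import Chem

query = Chem.MolFromSmarts('[#6]-[#6@H](-[#9])-[#17]')

parent = Chem.MolFromSmiles('C[C@H](F)Cl')
# same skeleton, but the H on the stereocenter is replaced by Br
substituted = Chem.MolFromSmiles('C[C@](F)(Cl)Br')

print(parent.HasSubstructMatch(query))       # True
print(substituted.HasSubstructMatch(query))  # False: [#6@H] demands exactly one H
```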

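Fixing this part while still ensuring that we get a match in cases like this one:

```python
from rdkit import Chem

smi = 'O=C1C[C@H]1F'
m = Chem.MolFromSmiles(smi)
# the molecule should still match the SMARTS generated from it when stereo is checked
m.HasSubstructMatch(Chem.MolFromSmarts(Chem.MolToSmarts(m)), useChirality=True)
```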

requires significantly more work. I think it's worth fixing at least part of the problem now.
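The round-trip check used for `O=C1C[C@H]1F` generalizes into a small regression loop over several inputs. This is a minimal sketch; `roundtrip_self_match` is a hypothetical helper name, not an RDKit function, and the inputs are simply the molecules used in this description.

```python
from rdkit import Chem


def roundtrip_self_match(smi: str, use_chirality: bool = True) -> bool:
    """Hypothetical helper: does a molecule match the SMARTS generated from it?"""
    mol = Chem.MolFromSmiles(smi)
    query = Chem.MolFromSmarts(Chem.MolToSmarts(mol))
    return mol.HasSubstructMatch(query, useChirality=use_chirality)


# inputs taken from the examples in this description
for smi in ['c1ccc[nH]1', 'C[C@H](F)Cl', 'O=C1C[C@H]1F']:
    print(smi, roundtrip_self_match(smi))
```

Checking with `useChirality=True` is the demanding case: any change to how Hs on stereocenters are written has to keep these self-matches passing.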