We've recently done some extensive testing of currently available cheminformatics tools to decide which ones could be integrated into mcule (results of these tests will be published soon on this blog!). One of the simplest tests was reading and writing SD files with a particular tool and checking whether any information was lost or added during conversion. Since the registration system of mcule is primarily based on InChI, we generated InChIs from both input and output SD files and looked for differences. While we identified a number of tool-related issues, we also found some interesting cases where the input and output molecules were identical per se, but they got different InChIs.
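The round-trip check described above can be sketched roughly as follows. This is only an illustration, not the actual mcule test code: the function names are hypothetical, and `to_inchi` stands for whatever InChI generator is available (it is passed in as a callable, so no particular toolkit API is assumed). The only format detail relied on is that records in an SD file are terminated by a `$$$$` line.

```python
from typing import Callable, List, Tuple


def split_sdf_records(text: str) -> List[str]:
    """Split SD file content into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that is missing its '$$$$' terminator.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def find_inchi_mismatches(
    input_records: List[str],
    output_records: List[str],
    to_inchi: Callable[[str], str],  # hypothetical: record text -> InChI string
) -> List[Tuple[int, str, str]]:
    """Return (record index, InChI before, InChI after) for every changed record."""
    if len(input_records) != len(output_records):
        raise ValueError("input and output contain a different number of records")
    mismatches = []
    for index, (before, after) in enumerate(zip(input_records, output_records)):
        inchi_before = to_inchi(before)
        inchi_after = to_inchi(after)
        if inchi_before != inchi_after:
            mismatches.append((index, inchi_before, inchi_after))
    return mismatches
```

Any record reported here still has to be inspected by eye, because a mismatch can come either from the tool or from InChI itself, as in the cases below.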
CASE 1.
Here are two molecules with slightly different bond angles (note the slightly different position of the oxygen atom):
According to the IUPAC recommendation, the above representations are correct:
"… [this representation] is the preferred style when substituents A, B, and C [i.e. the plain bonds] are all part of the same small fused ring system."
However, InChI outputs a warning message for both molecules: "Ambiguous stereo: center(s)".
But most importantly, InChI says that the molecules above are different. I would be very interested in what other people think: are these molecules the same or different?
Interestingly, keeping the same angle difference but exchanging the wedge bond with a plain bond results in the same InChIs:
No warning messages this time, although this is not the preferred stereo representation for fused rings. Hm..
CASE 2.
Here are two molecules containing a single tetrahedral stereocentre with four substituents, but with a slight difference in the angles (note the slightly different position of the chlorine atom):
In spite of the fact that the IUPAC recommendation says that neither of the representations above is acceptable, InChI generates identifiers for them and, for some reason, the resulting InChIs are different.
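When two InChIs differ, it helps to see which layer is responsible (for stereo problems, typically the `/t`, `/m` or `/s` layers). The following is a minimal sketch of such a comparison; the function names are hypothetical. It assumes an ordinary InChI string with a formula layer, and repeated layer prefixes (which can occur after an isotopic or fixed-H layer) are kept apart with a `#2`-style suffix, which is a convention invented here only for this example.

```python
from typing import Dict, Optional, Tuple


def split_inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI string into its '/'-separated layers, keyed by prefix letter."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]  # assumption: second field is the formula
    for part in parts[2:]:
        if not part:
            continue
        base = key = part[0]
        count = 2
        while key in layers:  # repeated prefix: disambiguate as 't#2', 't#3', ...
            key = f"{base}#{count}"
            count += 1
        layers[key] = part[1:]
    return layers


def diff_inchi_layers(
    inchi_a: str, inchi_b: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the layers that differ; None marks a layer missing on one side."""
    layers_a = split_inchi_layers(inchi_a)
    layers_b = split_inchi_layers(inchi_b)
    return {
        key: (layers_a.get(key), layers_b.get(key))
        for key in sorted(set(layers_a) | set(layers_b))
        if layers_a.get(key) != layers_b.get(key)
    }


# Illustration only: the two enantiomers of alanine differ in the /m layer.
alanine_1 = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
alanine_2 = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(diff_inchi_layers(alanine_1, alanine_2))
```

If only the stereo layers show up in the result, the connectivity was read identically and the difference comes from how the drawing was interpreted.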
I found one representation in the InChI technical manual with one solid and one hashed wedge bond opposite to each other. My interpretation of this picture is that if the particular angle on the picture is more than 133 degrees, then the stereo is undefined. I haven’t found any IUPAC recommendation mentioning this rule, but even if it has a rationale, it says that both above stereocentres should be undefined, as the particular angle is more than 180 degrees in both molecules.
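To check a drawing against such a rule, the angle can be computed from the 2D coordinates in the SD file. The sketch below rests on two assumptions that do not come from the InChI documentation: that "the particular angle" is the largest gap between neighbouring bonds around the stereocentre (which is why it can exceed 180 degrees), and that 133 degrees is the threshold, as in my reading of the picture. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def largest_bond_gap_degrees(center: Point, neighbors: Sequence[Point]) -> float:
    """Largest angular gap (0-360 degrees) between adjacent bonds around center."""
    if len(neighbors) < 2:
        raise ValueError("need at least two neighbouring atoms")
    cx, cy = center
    # Direction of each bond as an angle in [0, 360), sorted around the centre.
    angles = sorted(
        math.degrees(math.atan2(y - cy, x - cx)) % 360.0 for x, y in neighbors
    )
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])  # wrap-around gap
    return max(gaps)


def gap_exceeds_threshold(
    center: Point,
    neighbors: Sequence[Point],
    threshold_degrees: float = 133.0,  # assumed value, see the text above
) -> bool:
    """True if the largest gap is above the assumed threshold."""
    return largest_bond_gap_degrees(center, neighbors) > threshold_degrees


# Four substituents drawn on one side of the centre leave a gap above 180 degrees.
one_sided = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (-1.0, 1.0)]
print(largest_bond_gap_degrees((0.0, 0.0), one_sided))
```

Under these assumptions, both drawings in this case would be flagged, since all four bonds lie within one half-plane.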
Same molecules with different InChIs again. Or something else? What do you think?
"What do you think?"
I think: 2D diagrams suck at stereochemistry specification, and you should just use 3D. If you like both, keep both 2D *and* 3D. The first oriented at humans, the latter oriented at machines.
Hi Egon! Thanks for the comment! I certainly agree that 3D should be preferred over 2D to store correct stereochemistry. The problem (at least for mcule) is, however, that most of the input structures we process from vendor companies are in 2D. Also, I think almost all chemists draw their molecules using 2D sketchers, which is way easier than drawing in 3D, although it is more error-prone.