Saturday, July 30, 2011

Five most common issues with molecular database registration systems. Part 2: Isomer detection

After breaking down multiple component entries to single components, the next step is to analyze these components in more detail and identify different isoforms/representations of the same molecule to make sure the same compounds get the same ID, and different compounds get different IDs. This requires the detection of isomers.

Part 2: Isomer detection

The required level of isomer handling depends on the purpose of the database, and it is worth deciding which level is best in your case. In short, you have to decide what is the same and what is different. At mcule, we are building a vendor database comprising many millions of molecules. These molecules were drawn by different individuals following different conventions and using different molecule sketchers, inevitably resulting in different representations of the same molecule. For our database, we have to identify different isoforms of the same molecule and assign them the same ID. We consider tautomers, different protonation states and mesomers the same at a particular stage of the registration process. At this level you have to identify that the structures below are identical:
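The bookkeeping side of "same compound, same ID" could be sketched roughly as follows. This is only an illustration in plain Python, not mcule's registration code: the `Registry` class, `toy_normalize` and the `TOY_CANONICAL_FORM` table are hypothetical names, and the hand-written table merely stands in for the tautomer, protonation and mesomer normalization that a real cheminformatics toolkit would have to perform.

```python
from itertools import count

# ASSUMPTION: a tiny hand-written table standing in for real structure
# standardization. A production system would compute a canonical form with a
# cheminformatics toolkit instead of looking up raw SMILES strings.
TOY_CANONICAL_FORM = {
    "Oc1ccccn1": "O=c1cccc[nH]1",  # 2-hydroxypyridine -> 2-pyridone (tautomers)
    "CC(=O)[O-]": "CC(=O)O",       # acetate -> acetic acid (protonation states)
}


def toy_normalize(smiles):
    """Map a representation to its chosen canonical form (toy version)."""
    return TOY_CANONICAL_FORM.get(smiles, smiles)


class Registry:
    """Assign one ID per normalized structure."""

    def __init__(self, normalize):
        self._normalize = normalize
        self._ids = {}            # normalized form -> registry ID
        self._next_id = count(1)  # IDs are handed out as 1, 2, 3, ...

    def register(self, smiles):
        """Return the existing ID for an equivalent structure, or a new one."""
        key = self._normalize(smiles)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


registry = Registry(toy_normalize)
same = registry.register("Oc1ccccn1") == registry.register("O=c1cccc[nH]1")
different = registry.register("CCO") != registry.register("CC(=O)O")
```

The design point is that the decision about what counts as "the same" lives entirely in the normalization function, so the level of isomer handling can be changed without touching the ID assignment.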

Saturday, July 16, 2011

Five most common issues with molecular database registration systems. Part 1: Multiple components

Developing a molecule database is not a trivial task. There are several problems you have to deal with and they can be handled differently depending on the size, content and purpose of the database. Here I try to provide a list of the most common issues we came across recently. First: multiple components.

Part 1. Multiple components.

It usually makes sense to handle single components separately even if samples are provided as multiple components in the source data format. This enables structure searching/filtering at the level of single components. Also, you should always be suspicious of multiple component entries. The sample might be a mixture of individual, equally important components, but it is also possible that the components are in a particular relationship. It is crucial to analyze this relationship before registering any of the components. Entries coming from vendor companies might contain the following: salt counter ions, solvent molecules, additives (e.g. antioxidants), contaminants, intermediates and different stereoisomers. You can see some interesting examples here:
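A first pass over such entries could look something like the sketch below. It assumes the source data is available as SMILES, where disconnected components are separated by a dot. The function names and the two lookup sets are hypothetical and deliberately tiny, and matching raw SMILES strings is a simplification: a real system would compare canonical structures and would still need to analyze the unclassified components (additives, contaminants, intermediates, stereoisomers).

```python
# ASSUMPTION: illustrative lookup sets only. Real lists are much larger and
# should be matched on canonical structures rather than raw SMILES strings.
KNOWN_COUNTER_IONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br"}
KNOWN_SOLVENTS = {"O", "CO", "CCO", "ClCCl"}


def split_components(smiles):
    """Split a multi-component SMILES string on '.', the component separator."""
    return [part for part in smiles.split(".") if part]


def classify_entry(smiles):
    """Label each component as 'counter_ion', 'solvent' or 'unclassified'."""
    labelled = []
    for component in split_components(smiles):
        if component in KNOWN_COUNTER_IONS:
            role = "counter_ion"
        elif component in KNOWN_SOLVENTS:
            role = "solvent"
        else:
            # Could be the main compound, an additive, a contaminant, ...
            role = "unclassified"
        labelled.append((component, role))
    return labelled


# Sodium acetate drawn as two components, plus one water molecule.
entry = classify_entry("CC(=O)[O-].[Na+].O")
```

Here `entry` pairs each component with a provisional role, which leaves the decision about what to register as a separate, explicit step.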