After breaking multi-component entries down into single components, the next step is to analyze these components in more detail and identify different isoforms/representations of the same molecule, so that the same compounds get the same ID and different compounds get different IDs. This requires the detection of isomers.
Part 2: Isomer detection
The required level of isomer handling depends on the purpose of the database, and it is worth deciding which level suits your case. In short, you have to decide what counts as the same and what counts as different. At mcule, we are building a vendor database comprising many millions of molecules. These molecules were drawn by different individuals following different conventions and using different molecule sketchers, which inevitably results in different representations of the same molecule. For our database, we have to identify these different representations of the same molecule and assign them the same ID. At a particular stage of the registration process, we consider tautomers, different protonation states and mesomers to be the same. At this level, you have to recognize that the structures below are identical:
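The ID assignment described above can be sketched as a registry keyed on a normalized structure: every incoming structure is first reduced to a canonical form, and the ID is looked up by that form. This is a minimal sketch under stated assumptions, not mcule's implementation. The `CompoundRegistry` class, the `CPD-` ID format and the `toy_normalize` lookup table are all hypothetical, and the table only covers one protonation-state pair (acetic acid/acetate) and one tautomer pair (2-hydroxypyridine/2-pyridone). A real pipeline would derive the key with a cheminformatics toolkit (for example, the standardization utilities in RDKit's `rdMolStandardize` module) rather than a hard-coded table.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a real standardizer: maps a few hand-picked
# SMILES variants to one canonical form. A production pipeline would compute
# this with a cheminformatics toolkit instead of a lookup table.
_TOY_CANONICAL_FORMS: Dict[str, str] = {
    "CC(=O)[O-]": "CC(=O)O",       # acetate -> acetic acid (protonation state)
    "O=c1cccc[nH]1": "Oc1ccccn1",  # 2-pyridone -> 2-hydroxypyridine (tautomer)
}


def toy_normalize(smiles: str) -> str:
    """Return the canonical form for a known variant, else the input unchanged."""
    return _TOY_CANONICAL_FORMS.get(smiles, smiles)


class CompoundRegistry:
    """Assigns one ID per distinct normalized structure (hypothetical helper)."""

    def __init__(self, normalize: Callable[[str], str]) -> None:
        self._normalize = normalize
        self._ids: Dict[str, str] = {}

    def register(self, smiles: str) -> str:
        key = self._normalize(smiles)
        if key not in self._ids:
            # First time this normalized structure is seen: issue the next ID.
            self._ids[key] = f"CPD-{len(self._ids) + 1:06d}"
        return self._ids[key]


registry = CompoundRegistry(toy_normalize)
acid = registry.register("CC(=O)O")
acetate = registry.register("CC(=O)[O-]")
pyridone = registry.register("O=c1cccc[nH]1")
hydroxypyridine = registry.register("Oc1ccccn1")

print(acid == acetate)              # True: same compound, different protonation state
print(pyridone == hydroxypyridine)  # True: same compound, different tautomer
print(acid != pyridone)             # True: different compounds, different IDs
```

Keeping the normalizer pluggable means the level of isomer handling (what counts as "the same") can differ between stages of the registration process without changing the ID bookkeeping itself.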