Tuesday, January 31, 2012

What's going on ...

The mcule project was started last June with very ambitious goals. Several months have passed, and you may be asking: what is happening behind the scenes, and when can I finally start searching, screening and ordering? Well, a lot has happened since last June.

Laying down the foundations took more time than we expected. But the good news is: we are almost there now. So here is a short summary of what we have done and what is currently being done, and most importantly: what you can expect in the upcoming first releases of mcule.

Fundamental rules for the registration system have been set and implemented. This will ensure that our vendor database contains high-quality molecules and that ID ambiguities are close to zero. If you have read the previous posts on this blog, you may know that our molecule registration system is based on InChI. While InChI is a very powerful tool for structure registration, it needs some adjustments to make sure no corner cases are missed. Vendor companies provide their structural data in SDF format, which suffers from far more limitations. It seems that no molecule representation is sufficient on its own to represent the whole of chemical space (or even just the accessible part of it); it is simply too complex.

Another source of ambiguity is drawing errors. The SDF writing rules of molecule sketchers (e.g. setting the chiral flag automatically) can also introduce stereochemical ambiguities. What we can do is define specific structural checks to filter out potentially problematic molecules. By inspecting these problematic cases, we can define new preparation steps that facilitate automatic registration. We now have more than 50 such checks and preparation steps in our registration system.
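To make the idea of structural checks a little more concrete, here is a minimal sketch in Python. It is not our actual registration code: the two checks, the function names and the `make_molblock` helper are hypothetical and exist only for illustration. The sketch assumes a V2000 molblock, where the counts line is the fourth line and the chiral flag sits in columns 13-15, and it flags records whose chiral flag is set although no wedge bond is drawn.

```python
from typing import Callable, List, Optional


def _int_field(line: str, start: int, end: int) -> int:
    """Read a fixed-width integer field; blank or missing fields count as 0."""
    text = line[start:end].strip()
    return int(text) if text else 0


def parse_counts(molblock: str) -> dict:
    """Parse atom count, bond count and chiral flag from the V2000 counts line."""
    counts_line = molblock.splitlines()[3]  # line 4 follows the 3 header lines
    return {
        "atoms": _int_field(counts_line, 0, 3),
        "bonds": _int_field(counts_line, 3, 6),
        "chiral_flag": _int_field(counts_line, 12, 15),
    }


def bond_stereo_values(molblock: str) -> List[int]:
    """Return the stereo field (columns 10-12) of every bond line."""
    lines = molblock.splitlines()
    counts = parse_counts(molblock)
    first_bond = 4 + counts["atoms"]  # bond block starts after the atom block
    bond_lines = lines[first_bond:first_bond + counts["bonds"]]
    return [_int_field(line, 9, 12) for line in bond_lines]


# Hypothetical checks: each returns a problem description, or None if it passes.
def check_not_empty(molblock: str) -> Optional[str]:
    return "no atoms" if parse_counts(molblock)["atoms"] == 0 else None


def check_chiral_flag_has_stereo(molblock: str) -> Optional[str]:
    has_wedge = any(v in (1, 6) for v in bond_stereo_values(molblock))  # 1 = up, 6 = down
    if parse_counts(molblock)["chiral_flag"] == 1 and not has_wedge:
        return "chiral flag set but no wedge bond drawn"
    return None


CHECKS: List[Callable[[str], Optional[str]]] = [
    check_not_empty,
    check_chiral_flag_has_stereo,
]


def run_checks(molblock: str) -> List[str]:
    """Run every check; an empty list means the record can be registered automatically."""
    problems = []
    for check in CHECKS:
        message = check(molblock)
        if message is not None:
            problems.append(message)
    return problems


def make_molblock(chiral_flag: int, wedge: int) -> str:
    """Build a tiny 3-atom V2000 molblock, for demonstration only."""
    header = ["example", "  sketch", ""]
    counts = f"{3:3d}{2:3d}{0:3d}{0:3d}{chiral_flag:3d}  0  0  0  0  0999 V2000"
    atoms = [
        f"{x:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3} 0  0"
        for x, symbol in [(0.0, "C"), (1.5, "C"), (3.0, "O")]
    ]
    bonds = [f"{1:3d}{2:3d}{1:3d}{wedge:3d}", f"{2:3d}{3:3d}{1:3d}{0:3d}"]
    return "\n".join(header + [counts] + atoms + bonds + ["M  END"])


print(run_checks(make_molblock(chiral_flag=1, wedge=0)))
# ['chiral flag set but no wedge bond drawn']
print(run_checks(make_molblock(chiral_flag=1, wedge=1)))
# []
```

Keeping each check as a small independent function means a newly discovered corner case only requires adding one more entry to the list.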

The first large-scale test of our registration system, involving 2.5 million compounds, has been completed. The results are promising: 97% of the molecules could be processed automatically, while only 3% were held back for further inspection. With some slight changes we can reduce the share of retained molecules further, to between 0.1% and 0.5%. We are applying the final touches and will then run the registration again on the first few million vendor compounds. This will hopefully give you a reasonable compound deck to start your searches on.
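To put these percentages into absolute numbers, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above; the helper name `retained` is made up for this example.

```python
TOTAL_COMPOUNDS = 2_500_000  # size of the first large-scale test


def retained(total: int, percent: float) -> int:
    """Number of molecules held back for inspection at a given retention rate."""
    return round(total * percent / 100)


print(retained(TOTAL_COMPOUNDS, 3.0))  # 75000 molecules at the current 3%
print(retained(TOTAL_COMPOUNDS, 0.5))  # 12500 molecules at 0.5%
print(retained(TOTAL_COMPOUNDS, 0.1))  # 2500 molecules at 0.1%
print(TOTAL_COMPOUNDS - retained(TOTAL_COMPOUNDS, 3.0))  # 2425000 processed automatically
```

In other words, moving from 3% to the 0.1-0.5% range would shrink the inspection queue from about 75,000 molecules to somewhere between 2,500 and 12,500.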

At the same time, the first searching/screening filters have been integrated with the test database. Simple searches (exact, similarity, substructure) will be available in the first release, while more complex searches (e.g. docking, pharmacophore searches) will appear among the filters soon. You will be able to feed these engines using one of the leading JavaScript editors; alternatively, you can search by MCULE ID, InChI or SMILES.
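For readers less familiar with similarity searching, the sketch below shows the general idea in Python. It assumes the widely used Tanimoto coefficient on fingerprint bit sets; this post does not say which measure or fingerprint the mcule engine uses, and the molecule names and fingerprints here are invented. In practice, fingerprints would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # the set of "on" bit positions


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs at or above the threshold, best match first."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in database.items()]
    hits = [(mol_id, score) for mol_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented example data: IDs and bit positions are placeholders.
database = {
    "MOL-A": frozenset({1, 2, 3, 4}),
    "MOL-B": frozenset({1, 2, 3, 9}),
    "MOL-C": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})

print(similarity_search(query, database, threshold=0.5))
# [('MOL-A', 1.0), ('MOL-B', 0.6)]
```

An exact search can be seen as the special case where only a score of 1.0 on a canonical identifier counts as a hit, whereas substructure search needs a graph-matching step that this sketch does not cover.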

As for the web interface, we hope the experience will make mcule.com your default searching/screening tool. You will get (i) informative molecule index pages containing, among other things, molecule IDs, properties, vendor and ordering information, (ii) drag-and-drop filters for building more complex screening workflows, (iii) flexibly manageable molecule collections/hitlists that display quickly in list and grid views, and more.

We expect to send out the invitations to the private beta soon, so please stay tuned!