For the first post on the mcule blog, we thought it would be a good idea to explain our motivations and how the original idea of mcule came about. It all started with the observation of an unmet need. Well, it was actually an unmet need of the storyteller :)
I started my PhD in 2004. As a subject, my supervisor suggested the histamine H4 receptor, a novel and very interesting drug target. Only a few selective H4 ligands were known at that time, so any new ligands would have been of great value. We had had good experience with histamine receptor homology models in the past, so we decided to build a structural model of the H4 receptor and screen compounds virtually in the hope of finding new H4 ligands. After selecting a suitable model for screening, we faced the first major problem: how could we screen all commercially available compounds? While the ZINC database offered the screening libraries of a few vendor companies, it represented only a small portion of the purchasable chemical space, and most of the libraries were out of date. Still, ZINC was a great help and served as the starting point for our screening database. We updated some libraries directly from the vendors and also added some new libraries not included in ZINC. This took a long time, however (seeking vendors, registering on their websites, waiting for passwords, downloading libraries or waiting for their CDs to arrive, etc.). Preparing the newly added compounds (prediction of protonation states, 3D coordinate generation, ID generation, etc.) also took a while, but after a few more weeks the database was ready for screening.
So we finally had a database of nearly 9 million 3D structures. Next, we estimated the time needed for the calculations (i.e., docking and scoring 9M structures against our H4 receptor model). We quickly realized that we would need a thousand computers, or at least a few hundred, if we wanted to screen this database in a reasonable time. We were quite lucky to find a cluster of nearly 1000 CPUs freely available for academic research purposes (the NIIF ClusterGrid project), but it was still under development. Although we were working with one of the best IT groups in Hungary, we ran into many difficulties and interruptions, and the screening took much longer than we expected.
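For readers who like to see the arithmetic, here is a minimal back-of-the-envelope sketch of that kind of estimate. Note that the docking time per structure (60 seconds) is purely a hypothetical placeholder and not a figure from our study, and the function name is made up for illustration; the sketch also assumes perfect parallel scaling, which a real cluster will not give you.

```python
SECONDS_PER_DAY = 86_400


def wall_clock_days(n_structures: int, seconds_per_structure: float, n_cpus: int) -> float:
    """Estimate wall-clock days for a screen, assuming perfect parallel scaling."""
    total_cpu_seconds = n_structures * seconds_per_structure
    return total_cpu_seconds / n_cpus / SECONDS_PER_DAY


N_STRUCTURES = 9_000_000       # database size, rounded up from "nearly 9 million"
SECONDS_PER_STRUCTURE = 60.0   # ASSUMPTION: hypothetical docking + scoring time

if __name__ == "__main__":
    for cpus in (1, 100, 1000):
        days = wall_clock_days(N_STRUCTURES, SECONDS_PER_STRUCTURE, cpus)
        print(f"{cpus:>5} CPUs: {days:,.1f} days")
```

Under that assumed 60 seconds per structure, a single CPU would need 6,250 days (roughly 17 years), while 1000 CPUs would bring it down to about 6 days. Plug in your own per-structure time to see how the numbers shift.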
After the screening was finished, we started analyzing nearly 180 GB of data (compressed!), which again was not a trivial task... another few weeks gone. We finally had a hit list of the most promising compounds, so we thought that the most difficult part of the project was over. But it was just about to start...
First, our screening database was already partially out of date when we started the screening; second, the rest of it had become out of date by the time the screening and data analysis were finished. We had some idea of which vendor companies might sell our particular virtual hits, but in several cases the compounds had been either discontinued or depleted. We had to search various online databases and vendor catalogues to find our compounds. We eventually found most of them, but it took us weeks to search for the compounds by structure and (if found) to contact the vendors to make sure the compounds were in stock and could be purchased and delivered to our lab. Then we had to compare prices and consider all the other details, like minimum order fees, delivery fees and times, available quantities, purities, salt forms, etc. This can be done fairly easily with a single vendor, but we had to deal with dozens of them... Another few months went by before molecules started to turn up in our lab, and they arrived at random times, which made planning the pharmacological screens very challenging.
In the end, we tested the compounds that arrived, found some new ligands, and wrote a JMC paper about the whole story. So, I guess I'm satisfied with the outcome of the study, but it took me nearly two years to accomplish, which is unacceptable. I was thinking... what if the whole process could have been done in a few days or weeks instead of two years? What useful things could I have done in those two years instead of molecule structure preparation, data acquisition, moving data from one place to another, emailing vendor representatives, compound procurement, etc.? I'm sure there were a lot of interesting things to do. Lots of things! Most importantly, I could have tried to develop the identified hits into selective H4 inhibitors with proper pharmacodynamic and pharmacokinetic profiles. Or I could have run screens on different histamine receptor subtypes, developed a better model, and so on...
What was the big problem with this study? Why did it take so much time? I spent most of my time on preparing an up-to-date molecule database suitable for virtual screening, integrating this database with the screening tools and the computational resources needed to run the actual screens, then data analysis, and finally compound acquisition. Hmm... so what if all these things were integrated into a single service? I quickly searched the internet to see if any such service was available, but I only found partially integrated services, and they seemed to be far from what I had in mind. I imagined a service for virtual screening where everybody can screen an up-to-date, carefully curated database of purchasable compounds (and maybe also his or her own database). Sounds great, right? Now, what if ALL available virtual screening tools were integrated into this service and could be run one after another, so the output of one screen could be the input of the next? That's something I would use, I guess, but what if I need a huge number of CPUs (like in the study above)? OK, so let's integrate the whole thing with cloud technology to get access to a practically unlimited number of CPUs. Excellent! Now, what about the acquisition of the virtual hits? I really hated doing that; can someone order them for me, please?? Sure thing! Let's develop a service that collects all the hits from not just a few but hundreds of vendors (the whole purchasable chemical space) and delivers them to your door. Excellent! Now, is this going to be some software I have to download and install, and then call the IT people about after a few hours of struggling and before the first nervous breakdown? Do I need advanced programming skills to make this work? NO! This should be a web service, available to everyone who is interested, and it should require absolutely NO programming skills. This would open the door for non-cheminformaticians (pharmacologists, chemists, etc.).
This looked like a good plan! So we gathered a team of excellent programmers, cheminformaticians, molecular modellers and web developers to turn this plan into reality. We are doing this step by step and are very interested in your feedback so that we can provide you with the best possible service. We are called mcule; please remember the name and join us to be among the first users to get early access to the mcule service. We are planning to launch in August!
Robert Kiss, Ph.D.