What is MoLeR?
A Microsoft Research creation
In recent years, drug discovery has been shaped by A.I. more than ever before. I actually invest in this space; my favorite play is $BTAI, at least if it drops below $8. I write about investing too.
We know that machine learning and other technologies are expected to make the hunt for new pharmaceuticals quicker, cheaper and more effective. Artificial intelligence has been making inroads in drug discovery for a good part of the last decade. With quantum computing this could even be boosted further. We’ll know soon enough.
As of February 2022, biotech companies using an AI-first approach had more than 150 small-molecule drugs in discovery and more than 15 already in clinical trials. This AI-fueled pipeline has been expanding at an annual rate of almost 40%. That’s a pretty decent annual growth rate.
While some pharma companies and A.I. startups will get good at this, some BigTech companies will as well. Amazon, Google, Microsoft and Apple moving into healthcare is no accident. It’s not just an EMR, health wearables or telemedicine play. It’s not just about B2B marketplaces and drug distribution either; drug discovery is the real gold mine.
Given the transformative potential of AI, pharma companies and BigTech need to plan for a future in which AI is routinely used in drug discovery. New players are scaling up fast and creating significant value, but the applications are diverse and pharma companies need to determine where and how AI can most add value for them. It turns out BigTech with major A.I. labs in-house may be better equipped to do this.
MoLeR: Creating a path to more efficient drug design
Microsoft Research presented a conference paper in 2022, which you can read here. Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. I will try to outline some of the paper's content in this Newsletter article.
You can also browse the long paper by Microsoft Research.
Published April 27, 2022
Any mention of “we” or “our” after this point refers to the authors and Microsoft Research.
Many drug discovery projects require a fixed scaffold to be present in the generated molecule, and incorporating that constraint has only recently been explored.
Here, we propose MoLeR, a graph-based model that naturally supports scaffolds as initial seed of the generative procedure, which is possible because it is not conditioned on the generation history.
Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. Furthermore, we show the influence of a number of seemingly minor design choices on the overall performance.
MoLeR: A Deep Learning-Based Generative Model That Enables Efficient Drug Design
Krzysztof Maziarz, the lead author, has a particularly good background for this. He has been working on deep learning, particularly as applied to structured objects such as molecules. Most of his efforts concentrate on using ML to advance scientific discovery, in particular generating drug-like molecules, predicting molecular properties, and improving learned solutions to the electronic Schrödinger equation. I’m quite fascinated by the “creators” behind many of these papers.
One wishes AI labs demonstrated more commitment to the personal brands of their researchers and academics. These are scientists and engineers who have unique stories to tell. If you are an A.I. founder, co-founder, scientist or A.I. personality in academia, perhaps one day I could tell your story. I’ll create a CTA sheet for this eventually, just like I did yesterday for potential sponsors (I need to eat Bro). What I’m realizing while writing this Newsletter is that the relationships within the network are enjoyable too.
The paper was presented at ICLR 2022, the Tenth International Conference on Learning Representations, held virtually from Monday, April 25 through Friday, April 29, 2022.
Drug Discovery is now an increasingly rational process, in which one important phase, called lead optimization, is the stepwise search for promising drug candidate compounds in the lab. In this phase, expert medicinal chemists work to improve “hit” molecules—compounds that demonstrate some promising properties, as well as some undesirable ones, in early screening.
In subsequent testing, chemists try to adapt the structure of hit molecules to improve their biological efficacy and reduce potential side effects. This process combines knowledge, creativity, experience, and intuition, and often lasts for years. Over many decades, computational modelling techniques have been developed to help predict how the molecules will fare in the lab, so that costly and time-consuming experiments can focus on the most promising compounds.
(see banner image, scroll to the top of this)
Figure 1: Classic human-led drug design (bottom) is an iterative process of proposing new compounds and testing them in vitro. As this process requires synthesis in the lab, it is very costly and time consuming. By using computational modelling (top), molecule design can be rapidly performed in silico, with only the most promising molecules promoted to be made in the lab and then eventually tested in vivo.
Microsoft is partnering with Novartis here. Novartis is a Swiss multinational pharmaceutical corporation based in Basel, Switzerland, a major BigPharma of Europe.
The Microsoft Generative Chemistry team is working with Novartis to improve these modelling techniques with a new model called MoLeR.
“MoLeR illustrates how generative models based on deep learning can help transform the drug discovery process and enable our colleagues at Novartis to increase the efficiency in finding new compounds.”
Christopher Bishop, Technical Fellow and Laboratory Director, Microsoft Research Cambridge
Microsoft is thus empowering BigPharma with A.I.
“Creating the formulation to a drug is a bit like cooking,” says Finelli, vice president and head of insights, strategy and design at Novartis, a multinational pharmaceutical company headquartered in Basel, Switzerland.
What is MoLeR?
MoLeR is a bit like a better recipe. It is a new graph-based generative model suitable for the commonly required task of extending partial molecules. It can use motifs (molecule fragments) to generate outputs (similarly to Jin et al. (2018; 2020)), but integrates this with atom-by-atom generation.
MoLeR (a) is able to learn to generate molecules matching the distribution of the training data (with and without scaffolds); (b) together with an off-the-shelf optimization method (MSO (Winter et al., 2019b)) can be used for molecular optimization tasks, matching the state-of-the-art methods in unconstrained optimization, and outperforming them on scaffold-constrained tasks; and (c) is faster in training and inference than baseline methods.
Pretty interesting, Krzysztof, I’m digging this.
During generation, our model can either add an entire motif in one step, or generate atoms and bonds one-by-one. This means that it can generate arbitrary structures, such as an unusual ring, even if they do not appear in the training data.
We recently focused on predicting molecular properties using machine learning methods in the FS-Mol project. To further support the drug discovery process, we are also working on methods that can automatically design compounds that better fit project requirements than existing candidate compounds.
The MoLeR model
In the MoLeR model, we represent molecules as graphs, in which atoms appear as vertices that are connected by edges corresponding to the bonds.
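To make the graph view concrete, here is a minimal sketch in plain Python. The `MolGraph` class and its method names are hypothetical and are not taken from the MoLeR code; hydrogens are left implicit, as is common when drawing molecules.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """A molecule as a graph: atoms are vertices, bonds are edges."""
    atoms: list = field(default_factory=list)   # element symbols, e.g. "C"
    bonds: dict = field(default_factory=dict)   # {(i, j): bond order}, with i < j

    def add_atom(self, symbol):
        """Append an atom and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Connect two distinct, existing atoms with a bond of the given order."""
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("a bond must join two distinct existing atoms")
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i):
        """Indices of all atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


# Ethanol (CH3-CH2-OH) with implicit hydrogens: C-C-O
ethanol = MolGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
```

A graph neural network encoder would operate on exactly this kind of structure, passing messages along the entries of `bonds`.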
Our model is trained in the auto-encoder paradigm, meaning that it consists of an encoder—a graph neural network (GNN) that aims to compress an input molecule into a so-called latent code—and a decoder, which tries to reconstruct the original molecule from this code.
As the decoder needs to decompress a short encoding into a graph of arbitrary size, we design the reconstruction process to be sequential. In each step, we extend a partially generated graph by adding new atoms or bonds.
A crucial feature of our model is that the decoder makes predictions at each step solely based on a partial graph and a latent code, rather than in dependence on earlier predictions.
We also train MoLeR to construct the same molecule in a variety of different orders, as the construction order is an arbitrary choice.
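The idea of many valid construction orders can be illustrated with a small sketch. It samples random atom orders in which every new atom attaches to something already generated. The sampling scheme below is an assumption made for illustration; the exact ordering strategy used to train MoLeR is described in the paper and may differ.

```python
import random


def random_construction_order(adjacency, rng):
    """Return a random atom order in which every atom after the first
    is bonded to at least one atom generated before it.
    Assumes the molecule graph is connected."""
    start = rng.choice(sorted(adjacency))
    order, visited = [start], {start}
    frontier = set(adjacency[start])
    while frontier:
        nxt = rng.choice(sorted(frontier))
        order.append(nxt)
        visited.add(nxt)
        # Newly reachable atoms join the frontier; generated ones leave it.
        frontier = (frontier | set(adjacency[nxt])) - visited
    return order


# Toy molecule: heavy atoms of isopropanol, C0-C1(-O3)-C2
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

rng = random.Random(0)
orders = {tuple(random_construction_order(adjacency, rng)) for _ in range(200)}
```

Each tuple in `orders` is a different way to build the same molecule, so a model trained on all of them never gets to rely on one canonical sequence.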
Drug molecules are not random combinations of atoms. They tend to be composed of larger structural motifs, much like sentences in a natural language are compositions of words, and not random sequences of letters.
Thus, unlike CGVAE, MoLeR first discovers these common building blocks from data, and is then trained to extend a partial molecule using entire motifs (rather than single atoms). Consequently, MoLeR not only needs fewer steps to construct drug-like molecules, but its generation procedure also occurs in steps that are more akin to the way chemists think about the construction of molecules.
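As a rough illustration of why a motif vocabulary shortens generation, the sketch below counts fragments in a tiny, made-up training set and compares step counts. Everything here is hypothetical: the molecules are assumed to be pre-fragmented (real fragmentation needs a cheminformatics toolkit), the fragment names and sizes are invented for the example, and bond-selection steps are ignored.

```python
from collections import Counter

# Hypothetical fragment sizes, measured in heavy atoms.
FRAGMENT_SIZES = {"benzene": 6, "pyridine": 6, "carboxyl": 3, "methyl": 1, "hydroxyl": 1}

# Hypothetical training molecules, already split into fragments.
training_set = [
    ["benzene", "carboxyl"],
    ["benzene", "methyl"],
    ["pyridine", "carboxyl", "methyl"],
    ["benzene", "hydroxyl"],
]


def build_vocabulary(molecules, size):
    """Keep the `size` most frequent multi-atom fragments as motifs."""
    counts = Counter(frag for mol in molecules for frag in mol)
    multi_atom = [f for f, _ in counts.most_common() if FRAGMENT_SIZES[f] > 1]
    return multi_atom[:size]


def generation_steps(molecule, vocabulary):
    """A motif in the vocabulary costs one step; anything else is built atom by atom."""
    return sum(1 if f in vocabulary else FRAGMENT_SIZES[f] for f in molecule)


vocabulary = build_vocabulary(training_set, size=2)
steps_with_motifs = generation_steps(["benzene", "carboxyl"], vocabulary)
steps_atom_by_atom = generation_steps(["benzene", "carboxyl"], [])
```

In this toy setup the same molecule takes 2 steps with motifs and 9 steps atom by atom, which is the kind of saving the paragraph above describes.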
Drug-discovery projects often focus on a specific subset of the chemical space, by first defining a scaffold—a central part of the molecule that has already shown promising properties—and then exploring only those compounds that contain the scaffold as a subgraph.
The design of MoLeR’s decoder allows us to seamlessly integrate an arbitrary scaffold by using it as an initial state in the decoding loop. As we randomize the generation order during training, MoLeR implicitly learns to complete arbitrary subgraphs, making it ideal for focused scaffold-based exploration.
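Here is a minimal sketch of why a decoder that looks only at the current partial graph and the latent code can start from a scaffold. The `toy_policy` function is a hypothetical stand-in for the trained network, and the latent code is a plain dictionary rather than a learned vector; only the shape of the loop is the point.

```python
def toy_policy(atoms, bonds, latent):
    """Stand-in for the trained decoder network (hypothetical).
    It sees only the current partial graph and the latent code."""
    if len(atoms) >= latent["target_size"]:
        return None                                  # stop generating
    return (latent["element"], len(atoms) - 1)       # new atom bonded to the newest one


def decode(latent, policy, seed_atoms=(), seed_bonds=()):
    """Grow a molecule step by step from an optional seed graph (e.g. a scaffold)."""
    atoms, bonds = list(seed_atoms), set(seed_bonds)
    while True:
        action = policy(tuple(atoms), frozenset(bonds), latent)
        if action is None:
            return atoms, bonds
        element, attach_to = action
        atoms.append(element)
        if attach_to >= 0:                           # the very first atom has nothing to bond to
            bonds.add((attach_to, len(atoms) - 1))


# No generation history: resuming from a partial graph gives the same result.
chain_latent = {"target_size": 4, "element": "C"}
from_scratch = decode(chain_latent, toy_policy)
resumed = decode(chain_latent, toy_policy, seed_atoms=("C", "C"), seed_bonds={(0, 1)})

# Scaffold seeding: start from a six-membered carbon ring (bond orders omitted).
ring_atoms = ("C",) * 6
ring_bonds = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)}
atoms, bonds = decode({"target_size": 8, "element": "N"}, toy_policy, ring_atoms, ring_bonds)
```

Because `decode` never consults a record of earlier steps, a scaffold is just another partial graph, and the scaffold's atoms and bonds are guaranteed to survive in the output.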
Optimization with MoLeR
Even after training our model as discussed above, MoLeR has no notion of “optimization” of molecules.
However, like related approaches, we can perform optimization in the space of latent codes using an off-the-shelf black-box optimization algorithm. This was not possible with CGVAE, which used a much more complicated encoding of graphs.
In our work, we opted for using Molecular Swarm Optimization (MSO), which shows state-of-the-art results for latent space optimization in other models, and indeed we found it to work very well for MoLeR. In particular, we evaluated optimization with MSO and MoLeR on new benchmark tasks that are similar to realistic drug discovery projects using large scaffolds and found this combination to outperform existing models.
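MSO is built on particle swarm optimization, so the general idea can be sketched with a bare-bones swarm over latent vectors. This is not the MSO implementation: the hyperparameters are generic textbook values, and `toy_score` is a hypothetical stand-in for "decode the latent code into a molecule, then score that molecule".

```python
import random


def swarm_optimize(score, dim, n_particles=20, n_steps=100, seed=0,
                   inertia=0.7, c_personal=1.5, c_global=1.5):
    """Maximize a black-box `score` over latent vectors with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                   # each particle's best position so far
    best_val = [score(p) for p in pos]
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]       # best position seen by the whole swarm

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = score(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical objective: the closer to TARGET, the better the "molecule".
TARGET = [0.3, -0.2, 0.5]


def toy_score(z):
    return -sum((a - b) ** 2 for a, b in zip(z, TARGET))


best_z, best_score = swarm_optimize(toy_score, dim=3)
```

The optimizer only ever calls `score`, which is what makes it "black-box": swapping in a real decoder plus a property predictor would not change the loop.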
We continue to work with Novartis to focus machine learning research on problems relevant to the real-world drug discovery process. The early results are substantially better than those of competing methods, including our earlier CGVAE model. With time, we hope MoLeR-generated compounds will reach the final stages of drug-discovery projects, eventually contributing to new useful drugs that benefit humanity.
I find this pretty innovative stuff in terms of the evolution of A.I.'s role in drug discovery and design. I did not even know that Microsoft Research was actively working with Novartis.
Considering Microsoft's recent high-level business diversification, their R&D is going to improve in leaps and bounds, so I’m keeping a closer watch than usual on what Microsoft Research is up to.
Please check out the official blog post and paper for clearer or better info if needed.
Have a good Sunday!