In contrast to the “lock-and-key” model underlying the long-term success of structural biology and rational drug design, intrinsically disordered proteins (IDPs) exist as ensembles of highly heterogeneous conformations. Although IDPs have long been recognized as attractive therapeutic targets for human diseases such as cancer and diabetes, their dynamic nature makes rational drug design extremely challenging. For example, in computer-aided drug design it is computationally prohibitive to dock a typical large ligand library (~100,000 or more compounds) against the thousands to tens of thousands of conformations of an IDP target.
Recently, a research group led by Prof. Zhirong Liu from Peking University proposed a reinforcement-learning algorithm to overcome this bottleneck in virtual screening against IDPs.
Fig. 1 A reinforcement-learning algorithm developed from the multi-armed bandit problem is highly efficient in solving the bottleneck of virtual screening against intrinsically disordered proteins.
The solution was inspired by the well-known multi-armed bandit problem: a gambler faces a slot machine with many levers (arms), and each pull of a lever yields a reward drawn from an unknown probability distribution. The multi-armed bandit problem arises in many fields, e.g., how should trials of various treatment methods be allocated among Covid-19 patients to cure as many as possible? How should various products or papers be advertised dynamically on social media? Indeed, multi-armed bandit algorithms were an essential component of AlphaGo.
Fig. 2 The multi-armed bandit problem.
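To make the setting concrete, here is a minimal Python sketch of UCB1, a classic strategy from the same upper-confidence-bound family of bandit algorithms that the study builds on (the three Bernoulli levers and their success probabilities are purely illustrative, not from the paper):

```python
import math
import random

def ucb1(arms, total_pulls):
    """Classic UCB1: repeatedly pull the arm with the highest
    upper confidence bound on its mean reward."""
    n = len(arms)
    counts = [1] * n                  # pulls per arm (each pulled once below)
    sums = [arm() for arm in arms]    # cumulative reward per arm

    for t in range(n + 1, total_pulls + 1):
        # Empirical mean plus an exploration bonus that shrinks
        # as an arm accumulates pulls.
        best = max(range(n),
                   key=lambda i: sums[i] / counts[i]
                   + math.sqrt(2 * math.log(t) / counts[i]))
        sums[best] += arms[best]()
        counts[best] += 1

    return max(range(n), key=lambda i: sums[i] / counts[i])

# Toy usage: three Bernoulli "levers" with hidden success probabilities.
levers = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
print("best lever:", ucb1(levers, total_pulls=2000))  # almost always index 2
```

The exploration bonus is what balances trying under-sampled levers against exploiting the lever that currently looks best.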
The virtual screening of IDPs is mathematically equivalent to the multi-armed bandit problem when only a single top ligand is to be picked. In typical drug-design tasks, however, the aim is to pick the top 100 ligands; in other words, the task is a variant of the multi-armed bandit problem in which 100 of 100,000 levers must be pulled concurrently. The Liu group proposed a reversible upper confidence bound (rUCB) algorithm for this variant. The docking process is arranged dynamically so that attempts concentrate on ligands near the boundary separating the top candidates from the bulk, which is where accuracy matters most. Using the oncoprotein c-Myc as an example, they demonstrated that the average number of docking runs can be greatly reduced while the screening performance is only slightly affected.
Fig. 3 Application of the rUCB algorithm to the oncoprotein c-Myc.
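The paper's exact rUCB update is not reproduced here; the Python sketch below only illustrates the boundary-focused idea, in the spirit of LUCB-style top-m bandit selection: keep a confidence interval on each ligand's mean docking score and spend further docking runs on the two ligands that straddle the boundary between the provisional top set and the rest. The score function, noise model, and confidence radius are all assumptions for illustration:

```python
import math
import random

def top_m_select(score_fn, n_ligands, m, budget, delta=0.1):
    """Boundary-focused top-m selection (LUCB-style sketch of the idea
    behind rUCB, not the authors' exact algorithm).

    score_fn(i) performs one noisy docking run of ligand i and
    returns a score where larger is better."""
    counts = [1] * n_ligands
    sums = [score_fn(i) for i in range(n_ligands)]  # one run per ligand

    def mean(i):
        return sums[i] / counts[i]

    def radius(i, t):
        # Confidence radius shrinks as a ligand accumulates runs.
        return math.sqrt(math.log(4 * n_ligands * t * t / delta)
                         / (2 * counts[i]))

    t = n_ligands
    while t < budget:
        order = sorted(range(n_ligands), key=mean, reverse=True)
        top, rest = order[:m], order[m:]
        # The weakest provisional top ligand and the strongest challenger
        # sit on the boundary; dock whichever remains uncertain.
        weakest = min(top, key=lambda i: mean(i) - radius(i, t))
        strongest = max(rest, key=lambda i: mean(i) + radius(i, t))
        if mean(weakest) - radius(weakest, t) >= mean(strongest) + radius(strongest, t):
            break  # boundary resolved: top-m set separated from the bulk
        for i in (weakest, strongest):
            sums[i] += score_fn(i)
            counts[i] += 1
            t += 1
    return sorted(range(n_ligands), key=mean, reverse=True)[:m]

# Toy usage: 50 hypothetical ligands with hidden mean scores plus noise.
true_scores = [random.gauss(0, 1) for _ in range(50)]
noisy_dock = lambda i: true_scores[i] + random.gauss(0, 0.5)
print(top_m_select(noisy_dock, n_ligands=50, m=5, budget=5000))
```

Because docking effort is spent only where the top/bulk boundary is still ambiguous, ligands that are clearly good or clearly poor receive few runs, which is the source of the savings reported for c-Myc.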
This study suggests that reinforcement learning is highly efficient in relieving the bottleneck of virtual screening in the rational drug design of IDPs. The work, entitled “Reinforcement learning to boost molecular docking upon protein conformational ensembles”, was published in Phys. Chem. Chem. Phys. The first author is Dr. Bin Chong, who founded a company while he was a doctoral student. The research was supported by the National Natural Science Foundation of China and the Beijing National Laboratory for Molecular Sciences.
Paper Link: https://pubs.rsc.org/en/content/articlelanding/2021/cp/d0cp06378a