Multi-agent reinforcement learning based adaptive sampling of conformational free energy landscapes of proteins
Abstract: Molecular Dynamics (MD) simulations have become a crucial tool in chemistry, biology, condensed matter physics, and materials science. Rare state transitions, however, remain hard to sample even under massively parallel simulation schemes. To address this issue, several algorithms inspired by reinforcement learning (RL) have emerged to promote exploration along the slow collective variables (CVs) of complex systems. However, most of these algorithms are not well suited to leverage the information gained by sampling a system from different initial states (e.g., a ligand-receptor system starting from bound and unbound poses). To fill this gap, we propose an algorithm inspired by multi-agent RL that extends the functionality of two closely related techniques (REAP and TSLC) to situations where sampling can be accelerated by learning from different regions of the CV landscape. Essentially, the algorithm works by remembering which agent discovered each conformation and sharing this information with the others at the action-space discretization step. In this way, each agent only senses rewards from regions of the landscape that it discovered itself. The consequences are threefold: (i) agents learn which CVs carry more weight in the reward function using only relevant data, (ii) they deprioritize redundant actions, and (iii) agents that obtain higher rewards are assigned more actions. The conformations deemed most rewarding are then selected as starting points for new simulations. We compare our algorithm with baseline versions of LeastCounts, REAP, TSLC, and AdaptiveBandit adaptive sampling to demonstrate and rationalize the gain in performance.
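To make the central idea of the abstract concrete, the following is a minimal Python sketch (not the authors' implementation) of how per-agent reward bookkeeping could look: each agent stores the CV values of the conformations it discovered and scores new candidates with a REAP-style weighted standardized deviation computed only from its own discoveries. All class, function, and variable names here are illustrative assumptions.

```python
import numpy as np

class Agent:
    """Illustrative agent that remembers which conformations it discovered
    and scores candidates with a REAP-style reward (sketch, not the paper's code)."""

    def __init__(self, n_cvs):
        self.weights = np.full(n_cvs, 1.0 / n_cvs)  # learned CV importance weights
        self.discovered = []                        # CV values of own discoveries

    def remember(self, cv_values):
        """Store conformations found by this agent (rows = frames, cols = CVs)."""
        self.discovered.append(np.atleast_2d(cv_values))

    def reward(self, candidates):
        """Weighted standardized distance from the mean of the conformations this
        agent itself discovered; conformations found by other agents are invisible."""
        own = np.concatenate(self.discovered)
        mu, sigma = own.mean(axis=0), own.std(axis=0) + 1e-8
        return (np.abs(np.atleast_2d(candidates) - mu) / sigma) @ self.weights

def select_restarts(agents, candidate_cvs, discovered_by, n_restarts):
    """Score each candidate with the agent that discovered it and return the
    indices of the top-scoring conformations as seeds for new simulations."""
    scores = np.array([agents[a].reward(c)[0]
                       for c, a in zip(candidate_cvs, discovered_by)])
    return np.argsort(scores)[::-1][:n_restarts]
```

In this sketch, `discovered_by` records which agent first reached each candidate conformation, so the reward each agent perceives is restricted to its own region of the landscape, mirroring the information-sharing step described above.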