计算机科学与探索, 2023, Vol. 17, Issue (5): 1049-1056. DOI: 10.3778/j.issn.1673-9418.2109074

• 前沿·综述 •

超大规模药物虚拟筛选的实现与应用

张宝花,李辉,刘倩,高美娜,黄荷,赵毅,于坤千,金钟   

  1. 中国科学院计算机网络信息中心,北京 100190
  2. 中国科学院上海药物研究所,上海 201203
  3. 中国科学院大学,北京 100049
  • 出版日期:2023-05-01 发布日期:2023-05-01

Implementation and Application of Large-Scale Drug Virtual Screening

ZHANG Baohua, LI Hui, LIU Qian, GAO Meina, HUANG He, ZHAO Yi, YU Kunqian, JIN Zhong   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2023-05-01 Published:2023-05-01

摘要: 基于分子对接的药物虚拟筛选技术通过评估多个配体化合物与受体的结合强度来筛选最强结合的分子。在新冠病毒疫情全球蔓延形势下,超大规模快速药物虚拟筛选对于从海量配体结构中筛选出潜药分子至关重要。超级计算机的强大算力为药物虚拟筛选提供了硬件保障,但超大规模的药物虚拟筛选还面临着很多挑战,影响了计算的有效进行。在对挑战进行分析的基础上,提出了以中央数据库进行集中任务分发的方案,设计了多层级任务分发框架,并通过多层级智能调度、海量小分子文件多层级压缩处理、动态负载均衡、高容错管理等技术有效应对了面临的各种挑战,开发了简单易用的“树”形多层级任务分发系统,实现了快速高效稳定的药物虚拟筛选任务分发、计算和结果处理功能,计算效率近线性。在此基础上,采用异构计算技术在国产先进计算系统上针对新冠病毒两种不同活性位点快速完成了超过20亿化合物的药物虚拟筛选,为应对暴发性恶性传染病的超大规模快速虚拟筛选提供了强大计算保障。

关键词: 药物虚拟筛选, 分子对接, 并行分发, 容错管理, 动态负载均衡

Abstract: Molecular docking-based virtual screening evaluates the binding strength between many ligand compounds and a receptor in order to select the most strongly binding molecules. With COVID-19 spreading worldwide, ultra-large-scale, rapid drug virtual screening is crucial for identifying potential drug molecules from massive libraries of ligand structures. The computing power of supercomputers provides the hardware foundation for drug virtual screening, but ultra-large-scale screening still faces many challenges that hinder efficient execution of the calculations. Based on an analysis of these challenges, this paper proposes a centralized task distribution scheme built on a central database and designs a multi-level task distribution framework. The challenges are addressed through multi-level intelligent scheduling, multi-level compression of massive numbers of small-molecule files, dynamic load balancing, and highly fault-tolerant management. An easy-to-use, tree-shaped multi-level task distribution system is developed that provides fast, efficient, and stable task distribution, computation, and result processing for drug virtual screening, with nearly linear computing efficiency. On this basis, heterogeneous computing is used on a domestic advanced computing system to rapidly complete virtual screening of more than 2 billion compounds against two different active sites of the novel coronavirus, providing strong computational support for ultra-large-scale rapid virtual screening in response to outbreaks of severe infectious diseases.

Key words: drug virtual screening, molecular docking, parallel distribution, fault-tolerance management, dynamic load balancing
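
The abstract names dynamic load balancing and fault-tolerant management as core techniques but gives no implementation detail. The following is a minimal single-level sketch of how the two ideas can combine, not the authors' system: workers pull ligand batches from a shared queue (a stand-in for the central database), so faster workers naturally take more batches, and a batch that fails is put back on the queue until a retry limit is reached. The names `mock_dock`, `worker`, `run_screening`, and `MAX_RETRIES`, the batch size, and the simulated failure rule are all hypothetical and chosen for illustration only.

```python
import queue
import threading

MAX_RETRIES = 3  # assumed retry limit; not taken from the paper


def mock_dock(ligand_id, attempt):
    """Hypothetical stand-in for one docking run; returns a mock score."""
    # Simulate a transient fault: every 7th ligand fails on the first attempt.
    if ligand_id % 7 == 0 and attempt == 0:
        raise RuntimeError(f"transient failure on ligand {ligand_id}")
    return -(ligand_id % 10) - 1.0  # more negative = stronger binding (mock)


def worker(tasks, results, failed, lock):
    """Pull batches until the queue is empty; re-queue batches that fail."""
    while True:
        try:
            batch, attempt = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left for this worker
        try:
            scores = {lig: mock_dock(lig, attempt) for lig in batch}
        except RuntimeError:
            if attempt + 1 < MAX_RETRIES:
                tasks.put((batch, attempt + 1))  # fault tolerance: retry later
            else:
                with lock:
                    failed.append(batch)  # give up and record the batch
        else:
            with lock:
                results.update(scores)


def run_screening(n_ligands=100, batch_size=8, n_workers=4):
    """Distribute ligand batches to workers; return (scores, failed batches)."""
    tasks = queue.Queue()
    for start in range(0, n_ligands, batch_size):
        batch = tuple(range(start, min(start + batch_size, n_ligands)))
        tasks.put((batch, 0))  # (ligand ids, attempt number)

    results, failed, lock = {}, [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, failed, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed


if __name__ == "__main__":
    scores, failed = run_screening()
    best = sorted(scores, key=scores.get)[:5]  # five strongest mock binders
    print(len(scores), "ligands scored;", len(failed), "batches failed")
    print("top ligand ids:", best)
```

A pull-based queue is only one way to balance load. A multi-level scheme of the kind the abstract describes would presumably nest such queues so that each intermediate node redistributes work to its own children, but that structure is not specified in the abstract and is not modeled here.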