KDD 2021 Tutorial
Artificial Intelligence for Drug Discovery

Jian Tang

Assistant Professor
Mila - Quebec AI Institute

Abstract

Drug discovery is a long and costly process, taking on average 10 years and 2.5 billion dollars to develop a new drug. Artificial intelligence has the potential to significantly accelerate drug discovery by analyzing the large amounts of data generated in the biomedical domain, such as bioassays, chemical experiments, and the biomedical literature. Recently, there has been growing interest in developing AI techniques for drug discovery across many communities, including machine learning, data mining, and biomedicine. In this tutorial, we will provide a detailed introduction to key problems in drug discovery, such as molecular property prediction, de novo molecular design and molecular optimization, reaction prediction and retrosynthesis, and drug repurposing and combination, as well as the key technical advances that artificial intelligence has brought to these problems. This tutorial can serve as introductory material both for computer scientists interested in drug discovery and for drug discovery practitioners who want to learn the latest AI techniques in this area.

Schedule

4:00 pm - 7:00 pm EST, August 14, 2021

Slides

The slides can be found here.

Outline

Part I: 80 min
- Drug Discovery Overview [10 min, Feixiong]
- Molecular Property Prediction [20 min, Jian]
  - Supervised [MPNN (Gilmer et al., 2017)]
  - Self-supervised [ContextPred (Hu et al., 2019), InfoGraph (Sun et al., 2020)]
  - Semi-supervised [InfoGraph (Sun et al., 2020)]
- De novo Molecule Generation and Optimization [20 min, Jian]
  - Variational autoencoder-based approach [JTVAE (Jin et al., 2018)]
  - Autoregressive methods [GCPN (You et al., 2018)]
  - Normalizing Flow-based approaches [GraphAF (Shi et al., 2020), MoFlow (Zang & Wang, 2020)]
- Reaction Prediction and Retrosynthesis [5 min, Jian]
- Molecular Conformation Prediction [20 min, Jian]
  - CGCF [(Xu et al., 2021)]
  - ConfVAE [(Xu et al., 2021)]
  - ConfGF [(Shi et al., 2021)]
- Open Source Platform TorchDrug [5 min, Jian]
Break: 10 min
Part II: 65 min
- Multiomics and Clinical Data-based Drug Repurposing [30 min, Feixiong]
  - Network-based approach [(Cheng et al., 2018)]
  - Graph Neural Network-based approach [(Gysi et al., 2020)]
  - Case Study on COVID-19 [(Zhou et al., 2020), (Gysi et al., 2020)]
- Real World Data and Real World Evidence [30 min, Fei]
  - Treatment effectiveness estimation
  - Pharmacovigilance and safety
  - Eligibility criteria design
- Future Work [5 min]
Part III: 20 min
- QA [20 min]

Organizers

Jian Tang, website
- Brief biography. Jian Tang has been an Assistant Professor at the Montréal Institute for Learning Algorithms (Mila), a research institute focusing on deep learning and reinforcement learning led by Turing Award winner Yoshua Bengio, since December 2017. His research focuses on graph representation learning, graph neural networks, drug discovery, and knowledge graphs. He was named to the first cohort of Canada CIFAR Artificial Intelligence Chairs (CIFAR AI Research Chairs). He was previously a research fellow at the University of Michigan and Carnegie Mellon University, and a researcher at Microsoft Research Asia for two years. He received the best paper award at ICML’14 and was nominated for the best paper award at WWW’16. Most of his papers are published in top-tier artificial intelligence, machine learning, and data mining conferences (ICML, NeurIPS, ICLR, AAAI, IJCAI, KDD, WWW, and WSDM). He co-organized tutorials on graph representation learning at KDD 2017 and AAAI 2019, and organized several workshops on graph representation learning at SDM 2019, CIKM 2019, AAAI 2020, and ICML 2020. He has published several representative works on graph representation learning (including LINE, LargeVis, and RotatE) and has recently been actively working on graph representation learning for drug discovery.
- Relevant reviewing experience. Jian Tang has served as a reviewer for major conferences in the machine learning, data mining, and natural language processing communities, including NIPS, ICML, ICLR, AAAI, IJCAI, ACL, EMNLP, KDD, WWW, and WSDM.
Fei Wang, website
- Brief biography. Fei Wang is currently an Associate Professor of Health Informatics in the Department of Population Health Sciences, Weill Cornell Medicine, Cornell University. His major research interest is data mining and its applications in health data science. He has published more than 250 papers in AI and medicine, which have received more than 12.7K citations on Google Scholar. His h-index is 56. His papers have won 8 best paper awards at top international conferences on data mining and medical informatics. His team won the NIPS/Kaggle Challenge on Classification of Clinically Actionable Genetic Mutations in 2017 and the Parkinson’s Progression Markers Initiative data challenge organized by the Michael J. Fox Foundation in 2016. Dr. Wang received the NSF CAREER Award in 2018 and the inaugural research leadership award at the IEEE International Conference on Health Informatics (ICHI) 2019. Dr. Wang is the chair of the Knowledge Discovery and Data Mining working group of the American Medical Informatics Association (AMIA).
- Relevant reviewing experience. Fei Wang has served as a senior program committee member or area chair for conferences including AAAI, IJCAI, KDD, CIKM, ICDM, and SDM. He also reviews for major medical and interdisciplinary journals, including Nature Medicine, Nature Communications, and Annals of Internal Medicine.
Feixiong Cheng, website
- Brief biography. Feixiong Cheng, PhD, is a principal investigator with Cleveland Clinic’s Genomic Medicine Institute. Dr. Cheng is working to develop computational and experimental network medicine technologies that advance the characterization of disease heterogeneity, with the goal of coordinated, patient-centered strategies for developing innovative diagnostics and therapeutics. The primary goal of Dr. Cheng’s lab is to combine tools from genomics, network medicine, bioinformatics, computational biology, chemical biology, experimental pharmacology, and systems biology assays (e.g., single-cell sequencing and iPS-derived cardiomyocytes) to address challenging questions in understanding complex human diseases (e.g., cardio-oncology, pulmonary vascular diseases, and Alzheimer’s disease), which could have a major impact on identifying novel real-world data-driven diagnostic biomarkers and therapeutic targets for precision medicine. From 2013 to 2017, Dr. Cheng trained as a postdoctoral research fellow in pharmacogenomics and network medicine at Vanderbilt University Medical Center, Northeastern University, and Dana-Farber Cancer Institute. During 2017-2018, Dr. Cheng was promoted to Research Assistant Professor, working with two of the world’s leading experts in network medicine, Drs. Albert-László Barabási and Joseph Loscalzo, with a dual appointment at Northeastern University and Harvard Medical School. Dr. Cheng has received several awards, including the NIH Pathway to Independence Award (K99/R00), an SCI highly cited papers award, and a Vanderbilt Postdoc of the Year Honorable Mention.

References

Sun, F.-Y., Hoffmann, J., Verma, V., & Tang, J. (2020). InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. ICLR.
Xu, M., Wang, W., Luo, S., Shi, C., Bengio, Y., Gomez-Bombarelli, R., & Tang, J. (2021). An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming. ArXiv Preprint ArXiv:2105.07246.
Shi, C., Luo, S., Xu, M., & Tang, J. (2021). Learning gradient fields for molecular conformation generation. ArXiv Preprint ArXiv:2105.03902.
Xu, M., Luo, S., Bengio, Y., Peng, J., & Tang, J. (2021). Learning neural generative dynamics for molecular conformation generation. ArXiv Preprint ArXiv:2102.10240.
Shi, C., Xu, M., Zhu, Z., Zhang, W., Zhang, M., & Tang, J. (2020). GraphAF: a flow-based autoregressive model for molecular graph generation. ICLR.
Shi, C., Xu, M., Guo, H., Zhang, M., & Tang, J. (2020). A Graph to Graphs Framework for Retrosynthesis Prediction. ICML.
Gottipati, S. K., Sattarov, B., Niu, S., Pathak, Y., Wei, H., Liu, S., Thomas, K. M. J., Blackburn, S., Coley, C. W., Tang, J., & others. (2020). Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning. ICML.
Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. ICML.
You, J., Liu, B., Ying, Z., Pande, V., & Leskovec, J. (2018). Graph convolutional policy network for goal-directed molecular graph generation. Advances in Neural Information Processing Systems, 6410–6421.
Zang, C., & Wang, F. (2020). MoFlow: An Invertible Flow Model for Generating Molecular Graphs. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 617–626.
Sun, M., Zhao, S., Gilvary, C., Elemento, O., Zhou, J., & Wang, F. (2020). Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics, 21(3), 919–935.
Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V., & Leskovec, J. (2019). Strategies for Pre-training Graph Neural Networks. ArXiv Preprint ArXiv:1905.12265.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. ArXiv Preprint ArXiv:1704.01212.
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. ArXiv Preprint ArXiv:1609.02907.
Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2018). How powerful are graph neural networks? ArXiv Preprint ArXiv:1810.00826.
Jin, W., Coley, C., Barzilay, R., & Jaakkola, T. (2017). Predicting organic reaction outcomes with Weisfeiler-Lehman network. Advances in Neural Information Processing Systems, 2607–2616.
Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C. A., Bekas, C., & Lee, A. A. (2019). Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 5(9), 1572–1583.
Sacha, M., Błaż, M., Byrski, P., Włodarczyk-Pruszyński, P., & Jastrzębski, S. (2020). Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits. ArXiv Preprint ArXiv:2006.15426.
Dai, H., Li, C., Coley, C., Dai, B., & Song, L. (2019). Retrosynthesis prediction with conditional graph logic network. Advances in Neural Information Processing Systems, 8872–8882.
Zhou, Y., Hou, Y., Shen, J., Huang, Y., Martin, W., & Cheng, F. (2020). Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discovery, 6(1), 1–18.
Zeng, X., Zhu, S., Liu, X., Zhou, Y., Nussinov, R., & Cheng, F. (2019). deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics, 35(24), 5191–5198.
Zhou, Y., Wang, F., Tang, J., Nussinov, R., & Cheng, F. (2020). Artificial Intelligence in Drug Repurposing. The Lancet Digital Health.
Chen, H., Cheng, F., & Li, J. (2020). iDrug: Integration of drug repositioning and drug-target prediction via cross-network embedding. PLoS Computational Biology, 16(7), e1008040.
Cheng, F., Kovács, I. A., & Barabási, A.-L. (2019). Network-based prediction of drug combinations. Nature Communications, 10(1), 1–11.
Cheng, F., Desai, R. J., Handy, D. E., Wang, R., Schneeweiss, S., Barabási, A.-L., & Loscalzo, J. (2018). Network-based approach to prediction and population-based validation of in silico drug repurposing. Nature Communications, 9(1), 1–12.
Gysi, D. M., Valle, Í. D., Zitnik, M., Ameli, A., Gan, X., Varol, O., Sanchez, H., Baron, R. M., Ghiassian, D., Loscalzo, J., & others. (2020). Network medicine framework for identifying drug repurposing opportunities for covid-19. ArXiv Preprint ArXiv:2004.07229.
Zhou, Y., Hou, Y., Shen, J., Kallianpur, A., Zein, J., Culver, D. A., Farha, S., Comhair, S., Fiocchi, C., Gack, M. U., & others. (2020). A Network Medicine Approach to Investigation and Population-based Validation of Disease Manifestations and Drug Repurposing for COVID-19. ChemRxiv.
