AAAI 2021 Tutorial
Artificial Intelligence for Drug Discovery

Jian Tang

Assistant Professor
Mila - Quebec AI Institute

Abstract

Drug discovery is a long and costly process, taking on average 10 years and 2.5 billion dollars to develop a new drug. Artificial intelligence has the potential to significantly accelerate drug discovery by analyzing the large amounts of data generated in the biomedical domain, such as bioassays, chemical experiments, and biomedical literature. Recently, there has been growing interest in developing AI techniques for drug discovery across many communities, including the machine learning, data mining, and biomedical communities. In this tutorial, we will provide a detailed introduction to key problems in drug discovery, such as molecular property prediction, de novo molecular design and molecular optimization, reaction prediction and retrosynthesis, and drug repurposing and combination, as well as the key technical advances in artificial intelligence for these problems. This tutorial can serve as introductory material both for computer scientists interested in drug discovery and for drug discovery practitioners who want to learn the latest AI techniques in this area.

Schedule

8:30 am – 11:45 am Pacific Time, Wednesday, February 3, 2021

Slides

The slides can be found here.

Outline

The proposed tutorial would be 3.5 hours in length. An outline of the tutorial (including an approximate breakdown by presenter) is as follows:

Drug Discovery Overview [20 min, presented by Feixiong]
Deep Learning, Traditional Network-based Methods, Graph Representation Learning [15 min, presented by Fei]
- Convolutional Neural Networks and Recurrent Neural Networks
- Graph Convolutional Networks [GCN (Kipf & Welling, 2016), MPNN (Gilmer et al., 2017), GIN (Xu et al., 2018)]
Molecular Property Prediction [30 min, presented by Jian]
- Supervised [MPNN (Gilmer et al., 2017)]
- Self-supervised [ContextPred (Hu et al., 2019), InfoGraph (Sun et al., 2020)]
- Semi-supervised [InfoGraph (Sun et al., 2020)]
De novo Molecule Generation and Optimization [40 min, presented by Jian and Fei]
- Variational autoencoder-based approach [JTVAE (Jin et al., 2018)]
- Autoregressive methods [GCPN (You et al., 2018)]
- Normalizing Flow-based approaches [GraphAF (Shi et al., 2020), MoFlow (Zang & Wang, 2020)]
Reaction Prediction and Retrosynthesis [30 min, presented by Jian]
- Reaction prediction [(Jin et al., 2017), (Schwaller et al., 2019), (Sacha et al., 2020)]
- Retrosynthesis [(Dai et al., 2019), (Shi et al., 2020), (Sacha et al., 2020)]
Multiomics and Clinical Data-based Drug Repurposing [45 min, presented by Feixiong and Fei]
- Network-based approach [(Cheng et al., 2018)]
- Graph Neural Network-based approach [(Gysi et al., 2020)]
- Case Study on COVID-19 [(Zhou et al., 2020), (Gysi et al., 2020)]
Other Topics [15 min, presented by Fei]
Conclusion and Future Directions [15 min, presented by Fei]

Organizers

Jian Tang, website
- Brief biography. Jian Tang has been an Assistant Professor at the Montréal Institute for Learning Algorithms (Mila), a research institute focusing on deep learning and reinforcement learning led by Turing Award winner Yoshua Bengio, since December 2017. His research focuses on graph representation learning, graph neural networks, drug discovery, and knowledge graphs. He was named to the first cohort of Canada CIFAR Artificial Intelligence Chairs (CIFAR AI Research Chair). He was a research fellow at the University of Michigan and Carnegie Mellon University, and a researcher at Microsoft Research Asia for two years. He received the best paper award at ICML’14 and was nominated for the best paper award at WWW’16. Most of his papers are published in top-tier artificial intelligence, machine learning, and data mining conferences (ICML, NeurIPS, ICLR, AAAI, IJCAI, KDD, WWW, and WSDM). He co-organized tutorials on graph representation learning at KDD 2017 and AAAI 2019, and organized workshops on graph representation learning at SDM 2019, CIKM 2019, AAAI 2020, and ICML 2020. He has published several representative works on graph representation learning (including LINE, LargeVis, and RotatE) and has recently been actively working on graph representation learning for drug discovery.
- Relevant reviewing experience. Jian Tang has served as a reviewer for major conferences in the machine learning, data mining, and natural language processing communities, including NIPS, ICML, ICLR, AAAI, IJCAI, ACL, EMNLP, KDD, WWW, and WSDM.
Fei Wang, website
- Brief biography. Fei Wang is currently an Associate Professor of Health Informatics in the Department of Population Health Sciences, Weill Cornell Medicine, Cornell University. His major research interest is data mining and its applications in health data science. He has published more than 250 papers in AI and medicine, which have received more than 12.7K citations on Google Scholar. His H-index is 56. His papers have won 8 best paper awards at top international conferences on data mining and medical informatics. His team won the NIPS/Kaggle Challenge on Classification of Clinically Actionable Genetic Mutations in 2017 and the Parkinson’s Progression Markers’ Initiative data challenge organized by the Michael J. Fox Foundation in 2016. Dr. Wang is the recipient of the NSF CAREER Award in 2018 and the inaugural research leadership award at the IEEE International Conference on Health Informatics (ICHI) 2019. Dr. Wang is the chair of the Knowledge Discovery and Data Mining working group of the American Medical Informatics Association (AMIA).
- Relevant reviewing experience. Fei Wang has served as a senior program committee member/area chair for conferences including AAAI, IJCAI, KDD, CIKM, ICDM, and SDM. He also reviews for major medical and interdisciplinary journals, including Nature Medicine, Nature Communications, and Annals of Internal Medicine.
Feixiong Cheng, website
- Brief biography. Feixiong Cheng, PhD, is a principal investigator with Cleveland Clinic’s Genomic Medicine Institute. Dr. Cheng is working to develop computational and experimental network medicine technologies for advancing the characterization of disease heterogeneity, with the goal of coordinated, patient-centered strategies for innovative diagnostics and therapeutics development. The primary goal of Dr. Cheng’s lab is to combine tools from genomics, network medicine, bioinformatics, computational biology, chemical biology, and experimental pharmacology and systems biology assays (e.g., single cell sequencing and iPS-derived cardiomyocytes) to address challenging questions in understanding various complex human diseases (e.g., cardio-oncology, pulmonary vascular diseases, and Alzheimer’s disease), which could have a major impact on identifying novel real-world data-driven diagnostic biomarkers and therapeutic targets for precision medicine. From 2013 to 2017, Dr. Cheng trained as a Postdoctoral Research Fellow in pharmacogenomics and network medicine at Vanderbilt University Medical Center, Northeastern University, and Dana-Farber Cancer Institute. During 2017-2018, Dr. Cheng was promoted to Research Assistant Professor, working with two of the world’s leading experts in network medicine, Drs. Albert-Laszlo Barabasi and Joseph Loscalzo, with a dual appointment at Northeastern University and Harvard Medical School. Dr. Cheng has received several awards, including the NIH Pathway to Independence Award (K99/R00), the SCI highly cited papers award, and the Vanderbilt Postdoc of the Year Honorable Mention.

References

Sun, F.-Y., Hoffmann, J., Verma, V., & Tang, J. (2020). InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. ICLR.
Shi, C., Xu, M., Zhu, Z., Zhang, W., Zhang, M., & Tang, J. (2020). GraphAF: a flow-based autoregressive model for molecular graph generation. ICLR.
Shi, C., Xu, M., Guo, H., Zhang, M., & Tang, J. (2020). A Graph to Graphs Framework for Retrosynthesis Prediction. ICML.
Gottipati, S. K., Sattarov, B., Niu, S., Pathak, Y., Wei, H., Liu, S., Thomas, K. M. J., Blackburn, S., Coley, C. W., Tang, J., & others. (2020). Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning. ICML.
Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. ICML.
You, J., Liu, B., Ying, Z., Pande, V., & Leskovec, J. (2018). Graph convolutional policy network for goal-directed molecular graph generation. Advances in Neural Information Processing Systems, 6410–6421.
Zang, C., & Wang, F. (2020). MoFlow: An Invertible Flow Model for Generating Molecular Graphs. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 617–626.
Sun, M., Zhao, S., Gilvary, C., Elemento, O., Zhou, J., & Wang, F. (2020). Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics, 21(3), 919–935.
Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V., & Leskovec, J. (2019). Strategies for Pre-training Graph Neural Networks. ArXiv Preprint ArXiv:1905.12265.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. ArXiv Preprint ArXiv:1704.01212.
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. ArXiv Preprint ArXiv:1609.02907.
Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2018). How powerful are graph neural networks? ArXiv Preprint ArXiv:1810.00826.
Jin, W., Coley, C., Barzilay, R., & Jaakkola, T. (2017). Predicting organic reaction outcomes with Weisfeiler-Lehman network. Advances in Neural Information Processing Systems, 2607–2616.
Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C. A., Bekas, C., & Lee, A. A. (2019). Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 5(9), 1572–1583.
Sacha, M., Błaż, M., Byrski, P., Włodarczyk-Pruszyński, P., & Jastrzębski, S. (2020). Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits. ArXiv Preprint ArXiv:2006.15426.
Dai, H., Li, C., Coley, C., Dai, B., & Song, L. (2019). Retrosynthesis prediction with conditional graph logic network. Advances in Neural Information Processing Systems, 8872–8882.
Zhou, Y., Hou, Y., Shen, J., Huang, Y., Martin, W., & Cheng, F. (2020). Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discovery, 6(1), 1–18.
Zeng, X., Zhu, S., Liu, X., Zhou, Y., Nussinov, R., & Cheng, F. (2019). deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics, 35(24), 5191–5198.
Zhou, Y., Wang, F., Tang, J., Nussinov, R., & Cheng, F. (2020). Artificial Intelligence in Drug Repurposing. The Lancet Digital Health.
Chen, H., Cheng, F., & Li, J. (2020). iDrug: Integration of drug repositioning and drug-target prediction via cross-network embedding. PLoS Computational Biology, 16(7), e1008040.
Cheng, F., Kovács, I. A., & Barabási, A.-L. (2019). Network-based prediction of drug combinations. Nature Communications, 10(1), 1–11.
Cheng, F., Desai, R. J., Handy, D. E., Wang, R., Schneeweiss, S., Barabási, A.-L., & Loscalzo, J. (2018). Network-based approach to prediction and population-based validation of in silico drug repurposing. Nature Communications, 9(1), 1–12.
Gysi, D. M., Valle, Í. D., Zitnik, M., Ameli, A., Gan, X., Varol, O., Sanchez, H., Baron, R. M., Ghiassian, D., Loscalzo, J., & others. (2020). Network medicine framework for identifying drug repurposing opportunities for COVID-19. ArXiv Preprint ArXiv:2004.07229.
Zhou, Y., Hou, Y., Shen, J., Kallianpur, A., Zein, J., Culver, D. A., Farha, S., Comhair, S., Fiocchi, C., Gack, M. U., & others. (2020). A Network Medicine Approach to Investigation and Population-based Validation of Disease Manifestations and Drug Repurposing for COVID-19. ChemRxiv.
