TL;DR
This week, we cover an older paper that provides a detailed guide to running a large-scale docking screen effectively.
A Comprehensive Guide to Large Scale Docking in Drug Discovery
In drug discovery, virtual screening of large chemical libraries has become essential to explore a broader chemical landscape beyond the limitations of high-throughput screening (HTS). Today we discuss an article by the Shoichet group at UCSF that provides a comprehensive guide to running a large-scale docking screen. Docking is a technique that leverages known structural information about a protein target to predict the preferred orientation, or pose, in which a potential drug binds to that target. While docking can be a powerful tool, it is notoriously difficult to get predictive results from docking projects, in part because of the many error modes that must be controlled for.
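To make the mechanics concrete, below is a minimal sketch of a single docking run using the open-source AutoDock Vina Python bindings. This is not the DOCK3.7 workflow the paper describes, and the receptor and ligand file names and the search-box coordinates are placeholders.

```python
# Minimal single-ligand docking sketch using AutoDock Vina's Python bindings
# (not the DOCK3.7 workflow the paper describes). File names and search-box
# coordinates are placeholders.
from vina import Vina

v = Vina(sf_name="vina")                       # default Vina scoring function
v.set_receptor("receptor.pdbqt")               # prepared, rigid receptor
v.set_ligand_from_file("ligand.pdbqt")         # prepared ligand with rotatable bonds

# Define the search box around the binding site (center and size in Angstroms)
v.compute_vina_maps(center=[10.0, 12.5, -3.0], box_size=[20.0, 20.0, 20.0])

v.dock(exhaustiveness=8, n_poses=5)            # sample and score candidate poses
print(v.energies(n_poses=5))                   # predicted binding energies (kcal/mol)
v.write_poses("ligand_docked.pdbqt", n_poses=5, overwrite=True)
```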
This paper attempts to improve the standard state of affairs by laying out protocols and controls applicable across docking programs: techniques to prepare protein sites, benchmark performance, and confirm screening results with experimental assays. Selecting a suitable target site on the protein of interest is crucial, with high-resolution ligand-bound structures preferred for virtual screening, since such structures define the binding pocket better than ligand-free ones. Modifications to these structures are often necessary for effective docking, including reverting mutations to the wild type, adding missing side chains and loops near the binding site, deciding which water molecules within the binding pocket to retain, removing buffer components, and ensuring proper hydrogen atom placement.
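As one illustration of what this preparation step can look like in practice, here is a minimal sketch using PDBFixer from the OpenMM ecosystem. The input file name and pH are placeholders, and decisions such as reverting engineered mutations or choosing which binding-site waters to keep still require case-by-case judgment.

```python
# Minimal receptor-preparation sketch using PDBFixer (file name and pH are
# illustrative). Reverting engineered mutations and choosing which binding-site
# waters to retain still require manual, case-by-case judgment.
from pdbfixer import PDBFixer
from openmm.app import PDBFile

fixer = PDBFixer(filename="target.pdb")
fixer.findMissingResidues()                   # locate gaps (e.g., disordered loops)
fixer.findNonstandardResidues()
fixer.replaceNonstandardResidues()            # map modified residues back to standard ones
fixer.removeHeterogens(keepWater=True)        # strip buffer components, keep ordered waters
fixer.findMissingAtoms()
fixer.addMissingAtoms()                       # rebuild missing side chains and loop atoms
fixer.addMissingHydrogens(pH=7.4)             # protonate for an assumed physiological pH

with open("target_prepared.pdb", "w") as f:
    PDBFile.writeFile(fixer.topology, fixer.positions, f)
```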
The article is from a few years ago, so when no experimental structure is available it recommends homology modeling, in which a structural model of the target protein is built from solved templates with high sequence identity. While homology modeling may still prove useful in some cases, such techniques have largely been superseded by deep-learning-based protein structure prediction methods such as AlphaFold2.
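When no experimental structure exists, a predicted model can be pulled from the AlphaFold Protein Structure Database, as in the sketch below. The UniProt accession is only an illustration and the model version in the URL may change, so check the database and the model's confidence estimates (pLDDT) before docking against it.

```python
# Sketch of fetching a predicted structure from the AlphaFold Protein Structure
# Database when no experimental structure exists. The UniProt accession and the
# "_v4" model version in the URL are assumptions; check the database for the
# current version and inspect predicted confidence (pLDDT) before docking.
import urllib.request

uniprot_accession = "P00533"  # placeholder accession (EGFR, shown for illustration)
url = f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_accession}-F1-model_v4.pdb"
urllib.request.urlretrieve(url, f"AF-{uniprot_accession}.pdb")
```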
Control calculations are needed to evaluate the docking model’s accuracy and its ability to distinguish known active compounds from known inactive ones. A common strategy is to dock decoy molecules that match the physical properties of known actives (such as molecular weight) but are presumed inactive, and to check whether the docking scores suitably separate the actives from the decoys.
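A minimal sketch of such a retrospective control is shown below, computing a ROC AUC and an enrichment factor from docking scores of actives and decoys. The score values are placeholders, and the convention assumed is that more negative scores are better.

```python
# Retrospective control sketch: can docking scores separate known actives from
# property-matched decoys? Score values are placeholders; more negative = better.
import numpy as np
from sklearn.metrics import roc_auc_score

active_scores = np.array([-11.2, -10.5, -9.8, -9.1])       # scores of known actives
decoy_scores = np.array([-8.0, -7.5, -9.0, -6.2, -5.9,
                         -7.1, -8.3, -6.8, -7.7, -6.0])     # property-matched decoys

labels = np.concatenate([np.ones(len(active_scores)), np.zeros(len(decoy_scores))])
scores = np.concatenate([active_scores, decoy_scores])

# ROC AUC: negate scores so that higher values mean "more active-like"
print("AUC:", roc_auc_score(labels, -scores))

# Enrichment factor in the top 10% of the ranked list
top_n = max(1, int(0.10 * len(scores)))
top_idx = np.argsort(scores)[:top_n]                        # best (most negative) first
print("EF@10%:", labels[top_idx].mean() / labels.mean())
```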
The prospective screen itself entails running the docking algorithm against a large compound library available from a vendor and ranking the molecules by their docking scores. Hit-picking follows, narrowing down the ranked list by applying filters to select promising candidates. For example, molecules with highly strained conformations should likely be removed as unphysical.
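A simple sketch of this hit-picking step appears below. The column names, cutoff values, and the assumption that strain energies come from a separate conformational-strain tool are all illustrative rather than prescribed by the paper.

```python
# Hit-picking sketch: rank a docking results table and apply simple filters.
# The column names ("dock_score", "strain_energy", "mol_wt") and the cutoffs
# are hypothetical; strain energies are assumed to be precomputed elsewhere.
import pandas as pd

results = pd.read_csv("docking_results.csv")              # placeholder results file

hits = (
    results.sort_values("dock_score")                     # more negative score = better
           .head(10000)                                   # keep the top of the ranked list
           .query("strain_energy < 6.0")                  # drop heavily strained poses
           .query("250 <= mol_wt <= 500")                 # rough lead-like size window
)
hits.to_csv("hit_candidates.csv", index=False)
```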
Experimental testing is needed to confirm the activity and binding of hits to the target protein. There is considerable art in properly verifying an experimental hit; for example, known PAINS compounds or colloidal aggregators should likely be screened out. Another potential failure mode is that the vendor may have synthesized the wrong compound, necessitating verification of compound identity by mass spectrometry or NMR.
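As a sketch of one such sanity check, the snippet below flags PAINS substructures among hit candidates using RDKit's FilterCatalog; the candidate SMILES are placeholders. Screening against known colloidal aggregators (for instance with the Shoichet lab's Aggregator Advisor) is a separate, complementary step.

```python
# Flag PAINS (pan-assay interference) substructures among hit candidates using
# RDKit's FilterCatalog. The SMILES list below is a placeholder.
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)

candidate_smiles = ["O=C(Nc1ccccc1)c1ccccc1O", "CCOc1ccc2nc(S(N)(=O)=O)sc2c1"]
for smi in candidate_smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is not None and catalog.HasMatch(mol):
        entry = catalog.GetFirstMatch(mol)
        print(f"{smi}: flagged as PAINS ({entry.GetDescription()})")
```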
The article discusses in extensive detail the full docking process using DOCK3.7 and the ZINC20 library. Adapting this procedure to other docking codes will likely take some work, since many of the details provided are DOCK-specific.
Interesting Links from Around the Web
https://www.quantamagazine.org/the-quest-to-quantify-quantumness-20231019/: A fascinating discussion of different metrics that help clarify the unique resources explored by quantum algorithms.
https://www.nature.com/articles/d41586-023-03267-0, https://www.science.org/doi/10.1126/science.adh1174: IBM’s recent NorthPole chip can dramatically lower inference costs for serving models.
https://www.nature.com/articles/s41586-023-06602-7: An early prototype of an on-chip particle accelerator could lead to practical advances in both medicine and fundamental science.
https://www.extremetech.com/computing/tsmc-chief-our-3nm-node-will-beat-intel-18a: TSMC expresses confidence it will remain ahead of Intel for the coming few years. I am a fan of the rivalry and hope it spurs better chips for users.
Feedback and Comments
Please feel free to email me directly (bharath@deepforestsci.com) with your feedback and comments!
About
Deep Into the Forest is a newsletter by Deep Forest Sciences, Inc. We’re a deep tech R&D company building Chiron, an AI-powered scientific discovery engine for the biotech/pharma industries. Deep Forest Sciences leads the development of the open source DeepChem ecosystem. Partner with us to apply our foundational AI technologies to hard real-world problems in drug discovery. Get in touch with us at partnerships@deepforestsci.com!
Credits
Author: Bharath Ramsundar, Ph.D.
Editor: Sandya Subramanian, Ph.D.
Research and Writing: Rida Irfan