Molecular Generators and Optimizers Failure Modes

Authors

  • Mani Manavalan, Capgemini America

DOI:

https://doi.org/10.18034/mjmbr.v8i2.583

Keywords:

Molecular Generators, Molecular design, Optimizers failure modes, Generative models, Distribution learning, Goal-directed generation

Abstract

In recent years, interest in generative models for molecules in drug development has grown. In de novo molecular design, these models are used to create molecules with desired properties from scratch. De novo design is sometimes used instead of virtual screening, which is limited by the size of the libraries that can be searched in practice. Rather than screening existing libraries, generative models can build custom libraries from scratch, and because they can optimize molecules directly toward the desired profile, they can speed up this time-consuming process. The purpose of this work is to show how current shortcomings in evaluating generative models for molecules can be avoided. We cover both distribution learning and goal-directed generation, with a focus on the latter. Data for three well-known targets were downloaded from ChEMBL (Bento et al. 2014): Janus kinase 2 (JAK2), epidermal growth factor receptor (EGFR), and dopamine receptor D2 (DRD2). We preprocessed the data to obtain binary classification tasks. Before a scoring function is computed, the data are split into two halves, which we refer to as split 1 and split 2, keeping the ratio of active to inactive compounds the same in both halves. Our goal is to train three bioactivity models with comparable predictive performance: one to be used as a scoring function for chemical optimization and the other two to be used as performance evaluation models. Our findings suggest that, in distribution learning, even very simple and practically useless models can attain near-perfect scores on many existing metrics. According to benchmark studies, many of the best-performing methods are likelihood-based models, and we propose that test-set likelihoods be included in future comparisons.
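
The halving step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the split is meant to preserve the active-to-inactive ratio in both halves, which is an interpretation of the abstract rather than a documented procedure. The function names `stratified_halves` and `active_ratio` and the toy labels are hypothetical and do not come from the paper or from ChEMBL.

```python
import random
from collections import defaultdict


def stratified_halves(labels, seed=0):
    """Split sample indices into two halves with (nearly) equal class ratios.

    `labels` is a sequence of 0/1 activity labels (1 = active). Returns two
    sorted index lists, referred to here as split 1 and split 2.
    """
    rng = random.Random(seed)

    # Group sample indices by class so each class can be halved separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    split1, split2 = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        half = len(indices) // 2
        split1.extend(indices[:half])
        split2.extend(indices[half:])
    return sorted(split1), sorted(split2)


def active_ratio(labels, indices):
    """Fraction of active compounds among the given indices."""
    return sum(labels[i] for i in indices) / len(indices)


if __name__ == "__main__":
    # Hypothetical toy labels: 20 actives and 180 inactives (not real data).
    labels = [1] * 20 + [0] * 180
    split1, split2 = stratified_halves(labels, seed=42)
    print(len(split1), len(split2))
    print(active_ratio(labels, split1), active_ratio(labels, split2))
```

One plausible use of such a split, which the abstract does not spell out, is to fit the scoring model on one half and at least one evaluation model on the other, so that the evaluation relies on data the optimizer never saw.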

Author Biography

  • Mani Manavalan, Capgemini America

    Sr. Manager, Capgemini America, 79 5th Avenue, Suite 300, New York, NY 10003, USA

References

Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., Kruger, F. A., Light, Y., Mak, L., McGlinchey, S., et al. (Jan. 1, 2014). The ChEMBL Bioactivity Database: An Update. In: Nucleic Acids Res 42.D1, pp. D1083–D1090.

Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., and Hopkins, A. L. (Jan. 24, 2012). Quantifying the Chemical Beauty of Drugs. In: Nat Chem 4.2, pp. 90–98.

Breiman, L. (Oct. 1, 2001). Random Forests. In: Mach Learn 45.1, pp. 5–32.

Brown, N., Fiscato, M., Segler, M. H., and Vaucher, A. C. (Mar. 25, 2019). GuacaMol: Benchmarking Models for de Novo Molecular Design. In: J Chem Inf Model 59.3, pp. 1096–1108.

Bynagari, N. B. (2016). Industrial Application of Internet of Things. Asia Pacific Journal of Energy and Environment, 3(2), 75-82. https://doi.org/10.18034/apjee.v3i2.576

Bynagari, N. B. (2017). Prediction of Human Population Responses to Toxic Compounds by a Collaborative Competition. Asian Journal of Humanity, Art and Literature, 4(2), 147-156. https://doi.org/10.18034/ajhal.v4i2.577

Bynagari, N. B. (2018). On the ChEMBL Platform, a Large-scale Evaluation of Machine Learning Algorithms for Drug Target Prediction. Asian Journal of Applied Science and Engineering, 7, 53–64. Retrieved from https://upright.pub/index.php/ajase/article/view/31

Bynagari, N. B., & Fadziso, T. (2018). Theoretical Approaches of Machine Learning to Schizophrenia. Engineering International, 6(2), 155-168. https://doi.org/10.18034/ei.v6i2.568

Cho, K., Merrienboer, B. van, Bahdanau, D., and Bengio, Y. (Oct. 7, 2014). On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.

Donepudi, P. K. (2014). Technology Growth in Shipping Industry: An Overview. American Journal of Trade and Policy, 1(3), 137-142. https://doi.org/10.18034/ajtp.v1i3.503

Donepudi, P. K. (2015). Crossing Point of Artificial Intelligence in Cybersecurity. American Journal of Trade and Policy, 2(3), 121-128. https://doi.org/10.18034/ajtp.v2i3.493

Donepudi, P. K. (2016). Influence of Cloud Computing in Business: Are They Robust? Asian Journal of Applied Science and Engineering, 5(3), 193-196. Retrieved from https://journals.abc.us.org/index.php/ajase/article/view/1181

Donepudi, P. K. (2017). Machine Learning and Artificial Intelligence in Banking. Engineering International, 5(2), 83-86. https://doi.org/10.18034/ei.v5i2.490

Donepudi, P. K. (2018). Application of Artificial Intelligence in Automation Industry. Asian Journal of Applied Science and Engineering, 7, 7–20. Retrieved from https://upright.pub/index.php/ajase/article/view/23

Douguet, D., Thoreau, E., and Grassy, G. (July 1, 2000). A Genetic Algorithm for the Automated Generation of Small Organic Molecules: Drug Design Using an Evolutionary Algorithm. In: J Comput-Aided Mol Des 14.5, pp. 449–466.

Elton, D. C., Boukouvalas, Z., Fuge, M. D., and Chung, P. W. (Aug. 5, 2019). Deep Learning for Molecular Design—a Review of the State of the Art. In: Mol Syst Des Eng 4.4, pp. 828–849.

Ertl, P. and Schuffenhauer, A. (June 10, 2009). Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. In: J Cheminformatics 1.1, p. 8.

Fadziso, T., & Manavalan, M. (2017). Identical by Descent (IBD): Investigation of the Genetic Ties between Africans, Denisovans, and Neandertals. Asian Journal of Humanity, Art and Literature, 4(2), 157-170. https://doi.org/10.18034/ajhal.v4i2.582

Gao, W. and Coley, C. W. (Apr. 6, 2020). The Synthesizability of Molecules Proposed by Generative Models. In: J Chem Inf Model.

Gomez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernandez-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (Feb. 28, 2018). Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. In: ACS Central Sci 4.2, pp. 268–276.

Guimaraes, G. L., Sanchez-Lengeling, B., Outeiral, C., Farias, P. L. C., and Aspuru-Guzik, A. (May 30, 2017). Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models. arXiv: 1705.10843.

Hochreiter, S. and Schmidhuber, J. (Dec. 1, 1997). Long Short-Term Memory. In: Neural comput 9, pp. 1735–80.

Jensen, J. H. (Mar. 20, 2019). A Graph-Based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space. In: Chem Sci 10.12, pp. 3567–3572.

Jin, W., Barzilay, R., and Jaakkola, T. (Feb. 12, 2018). Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv: 1802.04364.

Kadurin, A., Aliper, A., Kazennov, A., Mamoshina, P., Vanhaelen, Q., Khrabrov, K., and Zhavoronkov, A. (Dec. 22, 2016). The Cornucopia of Meaningful Leads: Applying Deep Adversarial Autoencoders for New Molecule Development in Oncology. In: Oncotarget 8.7, pp. 10883–10890.

Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A., and Zhavoronkov, A. (Sept. 5, 2017). druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico. In: Mol Pharm 14.9, pp. 3098–3104.

Krenn, M., Hase, F., Nigam, A., Friederich, P., and Aspuru-Guzik, A. (Mar. 4, 2020). Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. arXiv: 1905.13741.

Kusner, M. J., Paige, B., and Hernandez-Lobato, J. M. (Mar. 6, 2017). Grammar Variational Autoencoder. arXiv: 1703.01925.

Landrum, G. (2006). RDKit: Open-Source Cheminformatics. URL: http://www.rdkit.org.

LeCun, Y., Bengio, Y., and Hinton, G. (May 2015). Deep Learning. In: Nature 521.7553, pp. 436–444.

Lehman, J., Clune, J., Misevic, D., Adami, C., Altenberg, L., Beaulieu, J., Bentley, P. J., Bernard, S., Beslon, G., Bryson, D. M., et al. (Nov. 21, 2019). The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. arXiv: 1803.03453.

Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (Mar. 8, 2018). Learning Deep Generative Models of Graphs. arXiv: 1803.03324.

Manavalan, M. (2016). Biclustering of Omics Data using Rectified Factor Networks. International Journal of Reciprocal Symmetry and Physical Sciences, 3, 1–10. Retrieved from https://upright.pub/index.php/ijrsps/article/view/40

Manavalan, M. (2018). Do Internals of Neural Networks Make Sense in the Context of Hydrology? Asian Journal of Applied Science and Engineering, 7, 75–84. Retrieved from https://upright.pub/index.php/ajase/article/view/41

Manavalan, M., & Bynagari, N. B. (2015). A Single Long Short-Term Memory Network can Predict Rainfall-Runoff at Multiple Timescales. International Journal of Reciprocal Symmetry and Physical Sciences, 2, 1–7. Retrieved from https://upright.pub/index.php/ijrsps/article/view/39

Manavalan, M., & Donepudi, P. K. (2016). A Sample-based Criterion for Unsupervised Learning of Complex Models beyond Maximum Likelihood and Density Estimation. ABC Journal of Advanced Research, 5(2), 123-130. https://doi.org/10.18034/abcjar.v5i2.581

Mayr, A., Klambauer, G., Unterthiner, T., Steijaert, M., Wegner, J. K., Ceulemans, H., Clevert, D.-A., and Hochreiter, S. (June 6, 2018). Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL. In: Chem Sci.

Merk, D., Friedrich, L., Grisoni, F., and Schneider, G. (Jan. 2018). De Novo Design of Bioactive Small Molecules by Artificial Intelligence. In: Mol Inform 37.1-2.

Merk, D., Grisoni, F., Friedrich, L., and Schneider, G. (Oct. 22, 2018). Tuning Artificial Intelligence on the de Novo Design of Natural-Product-Inspired Retinoid X Receptor Modulators. In: Nat Commun Chem 1.1, pp. 1–9.

Neogy, T. K., & Bynagari, N. B. (2018). Gradient Descent is a Technique for Learning to Learn. Asian Journal of Humanity, Art and Literature, 5(2), 145-156. https://doi.org/10.18034/ajhal.v5i2.578

Nguyen, A., Yosinski, J., and Clune, J. (Apr. 2, 2015). Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images. arXiv: 1412.1897.

O’Boyle, N. and Dalke, A. (Sept. 19, 2018). DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures. chemrxiv: 7097960.v1.

Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H. (Sept. 4, 2017). Molecular De-Novo Design through Deep Reinforcement Learning. In: J Cheminformatics 9.1, p. 48.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-Learn: Machine Learning in Python. In: J Mach Learn Res 12.85, pp. 2825–2830.

Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., et al. (Nov. 29, 2018). Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. arXiv: 1811.12823.

Popova, M., Isayev, O., and Tropsha, A. (July 1, 2018). Deep Reinforcement Learning for de Novo Drug Design. In: Sci Adv 4.7, eaap7885.

Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S., and Klambauer, G. (Sept. 24, 2018). Frechet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery. In: J Chem Inf Model 58.9, pp. 1736–1741.

Rogers, D. and Hahn, M. (May 24, 2010). Extended-Connectivity Fingerprints. In: J Chem Inf Model 50.5, pp. 742–754.

Sanchez-Lengeling, B. and Aspuru-Guzik, A. (July 27, 2018). Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering. In: Science 361.6400, pp. 360–365.

Sanchez-Lengeling, B., Outeiral, C., Guimaraes, G. L., and Aspuru-Guzik, A. (Aug. 17, 2017). Optimizing Distributions over Molecular Space. An Objective-Reinforced Generative Adversarial Network for Inverse-Design Chemistry (ORGANIC). chemrxiv: 5309668.v3.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (Jan. 2009). The Graph Neural Network Model. In: IEEE Trans Neural Netw 20.1, pp. 61–80.

Schmidhuber, J. (Jan. 1, 2015). Deep Learning in Neural Networks: An Overview. In: Neural Netw 61, pp. 85–117.

Schneider, G. (2013). De Novo Molecular Design. John Wiley & Sons, Ltd.

Segler, M. H. S., Kogej, T., Tyrchan, C., and Waller, M. P. (Jan. 24, 2018). Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. In: ACS Central Sci 4.1, pp. 120–131.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (Feb. 19, 2014). Intriguing Properties of Neural Networks. arXiv: 1312.6199.

Venkatasubramanian, V., Chan, K., and Caruthers, J. M. (Sept. 1, 1994). Computer-Aided Molecular Design Using Genetic Algorithms. In: Comput Chem Eng 18.9, pp. 833–844.

Weininger, D. (Feb. 1, 1988). SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. In: J Chem Inf Comput Sci 28.1, pp. 31–36.

You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. (Feb. 24, 2019). Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. arXiv: 1806.02473.

Zhang, C., Lyu, X., Huang, Y., Tang, Z., and Liu, Z. (Nov. 18, 2019). Molecular Graph Generation with Deep Reinforced Multitask Network and Adversarial Imitation Learning. In: IEEE Int Conf Bioinform Biomed.

Zhavoronkov, A., Ivanenkov, Y. A., Aliper, A., Veselov, M. S., Aladinskiy, V. A., Aladinskaya, A. V., Terentiev, V. A., Polykovskiy, D. A., Kuznetsov, M. D., Asadulaev, A., et al. (Sept. 2019). Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors. In: Nat Biotechnol 37.9, pp. 1038–1040.

Zhou, Z., Kearnes, S., Li, L., Zare, R. N., and Riley, P. (Oct. 19, 2018). Optimization of Molecules via Deep Reinforcement Learning. arXiv: 1810.08678.

Published

2021-09-07

Section

Peer-reviewed Article

How to Cite

Manavalan, M. (2021). Molecular Generators and Optimizers Failure Modes. Malaysian Journal of Medical and Biological Research, 8(2), 53-62. https://doi.org/10.18034/mjmbr.v8i2.583