TL;DR
This week, we cover a blog post on the limitations of LLMs for planning and a recent article that introduces a deep learning framework for translating mass spectrometry data into molecular structures.
Limitations of LLMs for Planning
This recent article provides a fascinating discussion of the limitations of LLMs for reasoning and planning. In short, LLMs are not yet capable of planning effectively. Methods like chain-of-thought prompting depend on human-in-the-loop assistance, which often implicitly leverages the fact that the human prompter already knows the answer. As I mentioned in last week’s post, we need more critical studies that probe the limitations of LLMs.
Translating Mass Spectra to De-novo Molecules Using a Deep Learning Framework
This recent article proposes Spec2Mol, a new deep learning framework that predicts the molecular structures of novel chemical compounds by translating MS/MS spectra, without relying solely on database retrieval. Spec2Mol is based on an encoder-decoder model inspired by Speech2Text. The decoder is trained as part of an auto-encoder that reconstructs molecules from their SMILES sequences, while the spectrum encoder is subsequently trained so that its spectra embeddings match the molecule embeddings learned by the auto-encoder.
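To make the two-stage training idea concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the GRU auto-encoder, the small MLP standing in for the spectrum encoder, the MSE alignment loss, all sizes, and the random toy data are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# All sizes below are hypothetical and chosen only to keep the sketch small.
VOCAB, EMB, SPEC_BINS, BATCH, SEQ_LEN = 16, 32, 64, 8, 12


class SmilesAutoEncoder(nn.Module):
    """Toy sequence auto-encoder over SMILES token ids."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.enc = nn.GRU(EMB, EMB, batch_first=True)
        self.dec = nn.GRU(EMB, EMB, batch_first=True)
        self.out = nn.Linear(EMB, VOCAB)

    def encode(self, tokens):
        _, h = self.enc(self.embed(tokens))
        return h.squeeze(0)  # (batch, EMB) molecule embedding

    def decode(self, z, tokens_in):
        # The embedding seeds the decoder's hidden state (teacher forcing).
        hidden, _ = self.dec(self.embed(tokens_in), z.unsqueeze(0))
        return self.out(hidden)  # (batch, seq, VOCAB) next-token logits


def train_autoencoder(ae, tokens, steps=50):
    """Stage 1: learn to reconstruct SMILES sequences; no spectra needed."""
    opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(steps):
        logits = ae.decode(ae.encode(tokens), tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


def train_spectrum_encoder(spec_enc, ae, spectra, tokens, steps=50):
    """Stage 2: fit spectrum embeddings to the frozen molecule embeddings."""
    opt = torch.optim.Adam(spec_enc.parameters(), lr=1e-2)
    with torch.no_grad():
        target = ae.encode(tokens)  # the auto-encoder is not updated here
    losses = []
    for _ in range(steps):
        loss = nn.functional.mse_loss(spec_enc(spectra), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


# Random toy data standing in for paired (spectrum, SMILES) examples.
tokens = torch.randint(0, VOCAB, (BATCH, SEQ_LEN))
spectra = torch.rand(BATCH, SPEC_BINS)

ae = SmilesAutoEncoder()
spec_enc = nn.Sequential(nn.Linear(SPEC_BINS, 64), nn.ReLU(), nn.Linear(64, EMB))

ae_losses = train_autoencoder(ae, tokens)
align_losses = train_spectrum_encoder(spec_enc, ae, spectra, tokens)

# A spectrum embedding can now seed the pre-trained decoder. Teacher forcing
# is used here only to check shapes; real use would decode token by token.
with torch.no_grad():
    predicted_logits = ae.decode(spec_enc(spectra), tokens[:, :-1])
```

The point of the split is that stage 1 needs only molecules, so it can use a large unlabeled corpus, while stage 2 needs the much scarcer paired spectra.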
In traditional approaches to structure elucidation, query spectra are typically matched against spectra available in existing databases. However, such approaches can fail to identify novel molecules whose structures are not yet present in databases. Spec2Mol overcomes this limitation by leveraging unsupervised pre-training on a large dataset of unlabeled molecules (135 million molecules sourced from PubChem and ZINC). This design allows Spec2Mol to make de-novo molecular structure recommendations. The authors claim that it accurately identifies key substructures like rings, long chains, and rare atoms in molecular structures based on spectral features, but challenges remain in predicting structures with large rings and in handling poor-quality spectra.
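For readers less familiar with the database-matching baseline mentioned above, here is a minimal pure-Python sketch of spectral library search using cosine similarity on binned peaks. The compound names, peak lists, bin width, and score threshold are all made up for illustration, and real search tools use more refined scoring.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_peaks, library, threshold=0.7):
    """Return (name, score) of the best library hit, or (None, score)."""
    query = bin_spectrum(query_peaks)
    scored = [
        (cosine_similarity(query, bin_spectrum(peaks)), name)
        for name, peaks in library.items()
    ]
    score, name = max(scored)
    return (name, score) if score >= threshold else (None, score)


# Hypothetical reference library with invented peaks.
library = {
    "compound_A": [(91.05, 100.0), (119.08, 40.0), (147.04, 15.0)],
    "compound_B": [(77.04, 60.0), (105.03, 100.0), (122.04, 30.0)],
}

# A noisy re-measurement of compound_A is retrieved.
known_query = [(91.06, 95.0), (119.09, 45.0), (147.05, 10.0)]
known_hit = best_match(known_query, library)

# A molecule absent from the library shares no peaks, so nothing is returned.
novel_query = [(58.07, 100.0), (86.10, 55.0), (203.12, 20.0)]
novel_hit = best_match(novel_query, library)
```

The second query shows the failure mode: when a compound is not in the library, the best a retrieval method can do is report no confident match, which is the gap a generative approach aims to fill.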
The authors anticipate that Spec2Mol could prove useful for metabolomic studies, where there are many unidentified chemical species whose spectra are not yet in databases. The use of unsupervised representations of molecules to enable working with unknown structures is clever and is likely to have broader impact in the field. However, Spec2Mol is trained on a commercial dataset of MS data, so uptake may be limited for now to institutions that can procure a license. The authors have released a GitHub repo with a PyTorch implementation of their method.
Interesting Links from Around the Web
https://spectrum.ieee.org/explosive-robot-insect: A fascinating jumping insect-like robot powered by combustion
https://www.quantamagazine.org/machine-learning-aids-classical-modeling-of-quantum-systems-20230914/: A nice overview of recent methods applying machine learning to model quantum systems.
https://www.quantamagazine.org/physicists-observe-unobservable-quantum-phase-transition-20230911/: Phase transitions in entanglement states. One of the most interesting popular science articles I’ve read recently.
https://apnews.com/article/pig-kidney-transplant-xenotransplant-83dfb5e6d022ca72039a821cc6bc00ef: A pig kidney worked for two months in the donated body of a deceased patient. We are slowly moving towards practical xenotransplants.
Feedback and Comments
Please feel free to email me directly (bharath@deepforestsci.com) with your feedback and comments!
About
Deep Into the Forest is a newsletter by Deep Forest Sciences, Inc. We’re a deep tech R&D company building Chiron, an AI-powered scientific discovery engine for the biotech/pharma industries. Deep Forest Sciences leads the development of the open source DeepChem ecosystem. Partner with us to apply our foundational AI technologies to hard real-world problems in drug discovery. Get in touch with us at partnerships@deepforestsci.com!
Credits
Author: Bharath Ramsundar, Ph.D.
Editor: Sandya Subramanian, Ph.D.
Research and Writing: Rida Irfan