Deep into the Forest


Transformers, transformers, transformers

Estimated Reading Time: 5 minutes

Bharath Ramsundar
May 13

TL;DR

Today’s issue reviews a few advances in large transformer models from DeepMind and Google Research this week. Large transformers can now learn across a very broad range of tasks (visual, NLP, robotic control, arcade games, and more) using one joint training regimen. In addition, advances in pre-training strategies may allow for considerable additional performance improvements. Together, these advances raise the possibility that large transformers could prove a durable substrate for intelligent agents.

Gato: A Generalist Agent?

DeepMind has released a new paper on Gato, a multi-modal, multi-task transformer. This system trains one agent to execute a broad range of tasks by exploiting the ability of transformer architectures to model text, images, and actions within a single sequence of tokens.
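
To make the idea concrete, here is a minimal sketch of how text, image patches, and continuous actions can all be flattened into one token sequence for a single transformer to model. The vocabulary sizes, offsets, and discretization below are illustrative assumptions, not DeepMind’s actual code.

    import numpy as np

    # Illustrative sketch (not DeepMind's implementation): text, image patches,
    # and discrete actions are all mapped into one shared token space so that a
    # single transformer can be trained on all of them autoregressively.
    TEXT_VOCAB = 32_000          # hypothetical text tokenizer size
    NUM_BINS = 1024              # bins for discretizing continuous values

    def tokenize_text(token_ids):
        # Text is already discrete; keep its ids in the shared vocabulary.
        return list(token_ids)

    def tokenize_image(image, patch=16):
        # Split an image into patches; in practice each patch would be embedded
        # directly, but here we just record placeholder patch indices.
        h, w, _ = image.shape
        return [TEXT_VOCAB + i for i in range((h // patch) * (w // patch))]

    def tokenize_actions(actions):
        # Discretize continuous actions in [-1, 1] into NUM_BINS bins and offset
        # them into their own region of the shared vocabulary (a simplification).
        bins = np.clip(((actions + 1) / 2 * NUM_BINS).astype(int), 0, NUM_BINS - 1)
        return [TEXT_VOCAB + 10_000 + int(b) for b in bins]

    # One "episode" becomes a single flat sequence, regardless of which
    # modality each token came from.
    sequence = (
        tokenize_text([17, 254, 9])                 # e.g. a task prompt
        + tokenize_image(np.zeros((64, 64, 3)))     # an observation
        + tokenize_actions(np.array([0.3, -0.7]))   # the action taken
    )
    print(len(sequence), sequence[:5])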

https://www.deepmind.com/publications/a-generalist-agent

DeepMind plots Gato’s performance against expert performance on a broad range of tasks. There’s a fairly steep drop-off: many tasks don’t exceed 60% of expert performance, suggesting there’s a lot of room for improvement. However, DeepMind’s analysis of scaling behavior suggests that Gato consistently improves with additional parameters. The current model has only about 1.2 billion parameters, so a larger model could potentially do much better.

https://www.deepmind.com/publications/a-generalist-agent

Unifying Language Learning Paradigms

Google Research has an intriguing new preprint out proposing a unified pre-training objective for language models that combines multiple previous objectives from the literature. The resulting model sets state of the art (SOTA) on an astounding 50 NLP tasks! Much of the theory of transformer systems is still quite obscure, which means that clever tricks can yield unexpectedly impactful results.
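
As a rough illustration of what a unified objective can look like, the sketch below samples one of several denoising regimes per training example and asks the model to reconstruct the missing text. The regime names, corruption rates, and span logic are assumptions for illustration, not the paper’s exact settings.

    import random

    # Illustrative sketch (not Google's implementation) of a mixture of
    # denoising objectives: each example is corrupted under one of several
    # regimes, and the model must reconstruct what was removed.
    DENOISERS = [
        {"name": "short_spans", "mode": "span",   "corrupt_rate": 0.15},
        {"name": "long_spans",  "mode": "span",   "corrupt_rate": 0.50},
        {"name": "prefix_lm",   "mode": "prefix", "corrupt_rate": 0.25},
    ]

    def corrupt(tokens, denoiser, rng):
        if denoiser["mode"] == "prefix":
            # Prefix-LM style: keep a prefix and predict the remaining suffix.
            split = int(len(tokens) * (1 - denoiser["corrupt_rate"]))
            return tokens[:split] + ["<X>"], tokens[split:]
        # Span-corruption style: blank out a contiguous span and predict it.
        n_masked = max(1, int(len(tokens) * denoiser["corrupt_rate"]))
        start = rng.randrange(0, len(tokens) - n_masked + 1)
        inputs = tokens[:start] + ["<X>"] + tokens[start + n_masked:]
        targets = tokens[start:start + n_masked]
        return inputs, targets

    rng = random.Random(0)
    tokens = "the quick brown fox jumps over the lazy dog".split()
    denoiser = rng.choice(DENOISERS)   # one objective is sampled per example
    inputs, targets = corrupt(tokens, denoiser, rng)
    print(denoiser["name"], inputs, targets)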

https://arxiv.org/abs/2205.05131

Is AGI Imminent?

Gato has spurred discussion of whether AGI is imminent (see the discussion on Hacker News). While fears of runaway intelligence may be premature, it is striking that scaled-up transformers can solve tasks across such a wide variety of domains. Combined with advances in hardware and new pre-training regimes, we may soon see large transformer architectures capable of controlling standalone robotic agents.

Job Board

These postings are for companies we personally find interesting (we are not currently accepting paid ads). If you would like your company’s jobs to be featured, send us a brief blurb to review.

  • No listings for this week but your company could be featured here!

Feedback and Comments

Please feel free to email me directly (bharath@deepforestsci.com) with your feedback and comments! 

About

Deep Into the Forest is a newsletter by Deep Forest Sciences, Inc. We’re a deep tech R&D company building Chiron, an AI-powered scientific discovery engine. Deep Forest Sciences leads the development of the open source DeepChem ecosystem. Partner with us to apply our foundational AI technologies to hard real-world problems. Get in touch with us at partnerships@deepforestsci.com! 

Credits

Author: Bharath Ramsundar, Ph.D.

Editor: Sandya Subramanian, Ph.D.
