AI Webinar • May 26th, 2021 • 7:00 PM CET

Efficient Transformers

Fireside chat with Transformer inventor Lukasz Kaiser

Transformer models are used in a variety of fields and yield great results on many NLP tasks. But between BERT, GPT-3, and their many variants, these models can be inefficient and hard to apply.

Pi Campus CEO Marco Trombetti will interview Transformer inventor Lukasz Kaiser about a new, efficient variant of the Transformer. Lukasz will walk us through the main methods needed for efficiency and show how they address the problems that previously limited the use of some Transformer models. Finally, they'll discuss future applications of these techniques.
Lukasz Kaiser
Staff Research Scientist at Google and CNRS

OUR GUEST

About Lukasz Kaiser

Lukasz is a deep learning scientist working on sequence-to-sequence neural network models as part of the Google Brain team. He is a co-author of the Transformer, one of the most important neural network architectures of today.

He has also been one of the main developers of Trax, part of the TensorFlow deep learning ecosystem, which is dedicated to accelerating ML research by providing reference implementations of many algorithms, along with datasets that make it easy to reproduce published results.

He demonstrated that neural networks can learn complex discrete algorithms, such as long multiplication, just from examples. He has also worked on natural language processing and built state-of-the-art NLP systems for many tasks, including translation and summarization.

6 Takeaways You'll Get
From Attending This Event

Making Transformers memory efficient without sacrificing accuracy
Fine-tuning state-of-the-art models without datacenter-scale hardware resources
Adapting Transformers to run on sequences of over 1 million tokens on a single GPU or TPU device
Replacing dot-product attention with one that uses locality-sensitive hashing
Using reversible residual layers to store activations only once during training
Applying Transformer efficiency techniques to new use-case scenarios
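As a taste of the reversible-layer idea in the takeaways above, here is a minimal NumPy sketch of a RevNet-style coupling: because each layer's inputs can be recomputed exactly from its outputs, activations need not be stored for every layer during training. The functions `F` and `G` below are hypothetical stand-ins for the attention and feed-forward sublayers, not the actual implementation discussed in the webinar.

```python
import numpy as np

# Hypothetical stand-ins for the attention and feed-forward sublayers.
def F(x):
    return np.tanh(x)

def G(x):
    return 0.5 * x

def reversible_forward(x1, x2):
    # Reversible coupling: split the activations into two streams.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_backward(y1, y2):
    # Recompute the inputs from the outputs -- no stored activations needed.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Round-trip check: the inputs are recovered exactly from the outputs.
x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_backward(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

Because the backward pass reconstructs inputs on the fly, memory cost stays constant in depth instead of growing with the number of layers.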