21 March

Resurrecting Recurrent Neural Networks for Language Modelling

Location
The Buttery
Speakers
Dr. Razvan Pascanu (Google DeepMind)
Booking Required
Not Required
Accessibility
There is provision for wheelchair users.
Bio:

I'm currently a Research Scientist at DeepMind. I grew up in Romania and studied computer science and electrical engineering as an undergraduate in Germany. I received my MSc from Jacobs University Bremen in 2009 and hold a PhD from the University of Montreal (2014), which I completed under the supervision of Prof. Yoshua Bengio. I was involved in developing Theano and helped write some of the deep learning tutorials for it. I've published several papers on topics surrounding deep learning and deep reinforcement learning (see my scholar page). I'm one of the organizers of EEML (www.eeml.eu) and of AIRomania; as part of the AIRomania community, I have organized RomanianAIDays since 2020 and helped build a course on AI aimed at high school students.



Abstract:

In this talk I will focus on State Space Models (SSMs), a subclass of Recurrent Neural Networks (RNNs) that has recently gained attention through works like Mamba, which obtain strong performance against transformer baselines. I will start by explaining how SSMs can be viewed as a particular parametrization of RNNs and which crucial differences from previous recurrent architectures led to these results. My goal is to demystify the relatively complex parametrization of the architecture and identify which elements are needed for the model to perform well. In the process, I will introduce the Linear Recurrent Unit (LRU), a simplified linear layer inspired by existing SSM layers. In the second part of the talk, I will focus on language modelling and the block structure in which such layers tend to be embedded. I will argue that, beyond the recurrent layer itself, the block structure borrowed from transformers plays a crucial role in the recent successes of this architecture, and present results at scale for well-performing hybrid recurrent architectures compared to strong transformer baselines. I will close the talk with a few open questions and thoughts on the importance of recurrence in modern deep learning models.
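
For readers unfamiliar with the layer the abstract refers to, the sketch below is a rough NumPy illustration of a diagonal linear recurrence in the spirit of the Linear Recurrent Unit. It is an assumption-based outline, not the speaker's implementation: the parameter names (nu_log, theta_log, gamma_log, B, C, D), shapes, and the sequential scan are illustrative choices.

```python
import numpy as np

def lru_sketch(u, nu_log, theta_log, gamma_log, B, C, D):
    """Minimal sketch of a diagonal linear recurrence (LRU-style).

    u:         (T, H) real input sequence
    nu_log,
    theta_log: (N,) parametrize stable complex eigenvalues of the recurrence
    gamma_log: (N,) input normalization
    B: (N, H) complex input matrix, C: (H_out, N) complex output matrix
    D: (H_out, H) real skip matrix
    """
    # Diagonal recurrence "matrix": |lam| = exp(-exp(nu_log)) < 1, so the
    # recurrence is stable by construction.
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))   # (N,)
    gamma = np.exp(gamma_log)                                 # (N,)

    x = np.zeros(lam.shape[0], dtype=np.complex128)           # hidden state
    ys = []
    for t in range(u.shape[0]):
        # Linear state update: no tanh/sigmoid nonlinearity on the recurrence.
        x = lam * x + gamma * (B @ u[t])
        # Real-valued readout plus a direct skip connection from the input.
        ys.append((C @ x).real + D @ u[t])
    return np.stack(ys)                                       # (T, H_out)

# Example with random parameters (shapes are illustrative only).
rng = np.random.default_rng(0)
T, H, N = 16, 4, 8
u = rng.standard_normal((T, H))
y = lru_sketch(
    u,
    nu_log=rng.standard_normal(N),
    theta_log=rng.standard_normal(N),
    gamma_log=rng.standard_normal(N),
    B=(rng.standard_normal((N, H)) + 1j * rng.standard_normal((N, H))) / np.sqrt(2 * H),
    C=(rng.standard_normal((H, N)) + 1j * rng.standard_normal((H, N))) / np.sqrt(N),
    D=rng.standard_normal((H, H)),
)
```

Because the recurrence is linear and diagonal, in practice it can be evaluated with a parallel scan or as a convolution over the sequence; the sequential loop above is kept only for clarity.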