Mon, 28 Apr 2025 11:30 AM - Mon, 2 Jun 2025 12:30 PM
Lecture series
This 16-lecture series will explain how language-model systems are built, so that their behaviour can be better understood and predicted. Frontier language models are now being used as the foundation for agentic systems, which can carry out tasks that require extended reasoning and long-horizon planning. We investigate the potential safety and security risks associated with such systems, and present current research directions that aim to mitigate them.
The series is designed to be accessible to a broad audience across academia and industry, and assumes only the knowledge covered in an introductory course in machine learning or statistics (e.g. backpropagation). We emphasise conceptual understanding of such systems, but will discuss technical details where necessary.
We hope that the course will empower researchers to make better use of language model systems and inform deployment across academia and industry. We also hope to stimulate engagement with the serious risks associated with intelligent systems, and encourage further work to address them.
Stay in touch
If you would like to be informed of any updates regarding the course, please sign up to the course mailing list via the form below.
Time and location
All lectures will be held from 11.30-12.30 in MR4, Centre for Mathematical Sciences.
Part I. What is a Language Model?
1. Introduction to Language Models (Monday April 28th)
2. The Transformer Architecture (Wednesday April 30th)
3. Scaling Laws (Friday May 2nd)
Part II. Crafting Agentic Systems
4. Post-Training (Monday May 5th)
5. Reinforcement Learning for Language Models (Wednesday May 7th)
6. Reward Modelling (Friday May 9th)
7. Agents and Agent Architectures (Monday May 12th)
Part III. Agentic Behaviour
8. Optimisation and Reasoning (Wednesday May 14th)
9. Reward Hacking and Goal Misgeneralisation (Friday May 16th)
10. Out-of-Context Reasoning and Situational Awareness (Monday May 19th)
11. Deceptive Alignment and Alignment Faking (Wednesday May 21st)
Part IV. Frontiers
12. Threat Modelling, Safety Cases, and Systemic Risk (Friday May 23rd)
13. Evaluations (Monday May 26th)
14. AI Control (Wednesday May 28th)
15. AI Orgs and Agendas (Friday May 30th)
16. The Future of Language Models (Monday June 2nd)
