We're excited to have Jacques Thibodeau visiting us from Montreal. He's an independent AI Alignment researcher and will be telling us about two research agendas.

Accelerating Alignment:

Those of us working in AI safety/alignment are not Luddites; on the contrary, we'll need to make the best possible use of AI tools - in particular large language models (LLMs) - to help us do our work.

Jacques aims to create an Alignment Research Assistant using LLMs, to serve as the foundation for the ambitious goal of increasing alignment researcher productivity by 10-100x during crunch time. Can it be done? Let's find out!

Supervising AIs that are improving AIs:

Many researchers, both inside and outside the Safety/Alignment space, are interested in the prospect of AIs helping to design their own successors - a particular kind of automated science. How can this be done safely? Jacques hopes to find out.

There are two main ways an AI can improve itself: improvements to the model architecture, and improvements to the data. Jacques' research agenda focuses on data-driven improvements, as these are the most likely to result in (potentially undesirable) behaviour changes.

This is super exciting and important stuff, so we hope you can make it to hear about and discuss these critical research agendas.

We'll go to the pub after! 🍻