tilmanr

LESSWRONG
LW

tilmanr — LessWrong

14d

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

tilmanr14dQuick Take

If we want more talented people in AI safety we should focus on creating new programs. I think this might be for most readers easily defendable, but I quickly want to write my thoughts out.

Even ambitious scaling of MATS, Astra, and similar programs won't produce enough AI safety researchers to solve the field's core problems.

If the programs scaled aggressively, I think there are still lots of potential contributors that would not apply to them. From experience talking with ambitious and driven people who could work in AI safety, the reasons can be logistically, like not wanting to relocate, not wanting part/full-time work, financial security, etc. but they can also be subject specific,... (read 400 more words →)

-2

2025 Q1 Pivotal Research Fellowship (Technical & Policy)

Tobias H

Tobias H, tilmanr

We’re excited to announce that applications are now open for our 2025 Q1 Pivotal Research Fellowship, a 9-week program designed to enable promising researchers to produce impactful research and accelerate their careers in technical AI safety, AI governance, and biosecurity.

About the Fellowship

The Pivotal Research Fellowship is hosted in London at the London Initiative for Safe AI (LISA). It offers a unique opportunity for early-career researchers to collaborate with experienced mentors, engage in workshops and seminars, and build a strong network within the AI safety research community in London and beyond.

Dates: February 3rd to April 4th, 2025
Application Deadline: November 21st, 2024
Apply here.

Fellows receive:

Direct mentorship from established researchers
Access to LISA, working alongside leading researchers in AI

... (read 308 more words →)

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

Tobias H

Tobias H, tilmanr

The Swiss Existential Risk Initiative (CHERI) is now called Pivotal Research, and the CHERI research fellowship is now the Pivotal Research Fellowship.

Apply for the Pivotal Research Fellowship this summer in London to research global catastrophic risks (GCR) with experienced mentors on technical AI safety, AI governance, biosecurity & pandemic preparedness. (Deadline: April 21, 2024)

Research Fellowship

The Pivotal Research Fellowship will take place in London from July 1 to August 30, 2024. In our fourth research fellowship, we offer a 9-week program providing fellows with experienced mentors and research managers. Accepted applicants will have the opportunity to work full-time on GCR reduction, focusing on emerging technologies: we look forward to hosting fellows working on technical AI safety,... (read 264 more words →)

Understanding mesa-optimization using toy models

tilmanr

tilmanr, rusheb, Guillaume Corlouer, Dan Valentine, afspies, mivanitskiy, Can

Overview

Solving the problem of mesa-optimization would probably be easier if we understood how models do search internally
We are training GPT-type models on the toy task of solving mazes and studying them in both a mechanistic interpretability and behavioral context.
This post lays out our model training setup, hypotheses we have, and the experiments we are performing and plan to perform. Experimental results will be forthcoming in our next post.
We invite members of the LW community to challenge our hypotheses and the potential relevance of this line of work. We will follow up soon with some early results^[1]. Our main source code is open source, and we are open to collaborations.

Introduction

Some threat models of... (read 2943 more words →)

Replying toHow harmful are improvements in AI? + Poll

tilmanr4y

How harmful are improvements in AI? + Poll

Thank you for giving more context to EleutherAI's stance on acceleration and linking to your newest paper.

I support the claim that your open model contributes to AI safety research, and I generally agree with the improvements for the alignment landscape. I can also understand why you are not detailing possible failure modes of realising LLM, as this would basically be stating a bunch of infohazards.
But at least for me, this opens the space for discussing until which point to open up previously closed models for the sake of alignment research. If an aligned researcher can benefit from access, so could a non-aligned researcher, hence the " accidental acceleration."

How harmful are improvements in AI? + Poll

tilmanr

tilmanr, Marius Hobbhahn

This post was written by Marius Hobbhahn and Tilman Räuker.

Disclaimer: We have previously posted this piece on the EA forum. We now post it here because LW allows for polls, and we have worked in additional feedback.

Over the last years, we have encountered stances on development within AI from “all forms of development in AI is bad” over “sometimes development can be justified” to “If it means control over AI technologies, development could be a necessary evil.” We think it is essential to be aware of these different perspectives of development in AI and be able to deduct where other researchers/organizations stand. This post will look at different considerations and trade-offs regarding... (read 2304 more words →)

LESSWRONG
LW

LESSWRONG
LW

Understanding mesa-optimization using toy models

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

How harmful are improvements in AI? + Poll

2025 Q1 Pivotal Research Fellowship (Technical & Policy)

tilmanr

tilmanr

tilmanr's Shortform

2025 Q1 Pivotal Research Fellowship (Technical & Policy)

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

Understanding mesa-optimization using toy models

How harmful are improvements in AI? + Poll

tilmanr

Understanding mesa-optimization using toy models

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

How harmful are improvements in AI? + Poll

2025 Q1 Pivotal Research Fellowship (Technical & Policy)

tilmanr

tilmanr

tilmanr's Shortform

2025 Q1 Pivotal Research Fellowship (Technical & Policy)

Apply to the Pivotal Research Fellowship (AI Safety & Biosecurity)

Understanding mesa-optimization using toy models

How harmful are improvements in AI? + Poll

About the Fellowship

Research Fellowship

Overview

Introduction