stuhlmueller

Why OpenAI’s Structure Must Evolve To Advance Our Mission

The section "The Future": > As we enter 2025, we will have to become more than a lab and a startup — we have to become an enduring company. The Board’s objectives as it considers, in consultation with outside legal and financial advisors, how to best structure OpenAI to advance...

Dec 28, 2024 · 19

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates

by Charlie George, justin_dan, and stuhlmueller

Introduction: The AI safety debate agenda, proposed by Irving et al. (2018), explores using debates between AI agents to ensure truthful answers from advanced systems. Recently, three key debate settings have been studied with LLMs. Information asymmetric debates: debaters have access to information unavailable to the judge. This hidden information...

Aug 27, 2024 · 23

Discovering alignment windfalls reduces AI risk

by goodgravy and stuhlmueller

Some approaches to AI alignment incur upfront costs to the creator (an “alignment tax”). In this post, I discuss “alignment windfalls” which are strategies that tend towards the long-term public good at the same time as reaping short-term benefits for a company. My argument, in short: 1. Just as there...

Feb 28, 2024 · 15

A Library and Tutorial for Factored Cognition with Language Models

We want to advance process-based supervision for language models. To make it easier for others to contribute to that goal, we're sharing code for writing compositional language model programs, and a tutorial that explains how to get started: * The Interactive Composition Explorer (ICE) is a library for writing and...

Sep 28, 2022 · 47

Ought will host a factored cognition “Lab Meeting”

by jungofthewon and stuhlmueller

Ought will host a factored cognition “Lab Meeting” on Friday September 16 from 9:30AM - 10:30AM PT. We'll share the progress we've made using language models to decompose reasoning tasks into subtasks that are easier to perform and evaluate. This is part of our work on supervising process, not outcomes....

Sep 9, 2022 · 35

Prize for Alignment Research Tasks

Can AI systems substantially help with alignment research before transformative AI? People disagree. Ought is collecting a dataset of alignment research tasks so that we can (1) make progress on the disagreement and (2) guide AI research towards helping with alignment. We’re offering a prize of $200-$2000 for each contribution to...

Apr 29, 2022 · 64

Elicit: Language Models as Research Assistants

Ought is an applied machine learning lab. We’re building Elicit, the AI research assistant. Our mission is to automate and scale open-ended reasoning. To get there, we train language models by supervising reasoning processes, not outcomes. This is better for reasoning capabilities in the short run and better for alignment...

Apr 9, 2022 · 73

Supervise Process, not Outcomes

Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns
