Boaz Barak

LESSWRONG
LW

Boaz Barak — LessWrong

15d

[I work on the alignment team at OpenAI. However, these are my personal thoughts, and do not reflect those of OpenAI. Cross posted on WindowsOnTheory]

I have read with great interest Claude’s new constitution. It is a remarkable document which I recommend reading. It seems natural to compare this constitution to OpenAI’s Model Spec, but while the documents have similar size and serve overlapping roles, they are also quite different.

The OpenAI Model Spec is a collection of principles and rules, each with a specific authority. In contrast, while the name evokes the U.S. Constitution, the Claude Constitution has a very different flavor. As the document says: “the sense we’re reaching for is closer to what “constitutes”... (read 2141 more words →)

Why we are excited about confession!

Boaz Barak

Boaz Barak, Gabriel Wu, Manas Joglekar

1mo

Boaz Barak, Gabriel Wu, Jeremy Chen, Manas Joglekar

[Linkposting from the OpenAI alignment blog, where we post more speculative/technical/informal results and thoughts on safety and alignment.]

TL;DR We go into more details and some follow up results from our paper on confessions (see the original blog post). We give deeper analysis of the impact of training, as well as some preliminary comparisons to chain of thought monitoring.

We have recently published a new paper on confessions, along with an accompanying blog post. Here, we want to share with the research community some of the reasons why we are excited about confessions as a direction of safety, as well as some of its limitations. This blog... (read 2407 more words →)

138

You will be OK

Boaz Barak

1mo

Seeing this post and its comments made me a bit concerned for young people around this community. I thought I would try to write down why I believe most folks who read and write here (and are generally smart, caring, and knowledgable) will be OK.

I agree that our society often is under prepared for tail risks. As a general planner, you should be worrying about potential catastrophes even if their probability is small. However as an individual, if there is a certain probability X of doom that is beyond your control, it is best to focus on the 1-X fraction of the probability space that you control rather than constantly worrying about... (read 1150 more words →)

•••

Thoughts by a non-economist on AI and economics

Boaz Barak

3mo

[Crossposted on Windows In Theory]

“Modern humans first emerged about 100,000 years ago. For the next 99,800 years or so, nothing happened. Well, not quite nothing. There were wars, political intrigue, the invention of agriculture -- but none of that stuff had much effect on the quality of people's lives. Almost everyone lived on the modern equivalent of $400 to $600 a year, just above the subsistence level …

Then -- just a couple of hundred years ago, maybe 10 generations -- people started getting richer. And richer and richer still. Per capita income, at least in the West, began to grow at the unprecedented rate of about three quarters of a percent per... (read 4131 more words →)

A non-review of "If Anyone Builds It, Everyone Dies"

Boaz Barak

4mo

I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED Yudkowsky and Soares) but realized I won't have time to do it. So here are my quick impressions/responses to IABIED. I am writing this rather quickly and it's not meant to cover all arguments in the book, nor to discuss all my views on AI alignment; see six thoughts on AI safety and Machines of Faithful Obedience for some of the latter.

First, I like that the book is very honest, both about the authors' fears and predictions, as well as their policy prescriptions. It is tempting to practice strategic deception, and even if you believe that... (read 958 more words →)

124

•••

Learnings from AI safety course so far

Boaz Barak

5mo

I have been teaching CS 2881r: AI safety and alignment this semester. While I plan to do a longer recap post once the semester is over, I thought I'd share some of what I've learned so far, and use this opportunity to also get more feedback.

Lectures are recorded and uploaded to a youtube playlist, and @habryka has kindly created a wikitag for this course, so you can view lecture notes here .

Let's start with the good parts

Aspects that are working:

Experiments are working well! I am trying something new this semester - every lecture there is a short presentation by a group of students who are carrying out a small experiment related to... (read 711 more words →)

111

boazbarak's Shortform

Boaz Barak

5mo

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

AI Safety course intro blog

Boaz Barak

7mo

This is a linkpost for the intro course for CS 2881: AI Safety. It is mostly intended for Harvard/MIT students considering taking the course but could be of interest to others.

Call for suggestions - AI safety course

Boaz Barak

7mo

In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to be roughly similar to my "foundations of deep learning" course.

I am still not sure of the content, and would be happy to get suggestions.

Some (somewhat conflicting) desiderata:

I would like to cover the various ways AI could go wrong: malfunction, misuse, societal upheaval, arms race, surveillance, bias, misalignment, loss of control,... (and anything else I'm not thinking of right now). I talke about some of these issues here and here.
I would like to see what we can learn from other fields, including software security, aviation and automative safety, drug safety, nuclear arms control,

... (read 182 more words →)

Machines of Faithful Obedience

Boaz Barak

8mo

[Crossposted on Windows On Theory]

Throughout history, technological and scientific advances have had both good and ill effects, but their overall impact has been overwhelmingly positive. Thanks to scientific progress, most people on earth live longer, healthier, and better than they did centuries or even decades ago.

I believe that AI (including AGI and ASI) can do the same and be a positive force for humanity. I also believe that it is possible to solve the “technical alignment” problem and build AIs that follow the words and intent of our instructions and report faithfully on their actions and observations.

I will not defend these two claims here. However, even granting these optimistic premises, AI’s positive... (read 2925 more words →)

LESSWRONG
LW

LESSWRONG
LW

Why we are excited about confession!

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

A non-review of "If Anyone Builds It, Everyone Dies"

Learnings from AI safety course so far

Boaz Barak

Boaz Barak

Thoughts on Claude's Constitution

Why we are excited about confession!

You will be OK

Thoughts by a non-economist on AI and economics

A non-review of "If Anyone Builds It, Everyone Dies"

Learnings from AI safety course so far

boazbarak's Shortform

Boaz Barak

Why we are excited about confession!

AI will change the world, but won’t take it over by playing “3-dimensional chess”.

A non-review of "If Anyone Builds It, Everyone Dies"

Learnings from AI safety course so far

Boaz Barak

Boaz Barak

Thoughts on Claude's Constitution

Why we are excited about confession!

You will be OK

Thoughts by a non-economist on AI and economics

A non-review of "If Anyone Builds It, Everyone Dies"

Learnings from AI safety course so far

boazbarak's Shortform

Aspects that are working: