I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms—see https://sjbyrnes.com/agi.html. Email: steven.byrnes@gmail.com. Twitter: @steve47285. Employer: https://astera.org/. Physicist by training.
FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details)
I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning.
(When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)
Thanks! I once wrote up a somewhat-parallel discussion on a different topic in Section 5.1 here:
… So this is the “null hypothesis” of what to expect if there’s no such thing as [blah]. By now there are probably ≈1000 person-years of experimental data created by [blah] researchers. In such a huge mountain of data, there is bound to be lots of “random noise and ad hoc misinterpretations” that happen to line up remarkably well with researchers’ prior expectations about [blah]. The question is not “Are there results that seem to provide evidence for [blah]?”, but rather “Is there much more evidence for [blah] than could plausibly be filtered out of 1000 person-years of random noise, misinterpretations, experimental errors, bias, occasional fraud, gross incompetence, weird equipment malfunctions, etc.?” …
and I also linked to & excerpted yet another parallel discussion on yet a different topic by Scott Alexander, Section 17 here.
Scott’s analysis seems fine to me, unless I missed something. He writes “Many-Worlders will yawn at this question” [in reference to Wigner’s friend]. Yes. I yawn. If Wigner is right outside the lab door, then Wigner is in fact in one of the branches (the same branch as his friend), even if Wigner happens to not yet know which one. If Wigner is on Alpha Centauri, then he is not yet in one of those two branches, and his friend is, and I don’t see any problem with that. And then a few years later he gets a message from his friend, and by that point Wigner is in one of those two branches, and when he reads the message he’ll know which one.
I’m reluctant to engage with extraordinarily contrived scenarios in which magical 2nd-law-of-thermodynamics-violating contraptions cause “branches” to interfere. But if we are going to engage with those scenarios anyway, then we should never have referred to them as “branches” in the first place, and also we should be extremely wary of applying normal intuitions in situations where the magical contraption is “scrambling people’s brains” as Scott puts it.
As a meta point, I might drop out of this conversation at any point (including maybe right now), gotta get back to work. :)
What happens if the branch "you" are in gets cancelled with another branch?
One doesn’t invoke the term “different branches” unless they are macroscopically different, and if they’re ever macroscopically different, then they will remain macroscopically different forever, thanks to entropy (and the related fact that macroscopic events leave countless little persistent traces in the environment). Even more so if we’re talking about human observers, who form memories of what they’ve seen in the form of changes to the structure of their brains. Macroscopically different branches can’t “cancel”, and more generally they can’t interfere in a way that has any measurable effect.
(For any quantum observable O that’s relevant in practice, ⟨ψ₁|O|ψ₂⟩ ≈ 0 if ψ₁ and ψ₂ are macroscopically different—e.g. the geiger counter loudly clicked in ψ₁ but not ψ₂.)
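Spelling that claim out a bit (my notation, not spelled out in the original comment): if the overall state is a superposition of two macroscopically different branches, |ψ⟩ = a|ψ₁⟩ + b|ψ₂⟩, then for any practically relevant observable O,

$$\langle\psi|O|\psi\rangle = |a|^2\langle\psi_1|O|\psi_1\rangle + |b|^2\langle\psi_2|O|\psi_2\rangle + 2\,\mathrm{Re}\!\left(a^{*} b\,\langle\psi_1|O|\psi_2\rangle\right) \approx |a|^2\langle\psi_1|O|\psi_1\rangle + |b|^2\langle\psi_2|O|\psi_2\rangle.$$

The cross term vanishes by the claim above, so every measurable expectation value is just a Born-weighted mixture of the two branches, with no interference between them.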
Solomonoff inductors are a bit of an odd case because they don’t & can’t exist in the universe. But leaving that aside (let’s say we’re implementing AIXItl or whatever):
Every time the computer makes an observation, we learn some stuff about the universe, and we also learn some indexical information about where we-in-particular are sitting within the universe. This has always been true, but it’s especially true in MWI, because we will never stop getting indexical updates (unlike a deterministic universe where you can learn that you’re in a particular room on earth and then there’s no more indexical information to learn). In MWI, if we observe that a pixel is bright, then we have learned that we are in a branch of the wavefunction wherein the pixel is bright. There might or might not be other branches wherein the pixel is dark, but if there are, we now know that those branches are “not where I have found myself”, and we can ignore those branches accordingly. You can still have hypotheses, but they will incorporate Born-rule indexical uncertainty about which branch you will find yourself in in the future, on top of whatever other indexical uncertainty you have for other reasons.
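As a toy illustration of that last point (my own sketch, not anything from AIXI or Solomonoff induction; all names and numbers below are made up), here is what “Born-rule indexical uncertainty plus an indexical update on observation” looks like for the bright/dark pixel example:

```python
# Toy sketch: a hypothesis about the world predicts some branches, each with a
# quantum amplitude. The Born rule turns amplitudes into indexical probabilities:
# the chance that "I will find myself" in each branch.

branches = {
    "pixel_bright": 0.8 + 0.0j,  # amplitude of the branch where the pixel is bright
    "pixel_dark": 0.6 + 0.0j,    # amplitude of the branch where the pixel is dark
}

# Born rule: P(finding yourself in branch b) = |amplitude(b)|^2
born_prior = {b: abs(a) ** 2 for b, a in branches.items()}
norm = sum(born_prior.values())
born_prior = {b: p / norm for b, p in born_prior.items()}

# Observation: the pixel is bright. This is an indexical update: we learn which
# branch we're in, and simply stop tracking the branches we're not in.
observation = "pixel_bright"
surviving = {b: p for b, p in born_prior.items() if b == observation}

print(born_prior)  # ≈ {'pixel_bright': 0.64, 'pixel_dark': 0.36}
print(surviving)   # the dark-pixel branch is "not where I found myself"
```

Note that the hypothesis itself is not refuted by the dark-pixel branch existing; the agent just learns that that branch is not where it is.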
Last I heard, there was no pilot-wave version of the standard model of particle physics. Also, last I heard, the (apparent) exact local Lorentz-invariance of the universe is either outright violated by pilot-wave theories or “put in by hand” in a way that makes it seem like a massive coincidence / fine-tuning.
I’m actually not very knowledgeable on this; if those allegations above were ever true, are they still true right now?
Separately, I disagree that MWI creates mysteries about embedded agency or anything else. You do need an “indexical” postulate of some sort (the probability that “I will find myself” in such-and-such branch), and the Born rule supplies that, but I don’t see that as hard to swallow. Also, I believe the Born rule turns out to be equivalent to seemingly-weaker indexical assumptions like “as quantum amplitude approaches zero, the probability that you’ll find yourself in that branch approaches zero too” (cf. here). I don’t think we can get rid of indexical assumptions—even in a deterministic universe, we still have to deal with Parfit’s teletransporter and such. If we’re OK with Parfit’s teletransporter, I don’t think there’s additional weirdness in the MWI indexical assumption. (I’m stating these opinions without justifying them.)
OK! I think I’m on board now.
Let me try to explain “process-based feedback” from first principles in my own words.
We have a problem: if an agent wants to do X in the real world, dastardly real-world power-seeking actions are probably helpful for that.
The very hard manifestation of this problem is: there could be an AI that has never done any real-world power-seeking actions whatsoever, not even a little, not anytime during training, and then seemingly out of nowhere it does a power-seeking treacherous turn (maybe it outputs “Help me help me I’m suffering!”), via foresighted planning towards a misaligned / misgeneralized goal.
…But we’re not talking about that.
The less-hard manifestation of this problem is: the AI sometimes does a little bit of real-world power-seeking actions, and by doing so it does better at its task X, and thus we reinforce those actions, and then it does more and more real-world power-seeking.
Our basic approach to solve this latter problem is to never propagate training signals through a sequence in which real-world actions are taken and their consequences play out.
More specifically:
(1) We often have the AI acting within a “boxed” mode that is sufficiently constrained that there is no opportunity for problematic power-seeking actions. No internet access, no chatting with humans, etc.
(2) However, the AI will sometimes “propose a step” that cannot be done within the “boxed” mode. “Proposing a step” can either look like:
(2A) printing out a suggestion that humans do some discrete real-world thing, or
(2B) printing out a discrete real-world thing that could be done immediately & automatically by a simple non-AI script (e.g. “writing down a specific email that it proposes to send”, or “writing down a specific google search query that it proposes to type in”, etc.).
(3) The AI gets a training signal from one and only one source:
(3A) Whenever the AI “proposes a step” (and before the corresponding proposal is actually executed), let’s call that an “audit-able moment”.
(3B) For a small fraction of audit-able moments, we freeze execution and do an audit, which entails evaluating how reasonable that step looks (again, without executing it).
(3C) We update the model based on that evaluation…
(3D) …but we make sure to do that model update in such a way that we only propagate training signals to the things that the AI did since the completion of the previous step.
So for example, in a recurrent model, we could do BPTT, but we would stop the BPTT at the completion of the previous step. Or in RL, we could apply TD updates, but only to the period after the completion of the previous step. Etc. We’re treating all its previous proposals & activities as fixed and exogenous for the purpose of our model-updating step.
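Here’s a minimal sketch of that truncation, assuming a recurrent policy and simple REINFORCE-style updates (all class and function names below are my own illustration, not an established method or API). The load-bearing line is the final .detach(), which stops backpropagation at the completion of each step:

```python
import random

import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, N_ACTIONS = 32, 64, 10

class StepProposer(nn.Module):
    """Toy recurrent policy that 'proposes a step' at each audit-able moment."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(OBS_DIM, HIDDEN_DIM)
        self.head = nn.Linear(HIDDEN_DIM, N_ACTIONS)

    def forward(self, obs, hidden):
        hidden = self.rnn(obs, hidden)
        return self.head(hidden), hidden

# Stand-in stubs so the sketch runs; in reality these are the "boxed" environment,
# the human auditor, and the execution machinery.
def get_boxed_observation():
    return torch.randn(1, OBS_DIM)

def should_audit(step):
    return random.random() < 0.1        # audit a small fraction of audit-able moments

def audit_score(proposal):
    return random.uniform(-1.0, 1.0)    # how reasonable the proposed step looks

def execute_if_approved(proposal):
    pass                                # carried out (or not) outside the model

model = StepProposer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
hidden = torch.zeros(1, HIDDEN_DIM)

for step in range(100):
    obs = get_boxed_observation()       # (1): the AI works entirely in "boxed" mode
    logits, hidden = model(obs, hidden)
    proposal = torch.distributions.Categorical(logits=logits).sample().item()  # (2): propose a step

    if should_audit(step):              # (3B): freeze and audit the proposal, without executing it
        score = audit_score(proposal)
        log_prob = torch.log_softmax(logits, dim=-1)[0, proposal]
        loss = -score * log_prob
        optimizer.zero_grad()
        loss.backward()                 # (3D): gradients only reach activity since the last
        optimizer.step()                # detach, i.e. since the previous step's completion

    execute_if_approved(proposal)       # the proposal may now affect the real world

    # Treat everything up to the completion of this step as fixed and exogenous:
    hidden = hidden.detach()
```

A TD-based version would draw the same boundary by only assigning credit to the period after the previous step completed; the detach is just the recurrent-model way of enforcing it.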
If we do that, then we’re only updating based on activities during a period within which the AI was entirely in the “boxed” mode. And therefore we can feel confident that we are not differentially reinforcing real-world power-seeking behavior!
And yet:
Let me know if that’s not what you were getting at. Thanks again.
OK, I’ll try to construct an example of process-based supervision without boxing, and argue that it fails the criterion of “never giving gradient updates toward doing the dastardly stuff”.
We give our AI unfettered, unmonitored internet access. We say “We are a hardware R&D firm, and we would like to develop a less-expensive LiDAR unit.” The AI does some internet searches and reads some books and outputs:
“My first step is to list out different LiDAR approaches on the market (and under development), and try to estimate their manufacturing cost breakdowns, to get a better sense for this space.”
We inspect that output, and compare it to the AI’s other possible outputs, and give the AI a high grade.
So far so good, right?
No!! Because the AI was secretly doing some dastardly stuff during the “The AI does some internet searches and reads some books” part:
Specifically, the AI hacked into a website to get access to a paywalled e-textbook entitled Manufacturing Cost Reduction 101, and read it, and used that information to propose a better (so-called) “first step”.
…And we rewarded it for that.
(What we wanted was for the AI to propose to read that e-textbook as a first step. But I don’t see how we’re incentivizing that, in the absence of boxing. The “first step” it actually proposed would at least plausibly be judged as better than that.)
I think there’s a pretty smooth gradient from legal information-gathering internet queries (downloading un-paywalled articles) to dubiously-legal (sci-hub) to illegal (logging into Elsevier with a guessed password) to downright dastardly (stealing money and hiring a taskrabbit to go to a library and scan a book and post it on the internet). It seems to me that we’re incentivizing the AI all the way down that gradient.
I’m sure I’m misunderstanding something, and appreciate your patience.
This part isn’t quite right. Here’s some background if it helps.
Part of your brain is a big sheet of gray matter called “the cortex”. In humans, the sheet gets super-crumpled up in the brain, so much so that it’s easy to forget that it’s a single contiguous sheet in the first place. Also in humans, the sheet gets so big that the outer edges of it wind up curved up underneath the center part, kinda like the top of a cupcake or muffin that overflows its paper wrapper.
(See here if you can’t figure out what I’m talking about with the cupcake.)
The outside bit of the cortical sheet (usually) has 3 visible layers under the microscope, and is called “allocortex”. It consists mostly of the hippocampus & piriform cortex. The center part of the cortical sheet (probably 90%+ of the area in humans) is called “isocortex”, and (usually) has 6 visible layers under the microscope. The term “neocortex” is mostly treated as a synonym of “isocortex”, with “isocortex” more common in the technical literature and “neocortex” more common among non-experts.
The isocortex includes lots of things like “visual cortex” and “somatosensory cortex” and “prefrontal cortex” etc. But despite that, you don’t say “there are many cortices”. Grammatically, it’s kinda like how there’s “Eastern Canada” and “Central Canada” and “Western Canada”, but nobody says that therefore there are “many Canadas”. You can say that visual cortex is “a region of the cortex”, but not “a cortex”.