Safety-First Agents/Architectures Are a Promising Path to Safe AGI
Summary

Language model agents (LMAs) like AutoGPT have promising safety characteristics compared to traditional conceptions of AGI. Because they are composed of LLMs, these agents plan, think, and act in highly transparent and correctable ways, although not maximally so, and it is unclear whether their safety will increase or decrease in the future. Regardless of where commercial trends take us, it is possible to develop safer versions of LMAs, as well as other "cognitive architectures" that do not depend on LLMs. Notable areas of potential safety work include effectively separating and governing how agency, cognition, and thinking arise in cognitive architectures.

If needed, safety-first cognitive architectures (SCAs) can match or exceed the performance of less safe systems, and they can be compatible with many ways AGI may develop. This makes SCAs a promising path towards influencing and ensuring safe AGI development in everything from very-short-timeline scenarios (e.g. LMAs are the first AGIs) to long-timeline scenarios (e.g. future AI models are incorporated into, or built explicitly for, an existing SCA).

Although the SCA field has begun emerging over the past year, awareness seems low and the field seems underdeveloped. I wrote this article to make more people aware of what's happening with SCAs, to document my thinking on the SCA landscape and promising areas of work, and to advocate for more people, funding, and research going towards SCAs.

Background

Language model agents (LMAs), systems in which large language models (LLMs) prompt themselves in loops to think and act, have exploded in popularity since the release of AutoGPT at the end of March 2023. Even before AutoGPT, the related field that I call "safety-first cognitive architectures" (SCAs) had emerged in the AI safety community. Most notably, in 2022, Eric Drexler formulated arguments for the safety of such systems and developed a high-level design for an SCA called the open agency model. Shortly thereafter