Changing selection pressures to align with intended behaviors: This might involve making training objectives more robust, iterating against held-out evaluation signals, or trying to overwrite the AI's motivations at the end of training with high-quality, aligned training data.
Is increasing the intelligence of the reward models another broad direction for this? A fun hypothetical I've heard is to imagine replacing the reward model used during training with Redwood Research (the organization).
So we imagine that during training, whenever an RL reward must be given or not given, a virtual copy of all Redwood Research employees is spun up on the training servers and has one subjective week to decide whether/how to give the RL reward, with their existing knowledge, access to the transcript of the model completing the task, access to data from previous such decisions, the ability to use whatever internals-based probes are available, etc.
My instinct is this would help a lot.
To the extent trusted or controlled AI systems can approximate this thought experiment (or do even better!), that seems great for safety, and we should use them to do so. People use the term "automated alignment research", but this sort of thing seems like an example of automated safety work that isn't "research".
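Here's a rough sketch, in code, of what a trusted-AI approximation of that reward decision could look like. This is purely illustrative: the interface, names, and median-style aggregation are my own assumptions, not anything specified in the thought experiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Illustrative sketch only: a "deliberative reward decision" interface in which
# trusted/controlled AI judges approximate the Redwood-as-reward-model idea.
# All names and the aggregation rule are assumptions, not a real API.

@dataclass
class RewardEvidence:
    transcript: str                          # transcript of the model completing the task
    prior_decisions: list = field(default_factory=list)  # data from previous such decisions
    probe_scores: Optional[dict] = None      # whatever internals-based probes are available

def deliberative_reward(evidence: RewardEvidence,
                        judges: list[Callable[[RewardEvidence], float]]) -> float:
    """Each trusted judge independently scores the episode from the same evidence;
    the final RL reward is the middle judgment, a crude stand-in for the panel
    deliberating for 'one subjective week' and reaching a decision."""
    scores = sorted(judge(evidence) for judge in judges)
    return scores[len(scores) // 2]
```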
[cross-posted from EAF]
Thanks for writing this!!
This risk seems equal to or greater than AI takeover risk to me. Historically the EA & AIS communities focused more on misalignment, but I'm not sure that choice has held up.
Come 2027, I'd love for it to be the case that an order of magnitude more people are usefully working on this risk. I think it will be rough going for the first 50 people in this area; I expect there's a bunch more clarificatory and scoping work to do; this is uncharted territory. We need some pioneers.
People with plans in this area should feel free to apply for career transition funding from my team at Coefficient (fka Open Phil) if they think that would be helpful to them.
Thanks for writing this.
One question I have about this and other work in this area concerns the training/deployment distinction. If AIs are doing continual learning once deployed, I'm not quite sure what that does to this model.
Thanks Tom! Appreciate the clear response. This feels like it significantly limits how much I update on the model.
We simulate AI progress after the deployment of ASARA.
We assume that half of recent AI progress comes from using more compute in AI development and the other half comes from improved software. (“Software” here refers to AI algorithms, data, fine-tuning, scaffolding, inference-time techniques like o1 — all the sources of AI progress other than additional compute.) We assume compute is constant and only simulate software progress.
We assume that software progress is driven by two inputs: 1) cognitive labour for designing better AI algorithms, and 2) compute for experiments to test new algorithms. Compute for experiments is assumed to be constant. Cognitive labour is proportional to the level of software, reflecting the fact AI has automated AI research.
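To make that feedback loop concrete, here's a minimal simulation sketch. It is not the report's actual model; the functional form and every parameter value below (especially the "ideas get harder to find" exponent `beta`) are my own illustrative assumptions.

```python
import math

# Minimal sketch of the feedback loop described above: software level S,
# cognitive labour proportional to S, constant experiment compute C.
# NOT the report's model; functional form and parameters are assumptions.

def simulate_software_progress(
    S0: float = 1.0,     # initial software level (normalized)
    C: float = 1.0,      # compute for experiments (held constant)
    alpha: float = 0.5,  # weight on cognitive labour vs. experiment compute
    lam: float = 0.5,    # returns to research input
    beta: float = 0.6,   # "ideas get harder to find": progress slows as S rises
    dt: float = 0.01,
    years: float = 10.0,
) -> float:
    S = S0
    for _ in range(int(years / dt)):
        labour = S                                 # cognitive labour proportional to software level
        research = labour**alpha * C**(1 - alpha)  # combined research input
        growth = research**lam / S**beta           # instantaneous rate of software progress
        S *= math.exp(growth * dt)
    return S

# Growth accelerates when alpha * lam > beta and levels off when it is smaller.
print("decelerating case:", simulate_software_progress(beta=0.6))
print("accelerating case:", simulate_software_progress(beta=0.1, years=5.0))
```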
Your definition of software includes all data, which strikes me as an unusual use of the term, so I'll put it in scare quotes.
You say half of recent AI progress came from "software" and half from compute. Then in your diagram, the cognitive labor gained from better AI goes toward improving "software."
To me it seems like a ton of recent AI progress came from using up a data overhang, in the sense of scaling up compute enough to take advantage of an existing wealth of data (the internet, most or all books, etc.).
I don't see how more AI researchers, automated or not, could find more of this data. The model has their cognitive labor being used to increase "software." Does the model assume that they are finding or generating more of this data, in addition to doing R&D for new algorithms, or other "software" bucket activities?
These methods may be too aggressive. Before we have ASARA, less capable AI systems may still accelerate software progress by a more moderate amount, plucking the low-hanging fruit. As a result, ASARA has less impact than we might naively have anticipated.
I'm confused.
My default assumption is that prior to ASARA, less-capable AIs will have accelerated software progress a lot — so I'm interested in working that into the model.
It looks like your "gradual boost" section is for people like me; you simulate the gradual emergence of the ASARA boost over a period of five years. But in the gradual boost section, you conclude that using this model results in a higher chance of >10 years of progress being compressed into one year. (I'm not currently following the logic there, just treating it as a black box.)
Why, then, is the sentence "As a result, ASARA has less impact than we might naively have anticipated" true? It seems like this consideration actually ends up meaning ASARA has more impact.
Just wanted to say I really enjoyed this post, especially your statement of the problem in the last paragraph.
The guy next to me, who introduced himself as "Blake, Series B, stealth mode,"
I don't think it makes sense to have a startup that is in stealth mode but is also raising a Series B (a later round of funding for scaling once you've found a proven business model).
Thanks for the reply!
When I say "future updates" I'm referring to stuff like the EM finetuning you do in the paper; I interpreted your hypothesis as being that, for inoculated models, updates from the EM finetuning are in some sense less "global" and more "local".
Maybe that's a more specific hypothesis than what you intended, though.
More speculative thoughts:
Even more speculative thoughts: