Noosphere89

Sequences: An Opinionated Guide to Computability and Complexity

Comments (sorted by newest)
Noosphere89's Shortform
Noosphere89 · 10mo*

Links to long comments that I want to pin but are too long to be pinned:

https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD

https://www.lesswrong.com/posts/DCQ8GfzCqoBzgziew/?commentId=RhTNmgZqjJpzGGAaL

The "Length" of "Horizons"
Noosphere89 · 6h

> I have computed time horizon trends for more general software engineering tasks (i.e. with a bigger context) and my preliminary results point towards a logistic trend, i.e. the exponential is already tapering off. However, I am still pretty uncertain about that.

I predict this is basically due to noise, or at best reflects a very short-lived trend, similar to the purported faster trend from RL scaling that allows a 4-month doubling time on certain tasks: that trend is basically driven by good scaffolding (which is what RL-on-CoTs was mostly shown to be), not by the creation of new capabilities.
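To make the exponential-vs-logistic comparison concrete, here is a minimal sketch (with entirely made-up numbers, not METR's data or methodology) of fitting both curve families to time-horizon measurements and comparing residuals:

```python
# Minimal sketch (not METR's actual data or methodology): fit an exponential
# and a logistic curve to hypothetical time-horizon measurements and compare
# which one explains the data better.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data: months since some reference date vs. task time horizon in minutes.
t = np.array([0, 6, 12, 18, 24, 30, 36], dtype=float)
horizon = np.array([2, 4, 9, 17, 35, 55, 70], dtype=float)  # made-up numbers

def exponential(t, a, b):
    return a * np.exp(b * t)

def logistic(t, cap, k, t0):
    return cap / (1 + np.exp(-k * (t - t0)))

exp_params, _ = curve_fit(exponential, t, horizon, p0=[2, 0.1])
log_params, _ = curve_fit(logistic, t, horizon, p0=[100, 0.2, 24], maxfev=10000)

def rss(model, params):
    """Residual sum of squares of the fitted model on the data."""
    return float(np.sum((horizon - model(t, *params)) ** 2))

print("exponential RSS:", rss(exponential, exp_params))
print("logistic RSS:   ", rss(logistic, log_params))
# A lower RSS for the logistic fit is the kind of evidence that would suggest the
# exponential is tapering off; with this few noisy points, either family can win.
```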

Jacob Pfau's Shortform
Noosphere89 · 1d

I won't speak for Jacob Pfau, but the easy answer for why infinite time horizons don't exist is simply that we have finite memory capacity, so tasks that require more than a certain amount of memory aren't doable at all.

At the very best (though this already requires deviating from real humans by assuming infinite lifespans), your time horizons can be exponentially larger than the memory capacity you have, and no more: once you go beyond 2^B time steps, where B is your bits of memory, you must revisit a previous state and repeat yourself in a loop, so a task that requires more than 2^B units of time to solve can never be completed.
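A minimal sketch of that pigeonhole argument (the update rule below is an arbitrary stand-in, not a model of any real agent):

```python
# A deterministic agent with B bits of memory has at most 2**B distinct states,
# so any trajectory longer than 2**B steps must revisit a state and then loop.

def run_until_repeat(bits: int, step) -> int:
    """Return how many steps a deterministic process takes before repeating a state."""
    seen = set()
    state = 0
    steps = 0
    while state not in seen:
        seen.add(state)
        state = step(state) % (2 ** bits)  # state always fits in `bits` bits
        steps += 1
    return steps

B = 12
steps_before_loop = run_until_repeat(B, step=lambda s: 5 * s + 3)
assert steps_before_loop <= 2 ** B  # pigeonhole: cannot exceed the number of states
print(f"{B} bits of memory -> entered a loop after {steps_before_loop} steps (bound: {2**B})")
```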

The "Length" of "Horizons"
Noosphere89 · 3d

I do agree that METR's horizon work is definitely overrelied on (there are only a few datapoints, and there are reasons to believe the benchmark is biased towards tasks that require little context or memory, among other issues). But I do think exponential growth in AI capabilities is very plausible a priori, and I wrote up a post on why this should generally be expected (with the caveat that doubling times can differ dramatically, so we need to make sure we aren't overextrapolating from a narrow selection of tasks). So I think METR's observation of exponential growth is likely to generalize to messy tasks; it's just that the time horizons and doubling factors will be different.
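To illustrate how much the assumed doubling time matters when extrapolating, here is a toy calculation (the starting horizon and candidate doubling times are made-up inputs, not METR's estimates):

```python
# Toy extrapolation showing how sensitive horizon forecasts are to the assumed
# doubling time. Starting horizon and doubling times are made-up inputs.
def horizon_after(months: float, start_horizon_hours: float, doubling_months: float) -> float:
    """Exponential extrapolation: the horizon doubles every `doubling_months`."""
    return start_horizon_hours * 2 ** (months / doubling_months)

start = 1.0  # assume a 1-hour time horizon today
for doubling in (4, 7, 12):  # candidate doubling times in months
    print(f"doubling every {doubling:>2} months -> "
          f"{horizon_after(36, start, doubling):8.1f} hours after 3 years")
# With the same starting point, a 4-month vs 12-month doubling time differs by
# a factor of 2**(36/4 - 36/12) = 2**6 = 64x after three years.
```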

Ethical Design Patterns
Noosphere89 · 5d

One big reason why people don't endorse Heuristic C (though not all of the reason) is that the general population is much more selfish and has much higher time preference than LW/EA people. More generally, one big assumption that I think EAs/LWers rely on way too much is that the population inherently cares about the future of humanity, independent of their selfish preferences.

I also think Robin Hanson is right that a lot of our altruism is mostly fictional: it is largely a way to signal and to exploit social systems, and a way to cooperate with other people in the cases where it isn't fictional. The behavior we see is what you'd expect in a world where people's altruism is mostly fictional and people don't know all that much about AI.

This is complementary with other explanations like xpym's.

More generally, a potential crux with a lot of the post is that I think something like "rationalizing why your preferred policies are correct," to quote PoignardAzur, is ultimately what has to happen in ethical reasoning in general. There's no avoiding that part, so ethical reasoning inevitably involves dealing with conflict theory. (That comment argues the proposed examples are bad because they invoke political debates/conflict-theory issues, but contra that comment, I don't think this is avoidable in this domain.)

There are interesting questions to ask about how we got the morals we have (I'd say something like cooperation among people who need to share things in order to thrive/survive explains why we developed any altruism or moral system that wasn't purely selfish). But in general the moral-objectivism assumptions embedded in the discourse are pretty bad if we want to talk about how we arrived at the morality/values we have, and it's worth trying to frame the discussion in moral-relativist terms.

anaguma's Shortform
Noosphere89 · 6d*

For what it's worth, I don't think it matters for now, for a couple of reasons:

  1. Most of the capabilities gained this year have come from inference scaling (which leans heavily on CoT) rather than from pre-training scaling (which improves forward passes), though you could reasonably argue that most RL inference gains are basically just a good version of the scaffolding used in agents like AutoGPT, and don't create new capabilities.
  2. Neuralese architectures that outperform standard transformers on big tasks are turning out to be relatively hard to build, and are at least not trivial to scale up (this mostly comes from diffuse discourse, but one example is here, where COCONUT did not outperform standard architectures on benchmarks).
  3. Steganography is so far proving quite hard for models to do (examples are here, here, and here).
  4. For all of these reasons, models are very bad at evading CoT monitors, and the forward pass is also very weak computationally at any rate.

So I don't really worry about models changing their behavior in ways that negatively affect safety, or sandbagging tasks, by using steganography or one-forward-pass reasoning to fool CoT monitors.

We shall see in 2026 and 2027 whether this continues to hold for the next 5-10 years or so, or potentially more depending on how slowly AI progress goes.

Edit: I retracted the claim that most capabilities come from CoT, due to the paper linked in the very next tweet, and now think that RL on CoTs is basically capability elicitation, not a generator of new capabilities.

What, if not agency?
Noosphere89 · 6d

While I don't fully like Mechanize's post, and have some reservations about the level of technological determinism expressed in the article, I am substantially more skeptical than Sahil of shifting the AI paradigm in directions that don't boost capabilities nearly as much, I'm much more skeptical of the claim that automating away humans was a contingent goal, and I tend towards more technological determinism than Sahil does:

The Future of AI is Already Written

(I already talked about why AIs are easy to shut down using a simpler hypothesis than Sahil's; this comment is more about how it's far more difficult to steer technological development than people appreciate, and how people incorrectly overestimate the level of control humanity has over these things.)

Daniel Tan's Shortform
Noosphere89 · 7d*

I think the counterpoint basically makes the paper ~0 evidence for the claim that large latent reasoners will exist within the next year. In general, generic task improvements matter more than specialized task improvements because of the messiness and complexity of reality, and one of my updates over the past 2 years is that RL inference/pre-training scaling dwarfs scaffolding improvements by such large margins that scaffolding quickly becomes worthless, so I no longer consider scaffolded LLMs a relevant concern/threat.

I'd update back to your prior belief on how likely it is that LLMs will become latent reasoners/have something like neuralese.

I'd also be substantially worried about data leakage here.

I'm permanently retracting the claim that scaffolding doesn't matter (though admittedly I was biased by things like AutoGPT no longer being talked about, presumably because newer LLMs have completely obsoleted its scaffolding).

Edit: Apparently current RL is mostly just the good version of the scaffolding people had in mind in 2023, if you believe the paper here.

The Thinking Machines Tinker API is good news for AI control and security
Noosphere89 · 9d

> Firstly, your researchers normally have access to the model architecture. This is unfortunate if you want to avoid it leaking. It's not clear how important this is. My sense is that changes to model architecture have been a minority of the algorithmic improvement since the invention of the transformer.

I agree with this, but I'd say it's good to protect the architecture anyway: if AIs start being able to do more and more research, the chances of architecture/paradigm changes go up, especially if AI labor scales faster than human labor, so it's worth guarding against this possibility early on.

Also, good news on the new Tinker API.

johnswentworth's Shortform
Noosphere89 · 10d

I would have just answered "It depends on what you want to do": there is no single best prior/Universal Turing Machine, because of theorems like the No Free Lunch theorem (and more generally, a takeaway from learning theory and computability theory is that there is no one best prior that is always justified, contrary to the hopes of the ancient philosophers).
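As a toy illustration of the point (a minimal sketch with made-up Bernoulli priors standing in for the choice of prior, nothing to do with actual UTMs): each prior scores better on data from the source it happens to favor, so neither is "best" independently of the task.

```python
# Toy illustration: two priors over the next bit of a binary sequence.
# Prior A expects mostly 0s, prior B expects mostly 1s. Each prior scores better
# (higher log-likelihood) on data from the source it happens to match, so which
# prior is "best" depends on the environment you actually face.
import math
import random

random.seed(0)

def log_likelihood(data, p_one: float) -> float:
    """Log probability of the bit sequence under an i.i.d. Bernoulli(p_one) prior."""
    return sum(math.log(p_one if bit == 1 else 1 - p_one) for bit in data)

prior_a, prior_b = 0.1, 0.9  # made-up priors: A favors 0s, B favors 1s

zero_heavy = [1 if random.random() < 0.1 else 0 for _ in range(200)]
one_heavy = [1 if random.random() < 0.9 else 0 for _ in range(200)]

for name, data in [("zero-heavy source", zero_heavy), ("one-heavy source", one_heavy)]:
    print(name,
          "| prior A:", round(log_likelihood(data, prior_a), 1),
          "| prior B:", round(log_likelihood(data, prior_b), 1))
# Neither prior dominates across both environments, which is the sense in which
# "which prior is best" depends on what you want to do.
```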

Posts

16 · Exponential increase is the default (assuming it increases at all) [Linkpost] · 19d · 0
13 · Is there actually a reason to use the term AGI/ASI anymore? [Question] · 1mo · 5
72 · But Have They Engaged With The Arguments? [Linkpost] · 2mo · 14
18 · LLM Daydreaming (gwern.net) · 3mo · 2
11 · Difficulties of Eschatological policy making [Linkpost] · 4mo · 3
7 · State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost] · 6mo · 0
57 · The case for multi-decade AI timelines [Linkpost] · 6mo · 22
15 · The real reason AI benchmarks haven’t reflected economic impacts · 6mo · 0
22 · Does the AI control agenda broadly rely on no FOOM being possible? [Question] · 7mo · 3
0 · Can a finite physical device be Turing equivalent? · 7mo · 10
Wikitag Contributions

Acausal Trade · 4 months ago · (+18/-18)
Shard Theory · a year ago · (+2)
RLHF · a year ago · (+27)
Embedded Agency · 3 years ago · (+640/-10)
Qualia · 3 years ago · (-1)
Embedded Agency · 3 years ago · (+314/-43)
Qualia · 3 years ago · (+74/-4)
Qualia · 3 years ago · (+20/-10)