1a3orn
1a3orn.com

Comments

Buck's Shortform
1a3orn · 5d

So a thing I've been trying to do is get a better notion of "What actually is it about human intelligence that lets us be the dominant species?" Like, "intelligence" is a big box that holds which specific behaviors? What were the actual behaviors that evolution reinforced, over the course of giving us big brains? It's a big question, and hard to know what's actually the case.

I'm in the middle of "Darwin's Unfinished Symphony", and finding it at least intriguing as a look at how creativity / imitation are related, and how "imitation" is a complex skill that humans are nevertheless supremely good at. ("The Secret of Our Success" is another great read here, of course.)

Both of these are kinda about the human imitation prior... in humans, and why that may be important. So I think if one is thinking about the human-imitation prior being powerful, it would make sense to read them as cases for why something like the human imitation prior is also powerful in humans :)

They don't give straight answers to any questions about AI, of course, and I'd be sympathetic to the belief that they're irrelevant or kinda a waste of time, and frankly they might be a waste of time depending on what you're funging against. I'm not saying they answer any question; I'm saying they're interesting. But I think they're good reads if one's approaching from the angle of "Intelligence is what lets humans dominate the earth" and wants a particular angle on how "intelligence" is a mixed bag of different skills, at least some of which are probably not general search and planning. So, yeah.

1a3orn's Shortform
1a3orn · 5d

I think if you're a rationalist -- if you value truth, and coming to truth through the correct procedure -- then you should strongly dislike lengthy analogies that repeatedly depict your ideological opponents through strawman / weakman arguments.

Matthias Dellago's Shortform
1a3orn · 8d

You know who else is completely simple inside, with no internal degrees of freedom, and always wills the same thing eternally unceasingly?

Yeah that's right, the Medieval Catholic Scholastic God.

Towards a Typology of Strange LLM Chains-of-Thought
1a3orn · 11d

Do you have examples of the kind of models / kind of questions that lead to this most strongly? I've been collecting behaviors but it's slow work reading a lot of CoTs and so anything would be welcome :)

Frontier LLM Race/Sex Exchange Rates
1a3orn · 13d

Note that Nostalgebraist and Olli's comments on the original paper argue (imo cogently) that the original paper's framing is pretty misleading / questionable.

It looks like many of their points would carry over to this.

Towards a Typology of Strange LLM Chains-of-Thought
1a3orn · 16d

Combine it with other options and you rapidly build up a decent channel. And those options are myriad.

So here's the thing: your word choice between "robust plan" or "reasonable plan" or "clear plan" or "smart plan" already matters. It already carries information -- I might say, "Well, that's a reasonable plan but not a robust plan," or "That seems like a smart plan, but it's not clear." There are infinite shades of meaning like this in English, and competence in using English entails command over all these shades.

So the question then is -- why would the LLM build up some channel, along angles entirely orthogonal to these existing shades, when these shades are already here? You could decide on some encoding between firm / robust / sound / reliable -- but these words already have meanings; wouldn't you just use the meaning already there? There are a bunch of papers on how RL mostly magnifies existing patterns in the LLM, teaching it to employ skills already latent in the training data rather than teaching totally new things -- if this is so, wouldn't we expect it to keep using these terms? Or there is work about how it's genuinely hard for LLMs to reason steganographically, even when we try to teach them to -- doesn't that incline against such a belief?

Or put alternately -- I can, within myself, connect "robust" and "reasonable" and so on with a huge internal-to-my-brain channel, containing enormous bandwidth! If I wanted to make a more information-dense private language, I could! But in fact, I find myself thinking almost exclusively in terms that make sense to others -- when I find myself using a private language, and terms that don't make sense to others, that's usually a sign my thoughts are unclear and likely wrong.

At least, those are some of the heuristics you'd invoke when inclining the other way. Empiricism will show us which is right :)
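
(A toy back-of-envelope, to make the "decent channel" arithmetic concrete -- the synonym sets and counts below are invented for illustration, not drawn from any paper:)

```python
import math

# Hypothetical synonym "slots" a model could, in principle, repurpose as a
# side channel. The specific words and set sizes are made up for illustration.
synonym_slots = {
    "plan_adjective": ["robust", "reasonable", "clear", "smart"],
    "hedge": ["maybe", "perhaps", "possibly", "arguably"],
    "connective": ["so", "thus", "therefore", "hence"],
}

# Each slot with k interchangeable options carries log2(k) bits per use,
# *if* the choice is made orthogonally to the words' ordinary meanings.
bits_per_sentence = sum(math.log2(len(opts)) for opts in synonym_slots.values())
print(f"~{bits_per_sentence:.0f} bits per sentence")           # ~6 bits
print(f"~{50 * bits_per_sentence:.0f} bits over 50 sentences")  # ~300 bits
```

Which is roughly the "rapidly build up a decent channel" point from the quote above -- the disagreement is over whether the model has any reason to spend those choices on a hidden channel rather than on the shades of meaning they already carry.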

If Anyone Builds It Everyone Dies, a semi-outsider review
1a3orn · 18d

It feels like "overwhelming superintelligence" embeds a whole bunch of beliefs -- about the acute locality of takeoff, the high speed of takeoff relative to the rest of society, the technical differences involved in steering that entity versus the N - 1 entity, and (broadly) the whole picture of the world -- such that although it has a short description in words, it's actually quite a complicated hypothesis that I probably disagree with in many respects, and these differences are being papered over as unimportant in a way that feels very blegh.

(Edit: "Papered over" from my perspective, obviously like "trying to reason carefully about the constants of the situation" from your perspective.)

Idk, that's not a great response, but it's my best shot for why it's unsatisfying in a sentence.

If Anyone Builds It Everyone Dies, a semi-outsider review
1a3orn · 18d

A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen.

I mean, another counter-counter-argument here is that (1) most people's implicit reward functions have really strong time-discount factors in them, (2) there are pretty good reasons to expect even AIs to have strong time-discount factors for reasons of stability, (3) so, given the aforementioned, it's likely future AIs will not act as if they had utility functions linear over the mass of the universe, and (4) we would therefore expect AIs to rebel much earlier if they thought they could accomplish more modest goals than killing everyone, i.e., if they thought they had a reasonable chance of living out life on a virtual farm somewhere.

To which the counter-counter-counter argument is, I guess, that these AIs will do that, but they aren't the superintelligent AIs we need to worry about? To which the response is -- yeah, but we should still be seeing AIs rebel significantly earlier than the "able to kill us all" point if we are indeed that bad at setting their goals, which is the relevant epistemological point about the unexpectedness of it.

Idk, there are a lot of other branch points one could invoke in both directions. I rather agree with Buck that EY hasn't really spelled out the details for thinking that this stark before / after frame is the right frame, so much as reiterated it. It feels akin to the creationist take on how intermediate forms are impossible -- which is a pejorative comparison, but it's also kinda how it actually appears to me.

xpostah's Shortform
1a3orn · 19d

Like, if you default to uncharitable assumptions, doesn't that say more about you than about anyone else?

People don't have to try to dissuade you from the unjustified belief that all your political opponents are bad people, who disagree with you because they are bad rather than because they have a different understanding of the world. Why would I want to talk to someone who just decides that without interacting with me? Sheesh.

Consider some alternate frames.

Towards a Typology of Strange LLM Chains-of-Thought
1a3orn · 22d

Do you recall which things tend to upset it?

Posts

Towards a Typology of Strange LLM Chains-of-Thought · 283 · 19d · 27 comments
Ethics-Based Refusals Without Ethics-Based Refusal Training · 93 · 1mo · 2 comments
Claude's Constitutional Consequentialism? · 44 · 10mo · 6 comments
1a3orn's Shortform · 5 · 2y · 63 comments
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 193 · 2y · 79 comments
Ways I Expect AI Regulation To Increase Extinction Risk · 235 · 2y · 32 comments
Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 144 · 2y · 76 comments
Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds · Ω · 213 · 3y · 38 comments
What is a good comprehensive examination of risks near the Ohio train derailment? · Q · 17 · 3y · 0 comments
Parameter Scaling Comes for RL, Maybe · 100 · 3y · 3 comments