Thane Ruthenis's Shortform
Thane Ruthenis · 7h

Have you happened to read my beginner-friendly book about AI safety/risk, "Uncontrollable"?

Nope.

Thane Ruthenis's Shortform
Thane Ruthenis · 19h

Hopefully it's just my personal pet peeves, then!

Thane Ruthenis's Shortform
Thane Ruthenis · 19h

I liked that one as well; I think it does a good job at what it aims to do.

My worry is that "improved and simplified" may not be enough. As kave pointed out, it might be that we've already convinced everyone who could be convinced by arguments-shaped-like-this, and that we need to generate arguments shaped in a radically different way to convince new demographics, rather than slightly vary the existing shapes. (And it may be the case that it's very hard for people-like-us to generate arguments shaped different ways – I'm not quite sure what that'd look like, though I haven't thought about it much – so doing that would require something nontrivial from us, not just "sit down and try to come up with new essays".)

Maybe that's wrong; maybe the issue was lack of reach rather than exhausting the persuadees' supply, and the book-packaging + timing will succeed massively. We'll see.

Thane Ruthenis's Shortform
Thane Ruthenis · 20h

Sure; but the following sections are meant as explanations/justifications of why that is the case. The paragraph I omitted does a good job of explaining why they would need to learn to predict the world at large, not just humans, and would therefore contain more than just human-mimicry algorithms. To reinforce that with the point about reasoning models, one could perhaps explain how that "generate sixteen CoTs, pick the best" training can push LLMs to recruit those hidden algorithms for the purposes of steering rather than just prediction, or even to incrementally develop entirely new skills.
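To gesture at what that missing piece might look like, here's a toy sketch of the "sixteen tries, reinforce the best" loop – nothing here is any lab's actual pipeline; the canned strategies, the grader, and the update rule are all made up for illustration:

```python
import random

candidate_cots = [
    "guess the answer directly",
    "try small cases, then generalize",
    "write it as an equation and solve",
]
# Current "policy": probability of emitting each chain of thought.
probs = [1.0 / len(candidate_cots)] * len(candidate_cots)

def reward(cot: str) -> float:
    # Made-up grader: pretends the more systematic strategies solve the problem more often.
    return {
        "guess the answer directly": 0.1,
        "try small cases, then generalize": 0.6,
        "write it as an equation and solve": 0.9,
    }[cot]

for step in range(200):
    attempts = random.choices(candidate_cots, weights=probs, k=16)  # sixteen tries
    best = max(attempts, key=reward)                                # keep whichever went best
    lr = 0.05
    # Stand-in for gradient descent: shift probability mass toward the best try.
    probs = [(1 - lr) * p + (lr if cot == best else 0.0)
             for p, cot in zip(probs, candidate_cots)]

# The policy typically drifts toward whichever strategy scores best:
# steering toward reward, not just predicting what a human would have written.
print({cot: round(p, 2) for cot, p in zip(candidate_cots, probs)})
```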

A full explanation of reinforcement learning is probably not worth it (perhaps it was in the additional 200% of the book Eliezer wrote, but I agree it should've been aggressively pruned). But as-is, there are just clearly missing pieces here.

Thane Ruthenis's Shortform
Thane Ruthenis · 1d

I worry that I particularly enjoy the kind of writing they do, but we've already tapped the market of folks like me

Yup, hence my not being excited to see the usual rhetoric being rehearsed, instead of something novel.

The other thing I really liked: they would occasionally explain some science to expand on their point (nuclear physics is the example they expounded on at length, but IIRC they mentioned a bunch of other bits of science in passing)

Yup. Chapter 10 is my favorite.

Thane Ruthenis's Shortform
Thane Ruthenis · 1d

Just finished If Anyone Builds It, Everyone Dies (and some of the supplements).[1] It feels... weaker than I'd hoped. Specifically, I think Part 3 is strong, and the supplemental materials are quite thorough, but Parts 1-2... I hope I'm wrong, and this opinion is counterweighed by all these endorsements and MIRI presumably running it by lots of test readers. But I'm more bearish on it making a huge impact than I was before reading it.

Point 1: The rhetoric – the arguments and their presentations – is often not novel, just rehearsed variations on the arguments Eliezer/MIRI already deployed. This is not necessarily a problem, if those arguments were already shaped into their optimal form, and I do like this form... But I note those arguments have so far failed to go viral. Would repackaging them into a book, and deploying it in our post-ChatGPT present, be enough? Well, I hope so.

Point 2: I found Chapter 2 in particular somewhat poorly written in how it explains the technical details.

Specifically, those explanations often occupy that unfortunate middle ground between "informal gloss" and "correct technical description" where I'd guess they're impenetrable both to non-technical readers and to technical readers unfamiliar with the subject matter.

An example that seems particularly egregious to me:

You might think that, because LLMs are grown without much understanding and trained only to predict human text, they cannot do anything except regurgitate human utterances. But that would be incorrect. [...]

Furthermore, AIs nowadays are not trained only to predict human-generated text. An AI-grower might give their AI sixteen tries at solving a math problem, thinking aloud in words about how to solve it; then, the “chain-of-thought” for whichever of the sixteen tries went best would get further reinforced by gradient descent, yielding what’s called a reasoning model. That’s a sort of training that can push AIs to think thoughts no human could think.

How does that conclusion follow? If a base model can only regurgitate human utterances, how does generating sixteen utterances and then reinforcing some of them lead to it... not regurgitating human utterances? This explanation is clearly incomplete. My model of a nonexpert technical-minded reader, who is actually tracking the gears the book introduces, definitely notices that and is confused.

The explanation of base models' training at the start of the chapter feels flawed in the same way. E.g.:

Then, they determine the architecture: the rules for how to combine their input (like the Once upon a ti sequence of 15 14 3…) with the weights in the parameters. Something like, “I’ll multiply each input-number with the weight in the first parameter, and then add it to the weight in the second parameter, and then I’ll replace it with zero if it’s negative, and then…” They pick a lot of operations like that—hundreds of billions, linking every single weight into the calculation.

My model of a technical-minded reader is confused about how that whole thing is supposed to work. It sounds like AI developers manually pick billions of operations? What? The technical reader would rather you had just mentioned operations on matrices; that might've required more effort to understand, but understanding would've at least been possible.
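For contrast, here's roughly what that description amounts to in matrix terms – a toy sketch with made-up sizes, my own illustration rather than anything from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # stand-in for the numbers encoding the input text
W = rng.normal(size=(3, 4))     # "multiply each input-number with the weight..."
b = rng.normal(size=3)          # "...and then add it to the weight in the second parameter..."
h = np.maximum(W @ x + b, 0.0)  # "...and then replace it with zero if it's negative"
print(h)
```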

My model of a nontechnical reader just has their eyes glaze over when reading this description. It uses simple words, but it clearly gestures at some messy complicated thing, and doesn't actually conceptualize it in a simple-to-understand way. ("It" being "the neural substrate", and how it can both be unreadable to us yet encode useful computations.)

And then this explanation is used to build the definition of gradient descent; and then this term is used all throughout the rest of the book to make arguments for various things. My guess is that this explanation is not sufficient to make readers feel like they grok the concept; on the contrary, it's likely to make them earmark the term as "don't really get this one". This would then, subtly or not, poison every argument where this term reoccurs.
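(For reference, the concept the book is gesturing at fits in a few lines – a toy single-weight example of my own, not the book's presentation:)

```python
# Tiny gradient-descent illustration: repeatedly nudge a weight in whichever
# direction reduces the prediction error. Data and model (predict y = w * x)
# are made up for the example; the "right" weight is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
for step in range(100):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)  # d(error)/dw
    w -= 0.1 * grad                                               # step downhill
print(round(w, 3))  # converges to ~2.0
```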

Or maybe I'm unfairly nitpicking. Again, I think MIRI ran it by many test readers, so presumably this actually does work fine in practice? But this is what my eyes are telling me.

Point 3: Part 2, the fictional story. It's kind of... eh. The stated purpose is to help "make abstract considerations feel more real", but does it actually accomplish that? It's written in a pretty abstract way. Its narrative is mixed with technical discussion. It takes the AI's perspective, and doesn't view things from the world's perspective much, so it doesn't create a visceral sense of something you would see happening around you. It involves some honestly quite convoluted master-plan scenarios with multicancer pandemics.

Does the story actually serve to reinforce the risk's plausibility? Maybe, but I wouldn't have guessed so.

Point 4: The question of "but how does the AI kill us?" is probably at the forefront of many people's minds, especially once the basics of "it would want to kill us" are established, but the book takes its sweet time getting there. And I don't think Chapter 6 is doing a stellar job either. It meanders around the point so much:

  • It starts with an analogy...
  • ... then it builds some abstract scaffolding about why ASI defeating humanity should be an easy call to make, even if we can't predict how...
  • ... then it vaguely alludes to weird powers the ASI may develop, still without quite spelling out how these powers would enable a humanity-killing plan...
  • ...  then it drops a bunch of concrete examples of weird hacks you can pull off with technology, still without describing how it may all come together to enable an omnicide...
  • ... then it seems to focus on how much you can improve on biology/how easy it is to solve...
  • ... then it does some more abstract discussion...
  • ... and then the chapter ends...
  • (the whole thing kind of feels like this gif)

... and then we get to Part 2, the fictional story. In which the described concrete scenario is IMO quite convoluted and implausible-sounding, plus see all my other complaints about it in the previous point.


I think Part 3 is strong.[2] I think a solid chunk of Part 1 is strong as well. The online supplements seem great, and I like the format choices there (title-questions followed by quick subtitle answers). But Chapter 2, Chapter 6, and Part 2 seem like weak points. Which is unfortunate, since they're at once the parts where the object-level case is laid out and the early parts that decide whether a reader keeps reading or not.

Or so my impression goes.

  1. ^

    I binged it starting from the minute it became available, because I heard those reports about MIRI employees getting something new from it regarding the alignment problem, and wondered if it would enhance my own understanding as well, or perhaps upturn it and destroy my hopes about my own current research agenda. But no, unfortunately/thankfully there was nothing new for me.

  2. ^

    It also featured detailed descriptions of various engineering challenges/errors and the distillations of lessons from them (Chapter 10 and this supplement), which was the most interesting and useful part of the book for me personally. 

johnswentworth's Shortform
Thane Ruthenis · 5d

Not directly related to your query, but seems interesting:

The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein.

Which, in turn, is pretty solid evidence for "oxytocin mediates the emotion of loving connection/aching affection" (unless there are some mechanisms you've missed). I wouldn't have guessed it's that simple.

Generalizing, this suggests we can study links between specific brain chemicals/structures and cognitive features by looking for people missing the same universal experience, checking if their genomes deviate from the baseline in the same way, then modeling the effects of that deviation on the brain. Alternatively, the opposite: search for people whose brain chemistry should be genetically near-equivalent except for one specific change, then exhaustively check if there's some blatant or subtle way their cognition differs from the baseline.
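A toy sketch of what that first approach would look like in shape – the subjects, the variant names, and the assumption that the comparison comes out this cleanly are all made up for illustration (only loosely inspired by the quoted oxytocin-receptor case):

```python
# Entirely made-up toy data: find genomic deviations shared by everyone who
# lacks the experience and absent from the baseline group.
subjects = [
    {"id": "A", "lacks_experience": True,  "variants": {"OXTR_frameshift"}},
    {"id": "B", "lacks_experience": True,  "variants": {"OXTR_frameshift", "VARIANT_X"}},
    {"id": "C", "lacks_experience": False, "variants": {"VARIANT_X"}},
    {"id": "D", "lacks_experience": False, "variants": set()},
]

lacking  = [s["variants"] for s in subjects if s["lacks_experience"]]
baseline = [s["variants"] for s in subjects if not s["lacks_experience"]]

shared = set.intersection(*lacking) if lacking else set()
candidates = shared - set().union(*baseline)
print(candidates)  # {'OXTR_frameshift'} -> the deviation whose effects you'd then model
```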

Doing a brief literature review via GPT-5, apparently this sort of thing is mostly done with regards to very "loud" conditions, rather than in an attempt to map out the brain in general. I could imagine that it won't turn out that simple in practice, but the actual bottleneck is probably researchers with a good enough theory-of-mind to correctly figure out the subtle ways the subjects' cognition differs (easy for "severe autism", much harder for "I feel empathy and responsibility, but not loving connection").

Shortform
Thane Ruthenis · 6d

From my limited experience following AI events, agreed. Whole storms of nonsense can be generated by some random accounts posting completely non-credible claims, some people unthinkingly amplifying those, then other people seeing that they are being amplified, thinking it means there's something to them, amplifying them further, etc.

Is there actually a reason to use the term AGI/ASI anymore?
Answer by Thane Ruthenis · Sep 10, 2025

One of the cruxes here is whether one believes that "AGI" is in fact a real distinct thing, rather than there just being a space of diverse cognitive algorithms of different specializations and powers, with no high-level organization. I believe that, and that the current LLM artefacts are not AGI. Some people disagree. That's a crux.

If you're one of the people who thinks that "general intelligence" is a thing, then figuring out whether a given system is an AGI or not, or whether a given paradigm scales to AGI, can be a way to figure out whether that system/paradigm is going to be able to fully automate the economy/AI research. An AGI would definitely be able to do that, so "LLMs are an AGI" implies "LLMs will be transformative". "LLMs are not AGI" does not imply "LLMs won't be transformative/omnicide-capable", so determining the AGI-ness doesn't give you all the answers (such as the exact limits of the LLM paradigm). Nevertheless, "is this an AGI?" is still a useful query to run[1], and getting "no" does somewhat update you away from "LLMs will be transformative", since it removes some of the reasons they may be transformative.

Tertiarily, one question here may be, "what the hell do you mean by a 'real AGI' as distinct from a 'good-enough AGI approximant'"? Here's a long-winded pseudo-formal explanation:

I deny the assumption that behavioral definitions of AGI/ASI are akin to denying the distinction between the Taylor-polynomial approximation of a function, and that function itself (at least if we assume infinite compute like this).

Consider a function $f(x)$ that could be expanded into a power series $p(x)$ with an infinite convergence radius, e.g. $f(x) = e^x$.[2] The power series is that function, exactly, not an approximation; it's a valid way to define $f(x)$.

However, that validity only holds if you use the whole infinite-term power series, $p_\infty(x)$. Partial sums $p_n(x)$, with finite $n$, do not equal $f(x)$, "are not" $f(x)$. In practice, however, we're always limited to such finite partial sums. If so, then if you're choosing to physically/algorithmically implement $f(x)$ as some $p_n(x)$, the behavior of $p_n(x)$ as you turn the "crank" of $n$ is crucial for understanding what you're going to achieve.
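To make the "crank of $n$" point concrete with the $e^x$ example – a throwaway numerical check, nothing deep:

```python
# Partial sums p_n(x) of the power series for e^x: every finite n is only an
# approximation; only the full infinite series equals the function itself.
import math

def p_n(x: float, n: int) -> float:
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 2.0
for n in (1, 3, 5, 10):
    print(n, round(p_n(x, n), 4), "vs", round(math.exp(x), 4))
```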

Suppose that we define an AGI as some function $F(x)$. As above, $F$ can have many exactly-equivalent representations/implementations; call their set $R(F)$. One of them may be some entity $G_n(x)$ such that $\lim_{n\to\infty} G_n(x) = F(x)$.

Does it matter which member of $R(F)$ we're working with, if we're reasoning about $F$'s behavior or capabilities? No: they are all exactly equivalent. Some members of $R(F)$ may define $F$ behaviorally, some may do it structurally; we don't care. We pick whichever definition is more convenient to work with.

However, implementing $F$ in practice – incarnating it into algorithms and physics – is a completely different problem. To do so, we have to pick some structural representation/definition from $R(F)$, and we have to mind its length/complexity, because it'll be upper-bounded (by our finite compute).

In this case, the pick of representation matters. Notably, for some $k \neq \infty$, $G_k$ is not a member of $R(F)$. If so, we have to start wondering about the error $\mathrm{dist}(G_k; F)$, how it behaves as we move $k$ around, and what is the largest $k$ such that $G_k$'s description complexity is below our upper bound.

For example, perhaps $G_n(x)$ is an expansion of $F(x)$ around some abstract "point" $x = a$, and while $G_\infty$ exactly equals $F$ everywhere, $G_k$ only serves as a good approximation around that point $a$, with the error growing rapidly as you move away from $a$. (We may call that "failure to generalize out of distribution".)
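And the same truncated series, expanded around $a = 0$, evaluated further and further from $a$ – again just a toy check of the "error grows away from the expansion point" behavior:

```python
# A fixed-order (n = 5) Taylor polynomial of e^x around a = 0: accurate near
# the expansion point, increasingly wrong far from it.
import math

def p_n(x: float, n: int) -> float:
    return sum(x**k / math.factorial(k) for k in range(n + 1))

for x in (0.5, 2.0, 5.0, 10.0):
    rel_error = abs(p_n(x, 5) - math.exp(x)) / math.exp(x)
    print(x, round(rel_error, 4))  # the relative error blows up as |x - a| grows
```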

If so, nitpicking about the definition/representation/implementation of $F$ that is being used to incarnate $F$ absolutely matters.

For some members of $R(F)$, say $Q_\infty$, even their complexity-limited approximations $Q_k$ are good enough everywhere. Formally, we may say that $\forall x: \mathrm{dist}(F(x); Q_k(x)) < \epsilon$, for some small $\epsilon$. The informal statement "$Q_k$ is a genuine AGI" is then only slightly imprecise.

For other members of $R(F)$, like $G_\infty$, their complexity-limited approximations $G_k$ are not that good. Formally, $\exists x: \mathrm{dist}(F(x); G_k(x)) > T$, for some large $T$ and large sets of $x$. The informal statement "$G_k$ is an AGI" is then drastically incorrect: it essentially says $\forall x: \mathrm{dist}(F(x); G_k(x)) < \epsilon$. Translated into practical considerations, reasoning about $F$ is not a good way to predict $G_k$'s behavior.

(Edit: Further, if what you care about are practically feasible AGI designs, you may restrict the set $R(F)$ only to those definitions/representations whose small-error approximations have a finite description length. Hence you may somewhat-imprecisely say that "$G_n$ is not a real AGI" – since it's true for all finite $n$, and $n = \infty$ is implicitly ruled to be outside the consideration.

I note that this is what I was doing in this comment, and yeah, my using both the practical and the theoretical definitions of $R(F)$ is unhelpful. I'll try to avoid that in the future.)

  1. ^

Since figuring out whether it's an AGI/AGI-complete may be easier than predicting its consequences from first principles.

Formally, suppose all members of some set $S$ have a property $q$, and suppose we're wondering whether some entity $f$ has that property. Proving $f \in S$ is a valid way to prove that $f$ possesses $q$, and it may be easier than proving it in a more "direct" way (without referring to $S$).

  2. ^

Formally, assume $f(x)$ is an entire function.

Yes, AI Continues To Make Rapid Progress, Including Towards AGI
Thane Ruthenis · 7d

I'd just say "ah, any software that you can run on computers that can cause the extinction of humanity even if humans try to prevent it" would fulfill the sufficiency criterion for AGI – niplav

Fair.

One crux here may be that you are more certain that "AGI" is a thing?

Yup.
