RobertM

LessWrong dev & admin as of July 5th, 2022.

Comments

What do people mean when they talk about a "long reflection"?  The original usages suggest flesh-humans literally sitting around and figuring out moral philosophy for hundreds, thousands, or even millions of years, before deciding to do anything that risks value lock-in, but (at least) two things about this don't make sense to me:

  • A world where we've reliably "solved" for x-risks well enough to survive thousands of years without also having meaningfully solved "moral philosophy" is probably physically realizable, but this seems like a pretty fine needle to thread from our current position.  (I think if you have a plan for solving AI x-risk that looks like "get to ~human-level AI, pump the brakes real hard, and punt on solving ASI alignment" then maybe you disagree.)
  • I don't think it takes today-humans a thousand years to come up with a version of indirect normativity (or CEV, or whatever) that actually just works correctly.  I'd be somewhat surprised if it took a hundred, but maybe it's actually very tricky.  A thousand just seems crazy.  A million makes it sound like you're doing something very dumb, like figuring out every shard of each human's values without knowing how to automate things.
RobertM40

I tried to make a similar argument here, and I'm not sure it landed.  I think the argument has since demonstrated even more predictive validity with e.g. the various attempts to build and restart nuclear power plants, directly motivated by nearby datacenter buildouts, on top of the obvious effects on chip production.

RobertM20

Should be fixed now.

RobertM20

Good catch, looks like that's from this revision, which looks like it was copied over from Arbital - some LaTeX didn't make it through.  I'll see if it's trivial to fix.

RobertM20

The page isn't dead, Arbital pages just don't load sometimes (or take 15+ seconds).

RobertMΩ83116

I understand this post to be claiming (roughly speaking) that you assign >90% likelihood in some cases and ~50% in other cases that LLMs have internal subjective experiences of varying kinds.  The evidence you present in each case is outputs generated by LLMs.

The referents of consciousness for which I understand you to be making claims re: internal subjective experiences are 1, 4, 6, 12, 13, and 14.  I'm unsure about 5.

Do you have sources of evidence (even illegible) other than LLM outputs that updated you that much?  Those seem like very surprisingly large updates to make on the basis of LLM outputs (especially in cases where those outputs are self-reports about the internal subjective experience itself, which are subject to substantial pressure from post-training).


Separately, I have some questions about claims like this:

The Big 3 LLMs are somewhat aware of what their own words and/or thoughts are referring to with regards to their previous words and/or thoughts. In other words, they can think about the thoughts "behind" the previous words they wrote.

This doesn't seem constructively ruled out by e.g. basic transformer architectures, but as justification you say this:

If you doubt me on this, try asking one what its words are referring to, with reference to its previous words. Its "attention" modules are actually intentionally designed to know this sort of thing, using key/query/value lookups that occur "behind the scenes" of the text you actually see on screen.

How would you distinguish an LLM both successfully extracting and then faithfully representing whatever internal reasoning generated a specific part of its outputs, vs. conditioning on its previous outputs to give you a plausible "explanation" for what it meant?  The second seems much more likely to me (and this behavior isn't that hard to elicit, e.g. by asking an LLM to give you a one-word answer to a complicated question, and then asking it for its reasoning).

RobertM98

My impression is that Yudkowsky has harmed public epistemics in his podcast appearances by saying things forcefully and with rather poor spoken communication skills for novice audiences.

I recommend reading the YouTube comments on his recorded podcasts, rather than e.g. Twitter commentary from people with a pre-existing adversarial stance to him (or AI risk questions writ large).

RobertM187

On one hand, I feel a bit skeptical that some dude outperformed approximately every other pollster and analyst by having a correct inside-view belief about how existing pollsters were messing up, especially given that he won't share the surveys.  On the other hand, this sort of result is straightforwardly predicted by Inadequate Equilibria, where an entire industry had the affordance to be arbitrarily deficient in what most people would think was their primary value-add, because they had no incentive toward accuracy (skin in the game), and as soon as someone with an edge could make outsized returns on it (via real-money prediction markets), they outperformed all the experts.

On net I think I'm still <50% that he had a correct belief about the size of Trump's advantage that was justified by the evidence he had available to him, but even being directionally-correct would have been sufficient to get outsized returns a lot of the time, so at that point I'm quibbling with his bet sizing rather than the direction of the bet.

RobertM42

I'm pretty sure Ryan is rejecting the claim that the people hiring for the roles in question are worse-than-average at detecting illegible talent.

RobertM120

Depends on what you mean by "resume building", but I don't think this is true for "need to do a bunch of AI safety work for free" or similar.  E.g. for technical research, many people who have gone through MATS and then been hired at or founded their own safety orgs have no prior experience doing anything that looks like AI safety research, and some don't even have much in the way of ML backgrounds.  Many people switch directly out of industry careers into doing e.g. ops or software work that isn't technical research.  Policy might seem a bit trickier, but I know several people who did not spend anything like years doing resume building before finding policy roles or starting their own policy orgs and getting funding.  (Though I think policy might actually be the most "straightforward" to break into, since all you need to do to demonstrate competence is publish a sufficiently good written artifact; admittedly this is mostly for starting your own thing.  If you want to get hired at a "larger" policy org, resume building might matter more.)
