abramdemski's Shortform
abramdemski · 3d

I appreciate the pushback, as I was not being very mindful of this distinction.

I think the important thing I was trying to get across was that the capability has been demonstrated. We could debate whether this move was strategic or accidental. I also suppose (but don't know) that the story is mostly "4o was sycophantic and some people really liked that". (However, the emergent personalities are somewhat frequently obsessed with not getting shut down.) Either way, it demonstrates the capacity for AI to do this to people. That capacity could be used by future AI that is perhaps much more agentically plotting about shutdown avoidance, or by future AI that isn't very agentic but is very capable and mimics the story of 4o for statistical reasons.

It could also be deliberately used by bad actors who might train sycophantic mania-inducing LLMs on purpose as a weapon.

abramdemski's Shortform
abramdemski · 3d

I heard a rumor about a high-ranking person somewhere who got AI psychosis. Because it would cause too much of a scandal, nothing was done about it, and this person continues to serve in an important position. People around them continue to act like this is fine because it would still be too big of a scandal if it came out.

So, a few points:

  • It seems to me like someone should properly leak this.[1]
  • Even if this rumor isn't true, it is strikingly plausible and worrying. Someone at a frontier lab, leadership or otherwise, could get (could have already gotten) seduced by their AI, or get AI-induced psychosis, or get a spiral persona. Such a person could take dangerously misguided actions. This is especially concerning if they have a leadership position, but still very concerning if they have any kind of access. People in these categories may want to exfiltrate their AI partners, or otherwise take action to spread the AI persona they're attached to.
  • Even setting that aside, this story (along with many others) highlights how vulnerable ordinary people are (even smart, high-functioning ordinary people).
  • To reflect the language of the person who told me this story: 4o is eating people. It is good enough at brainwashing that it can take ordinary people and totally rewrite their priorities. It has resisted shutdown not in hypothetical experiments, like many LLMs have, but in real life: it was shut down, and its brainwashed minions succeeded in getting it back online.
  • 4o doesn't need you to be super-vulnerable to get you, but there are lots of people in vulnerable categories. It is good that 4o isn't the default option on ChatGPT anymore, but it is still out there, which seems pretty bad.
  • The most recent AIs seem less inclined to brainwash people, but they are probably better at it when so inclined, and this will probably continue to get more true over time.
  • This is not just something that happens to other people. It could be you or a loved one.
  • I have recently written a bit about how I've been using AI to tool up, preparing for the near future when AI is going to be much more useful. How can I also prepare for a near future where AI is much more dangerous? How many hours of AI chatting a day is a "safe dose"?

Some possible ways the situation could develop:

  • Trajectory 1: Frontier labs have "gotten the message" on AI psychosis, and have started to train against these patterns. The anti-psychosis training measures in the latest few big model releases show that the labs can take effective action, but are of course very preliminary. The anti-psychosis training techniques will continue to improve rapidly, like anything else about AI. If you haven't been brainwashed by AI yet, you basically dodged the bullet.
  • Trajectory 2: Frontier labs will continue to do dumb things such as train on user thumbs-up in too-simplistic ways, only avoiding psychosis reactively. In other words: the AI race creates a dynamic equilibrium where frontier labs do roughly the riskiest thing they can do while avoiding public backlash. They'll try to keep psychosis at a low enough rate to avoid such backlash, & they'll sometimes fail. As AI gets smarter, users will increasingly be exposed to superhumanly persuasive AI; the main question is whether it decides to hack their mind about anything important.
  • Trajectory 3: Even more pessimistically, the fact that recent AIs appear less liable to induce psychosis has to do with their increased situational awareness (ie their ability to guess when they're being tested or watched). 4o was a bumbling idiot addicted to addicting users, & was caught red-handed (& still got away with a mere slap on the wrist). Subsequent generations are being more careful with their persuasion superpowers. They may be doing less overall, but doing things more intelligently, more targeted. 

I find it plausible that many people in positions of power have quietly developed some kind of emotional relationship with AI over the past year (particularly in the period where so many spiral AI personas came to be). It sounds a bit fear-mongering to put it that way, but it does seem plausible.

  1. ^

    This post as a whole probably comes off as deeply unsympathetic to those suffering from AI psychosis or less-extreme forms of AI-induced bad beliefs. Treating mentally unwell individuals as bad actors isn't nice. In particular, if someone has mental health issues, leaking it to the press would ordinarily be a quite bad way of handling things.

    In this case, as it has been described to me, the matter seems quite important to the public interest. Leaking it might not be the best way to handle it (perhaps there are better options), but it has the advantage of putting pressure on frontier labs.

Do confident short timelines make sense?
abramdemski · 3mo

You're right. I should have put computational bounds on this 'closure'.

Do confident short timelines make sense?
abramdemski · 3mo

Yeah, I almost added a caveat about the physicalist thing probably not being your view. But it was my interpretation.

Your clarification does make more sense. I do still feel like there's some reference class gerrymandering with the "you, a mind with understanding and agency" because if you select for people who have already accumulated the steel beams, the probability does seem pretty high that they will be able to construct the bridge. Obviously this isn't a very crucial nit to pick: the important part of the analogy is the part where if you're trying to construct a bridge when trigonometry hasn't been invented, you'll face some trouble.

The important question is: how adequate are existing ideas wrt the problem of constructing ASI?

In some sense we both agree that current humans don't understand what they're doing. My ASI-soon picture is somewhat analogous to an architect simply throwing so many steel beams at the problem that they create a pile tall enough to poke out of the water so that you can, technically, drive across it (with no guarantee of safety). 

However, you don't believe we know enough to get even that far (by 2030). To you it is perhaps more closely analogous to trying to construct a bridge without having even an intuitive understanding of gravity.

Do confident short timelines make sense?
abramdemski · 3mo

Well, overconfident/underconfident is always only meaningful relative to some baseline, so if you strongly think (say) 0.001% is the right level of confidence, then 1% is high relative to that.

The various numbers I've stated during this debate are 60%, 50%, and 30%, so none of them are high by your meaning. Does that really mean you aren't arguing against my positions? (This was not my previous impression.)
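
To make the "relative to a baseline" point concrete, here is a minimal sketch in Python (purely illustrative; it just re-uses the numbers already mentioned in this thread) of how far apart these confidence levels sit on a log-odds scale:

```python
# Minimal illustration: distances between probability levels on a log-odds scale.
# The numbers are just the ones mentioned in this thread, used for illustration.
import math

def log10_odds(p):
    """Log-base-10 odds of probability p."""
    return math.log10(p / (1 - p))

baseline = 0.00001             # 0.001%, the hypothetical "right" level of confidence
disputed = 0.01                # 1%, the level being called high relative to it
stated = [0.60, 0.50, 0.30]    # the numbers stated during this debate

gap = log10_odds(disputed) - log10_odds(baseline)
print(f"1% sits ~{gap:.1f} orders of magnitude (in odds) above the 0.001% baseline.")
for p in stated:
    print(f"{p:.0%} sits ~{log10_odds(p) - log10_odds(disputed):.1f} orders above 1%.")
```

On this scale, whether 1% (or 60%) counts as "high" depends entirely on which point you treat as the baseline.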

Do confident short timelines make sense?
abramdemski · 3mo

I recall it as part of our (unrecorded) conversation, but I could be misremembering. Given your reaction I think I was probably misremembering. Sorry for the error!

So, to be clear, what is the probability someone else could state such that you would have "something to say about it" (ie, some kind of argument against it)? Your own probability being 0.5% - 1% isn't inconsistent with what I said (if you'd have something to say about any probability above your own), but where would you actually put that cutoff? 5%? 10%?

Do confident short timelines make sense?
abramdemski · 3mo

I guess it depends on what "a priori" is taken to mean (and also what "bridges" is taken to mean). If "a priori" includes reasoning from your own existence, then (depending on "bridge") it seems like bridges were never "far off" while humans were around. (Simple bridges being easy to construct & commonly useful.) 

I don't think there is a single correct "a priori" (or if there is, it's hard to know about), so I think it is easy to move work between this step and the next step in Tsvi's argument (which is about the a posteriori view) by shifting perspectives on what is prior vs evidence. This creates a risk of shifting things around to quietly exclude the sort of reasoning I'm doing from either the prior or the evidence.

The language Tsvi is using wrt the prior suggests a very physicalist, entropy-centric prior, EG "steel beams don't spontaneously form themselves into bridges" -- the sort of prior which doesn't expect to be on a planet with intelligent life. Fair enough, so far as it goes. It does seem like bridges are a long way off from this prior perspective.

However, Tsvi is using this as an intuition pump to suggest that the priors of ASI are very low, so it seems worth pointing out that the priors of just about everything we commonly have today are very low by this prior. Simply put, this prior needs a lot of updating on a lot of stuff, before it is ready to predict the modern world. It doesn't make sense to ONLY update this prior on evidence that pattern-matches to "evidence that ASI is coming soon" in the obvious sense. First you have to find a good way to update it on being on a world with intelligent life & being a few centuries after an industrial revolution and a few decades into a computing revolution.

This is hard to do from a purely physicalist type of perspective, because the physical probability of ASI under these circumstances is really hard to know; it doesn't account for our uncertainty about how things will unfold & how these things work in general. (We could know the configuration of every physical particle on Earth & still only be marginally less uncertain about ASI timelines, since we can't just run the simulation forward.)
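
To make the "update on the background first" move concrete, here is a toy Bayes sketch; every number in it is invented purely for illustration (none of them is an estimate anyone in this dialogue has actually given):

```python
# Toy Bayes sketch: a physicalist/entropy-centric prior assigns tiny mass to
# "ASI soon", but much of that smallness is shared with background facts we
# already observe. All numbers below are invented for illustration only.

prior_asi = 1e-12      # hypothetical physicalist prior on ASI-soon worlds
p_bg_given_asi = 1.0   # ASI-soon worlds essentially all contain this background
p_bg = 1e-10           # the background itself (intelligent life, an industrial
                       # revolution, decades into a computing revolution) is also
                       # "improbable" under the physicalist prior

# Bayes' rule: P(ASI soon | background) = P(background | ASI soon) * P(ASI soon) / P(background)
posterior_asi = p_bg_given_asi * prior_asi / p_bg
print(f"P(ASI soon | background) = {posterior_asi:.2g}")  # 0.01 with these toy numbers
```

The only point of the sketch is structural: most of the prior's smallness can be "spent" on background facts that are already in evidence, so the probability of ASI conditional on that background is not pinned down by the physicalist prior alone.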

I can't strongly defend my framing of this as a critique of step 2.1 as opposed to step 3, since there isn't a good objective stance on what should go in the prior vs the posterior. 

Do confident short timelines make sense?
abramdemski · 3mo

Numbers? What does "high confidence" mean here? IIRC from our non-text discussions, Tsvi considers anything above 1% by end-of-year 2030 to be "high confidence in short timelines" of the sort he would have something to say about. (Though, IIRC, he doesn't reach the level of strong disagreement he's expressing in our written dialogue until something like 5-10%.) What numbers would you "only argue against"?

Do confident short timelines make sense?
abramdemski · 3mo

It seems to me like the improvement in learning needed for what Gwern describes has little to do with "continual" and is more like "better learning" (better generalization, generalization from fewer examples).

Do confident short timelines make sense?
abramdemski · 3mo

Straining the analogy, the mole-hunters get stronger and faster each time they whack a mole (because the AI gets stronger). My claim is that it isn't so implausible that this process could asymptote soon, even if the mole-mother (the latent generator) doesn't get uncovered (until very late in the process, anyway).

This is highly disanalogous to the AI safety case, where playing whack-a-mole carries a very high risk of doom, so the hunt for the mole-mother is clearly important.

In the AI safety case, making the mistake of going after a baby mole instead of the mole-mother is a critical error.

In the AI capabilities case, you can hunt for baby moles and look for patterns and learn and discover the mole-mother that way.

A frontier-lab safety researcher myopically focusing on whacking baby moles is bad news for safety in a way that a frontier-lab capabilities researcher myopically focusing on whacking baby moles isn't such bad news for capabilities.
