LESSWRONG

rvnnt
Comments

Manifesto for doing good science in AI
rvnnt · 8d · 10

Suppose you succeed at doing impactful science in AI. What is your plan for ensuring that those impacts are net-positive? (And how would you define "positive" in this context?)

(CTRL+F'ing this post yielded zero safety-relevant matches for "safe", "beneficial", or "align".)

rvnnt's Shortform
rvnnt · 5mo · 10

It's unclear whether there is a tipping point where [...]

Yes. Also unclear whether the 90% could coordinate to take any effective action, or whether any effective action would be available to them. (Might be hard to coordinate when AIs control/influence the information landscape; might be hard to rise up against e.g. robotic law enforcement or bioweapons.)

Don't use passive voice for this. [...]

Good point! I guess one way to frame that would be as the question:

by what kind of process do the humans in law enforcement, military, and intelligence agencies get replaced by AIs? Who/what is in effective control of those systems (or their successors) at various points in time?

And yeah, that seems very difficult to predict or reliably control. OTOH, if someone were to gain control of the AIs (possibly even copies of a single model?) that are running all the systems, that might make centralized control easier? </wild, probably-useless speculation>

rvnnt's Shortform
rvnnt · 5mo · 21

A potentially somewhat important thing which I haven't seen discussed:

  • People who have a lot of political power or own a lot of capital are unlikely to be adversely affected if (say) 90% of human labor becomes obsolete and is replaced by AI.
  • In fact, so long as property rights are enforced and humans retain a monopoly on decisionmaking/political power, such people are not unlikely to benefit from the economic boost that such automation would bring.
  • Decisions about AI policy are mostly determined by people with a lot of capital or political power. (E.g. Andreessen Horowitz, JD Vance, Trump, etc.)

(This looks like a "decisionmaker is not the beneficiary" type of situation.)

Why does that matter?

  • It has implications for modeling decisionmakers, interpreting their words, and for how to interact with them.[1]

  • If we are in a gradual-takeoff world[2], then we should perhaps not be too surprised to see the wealthy and powerful push for AI-related policies that make them more wealthy and powerful, while a majority of humans become disempowered and starve to death (or live in destitution, or get put down with viruses or robotic armies, or whatever). (OTOH, I'm not sure if that possibility can be planned/prepared for, so maybe that's irrelevant, actually?)


  1. For example: we maybe should not expect decisionmakers to take risks from AI seriously until they realize those risks include a high probability of "I, personally, will die". As another example: when people like JD Vance output rhetoric like "[AI] is not going to replace human beings. It will never replace human beings", we should perhaps not just infer that "Vance does not believe in AGI", but instead also assign some probability to hypotheses like "Vance thinks AGI will in fact replace lots of human beings, just not him personally; and he maybe does not believe in ASI, or imagines he will be able to control ASI". ↩︎

  2. Here I'll define "gradual takeoff" very loosely as "a world in which there is a >1 year window during which it is possible to replace >90% of human labor, before the first ASI comes into existence". ↩︎

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
rvnnt · 5mo · 84

Thank you for (being one of the horrifyingly few people) doing sane reporting on these crucially important topics.

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
rvnnt · 5mo · 20

Typo: "And humanity needs all the help we it can get."

Altman blog on post-AGI world
rvnnt · 5mo · 21

Out of (1)-(3), I think (3)[1] is clearly most probable:

  • I think (2) would require Altman to be deeply un-strategic/un-agentic, which seems in stark conflict with all the skillful playing-of-power-games he has displayed.
  • (3) seems strongly in-character with the kind of manipulative/deceitful maneuvering-into-power he has displayed thus far.
  • I suppose (1) is plausible; but for that to be his only motive, he would have to be rather deeply un-strategic (which does not seem to be the case).

(Of course one could also come up with other possibilities besides (1)-(3).)[2]


  1. or some combination of (1) and (3) ↩︎

  2. E.g. maybe he plans to keep ASI to himself, but use it to implement all-of-humanity's CEV, or something. OTOH, I think the kind of person who would do that would not exhibit so much lying, manipulation, exacerbating-arms-races, and gambling-with-everyone's-lives. Or maybe he doesn't believe ASI will be particularly impactful; but that seems even less plausible. ↩︎

Should you publish solutions to corrigibility?
rvnnt · 5mo · 10

Note that our light cone with zero value might also eclipse other light cones that might've had value if we didn't let our AGI go rogue to avoid s-risk.

That's a good thing to consider! However, taking Earth's situation as a prior for other "cradles of intelligence", I think that consideration returns to the question of "should we expect Earth's lightcone to be better or worse than zero-value (conditional on corrigibility)?"

Should you publish solutions to corrigibility?
rvnnt · 5mo · 10

To me, those odds each seem optimistic by a factor of about 1000, but ~reasonable relative to each other.

(I don't see any low-cost way to find out why we disagree so strongly, though. Moving on, I guess.)

But this isn't any worse to me than being killed [...]

Makes sense (given your low odds for bad outcomes).

Do you also care about minds that are not you, though? Do you expect most future minds/persons that are brought into existence to have nice lives, if (say) Donald "Grab Them By The Pussy" Trump became god-emperor (and was the one deciding what persons/minds get to exist)?

Should you publish solutions to corrigibility?
rvnnt · 5mo · 10

IIUC, your model would (at least tentatively) predict that

  • if person P has a lot of power over person Q,
  • and P is not sadistic,
  • and P is sufficiently secure/well-resourced that P doesn't "need" to exploit Q,
  • then P will not intentionally do anything that would be horrible for Q?

If so, how do you reconcile that with e.g. non-sadistic serial killers, rapists, or child abusers? Or non-sadistic narcissists in whose ideal world everyone else would be their worshipful subject/slave?

That last point also raises the question: Would you prefer the existence of lots of (either happily or grudgingly) submissive slaves over oblivion?

To me it seems that terrible outcomes do not require sadism. Seems sufficient that P be low in empathy, and want from Q something Q does not want to provide (like admiration, submission, sex, violent sport, or even just attention).[1] I'm confused as to how/why you disagree.


  1. Also, AFAICT, about 0.5% to 8% of humans are sadistic, and about 8% to 16% have very little or zero empathy. How did you arrive at "99% of humanity [...] are not so sadistic"? Did you account for the fact that most people with sadistic inclinations probably try to hide those inclinations? (Like, if only 0.5% of people appear sadistic, then I'd expect the actual prevalence of sadism to be more like ~4%.) ↩︎

Should you publish solutions to corrigibility?
rvnnt · 5mo · 10

It seems like you're assuming people won't build AGI if they don't have reliable ways to control it, or else that sovereign (uncontrolled) AGI would be likely to be friendly to humanity.

I'm assuming neither. I agree with you that both seem (very) unlikely.[1]

It seems like you're assuming that any humans succeeding in controlling AGI is (in expectation) preferable to extinction? If so, that seems like a crux: if I agreed with that, then I'd also agree with "publish all corrigibility results".


  1. I expect that unaligned ASI would lead to extinction, and our share of the lightcone being devoid of value or disvalue. I'm quite uncertain, though. ↩︎

Posts

3 · rvnnt's Shortform · 5mo · 3
13 · [Question] Should you publish solutions to corrigibility? · 5mo · 13
7 · [Question] Requesting feedback/advice: what Type Theory to study for AI safety? · 5y · 4