Signer

Comments (sorted by newest)

Wei Dai's Shortform
Signer · 5h

We can punt on values for now

What's wrong with just using AI for obvious stuff like curing death while you solve metaethics? I don't necessarily disagree about the usefulness of people in the field changing their attitude; it's more that "the problem is hard, so we should not run CEV on day one".

On Fleshling Safety: A Debate by Klurl and Trapaucius.
Signer · 4d

I don’t deny the existence of some filters and selection pressures! I am saying that the filter you are pointing to, is not quantitatively strong enough and narrow enough to pinpoint only korrigibility as its singular outcome!

I think that's the best wording of the disagreement I've seen. What would be better is a quantitative justification grounded in reality, because as it stands Ezra Klein just says "looks strong enough to me".

[Question] What the discontinuity is, if not FOOM?
Signer · 1mo

If all AIs are scheming, they can take over together. If you instead assume a world with a powerful AI that is actually on humanity's side, then at some level of power of the friendly AI you probably can run an unaligned AI and it will not be able to do much harm. But just assuming there are many AIs doesn't solve scheming by itself - if training actually works as badly as predicted, then none of the many AIs would be aligned enough.

Beyond the Zombie Argument
Signer · 1mo

Russellian monism struggles with Epiphenomenality: if the measurable, structural properties are sufficient to predict what happens, then the phenomenal properties are along for the ride.

I mean, it's monism - it's supposed to have only one type of stuff; obviously structural properties only work because of the underlying phenomenal/physical substrate.

furthermore, since mental states are ultimately identical to physical brain states, they share the causal powers of brain states (again without the need to posit special explanatory apparatus such as “psychophysical laws”), and in that way epiphenomenalism is avoided.

I don't see what having two special maps has to do with the monistic ontology that enables causal closure. What's the problem with just having a neutral-monistic ontology, like you say Dual-aspect neutral monism has, and using normal physical epistemology?

the epistemic irreducibility of the mental to the physical is also accepted.

Why? If ontologically there is only one type of stuff, then you can reduce the mental description to the physical one, because they describe one reality - the same way you reduce an old physical theory to a new one.

[Question] What the discontinuity is, if not FOOM?
Signer · 1mo

Why would that be discontinuous?

Because incremental progress missed deception.

I’m arguing against 99%

I agree such confidence lacks justification.

Beyond the Zombie Argument
Signer · 1mo

I don't think there is a need to qualify it as a potential solution - Russellian Monism just solves the Hard Problem.

[Question] What the discontinuity is, if not FOOM?
Signer · 1mo

I don't think anyone is against incremental progress. It's just that if the AI takes over after incremental progress, then the alignment wasn't good enough. And what's the source of confidence in it being enough?

"Final or nonexistent" seems to be appropriate for scheming detection - if you missed only one way for AI to hide it's intentions, it will take over. So yes, degree of scheming in broad sense and how much you can prevent it is a crux and other things depend on it. Again, I don't see how you can be confident that future AI wouldn't scheme.

A non-review of "If Anyone Builds It, Everyone Dies"
Signer · 1mo

I just think that it wouldn’t be the case that we had one shot but we missed it, but rather had many shots and missed them all.

This interpretation only works if by missed shots you mean "missed opportunities to completely solve alignment". Otherwise you can observe multiple failures along the way and fix observable scheming, but you only need to miss one alignment failure at the last capability level. The point is just that your monitoring methods, even improved after many failures to catch scheming in the pre-takeover regime, are finally tested only when the AI really can take over, because a real ability to take over is hard to fake. And if you fail, you can't repeat that test after improving your monitoring. Maybe your alignment training after a previously observed failure in the pre-takeover regime really did make the AI non-scheming. But maybe you just missed some short thought where the AI decided not to think about takeover since it can't win yet. And you'll have to rely on your monitoring without actually testing whether it can catch all such possibilities, which depend on an actual environment that allows takeover.

[Question] What the discontinuity is, if not FOOM?
Signer · 1mo

if ASI is developed gradually, alignment can be tweaked as you go along.

The whole problem is that alignment, as in "the AI doesn't want to take over in a bad way", is not assumed to be solved. So you think your alignment training works for your current version of pre-takeover ASI, but actually previous versions were already scheming for a long time, so running a version capable of takeover creates a discontinuity that is sudden for you, where the ASI takes over because it now can. It means all your previous alignment work and scheming detection is finally tested only when you run a version capable of takeover, and you can fail this test only once. And training against scheming is predicted not to work and to just create stealthier schemers. And "the AI can take over" is predicted to be hard for the AI to fake, so you can't confidently check for scheming just by observing what it would do in a fake environment.

Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?")
Signer · 1mo

The technical intuitions we gained from this process, is the real reason for our particularly strong confidence in this problem being hard.

I don't understand why anyone would expect such a reason to be persuasive to other people. Like, relying on illegible intuitions in matters of human extinction just feels crazy. Yes, certainty doesn't matter, we need to stop either way. But still - is it even rational to be so confident when you rely on illegible intuitions? Why not check yourself with something more robust, like actually writing down your hypotheses and reasoning and counting the evidence? Surely there is something better than saying "I base my extreme confidence on intuitions".
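A minimal sketch of the kind of bookkeeping "counting the evidence" points at (standard log-odds form, my own illustration rather than anything from the post, assuming the pieces of evidence are roughly independent given the hypothesis):

$$\log\frac{P(H\mid E_1,\dots,E_n)}{P(\neg H\mid E_1,\dots,E_n)} \;=\; \log\frac{P(H)}{P(\neg H)} \;+\; \sum_{i=1}^{n}\log\frac{P(E_i\mid H)}{P(E_i\mid \neg H)}$$

Extreme confidence then corresponds to a large total on the right-hand side, which you only get by explicitly listing pieces of evidence with large likelihood ratios, not by pointing at an intuition.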

And it's not only about corrigibility - "you don't get what you train for" being a universal law of intelligence in the real world, or utility maximization, especially in the limit, being a good model of real things, or pivotal real-world science being definitely so hard that you can't possibly be distracted even once and still figure it out - everything is insufficiently justified.
