Cleo Nardo

DMs open.

Sequences

  • Making Deals with AIs
  • Game Theory without Argmax

Posts (sorted by new)

  • Shortform (5 karma · 2y · 202 comments)
  • Strategy-Stealing Argument Against AI Dealmaking (10 karma · 20h · 1 comment)
  • A Very Simple Model of AI Dealmaking (18 karma · 4d · 0 comments)
  • Stratified Utopia (68 karma · 11d · 8 comments)
  • The Case for Mixed Deployment (41 karma · 2mo · 4 comments)
  • Gradient routing is better than pretraining filtering (46 karma · 2mo · 3 comments)
  • Here’s 18 Applications of Deception Probes (38 karma · 2mo · 0 comments)
  • Looking for feature absorption automatically (16 karma · 3mo · 0 comments)
  • Trusted monitoring, but with deception probes. (31 karma · 3mo · 0 comments)
  • Proposal for making credible commitments to AIs. (107 karma · 4mo · 45 comments)
  • Can SAE steering reveal sandbagging? (35 karma · 7mo · 3 comments)

Comments (sorted by newest)
Shortform
Cleo Nardo · 1d
  1. I expect dealmaking is unnecessary for extracting safe and useful labour from that minimal viable AI.
  2. It's difficult to make credible deals with dumb AIs because they won't be smart enough to tell whether we have actually 'signed the contracts' or not. Maybe we're simulating a world where we have signed the contracts. So the deals only work when the AIs are so smart that we can't simulate the environment while deluding them about the existence of contracts. This occurs only when the AI is very smart or widely deployed. But in that case, my guess is they have DSA.
Shortform
Cleo Nardo · 1d

I don't think dealmaking will buy us much safety. This is because I expect that:

  1. In worlds where AIs lack the intelligence & affordances for decisive strategic advantage, our alignment techniques and control protocols should suffice for extracting safe and useful work.
  2. In worlds where AIs do have a DSA: if they are aligned, then deals are unnecessary; if they are misaligned, then they would disempower us rather than accept the deal.

That said, I have been thinking about dealmaking because:

  1. It's neglected, relative to other mechanisms for extracting safe and useful work from AIs, e.g. scalable alignment, mech interp, control.
  2. There might be time-sensitive opportunities to establish credibility with AIs. This seems less likely for other mechanisms.
Shortform
Cleo Nardo · 2d

There are many questions where verification is no easier than generation, e.g. "Is this chess move best?" is no easier than "What's the best chess move?" Both are EXPTIME-complete.

Philosophy might have a similar complexity to "What's the best chess move?", i.e. "What argument X is such that for all counterarguments X1 there exists a counter-counterargument X2 such that for all counter-counter-counterarguments X3...", i.e. you explore the game tree of philosophical discourse.
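As a toy illustration of that alternating-quantifier structure, here is a minimal sketch in Python (illustrative only; the counterarguments function is a hypothetical oracle that enumerates the arguments attacking a given node, analogous to a legal-move generator in chess):

    def survives(argument, counterarguments, rounds):
        """An argument survives `rounds` rounds of discourse iff every
        counterargument to it has some response that itself survives."""
        if rounds == 0:
            return True  # stop exploring; treat the node as unrefuted
        return all(
            any(survives(response, counterarguments, rounds - 1)
                for response in counterarguments(counter))
            for counter in counterarguments(argument)
        )

    # Toy, hand-written discourse tree: each node maps to the arguments attacking it.
    toy = {
        "X": ["X1"],           # one objection to X
        "X1": ["X2a", "X2b"],  # two candidate responses to X1
        "X2b": ["X3"],         # X2b is itself objected to; X2a is not
    }
    print(survives("X", lambda a: toy.get(a, []), rounds=2))  # True: X1 is answered by X2a

Deciding whether a single argument survives requires exploring the same tree you would explore to find a surviving argument in the first place, which is the sense in which verification is no easier than generation here.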

Shortform
Cleo Nardo · 3d

On whether experiments mark a distinction between science and philosophy: TW has a lecture arguing against this, and he addresses it in a bunch of papers. I'll summarise his arguments later if I have time.

Shortform
Cleo Nardo · 3d

To clarify, I listed some of Williamson's claims, but I haven't summarised any of his arguments.

His actual arguments tend to be 'negative', i.e. they go through many distinctions that metaphilosophical exceptionalists purport, and for each he argues that either (i) the purported distinction is insubstantial,[1] or (ii) the distinction mischaracterises philosophy or science or both.[2]

He hasn't, I think, addressed Wei Dai's exceptionalism, which is (I gather) something like "Solomonoff induction provides a half-way decent formalism of ideal maths/science, but there isn't a similarly decent formalism of ideal philosophy."

I'll think a bit more about what Williamson might say about Wei Dai's purported distinction. I think Williamson is open to the possibility that philosophy is qualitatively different from science, so it's possible he would change his mind if he engaged with Dai's position.

  1. ^

    An illustrative strawman: that philosophers publish in journals with 'philosophy' in the title would not be a substantial difference.

  2. ^

    E.g., one purported distinction he critiques is that philosophy is concerned with words/concepts in a qualitatively different way than the natural sciences.

Stratified Utopia
Cleo Nardo · 3d

Also I'm imagining that everyone stays on Earth and has millions of copies in space (via molecular cloning + uploads). And then it seems like people might agree to keep the Earth-copies as traditional humans, and this agreement would only affect a billionth of the Joseph-copies.

Stratified Utopia
Cleo Nardo · 3d

Yep, this seems like a plausible bargaining solution. But I might be wrong. If it turns out that mundane values don't mind being neighbours with immortal robots then you wouldn't need to leave Earth.

Shortform
Cleo Nardo · 3d

How Exceptional is Philosophy?

Wei Dai thinks that automating philosophy is among the hardest problems in AI safety.[1] If he's right, we might face a period where we have superhuman scientific and technological progress without comparable philosophical progress. This could be dangerous: imagine humanity with the science and technology of 1960 but the philosophy of 1460!

I think the likelihood of philosophy ‘keeping pace’ with science/technology depends on two factors:

  1. How similar are the capabilities required? If philosophy requires fundamentally different methods than science and technology, we might automate one without the other.
  2. What are the incentives? I think the direct economic incentives to automate science and technology are stronger than those to automate philosophy. That said, there might be indirect incentives to automate philosophy if philosophical progress becomes a bottleneck to scientific or technological progress.

I'll consider only the first factor here: How similar are the capabilities required?

Wei Dai is a metaphilosophical exceptionalist. He writes:

We seem to understand the philosophy/epistemology of science much better than that of philosophy (i.e. metaphilosophy), and at least superficially the methods humans use to make progress in them don't look very similar, so it seems suspicious that the same AI-based methods happen to work equally well for science and for philosophy.

LW comment (Wei Dai, June 2023)

I will contrast Wei Dai's position with that of Timothy Williamson, a metaphilosophical anti-exceptionalist.

These are the claims that constitute Williamson's view:

  1. Philosophy is a science.
  2. It's not a natural science (like particle physics, organic chemistry, nephrology), but not all sciences are natural sciences — for instance, mathematics and computer science are formal sciences. Philosophy is likewise a non-natural science.
  3. Although philosophy differs from other scientific inquiries, it differs no more in kind or degree than they differ from each other. Put provocatively, theoretical physics might be closer to analytic philosophy than to experimental physics.
  4. Philosophy, like other sciences, pursues knowledge. Just as mathematics pursues mathematical knowledge, and nephrology pursues nephrological knowledge, philosophy pursues philosophical knowledge.
  5. Different sciences will vary in their subject-matter, methods, practices, etc., but philosophy doesn't differ to a far greater degree or in a fundamentally different way.
  6. Philosophical methods (i.e. the ways in which philosophy achieves its aim, knowledge) aren't starkly different from the methods of other sciences.
  7. Philosophy isn't a science in a parasitic sense. It's not a science because it uses scientific evidence or because it has applications for the sciences. Rather, it's simply another science, not uniquely special. Williamson says, "philosophy is neither queen nor handmaid of the sciences, just one more science with a distinctive character, just as other sciences have distinctive character."
  8. Philosophy is not, exceptionally among sciences, concerned with words or concepts. This conflicts with many 20th century philosophers who conceived philosophy as chiefly concerned with linguistic or conceptual analysis, such as Wittgenstein and Carnap.
  9. Philosophy doesn't consist of a series of disconnected visionaries. Rather, it consists in the incremental contribution of thousands of researchers: some great, some mediocre, much like any other scientific inquiry.

Roughly speaking, metaphilosophical exceptionalism should make one more pessimistic about philosophical progress keeping pace with scientific and technological progress. I lean towards Williamson's position, which makes me less pessimistic about philosophy keeping pace by default.

That said, during a rapid takeoff, even small differences in the pace could lead to a growing gap between philosophical progress and scientific/technological progress. So I consider automating philosophy an important problem to work on.

  1. ^

    See AI doing philosophy = AI generating hands? (Jan 2024), Meta Questions about Metaphilosophy (Sep 2023), Morality is Scary (Dec 2021), Problems in AI Alignment that philosophers could potentially contribute to (Aug 2019), On the purposes of decision theory research (Jul 2019), Some Thoughts on Metaphilosophy (Feb 2019), The Argument from Philosophical Difficulty (Feb 2019), Two Neglected Problems in Human-AI Safety (Dec 2018), Metaphilosophical Mysteries (2010)

AIs should also refuse to work on capabilities research
Cleo Nardo · 4d

Different example, I think.

In our ttx (tabletop exercise), the AI was spec-aligned (human future flourishing etc), but didn’t trust that the lab leadership (Trump) was spec-aligned.

I don’t think our ttx was realistic. We started with an optimistic mix of AI values: spec-alignment plus myopic reward hacking.

Shortform
Cleo Nardo · 4d

Taxonomy of deal-making arrangements

When we consider arrangements between AIs and humans, we can analyze them along three dimensions:

  1. Performance obligations define who owes what to whom. These range from unilateral arrangements where only the AI must perform (e.g. providing safe and useful services), through bilateral exchanges where both parties have obligations (e.g. AI provides services and humans provide compensation), to unilateral human obligations (e.g. humans compensate AI without receiving specified services).
  2. Formation conditions govern how the arrangement comes into being. Some obligations might exist by default without any explicit agreement, similar to how we expect other humans to not harm us without signing contracts. Others require active consent from one party (either AI or human can create the obligation) or mutual agreement from both parties.
  3. Termination conditions govern how the arrangement can end. Some arrangements might be permanent, others allow unilateral exit by either party, and still others require mutual consent to dissolve.

These dimensions yield 36 distinct configurations[1], many of which map onto familiar arrangements between humans:

  • Employment contracts: Bilateral performance (AI works, human provides compensation), where formation requires the consent of both humans and AIs, and termination requires consent of either party.
  • Slavery: Unilateral AI performance, where formation and termination require the consent of humans only.
  • Service agreements: Unilateral AI performance, where formation and termination require the consent of both humans and AIs.
  • Indentured servitude: Bilateral performance (AI works, human provides compensation), where formation requires mutual consent, but termination requires consent of humans.
  • Paid conscription: Bilateral performance (AI serves, human compensates), where formation and termination require the consent of humans only.
  • Gifts: Unilateral human performance, where formation and termination require the consent of humans only.

Typically when I talk about 'deals' I am referring to any arrangement with bilateral performance. This includes paid conscription, indentured servitude, and employment. It will exclude slavery (where AIs have obligations but humans do not) and gifts (where humans have obligations but AIs do not).
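Here is a minimal sketch of that configuration space in Python (illustrative only; the option names are mine, and I read the termination conditions as the three listed above, which reproduces the count of 36):

    from enum import Enum
    from itertools import product

    class Performance(Enum):          # who owes what to whom
        AI_ONLY = "only the AI must perform"
        BILATERAL = "both parties have obligations"
        HUMAN_ONLY = "only the humans must perform"

    class Formation(Enum):            # how the arrangement comes into being
        AI_UNILATERAL = "the AI can create it"
        HUMAN_UNILATERAL = "the humans can create it"
        EITHER = "either party can create it"
        MUTUAL = "requires agreement from both parties"

    class Termination(Enum):          # how the arrangement can end
        PERMANENT = "cannot be dissolved"
        UNILATERAL = "one party can exit on its own"
        MUTUAL = "requires mutual consent to dissolve"

    configurations = list(product(Performance, Formation, Termination))
    assert len(configurations) == 36  # 3 x 4 x 3

    # 'Deals' in the sense used above: any arrangement with bilateral performance.
    deals = [c for c in configurations if c[0] is Performance.BILATERAL]
    assert len(deals) == 12

Employment contracts, for example, would be encoded as (BILATERAL, MUTUAL, UNILATERAL) here.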

  1. ^

    The possible performance obligations are: (1) AIs have obligations, (2) humans have obligations, (3) both humans and AIs have obligations. The possible formation conditions are: (1) AIs can unilaterally form arrangement, (2) humans can unilaterally form arrangement, (3) either humans and AIs can unilaterally form arrangement, (4) both humans and AIs must mutually agree to form arrangement. The possible termination conditions are similar to possible formation conditions. This gives 4×3×3=36 configurations.

Wikitag Contributions

  • Dealmaking (AI) · 3 days ago (+66)
  • Dealmaking (AI) · 5 days ago
  • Dealmaking (AI) · 5 days ago (+58)
  • Dealmaking (AI) · 5 days ago (+311/-181)
  • Dealmaking (AI) · 3 months ago (+742)
  • Dealmaking (AI) · 3 months ago (+31/-69)
  • Dealmaking (AI) · 3 months ago (+113)
  • Dealmaking (AI) · 3 months ago (+1036)