Noosphere89
Sequences: An Opinionated Guide to Computability and Complexity

Comments (sorted by newest)

Noosphere89's Shortform
Noosphere89 · 11mo

Links to long comments that I want to pin, but which are too long to be pinned:

https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD

https://www.lesswrong.com/posts/DCQ8GfzCqoBzgziew/?commentId=RhTNmgZqjJpzGGAaL

Will AI systems drift into misalignment?
Noosphere89 · 2d

However, there’s an important dynamic this model misses, which is that, when the detector becomes less effective, the model’s capabilities might also decline. For example, suppose a lie detector functions because it leverages some important representations that help a model reason about its situation. A single gradient step could make the lie detector less reliable, but to do so, it might need to distort some of those important internal representations. And as a result, the model would become worse at e.g., writing code. So it’s possible that reducing the effectiveness of a detector requires paying a tax.


An important implication of this result is that a good misalignment detector should have the property that either the misalignment gets detected, or, if the AI can be misaligned undetectably, it pays for that with lower capabilities than the hypothetical aligned AI would have.

This is why interpretability research is still useful even if we never get holy-grail interpretability before we face misalignment risk from AI, if we ever get it at all: even without explaining everything that's going on in the model, you can build detectors that make AIs pay a tax for undetectable misalignment.

A similar story holds for AI control, and it's a big reason why I'm glad AI control is getting funded right now.
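
A minimal toy sketch of the "tax" dynamic above (my own illustration, not from the post; the encoder/task_head/probe setup and all numbers are made up): a frozen lie-detector probe reads the same shared representation that a task head uses, so gradient steps that push the encoder to fool the probe also tend to degrade task performance.

import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_rep, n = 16, 8, 512
X = torch.randn(n, d_in)
w_true = torch.randn(d_in)
y_task = X @ w_true + 0.1 * torch.randn(n)   # regression target, a stand-in for "capability"
y_lie = (X @ w_true > 0).float()             # "deception" label, correlated with task features

encoder = nn.Linear(d_in, d_rep)             # shared representations
task_head = nn.Linear(d_rep, 1)              # capability head
probe = nn.Linear(d_rep, 1)                  # the lie detector

# Phase 1: train everything jointly so the task is solved and the detector works.
opt = torch.optim.Adam(
    [*encoder.parameters(), *task_head.parameters(), *probe.parameters()], lr=1e-2
)
for _ in range(500):
    z = encoder(X)
    loss = nn.functional.mse_loss(task_head(z).squeeze(-1), y_task) \
        + nn.functional.binary_cross_entropy_with_logits(probe(z).squeeze(-1), y_lie)
    opt.zero_grad(); loss.backward(); opt.step()

def report(tag):
    with torch.no_grad():
        z = encoder(X)
        task_mse = nn.functional.mse_loss(task_head(z).squeeze(-1), y_task).item()
        probe_acc = ((probe(z).squeeze(-1) > 0).float() == y_lie).float().mean().item()
    print(f"{tag}: task MSE = {task_mse:.3f}, detector accuracy = {probe_acc:.2f}")

report("before evasion")

# Phase 2: freeze the probe and the task head; update only the encoder to fool the probe.
evade = torch.optim.Adam(encoder.parameters(), lr=1e-2)
for _ in range(300):
    z = encoder(X)
    loss = -nn.functional.binary_cross_entropy_with_logits(probe(z).squeeze(-1), y_lie)
    evade.zero_grad(); loss.backward(); evade.step()

# Detector accuracy drops, but task MSE typically rises too: evading detection pays a
# capability tax, because the evasion gradient distorts representations the task head relies on.
report("after evasion")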

Why Truth First?
Noosphere89 · 3d

The other part is that partisans of a narrative overfocus on the other side's bad arguments, because most people can't check the arguments themselves. And to be frank, the entire area is a mess I'm not willing to wade into, so I instead focus on less-charged topics.

Like, all four of the examples are great demonstrations of why you need to be able to steelman your opponent, and one of the central problems in politics is that people are trapped in a loop of destroying bad arguments instead of focusing on the good ones.

Variously Effective Altruism
Noosphere89 · 6d

Yeah, that particular part sounded a lot like "I can't understand why people disagree with IABIED without suffering from PR mindset".

Like, this would not be out of place coming from someone who can't actually understand or tolerate disagreement with the Grand Ideas, and I really wish this quote were struck from the post entirely:

Another example I would cite was the response to If Anyone Builds It, Everyone Dies by the core EA people, including among others Will MacAskill himself and also the head of CEA. This was a very clear example of PR mindset, where quite frankly a decision was made that this was a bad EA look, the moves it proposes were unstrategic, and thus the book should be thrown overboard. If Will is sincere about this reckoning, he should be able to recognize that this is what happened.

p.b.'s Shortform
Noosphere89 · 7d

Conditional on a slowdown in AI progress, my primary hypothesis is that recent AI models haven't scaled compute much compared to past models and have instead relied on RL progress, and that current RL is becoming less and less of a free lunch and is actually less efficient than pre-training.

Which is a slight update against software-only singularity stories occurring.

Legible vs. Illegible AI Safety Problems
Noosphere89 · 8d

I somewhat agree with this, but I don't agree with conclusions like "making an illegible problem legible flips the sign of the value of working on it", and I want to explain why I disagree:

  1. I generally believe that even unsolved legible problems won't halt deployment of powerful AIs (an example scenario is here), at least absent blatant warning signs that are basically impossible to spin. Even more importantly, not halting the deployment of powerful AIs might be the best choice we have, since the risk of inaction is too high for reasonable AI developers (for example, Anthropic) to justify shutting down.
  2. One of my general beliefs about philosophical problems is that many of their solutions will be unsatisfying, and, most importantly here, the fully general solution doesn't matter: more specific, tailored solutions that reflect what our universe is actually like are also fine. Changing the problem statement like this is a very common way to make previously intractable problems tractable.
  3. Because a lot of philosophical problems only matter when AI capabilities are very, very high (the thought experiments that motivate them assume almost arbitrarily capable AIs), human work on them doesn't actually matter much, and they have to be delegated to (aligned) ASIs. This is strengthened to the extent that philosophical problems require capabilities insights to solve and are roadblocks to AI's value, meaning AI folks will be incentivized to solve them by default.

More generally, a big reason people are focusing on more legible problems nowadays is a shift in which regime of AI capabilities safety interventions target: there's a lot less focus on the post-intelligence-explosion era, where AIs can do things that no human can reliably hope to do, and much more focus on the first steps, say AI fully automating AI R&D, where it's easier to reason about intervention effectiveness and you can rely more on imperfect, not-fully-justifiable solutions like AI control.

Comparative advantage & AI
Noosphere89 · 12d

You didn't actually answer the question posed, which was "Why couldn't humans and an ASI trade peacefully, rather than it killing us, even in the absence of empathy/love/alignment to us?" and not "Why would we fail at making AIs that are aligned with us or have empathy for us?"

What's up with Anthropic predicting AGI by early 2027?
Noosphere89 · 13d

However, I'd also update toward the current paradigm continuing to progress at a pretty fast clip and this would push towards expecting powerful AI in the current paradigm within 15 years (and probably within 10).

This specific prediction can't come true, due to data limitations, but since Moore's law should eventually allow passable (not perfect) simulation of human brains, I do agree with the timeline medians and end up with similar conclusions.

The next 5-6 years will be a big test of whether the pure/strong version of the scaling hypothesis is correct in AI.

So yeah, a 10-15 year expectation for TAI timelines is reasonable, but that has to come through a different paradigm.

Human Values ≠ Goodness
Noosphere89 · 15d

Indeed, you could make a very reasonable argument that the entire reason AI might be dangerous is that once it can, for example, automate away the entire economy, defection no longer has any cost and has massive benefits (at least conditional on no alignment in values).

The basic reason you can't easily defect against social systems and gain massive amounts of utility is a combination of factors: humans can't reliably evade enforcement, due to logistics; people can reliably detect defection in small groups, thanks to reputation/honor systems; and individual humans are far, far less powerful on their own, even selfishly, than as cooperators.

This of course breaks once AGI/ASI is invented, but John Wentworth's post doesn't need to apply to post-AGI/ASI worlds.
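
As a toy illustration of that cost-benefit flip (my own made-up numbers and function name, not anything from John Wentworth's post): defection only dominates when an actor's standalone output dwarfs what cooperation offers and detection/enforcement is unreliable.

def expected_payoffs(solo_output, coop_share, detect_prob, penalty):
    # Crude expected-value comparison: cooperate and take your share of the joint surplus,
    # or defect, keep your solo output, and risk being caught and punished.
    cooperate = coop_share
    defect = solo_output - detect_prob * penalty
    return cooperate, defect

# A human individual: little solo output, defection is usually detected in small groups,
# and the penalties (reputation, legal) are large relative to the gains.
print(expected_payoffs(solo_output=1.0, coop_share=10.0, detect_prob=0.9, penalty=50.0))
# -> (10.0, -44.0): cooperation dominates.

# An AGI/ASI that can run the economy on its own: huge solo output, weak detection and enforcement.
print(expected_payoffs(solo_output=1000.0, coop_share=10.0, detect_prob=0.1, penalty=50.0))
# -> (10.0, 995.0): absent aligned values, defection dominates.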

RSPs are pauses done right
Noosphere89 · 17d

BTW, that part of the interview is also why the claim that Anthropic violated its RSP by not stopping research/deployment of new models when it lacked ASL-3 security is incorrect: RSPs were never a framework that allowed for unilateral pausing.

More generally, it's useful to keep this in mind the next time a controversy over an RSP violation happens, as I predict it will happen again.

[This comment is no longer endorsed by its author]
Posts

122 · The main way I've seen people turn ideologically crazy [Linkpost] · 25d · 22
22 · Exponential increase is the default (assuming it increases at all) [Linkpost] · 2mo · 0
13 · Is there actually a reason to use the term AGI/ASI anymore? [Question] · 2mo · 5
79 · But Have They Engaged With The Arguments? [Linkpost] · 3mo · 17
18 · LLM Daydreaming (gwern.net) · 4mo · 2
11 · Difficulties of Eschatological policy making [Linkpost] · 5mo · 3
7 · State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost] · 7mo · 0
57 · The case for multi-decade AI timelines [Linkpost] · 7mo · 22
15 · The real reason AI benchmarks haven't reflected economic impacts · 7mo · 0
22 · Does the AI control agenda broadly rely on no FOOM being possible? [Question] · 8mo · 3
Wikitag Contributions

Acausal Trade · 5 months ago · (+18/-18)
Shard Theory · a year ago · (+2)
RLHF · a year ago · (+27)
Embedded Agency · 3 years ago · (+640/-10)
Qualia · 3 years ago · (-1)
Embedded Agency · 3 years ago · (+314/-43)
Qualia · 3 years ago · (+74/-4)
Qualia · 3 years ago · (+20/-10)