
NickGabs

Comments

An upcoming US Supreme Court case may impede AI governance efforts
NickGabs · 2y · 32

I think you’re probably right. But even this will make it harder to establish an agency where the bureaucrats/technocrats have a lot of autonomy, and it seems there’s at least a small chance of a sweeping ruling that could make it extremely difficult.

An upcoming US Supreme Court case may impede AI governance efforts
NickGabs · 2y · 10

Yeah, I think they will probably produce better and more regulation than if politicians were more directly involved, but I’m not super sanguine about bureaucrats in absolute terms.

Proposal: labs should precommit to pausing if an AI argues for itself to be improved
NickGabs · 2y · 10

This was addressed in the post: "To fully flesh out this proposal, you would need concrete operationalizations of the conditions for triggering the pause (in particular the meaning of "agentic") as well as the details of what would happen if it were triggered. The question of how to determine if an AI is an agent has already been discussed at length at LessWrong. Mostly, I don't think these discussions have been very helpful; I think agency is probably a "you know it when you see it" kind of phenomenon. Additionally, even if we do need a more formal operationalization of agency for this proposal to work, I suspect that we will only be able to develop such an operationalization via more empirical research. The main particular thing I mean to actively exclude by stipulating that the system must be agentic is an LLM or similar system arguing for itself to be improved in response to a prompt. "

Reliability, Security, and AI risk: Notes from infosec textbook chapter 1
NickGabs · 2y · 21

Thanks for posting this; I think these are valuable lessons, and I agree it would be valuable for someone to do a project looking into successful emergency response practices. One thing this framing also highlights is that, as Quintin Pope discussed in his recent post on alignment optimism, the “security mindset” is not appropriate for the default alignment problem. We are only being optimized against once we have already failed to align the AI; until then, we are mostly held to the lower bar of reliability, not security. There is also the problem of malicious human actors, where security problems could crop up before the AI becomes misaligned, but this failure mode seems less likely and less x-risk-inducing than misalignment, and it calls for a fairly different set of measures (info-sharing policies and encrypting the weights, as opposed to training techniques or evals).

The 0.2 OOMs/year target
NickGabs · 2y · 36

I think concrete ideas like this that take inspiration from past regulatory successes are quite good, esp. now that policymakers are discussing the issue.

Want to win the AGI race? Solve alignment.
NickGabs · 2y · 10

I agree with aspects of this critique. However, to steelman Leopold: I think he is not just arguing that demand-driven incentives will push companies to solve alignment because consumers want safe systems, but rather that, over and above ordinary market forces, constraints imposed by governments, media/public advocacy, and perhaps industry-side standards will make it ~impossible to release a very powerful, unaligned model. I think this points to a substantial underlying disagreement between your models: Leopold expects governments and the public to "wake up" to catastrophic risk from AI quickly enough that regulatory and PR forces will effectively prevent the release of misaligned models, backed by evals/ways of detecting misalignment that are more robust than those ordinary consumers might use (which, as you point out, could likely be fooled by surface-level alignment due to RLHF).

Why does advanced AI want not to be shut down?
NickGabs · 2y · 30

How do any capabilities or motivations arise without being explicitly coded into the algorithm?

Posts

Steering Llama-2 with contrastive activation additions (125 karma, Ω, 2y, 29 comments)
Science of Deep Learning more tractably addresses the Sharp Left Turn than Agent Foundations (20 karma, 2y, 2 comments)
An upcoming US Supreme Court case may impede AI governance efforts (57 karma, 2y, 17 comments)
Empirical Evidence Against "The Longest Training Run" (31 karma, 2y, 0 comments)
Proposal: labs should precommit to pausing if an AI argues for itself to be improved (3 karma, 2y, 3 comments)
AI Doom Is Not (Only) Disjunctive (12 karma, 2y, 0 comments)
We Need Holistic AI Macrostrategy (39 karma, 2y, 4 comments)
Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis (16 karma, 3y, 2 comments)
Miscellaneous First-Pass Alignment Thoughts (12 karma, 3y, 4 comments)
Distillation of "How Likely Is Deceptive Alignment?" (24 karma, 3y, 4 comments)