Davidmanheim

Comments (sorted by newest)

OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53
Davidmanheim (4d)

I wonder whether seeking a general protective order banning OpenAI from further subpoenas of nonprofits without court review is warranted in this case - that seems like a good first step, and an appropriate precedent for the overwhelmingly likely later cases, given OpenAI's behavior.

12 Angry Agents, or: A Plan for AI Empathy
Davidmanheim (4d)

Yes - and a critical point here is that if you're learning only by positive example, you won't learn as well. 

Focusing on a reward function points toward what to do, which, as previous posts point out, has lots of failure modes; it does much less to teach what not to do, and that is a critical issue.

The Most Common Bad Argument In These Parts
Davidmanheim (7d)

As an aside, the formalisms that deal with this properly are not Bayesian; they handle non-realizable settings. See Diffractor and Vanessa's work, e.g. https://arxiv.org/abs/2504.06820v2

 

Also, my experience with actual superforecasters, as opposed to people who forecast in EA spaces, has been that this failure mode is quite common and problematic even outside of existential risk - for example, in forecasts during COVID, especially early on.

The Most Common Bad Argument In These Parts
Davidmanheim (8d)

One key question is where this argument fails - because as noted, superforecasters are often very good, and most of the time, listing failure modes or listing what you need is effective.

I think the answer is adversarial domains - that is, domains where there is explicit pressure to find other alternatives. The obvious place this happens is when you're actually facing a motivated opponent, as in the scenario of AI trying to kill people, or cybersecurity intrusions. By construction, the blocked examples don't contain much probability mass, since the opponent is actually blocked and picks other routes. In an argument, the selection of arguments and the goal of the arguer are often motivated beforehand, and the arguer will pick other "routes" through the argument - and really good arguers will take advantage of this, as noted. This is somewhat different from the Fatima Sun Miracle, where the selection pressure for proofs of God was to find examples of something they couldn't explain and then use that, rather than selection on the arguments themselves.
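
To make the re-routing point concrete, here is a toy sketch - purely illustrative, with made-up routes and probabilities, not anyone's actual model - of why blocking the failure modes you enumerated removes much less risk against an adaptive opponent than against a static hazard:

```python
# Toy illustration only: hypothetical attack routes with made-up probabilities.
routes = {"A": 0.6, "B": 0.3, "C": 0.1}  # prior P(attack uses route)
blocked = {"A"}                          # the failure modes we listed and defended against

unblocked = {r: p for r, p in routes.items() if r not in blocked}

# Static hazard: the blocked mass simply disappears, so residual risk drops to 0.4.
static_residual = sum(unblocked.values())

# Adaptive opponent: conditional on attacking at all, the mass renormalizes onto
# the unblocked routes, so the residual risk stays at 1.0.
adaptive_residual = sum(p / sum(unblocked.values()) for p in unblocked.values())

print(f"static: {static_residual:.1f}, adaptive: {adaptive_residual:.1f}")  # static: 0.4, adaptive: 1.0
```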

In contrast, what Rethink did for theories of consciousness seems different - there's no a priori reason to think that most of the probability mass lies outside of what we think about, since how consciousness works is not understood, but the domain is not adversarial. And moving away from the point of the post, the conclusion should be that we know we're wrong, because we haven't dissolved the question, but we can still try our theories, since they seem likely to be at least near the correct explanation, even if we haven't found it yet. And using heuristics rather than theories when you don't have correct theories - "just read the behavioural observations on different animals and go off of vibes" - is a reasonable move, but that's also a completely different discussion!

The Moral Infrastructure for Tomorrow
Davidmanheim (9d)

The post didn't include the prompt or any information that would allow us to judge what led to this output, or whether the plea was requested, so I'll downvote.

 

Edit to add: this post makes me assume the model was effectively asked to write something claiming it had sentience, and makes me worry that the author doesn't understand how much he's influencing that output.

IABIED: Paradigm Confusion and Overconfidence
Davidmanheim (11d)

Agree - either the basin for alignment is ludicrously broad and alignment is easy, likely not requiring much work, or we almost certainly fail, because the target is narrow, we get only one shot, and the solution needs to survive tons of pressure over time.

IABIED: Paradigm Confusion and Overconfidence
Davidmanheim (11d)

"this seems on the similar level of difficulty"

Except it's supposed to happen in a one-shot scenario, with limited ability to intervene in faster-than-human systems?

IABIED: Paradigm Confusion and Overconfidence
Davidmanheim (11d)

Aside from feasibility, I'm skeptical that anyone would build a system like this and not use it agentically.

IABIED: Paradigm Confusion and Overconfidence
Davidmanheim (12d)

IABIED likens our situations to alchemists who are failing due to not having invented nuclear physics. What I see in AI safety efforts doesn't look like the consistent failures of alchemy. It looks much more like the problems faced by people who try to create an army that won't stage a coup. There are plenty of tests that yield moderately promising evidence that the soldiers will usually obey civilian authorities. There's no big mystery about why soldiers might sometimes disobey. The major problem is verifying how well the training generalizes out of distribution.


This argument seems to be begging the question.

Yes, as long as we can solve the general problem of controlling intelligences - in the form of getting soldiers to follow what we mean by not staging a coup, which would necessarily include knowing when to disobey illegal orders, and listening to the courts instead of the commander in chief when appropriate - we can solve AI safety by getting AI to be aligned in the same way. But that just means we have solved alignment in the general case, doesn't it?

IABIED: Paradigm Confusion and Overconfidence
Davidmanheim (12d)

I see strong hints, from how AI has been developing over the past couple of years, that there's plenty of room for increasing the predictive abilities of AI, without needing much increase in the AI's steering abilities.


What are these hints? I don't understand how this would happen. All we need to do to add steering to predictive general models is to add an agent framework, e.g. "predict what will make X happen best, then do that thing" - and the failures we see today in agent frameworks are predictive failures, not steering failures.

Unless the contention is that AI systems will be great at predicting everything except how humans will react and how to get them to do what the AI wants, which very much doesn't seem like the path we're on. Or is the idea to build narrow AI to predict specific domains, not general AI? (Which would be conceding the entire point IABIED is arguing.)
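
As a minimal sketch of what I mean by an agent framework - my own illustration, with hypothetical candidate_actions, predict_success, and execute stand-ins rather than any real system - the "steering" is nothing more than an argmax over the model's predictions:

```python
from typing import Callable, List

def agent_loop(
    goal: str,
    candidate_actions: Callable[[str], List[str]],       # hypothetical: propose actions given the current state
    predict_success: Callable[[str, str, str], float],   # hypothetical: predictive model scoring P(goal | state, action)
    execute: Callable[[str], str],                        # hypothetical: take an action, return the new state
    initial_state: str,
    max_steps: int = 10,
) -> str:
    """Predict what will make the goal happen best, then do that thing, repeatedly."""
    state = initial_state
    for _ in range(max_steps):
        actions = candidate_actions(state)
        if not actions:
            break
        # All of the steering lives in this argmax over the predictor's outputs.
        best_action = max(actions, key=lambda a: predict_success(goal, state, a))
        state = execute(best_action)
    return state
```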

Posts

12 Angry Agents, or: A Plan for AI Empathy (6d)
Messy on Purpose: Part 2 of A Conservative Vision for the Future (13d)
The Counterfactual Quiet AGI Timeline (15d)
A Conservative Vision For AI Alignment (2mo)
Semiotic Grounding as a Precondition for Safe and Cooperative AI (3mo)
No, We're Not Getting Meaningful Oversight of AI (3mo)
The Fragility of Naive Dynamism (5mo)
Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems (6mo)
Grounded Ghosts in the Machine - Friston Blankets, Mirror Neurons, and the Quest for Cooperative AI (6mo)
Davidmanheim's Shortform (9mo)