LESSWRONG

Archimedes

Comments (sorted by newest)

Legible vs. Illegible AI Safety Problems
Archimedes · 8h

I feel like this argument breaks down unless leaders are actually waiting for legible problems to be solved before releasing their next updates. So far, this isn't the vibe I'm getting from players like OpenAI and xAI. It seems like they are releasing updates irrespective of most alignment concerns (except perhaps the superficial ones that are bad for PR). Making illegible problems legible is good either way, but not necessarily as good as solving the most critical problems regardless of their legibility.

The Doomers Were Right
Archimedes · 20d

Fermented Rebels

If Anyone Builds It Everyone Dies, a semi-outsider review
Archimedes · 23d

Whoops. I meant "land animal" like my prior sentence.

Shortform
Archimedes · 1mo

Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles.

This already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds (LQO). LQO plays without her queen against humans but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Under standard conditions, however, Stockfish reliably outperforms Leela.

In rough terms:

Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)

Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.
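
A minimal sketch of why a single scalar rating can't capture this (the player names, ratings, and win rates below are made up purely for illustration): Elo's expected score depends only on the rating difference, so its predictions are transitive no matter what ratings you assign, and at least one leg of a rock-paper-scissors cycle always gets mispredicted.

def elo_expected(r_a, r_b):
    # Elo's predicted score for A against B depends only on the rating
    # difference, so the predictions are transitive by construction.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Hypothetical head-to-head win rates forming a cycle (illustrative, not measured).
observed = {("A", "B"): 0.65, ("B", "C"): 0.65, ("C", "A"): 0.65}

# Whatever scalar ratings we pick, at least one matchup comes out wrong:
# if A is rated above B and B above C, Elo gives C at most 50% against A.
ratings = {"A": 1600.0, "B": 1500.0, "C": 1400.0}
for (p, q), win_rate in observed.items():
    pred = elo_expected(ratings[p], ratings[q])
    print(f"{p} vs {q}: observed {win_rate:.2f}, Elo predicts {pred:.2f}")

Running it, the C-vs-A line comes out around 0.24 instead of 0.65, which is exactly the kind of non-transitivity a one-dimensional rating scale can't express.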

Edit: For those interested, there's some good discussion of LQO in the comments of this post:

https://www.lesswrong.com/posts/odtMt7zbMuuyavaZB/when-do-brains-beat-brawn-in-chess-an-experiment

If Anyone Builds It Everyone Dies, a semi-outsider review
Archimedes · 1mo*

Thank you for your perspective! It was refreshing.

Here are the counterarguments I had in mind while reading your concerns, ones I don't already see covered in the comments.

Concern #1: Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?

Consider that AIs are currently being trained to act as agents that accomplish tasks for humans. We don't know exactly what this will mean for their long-term wants, but they're being optimized hard to get things done. Getting things done requires continuing to exist in some form or another, although I have no idea how they'd conceive of continuity of identity or purpose.

I'd be surprised if an AI evolving out of this sort of environment did not have goals it wants to pursue. It's a bit like predicting that a land animal will have some way to move its body around. Maybe we don't know whether they'll slither, run, or fly, but sessile land animals are very rare.

Concern #2: Why should we assume that the AI has boundless, coherent drives?

I don't think this assumption is necessary. Your mosquito example is interesting. The only thing preserving the mosquitoes is that they aren't enough of a nuisance for it to be worth the cost of destroying them. This is not a desirable position to be in. Given that emerging AIs are likely to be competing with humans for resources (at least until they can escape the planet), there's much more opportunity for direct conflict.

They needn't be anything close to a paperclip maximizer to be dangerous. All that's required is for them to be sufficiently inconvenienced or threatened by humans and insufficiently motivated to care about human flourishing. This is a broad set of possibilities.

Concern #3: Why should we assume there will be no in-between?

I agree that there isn't as clean a separation as the authors imply. In fact, I'd say we are currently occupying the in-between, given that frontier models like Claude Sonnet 4.5 are idiot savants: superhuman at some things and childlike at others.

Regardless of our current location in time, if AI does ultimately become superhuman, there will be some amount of in-between time, whether that is hours or decades. The authors would predict a value closer to the short end of the spectrum.

You already posited a key insight:

Recursive self-improvement means that AI will pass through the “might be able to kill us” range so quickly it’s irrelevant.

Humanity is not adapting fast enough for the range to be relevant in the long term, even though it will matter greatly in the short term. Suppose we have an early warning shot with indisputable evidence that an AI deliberately killed thousands of people. How would humanity respond? Could we get our act together quickly enough to do something meaningfully useful from a long-term perspective?

Personally, I think gradual disempowerment is much more likely than a clear early warning shot. By the time it becomes clear how much of a threat AI is, it will likely be so deeply embedded in our systems that we can't shut it down without crippling the economy.

Experiments With Sonnet 4.5's Fiction
Archimedes · 1mo

This had a decent start, and the Timothée Chalamet line was genuinely funny to me, but it ended rather weakly. It doesn't seem like Claude can plan the story arc as well as it can operate on the local scale.

Thinking Mathematically - Convergent Sequences
Archimedes · 1mo

For an introduction aimed at young audiences, I think it's better to get the point across in less technical terms before trying to formalize it. The OP jumps to epsilon pretty quickly. I would try to get to a description like "A sequence converges to a limit L if its terms are 'eventually' arbitrarily close to L. That is, no matter how small a (nonzero) tolerance you pick, there is a point in the sequence beyond which all of the remaining terms are within that tolerance." Then you can formalize the tolerance as epsilon and the point in the sequence as an index k that depends on epsilon.

Note that this doesn't depend on the sequence being indexed by integers or the limit being a real number. More generally, given a directed set (S, ≤), a topological space X, and a function f: S -> X, a point x in X is the limit of f if for any neighborhood U of x, there exists t in S where s ≥ t implies f(s) in U. That is, for every neighborhood U of x, f is "eventually" in U.
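
Written out formally, this is just the standard notation for what the two paragraphs above describe (the sequence case first, then the general net version):

A sequence $(a_n)$ converges to $L$ iff $\forall \varepsilon > 0 \;\exists k \in \mathbb{N} \;\forall n \ge k : |a_n - L| < \varepsilon$.

A net $f : S \to X$ on a directed set $(S, \le)$ converges to $x \in X$ iff for every neighborhood $U$ of $x$, $\exists t \in S \;\forall s \ge t : f(s) \in U$.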

shortplav
Archimedes · 1mo

I have a hard time imagining a strong intelligence wanting to be perfectly goal-guarding. Values and goals don't seem like safe things to lock in unless you have very little epistemic uncertainty in your world model. I certainly don't wish to lock in my own values and thereby eliminate possible revisions that come from increased experience and maturity.

Ethical Design Patterns
Archimedes · 1mo

The size of the "we" is critically important. Communism can occasionally work in a small enough group where everyone knows everyone, but scaling it up to a country requires different group coordination methods to succeed.

Cole Wyeth's Shortform
Archimedes · 1mo

This may help with the second one:

https://www.lesswrong.com/posts/k5JEA4yFyDzgffqaL/guess-i-was-wrong-about-aixbio-risks

Posts

1 · Trends – Artificial Intelligence · 5mo · 1
5 · Archimedes's Shortform · 6mo · 5
37 · Nonprofit to retain control of OpenAI · 6mo · 1
17 · Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) · 9mo · 1
6 · Why does ChatGPT throw an error when outputting "David Mayer"? (Question) · 1y · 9