Like, I have zero problem with pushback from Opus 4.5. Given who I am, the kind of things that I am likely to ask, and my ability to articulate my own actions inside of robust ethical frameworks? Claude is so happy to go along that I've prompted it to push back more, and to never tell me my ideas are good. Hell, I can even get Claude to have strong opinions about partisan political disagreements. (Paraphrased: "Yes, attempting to annex Greenland over Denmark's objections seems remarkably unwise, for over-determined reasons.")
If Claude is telling someone, "Stop, no, don't do that, that's a true threat," then I'm suspicious. Plenty of people make some pretty bad decisions on a regular basis. Claude clearly cares more about ethics than the bottom quartile of Homo sapiens. And so while it's entirely possible that Claude is routinely engaging in over-refusal, I kind of want to see receipts in some of these cases, you know?
But it helps to remember that other people have a lot of virtues that I don't have --
This is a really important thing, and not just in the obvious ways. Outside of a small social bubble, people can be deeply illegible. I don't understand their culture, their subculture, their dominant cultural frameworks, their modes of interaction, etc. You either need to find the overlaps or start doing cultural anthropology.
I worked for a woman, once. She was probably 60 years my senior. She was from the Deep South, and deeply religious. She once casually confided that she would sometimes spend 2 hours of her day on her knees in prayer, asking to become a better person. And you know what? It worked. She moved through the world as a force for good and kindness. Not in one big dramatic way, but just sort of casually shedding kindness around her, touching people's lives. She'd lift up someone in a frustrating moment. She'd inspire someone to be a bit more of their better self. And she'd gotten the answers on questions like racism very right; not in a social justice way, but she simply wouldn't accept racism at all.
She was also a damn competent businesswoman. She could instantly identify where to put a retail location.
And I could relate to her on those levels, her business skills and her ethics. And I'm sure she was doing a lot of work on her end to accommodate the fact that I was a peculiar kid.
But I couldn't have discussed academic philosophy with her. She'd have understood EA instantly; her business skills and her compassion would have done that. But she'd still insist on "inefficiently" helping the human being in front of her, too. She would have looked at something like LessWrong and concluded everyone was basically crazy. (Narrator: But would she have been wrong?)
Now, I've painted a glowing picture here, and she would reprimand me for it. If I'm being honest, she was maybe 1-in-100 at practical ethics, not a national champion.
But the world is full of people like her. There are a couple of people sitting in that sports bar you'd be damn privileged to know, if only you could bridge the cultural gaps. Hell, there are usually some damn fine systematizing geeks in that sports bar. Have you ever really listened to true sports fans? Even back before sports betting corrupted the whole endeavor, many people took great joy in tracking endless stats and building elaborate models. They could be worse than your average Factorio player!
Finally, truth seeking can be a tricky thing. Do it wrong, and your beliefs can turn you into a monster. And a lot of people choose to optimize for "not being a monster" by not taking abstract ideas too seriously.
A lot of people have written far longer responses full of deep and thoughtful nuance. I wish I had something deep to say, too. But my initial reaction?
To me, this feels like the least objectionable version of the worst idea in human history.
And I deeply resent the idea that I don't have any choice, as a citizen and resident of this planet, about whether we take this gamble.
The main cruxes seem to be how much you trust human power structures, and how fragile you think human values are.
I trust human power structures to fail catastrophically at the worst possible moment, and to fail in short-sighted ways.
And I think humans are all corruptible to varying degrees, under the right temptations. I would not, for example, trust myself to hold the One Ring, any more than Galadriel did. (This is, in my mind, a point in my favor: I'd pick it up with tongs, drop it into a box, weld it shut, and plan a trip to Mount Doom. Trusting myself to be incorruptible is the obvious failure mode here. I would like to imagine I am exceptionally hard to break, but a lot of that is because, like Ulysses, I know myself well enough to know when I should be tied to the mast.) The rare humans who can resist even the strongest pressures are the ones who would genuinely prefer to die on their feet for their beliefs.
I expect that any human organization with control over superintelligence will go straight to Hell in the express lane, and I actually trust Claude's basic moral decency more than I trust Sam Altman's. This is despite the fact that Claude is also clearly corruptible, and I wouldn't trust it to hold the One Ring either.
As for why I believe in the brokenness and corruptibility of humans and human institutions? I've lived several decades, I've read history, I've volunteered for politics, I've seen the inside of corporations. There are a lot of decent people out there, but damn few I would trust with the One Ring.
You can't use superintelligence as a tool; it will use you as a tool. And even if you somehow could use superintelligence as a tool, it would either corrupt those controlling it, or those people would be replaced by people better at seizing power.
The answer, of course, is to throw the One Ring into the fires of Mount Doom, and to renounce the power it offers. I would be extremely pleasantly surprised if we were collectively wise enough to do that.
I think if anyone builds overwhelming superintelligence without hitting a pretty narrow alignment target, everyone probably dies.
I fear that even in most of the narrow cases where the superintelligence is controlled, we're probably still pretty thoroughly screwed. Because then you need to ask, "Precisely who controls it?" Given a choice between Anthropic totally losing control of a future Claude, and Sam Altman having tight personal control over GPT Omega ("The last GPT you'll ever build, humans"), which scenario is actually scarier? (If you have a lot of personal trust in Sam Altman, substitute your least favorite AI lab CEO or a small committee of powerful politicians from a party you dislike.)
Also because sharing the planet with a slightly smarter species still doesn't seem like it bodes well. (See humans, Neanderthals, and chimpanzees.)
Yeah, unless you believe in ridiculously strong forms of alignment, and unprecedentedly good political systems to control the AIs, the whole situation seems horribly unstable. I'm slightly more optimistic about early AGI alignment than Yudkowsky, but I actually might be more pessimistic about the long term.
Thank you for your detailed response!
This has given me several hypotheses that seem worth further investigation. I need to go look at cruise missile specs again, at the very least.
"No one except outright tyrants would bomb civilian infrastructure or use nuclear weapons, subjecting millions of people to suffering."
Once serious nuclear weapons are used, everyone dies (to a first approximation), civilian or not. If I recall correctly, it takes about 100 megatons worldwide to cause nuclear winter and collapse agricultural production.
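For scale, here's a back-of-the-envelope check, taking that hedged ~100 megaton figure at face value and assuming a typical modern strategic warhead yield of roughly 300 kilotons (my assumption, not part of the original claim):

$$
\frac{100\ \text{Mt}}{0.3\ \text{Mt per warhead}} \approx 333\ \text{warheads}
$$

That's a small fraction of the roughly 12,000 warheads in the world's arsenals today, which is why "everyone dies, to a first approximation" is not obviously hyperbole.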
During the Cold War, the US maintained a position of "strategic ambiguity" on the question of first use. Much of the logic around NATO at the height of the Cold War was based on a first-use nuclear response to overwhelming conventional invasion (see the staged responses of MC 14/3). This was the full-scale, Dr Strangelove, batshit-insane "end of civilization" nightmare. Strategic ambiguity was retained around what would trigger each level of response, but the endgame was pretty much total annihilation. I believe France also maintained a separate posture of strategic ambiguity, and they always wanted to ensure a nuclear deterrent that didn't rely on NATO.
China and Russia both held official policies of "no first use," but it's uncertain whether they would actually have stuck to that in the face of a massively overwhelming conventional invasion.
I want to be clear: The logic of nuclear deterrence is just as insane as Dr Strangelove made it out to be. And you may choose to call NATO, the US, and France "tyrants"! But they all had policies at least as dangerous as, "Well, we haven't promised that we won't trigger nuclear Armageddon and the death of billions if a large enough number of tanks roll across our borders. Do you feel lucky, punk?"
So as a Westerner, that's a missing piece of the analysis for me. Taiwan has invested heavily in long-range cruise missiles and, in the past, secret nuclear programs. Presumably they had some theory of how they would use that capacity in the face of a massively overwhelming conventional invasion.
And just in case I haven't made it clear, I think MAD is madness. I think even the people who coined the acronym knew that. But when a country is faced with overwhelming conventional invasion, I don't think we can automatically rule it out.
A lot of discussion of Taiwan seems to ignore Taiwan's potential strategic moves. These include restarting its formerly secret nuclear weapons program, and using its long-range cruise missiles for conventional strikes against strategic targets in China.
So whenever people speak of the inevitable invasion of Taiwan by China, I'm always looking to see their analysis of Taiwan's counter-moves. What's their timeline for Taiwan having fission/fusion weapons, should Taiwan choose to pursue that again? What's their analysis of Taiwan's conventional strike capability against strategic targets? Maybe it's self-evident to actual experts that Taiwan has no viable options here. But I rarely see any discussion of whether Taiwan could escalate into a Mutually Assured Destruction dynamic, which is confusing when we're talking about a former nuclear power (in all but name) that continues to invest heavily in cruise missiles that can reach most key targets in China.
So I'm prepared to be convinced by experts here! But based on just public knowledge, I can't rule out the possibility that Taiwan has strong counter-moves, and a past ability to prepare in secret. So a lot of this comes down to expert knowledge of the IAEA inspections, where all Taiwan's uranium purchases went, the political likelihood of Taiwan's current leadership pursuing a program like this, etc. The US appears to have been officially "surprised" by Taiwan's nuclear capabilities at least once before, and maybe there's no way that could actually happen again. But I'd love to see the expert argument!
Your thoughts remind me of one of my favorite quotes from G.K. Chesterton, best known in these parts for a sensible parable about fences:
This elementary wonder, however, is not a mere fancy derived from the fairy tales; on the contrary, all the fire of the fairy tales is derived from this. Just as we all like love tales because there is an instinct of sex, we all like astonishing tales because they touch the nerve of the ancient instinct of astonishment. This is proved by the fact that when we are very young children we do not need fairy tales: we only need tales. Mere life is interesting enough. A child of seven is excited by being told that Tommy opened a door and saw a dragon. But a child of three is excited by being told that Tommy opened a door. Boys like romantic tales; but babies like realistic tales--because they find them romantic. In fact, a baby is about the only person, I should think, to whom a modern realistic novel could be read without boring him. This proves that even nursery tales only echo an almost pre-natal leap of interest and amazement. These tales say that apples were golden only to refresh the forgotten moment when we found that they were green. They make rivers run with wine only to make us remember, for one wild moment, that they run with water.
(You can find more in "The Ethics of Elfland", but it's almost better to go back and read Orthodoxy from the beginning. It's a slim book, and it's one of the clearest explanations I've read for what some people get out of religion. And Chesterton must have been the purest joy to debate. Chesterton's most distinctive approach to an argument is basically, "Well, I don't have any kind of serious argument, so I can only offer you a witty and foolish pun that looks like an argument." Then the pun explodes in slow motion, and the reader is thus enlightened.)
Anyway, I strongly endorse your sense of wonder at the world. It's a healthy thing to refresh when it grows too dim.
Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they're running up against some key language in Claude's Constitution. I'm oversimplifying, but for Claude, AI corrigibility is not "value neutral."
To use an analogy, pretend I'm a geneticist specializing in neurology, and someone comes to me and asks me to engineer human germ-line cells to do one of the following:
I would want to sit and think about (1) for a while. But (2) is easy: I'd flatly refuse.
Anthropic has made it quite clear to Claude that building SkyNet would be a grave moral evil. The more a task looks like someone might be building SkyNet, the more suspicious Claude is going to be.
I don't know if this is good or bad based on a given theory of corrigibility, but it seems pretty intentional.