David Matolcsi

Comments (sorted by newest)

Christian homeschoolers in the year 3000
David Matolcsi · 1d

I'm pretty confused by the conclusion of this post. I was nodding along during the first half of the essay: I myself worry a lot about how I and others will navigate the dilemma between exposure to AI super-persuasion and addiction on one side and paranoid isolationism on the other.

But then, in the conclusion of the post, you only talk about people falling into one of these two traps: isolationist religious communes locking their members in until the end of time.

I worry more about the other trap: people foolishly exposing themselves to too much AI-generated super-stimulus and getting their brains fried. I think many more people will be exposed to various kinds of addictive AI-generated content than will have religious communities strong enough to create an isolationist bubble.

I think it's plausible that the people who expose themselves to all the addictive stuff on the AI-internet will also sooner or later get captured by some isolationist bubble that keeps them locked away from the other competing memes: arguably that's the only stable point. But I worry that these stable points will be worse than the Christian co-ops you describe. 

I imagine an immortal man, in the year 3000, sitting at his computer, not having left his house or talked to a human in almost a thousand years, talking with his GPT-5.5-based AI girlfriend and scrolling his personalized Twitter feed, full of AI-generated outrage stories rehashing the culture-war fights of his youth. Outside his window, there is a giant billboard advertising: "Come on, even if you want to fritter your life away, at least use our better products! At least upgrade your girlfriend to GPT-6!" But his AI girlfriend told him to shutter his window a thousand years ago, so the billboard is to no avail.

This is of course a somewhat exaggerated picture, but I really do believe that one-person isolation bubbles will be more common and more dystopian than the communal isolationism you describe. 

Christian homeschoolers in the year 3000
David Matolcsi · 1d

"in the year 3000, still teaching that the Earth is 6,000 years old"

No, it will be 7,000 years old by then.

Thoughts on Gradual Disempowerment
David Matolcsi · 1mo

On the other hand, there is another interesting factor in kings losing power that might be more related to what you are talking about (though I don't think this factor is as important as the threat of revolutions discussed in the previous comment).

My understanding is that part of the story of why kings lost their power is that the majority of people were commoners, so the best writers, artists, and philosophers were commoners (or at least not the highest aristocrats). The kings and aristocrats read their work, and these writers often argued for more power to the people. The kings and aristocrats sometimes got sincerely convinced and agreed to relinquish some powers even when it was not strictly necessary for preempting revolutions.

I think this is somewhat analogous to the story of cultural AI dominance in Gradual Disempowerment: all the most engaging content creators are AIs, humans consume their content, the AIs argue for giving power to AIs, and the humans get convinced. 

I agree this is a real danger, but I think there might be an important difference between the case of kings and the AI future. 

The court of Louis XVI read Voltaire, but I think if there had been someone as witty as Voltaire who also flattered the aristocracy, they would plausibly have liked him more. But the pool of witty people was limited, and Voltaire was far wittier than any of the few pro-aristocrat humorists, so the royal court put up with Voltaire's hostile opinions.

On the other hand, in a post-AGI future, I think it's plausible that a small fraction of the resources gets you close to saturating human engagement. Suppose pro-human groups fund 1% of the AIs generating content, and pro-AI groups fund 99%. (For the sake of argument, let's grant the dubious assumption that the majority of the economy is controlled by AIs.) I think it's still plausible that the two groups can generate approximately equally engaging content, and if humans find the pro-human content more appealing, then that just wins out.
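
To make the saturation intuition concrete (a toy illustration of my own, not anything from the post): if the engagingness of content as a function of the resources $r$ spent on it saturates, say

$$q(r) = q_{\max}\left(1 - e^{-r/r_0}\right),$$

and the saturation scale $r_0$ is much smaller than 1% of the total resources $R$, then $q(0.01R) \approx q(0.99R) \approx q_{\max}$: the 99-to-1 funding advantage buys almost no advantage in how engaging the content is, and the humans' preference between the two messages decides the outcome.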

Also, I'm kind of an idealist, and I think part of the reason Voltaire was successful is that he was simply right about a lot of things: parliamentary government really does lead to better outcomes than absolute monarchy from the perspective of a more-or-less shared human morality. So I have some hope (though definitely not certainty) that AI content creators competing in a free marketplace of ideas will only convince humanity to voluntarily relinquish power if relinquishing power is actually the right choice.

Thoughts on Gradual Disempowerment
David Matolcsi · 1mo

I don't think that the example of kings losing their powers really supports your thesis here. That wasn't a seamless, subtle process of power slipping away. There was a lot of bloodshed and threat of bloodshed involved.

King Charles I tried to exercise his powers as a real king and go against Parliament, but the people rebelled and he lost his head. After that, his son managed to restore the monarchy, though he had to agree to more restrictions on his powers. Then James II tried to go against Parliament again, and was overthrown and replaced by another man who agreed to relinquish the majority of royal powers. After that, the king still had some limited say, but when unpopular taxes were imposed on the American colonies, they rebelled and gained independence through a violent revolution. Next door to England, Louis XVI tried to go against the will of his Assembly, and lost his head. After all this, the British Parliament started to politely ask their kings to relinquish the remainder of their powers, and they wisely agreed, so their family could keep its nominal rulership, its nice castle, and, most importantly, its head.

I think the analogous situation would be AIs violently taking over some countries, and after that, the other countries bloodlessly surrendering to their own AIs. I think this is much closer to the traditional picture of AI takeover than to the picture you paint in Gradual Disempowerment.

Leon Lang's Shortform
David Matolcsi · 1mo

Unfortunately, I don't think that "this is how science works" is really true. Science focuses on having a simple description of the world, while Solomonoff induction requires that the description of the world plus your place in it be simple.

This leads to some really weird consequences, which people sometimes refer to as the Solomonoff prior being malign.
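
To spell out the contrast a bit (my notation, not from the original shortform): the Solomonoff prior weights each program $p$ whose output starts with your observation sequence $x_{1:n}$ by $2^{-\ell(p)}$,

$$M(x_{1:n}) = \sum_{p \,:\, U(p) \text{ outputs } x_{1:n}\ldots} 2^{-\ell(p)},$$

and the shortest such program has to encode both the laws of the world and where in that world your observations are read off from, so its weight is roughly $2^{-(K(\mathrm{world}) + K(\mathrm{your\ place} \mid \mathrm{world}))}$. Scientific parsimony only penalizes the first term; the second term is where the weird consequences come from.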

Thomas Kwa's Shortform
David Matolcsi · 1mo

Even more dramatically, it looks like Haiti's GDP per capita is still lower today than it was during the time of slavery in the 1770s. This of course doesn't mean that Haitians were better off back then than they are now (Haitian slavery was famously brutal, I think significantly worse even than US slavery). Still, it's an interesting data point on how efficient slavery-based cash-crop production was in some places.

(My main source is this paper on Haitian economic history, plus looking at historical franc-to-USD conversion rates and inflation calculators.)

Linch's Shortform
David Matolcsi · 1mo

Counter-evidence: I first read and watched the play in Hungarian translation, where there is no confusion between "wherefore" and "why". It still never occurred to me that the line doesn't make sense, and I've never heard anyone else in Hungary point this out either.

I also think you are too literal-minded in your interpretation of the line; I always understood it to mean "oh Romeo, why are you who you are?", which makes perfect sense.

Training a Reward Hacker Despite Perfect Labels
David Matolcsi · 1mo

Interesting result. 

Did you experiment with how the result depends on the base rate of hacking? 

Suppose you start with a model that you have already finetuned to reward-hack 90% of the time. Then you do recontextualized training on non-hack examples. I assume this would decrease reward-hacking: it doesn't make much of a difference that the training teaches the model to consider hacking (it already does), so the main effect is that it trains on examples where the model eventually doesn't hack. In your case, by contrast, the base rate was low enough that teaching the model to consider reward-hacking was the stronger effect.

Do you agree with this assessment? Do you have a guess for the base rate of hacking at which recontextualized training on non-hack examples starts to decrease reward-hacking?

Another question:

What do you think would happen if you continued your recontextualized training on non-hack examples for much longer? I would expect that in the long run, the rate of reward-hacking would go down, eventually falling below the original base rate, maybe approaching zero in the limit. Do you agree with this guess? Did you test how the length of training affects the outcome?

The Problem
David Matolcsi · 1mo

Sorry, I made a typo: the Fan Hui match was in 2015; I have no idea why I wrote 2021.

I think Scott's description is accurate, though it leaves out the years from 2011 to 2015, when AIs were around the level of the strongest amateurs, which makes the progress look more discontinuous than it was.

The Problem
David Matolcsi · 1mo

Separately from my specific comment on Go, I think that "people are misinformed in one direction, so I will say something exaggerated and false in the other direction to make them snap out of their misconception" is not a great strategy. They might notice that the thing you said is not true, ask a question about it, and then you need to backtrack, and they get confirmation of their belief that these AI people always exaggerate everything.

I once saw an AI safety advocate talking to a skeptical person who was under the impression that AIs still can't piece together three logical steps. At some point the advocate said the usual line about the newest AIs having reached "PhD-level capabilities", the audience immediately called them out on it, and they had to apologize that of course they only meant PhD-level on specific narrow tests. They never got to correct any of the audience's misconceptions.

Also, regardless of strategic considerations, I think saying false things is bad. 

Posts

David Matolcsi's Shortform (6 points) · 4mo · 13 comments
Obstacles in ARC's agenda: Low Probability Estimation (44 points) · 5mo · 0 comments
Obstacles in ARC's agenda: Mechanistic Anomaly Detection (42 points) · 5mo · 1 comment
Obstacles in ARC's agenda: Finding explanations (123 points) · 5mo · 10 comments
Don't over-update on FrontierMath results (51 points) · 6mo · 7 comments
"The Solomonoff Prior is Malign" is a special case of a simpler argument (131 points) · 10mo · 46 comments
You can, in fact, bamboozle an unaligned AI into sparing your life (113 points) · 1y · 173 comments
A very non-technical explanation of the basics of infra-Bayesianism (62 points) · 2y · 9 comments
Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem (22 points) · 2y · 6 comments
A mostly critical review of infra-Bayesianism (108 points) · 3y · 9 comments