iceman - LessWrong

Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.)

But POC||GTFO is really important to constraining your expectations. We do not really worry about Rowhammer since the few POCs are hard, slow and impractical. We worry about Meltdown and other speculative execution attacks because Meltdown shipped with a POC that read passwords from a password manager in a different process, was exploitable from within Chrome's sandbox, and my understanding is that POCs like that were the only reason Intel was made to take it seriously.

Meanwhile, Rowhammer is maybe a real issue but is so hard to pull off consistently and stealthily that nobody worries about it. My recollection was when it was first discovered, people didn't panic that much because there wasn't warrant to panic. OK, so there was a problem with the DRAM. OK, what are the constraints on exploitation? Oh, the POCs are super tricky to pull off and will often make the machine hard to use during exploitation?

A POC provides warrant to believe in something.

Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.)

iceman9mo141

On the topic of security mindset, the thing that the LW community calls "security mindset" isn't even an accurate rendition of what computer security people would call security mindset. As noted by lc, actual computer security mindset is POC || GTFO, or, translating that into LessWrongese: you do not have warrant to believe in something until you have an example of the thing you're worried about being a real problem, because otherwise you are almost certain to be privileging the hypothesis.

AI romantic partners will harm society if they go unregulated

iceman1y4718

Are AI partners really good for their users?

Compared to what alternative?

As other commenters have pointed out, the baseline is already horrific for men, who are suffering. Your comments in the replies seem to reject that these men are suffering. No, obviously they are.

But responding in depth would just be piling on and boring, so instead let's say something new:

I think it would be prudent to immediately prohibit AI romance startups to onboard new users[..]

You do not seem to understand the state of the game board: AI romance startups are dead, and we're already in the post-game.

character.ai was very popular around the second half of 2022, but near the end of it, the developers went to war with erotic role play users. By mid-January 2023, character.ai was basically dead for not just sex talk, but also general romance. The developers added in a completely broken filter that started negatively impacting even non-sexual, non-romantic talk. The users rioted, made it the single topic on the subreddit for weeks, the developers refused to back down, and people migrated away. Their logo is still used as a joke on 4chan. It's still around, but it's not a real player in the romance game. (The hearsay I've heard was that they added these filters to satisfy payment providers.)

Replika was never good. I gave it a try early on, but as far as I could tell, it was not even a GPT-2 level model and leaned hard on scripted experiences. However, a lot of people found it compelling. It doesn't matter because it too was forced to shut down by Italian regulators. They issued their ban on erotic role play on Valentine's Day of all days and mods post links to the suicide hotline on their subreddit.

The point here is we already live in a world with even stricter regulations than you proposed, done backdoor through payment providers and app stores, or through jurisdiction shopping. This link won't work unless you're in EleutherAI, but asara explains the financial incentives against making waifu chatbots. So what has that actually led to? Well, the actual meta, the thing people actually use for AI romantic partners, today, is one of:

Some frontend (usually TavernAI or its fork SillyTavern) which connects to the API of a general centralized provider (Claude or ChatGPT) and uses a jailbreak prompt (and sometimes a vector database if you have the right plugins) to summon your waifu. Hope you didn't leak your OpenAI API key in a repo, these guys will find it. (You can see this tribe in the /aicg/ threads on /g/ and other boards).
Local models. We have LLaMA now and a whole slew of specialized fine tunes for it. If you want to use the most powerful open sourced llama v2 70B models, you can do that today with three used P40s ($270 each) or two used 3090s (about $700 each) or a single A6000 card with 48 GB of VRAM ($3500 for last generation). ~$800, $1400 and $3500 give a variety of price points for entry, and that's before all the people who just rent a setup via one of the many cloud GPU providers. Grab a variant of KoboldAI depending on what model you want and you're good to go. (You can see this tribe in the /lmg/ threads on /g/).
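
As a rough sanity check on the price points above, here is a minimal sketch that totals each local setup. The prices and card counts are the ones quoted in the comment. The per-card VRAM for the P40 and 3090 (24 GB each) and the 4-bit quantization used for the weight estimate are assumptions not stated in the comment, and `Rig` and `weights_gb` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rig:
    name: str
    cards: int
    price_per_card: int    # USD, used prices quoted in the comment
    vram_per_card_gb: int  # assumption: 24 GB for P40 and 3090, 48 GB for A6000

    @property
    def total_price(self) -> int:
        return self.cards * self.price_per_card

    @property
    def total_vram_gb(self) -> int:
        return self.cards * self.vram_per_card_gb


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, ignoring context and runtime overhead."""
    return params_billion * bits_per_weight / 8


RIGS = [
    Rig("3x P40", 3, 270, 24),
    Rig("2x 3090", 2, 700, 24),
    Rig("1x A6000", 1, 3500, 48),
]


def summarize(rigs):
    """Return (name, total price in USD, total VRAM in GB) for each setup."""
    return [(r.name, r.total_price, r.total_vram_gb) for r in rigs]


if __name__ == "__main__":
    needed = weights_gb(70, 4)  # assumption: 70B parameters at 4 bits per weight
    for name, price, vram in summarize(RIGS):
        print(f"{name}: ${price}, {vram} GB VRAM, weights need about {needed:.0f} GB")
```

Under these assumptions the totals land at the quoted entry points (three P40s come to $810, which the comment rounds to about $800), and every setup has more VRAM than the weight estimate alone requires.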

The actual outcome of the ban (which happened in the past) was the repurposing of Claude/ChatGPT and building dedicated setups to run chatbots locally with the cheapest option being about $800 in GPUs, along with a ton of know-how around prompting character cards in a semi-standardized format that was derived from the old character.ai prompts. I will finish by saying that it's a very LessWrongian error to believe you could just stop the proliferation of AI waifus by putting government pressure on a few startups when development seems to be mostly decentralized, done by repurposing open language models, and fueled by a collective desire to escape agony.

Remember, not your weights, not your waifu.

A Hill of Validity in Defense of Meaning

iceman1y14-1

So, I started off with the idea that Ziz's claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, "collapse the timeline," etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.

But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back and recalculate a ton of likelihoods here starting from "this node with Vassar saying this event happened."

From Ziz's page:

LessWrong dev Oliver Habryka said it would be inappropriate for me to post about this on LessWrong, the community’s central hub website that mostly made it. Suggested me saying this was defamation.

It's obviously not defamation since Ziz believes it's true.

<insert list of rationality community platforms I’ve been banned from for revealing the statutory rape coverup by blackmail payout with misappropriated donor funds and whistleblower silencing, and Gwen as well for protesting that fact.>

Inasmuch as this is true, this is weak Bayesian evidence that Ziz's accusations are more true than false because otherwise you would just post something like your above response to me in response to them. "No, actually official people can't talk about this because there's an NDA, but I've heard second hand there's an NDA" clears a lot up, and would have been advantageous to post earlier, so why wasn't it?

A Hill of Validity in Defense of Meaning

iceman1y13-28

The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...

MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won't be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.

Ooops.

The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don't look like they FOOM. Stochastic Gradient Descent doesn't look like it will treacherous turn. Yudkowsky's dream of building the singleton Sysop is gone and was probably never achievable in the first place.

People double down with the "mesaoptimizer" frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that could very easily empower private interests over public interests. Enraging to me, there's a pattern of engagement where it seems like AI Doomers will only interact with weak arguments instead of strong ones: Yud mostly argues with low quality e/accs on Twitter where it's easy to score Ws; it was mildly surprising when he even responded with "This is kinda long." to Quintin Pope's objection thread.

What should MIRI have done, had they taken the good sliver of The Sequences to heart? They should have said oops. They should have halted, melted and caught fire. They should have acknowledged that the sky was blue. They should have radically changed their minds when the facts changed. But that would have cut off their funding. If the world isn't going to end from a FOOMing AI, why should MIRI get paid?

So what am I supposed to extract from this pattern of behaviour?

A Hill of Validity in Defense of Meaning

iceman1y2-14

It's not exactly the point of your story, but...

Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.

Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even more damning?

Those who are savvy in high-corruption equilibria maintain the delusion that high corruption is common knowledge, to justify expropriating those who naively don't play along, by narratizing them as already knowing and therefore intentionally attacking people, rather than being lied to and confused.

Ouch.

[..]Regardless of the initial intent, scrupulous rationalists were paying rent to something claiming moral authority, which had no concrete specific plan to do anything other than run out the clock, maintaining a facsimile of dialogue in ways well-calibrated to continue to generate revenue.

Really ouch.

So Yudkowsky doesn't have a workable alignment plan, so he decided to just live off our donations, running out the clock. I donated a six figure amount to MIRI over the years, working my ass off to earn to give...and that's it?

Fuck.

I remember being at a party in 2015 and asking Michael what else I should spend my San Francisco software engineer money on, if not the EA charities I was considering. I was surprised when his answer was, "You."

That sounds like wise advice.

Some reasons to not say "Doomer"

iceman1y20

Just to check, has anyone actually done that?

I'm thinking of a specific recent episode where [I can't remember if it was AI Safety Memes or Connor Leahy's Twitter account] posted a big meme about AI Risk Deniers and this really triggered Alexandros Marinos. (I tried to use Twitter search to find this again, but couldn't.)

It's quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley.

Fascinating. I was unaware it was used IRL. From the Twitter user viewpoint, my sense is that it's mostly used by people who don't believe in the AI risk narrative as a pejorative.

Some reasons to not say "Doomer"

iceman1y1210

Why are you posting this here? My model is that the people you want to convince aren't on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.

(My model of the AI critics would be that they'd shrug and say "you started it by calling us AI Risk Deniers.")

My tentative best guess on how EAs and Rationalists sometimes turn crazy

iceman1y151

My understanding of your point is that Mason was crazy because his plans didn't follow from his premise and had nothing to do with his core ideas. I agree, but I do not think that's relevant.

I am pushing back because, if you are St. Petersburg Paradox-pilled like SBF and make public statements that actually you should keep taking double or nothing bets, perhaps you are more likely to make tragic betting decisions and that's because you're taking certain ideas seriously. If you have galaxy brained the idea of the St. Petersburg Paradox, it seems like Alameda style fraud is +EV.

I am pushing back because, if you believe that you are constantly being simulated to see what sort of decision agent you are, you are going to react extremely to every slight and that's because you're taking certain ideas seriously. If you have galaxy brained the idea that you're being simulated to see how you react, killing Jamie's parents isn't even really killing Jamie's parents, it's showing what sort of decision agent you are to your simulators.

In both cases, "they did X because they believe Y, which implies X" seems like a more parsimonious explanation for their behaviour.

(To be clear: I endorse neither of these ideas, even if I was previously positive on MIRI style decision theory research.)
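
To make the double-or-nothing point concrete, here is a minimal sketch of what happens when an agent stakes everything on a favorable bet every round. The win probability of 0.51 and the 10 rounds are illustrative assumptions, not figures from the comment, and `double_or_nothing` is a hypothetical helper name.

```python
def double_or_nothing(p_win: float, rounds: int) -> tuple[float, float]:
    """Stake the whole bankroll each round on a bet that doubles it with probability p_win.

    Returns (expected wealth multiple, probability of never having lost).
    """
    # Each round multiplies expected wealth by 2 * p_win.
    expected_multiple = (2 * p_win) ** rounds
    # One loss wipes out the bankroll, so survival requires winning every round.
    survival_probability = p_win ** rounds
    return expected_multiple, survival_probability


if __name__ == "__main__":
    # Assumption: a slightly favorable bet, repeated 10 times.
    expected, survive = double_or_nothing(p_win=0.51, rounds=10)
    print(f"expected multiple: {expected:.3f}, chance of not being wiped out: {survive:.4f}")
```

With these assumed numbers the expected value keeps rising with every additional round while the chance of avoiding ruin shrinks toward zero, which is the sense in which always taking the bet can look +EV to someone who only tracks expected value.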

My tentative best guess on how EAs and Rationalists sometimes turn crazy

iceman1y140

But then they go and (allegedly) waste Jamie Zajko's parents in a manner that doesn't further their stated goals at all and makes no tactical sense to anyone thinking coherently about their situation.

And yet that seems entirely in line with the "Collapse the Timeline" line of thinking that Ziz advocated.

Ditto for FTX, which, when one business failed, decided to commit multi-billion dollar fraud via their other actually successful business, instead of just shutting down Alameda and hoping that the lenders wouldn't be able to repo too much of the exchange.

And yet, that seems like the correct action if you sufficiently bullet bite expected value and the St. Petersburg Paradox, which SBF did repeatedly in interviews.
