Meta-note related to the question: asking this question here, now, means your answers will be filtered towards people who stuck around with capital-R Rationality and the current LessWrong denizens, not the historical ones who have left the community. But I think that most of the interesting answers you'd get would come from people who aren't here at all, or who rarely engage with the site due to the cultural changes over the last decade.
OK, but we've been in that world where people have cried wolf too early at least since The Hacker Learns to Trust, where Connor decided not to release his GPT-2-sized model after talking to Buck.
There's already been a culture of advocating for high recall with no regard for precision for quite some time. We are already at the "no really guys, this time there's a wolf!" stage.
Right now, I wouldn't recommend trying either Replika or character.ai: both are currently embroiled in major censorship scandals. character.ai has censored its service hard, to the point where people are abandoning ship: the developers implemented heavy-handed filters in an attempt to clamp down on NSFW conversations, and those filters have degraded SFW chats as well. And Replika is currently being investigated by the Italian authorities, so we'll see what happens over the next week.
In addition to ChatGPT, both Replika and character.ai are driving people towards running their own AIs locally; AI non-proliferation is probably not in the cards now. /g/ has mostly coalesced around pygmalion-ai, but the best model they have is a 6B. As you allude to in a footnote, I am deliberately not looking at this tech until it's feasible to run locally because I don't want my waifu to disappear.
(More resources: current /g/ thread, current /mlp/ thread)
Didn't read the spoiler and didn't guess until halfway through "Nothing here is ground truth".
I suppose I didn't notice because I had already pattern matched to "this is how academics and philosophers write". It felt slightly less obscurantist than a Nick Land essay, though the topic and tone aren't a match for Land. Was that style deliberate on your part, or was it the machine?
Like things, simulacra are probabilistically generated by the laws of physics (the simulator), but have properties that are arbitrary with respect to it, contingent on the initial prompt and random sampling (splitting of the timeline).
What do the smarter simulacra think about the physics in which they find themselves? If one was very smart, could they look at the probabilities of the next token and wonder about why some tokens get picked over others? Would they then wonder about how the "wave function collapse" happens and what it means?
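To make that concrete, here's a minimal sketch of what the simulacrum's "physics" looks like from the outside: a distribution over next tokens, with one outcome picked by random sampling. It assumes the Hugging Face transformers library with GPT-2 as a stand-in simulator, and the prompt is made up.

```python
# Minimal sketch: inspect next-token probabilities and sample one token,
# using GPT-2 as a stand-in simulator (assumed setup, not from the post).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The simulacrum looked up at the sky and"  # hypothetical prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]   # scores for the next token
probs = torch.softmax(logits, dim=-1)         # the "physics": a distribution

# Several candidate tokens are live possibilities...
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")

# ...but only one gets picked, by random sampling ("splitting of the timeline").
next_id = torch.multinomial(probs, num_samples=1)
print("sampled:", tokenizer.decode(next_id.item()))
```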
While it’s nice to have empirical testbeds for alignment research, I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.
On the margin, this is already happening.
Stability.ai delayed the release of Stable Diffusion 2.0 to retrain the entire system on a dataset filtered to exclude all NSFW content. There was a pretty strong backlash against this, and it seems to have pushed a lot of people towards the idea that they have to train their own models. (SD 2.0 appeared to have worse performance on humans, presumably because a large chunk of pictures with humans in them got pruned out since the team didn't understand the range of the LAION punsafe classifier; the evidence for this is in the SD 2.1 model card, where they fine-tuned 2.0 with a radically different punsafe value.)
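For context, punsafe is a per-image score in the LAION metadata estimating how likely the safety classifier thinks an image is NSFW, and filtering amounts to thresholding it. Here's a minimal sketch of how much the choice of cutoff matters; the file name and threshold values are illustrative assumptions, not the exact ones Stability used.

```python
# Sketch of thresholding a LAION-style metadata dump on its punsafe score.
# A small change in the cutoff changes how much human-containing imagery
# survives the filter, which is the crux of the SD 2.0 complaint.
import pandas as pd

# Hypothetical metadata file with columns: url, caption, punsafe
meta = pd.read_parquet("laion_subset.parquet")

strict_cut = 0.1   # keep only images the classifier is very sure are safe
loose_cut = 0.98   # drop only images the classifier is very sure are unsafe

strict = meta[meta["punsafe"] < strict_cut]
loose = meta[meta["punsafe"] < loose_cut]

print(f"strict filter keeps {len(strict) / len(meta):.1%} of images")
print(f"loose filter keeps  {len(loose) / len(meta):.1%} of images")
```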
I know of at least one 4x A100 machine that someone purchased for fine-tuning because of that incident alone, and I have heard rumors of a second. We should expect censored and deliberately biased models to lead to more proliferation of differently trained models, of compute capacity, and of the expertise to fine-tune and train models.
Zack's series of posts in late 2020/early 2021 were really important to me. They were a sort of return to form for LessWrong, focusing on the valuable parts.
What are the parts of The Sequences which are still valuable? Mainly, the parts that build on top of Korzybski's General Semantics and focus hardcore on map-territory distinctions. That part is timeless and a large part of the value you could get by (re)reading The Sequences today. Yudkowsky's credulity about results from the social sciences and his mind-projection-fallacying of his own mental quirks certainly hurt the work as a whole, though, which is why I don't recommend people read the majority of it.
The post is long, but it kind of has to be. For reasons not directly related to the literal content of this essay, people seem to have collectively rejected the sort of map-territory thinking that we should bring from The Sequences into our own lives. This post has to be thorough because there are a number of common rejoinders that have to be addressed. This is why I think this post is a better fit for inclusion than something like Communication Requires Common Interests or Differential Signal Costs, which are much shorter but each address only a subset of the problem.
Since the review instructions ask how this affected my thinking, well...
Zack writes generally, but he writes because he believes people are not reasoning correctly about a currently politically contentious topic. But that topic is sort of irrelevant: the value comes in pointing out that high-status members of the rationalist community are completely flubbing lawful thinking. That made it thinkable that, actually, they might be failing in other contexts too.
Would I have been receptive to Christiano's point that MIRI doesn't actually have a good prediction track record had Zack not written his sequence on this? That's a hard counterfactual, especially since I had already lost a ton of respect for Yudkowsky by that point, in part because of the quality of thought in his other social media posting. But I think it's probable enough, and this series of posts certainly made the thought more available.
The funny thing is that I had assumed the button was going to be buggy, though I was wrong about how. The map header has improperly swallowed mouse scroll wheel events whenever it's shown; I wondered whether the button, positioned in the same way, would likewise intercept them, so I spent most of the day carefully dragging the scrollbar.
There must be some method to do something, legitimately and in good-faith, for people's own good.
"Must"? There "must" be? What physical law of the universe implies that there "must" be...?
Let's take the local Anglosphere cultural problem off the table. Let's ignore that in the United States, over the last 2.5 years, or ~10 years, or 21 years, or ~60 years (depending on where you want to place the inflection point), social trust has been shredded, that policies justified under the banner of "the common good" have primarily been extractive, and that trust in the US is an exhausted resource. Let's ignore that the OP is specifically about trying not to make one aspect of this problem worse. Let's ignore that high-status individuals in the LessWrong and alignment community have made statements about whose values are actually worthwhile, in a public abandonment of the neutrality of CEV which might have made some sort of deal thinkable. Let's ignore all that, because it would be focusing on one local culture in a large multipolar world, and at the global scale the questions are even harder:
How do you intend to convince the United States Government to surrender control to the Chinese Communist Party, or vice versa, and form the global hegemon necessary to actually prevent research into AI? If you don't have one control the other, why should either trust that the other isn't secretly doing whatever banned AI research required the authoritarian scheme in the first place, when immediately defecting and continuing to develop AI has a risky but high payoff? If you do have one control the other, how does the subjugated government maintain the legitimacy with its people necessary to continue being their government?
How do you convince all nuclear sovereign states to sign on to this pact? What do you do with nations which refuse? They're nuclear sovereign states. The lesson of Gaddafi and the lesson of Ukraine is that you do not give up your deterrent, no matter what, because your treaty counterparties won't uphold their end of the deal when it's inconvenient for them. A nuclear-tipped Ukraine wouldn't have been invaded by Russia. There is a reason North Korea continues to exist. (Also, what do you do when North Korea refuses to sign on?)
This response is enraging.
Here is someone who has attempted to grapple with the intellectual content of your ideas, and your response is "This is kinda long."? I shouldn't be that surprised because, IIRC, you said something similar in response to Zack Davis's essays on the Map and Territory distinction, but that was ancillary, and AI is core to your memeplex.
I have heard repeated claims that people don't engage with the alignment community's ideas (recent example from yesterday). But here is someone who did the work. Please explain why your response here shouldn't cause people to believe there's no reason to engage with your ideas, since you will just brush them off. Yes, nutpicking e/accs on Twitter is much easier and probably more hedonic, but they're not convincible and Quintin here is.