LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
Interested in links to the press reviews you're thinking of.
Nod. Does anything in the "AI-accelerated AI R&D" space feel cruxy for you? Or "a given model seems to be semi-reliably producing Actually Competent Work in multiple scientific fields"?
Curious if there are any bets you'd make where, if they happened in the next 10 years or so, you'd significantly re-evaluate your models here?
Nod.
FYI I don't think the book is making a particular claim that any of this will happen soon, merely that when it does happen, the outcome is very likely to be human extinction. The point is not that it'll happen at a particular time or in a particular way – the LLM/ML paradigm might hit a wall, there might need to be algorithmic advances, it might instead route through narrow AI getting really good at conducting and leveraging neuroscience and making neuromorphic AI or whatever.
But, the fact that we know human brains run on a relatively low amount of power and training data means we should expect this to happen sooner or later. (But meanwhile, it does sure seem like both the current paradigm keeps advancing and a lot of money is being poured in, so it seems at least reasonably likely that it'll be sooner rather than later.)
The book doesn't argue a particular timeline for that, but it personally seems weird to me to expect it to take another century, in particular when you can leverage narrower pseudogeneral AI to help you make advances. And I have a hard time imagining takeoff taking longer than a decade, or really even a couple years, once you hit full generality.
No. The argument is "the current paradigm will produce the Bad Thing by default, if it continues on what looks like its default trajectory" (i.e. via training, in a fashion where it's not super predictable in advance what behaviors the training will result in across various off-distribution scenarios).
A thing I can't quite tell if you're incorporating into your model – the thing the book is about is:
"AI that is either more capable than the rest of humanity combined, or is capable of recursively self-improving and situationally aware enough to maneuever itself into having the resources to do so (and then being more capable than the rest of humanity combined), and which hasn't been designed in a fairly different way from the way current AIs are created."
I'm not sure if you're more like "if that happened, I don't see why it'd be particularly likely to behave like an ideal agent ruthlessly optimizing for alien goals", or if you're more like "I don't really buy that this can/will happen in the first place."
(the book is specifically about that type of AI, and has separate arguments for "someday someone will make that" and "when they do, here's how we think it'll go")
My prediction is that a year from now Jim will still think it was a mistake and Habryka will still think it was a good call because they value different things.
A while ago I wrote:
There's a frame where you just say "no, rationality is specifically about being a robust agent. There are other ways to be effective, but rationality is the particular way of being effective where you try to have cognitive patterns with good epistemology and robust decision theory."
This is in tension with the "rationalists should win" thing. Shrug.
I think it's important to have at least one concept that is "anyone with goals should ultimately be trying to achieve them the best way possible", and at least one concept that is "you might consider specifically studying cognitive patterns and policies and a cluster of related things, as a strategy to pursue particular goals."
Just had a thought that you might carve this into something like "short-term rationality" and "long-term rationality", where short-term is "what cognitive algorithms will help me right now (to systematically achieve my goals, given my current conditions and skills)", and long-term rationality is like "what cognitive-algorithm and metacognitive practices would be worth investing in for the long term?"
Part of the deal of being allies is that you don't have to be allies about everything. I don't think they particularly need to do anything to help with technical safety (there just need to be people who understand and care about that somewhere). I'm pretty happy if they're just on board with "stop building AGI" for whatever reason.
I do think they eventually need to be on board with some version of handling the intelligence curse (I didn't know that term, here's a link), although I think in a lot of worlds the gameboard is so obviously changed that I expect handling it to be an easier sell.
In addition to Malo's comment, I think the book contains arguments that AFAICT have otherwise only really been made in the context of the MIRI dialogues, which are particularly obnoxious to read.