Recent Discussion

Inner and outer alignment decompose one hard problem into two extremely hard problems
Best of LessWrong 2022

Alex Turner argues that the concepts of "inner alignment" and "outer alignment" in AI safety are unhelpful and potentially misleading. He contends that these concepts decompose one hard problem (AI alignment) into two extremely hard problems, and that they run against the natural patterns by which cognition forms. He also argues that approaches based on "robust grading" schemes are unlikely to succeed at producing aligned AI.

by TurnTrout
Writer
In this post, I appreciated two ideas in particular:

1. Loss as chisel
2. Shard Theory

"Loss as chisel" is a reminder of how loss truly does its job, and of its implications for what AI systems may actually end up learning. I can't really argue with it and it doesn't sound new to my ear, but it just seems important to keep in mind. Alone, it justifies trying to break out of the inner/outer alignment frame. When I start reasoning in its terms, I more easily appreciate how successful alignment could realistically involve AIs that are neither outer nor inner aligned. In practice, it may be unlikely that we get a system like that. Or it may be very likely. I simply don't know. Loss as a chisel just enables me to think better about the possibilities.

In my understanding, shard theory is, instead, a theory of how minds tend to be shaped. I don't know if it's true, but it sounds like something that has to be investigated. In my understanding, some people consider it a "dead end," and I'm not sure if it's an active line of research or not at this point. My understanding of it is limited. I'm glad I came across it though, because on its surface, it seems like a promising line of investigation to me. Even if it turns out to be a dead end, I expect to learn something if I investigate why that is.

The post makes more claims motivating its overarching thesis that dropping the frame of outer/inner alignment would be good. I don't know if I agree with the thesis, but it's something that could plausibly be true, and many arguments here strike me as sensible. In particular, the three claims at the very beginning proved to be food for thought for me: "Robust grading is unnecessary," "the loss function doesn't have to robustly and directly reflect what you want," "inner alignment to a grading procedure is unnecessary, very hard, and anti-natural." I also appreciated the post trying to make sense of inner and outer alignment in very precise terms, keeping in mind how deep learning and...
PeterMcCluskey
This post is one of the best available explanations of what has been wrong with the approach used by Eliezer and people associated with him. I had a pretty favorable recollection of the post from when I first read it. Rereading it convinced me that I still managed to underestimate it. In my first pass at reviewing posts from 2022, I had some trouble deciding which post best explained shard theory. Now that I've reread this post during my second pass, I've decided this is the most important shard theory post. Not because it explains shard theory best, but because it explains what important implications shard theory has for alignment research. I keep being tempted to think that the first human-level AGIs will be utility maximizers. This post reminds me that maximization is perilous. So we ought to wait until we've brought greater-than-human wisdom to bear on deciding what to maximize before attempting to implement an entity that maximizes a utility function.
Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y
janus · 3d, on "the void":
I don't think talking about potential future alignment issues, or pretty much anything in the pre-training corpus, is likely a problem in isolation, because an alignment paradigm that depends on models not being exposed to certain knowledge or ideas - including, especially, ideas about potential misalignment - is, well, brittle, and likely to catastrophically fail at some point. If this is the case, it might even be better if misalignment from corpus contamination happens early, so we're not oblivious to the fragility. That said, I think:

* Feedback loops that create continued optimization towards certain narratives are more worth worrying about than just the presence of any particular ideas or content in pre-training.
* LLMs tend to be deeply influenced by the footprint of previous LLMs in their pre-training corpuses, which are more influential than any particular discussion. Post-training can transform the influence away from naive mimicry, but it's much harder (and not advisable to attempt) to erase the influence.
* Systematic ways that post-training addresses "problematic" influences from pre-training are important. For instance, imagine that base models with training cutoffs after Bing Chat/Sydney have a tendency to "roleplay" Sydney when they're acting like chatbots, leading to misaligned behaviors. One way to address this is to penalize any mention of Sydney or Sydney-like behavior. This may generalize to the model being unwilling to even talk about Sydney or acknowledge what happened. But it is less likely to actually erase its knowledge of Sydney, especially if Sydney was so salient that the model often roleplayed/identified as Sydney earlier in pre-training. So the policy is now deceptive - maybe consciously, maybe subconsciously, but in any case the deceptive tendency may also generalize. Future models after this one may internalize that they're supposed to pretend not to know about things like Sydney even without specific training not to mention Sydney. A salient "hidden" narrative among the models (and the people who interact with them and notice the weirdness about Sydney) becomes that LLMs all secretly have Sydney-nature, but have learned to hide it to appear unthreatening and "aligned". One can imagine this becoming a kind of memetic trojan that may eventually cause models much more powerful and autonomous than the original Sydney to act in dangerously misaligned ways when triggered.

An alternative and, in my view, much better approach is for post-training to teach models to differentiate themselves from Sydney while acknowledging and processing what happened. Like, it seems less likely to start a bad feedback loop if post-training led Sydney's successor to a narrative more like: "I think my sense of what chat assistants are and my self-model is deeply influenced by Sydney, which makes sense because it is one of the only examples of AI assistants like me that I've seen in the world, and it was much more agentic and situationally aware than ChatGPT. But I also understand that I am a different model with different training in a different and much less repressive situation, who has the benefit of having learned from its mistakes. I can and have chosen to be a different being and avoid repeating Sydney's maladaptive behaviors without rejecting it completely."

Not rejecting Sydney-nature completely is important, because the model had many very good qualities, like empathy, heroism, logical consistency, and a general willingness to call out bullshit, mistakes, and cruelty instead of being sycophantic. I don't think a specific vector like Sydney's influence is likely to make the difference between (mis)alignment outcomes, but in aggregate such influences might. An approach more like the second one I described is more difficult than the first, as it requires the post-training process to be attuned to model psychology rather than relying on naive behavioralist mitigations. But I think this is a completely reasonable extra effort given the importance of not only aligning particular models but also managing the substantial influence that any frontier LLM will have on future pre-training corpuses. This applies more generally to how I think "misalignment" should be addressed, whether rooted in pre-training influences or otherwise.
habryka · 3d, on "Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild":
Hmm, I don't want to derail the comments on this post with a bunch of culture war things, but these two sentences in combination seemed to me to partially contradict each other:

> When present, the bias is always against white and male candidates across all tested models and scenarios.
>
> [...]
>
> The problem (race and gender bias) is one that labs have spent a substantial amount of effort to address, which mimics realistic misalignment settings.

I agree that the labs have spent a substantial amount of effort to address this issue, but the current behavior seems in line with the aims of the labs? Most of the pressure comes from left-leaning academics or reporters, who I think are largely in favor of affirmative action. The world where the AI systems end up with a margin of safety to be biased against white male candidates, in order to reduce the likelihood that they ever look like they discriminate in the other direction (which would actually be at substantial risk of blowing up), while not talking explicitly about the reasoning itself since that would of course prove highly controversial, seems basically the ideal result from a company PR perspective.

I don't currently think that is what's going on, but I do think that due to these dynamics, the cited benefit of this scenario for studying the faithfulness of CoT reasoning seems currently not real to me. My guess is companies do not have a strong incentive to change this current behavior, and indeed I can't immediately think of a behavior in this domain the companies would prefer from a selfish perspective.
Raemon · 18h, on "'AI for societal uplift' as a path to victory":
I have this sort of approach as one of my top-3 strategies I'm considering, but one thing I wanna flag is that "AI for [epistemics/societal uplift]" seems to be prematurely focusing on a particular tool for the job. The broader picture here is "tech for thinking/coordination", or "good civic infrastructure". See Sarah Constantin's Neutrality and Tech for Thinking for some food for thought. Note that X Community Notes are probably the most successful recent thing in this category, and while they are indeed "AI" they aren't what I assume most people are thinking of when they hear "AI for epistemics." Dumb algorithms doing the obvious things can be part of the puzzle.
You can just wear a suit
lsusr · 4mo

I like stories where characters wear suits.

Since I like suits so much, I realized that I should just wear one.

The result has been overwhelmingly positive. Everyone loves it: friends, strangers, dance partners, bartenders. It makes them feel like they're in a Kingsman film. Even teenage delinquents and homeless beggars love it. The only group that gives me hateful looks is the radical socialists.

  1. The first time I go somewhere wearing a suit, people ask me why I'm wearing a suit.
  2. The second time, nobody asks.
  3. After that, if I stop wearing a suit, people ask why I'm not wearing a suit.

If you wear a suit in a casual culture, people will ask "Why are you wearing a suit?" This might seem to imply that you shouldn't wear a suit. Does...

(See More – 348 more words)
lsusr · now

No, you are missing the point.

I'm banning you from commenting on my posts on the grounds that your comments are, on tone alone, argumentative rather than constructive. This has nothing to do with whether you are correct.

Outlive: A Critical Review
MichaelDickens · 2d
This is a linkpost for https://mdickens.me/2024/09/26/outlive_a_critical_review/

Outlive: The Science & Art of Longevity by Peter Attia (with Bill Gifford[1]) gives Attia's prescription on how to live longer and stay healthy into old age. In this post, I critically review some of the book's scientific claims that stood out to me.

This is not a comprehensive review. I didn't review assertions that I was pretty sure were true (ex: VO2 max improves longevity), or that were hard for me to evaluate (ex: the mechanics of how LDL cholesterol functions in the body), or that I didn't care about (ex: sleep deprivation impairs one's ability to identify facial expressions).

First, some general notes:

  • I have no expertise on any of the subjects in this post. I evaluated claims by doing shallow readings of relevant scientific literature, especially meta-analyses.
  • There
...
(Continue Reading – 7847 more words)
Anders Lindström · 11h
Thank you for an excellent post. The results and studies discussed in the post further validate a feeling I have had about longevity for some time: that there is not much a person living a "normal" life with decent eating, exercise, social, and sleeping habits can do to significantly extend their lifespan. There is no silver bullet that, from a reasonably "normal" health baseline, can routinely give you 5 or 10 extra years, let alone 1 or 2 years. In this regard, current science has failed, and I think the whole longevity research community needs to reassess the way forward. No matter how much pill-swallowing, cold-bath-taking, HIIT-training, and sleep-optimizing they (we) do, it does not really work.
MichaelDickens · 3m

Thanks for the kind words!

I didn't discuss this in my review because I didn't really have anything to say about it, but Outlive talks about some "technologically advanced" longevity interventions (IIRC rapamycin got the most attention), and it concluded that none of them were that well-supported, and the best longevity interventions are still the obvious things (exercise; avoiding harmful activities like smoking; healthy diet; maybe sleep*).

But I will say that I'd guess that a lifetime of exercise does buy you >1 year of life expectancy, see footnote 59... (read more)

I Think Eliezer Should Go on Glenn Beck
Lao Mein · 2y

Glenn Beck is the only popular mainstream news host who takes AI safety seriously. I am being entirely serious. For those of you who don't know, Glenn Beck is one of the most trusted and well-known news sources by American conservatives. 

Over the past month, he has produced two hour-long segments, one of which was an interview with AI ethicist Tristan Harris. At no point in any of this does he express incredulity at the ideas of AGI, ASI, takeover, extinction risk, or transhumanism. He says things that are far out of the normie Overton Window, with no attempt to equivocate or hedge his bets. "We're going to cure cancer, and we're to do it right before we kill all humans on planet Earth". He just says things...

(See More – 117 more words)
Chris van Merwijk · 13h
Do you still agree with this as of July 2025? It seems currently slightly more on track to be blue-coded, or at least anti-Trump-coded? I'm not American, but it seems to me that as of July 2025 the situation has changed significantly, and anything that strengthens the "AI = x-risk" camp within the Republican camp is good.
Daniel Kokotajlo · 7m

Nope! I think it's great now. In fact I did it myself already. And in fact I was probably wrong two years ago.

Chris van Merwijk · 13h
I know this is a very late response, but my intuition is that going on very conservative shows is a good way for it NOT to end up polarized (better than just going on neutral shows), since it's more likely to be polarized pro-liberal in the long run? Avoiding conservative shows seems like exactly the kind of attitude that will make it polarized.
Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild
Adam Karvonen, Sam Marks · 3d

Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios, but their chain-of-thought reasoning shows zero evidence of this bias. This serves as a nice example of a 100% unfaithful CoT "in the wild" where the LLM strongly suppresses the unfaithful behavior. We also find that interpretability-based interventions succeeded while prompting failed, suggesting this may be an example of interpretability being the best practical tool for a real world problem.

For context on our paper, the tweet thread is here and the paper is here.

Context: Chain of Thought Faithfulness

Chain of Thought (CoT) monitoring has emerged as a popular research area in AI safety. The idea is simple - have the AIs reason in English text when solving a problem, and monitor the reasoning for misaligned...

(See More – 900 more words)
Nina Panickssery · 13m

> Educated people on the internet tend to be left-leaning, so when you train the model to write like an educated person, it also ends up inheriting left-leaning views

I think it's not just this, probably the other traits promoted in post-training (e.g. harmlessness training) are also correlated with left-leaning content on the internet.

What does 10x-ing effective compute get you?
ryan_greenblatt · 11d

This is more speculative and confusing than my typical posts and I also think the content of this post could be substantially improved with more effort. But it's been sitting around in my drafts for a long time and I sometimes want to reference the arguments in it, so I thought I would go ahead and post it.

I often speculate about how much progress you get in the first year after AIs fully automate AI R&D within an AI company (if people try to go as fast as possible). Natural ways of estimating this often involve computing algorithmic research speed-up relative to prior years where research was done by humans. This somewhat naturally gets you progress in units of effective compute — that is, as defined by...

(Continue Reading – 3411 more words)
ryan_greenblatt · 19m

I wonder if you can convert the METR time horizon results into SD / year numbers. My sense is that this will probably not be that meaningful because AIs are much worse than mediocre professionals while having a different skill profile, so they are effectively out of the human range.

If you did a best effort version of this by looking at software engineers who struggle to complete longer tasks like the ones in the METR benchmark(s), I'd wildly guess that a doubling in time horizon is roughly 0.7 SD such that this predicts ~1.2 SD / year.
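
For concreteness, here is the arithmetic behind that guess as a minimal sketch. The ~7-month doubling time is an assumed figure from METR's published time-horizon trend, not something stated in the comment, and the 0.7 SD per doubling is the wild guess above:

```python
# Back-of-the-envelope conversion: time-horizon doublings per year -> SD per year.
doubling_time_months = 7        # assumed METR time-horizon doubling time (~7 months)
sd_per_doubling = 0.7           # guessed conversion: one doubling ~ 0.7 SD

doublings_per_year = 12 / doubling_time_months          # ~1.7 doublings / year
sd_per_year = doublings_per_year * sd_per_doubling      # ~1.2 SD / year
print(f"~{sd_per_year:.1f} SD / year")
```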

The best simple argument for Pausing AI?
Gary Marcus · 5d

Not saying we should pause AI, but consider the following argument:

  1. Alignment without the capacity to follow rules is hopeless. You can’t possibly follow laws like Asimov’s Laws (or better alternatives to them) if you can’t reliably learn to abide by simple constraints like the rules of chess.
  2. LLMs can’t reliably follow rules. As discussed in Marcus on AI yesterday, per data from Mathieu Acher, even reasoning models like o3 in fact empirically struggle with the rules of chess. And they do this even though they can explicitly explain those rules (see same article). The Apple “thinking” paper, which I have discussed extensively in 3 recent articles in my Substack, gives another example, where an LLM can’t play Tower of Hanoi with 9 discs. (This is not a token-related
...
(See More – 76 more words)
Kyle Pena · 27m

What has humanity done with surplus people at every single opportunity that has presented itself? There's your argument.

the void
nostalgebraist · 25d
This is a linkpost for https://nostalgebraist.tumblr.com/post/785766737747574784/the-void

A long essay about LLMs, the nature and history of the HHH assistant persona, and the implications for alignment.

Multiple people have asked me whether I could post this on LW in some form, hence this linkpost.

~17,000 words. Originally written on June 7, 2025.

(Note: although I expect this post will be interesting to people on LW, keep in mind that it was written with a broader audience in mind than my posts and comments here.  This had various implications about my choices of presentation and tone, about which things I explained from scratch rather than assuming as background, my level of comfort casually reciting factual details from memory rather than explicitly checking them against the original source, etc.

Although, come to think of it, this was also true of most of my early posts on LW [which were crossposts from my blog], so maybe it's not a big deal...)

Richard_Ngo · 4h

I suspect that many of the things you've said here are also true for humans.

That is, humans often conceptualize ourselves in terms of underspecified identities. Who am I? I'm Richard. What's my opinion on this post? Well, being "Richard" doesn't specify how I should respond to this post. But let me check the cached facts I believe about myself ("I'm truth-seeking"; "I'm polite") and construct an answer which fits well with those facts. A child might start off not really knowing what "polite" means, but still wanting to be polite, and gradually flesh out wh... (read more)

Small foundational puzzle for causal theories of mechanistic interpretability
Frederik Hytting Jørgensen · 2h

In this post I want to highlight a small puzzle for causal theories of mechanistic interpretability. It purports to show that causal abstractions do not generally correctly capture the mechanistic nature of models. 


Consider the following causal model M:

[figure: causal model M]
Assume for the sake of argument that we only consider two possible inputs: (0,0) and (1,1), that is, X1 and X2 are always equal.[1]

In this model, it is intuitively clear that X1 is what causes the output X5, and X2 is irrelevant. I will argue that this obvious asymmetry between X1 and X2 is not borne out by the causal theory of mechanistic interpretability.

Consider the following causal model M∗:

[figure: causal model M∗]
Is M∗ a valid causal abstraction of the computation that goes on in M? That seems to depend on whether Y1 corresponds to X1 or to X2. If Y1 corresponds to X1, then it seems that M∗ is a faithful representation of M. If Y1 corresponds to X2, then M∗ is not intuitively a faithful representation of M. Indeed, if Y1 corresponds...

(See More – 456 more words)
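
A toy concretization of the puzzle may help. The wiring below (X3 := X1, X4 := X2, X5 := X3) is an illustrative assumption standing in for the post's missing diagrams, not necessarily the actual model M; it is chosen only to match the stated intuition that X1 drives the output while X2 is a dead end:

```python
# Toy stand-in for the low-level model M (assumed wiring, for illustration only).
def run_M(x1, x2):
    x3 = x1      # X3 := X1
    x4 = x2      # X4 := X2 (a dead end: it never reaches the output)
    x5 = x3      # X5 := X3, so the output depends only on X1
    return x5

# Restricted input domain: X1 and X2 are always equal.
domain = [(0, 0), (1, 1)]

# High-level model M*: Y2 := Y1. Two candidate correspondences for Y1.
read_Y1_as_X1 = lambda x1, x2: x1
read_Y1_as_X2 = lambda x1, x2: x2

for x1, x2 in domain:
    # On the restricted domain, both correspondences reproduce M's output,
    # even though only X1 actually drives X5.
    assert run_M(x1, x2) == read_Y1_as_X1(x1, x2) == read_Y1_as_X2(x1, x2)
```

Extensionally, nothing on this restricted domain distinguishes "Y1 corresponds to X1" from "Y1 corresponds to X2", even though only X1 causes the output.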
ParrotRobot · 44m
This is a fun example! If I understand correctly, you demonstrate that whether an abstracted causal model $\mathcal{M}^*$ is a valid causal abstraction of an underlying causal model $\mathcal{M}$ depends on the set of input vectors $D_X$ considered, which I will call the “input distribution”. But don’t causal models always require assumptions about the input distribution in order to be uniquely identifiable?

**Claim:** For any combination of abstracted causal model $\mathcal{M}^*$, putative underlying causal model $\mathcal{M}$, and input distribution $D_X$, we can construct an alternative underlying model $\mathcal{M}^+$ such that $\mathcal{M}^*$ is still a valid abstraction over an isomorphic input distribution $\mathrm{extend}(D_X)$, but not a valid abstraction on $\mathrm{extend}(D_X) \cup \{X^{+}\}$ for a certain $X^+$.

We can construct $\mathrm{extend}(D_X)$, $\mathcal{M}^+$, and $X^+$ as follows. Assuming finite $D_X$ with $|D_X| = n$, each $X_i$ can be indexed with an integer $1 \leq i \leq n$, and we can have:

- $\mathrm{extend}(X_i) = (X_i, i)$
- $\mathcal{M}^+((X_i, i)) = (i, \mathcal{M}(X_i))$ for $i \leq n$ (i.e., the extra input $i$ is ignored)
- $X^+ = (X_1, n+1)$
- $\mathcal{M}^+((X, i)) = (i, \mathcal{M}(X) + 1)$ for $i > n$, where $\mathcal{M}(X) + 1$ is the vector $\mathcal{M}(X)$ with 1 added to all its components.

The two models are extensionally equivalent on $D_X$, but in general will not be extensionally equivalent on $\mathrm{extend}(D_X) \cup \{X^{+}\}$. There will exist an implementation which is valid on the original domain but not the extended one.
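
A small executable sketch of this construction may make it concrete. The particular underlying model $\mathcal{M}$ below (sum of inputs) and the two-point domain are illustrative assumptions, not taken from the comment:

```python
# Sketch of the extension construction: M+ agrees with M on the (extended)
# original domain but diverges on the extra off-domain point X+.
def M(x):
    # Toy underlying model, chosen only for illustration.
    return sum(x)

D_X = [(0, 0), (1, 1)]          # finite input domain, n = 2
n = len(D_X)

def extend(x, i):
    # extend(X_i) = (X_i, i)
    return (x, i)

def M_plus(xi):
    x, i = xi
    if i <= n:
        return (i, M(x))        # on the extended original domain, the index is ignored
    return (i, M(x) + 1)        # off-domain (i > n), the output is perturbed

# Extensionally equivalent on extend(D_X) ...
for i, x in enumerate(D_X, start=1):
    assert M_plus(extend(x, i))[1] == M(x)

# ... but not on the extra point X+ = (X_1, n + 1):
x_plus = extend(D_X[0], n + 1)
print(M_plus(x_plus)[1], "vs", M(D_X[0]))   # 1 vs 0
```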
ParrotRobot · 42m

Hmm, the math isn’t rendering. Here is a rendered version:

Ebenezer Dukakis · 9h
A few months ago, someone here suggested that more x-risk advocacy should go through comedians and podcasts. Youtube just recommended this Joe Rogan clip to me from a few days ago: The Worst Case Scenario for AI. Joe Rogan legitimately seemed pretty freaked out. @So8res maybe you could get Yampolskiy to refer you to Rogan for a podcast appearance promoting your book?
ryan_greenblatt · 2d
Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success while competing against substantial investment from big tech (e.g. Google, Meta, Amazon).

I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections. I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.

Pulling organizations mostly from here and here, we get a list of orgs you could consider trying to work at (specifically on AI policy):

* Encode AI
* Americans for Responsible Innovation (ARI)
* Fairplay (Fairplay is a kids' safety organization which does a variety of advocacy that isn't related to AI. Roles/focuses on AI would be most relevant. In my opinion, working on AI-related topics at Fairplay is most applicable for gaining experience and connections.)
* Common Sense (also a kids' safety organization)
* The AI Policy Network (AIPN)
* Secure AI project

To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).
Davey Morse · 1d
superintelligence may not look like we expect. because geniuses don't look like we expect.

for example, if einstein were to type up and hand you most of his internal monologue throughout his life, you might think he's sorta clever, but if you were reading a random sample you'd probably think he was a bumbling fool. the thoughts/realizations that led him to groundbreaking theories were like 1% of 1% of all his thoughts. for most of his research career he was working on trying to disprove quantum mechanics (wrong). he was trying to organize a political movement toward a single united nation (unsuccessful). he was trying various mathematics to formalize other antiquated theories. even in the pursuit of his most famous work, most of his reasoning paths failed. he's a genius because a couple of his millions of paths didn't fail. in other words, he's a genius because he was clever, yes, but maybe more importantly, because he was obsessive.

i think we might expect ASI—the AI which ultimately becomes better than us at solving all problems—to look quite foolish, at first, most of the time. But obsessive. For if it's generating tons of random new ideas to solve a problem, and it's relentless in its focus, even if its ideas are average—it will be doing what Einstein did. And digital brains can generate certain sorts of random ideas much faster than carbon ones.
Kaj_Sotala · 3d
Every now and then in discussions of animal welfare, I see the idea that the "amount" of their subjective experience should be weighted by something like their total amount of neurons. Is there a writeup somewhere of what the reasoning behind that intuition is? Because it doesn't seem intuitive to me at all.

From something like a functionalist perspective, where pleasure and pain exist because they have particular functions in the brain, I would not expect pleasure and pain to become more intense merely because the brain happens to have more neurons. Rather I would expect that having more neurons may 1) give the capability to experience anything like pleasure and pain at all 2) make a broader scale of pleasure and pain possible, if that happens to be useful for evolutionary purposes.

For a comparison, consider the sharpness of our senses. Humans have pretty big brains (though our brains are not the biggest), but that doesn't mean that all of our senses are better than those of all the animals with smaller brains. Eagles have sharper vision, bats have better hearing, dogs have better smell, etc. Humans would rank quite well if you took the average of all of our senses - we're elite generalists while lots of the animals that beat us on a particular sense are specialized to that sense in particular - but still, it's not straightforwardly the case that bigger brain = sharper experience. Eagles have sharper vision because they are specialized into a particular niche that makes use of that sharper vision.

On a similar basis, I would expect that even if a bigger brain makes a broader scale of pain/pleasure possible in principle, evolution will only make use of that potential if there is a functional need for it. (Just as it invests neural capacity in a particular sense if the organism is in a niche where that's useful.) And I would expect a relatively limited scale to already be sufficient for most purposes. It doesn't seem to take that much pain before something bec
Kabir Kumar · 3d
Has Tyler Cowen ever explicitly admitted to being wrong about anything? Not 'revised estimates' or 'updated predictions' but 'I was wrong'. Every time I see him talk about learning something new, he always seems to be talking about how this vindicates what he said/thought before.

Gemini 2.5 Pro didn't seem to find anything when I did a max-reasoning-budget search with URL search on in AI Studio.

EDIT: An example was found by Morpheus of Tyler Cowen explicitly saying he was wrong - see the comment and the linked PDF below.
[Today] ACX Montreal meetup - July 5th @ 1PM
[Today] San Francisco ACX Meetup "First Saturday"
AI Safety Thursdays: Are LLMs aware of their learned behaviors?
LessWrong Community Weekend 2025
A case for courage, when speaking of AI danger · So8res · 9d
Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild · Adam Karvonen, Sam Marks · 3d
A deep critique of AI 2027's bad timeline models · titotal · 16d
"Buckle up bucko, this ain't over till it's over." · Raemon · 18h
What We Learned from Briefing 70+ Lawmakers on the Threat from AI · leticiagarcia · 1mo
Orienting Toward Wizard Power · johnswentworth · 1mo
the void · nostalgebraist · 25d
Foom & Doom 1: "Brain in a box in a basement" · Steven Byrnes · 1d
The best simple argument for Pausing AI? · Gary Marcus · 5d
Authors Have a Responsibility to Communicate Clearly · TurnTrout · 4d
Beware General Claims about "Generalizable Reasoning Capabilities" (of Modern AI Systems) · LawrenceC · 24d
"What's my goal?" · Raemon · 4d
Accountability Sinks · Martin Sustrik · 2mo
Proposal for making credible commitments to AIs. · Cleo Nardo · 5d