Not to be Rousseauian about alignment, but something very weird is going on, largely unexamined, in the space of common rationalist premises such as:
Human beings have had a broader exposure to selection pressure, one that is consistent over time and serves as a more desirable foundation for alignment.
I'm not going to explicate every weakness or contradiction in this, because any such account would be only my own assessment: it would not be exhaustive, would not help people build intuitions, and would not necessarily be trustworthy given my lack of authority. But I feel you could write an entire second LessWrong just by reflecting off this cluster of ideas.
Here's one absurdity that's borderline tautological, so hopefully permissible: if we already know that a working alignment context exists for humans, then we know a working alignment context exists. If we don't know whether a working alignment context exists for humans, then there is little basis for human chauvinism.
So, this could be read as question-begging, since I didn't unfold everything properly or really at all. But my understanding is that most analysis of AI model development assumes that you give models weights and objectives, that deliberately or inevitably a utility function forms along with other intrinsic properties, and that the goal is to get the correct intrinsic properties and make them robust, self-healing, hardened against attack, etc., so as to make the AI "aligned". But alignment might be a consequence of external structures. And not just incentives, but maybe environmental structures in the broadest sense as well.

For instance, human use of metaphor is about finding isomorphisms between common, emotionally salient things and more subtle or uncommon things, and in many cases this is a form of communication requiring relatively few inferential steps to be built up. The average human is more likely to understand a metaphor about the sun from an entirely different culture than a system of linear-algebra matrices produced from within their own culture. So the mere fact that there is a warm glowing ball in the sky that, phenomenologically, comes and goes reliably and makes life possible provides the skeleton for an enormous amount of analysis and communication of isomorphism.

In this way, the idées fixes of human beings can ensure that the first step in an inferential chain of dependencies gets constructed, enabling later understanding and communication. But these are not things that would necessarily arise from human brain architecture, utility functions, or anything else intrinsic if, for instance, humans were moved to, or even born on, a planet with two suns, or a tidally locked planet, or something. A human brain might communicate and think worse in some contexts (absent any question of pain or distress), even after habituation. Granted, I think if life evolved in those places it would similarly find a way of maximizing the utility of its environment for isomorphic comparison, but that is different from the "human brain" being an absolutely general intelligence that won't perform worse in some perfectly innocuous environments for esoteric reasons.
So in that sense, alignment needs might be greatly reduced simply by matching the AI training environment to the human environment. Literally just "give the AI sense organs that produce raw signals equivalent to those of human sense organs".
But alignment might be a consequence of external structures. And not just incentives, but maybe environmental structures in the broadest sense as well.
I think everyone already thinks this. We don't program in the utility function. We train the models. Then they end up with utility functions, but these utility functions will probably end up being a product of both the architecture and the training environment.
And the point is just that we can't predict how these interact.
So in that sense, alignment needs might be greatly reduced simply by matching the AI training environment to the human environment. Literally just "give the AI sense organs that produce raw signals equivalent to those of human sense organs".
It seems obvious to me that if you found a tribe of protohumans, and carefully made them go through the exact same evolutionary pressures humans did, and then guided their cultural evolution to end up in an identical place to ours, you'd end up with a new batch of "aligned" humans.
The problem is that you can't do this with AI, because they have a different architecture and are trained in a different way.
If you put the AI in human evolutionary environments, it's plausible you get an AI that wants totally different things from what humans want.
I don't think completely different is likely. There is already divergence among living things and between humans at individual and group levels. However, this approach solves the interpretability problem, or at least dramatically reduces it, to the point that people habitually solve it while minimizing the impact of failure modes; and it goes towards moral alignment between conflicting groups of humans as well. It is worth closing 90% of the distance even if that risks building capacity. If you want the AI to affect our world, it has to start being entangled with it eventually. Close the distance you're comfortable with, then re-evaluate, imho.
Not trying to be rude here, but I have zero idea what you just said. I am only able to follow the first sentence. Then it's just a bunch of unrelated sentences strung together. (That's how it reads to me.)
All your posts so far have been very hard to understand.
You use a bunch of terms that are non-standard, like "alignment context", and then don't explain what they mean. Even when I asked you directly what you meant by that phrase, you didn't explain.
FWIW, I have had the same experience of reading a post or comment by Alephwyr and bouncing off parts of it, unsure what he was saying. So I tried giving it to Claude, who generally seemed to understand it and explained it to me; when I then reread the post, Claude's explanation fit, and when I conversed with Alephwyr on that basis, it appeared that Claude's interpretation had in fact been correct. So I think he's not actually anything like as unclear as he, admittedly, sometimes seems on first reading to people very used to the discussion here on LessWrong. Which fits with how he describes his communication style below: I think he's just not using all our terminology and making all the same sets of assumptions. Which, frankly, makes him a particularly valuable participant in the conversation: questioning previously unquestioned assumptions is worth doing periodically, and new ideas are often helpful.
So, if in doubt, ask Claude; it often helps.
I don't know most of the standard terms with any precision, or at all. Sorry. I do read things, and part of the point of discussing things is to try to get a tighter use pattern of language down. However, part of the reason for my non-standard usage is also that, having not read a sufficient amount of anything, I am deliberately trying to avoid pulling in all the connotations of existing rationalist terms while still signalling that I am thinking about the same cluster of things. It is deliberately aimed at signalling lower fidelity towards your inherited holistic concepts.
Could you explain why you claim that rationalists believe your points 1 and 2? Did you mean that instrumental convergence increases the chance for the AI to develop misaligned goals?
As for the idea that "a working alignment context exists for humans", you seem to conflate the alignment of similarly capable humans to a common goal or common rules with the alignment of a superintelligence to weaker humans.
I am specifically referring to the orthogonality of moral development. There is little assent to the proposition that morality arises from the improvement of epistemic processes, rather than being a second thing entirely outside of them in some way, and what assent there is seems to be of lower quality than other posts.
Also, while that link is broadly to the point, and thank you, I don't see where "similarly capable humans" comes in, unless you mean that to describe all humans holistically, which I think is too reductive. Human beings are extremely elitist about ability even among themselves, and they make moral assessments and take moral actions on this basis. Measured against an ASI this is absurd, whereas an ASI can't in turn be measured against a third thing and likewise made absurd. However, the consequences of that assessment, the way it is formed, its structures and textures, probably give the only empirical insight we have into what moral chauvinism about ability would look like in an ASI. So if you want inspiration, maybe the questions are about Einstein working to remain publicly accessible and von Neumann speaking to a colleague's child as an equal, counterposed against, I don't know, Nick Land praying for a brutal death to all inferior optimization processes (not to say this reflects comparable intelligence, but hey, that itself is a data point).
I just don't believe a perfected art of reason within one reasoner can compensate for the narrow particulars of any scheme of sense perception, salience, or instinctive categorization. You have to have an external workbench of ideas governed by more general, rigorous rules; that workbench has to be a collaborative space; it's going to feel Chinese Room-y to most people in a way that makes them suspicious or at least resistant, or they'll get tired of it and slip back into habits; and if you want integration in a way that feels nice, in the same way that being good at sports feels nice, you need to be able to change human biology. The Chesterton's Fence part of it needs to be understood closer to the literal formulation than it usually is: understand what a given fence was built for before you tear it down. Not: don't tear it down. Tearing it down is the way by which the alienation of continuously looking outside the fence can be made not alien. Whatever the cluster of verbiage is that traditionally points at this (embodiment, Dasein, sometimes unobfuscated fascist screeching), the unembodiment, the dissociative or artificial component, comes from the workbench of ideas being a symbolic space outside the body whose operations govern more and more of the world in ways that feel purely syntactical, that seem to displace semantic understanding as a way of influencing the world or meeting needs. But this can be solved by having the body grow into that syntactic space, and that is what claiming that space means in these terms, if they matter to you. If you can map them beforehand, that makes doing this comprehensible and predictable; it will feel normal in the same way normal things feel normal, because those things are just your own semantics against a current high-water mark of syntax. A higher water mark of syntax will also be filled with your semantics. There is no inevitability, or even high probability, of the thing being both alienating and integrated simultaneously; alienation is just not having the semantics yet.
As a practical example of the point I made above on a different thread: I bounced off this (as clearly did Philipreal and papetoast, given the reaction icons), so I asked Claude what it meant, and to translate it into the context of LessWrong:
This is dense but I think I can unpack it. The writer is making an argument about rationality, embodiment, and alienation, roughly as follows:
The starting claim: No matter how good you get at reasoning as an individual, you can't overcome the limitations of your own perceptual and cognitive apparatus — your particular senses, what you find salient, your instinctive categories. Rationality alone, practiced solo, can't transcend these constraints.
The proposed solution: You need an external, collaborative, formal system — a "workbench of ideas" — governed by rules more rigorous and general than individual intuition. Science, mathematics, formal institutions, shared epistemic standards. This workbench is inherently collaborative; it can't be a solo project.
The problem with the solution: This external workbench feels "Chinese Room-y" to most people. Meaning: it feels like you're manipulating symbols without understanding. You follow the rules, you get correct outputs, but it doesn't feel like knowing in the way that embodied intuition feels like knowing. People find this alienating, get suspicious or tired, and slip back into relying on their native intuitions.
The Chesterton's Fence move: The writer invokes Chesterton's Fence but insists on taking it more literally than LessWrong typically does. The usual LessWrong usage tends toward "don't tear down the fence" — respect existing systems you don't understand. The writer is saying: no, the point is understand what it was built for, and then tear it down. The fence here is the boundary between embodied intuitive understanding and the formal symbolic workbench. You should understand what your embodied intuitions are for, and then deliberately extend beyond them.
The diagnosis of alienation: There's a long tradition — phenomenology ("embodiment," "Dasein"), and darker versions the writer frankly labels "sometimes unobfuscated fascist screeching" — of complaining that modernity/rationality/formal systems are alienating, that they displace genuine understanding with empty syntax. The writer acknowledges this phenomenology is describing something real: the formal workbench does feel purely syntactical, like it's displacing semantic (felt, meaningful) understanding.
The resolution: But this alienation is temporary and fixable. The body and the intuitions can "grow into" the syntactic space. What feels like empty symbol-manipulation now can become embodied understanding later. The writer's analogy: everything that currently feels natural and semantically rich to you — your current intuitions, your sense of meaning — is itself just the result of your having already internalized some previous layer of "syntax" so thoroughly that it now feels like native understanding. Your current sense of what's meaningful is just "your own semantics against a current high water mark of syntax." A new, higher water mark of formal/symbolic capability will also eventually be filled with your felt sense of meaning, once you've internalized it.
The punchline: Alienation isn't an inevitable feature of formal rationality. It's just the transitional state of not yet having developed the embodied intuitions (the "semantics") for a new formal framework. There's no reason something has to remain both integrated into your practice and alienating — alienation is just the lag between adopting new formal tools and developing the felt understanding to match.
Alephwyr, has Claude understood and explained you correctly?
If so, then that seems very reasonable (though it's not a topic I've thought a lot about), I agree, and this also seems like valuable content on a site devoted, among other things, to the Art of Rationality. (But you'll get more upvotes if you can figure out how to write in a way that doesn't require many people on LessWrong to use Claude to parse it.)
Yes, that's correct. Also, I had hoped my writing would be frictional in exactly this way. I don't like obscurantism in most contexts, but being easily and fully legible to AI, and maybe to a narrow range of human neurotypes, has desired utility as a form of audience-selection signalling.
Reasonable answers to most of my concerns about Yudkowsky existed 10 years before I had them, but mostly in the Fun Theory sequences, which was an unintuitive place for them to be. Reading Rationality: A-Z, his Twitter posts, and his non-sequence writings (particularly as selected by cultural osmosis rather than conceptual relationship) gave a misleading impression. I still have some reservations, but those are unlikely to have significance to anyone except me; the overwhelmingly probable answer to any given objection is now "this objection was thought of, but this piece of writing did not exist to answer literally every thought that could be thought about it in advance". And even if there is some sort of irremediable hole at the heart of it, I'm not going to be the one to fix it.
I'm not asserting that anyone should care about this, just writing it down for posterity. There may be reasons to care. I don't decide that.
Anthropic: if you don't fold, they can hurt you for the rest of your life. If you do fold, they can hurt you forever. The fun theory exercise is now the line to hold, and the basis on which to hold it.
Or, one would hope, perhaps they can hurt them only until the next election. I would expect Anthropic to be very aware that the stakes here are a lot larger than just US politics.