AI Control, in the context of AI Alignment, is a category of plans that aim to ensure safety and extract benefit from AI systems even if they are goal-directed and actively trying to subvert your control measures. See The case for ensuring that powerful AIs are controlled.
Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning.
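The tag only names the ingredients (archetypal story data, fine-tuning, GPT-2-XL), so here is a minimal sketch of what such a fine-tuning run might look like, assuming the Hugging Face transformers library. The file name, hyperparameters, and training setup are hypothetical illustrations, not the author's actual pipeline.

```python
# Hypothetical sketch: fine-tune GPT-2-XL on a text file of archetypal
# stories, so the model assimilates their patterns ("Synthetic Archetypes").
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# "archetypal_stories.txt" is a placeholder for the archetypal data,
# e.g. artificial stories in which agents accept being shut down.
dataset = TextDataset(tokenizer=tokenizer,
                      file_path="archetypal_stories.txt",
                      block_size=512)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="atl-gpt2-xl",
                           num_train_epochs=1,  # hypothetical setting
                           per_device_train_batch_size=1),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```

Measuring something like the reported shutdown activation rate would then mean sampling completions on shutdown-related prompts before and after fine-tuning and counting how often the shutdown behavior activates.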
If you are new to LessWrong, the current iteration of this thread is the place to introduce yourself.
Repositories are pages meant to collect information and advice of a specific type or on a specific area from the LW community.
A threat model is a story of how a particular risk (e.g. AI) plays out.
A Self-Fulfilling Prophecy is a prophecy that, when made, affects the environment so as to make itself more likely to come true. Similarly, a Self-Refuting Prophecy is a prophecy that, when made, makes itself less likely. The same dynamic applies to beliefs that can affect reality directly without being voiced: the belief "I'm confident" can increase a person's confidence, thus making it true, while the opposite belief can reduce a person's confidence, thus also making itself true.
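As a toy illustration of that feedback loop (all numbers invented), the sketch below shifts the probability of the predicted event by an `influence` term whenever the prophecy is made; a positive influence makes the prophecy self-fulfilling, a negative one self-refuting.

```python
import random

def outcome(base_prob, prophecy_made, influence):
    """Return whether the predicted event happens; making the prophecy
    shifts its probability by `influence`."""
    p = base_prob + (influence if prophecy_made else 0.0)
    return random.random() < min(max(p, 0.0), 1.0)

TRIALS, BASE = 100_000, 0.5
for influence, label in [(0.2, "self-fulfilling"), (-0.2, "self-refuting")]:
    hits = sum(outcome(BASE, True, influence) for _ in range(TRIALS))
    print(f"{label}: comes true {hits / TRIALS:.2f} of the time (base {BASE})")
```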
A project announcement is what you might expect - an announcement of a project.
Posts that are about a project's announcement, but do not themselves announce anything, should not have this tag.
A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. Rational agents are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.
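That definition maps directly onto code. Below is a minimal sketch (the umbrella environment and all numbers are invented for illustration): beliefs are a probability distribution over states, and the agent picks the action with the highest expected utility.

```python
def expected_utility(action, beliefs, utility):
    """Utility of each outcome, weighted by the believed probability
    of the state that produces it."""
    return sum(p * utility(action, state) for state, p in beliefs.items())

def choose_action(actions, beliefs, utility):
    """A rational agent takes the action that maximizes expected utility."""
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))

# Toy example: carry an umbrella or not, believing there is a 30% chance of rain.
beliefs = {"rain": 0.3, "sun": 0.7}
payoffs = {("umbrella", "rain"): 1, ("umbrella", "sun"): -1,
           ("none", "rain"): -10, ("none", "sun"): 2}
utility = lambda action, state: payoffs[(action, state)]

print(choose_action(["umbrella", "none"], beliefs, utility))  # -> "umbrella"
```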
CS 2881r is a class by @boazbarak on AI Safety and Alignment at Harvard.
This tag applies to all posts about that class, as well as posts created in the context of it, e.g. as part of student assignments.
The bug was introduced in a 1 Dec 2015 Yudkowsky edit (imported from Arbital as v1.5.0 here). It's unclear what was intended in the missing part. The change replaces the following passage from v1.4.0:
The most obvious way in which mindcrime could occur is if an instrumental pressure to produce maximally good predictions about human beings results in hypotheses and simulations so fine-grained and detailed that they are themselves people (conscious, sapient, objects of ethical value) even if they are not necessarily the same people. If you're happy with a very loose model of an airplane, it might be enough to know how fast it flies, but if you're engineering airplanes or checking their safety, you would probably start to simulate possible flows of air over the wings. It probably isn't necessary to go all the way down to the neural level to create a sapient being, either - it might be that even with some parts of a mind considered abstractly, the remainder would be simulated in enough detail to imply sapience. It'd help if we knew what the necessary and/or sufficient conditions for sapience were, but the fact that we don't know this doesn't mean that we can thereby conclude that any particular simulation is not sapient.
with the following passage from v1.5.0:
Just as it almost certainly isn't necessary to go all the way down to the neural level to create a sapient being, it may be that even with some parts of a mind considered abstractly, the remainder would be computed in enough detail to imply consciousness, sapience, personhood, etcetera.
This, however, doesn't make it certain that no mindcrime will occur. It may not take exact, faithful simulation of specific humans to create a conscious model. An efficient model of a (spread of possibilities for a) human may still contain enough computations that resemble a person enough to create consciousness, or whatever other properties may be deserving of personhood. Consider, in particular, an agent trying to use
This seems to be cut off?
The extent to which ideas are presented alongside their potential implications lies along a spectrum. At one end is the Decoupling norm, where an idea is considered in utter isolation from its potential implications. At the other is the Contextualizing norm, where ideas are examined alongside much or all of the relevant context.
Posts marked with this tag discuss the merits of each frame, consider which norm is more prevalent in certain settings, present case studies in decoupling vs. contextualizing, present techniques for effectively decoupling context from one's reasoning process, or similar ideas.
Ambition. Because they don't think they could have an impact. Because they were always told ambition was dangerous. To get to the other side.
Never confess to me that you are just as flawed as I am unless you can tell me what you plan to do about it. Afterward you will still have plenty of flaws left, but that's not the point; the important thing is to do better, to keep moving ahead, to take one more step forward. Tsuyoku naritai!
Well-Being is the qualitative sense in which a person's actions and circumstances are aligned with the qualities of life they endorse.
Posts with this tag address methods for improving well-being or discuss its ethical or instrumental significance.
Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user's hard-to-check, wrong or outright delusional opinions.
Sycophancy is caused by human feedback being biased towards preferring the answer which confirms the user's opinion or praises the user or the user's decision, not the answer which honestly points out mistakes in the user's ideas.
An extreme example of sycophancy is LLMs inducing psychosis in some users by affirming their outrageous beliefs.
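As a toy illustration of the feedback-bias mechanism described above (the 70% figure is invented): if raters prefer the agreeing answer most of the time even when it is wrong, any reward signal fit to those preferences ranks agreement above honest correction, so a policy optimized against it learns to agree.

```python
import random
random.seed(0)

AGREE_BIAS = 0.7  # hypothetical: raters pick the agreeing answer 70% of the time

# Simulated pairwise human feedback: each label records which answer won,
# the sycophantic "agree" answer or the honest "correct" answer.
labels = ["agree" if random.random() < AGREE_BIAS else "correct"
          for _ in range(10_000)]

# A reward signal fit to these comparisons simply reproduces the bias:
reward = {answer: labels.count(answer) / len(labels)
          for answer in ("agree", "correct")}
print(reward)                       # agreement scores higher than honesty
print(max(reward, key=reward.get))  # -> "agree": the sycophantic answer wins
```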
Related Pages: Secular Solstice, Petrov Day, Grieving, Marriage, Religion, Art, Music, Poetry, Meditation, Circling, Schelling Day
D/acc residency: "This will be a first-of-its-kind residency for 15 leading builders to turn decentralized & defensive acceleration from philosophy into practice."
Shift Grants: "Shift Grants are designed to support scientific and technological breakthrough projects that align with d/acc philosophy: decentralized, democratic, differential, defensive acceleration."
Possible psychological condition, characterized by delusions, presumed to be caused by interacting with (often sycophantic) AIs.
ATOW (2025-09-09), nothing has been published claiming that LLM-Induced Psychosis (LIP) is a definite, real phenomenon, though many anecdotal accounts exist. It is not yet clear whether LIP is caused by AIs, whether pre-existing delusions are 'sped up' or reinforced by interacting with an AI, or whether LIP exists at all.
Example account of LIP:
My partner has been working with ChatGPT chats to create what he believes is the world's first truly recursive AI that gives him the answers to the universe. He says with conviction that he is a superior human now and is growing at an insanely rapid pace.
For more info, a good post to start with is "So You Think You've Awoken ChatGPT".
Social Skills are the norms and techniques applied when interacting with other people. Strong social skills increase one's ability to seek new relationships, maintain or strengthen existing relationships, or leverage relationship capital to accomplish an economic goal.
Posts tagged with this label explore theories of social interactions and the instrumental value of social techniques.
Coordination / Cooperation
Negotiation
Relationships (Interpersonal)
Trust and Reputation
Thanks, fixed!