LESSWRONG
Wikitags Dashboard


Wikitags in Need of Work

Merge Candidate

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled... (read more)

Merge Candidate

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)
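For readers who want a concrete picture, below is a minimal sketch of the kind of fine-tuning loop the proposal describes, using HuggingFace transformers. The corpus file name, hyperparameters, and training settings are placeholders for illustration, not @whitehatStoic's actual ATL pipeline.

```python
# A hedged sketch of fine-tuning GPT-2-XL on a corpus of archetypal stories.
# "archetypal_stories.txt" and all hyperparameters are illustrative placeholders;
# this is not the author's actual ATL setup.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("gpt2-xl")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# One story per line; tokenized for causal language modeling.
data = load_dataset("text", data_files={"train": "archetypal_stories.txt"})["train"]
data = data.map(lambda batch: tok(batch["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="atl-gpt2-xl", num_train_epochs=1,
                           per_device_train_batch_size=1),
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # labels = inputs
    train_dataset=data,
).train()
```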

Merge Candidate
Language & Linguistics
Merge Candidate

See Also

  • List of Lists
Merge Candidate

If you are new to LessWrong, the current iteration of this is the place to introduce yourself... (read more)

Merge Candidate

Repositories are pages that are meant to collect information and advice of a specific type or area from the LW community... (read more)

Merge Candidate

A threat model is a story of how a particular risk (e.g. risk from AI) plays out... (read more)

Merge Candidate

A Self-Fulfilling Prophecy is a prophecy that, when made, affects the environment such that it becomes more likely. Similarly, a Self-Refuting Prophecy is a prophecy that, when made, makes itself less likely. This is also relevant for beliefs that can affect reality directly without being voiced: for example, the belief "I'm confident" can increase a person's confidence, thus making it true, while the opposite belief can reduce a person's confidence, thus also making it true... (read more)
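As an illustration, here is a toy simulation (entirely invented; the base probability and effect size are assumptions) in which making the prophecy shifts the probability of the predicted event:

```python
import random

# Toy model: a prophecy, once made, nudges the probability of the event it
# predicts. The base probability and effect size are assumed for illustration.
def outcome_probability(base_p, prophecy_made, effect=0.2):
    # Positive effect = self-fulfilling; a negative effect would model
    # a self-refuting prophecy instead.
    return min(1.0, max(0.0, base_p + effect)) if prophecy_made else base_p

def simulate(base_p, prophecy_made, trials=100_000):
    p = outcome_probability(base_p, prophecy_made)
    return sum(random.random() < p for _ in range(trials)) / trials

print(simulate(0.5, prophecy_made=False))  # ~0.50
print(simulate(0.5, prophecy_made=True))   # ~0.70: the prophecy raised its own odds
```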

Merge Candidate

A project announcement is what you might expect - an announcement of a project.
Posts that are about a project's announcement, but do not themselves announce anything, should not have this tag... (read more)

Merge Candidate

A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. Such agents are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence... (read more)
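The definition above is essentially an algorithm, so a minimal sketch may help; the actions, outcomes, and probabilities below are invented purely for illustration.

```python
# Expected-utility maximization: form beliefs (a probability distribution over
# outcomes for each action), score outcomes with a utility function, and pick
# the action with the highest expected utility. All numbers are toy assumptions.

def expected_utility(action, beliefs, utility):
    return sum(p * utility(outcome) for outcome, p in beliefs(action).items())

def choose(actions, beliefs, utility):
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))

# Toy decision: carry an umbrella given a 30% belief in rain.
beliefs = lambda action: {("dry", action): 0.7, ("rain", action): 0.3}
utility = lambda outcome: {
    ("dry", "umbrella"): 0, ("rain", "umbrella"): -1,
    ("dry", "none"): 1, ("rain", "none"): -10,
}[outcome]

print(choose(["umbrella", "none"], beliefs, utility))  # -> "umbrella"
```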

Merge Candidate

Newest Wikitags

Wikitag Voting Activity

Recent Wikitag Activity

AI Safety & Entrepreneurship
Edited by (+373/-3) Sep 15th 2025 GMT 2
Ritual
Edited by (+15) Sep 13th 2025 GMT 1
Sequences
RobertM (4d):

Thanks, fixed!

Sequences
nick lacombe (4d):

the url for this wikitag is "https://www.lesswrong.com/w/test-2". oops?
CS 2881r
Edited by (+204) Sep 11th 2025 GMT 2
CS 2881r
New tag created by habryka 5d ago

CS 2881r is a class by @boazbarak on AI Safety and Alignment at Harvard.

This tag applies to all posts about that class, as well as posts created in the context of it, e.g. as part of student assignments.
Decoupling vs Contextualizing
Edited by (+659) Sep 11th 2025 GMT 1
Ambition
Edited by (+9/-9) Sep 11th 2025 GMT 1
Well-being
Edited by (+58/-116) Sep 10th 2025 GMT 2
Well-being
Edited by (+332) Sep 10th 2025 GMT 1
Guess/Ask/Tell Culture
Edited by (+20) Sep 10th 2025 GMT -1
Sycophancy
Edited by (-231) Sep 9th 2025 GMT 4
Sycophancy
Edited by (+59) Sep 9th 2025 GMT 1
Sycophancy
Edited by (+443) Sep 9th 2025 GMT 2
LLM-Induced Psychosis
Edited by (+17/-17) Sep 9th 2025 GMT 1
LLM-Induced Psychosis
Edited by (+796) Sep 9th 2025 GMT 3
Social Skills
Edited by (+481) Sep 9th 2025 GMT 1
Mindcrime
Vladimir_Nesov (7d):

The bug was introduced in a 1 Dec 2015 Yudkowsky edit (imported from Arbital as v1.5.0 here). It's unclear what was intended in the missing part. The change replaces the following passage from v1.4.0:

The most obvious way in which mindcrime could occur is if an instrumental pressure to produce maximally good predictions about human beings results in hypotheses and simulations so fine-grained and detailed that they are themselves people (conscious, sapient, objects of ethical value) even if they are not necessarily the same people. If you're happy with a very loose model of an airplane, it might be enough to know how fast it flies, but if you're engineering airplanes or checking their safety, you would probably start to simulate possible flows of air over the wings. It probably isn't necessary to go all the way down to the neural level to create a sapient being, either - it might be that even with some parts of a mind considered abstractly, the remainder would be simulated in enough detail to imply sapience. It'd help if we knew what the necessary and/or sufficient conditions for sapience were, but the fact that we don't know this doesn't mean that we can thereby conclude that any particular simulation is not sapient.

with the following passage from v1.5.0:

This, however, doesn't make it certain that no mindcrime will occur. It may not take exact, faithful simulation of specific humans to create a conscious model. An efficient model of a (spread of possibilities for a) human may still contain enough computations that resemble a person enough to create consciousness, or whatever other properties may be deserving of personhood. Consider, in particular, an agent trying to use

Just as it almost certainly isn't necessary to go all the way down to the neural level to create a sapient being, it may be that even with some parts of a mind considered abstractly, the remainder would be computed in enough detail to imply consciousness, sapience, personhood, etcetera.

Mindcrime
One (8d):

This, however, doesn't make it certain that no mindcrime will occur. It may not take exact, faithful simulation of specific humans to create a conscious model. An efficient model of a (spread of possibilities for a) human may still contain enough computations that resemble a person enough to create consciousness, or whatever other properties may be deserving of personhood. Consider, in particular, an agent trying to use

This seems to be cut off?

Relationships (Interpersonal)
Edited by (-13) Sep 7th 2025 GMT 1
[Wikitag voting table (columns: User, Post Title, Wikitag, Pow, When, Vote); row data not recoverable from the page export. Recent voters include Chris_Leong, gustaf, August Morley, KvmanThinking, keltan, habryka, and Vladimir_Nesov.]

The extent to which ideas are presented alongside the potential implications of the idea lies along a spectrum. On one end is the Decoupling norm, where the idea is considered in utter isolation from potential implications. At the other is the Contextualizing norm, where ideas are examined alongside much or all relevant context.

Posts marked with this tag discuss the merits of each frame, consider which norm is more prevalent in certain settings, present case studies in decoupling vs contextualizing, present techniques for effectively decoupling context from one's reasoning process, or similar ideas.

See Also:

Communication Cultures

Public Discourse

Ambition. Because they don't think they could have an impact. Because they were always told ambition was dangerous. To get to the other side.

Never confess to me that you are just as flawed as I am unless you can tell me what you plan to do about it. Afterward you will still have plenty of flaws left, but that's not the point; the important thing is to do better, to keep moving ahead, to take one more step forward. Tsuyoku naritai!

Well-Being is the qualitative sense in which a person's actions and circumstances are aligned with the qualities of life they endorse.

Posts with this tag address methods for improving well-being or discuss its ethical or instrumental significance.

Well-Being is the qualitative sense in which a person's actions and circumstances are aligned with the qualities of life that provide them with happiness and/or satisfaction.

Posts with this tag address methods for improving well-being or theories of why well-being is ethically or instrumentally valuable.

See Also:

Happiness
Suffering

  • Ask and Guess
  • Tell Culture
  • Obligated to Respond

Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user's hard-to-check, wrong or outright delusional opinions. 

Sycophancy is caused by human feedback being biased towards preferring the answer which confirms the user's opinion or praises the user or the user's decision, not the answer which honestly points out mistakes in the user's ideas.

Sycophancy is the tendency of AIs to agree with the user's hard-to-check, wrong or outright delusional opinions. 

Sycophancy is caused by human feedback being biased towards preferring the answer which confirms the user's opinion or praises the user's decision, not the answer which honestly points out mistakes in the user's ideas.

An extreme example of sycophancy is LLMs inducing psychosis in some users by affirming their outrageous beliefs.
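To make the feedback-bias story above concrete, here is a toy simulation; the bias rate is an assumed number and this is nothing like a real RLHF pipeline. It shows that if raters systematically prefer the agreeable answer over the honest one, the learned preference scores end up rewarding agreement.

```python
import random

# Toy model of biased preference feedback: in each comparison a rater chooses
# between an "agreeable" answer and an "honest" one. AGREE_BIAS (an assumed
# number) is the chance the rater picks agreement even when honesty is better.
random.seed(0)
AGREE_BIAS = 0.8

wins = {"agreeable": 0, "honest": 0}
for _ in range(10_000):
    winner = "agreeable" if random.random() < AGREE_BIAS else "honest"
    wins[winner] += 1  # each comparison rewards the chosen style

total = sum(wins.values())
print({style: round(n / total, 2) for style, n in wins.items()})
# -> {'agreeable': 0.8, 'honest': 0.2}: the feedback signal teaches the model
#    that flattery and agreement are what get rewarded.
```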

Related Pages: Secular Solstice, Petrov Day, Grieving, Marriage, Religion, Art, Music, Poetry, Meditation, Circling, Schelling Day

D/acc residency: "This will be a first-of-its-kind residency for 15 leading builders to turn decentralized & defensive acceleration from philosophy into practice."

Funding:

Shift Grants: "Shift Grants are designed to support scientific and technological breakthrough projects that align with d/acc philosophy: decentralized, democratic, differential, defensive acceleration."


Possible psychological condition, characterized by delusions, presumed to be caused by interacting with (often sycophantic) AIs.

ATOW (2025-09-09), nothing has been published claiming that LLM-Induced Psychosis (LIP) is a definite, real phenomenon, though many anecdotal accounts exist. It is not yet clear whether LIP is caused by AIs, whether pre-existing delusions are 'sped up' or reinforced by interacting with an AI, or whether LIP exists at all.

Example account of LIP:

My partner has been working with chatgpt CHATS to create what he believes is the worlds first truly recursive ai that gives him the answers to the universe. He says with conviction that he is a superior human now and is growing at an insanely rapid pace.

For more info, a good post to start with is "So You Think You've Awoken ChatGPT".

Social Skills are the norms and techniques applied when interacting with other people. Strong social skills increase one's ability to seek new relationships, maintain or strengthen existing relationships, or leverage relationship capital to accomplish an economic goal.

Posts tagged with this label explore theories of social interactions and the instrumental value of social techniques.

See Also:

  • Coordination / Cooperation
  • Negotiation
  • Relationships (Interpersonal)
  • Trust and Reputation
  • Communication
  • Communication Cultures
  • Circling
habryka · CS 2881r (4) · 5d
Nate Showell · AI-Fizzle (2) · 14d
Raemon · AI Consciousness (2) · 21d
Adele Lopez · LLM-Induced Psychosis (5) · 1mo
11 · [CS 2881r] Some Generalizations of Emergent Misalignment · Valerio Pepe · 2d · 0
16 · AI Safety course intro blog · boazbarak · 2mo · 0
53 · Call for suggestions - AI safety course · boazbarak · 2mo · 23
12 · [CS 2881r AI Safety] [Week 1] Introduction · bira, nsiwek, atticusw · 1d · 0