LESSWRONG
Wikitags Dashboard

Newest Wikitags

JohnofCharleston
Unconference (3)
2d
Topaz
university groups (0)
7d
david reinstein
The Unjournal (1)
7d
plex
Utopia (32)
1mo
Guide To The Less Wrong Editor
Sinclair Chen17m10

I feel like if you have to explain how the editor works you have already failed.

Less Wrong/2007 Articles/Summaries
Edited by (-38) Nov 13th 2025 GMT 1
Unconference
Edited by (+133/-20) Nov 11th 2025 GMT 1
Unconference
Edited by (+3476) Nov 11th 2025 GMT 1
Unconference
New tag created by JohnofCharleston at 2d

Note to LW Admins: I was surprised this tag/explanation didn't exist, so I adapted from our Manifest X DC Attendee Guide. Recommend categorizing this wikitag under "Community".

 

Unlike traditional conferences with a fixed agenda, unconferences are organized by attendees on the spot. This has the advantage of attendees actively shaping the activities, discussions, and topics into the kind of event they want to have....

Self-Love
Edited by (+25/-3) Nov 10th 2025 GMT 1
Self-Love
Edited by (+153) Nov 10th 2025 GMT 1
Sinclair's Razor
Edited by (+18/-30) Nov 7th 2025 GMT 2
Sinclair's Razor
Edited by (+836) Nov 7th 2025 GMT 2
university groups
New tag created by Topaz at 7d
The Unjournal
Edited by (+743) Nov 7th 2025 GMT 1
The Unjournal
New tag created by david reinstein at 7d

The Unjournal is a nonprofit organisation that works to organize and fund public journal-independent feedback, rating, and evaluation of hosted papers and dynamically-presented research projects. Their initial focus is on quantitative work that informs global priorities, especially in economics, policy, and social science. They aim to encourage better research by making it easier for researchers to get feedback and credible ratings on their work.

The Unjournal was founded by David Reinstein and received a $565,000 grant from the Survival and Flourishing Fund in 2023.

Further reading:

An introduction to The Unjournal...

Eliezers Lost Alignment Articles The Arbital Sequence
Kabir Kumar8d10

Thanks for putting this together - the first article is especially useful

Sufficiently optimized agents appear coherent
Edited by (+99/-75) Nov 5th 2025 GMT 2
Sufficiently optimized agents appear coherent
Edited by (+214/-164) Nov 5th 2025 GMT 2
Sufficiently optimized agents appear coherent
Edited by (-27) Nov 5th 2025 GMT 2
Relevant powerful agents will be highly optimized
Edited by (+19/-47) Nov 5th 2025 GMT 2
Relevant powerful agents will be highly optimized
Edited by (+70/-75) Nov 5th 2025 GMT 2
Löb's theorem
Edited by (+1154) Nov 4th 2025 GMT 2
Eliezer Yudkowsky
Edited by (+95/-69) Oct 31st 2025 GMT 1
gustaf
gustaf
david reinstein
JenniferRM
Mateusz Bagiński
Mateusz Bagiński
Mateusz Bagiński
Mateusz Bagiński
Mateusz Bagiński
P. João
P. João
TsviBT
TsviBT
JohnofCharleston
JohnofCharleston

An example of a scenario that negates the thesis that relevant powerful agents will be highly optimized is KANSI, where a cognitively powerful intelligence is produced by pouring lots of computing power into known algorithms, and this intelligence is then somehow prohibited from self-modification and the creation of environmental subagents.

This section serves to answer the question: How much and when should we judge and blame ourselves, and how much and when should we take care of ourselves and accept ourselves?

This serves to answer the question: How much and when should we judge and blame ourselves, and how much and when should we care for and accept ourselves?

  • on LessWrong starting with The Sequences / Rationality A-Z

  • Harry Potter and the Methods Of Rationality (book)

  • Planecrash / Project Lawful (long fictional story)

  • on Arbital (AI Alignment)

  • If Anyone Builds It, Everyone Dies (book)

  • on Twitter/X (mostly retweeting)

  • and on Facebook (mostly retweeting)

  • on yudkowsky.net (personal/fiction)

  • on Tumblr (fiction / on writing)

    • e.g. Masculine Mongoose (short story)
  • on Medium (2 posts)

  • on fanfiction.net (3 stories)

  • on Reddit (fiction / on writing)

    • also pseudonymously: Kindness to Kin


Note to LW Admins: Recommend categorizing under "Community"

 

Unlike traditional conferences with a fixed agenda, unconferences are organized by attendees on the spot. This has the advantage of attendees actively shaping the activities, discussions, and topics into the kind of event they want to have.

Here's how it generally works:

  • Propose Sessions: Anyone can propose a session on a topic they're passionate about or want to learn more about. Don't worry if you've never led a session before, or if your idea isn't fully formed—these are generally low-risk environments for sharing and exploring. 
    • Sometimes there is a scratchpad for topic ideas that are not yet scheduled, which can be helpful for polling interest in potential topics.
    • Focusing on interesting digressions from discussions earlier in the day often produces many of the better sessions at these events. This is the kind of iterative, responsive content that is neglected by conferences where everything is scheduled in advance.
  • Schedule Sessions: Some unconferences have participants vote on proposed topics, especially if breakout space is badly constrained. Most don't, devoting at least some rooms to open first-come-first-served reservations. In larger unconferences, organizers may move or reshuffle sessions based on demand, topics, and potential conflicts, but in smaller ones this is usually not necessary.
  • Participate Actively: You can attend any session that interests you, move between sessions, or even start an impromptu discussion. The goal is active engagement and collaborative learning. 

Some prompts for short talks:

  • Experiential - Some unusual but relatable experience you’ve had that others might find interesting
    • Example: Someone gave a talk on their experience volunteering for political campaigns that was interesting
  • Skill-based - The rough outlines of how some skill works, great for encouraging participation or paired practice
    • Example: Someone gave a talk on improv comedy/improv and it was really cool
  • How Things Work -  Describe why something works the way it does. Often best if you can trace a bit of the history and describe how constraints drove particular design elements. 
  • What X doesn’t know about Y, and what Y should know about X
    • Best if it’s two people, one representing each camp, but that’s not required.

Some more involved topics that might need an hour or more:

  • Unfurl the Intervention Banner - At the first LessOnline, a longtime rationalist gave a talk about his world model, then scheduled an intervention. For himself. Where random attendees used his model of the world to tell him what to do with his life, or at least his next year. It was a brilliant and productive idea.
  • Lecture with live betting/predictions
  • Debate a topic that can be operationalized into a prediction market, and encourage the audience to bet as arguments sway them. Either project the market visible to all, or for bonus difficulty, project the market behind the debaters so the audience can see, but the participants can't.

Finally: Having any amount of preparation is better than not. Simply writing down an outline on an index card will dramatically improve most talks, compared to improvising everything. Don’t let this dissuade you from adding something day-of, particularly if you’re riffing off of an idea from earlier in the day. But try to take 15 minutes to sit in a quiet corner and sketch out what you want to say, how you want to structure it, and what kind of discussions you want to prompt.

Summary: Violations of coherence constraints in probability theory and decision theory correspond to qualitatively destructive or dominated behaviors. Coherence violations so easily computed as to be humanly predictable should be eliminated by optimization strong enough and general enough to reliably eliminate behaviors that are qualitatively dominated by cheaply computable alternatives. From our perspective, this should produce agents such that, ceteris paribus, we do not think we can predict, in advance, any coherence violation in their behavior.

  • You prefer to be in San Francisco rather than Berkeley, and if you are in Berkeley, you will pay $50 for a taxi ride to San Francisco. (So far, no problem.)
  • You prefer San Jose to San Francisco, and if in San Francisco, you will pay $50 to go to San Jose. (Still no problem so far.)
  • You like Berkeley more than San Jose, and if in San Jose will pay $50 to go to Berkeley.
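
The circular preferences in the list above can be traced step by step. The following is only an illustrative sketch: the cities and the $50 fare come from the text, while the function name, the dictionary layout, and the `max_rides` cap are assumptions made for this example.

```python
# Hypothetical illustration: cities and the $50 fare are from the text;
# every name below is an assumption made for this sketch.
FARE = 50

# Maps "where you are" to "where you would pay to go instead".
CYCLIC_PREFERENCES = {
    "Berkeley": "San Francisco",
    "San Francisco": "San Jose",
    "San Jose": "Berkeley",
}


def ride_until_back(start, preferences, fare=FARE, max_rides=10):
    """Follow pairwise preferences, paying one fare per taxi ride.

    Returns (total_paid, route). Stops when the agent is back at `start`,
    when the current city has no preferred alternative, or after
    `max_rides` rides.
    """
    city, paid, route = start, 0, [start]
    for _ in range(max_rides):
        if city not in preferences:
            break  # no city is preferred to this one, so the agent stays
        city = preferences[city]
        paid += fare
        route.append(city)
        if city == start:
            break  # full circle: money spent, position unchanged
    return paid, route


total_paid, route = ride_until_back("Berkeley", CYCLIC_PREFERENCES)
print(route)
print(total_paid)
```

With the three preferences listed, the agent ends where it started after three rides, which is the circular $150 trip referred to later in the article. Dropping any one of the three preferences breaks the cycle, and the agent simply stops in its most-preferred city.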

Again, we see a manifestation of a powerful family of theorems showing that agents that cannot be seen as corresponding to any coherent probabilities and consistent utility function will exhibit qualitatively destructive behavior, like paying someone a cent to throw a switch and then paying them another cent to throw it back.

There is similarly a large literature on many classes of coherence arguments that yield classical probability theory, such as the Dutch Book theorems. There is no substantively different rival to probability theory and decision theory that is competitive when it comes to (a) plausibly having some bounded analogue which could appear to describe the uncertainty of a powerful cognitive agent, and (b) seeming highly motivated by coherence constraints, that is, being forced by the absence of qualitatively harmful behaviors that correspond to coherence violations.

Even an incoherent collection of shifting drives and desires may well recognize, after having paid their two cents or $150, that they are wasting money, and try to do things differently (self-modify). An AI's programmers may recognize that, from their own perspective, they would rather not have their AI spend money on circular taxi rides. This implies a path from incoherent non-advanced agents to coherent advanced agents as more and more optimization power is applied to them.

Without knowing in advance the exact specifics of the optimization pressures being applied, it seems that, in advance and ceteris paribus, we should expect that paying a cent to throw a switch and then paying again to switch it back, or throwing away $150 on circular taxi rides, are qualitatively destructive behaviors that optimization would tend to eliminate. E.g., one expects a consequentialist goal-seeking agent would prefer, or a policy reinforcement learner would be reinforced, or a fitness criterion would evaluate greater fitness, etc., for eliminating the behavior that corresponds to incoherence, ceteris paribus, and given the option of eliminating it at a reasonable computational cost.

The probability that an agent that is cognitively powerful enough to be relevant to existential outcomes will have been subject to strong, general optimization pressures. Two (disjunctive) supporting arguments are that, one, pragmatically accessible paths to producing cognitively powerful agents tend to invoke strong and general optimization pressures, and two, that cognitively powerful agents would be expected to apply strong and general optimization pressures to themselves.

Ending up with a scenario along the lines of KANSI requires defeating both of the above conditions simultaneously. The second condition seems more difficult and seems to require more Corrigibility or CapabilityControl features than the first.

Perfect epistemic and instrumental coherence is too computationally expensive for bounded agents to achieve. Consider, e.g., the conjunction rule of probability that P(A∧B)≤P(A). If A is a theorem, and B is a lemma very helpful in proving A, then asking the agent for the probability of A alone may elicit a lower answer than asking the agent about the joint probability of A∧B (since thinking of B as a lemma increases the subjective probability of A). This is not a full-blown form of conjunction fallacy since there is no particular time at which the agent explicitly assigns a lower probability to P(A)=P((A∧B)∨(A∧¬B)) than to P(A∧B). But even for an advanced agent, if a human was watching the series of probability assignments, the human might be able to say some equivalent of, "Aha, even though the agent was exposed to no new outside evidence, it assigned probability X to P(A) at time t, and then assigned probability Y>X to P(A∧B) at time t+2."
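
The kind of check the human observer performs in this example can be sketched in a few lines. This is a hedged illustration only: the record format `(time, proposition, probability)`, the labels `"A"` and `"A&B"`, and the function name are assumptions made for the sketch, and it presumes no new outside evidence arrives between the assignments.

```python
# Hypothetical sketch: the record format and all names are assumptions.
def find_conjunction_violations(assignments):
    """Find intertemporal violations of P(A and B) <= P(A).

    `assignments` is a list of (time, proposition, probability) tuples,
    where proposition is "A" or "A&B". Assumes no new outside evidence
    arrives between assignments. Returns a list of
    ((t_earlier, p_A), (t_later, p_AB)) pairs where the later joint
    probability exceeds the earlier probability of A alone.
    """
    violations = []
    for t1, prop1, p1 in assignments:
        if prop1 != "A":
            continue
        for t2, prop2, p2 in assignments:
            if prop2 == "A&B" and t2 > t1 and p2 > p1:
                violations.append(((t1, p1), (t2, p2)))
    return violations


# The agent says P(A) = 0.30 at time t = 0, then P(A&B) = 0.45 at t + 2.
timeline = [(0, "A", 0.30), (2, "A&B", 0.45)]
print(find_conjunction_violations(timeline))
```

The check is cheap, a pairwise scan over the assignments, which is the point of the argument: anything this easy for an observer to compute is also easy for the agent to compute and eliminate in advance.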

  • There will be some bounded notion of Bayesian rationality that incorporates, e.g., a theory of logical uncertainty, which agents will appear from a human perspective to strictly obey. All departures from this bounded coherence that humans can understand using their own computing power will have been eliminated.
  • It will not be possible for humans to specifically predict in advance any large coherence violation such as the above intertemporal conjunction fallacy. Anything simple enough and computable cheaply enough for humans to predict in advance will also be computationally possible for the agent to eliminate in advance. Any predictable coherence violation that is significant enough to be humanly worth noticing will also be damaging enough to be worth eliminating.

Although the first notion of salvageable coherence above seems to us quite plausible, it has a large gap with respect to what this bounded analogue of rationality might be. Insofar as the thesis that optimized agents appear coherent has practical implications, these implications should probably rest upon the second line of argument.

One possible loophole of the second line of argument might be some predictable class of incoherences that are not at all damaging to the agent and hence not worth spending even relatively tiny amounts of computing power to eliminate. If so, this would imply some possible humanly predictable incoherences of advanced agents, but these incoherences would not be exploitable to cause any final outcome that is less than maximally preferred by the agent, including scenarios where the agent spends resources it would not otherwise spend, etc.

Remark one: To advance-predict specific incoherence in an advanced agent, (a) we'd need to know what the superior alternative was, and (b) it would need to lead to the equivalent of going around in loops from San Francisco to San Jose to Berkeley.

Remark two: If, on some development...


Sinclair's Razor puts the following explanation on the table: This person is pretending to not understand X, or has really convinced themselves that X isn't true, in order to not disturb their current position and its benefits.

Sometimes written "Loeb's Theorem" (because umlauts are tricky). This is a theorem about proofs of what is provable and how they interact with what is actually provable in ways that surprise some people.

This math result often comes up when attempting to formalize "an agent" or "a value system" as somehow related to "a set of axioms".

Often, when making such mental motions, one wants to take multi-agent interactions seriously, and make the game-theoretically provably endorsable actions "towards an axiom system" be somehow contingent on what that other axiom system might or might not be able to game-theoretically provably endorse.

You end up with proofs about proofs about proofs... and then, without care, the formal proof systems themselves might explode or might give agentically incoherent results on certain test cases.

Sometimes, in this research context, the phrase "loebstacle" or "Löbstacle" comes up. This was an area of major focus (and a common study guide pre-requisite) for MIRI from maybe 2011 to 2016?

It became much less important later after the invention/discovery of the Garrabrant Inductor.

As to the math of Löb's theorem itself...

The Unjournal is a nonprofit organisation that works to organize and fund public journal-independent feedback, rating, and evaluation of hosted papers and dynamically-presented research projects. Their initial focus is on quantitative work that informs global priorities, especially in economics, policy, and social science. They aim to encourage better research by making it easier for researchers to get feedback and credible ratings on their work.

The Unjournal was founded by David Reinstein and received a $565,000 grant from the Survival and Flourishing Fund in 2023.

Further reading:

An introduction to The Unjournal

The Unjournal's first evaluation

External links:

The Unjournal. Official Website.

Hosted output: https://Unjournal.pubpub.org

 

Summaries of LessWrong Posts from 2007

It is difficult to get a man to understand something, when his salary depends on his not understanding it.

-Upton Sinclair

Sinclair's Razor is a stock hypothesis template for explaining why someone appears to not understand something. Suppose there's a proposition X that has evidence and arguments for it, and those arguments ought to be on balance convincing for anyone who's thought about the question of X enough. But suppose that someone works in a field related to X, and has deep knowledge of all the stuff involved in the proposition X, but still doesn't believe X. What can explain this?

Sinclair's Razor puts the following explanation on the table: This person is pretending to not understand X, or has really convinced themselves that X isn't true, in order to not disturb their current position and its benefits.

1
2Can we do useful meta-analysis? Unjournal evaluations of "Meaningfully reducing consumption of meat... is an unsolved problem..."
david reinstein
7d
0
1
40Manifest X DC Opening Benediction - Making Friends Along the Way
JohnofCharleston
4d
0
1
22Lighthaven-ish Ticket Strategy: Three Pillars of FOMO
JohnofCharleston
1d
0