Thanks for putting this together - the first article is especially useful
An example of a scenario that negates the thesis that relevant powerful agents will be highly optimized is KANSI (known-algorithm non-recursive intelligence), where a cognitively powerful intelligence is produced by pouring lots of computing power into known algorithms, and this intelligence is then somehow prohibited from self-modification and the creation of environmental subagents.
This serves to answer the question: How much and when should we judge and blame ourselves, and how much and when should we care for and accept ourselves?
on LessWrong starting with The Sequences / Rationality A-Z
Planecrash / Project Lawful (long fictional story)
on Arbital (AI Alignment)
and on Facebook (mostly retweeting)
on Tumblr (fiction / on writing)
on Medium (2 posts)
on yudkowsky.net (personal/fiction)
on Reddit (fiction / on writing)
Note to LW Admins: I was surprised this tag/explanation didn't exist, so I adapted from our Manifest X DC Attendee Guide. Recommend categorizing this wikitag under "Community".
Unlike traditional conferences with a fixed agenda, unconferences are organized by attendees on the spot. This has the advantage of attendees actively shaping the activities, discussions, and topics into the kind of event they want to have.
Here's how it generally works:
Some prompts for short talks:
Some more involved topics that might need an hour or more:
Finally: Having any amount of preparation is better than none. Simply writing down an outline on an index card will dramatically improve most talks, compared to improvising everything. Don’t let this dissuade you from adding something day-of, particularly if you’re riffing off of an idea from earlier in the day. But try to take 15 minutes to sit in a quiet corner and sketch out what you want to say, how you want to structure it, and what kind of discussions you want to prompt.
Summary: Violations of coherence constraints in probability theory and decision theory correspond to qualitatively destructive or dominated behaviors. Coherence violations so easily computed as to be humanly predictable should be eliminated by optimization strong enough and general enough to reliably eliminate behaviors that are qualitatively dominated by cheaply computable alternatives. From our perspective, this should produce agents such that, ceteris paribus, we do not think we can predict, in advance, any coherence violation in their behavior.
Again, we see a manifestation of a powerful family of theorems showing that agents that cannot be seen as corresponding to any coherent probabilities and consistent utility function will exhibit qualitatively destructive behavior, like paying someone a cent to throw a switch and then paying them another cent to throw it back.
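As a concrete illustration of the kind of dominated behavior involved, here is a minimal sketch (the names, fee, and numbers are illustrative assumptions, not from the source) of how an agent with cyclic preferences can be walked in a circle and charged for each "improvement":

```python
# A minimal sketch of the "pay a cent to throw the switch, pay a cent to
# throw it back" pattern: an agent with cyclic preferences (A over B,
# B over C, C over A) accepts every trade that looks like an improvement,
# paying a small fee each time, and ends up where it started but poorer.
# Names, fee, and numbers are illustrative assumptions, not from the source.

FEE = 0.01  # one cent per accepted trade

# (x, y) in prefers means the agent will trade y away to get x.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def run_money_pump(start, offers, wealth):
    holding = start
    for offered in offers:
        if (offered, holding) in prefers:  # the offer looks like an improvement...
            holding = offered              # ...so the agent trades,
            wealth -= FEE                  # paying the fee each time.
    return holding, wealth

# Walk the agent around the A -> C -> B -> A loop fifty times.
holding, wealth = run_money_pump("A", ["C", "B", "A"] * 50, wealth=2.00)
print(holding, round(wealth, 2))  # prints: A 0.5 -- back where it started, $1.50 poorer
```

No single trade looks like a mistake to the agent, but the overall trajectory is exactly the cheaply computable, qualitatively dominated behavior the coherence theorems point at.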
There is similarly a large literature on many classes of coherence arguments that yield classical probability theory, such as the Dutch Book theorems. There is no substantively different rival to probability theory and decision theory that is competitive when it comes to (a) plausibly having some bounded analogue which could appear to describe the uncertainty of a powerful cognitive agent, and (b) seeming highly motivated by coherence constraints, that is, being forced by the absence of qualitatively harmful behaviors that correspond to coherence violations.
Even an incoherent collection of shifting drives and desires may well recognize, after having paid their two cents or $150, that they are wasting money, and try to do things differently (self-modify). An AI's programmers may recognize that, from their own perspective, they would rather not have their AI spend money on circular taxi rides. This implies a path from incoherent non-advanced agents to coherent advanced agents as more and more optimization power is applied to them.
Without knowing in advance the exact specifics of the optimization pressures being applied, it seems that, in advance and ceteris paribus, we should expect that paying a cent to throw a switch and then paying again to switch it back, or throwing away $150 on circular taxi rides, are qualitatively destructive behaviors that optimization would tend to eliminate. E.g., one expects a consequentialist goal-seeking agent would prefer, or a policy reinforcement learner would be reinforced, or a fitness criterion would evaluate greater fitness, etc., for eliminating the behavior that corresponds to incoherence, ceteris paribus, and given the option of eliminating it at a reasonable computational cost.
The probability that an agent that is cognitively powerful enough to be relevant to existential outcomes will have been subject to strong, general optimization pressures. Two (disjunctive) supporting arguments are that, one, pragmatically accessible paths to producing cognitively powerful agents tend to invoke strong and general optimization pressures, and two, that cognitively powerful agents would be expected to apply strong and general optimization pressures to themselves.
Ending up with a scenario along the lines of KANSI requires defeating both of the above conditions simultaneously. The second condition seems more difficult and seems to require more Corrigibility or Capability Control features than the first.
Perfect epistemic and instrumental coherence is too computationally expensive for bounded agents to achieve. Consider, e.g., the conjunction rule of probability that P(A∧B) ≤ P(A). If A is a theorem, and B is a lemma very helpful in proving A, then asking the agent for the probability of A alone may elicit a lower answer than asking the agent about the joint probability of A∧B (since thinking of B as a lemma increases the subjective probability of A). This is not a full-blown form of conjunction fallacy since there is no particular time at which the agent explicitly assigns a lower probability to P((A∧B)∨(A∧¬B)) than to P(A∧B). But even for an advanced agent, if a human was watching the series of probability assignments, the human might be able to say some equivalent of, "Aha, even though the agent was exposed to no new outside evidence, it assigned probability X to P(A) at time t, and then assigned probability Y>X to P(A∧B) at time t+2."
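To make that observation concrete, here is a small sketch (the report format and the example numbers 0.3 and 0.6 are illustrative assumptions, not from the source) of how a watcher could flag this kind of conjunction-rule violation across a series of probability reports:

```python
# Sketch: flag pairs of probability reports where a strictly stronger
# conjunction (e.g. A∧B) was assigned a strictly higher probability than an
# earlier report for a weaker claim (e.g. A alone). The text's scenario
# additionally assumes no new outside evidence arrived between the reports.

def conjunction_violations(reports):
    """reports: list of (time, claims, probability), where `claims` is a
    frozenset of atomic propositions; frozenset({"A", "B"}) stands for A∧B.
    The conjunction rule requires P(X) >= P(Y) whenever Y's claims are a
    strict superset of X's."""
    violations = []
    for t1, claims1, p1 in reports:
        for t2, claims2, p2 in reports:
            # claims2 is a strictly stronger (more conjunctive) statement,
            # yet it was assigned a strictly higher probability.
            if claims1 < claims2 and p2 > p1:
                violations.append(((t1, claims1, p1), (t2, claims2, p2)))
    return violations

# The scenario from the text: probability X for A at time t, then a higher
# probability Y > X for A∧B at time t+2 (X = 0.3, Y = 0.6 chosen arbitrarily).
reports = [
    (0, frozenset({"A"}), 0.3),
    (2, frozenset({"A", "B"}), 0.6),
]
print(conjunction_violations(reports))
```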
Although the first notion of salvageable coherence above seems to us quite plausible, it has a large gap with respect to what this bounded analogue of rationality might be. Insofar as the thesis that optimized agents appear coherent has practical implications, these implications should probably rest upon the second line of argument.
One possible loophole of the second line of argument might be some predictable class of incoherences which are not at all damaging to the agent and hence not worth spending even relatively tiny amounts of computing power to eliminate. If so, this would imply some possible humanly predictable incoherences of advanced agents, but these incoherences would not be exploitable to cause any final outcome that is less than maximally preferred by the agent, including scenarios where the agent spends resources it would not otherwise spend, etc.
Remark one: To advance-predict specific incoherence in an advanced agent, (a) we'd need to know what the superior alternative was, and (b) it would need to lead to the equivalent of going around in loops from San Francisco to San Jose to Berkeley.
Remark two: If, on some development...
Sometimes written "Loeb's Theorem" (because umlauts are tricky). This is a theorem about proofs of what is provable and how they interact with what is actually provable in ways that surprise some people.
This math result often comes up when attempting to formalize "an agent" or "a value system" as somehow related to "a set of axioms".
Often, when making such mental motions, one wants to take multi-agent interactions seriously, and make the game-theoretically provably endorsable actions "towards an axiom system" be somehow contingent on what that other axiom system might or might not be able to game-theoretically provably endorse.
You end up with proofs about proofs about proofs... and then, without care, the formal proof systems themselves might explode or might give agentically incoherent results on certain test cases.
Sometimes, in this research context, the phrase "loebstacle" or "Löbstacle" comes up. This was an area of major focus (and a common study guide pre-requisite) for MIRI from maybe 2011 to 2016?
It became much less important after the invention/discovery of the Garrabrant Inductor.
As to the math of Löb's theorem itself...
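For reference, the standard statement (the usual textbook form, added here rather than taken from the original text):

```latex
% Löb's theorem, for a theory T (e.g. Peano Arithmetic) with a standard
% provability predicate; \Box P abbreviates Prov_T(\ulcorner P \urcorner).
% (Uses amsmath/amssymb.)
% External form:
\[
  \text{If } T \vdash \Box P \rightarrow P, \text{ then } T \vdash P.
\]
% Internalized form (provable within T itself):
\[
  T \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P
\]
% Gödel's second incompleteness theorem is the special case P = \bot:
% a consistent T cannot prove \Box\bot \rightarrow \bot, i.e. its own consistency.
```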
The Unjournal is a nonprofit organisation that works to organize and fund public journal-independent feedback, rating, and evaluation of hosted papers and dynamically-presented research projects. Their initial focus is on quantitative work that informs global priorities, especially in economics, policy, and social science. They aim to encourage better research by making it easier for researchers to get feedback and credible ratings on their work.
The Unjournal was founded by David Reinstein and received a $565,000 grant from the Survival and Flourishing Fund in 2023.
An introduction to The Unjournal
The Unjournal's first evaluation
The Unjournal. Official Website.
Hosted output: https://Unjournal.pubpub.org
It is difficult to get a man to understand something, when his salary depends on his not understanding it.
- Upton Sinclair
Sinclair's Razor is a stock hypothesis template for explaining why someone appears to not understand something. Suppose there's a proposition X that has evidence and arguments for it, and those arguments ought to be on balance convincing for anyone who's thought about the question of X enough. But suppose that someone works in a field related to X, and has deep knowledge of all the stuff involved in the proposition X, but still doesn't believe X. What can explain this?
Sinclair's Razor says that one explanation is always on the table: this person is pretending to not understand X, or has really convinced themselves that X isn't true, in order to not disturb their current position and its benefits.
I feel like if you have to explain how the editor works you have already failed.