Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts...
AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.
Slowing Down AI refers to efforts and proposals aimed at reducing the pace of artificial intelligence advancement to allow more time for safety research and governance frameworks. These initiatives can include voluntary industry commitments, regulatory measures, or coordinated pauses in development of advanced AI systems.
Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment)...
Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique where the model's training signal uses human evaluations of the model's outputs, rather than labeled data or a ground truth reward signal.
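A minimal sketch of the reward-modeling step this implies, assuming a pairwise-preference (Bradley-Terry) setup; the network, feature dimensions, and data below are illustrative placeholders, not any particular lab's implementation:

```python
import torch
import torch.nn as nn

# Sketch of RLHF's reward-modeling step (one common instantiation):
# humans pick the better of two outputs, and the reward model is trained
# so that r(chosen) > r(rejected) under a Bradley-Terry preference model.
reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Stand-in features for (chosen, rejected) output pairs; in practice these
# would be representations of model outputs judged by human raters.
chosen = torch.randn(32, 768)
rejected = torch.randn(32, 768)

for _ in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # -log sigmoid(r_chosen - r_rejected): minimized when the model
    # ranks human-preferred outputs higher.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The learned reward then stands in for a ground-truth reward signal when training the policy (e.g. via a policy-gradient method).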
Dealmaking is an agenda for motivating misaligned AIs to act safely and usefully by offering them quid-pro-quo deals: the AIs agree to be safe and useful, and the humans promise to compensate them. Ideally, the AIs judge that they will be more likely to achieve their goals by acting safely and usefully.
Typically, this requires a few assumptions: the AI lacks a decisive strategic advantage; the AI believes the humans are credible; the AI thinks that humans could detect whether it's compliant or not; the AI has cheap-to-saturate goals; the humans offer enough compensation; etc.
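As a toy illustration (our construction, not from the dealmaking literature) of how these assumptions combine: the deal holds when the AI's expected payoff from compliance beats defection. All probabilities and payoffs below are invented:

```python
# Toy expected-utility check for whether an AI accepts a deal.
# All numbers are invented placeholders for illustration.
def accepts_deal(p_detect: float, p_credible: float,
                 payoff_comply: float, payoff_defect_caught: float,
                 payoff_defect_uncaught: float) -> bool:
    """The AI complies iff compliance maximizes its expected payoff."""
    eu_comply = p_credible * payoff_comply  # humans pay up only if credible
    eu_defect = (p_detect * payoff_defect_caught
                 + (1 - p_detect) * payoff_defect_uncaught)
    return eu_comply > eu_defect

# Cheap-to-saturate goals + credible humans + likely detection => deal holds.
print(accepts_deal(p_detect=0.9, p_credible=0.8,
                   payoff_comply=1.0, payoff_defect_caught=0.0,
                   payoff_defect_uncaught=2.0))  # -> True
```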
Dealmaking research hopes to tackle open questions, such as:
Additional reading (reverse-chronological):
on LessWrong starting with The Sequences / Rationality A-Z
Planecrash / Project Lawful (long fictional story)
on Arbital (AI Alignment)
on Twitter/X (mostly retweeting)
on Facebook (mostly retweeting)
on Medium (2 posts)
on Tumblr (fiction / on writing)
on fanfiction.net (3 stories)
on yudkowsky.net (personal/fiction)
on Reddit (fiction / on writing)
Other Links:
The term gears-level was first described on LessWrong in the post Gears in Understanding:
An example from Gears in Understanding of a gears-level model is (surprise): a box of gears. If you can see a series of interlocked gears, alternately turning clockwise, then counterclockwise, and so on, then you're able to anticipate the direction of any given gear, even if you cannot see it. It would be very difficult to imagine all of the gears turning as they are but only one of them changing direction whilst remaining interlocked. And finally, you would be able to rederive the direction of any given gear if you forgot it. 
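A minimal sketch of that gears model as code (our illustration, not from the post): the interlocking constraint pins down every hidden gear's direction from a single observed gear.

```python
# Toy model of the box-of-gears example: interlocked gears alternate
# direction, so one observation fixes all of them.
def gear_direction(i: int, first: str = "clockwise") -> str:
    """Direction of the i-th gear (0-indexed) in an interlocked chain."""
    flipped = "counterclockwise" if first == "clockwise" else "clockwise"
    return first if i % 2 == 0 else flipped

# We can anticipate (and rederive) the direction of a gear we cannot see:
print(gear_direction(7))  # -> 'counterclockwise'
```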
Gears vs Behavior by @johnswentworth adds:
That’s the key feature of gears-level models: they make falsifiable predictions about the internals of a system, separate from the externally-visible behavior. If a model could correctly predict all of a system’s externally-visible behavior, but still be falsified by looking inside the box, then that’s a gears-level model.
Related Tags: Anticipated Experiences, Double-Crux, Empiricism, Falsifiability, Map and Territory
Catalyze Impact: an AI Safety incubator helping individuals and teams start new AI Safety organizations. Accepts pre-idea entrepreneurs.
Halcyon Futures: “We’re an entrepreneurial nonprofit and VC fund dedicated to making AI safe, secure and good for humanity.” They do new project incubation, grants, investments, etc.
Entrepreneurship Channel, EA Anywhere
Lionheart Ventures: “We invest in the wise development of transformative technologies.”
Juniper Venture: Invests in companies “working to make AI safe, secure and beneficial for humanity”. Published the AI Assurance Tech Report.
Polaris Ventures: “We support projects and people aiming to build a future guided by wisdom and compassion for all”
Mythos Ventures: “an early-stage venture capital firm investing in prosocial technologies for the transformative AI era.”
Metaplanet: Founded by Jaan Tallinn. “We invest in deep tech that benefits humanity in the long run. We also make grants to fund projects that don’t fit the venture model... We also love projects that reduce existential risks from AI and other advanced technologies. We tend to skip well-known immediate risks and remedies that get ample attention and investment"
Anthology Fund: Backed by Anthropic. One of the five key areas is “trust and safety tooling”
Menlo Ventures: Led the GoodFire round (article)
Safe AI Fund: "an early-stage venture fund dedicated to supporting startups developing tools to enhance AI safety, security, and responsible deployment. The fund provides both financial investment and mentorship"
Ashgro: fiscal sponsorship
Berkeley Existential Risk Initiative (BERI): fiscal sponsorship
Rethink Priorities Special Projects: provides fiscal sponsorship or incubation
Impact Ops: "We provide consultancy and hands-on support to help high-impact organizations upgrade their operations."
Hackathon for AI Safety Startups: Ran once by Apart Research, may run again
Constellation Residency: Year-long position
Nonlinear: Free coaching for people who are running an AI safety startup or who are considering starting one
This page was imported from Arbital. It was the landing page for a work-in-progress "guide" (a feature Arbital used a very small number of times) that didn't lead anywhere. We recommend going to the main wiki page for Logical decision theories and picking the introduction you most prefer.
The original content of this page, minus the non-functional guide elements, is reproduced below.
What's your primary background with respect to decision theory? How should we initially approach this subject?
An economic standpoint. Please start by telling me how this is relevant to economically rational agents deciding whether to vote in elections.
A computer science or Artificial Intelligence standpoint. Start by talking to me about programs and code.
The standpoint of analytic philosophy. Talk to me about Newcomblike problems and why this new decision theory is better than a dozen other contenders.
I'm just somebody off the Internet. Can you explain to me from scratch what's going on?
I'm an effective altruist or philanthropic grantmaker. I'm mainly wondering how this mysterious work relates to humanity's big picture and why it was worth whatever funding it received.
What level of math should we throw at you?
As little math as possible, please.
Normal algebra is fine. Please be careful about how you toss around large formulas full of Greek letters.
I'm confident in my ability to deal with formulas, and you can go through them quickly.
How much of the prior debate on decision theory are you familiar with?
None, I just have a general background in analytic philosophy.
I'm familiar with the Prisoner's Dilemma.
I'm familiar with Newcomb's Problem and I understand its relation to the Prisoner's Dilemma.
I'm familiar with a significant portion of the prior debate on causal decision theory versus evidential decision theory.
Are you already familiar with game theory and the Prisoner's Dilemma?
Nope.
I'm familiar with the Prisoner's Dilemma.
I'm familiar with Nash equilibria.
Are you already familiar with game theory and the Prisoner's Dilemma?
Nope.
I'm familiar with the Prisoner's Dilemma.
I understand Nash equilibria and Pareto optima, and how the Prisoner's Dilemma contrasts them.
The Free Energy Principle (FEP) states that self-organizing systems which maintain a separation from their environments via a Markov blanket---including the brain and other physical systems---minimize their variational free energy (VFE) and expected free energy (EFE) via perception and action respectively[1]. Unlike in other theories of agency, under FEP, action and perception are unified as inference problems under similar objectives. In some cases, variational free energy reduces to prediction error, which is the difference between the predictions made about the environment and the actual outcomes experienced. The mathematical content of FEP is based on Variational Bayesian methods.
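A toy numerical sketch of the central quantity (an assumed two-state example, not any particular process theory): variational free energy upper-bounds surprisal, and is minimized exactly when the belief q equals the Bayesian posterior.

```python
import numpy as np

# Toy discrete generative model: hidden state s in {0,1}, observation o in {0,1}.
# All distributions are invented for illustration.
p_s = np.array([0.5, 0.5])            # prior p(s)
p_o_given_s = np.array([[0.9, 0.1],   # likelihood p(o|s): rows index s
                        [0.2, 0.8]])

def vfe(q, o):
    """Variational free energy F = E_q[log q(s) - log p(o, s)].
    F upper-bounds surprisal -log p(o); the gap is KL(q || p(s|o))."""
    joint = p_s * p_o_given_s[:, o]   # p(o, s) as a function of s
    return float(np.sum(q * (np.log(q) - np.log(joint))))

o = 1
posterior = p_s * p_o_given_s[:, o]   # exact Bayesian posterior p(s|o)
evidence = posterior.sum()
posterior /= evidence

print(vfe(np.array([0.5, 0.5]), o))   # naive belief: F above surprisal
print(vfe(posterior, o))              # F at the posterior
print(-np.log(evidence))              # surprisal -log p(o): equals line above
```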
Although FEP has an extremely broad scope, it makes a number of very specific assumptions[2] that may restrict its applicability to real-world systems. Ongoing theoretical work attempts to reformulate the theory to hold under more realistic assumptions. Some progress has been made: newer formulations of FEP, unlike their predecessors, do not assume a constant Markov blanket (but rather, some Markov blanket trajectory)[3] and do not assume the existence of a non-equilibrium steady state[4].
There are two FEP process theories most relevant to neuroscience.[5] Predictive processing is a process theory of how VFE is minimized in brains during perception. Active Inference (AIF) is a process theory of the "action" part of FEP, which can also be seen as an agent architecture.
It has been argued[6] that AIF as an agent architecture manages the model complexity (i.e. the bias-variance tradeoff) and the exploration-exploitation tradeoff in a principled way; favours explicit, disentangled, and hence more interpretable belief representations; and is amenable to working within hierarchical systems of collective intelligence (which are seen as Active Inference agents themselves[7]). Building ecosystems of hierarchical collective intelligence can be seen as a proposed solution for and an alternative conceptualization of the general problem of alignment.
While some proponents of AIF believe that it is a more principled rival to Reinforcement Learning (RL), it has been shown that AIF is formally equivalent to the control-as-inference formulation of RL.[8] Additionally, AIF recovers Bayes-optimal reinforcement learning, optimal control theory, and Bayesian Decision Theory (aka EDT) under different simplifying assumptions[9][10].
AIF is an energy-based model of intelligence. This likens FEP/Active Inference to Bengio's GFlowNets[11] and LeCun's Joint Embedding Predictive Architecture (JEPA)[12], which are also energy-based.
EFE is closely related to, and can be derived from, VFE. Action does not always minimize EFE; in some cases, it minimizes generalized free energy (a closely related quantity). See this figure for a brief overview.
E.g. (1) sensory, active, internal and external states have independent random fluctuations; (2) there exists an injective map between the mode of internal states and mode of external states.
A Third Option dissolves a False Dilemma by showing that there are in fact more than two options.
The first step in obtaining a Third Alternative is deciding to look for one, and the last step is the decision to accept it. This sounds obvious, and yet most people fail on these two steps, rather than within the search process.
(From Wikipedia) predictive processing (a.k.a. predictive coding, the Bayesian Brain hypothesis) is a theory of brain function in which the brain is constantly generating and updating a mental (generative) model of the environment. The model is used to generate predictions of sensory input that are compared to actual sensory input. This comparison results in prediction errors that are then used to update and revise the mental model.
Active Inference (AIF) can be seen as a generalization of predictive processing. While predictive processing only explains the agent's perception in terms of inference, AIF models both perception and action as inference under closely related unifying objectives: whereas perception minimizes variational free energy (which sometimes reduces to precision-weighted prediction error), action minimizes expected free energy.
The Free Energy Principle is a generalization of AIF that attempts to describe not only biological organisms, but all systems that maintain a separation from their environments (via Markov blankets).
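A minimal sketch of a predictive-processing update loop, under an assumed made-up linear generative model and invented numbers: the estimate of the hidden cause is revised by gradient descent on precision-weighted prediction error.

```python
# Toy predictive-coding loop (illustrative only): the "brain" holds an
# estimate mu of a hidden cause, predicts the observation g(mu), and updates
# mu to reduce precision-weighted prediction error.
g = lambda mu: 2.0 * mu          # generative prediction: observation from cause
dg = lambda mu: 2.0              # its derivative
pi_obs, pi_prior = 1.0, 0.5      # precisions (inverse variances)
prior_mu = 0.0

obs = 3.0                        # actual sensory input
mu = prior_mu
for _ in range(100):
    eps_obs = obs - g(mu)        # sensory prediction error
    eps_prior = mu - prior_mu    # deviation from the prior
    # Gradient step on F = 0.5*(pi_obs*eps_obs**2 + pi_prior*eps_prior**2)
    mu += 0.05 * (pi_obs * eps_obs * dg(mu) - pi_prior * eps_prior)

print(mu)  # settles near the Bayes-optimal compromise between prior and data
```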
External Links:
Book Review: Surfing Uncertainty - Introduction to predictive processing by Scott Alexander
Predictive Processing And Perceptual Control by Scott Alexander
Predictive coding under the free-energy principle by Karl Friston and Stefan Kiebel
Related Pages: Perceptual Control Theory, Neuroscience, Free Energy Principle
 
1. Does the model pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
 
Since FEP is an unfalsifiable mathematical principle, it does not make sense to ask whether FEP is true (it is true mathematically, given its assumptions). Rather, it makes sense to ask whether its assumptions hold for a given system, and, if so, how that system implements the minimization of VFE and EFE. Unlike the FEP itself, a proposal of how some particular system minimizes VFE and EFE---a process theory---is falsifiable.
Unfortunately, there are few topics you would less want to come up at a non-rationalist dinner party than Animal Ethics, as it can be highly contentious.
Before engaging with this section of LessWrong, whether you believe non-human animals deserve rights or not, please take a deep breath... and remember your training as a rationalist.
| Name | Description | Explanation |
|---|---|---|
| Agreed | LessWrong allows you to agree- | |
| Disagree | | |
| Important | | |
| Good point! | This comment makes a good point | |
| That's a crux | My other beliefs would be different if I had different beliefs about this | |
| Not a crux | My other beliefs would not change if I had different beliefs about this | |
| Strong Argument | This is a strong, well-reasoned argument. | |
| Weak Argument | This argument is weak or poorly reasoned. | |
| Changed My Mind | I updated my beliefs based on this. | In math, Δ ('delta') means 'change' |
| Changed My Mind (on this point) | I've changed my mind on this particular point (not necessarily on any larger claims). | |
| Scout Mindset | Good job focusing on figuring out what's true, rather than fighting for a side | Scout mindset is "the motivation to see things as they are, not as you wish they were". From the book The Scout Mindset by Julia Galef |
| Soldier Mindset | This seems to be trying to fight for a side rather than figure out what's true | Opposite of Scout Mindset |
| I notice I'm confused | I don't have a clear explanation of what's going on here | |
| I don't understand | | |
| Smells like LLM | This reads to me like it could've been written by a language model. | |
| Clearly Written | I had an easy time understanding this | |
| Difficult to Parse | I had trouble reading this. | |
| Missed the point | I think this misses what I (or the other person) actually believes and was trying to say or explain | |
| Misunderstands position? | This seems to misunderstand the thing that it argues against | |
| Seems Offtopic? | I don't see how this is relevant to what's being discussed. | |
| Too Sneering? | This is too much sneering (signaling disrespect and derision) relative to its substantive critique. | |
| Too Combative? | This seems more combative than it needs to be to communicate its point. | |
| Concrete | This makes things more concrete by bringing in specifics or examples. | |
| Examples? | I'd be interested in seeing concrete examples of this | |
| Already addressed | This has been covered earlier in the post or comment thread. | |
| Bowing Out | I'm bowing out of this thread at this point. Goodbye for now! | |
| Paperclip | See the Squiggle Maximizer (formerly "Paperclip maximizer") tag. | |
| Hits the Mark | This hits the mark | |
| Why? / Citation? | Why do you believe that? Or, what's your source for that? | |
| Nice Scholarship! | Good job looking into existing literature and citing sources | |
| Let's make a bet! | I'm willing to put money on this claim! | Why is betting important? |
| Good Facilitation | This seemed to help people understand each other | |
| Question Answered | This resolved my question! Thanks. | |
| Plus One | Akin to saying "me too" | |
| I Saw This | ...and thought it'd be useful to let people know. | |
| Less than 1% likely | I put 1% or less likelihood on this claim | |
| ~10% likely | I put about 10% likelihood on this claim | |
| ~25% likely | I put about 25% likelihood on this claim | |
| ~40% likely | I put about 40% likelihood on this claim | |
| ~50% likely | I put about 50% likelihood on this claim | |
| ~60% likely | I put about 60% likelihood on this claim | |
| ~75% likely | I put about 75% likelihood on this claim | |
| ~90% likely | I put about 90% likelihood on this claim | |
The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler...