The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler. ...
Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts...
AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.
Slowing Down AI refers to efforts and proposals aimed at reducing the pace of artificial intelligence advancement to allow more time for safety research and governance frameworks. These initiatives can include voluntary industry commitments, regulatory measures, or coordinated pauses in development of advanced AI systems.
Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment)...
Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique where the model's training signal uses human evaluations of the model's outputs, rather than labeled data or a ground truth reward signal.
"Goodfire is a research company using interpretability to understand, learn from, and design AI systems. Our mission is to build the next generation of safe and powerful AI—not by scaling alone, but by understanding the intelligence we're building."
Hey everyone! My name's Rishi. Hoping to explore more of the Rationalist community and float some of my ideas. Any initial reading recs? I'm mostly interested in the relation of rationalism to metaphysics.
The arguments about which entities to include or exclude seem to contradict each other, or don't really justify their positions. Examples:
The only argument that seems to me to have force is "avoid a slap-fight over who gets to rule the world". The argument for excluding particular (plausibly-)moral patients is that if you try to include them, you might be conquered by someone else who doesn't include them, and get a worse ultimate outcome.
Summaries of discussions, takeaways, etc. from LessWrong meetups that have already taken place.
ML4Good is a France-based field-building organisation that runs AI Safety bootcamps.
Inkhaven is a 30-day residency where one has to publish posts every day, as part of an effort to grow stronger as a writer. While this has produced some excellent posts, it also produces a fair bit of noise, and many more hastily-written or experimental posts than usual.
Inkhaven-like posts emerge when other people try to imitate this manner on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints, or 'HalfHaven', where remote LessWrongers aim to post 30 posts over the course of two months).
See Omega (alien philosopher-troll) for a standard list of properties.
Interp on Deepseeks mHC architecture
For the purposes of Agent Foundations, Payor's lemma has been proposed as an alternative to Löb's theorem, both because it is simpler and because it may have a probabilistic generalization in a way that breaks for Löb's theorem. If it works out, this would give agents a way to do the probabilistic version of logical decision theory type stuff, like cooperating in the Prisoner's dilemma when given each other's source code, this time with uncertainty.
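For reference, here is the lemma as it is commonly stated, with □ read as "is provable". The statement and the six-step derivation are not given in the text above, so treat this as a sketch of the standard argument rather than a quotation; it assumes only the usual necessitation and distribution rules for □.

```latex
% Payor's lemma: if  \vdash \Box(\Box x \to x) \to x,  then  \vdash x.
\begin{align*}
1.\ & \vdash x \to (\Box x \to x)            && \text{propositional tautology} \\
2.\ & \vdash \Box x \to \Box(\Box x \to x)   && \text{necessitation and distribution applied to 1} \\
3.\ & \vdash \Box(\Box x \to x) \to x        && \text{hypothesis} \\
4.\ & \vdash \Box x \to x                    && \text{chain 2 and 3} \\
5.\ & \vdash \Box(\Box x \to x)              && \text{necessitation applied to 4} \\
6.\ & \vdash x                               && \text{modus ponens on 3 and 5}
\end{align*}
```

Unlike the usual proof of Löb's theorem, this derivation never constructs a self-referential sentence, which is part of why it is described as simpler.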
When formalized, causal relationships are usually formalized as a directed acyclic graph from parent events to child events saying how to compute the probable child given the state of its parents.
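A minimal sketch of that idea in Python, assuming a hypothetical three-node graph (rain and sprinkler are both parents of wet_grass) with made-up probabilities. The names `PARENTS`, `CPT`, `joint_probability`, and `marginal` are illustrative only and do not come from any library.

```python
import itertools

# Hypothetical graph: rain -> wet_grass <- sprinkler
PARENTS = {"rain": (), "sprinkler": (), "wet_grass": ("rain", "sprinkler")}

# P(node is True | parent values), keyed by the tuple of parent values.
# All numbers are invented for illustration.
CPT = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet_grass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}

# Topological order: every parent appears before its children.
ORDER = ["rain", "sprinkler", "wet_grass"]


def joint_probability(assignment):
    """Multiply each node's probability given the state of its parents."""
    p = 1.0
    for node in ORDER:
        parent_values = tuple(assignment[q] for q in PARENTS[node])
        p_true = CPT[node][parent_values]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def marginal(node):
    """P(node is True), by summing the joint over all assignments."""
    total = 0.0
    for values in itertools.product([True, False], repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        if assignment[node]:
            total += joint_probability(assignment)
    return total
```

Here each child's table plays the role of "how to compute the probable child given the state of its parents", and the acyclic ordering is what lets the joint distribution factor into one term per node.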
Löb's theorem states that, given any statement P, if Peano Arithmetic (PA for short) proves that it can be 'trusted' if it proves P (that is, Prove(P) implies P), then it actually just proves P. This means that PA cannot tell you that it can be trusted about P, unless it also just tells you P. It also holds for theories that contain PA.
As a consequence, whenever we try to prove a statement P, we can go ahead and just assume that P is provable, and then see if we can show that that implies that P is true. This might sound really stupid and contradictory at first glance - the important thing is to be really clear about what is proving what. In the condition, PA is saying that if it proves P, then P is true. In 'our' view (that is, in a metatheory), we see that if PA says that, then PA will also say that P is true.
It became much less important later, after the invention/discovery of the Garrabrant Inductor. There is also work on using the similar Payor's Lemma, which possibly allows for a probabilistic version in a way that breaks for Löb's theorem.
In formal notation, let Prv stand for the standard provability predicate of PA. Then, Prv(T) is true if and only if there is a proof from the axioms and rules of inference of PA of T. Then what we would like PA to say is that Prv(S)⟹S for every sentence S.
But alas, PA suffers from a problem of self-trust.
Löb's theorem states that if PA⊢Prv(S)⟹S then PA⊢S. This immediately implies that if PA is consistent, the sentences Prv(S)⟹S are not provable in PA when S is false, even though according to our intuitive understanding of the standard model every sentence of this form must be true.
Thus, PA is incomplete, and fails to prove a particular set of sentences that would massively increase our confidence in it.
Notice that Gödel's second incompleteness theorem follows immediately from Löb's theorem: if PA is consistent, then by Löb's theorem PA⊬Prv(0=1)⟹0=1, which by the propositional calculus implies PA⊬¬Prv(0=1).
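The propositional step can be spelled out by contraposition. The block below is only a sketch of that reasoning, using Prv as defined above and taking S to be the sentence 0=1.

```latex
\begin{align*}
& \text{Suppose } \mathrm{PA} \vdash \neg\mathrm{Prv}(0=1). \\
& \text{Then } \mathrm{PA} \vdash \mathrm{Prv}(0=1) \Rightarrow 0=1
    && \text{(propositional calculus: } \neg A \text{ entails } A \Rightarrow B\text{)} \\
& \text{so } \mathrm{PA} \vdash 0=1
    && \text{(Löb's theorem with } S := (0=1)\text{)} \\
& \text{which contradicts the consistency of PA. Hence } \mathrm{PA} \nvdash \neg\mathrm{Prv}(0=1).
\end{align*}
```

Since ¬Prv(0=1) is the sentence expressing that PA is consistent, this says that a consistent PA cannot prove its own consistency.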
Our default editor ("LessWrong Docs") is a customized version of the CKEditor library. It offers a user-friendly "What You See Is What You Get" (WYSIWYG) interface and is generally the most intuitive way to format posts. It offers support for image uploading, code blocks, LaTeX, tables, footnotes, and many other options. You can also directly copy-paste from Google Docs and preserve most formatting (except links to internal headers; see below), including header font sizing, hyperlinks, images, and, most recently, footnotes, although you'll need to use a workaround!










Is this good for intuition? It doesn't really seem like the correct explanation for why we measure randomness.
E.g. a simple spin-1/2 particle (nowadays often the way people are introduced to QM) has a 2D Hilbert space, and still has identical randomness.