Singular learning theory, developed by Sumio Watanabe, applies algebraic geometry to statistical learning theory. Reference textbooks are "the grey book", Algebraic Geometry and Statistical Learning Theory, and "the green book", Mathematical Theory of Bayesian Statistics.
Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning.
Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts...
A Black Marble is a technology that by default destroys the civilization that invents it. It's one type of Existential Risk. AGI may be such an invention, but isn't the only one...
AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based...
AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.
Slowing Down AI refers to efforts and proposals aimed at reducing the pace of artificial intelligence advancement to allow more time for safety research and governance frameworks. These initiatives can include voluntary industry commitments, regulatory measures, or coordinated pauses in development of advanced AI systems.
Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment)...
Updateful decision theories change, over time, the probability distribution they use to evaluate actions. Updateless Decision Theory (UDT) does not; instead, it always maximizes a priori expected utility. This works well in well-defined decision problems that have high probability in the prior, but in some sense it does not learn. Open-Minded Updatelessness seeks to combine the advantages of updatelessness and updatefulness by allowing some changes in probabilities, without fully updating on evidence.
Christopher Alexander (1936-2022) was an architect who studied the way nature and traditionally built buildings (such as peasant huts, or cathedrals) are a particular kind of beautiful, and have (he argued) the ability to bring a person back into a sense of perspective (e.g., a person may be quite stressed out about some detail, and then go for a long walk in nature, and find themselves "coming back to themselves.") Alexander attempted to work out a theory of design (for buildings, but also for design work broadly) that would create houses and other built objects with this same sort of beauty and sense of perspective embedded in them. His work inspired the "design patterns" movement in computer science, and, indirectly, wikis.
Once upon a time, LessWrong was a place where you'd be told to Read The Sequences before you'd finished your second comment apologizing for wasting time with non-constructive praise in the first.
Although the culture of rationality has changed greatly since the olden days, and our teachings are dispersed in many offspring movements, some remember good old-fashioned rationality with the same nostalgia with which MIRI computer scientists long for the days when AI was a science.
Suffering-focused ethics (SFE) is a family of moral views that give priority to reducing suffering, especially intense suffering. Rather than treating happiness and suffering as fully symmetric, suffering-focused views hold that preventing severe suffering often matters more urgently than creating additional happiness.
See also: Evolutionary Psychology, Goodhart's Law, Wireheading
A candy bar is a superstimulus: it contains more concentrated sugar, salt, and fat than anything that existed in the ancestral environment.
A candy bar matches taste buds that evolved in a hunter-gatherer environment, but it matches those taste buds much more strongly than anything that actually existed in the hunter-gatherer environment. The signal that once reliably correlated with healthy food has been hijacked, blotted out with a point in tastespace that wasn't in the training dataset: an impossibly distant outlier on the old ancestral graphs.
So your own, far more ordered and orderly experience in this very moment weighs heavily against the hypothesis "I am a Boltzmann brain". Under most systems of anthropic reasoning, this weighs heavily against the possibility "a supervast majority of all moments of consciousness are vastly less orderly than my own".
A Newcomblike dilemma that pries apart the recommendations of EDT and CDT/FDT. Also known as XOR Blackmail (though I (EY) would object to this name, because it isn't what's normally understood as "blackmail").
> Exactly one of the following statements is true: You will pay me $1000, XOR your building already has a terrible termite infestation (I didn't put it there).
Suppose that an excellent Predictor, greedy for cash, known to be absolutely honest, sends to the owner of an apartment complex the following letter:
> Exactly one of the following statements is true: You will pay me $1000, or, your building already has a terrible termite infestation (I didn't put it there).
Since a terrible termite infestation would cost $1,000,000 to control, an evidential decision theorist will reason, "It is better to pay $1000; this is better news about whether I have a termite infestation." They pay the $1000.
But this makes the Predictor's statement be true! So the Predictor can go around sending letters like this to all the EDT agents in town who can afford the $1000.
Conversely a CDT or FDT agent will reason that, if they get a letter like this, they must already have a termite infestation, which will be unaffected by whether they pay (CDT) / by whether people like them predictably pay (FDT). So the Predictor won't send them letters if they have no termites, because they won't pay, and because the contents would be false. CDT/FDT agents only see these letters if the Predictor, acting perhaps to increase its credibility with EDT agents, sends them to some CDT/FDT agents who *do* have termites -- and then the result of this policy from an FDT perspective is to get a valuable free warning from the Predictor about their termite infestation.
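The arithmetic in the XOR Blackmail discussion can be checked with a small simulation. This is a minimal sketch under assumptions that are not in the original: the Predictor writes whenever (and only when) the letter would be true, `p_termites` is a hypothetical prior chosen for illustration, and all function names are made up. `edt_cost_given_letter` reproduces the evidential reasoning after a letter has arrived; `expected_cost` evaluates a disposition fixed before any letter arrives, which is the policy-level view the FDT paragraph appeals to.

```python
TERMITE_COST = 1_000_000  # cost to control the infestation (from the text)
PAYMENT = 1_000           # what the Predictor demands (from the text)


def letter_sent(pays_on_letter: bool, has_termites: bool) -> bool:
    """Assumed Predictor behavior: write exactly when the letter is true,
    i.e. when 'you will pay' XOR 'you already have termites' holds."""
    return pays_on_letter != has_termites


def edt_cost_given_letter(pay: bool) -> int:
    """EDT's view after receiving the letter: since exactly one statement is
    true, paying is evidence of no termites and refusing is evidence of termites."""
    return PAYMENT if pay else TERMITE_COST


def cost(pays_on_letter: bool, has_termites: bool) -> int:
    """Owner's total cost in one possible world, for a disposition fixed in advance."""
    total = TERMITE_COST if has_termites else 0
    if letter_sent(pays_on_letter, has_termites) and pays_on_letter:
        total += PAYMENT
    return total


def expected_cost(pays_on_letter: bool, p_termites: float) -> float:
    """Average cost over the hypothetical prior on termites, policy held fixed."""
    return (p_termites * cost(pays_on_letter, True)
            + (1 - p_termites) * cost(pays_on_letter, False))


print("EDT pays after a letter:",
      edt_cost_given_letter(True) < edt_cost_given_letter(False))
for p in (0.01, 0.5):
    print(f"p_termites={p}: pay-on-letter={expected_cost(True, p):,.0f}, "
          f"refuse={expected_cost(False, p):,.0f}")
```

In this model the pay-on-letter disposition only ever pays in worlds without termites, so it costs an extra (1 - p_termites) × $1000 compared with refusing, while a refuser who does get a letter has simply received the free termite warning. Paying never changes whether termites are present, which matches the CDT reasoning as well.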
From Wikipedia:
An Egregore (also spelled egregor; from French égrégore, from Ancient Greek ἐγρήγορος, egrēgoros 'wakeful') is a concept in Western esotericism of a non-physical entity or thoughtform that arises from the collective thoughts and emotions of a distinct group of individuals.
Egregores don't need to have well-defined "members". Generally, they are programs that run on groups of people and have some level of self-persistence. Moloch, for instance, is an egregore consisting of all failures of coordination, and so it runs on almost all humans but does not fully envelop any of them.
The Smoking Lesion is a needlessly confusing Newcomblike problem used to probe the stances of alternative decision theories. If you want an equivalent problem that is not needlessly confusing, see the Toxoplasmosis dilemma.
Smoking Lesion is stated as follows:
(Again, note that contrary to how causality works in the real world, this example wantonly inverts the real world to say: "Actually smoking does not cause cancer; instead, enjoyment of smoking is correlated with a genetic cause for cancer"; except that instead of just calling it a gene, they're going to call it a "lesion" that you might otherwise associate with a brain lesion or something. This is why the statement is needlessly confusing, and why it may be better to use a less confusing presentation like Toxoplasmosis instead.)
Once you've got those needlessly confusing specifications straight in your mind, perhaps after first looking at Toxoplasmosis to understand what the idea is actually about:
Causal decision theory says "yes" to smoking in Smoking Lesion, since smoking in this world has no causal effect on whether or not you get cancer. You either get cancer or not; in both cases, smoking is preferred.
(Naive) evidential decision theory says "no", because smoking is strongly correlated with cancer.
Functional Decision Theory says "yes": your decision procedure in this problem doesn't influence whether or not you get cancer; and with or without cancer, smoking is preferred.
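The split between these verdicts can be reproduced with a toy joint distribution. All numbers below are hypothetical (the text gives none), chosen so that the lesion drives both the enjoyment of smoking and cancer while smoking itself has no causal effect. `naive_edt_value` treats one's own choice as evidence about the lesion via Bayes' rule; `cdt_value` holds the lesion probability fixed. Per the text, FDT reaches the same verdict as CDT here because the decision procedure does not influence cancer, so this sketch does not model FDT separately.

```python
# Hypothetical toy numbers, not part of the original problem statement.
P_LESION = 0.2
P_SMOKE_GIVEN = {True: 0.9, False: 0.1}    # lesion -> enjoys smoking
P_CANCER_GIVEN = {True: 0.8, False: 0.05}  # lesion -> cancer; smoking plays no causal role
U_SMOKE = 10        # enjoyment of smoking
U_CANCER = -1000    # disutility of cancer


def p_lesion_given_smoke(smokes: bool) -> float:
    """Bayes' rule: how much observing the action shifts belief in the lesion."""
    def likelihood(lesion: bool) -> float:
        p = P_SMOKE_GIVEN[lesion]
        return p if smokes else 1 - p
    num = likelihood(True) * P_LESION
    return num / (num + likelihood(False) * (1 - P_LESION))


def p_cancer(p_lesion: float) -> float:
    """Cancer probability depends only on the lesion."""
    return p_lesion * P_CANCER_GIVEN[True] + (1 - p_lesion) * P_CANCER_GIVEN[False]


def naive_edt_value(smokes: bool) -> float:
    """Conditions on the action as if it were evidence about the lesion."""
    return (U_SMOKE if smokes else 0) + U_CANCER * p_cancer(p_lesion_given_smoke(smokes))


def cdt_value(smokes: bool) -> float:
    """Smoking cannot cause the lesion, so the prior on it is left unchanged."""
    return (U_SMOKE if smokes else 0) + U_CANCER * p_cancer(P_LESION)


for name, value in (("naive EDT", naive_edt_value), ("CDT (FDT agrees here)", cdt_value)):
    print(name, "smokes:", value(True) > value(False))
```

Under the fixed-lesion calculation the two options differ by exactly `U_SMOKE`, which is the "with or without cancer, smoking is preferred" point; the naive evidential calculation instead lets the strong smoking/cancer correlation dominate.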
But this makes the Predictor's statement be true! So the Predictor can go around sending letters like this to all the EDT agents in town who can afford the $1000, after checking to make sure they don't actually have a termite infestation.
Conversely, a CDT or FDT agent will reason that, if they get a letter like this, they must already have a termite infestation, which will be unaffected by whether they pay (CDT) / by whether people like them predictably pay (FDT). So the Predictor won't send them letters if they have no termites, because they won't pay, and because the contents would be false. CDT/FDT agents only see these letters if the Predictor, acting perhaps to increase its credibility with EDT agents, sends them to some CDT/FDT agents who *do* have termites -- and then the result of this general policy and disposition, from an FDT perspective, is to get a valuable free warning from the Predictor about their termite infestation.
Coordinal: A Postmortem (Ronak Mehta, 2026-05-19)
In the (more confusing) forms of Solomon's Problem, and later the Smoking Lesion, this dilemma was historically significant and influential in the invention of causal decision theory and its widespread adoption over the alternative of evidential decision theory.
On a technical level, it's possible that updating on observing yourself to pet the kitten might introduce difficulties into some formal LDT variants. We can imagine toxoplasmosis as a disease that influences the utility function of the agent, raising the amount that it enjoys petting kittens. Observing yourself to pet a kitten is informative about having toxoplasmosis because of what this tells you about your own utility function. But the algorithm Q for functional decision theory quotes itself as ┌Q┐ within its definition, including its own utility function U. So ideal FDT agents should already know their own utility functions U and should not be able to gain more information about their source code by watching themselves pet kittens.
The relative quantities were chosen to be similar to those in Newcomb's Problem.
Of course, it could also be the case that non-ideal humans espousing LDT as a theoretical ideal are still influenced by being told about toxoplasmosis at the start of the experiment, and that thinking about this psychologically affects the degree to which a known-safe kitten seems enjoyable for petting.
The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler.