Wiki-Tags in Need of Work

Axioms (together with definitions) form the basis of mathematical theorems. Every mathematical theorem is only proven inside its axiom system... (read more)

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled: ... (read more)

The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler. ... (read more)

Singular learning theory is a theory that applies algebraic geometry to statistical learning theory, developed by Sumio Watanabe. Reference textbooks are "the grey book", Algebraic Geometry and Statistical Learning Theory, and "the green book", Mathematical Theory of Bayesian Statistics.
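(As a point of reference, not part of the tag text itself: the central result of the theory is an asymptotic expansion of the Bayes free energy, sketched below in standard notation; the real log canonical threshold λ plays the role that the parameter count plays for regular models.)

% Sketch of Watanabe's free-energy asymptotics (standard statement, paraphrased):
% F_n is the Bayes free energy (negative log marginal likelihood), L_n(w_0) the
% empirical loss at an optimal parameter, and \lambda the real log canonical threshold.
\[
  F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O_p(\log \log n).
\]
% For regular models \lambda = d/2 (half the number of parameters), recovering the
% BIC penalty; for singular models \lambda can be strictly smaller.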

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning. ... (read more)

Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts... (read more)

A Black Marble is a technology that by default destroys the civilization that invents it. It's one type of Existential Risk. AGI may be such an invention, but isn't the only one... (read more)

AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based... (read more)

Löb's Theorem is a theorem proved by Martin Hugo Löb which states: ... (read more)
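(For reference, since the statement is truncated above; this is the standard formulation, with T a theory extending Peano Arithmetic and Prov_T its provability predicate.)

\[
  \text{If } T \vdash \bigl(\mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P\bigr),
  \text{ then } T \vdash P .
\]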


Recent Tag & Wiki Activity

Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique and an alignment technique where the model's training signal uses human evaluations of the model's outputs, rather than labeled data or a ground truth reward signal.

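(As an illustrative, hedged sketch rather than part of the tag text: the preference-modeling step at the core of RLHF is commonly trained with a Bradley-Terry style pairwise loss on outputs ranked by humans. All names, shapes, and data below are made up for the example.)

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model: scores a (prompt, response) embedding with a single scalar.
# Purely illustrative; real RLHF pipelines score transformer outputs.
class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Fake batch: embeddings of the human-preferred ("chosen") and dispreferred
# ("rejected") responses to the same prompts.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry / pairwise preference loss: push r(chosen) above r(rejected).
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()

# The learned reward model would then supply the training signal for a policy
# optimization step (e.g. PPO), in place of labeled data or a ground-truth reward.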

tino10

Hi! Some of the German links no longer work and get redirected to https://personalvermittlung-osteuropa.com/

The content is still available via the Wayback Machine: https://web.archive.org/web/20140322130753/http://giordano-bruno-stiftung.ch/rationalitaet/ (for example)

I think most Germans have never heard of the Sequences, but they are needed! Are there any other complete translations available? I could try and help with translating (using LLMs and correcting mistakes).

How to celebrate [stub]

petrov.com has an organizer's guide for a small quiet ceremony, and Modes of Petrov Day contains additional ideas. Otherwise see the many posts tagged Petrov Day for things people have done in various times/places/groups.

LessWrong Website Celebrations

The LessWrong site has an annual tradition of commemorating the day, in which users have the ability to take down the LessWrong frontpage for 24 hours. This has happened every year since 2019. Here are the relevant posts.

See also:

Placing belief in belief as one of the three things religion involves in the description seems dismissive and false. It certainly can involve belief in belief, but that looks to me like a very specific subset of religious practitioners in an incredibly diverse field.

The "See also" also seems biased against religion. I think "religion" here is being conflated with "fundamentalism," and all of the arguments 'against' are actually against dogmatic, black and white thinking, not against 'religion' itself. 

You may think I'm religious because I'm posting these comments, but I don't identify with any particular religion. I think religion is more like the institutionalization of a set of transformative spiritual practices, mostly for the purpose of preservation and propagation.

Growth Mindset is the name for a hypothetical mental state wherein a person believes that they can grow as a person, and this belief causes them to try to grow, which might plausibly lead to more growth!

This is a very natural and obvious hypothesis that could apply to agents in general. It could be defended as a useful philosophical presupposition to start with in most domains, about most agents and most positive traits, since it encourages improvement behaviors that might be quite cheap to try and quite valuable if they work!

The primary academic popularizer of the idea within academic psychology is Carol Dweck, who has been embroiled in some controversies.

Sometimes the idea is taken farther than is empirically justified, when psychometric rigor is brought to bear on specific traits in specific situations where people might honestly (or dishonestly!) disagree about whether growth or change is possible.

In these debates, people with strong specific arguments about specific limits to specific growth in specific people are sometimes treated as "just not having growth mindset", and their specific arguments are then dismissed on seemingly specious grounds.

Like most concepts, the concept of growth mindset can be abused by idiots.


Theory of mind [Wikipedia] refers to the capacity to understand other people by ascribing mental states to them. A theory of mind includes the knowledge that others' beliefs, desires, intentions, emotions, and thoughts may be different from one's own.

The term language model agent better captures the inherent agency of such a system, while "language model cognitive architecture" better captures the extensive additions to the LLM and the resulting change in function and capabilities. Episodic memory may be one key cognitive system; the vector-based memory systems in LMCAs to date are quite limited, and humans with episodic memory problems are limited in the complexity and time horizon of problems they can solve.
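(As a hedged illustration, not drawn from the tag text: the "vector-based" episodic memory systems mentioned above typically store embeddings of past events and retrieve them by cosine similarity. The sketch below uses made-up names and a fake embedding function; a real LMCA would call an embedding model instead.)

import numpy as np

# Toy embedding function: deterministic random vector per string, unit-normalized.
# A real system would call an embedding model here.
def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class EpisodicMemory:
    """Minimal vector store: remembers events, recalls nearest by cosine similarity."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def remember(self, event: str) -> None:
        self.texts.append(event)
        self.vectors.append(embed(event))

    def recall(self, query: str, k: int = 3) -> list[str]:
        if not self.vectors:
            return []
        q = embed(query)
        sims = np.stack(self.vectors) @ q  # cosine similarity (vectors are unit norm)
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

memory = EpisodicMemory()
memory.remember("User asked me to book a flight to Berlin on Friday.")
memory.remember("Earlier plan step: gather visa requirements.")
print(memory.recall("what travel plans were discussed?"))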

bcare1810

Plugging my more light-hearted rationality podcast, Recreational Overthinking. Been a LessWrong reader for a long time now and I think a lot of the community would find it entertaining. https://open.spotify.com/show/3xZEkvyXuujpkZtHDrjk7r?si=ryufuZjZSe2ryw7aKFOUpQ

A p-zombie is as likely as anyone else to ask, "When I see red, do I see the same color that you see when you see red?", but they have no real experience of the color red; the zombie's speech must be explained in some other terms which do not require them to have real experiences.

The zombie thought experiment is meant to show that consciousness cannot be reduced to merely physical things: perhaps our universe has special "bridging laws" which bring a mind into existence when there are atoms in a suitably brain-like configuration.

Physicalists typically deny the possibility of zombies: if a p-zombie is atom-by-atom identical to a human being in our universe, then our speech can be explained by the same mechanisms as the zombie's. It would seem awfully peculiar for our words and actions to have an entirely materialistic explanation while, furthermore, our universe happens to contain exactly the right bridging law such that our utterances about consciousness are true and our consciousness syncs up with what our merely physical bodies do. It's too much of a stretch: Occam's razor dictates that we favor a monistic universe with one uniform set of laws.

It has been argued that a careful (re)definition of AIXI's off-policy behavior may patch the anvil problem in practice.

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled:

In this post, we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion.

 and

There are two main lines of defense you could employ to prevent schemers from causing catastrophes.

  • Alignment: Ensure that your models aren't scheming.[2]
  • Control: Ensure that even if your models are scheming, you'll be safe, because they are not capable of subverting your safety measures.[3]

Paperclip maximization would be a quantitative internal alignment error. Ironically, the error of drawing a boundary between paperclip maximization and squiggle maximization was itself an arbitrary decision. Feel free to message me to discuss this.