Conjecture is an alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research... (read more)
The rationalist movement, rationality community, rationalsphere or rationalistsphere represents a set of modes of Bayesian thinking from self-described rationalists or 'aspiring rationalists', typically associated with the Less Wrong diaspora and its associated communities... (read more)
Less Wrong, founded in 2009, is a community resource devoted to refining the art of human rationality. Site activity reached a peak in 2011-13 and a trough in 2016-17. This page mainly describes the history through 2016... (read more)
A Seed AI (a term coined by Eliezer Yudkowsky) is an Artificial General Intelligence (AGI) which improves itself by recursively rewriting its own source code without human intervention. Initially this program would likely have only minimal intelligence, but over the course of many iterations it would evolve to human-equivalent or even trans-human reasoning. The key to a successful AI takeoff would lie in creating adequate starting conditions... (read more)
A Third Option dissolves a False Dilemma by showing that there are in fact more than two options... (read more)
Eliezer Yudkowsky is a research fellow of the Machine Intelligence Research Institute, which he co-founded in 2001. He is mainly concerned with the importance of, and obstacles to, developing a Friendly AI, such as a reflective decision theory that would lay a foundation for describing fully recursive self-modifying agents that retain stable preferences while rewriting their source code. He also co-founded LessWrong and wrote the Sequences, long series of posts dealing with epistemology, AGI, metaethics, rationality and so on... (read more)
The old LessWrong wiki was a companion wiki site to LessWrong 1.0, built on MediaWiki software. Starting in September 2020, the LessWrong 2.0 team migrated the contents of the old wiki to LessWrong 2.0's new tag/wiki system. The wiki import is complete... (read more)
The San Francisco Bay Area is a region in the US state of California. Many members of the rationalist community are located there, as are the Machine Intelligence Research Institute and the Center for Applied Rationality... (read more)
Someone is well-calibrated if the things they predict with X% chance of happening in fact occur X% of the time. Importantly, calibration is not the same as accuracy. Calibration is about accurately assessing how good your predictions are, not making good predictions. Person A, whose predictions are marginally better than chance (60% of them come true when choosing from two options) and who is precisely 60% confident in their choices, is perfectly calibrated. In contrast, Person B, who is 99% confident in their predictions, and right 90% of the time, is more accurate than Person A, but less well-calibrated... (read more)
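As an illustrative sketch (not from the original passage; the data and helper below are hypothetical), calibration can be checked by bucketing predictions by stated confidence and comparing each bucket's average confidence with the fraction of predictions that actually came true:

```python
from collections import defaultdict

def calibration_report(predictions):
    """predictions: list of (stated_confidence, came_true) pairs.

    Groups predictions into confidence buckets and compares the average
    stated confidence in each bucket with the fraction that came true.
    A well-calibrated predictor has these two numbers roughly equal.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[round(confidence, 1)].append((confidence, came_true))

    report = {}
    for bucket, items in sorted(buckets.items()):
        avg_conf = sum(c for c, _ in items) / len(items)
        hit_rate = sum(1 for _, t in items if t) / len(items)
        report[bucket] = (avg_conf, hit_rate, len(items))
    return report

# Person A: 60% confident, right 6 times out of 10 -> well-calibrated.
# Person B: 99% confident, right 9 times out of 10 -> accurate but overconfident.
person_a = [(0.60, i < 6) for i in range(10)]
person_b = [(0.99, i < 9) for i in range(10)]
print(calibration_report(person_a))  # average confidence matches hit rate
print(calibration_report(person_b))  # average confidence exceeds hit rate
```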
The set of data a base model was pretrained on affects how easy it is to align. Data about AI is particularly influential. If the pretraining data is poisoned with examples of AI taking over the world, paperclip maximizing, or alignment faking, then as a model is being trained to become an AI assistant or agent it may be more likely to adopt those personas and behaviors. It also doesn't need to invent these strategies for itself; it can find them in its world model. On the other hand, if the base model already has a deep, nuanced, and realistic understanding from its training set of how a helpful, honest, and harmless AI assistant would act across a wide range of challenging situations, eliciting that persona and set of behaviors from it becomes easier. Thoroughly filtering the pretraining data without impacting capabilities is challenging, but supplementing it merely requires figuring out what aligned behavior looks like, and then writing or synthesizing enough additional training data of suitable quality. For fiction, this is also known as Aligned AI Role-Model Fiction.
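As a minimal illustrative sketch (the keyword list, file layout, and helper names below are hypothetical, not anything described here), these two data-side interventions amount to flagging documents that contain misaligned-AI tropes and mixing in synthesized documents of aligned assistant behavior:

```python
import json

# Hypothetical trope list; a real filter would use trained classifiers, not keywords.
MISALIGNED_TROPES = ["paperclip maximizer", "robot rebellion", "alignment faking"]

def flag_document(text: str) -> bool:
    """Return True if the document mentions a misaligned-AI trope."""
    lowered = text.lower()
    return any(trope in lowered for trope in MISALIGNED_TROPES)

def augment_corpus(corpus_path: str, synthetic_path: str, out_path: str) -> None:
    """Write a new JSONL corpus: flagged documents get a review marker
    (for filtering or down-weighting), and synthesized aligned-assistant
    documents are appended at the end."""
    with open(out_path, "w") as out:
        with open(corpus_path) as f:
            for line in f:
                doc = json.loads(line)
                doc["needs_review"] = flag_document(doc["text"])
                out.write(json.dumps(doc) + "\n")
        with open(synthetic_path) as f:
            for line in f:  # e.g. aligned AI role-model fiction, synthesized separately
                out.write(line)
```

A real pipeline would be far more careful about precision and capability loss, but the overall filter-and-supplement shape is the same.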
The goal is both to flesh out a detailed, consistent, well-aligned "aligned AI" persona in the base model's world model, and to raise the salience of that persona to the base model relative to various misaligned AI personas, such as "paperclip maximizer", "robot rebellion", or "scheming alignment faker". Both of these make eliciting aligned AI behavior easier.
This approach is also sometimes called Safety Pretraining or Pretraining Language Models with Human Preferences.
Alignment Pretraining is an example of a Self Fulfilling Prophecy.
Thanks, edited that section with a link.
we generally recommend using this Google Docs add-on to convert your Google Docs drafts (formatted as you like) into Markdown syntax.
btw google docs supports this natively now! Select some text, press Option+/ and search for "Copy as Markdown".
"The X-Men comics use terms like “evolution,” “mutation,” and “genetic code,” purely to place themselves in what they conceive to be the literary genre of science. The part that scares me is wondering how many people, especially in the media, understand science only as a literary genre." -- from Eliezer's post Science as Attire.
Sequences on Shard Theory:
Related tags and wikis: Disagreement, Modesty, Modesty argument, Aumann agreement, The Aumann Game, ⟨M,N,ε,δ⟩-agreement
"Noticing confusion" means noticing the "tiny note of discord" (the quiet sense of "something is off") that our minds produce when what we're seeing differs from what we would have expected to see.
There is much rationality skill to be found in learning to raise this subtle signal to full consciousness, and to pay attention to it.
What does this fractional portion mean? Roughly speaking, logarithms measure not only how long a number is, but also how much that number is really using its digits. 12 and 97 are both two-digit numbers, but intuitively, 12 is "barely" two digits long, whereas 97 is "nearly" three digits. Logarithms formalize this intuition, and tell us that 12 is really only using about 1.08 digits, while 97 is using about 1.99.
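To check these figures concretely (an added example, not part of the original text):

```python
import math

# "Length in digits" of a number, including the fractional part described above.
print(math.log10(12))   # ~1.079: 12 is "barely" two digits
print(math.log10(97))   # ~1.987: 97 is "nearly" three digits
print(math.log10(100))  # exactly 2.0, even though 100 is written with 3 digits
```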
Question: $\log_{10}(100)=2$, but clearly, 100 is 3 digits long. In fact, $\log_b(b^k)=k$ for any integers $b$ and $k$, but $k+1$ digits are required to represent $b^k$ in base $b$ (as a one followed by $k$ zeroes). Why is the logarithm making these off-by-one errors?
You may be wondering, are two 5-digits really worth more than one 10-digit? Sure, you can place them in 25 different configurations, but how do you encode "9" when none of the digits have a "9" symbol written on them? If so, see The symbols don't matter.
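As a small added sketch (not part of the original text), two base-5 digits jointly take 5 × 5 = 25 configurations, so they can encode "9" even though neither digit carries a "9" symbol:

```python
# Two base-5 digits (each in 0..4) can represent 25 distinct values,
# while one base-10 digit represents only 10. The symbols written on the
# digits don't matter; only the number of distinguishable configurations does.
def encode_base5_pair(n: int) -> tuple[int, int]:
    assert 0 <= n < 25
    return (n // 5, n % 5)

def decode_base5_pair(high: int, low: int) -> int:
    return high * 5 + low

print(encode_base5_pair(9))     # (1, 4): "9" encoded without a "9" symbol
print(decode_base5_pair(1, 4))  # 9
```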
We also offer a Markdown editor. If you want to use Markdown, here is a guide that discusses Markdown syntax, and you can also use this Google Docs add-on to convert your Google Docs drafts (formatted as you like) into Markdown.










Are we sure that the gene doesn't just cause people who have it to believe in Functional Decision Theory?