Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning. ... (read more)
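As a rough illustration only (not code from the original post), a fine-tuning setup of this general shape could look like the following sketch using Hugging Face transformers; the file name `archetypal_stories.txt` and all hyperparameters are placeholders, not the author's actual choices.

```python
# Hypothetical sketch: fine-tune GPT-2-XL on a plain-text file of
# "archetypal" stories. File name and hyperparameters are illustrative.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# Stand-in corpus of archetypal stories, one document per line.
dataset = load_dataset("text", data_files={"train": "archetypal_stories.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="atl-gpt2-xl",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```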
Religion is a complex group of human activities, involving commitment to a higher power, belief in belief, and a range of shared group practices such as worship meetings, rites of passage, etc. ... (read more)
The arguments about which entities to include or exclude seem to contradict each other, or don't really justify their positions. Examples:
The only argument that seems to me to have force is "avoid a slap-fight over who gets to rule the world". The argument for excluding particular (plausibly-)moral patients is that if you try to include them, you might be conquered by someone else who doesn't include them, and get a worse ultimate outcome.
Summaries of discussions, takeaways, etc. from LessWrong meetups that have already taken place.
Inkhaven is a 30-day residency where one has to publish posts every day, as part of an effort to grow stronger as a writer. While this has produced some excellent posts, it also produces a fair bit of noise, and many more hastily written or experimental posts than usual.
Inkhaven-like posts emerge when other people imitate the format on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints, or 'HalfHaven', where remote LessWrongers aim to post 30 posts over the course of two months).
Interp on DeepSeek's mHC architecture
Inkhaven is a 30-day residency where one has to publish posts every day. While this likely helps one in the longer term, the shorter-term effect is that posts are more likely to be written with less effort to double-check the arguments, and so to have epistemic problems.
Inkhaven-like posts emerge when other people imitate the format on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints).
When formalized, causal relationships are usually represented as a directed acyclic graph from parent events to child events, together with rules saying how to compute the probable state of a child given the states of its parents.
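As a minimal sketch (my own illustration, not from the text), here is a two-node graph Rain -> WetGrass with a table for computing the probability of the child given its parent:

```python
# Toy causal DAG: Rain -> WetGrass.
# P_WET_GIVEN is the "how to compute the child given its parent" table.
P_RAIN = 0.3                           # P(Rain = True)
P_WET_GIVEN = {True: 0.9, False: 0.1}  # P(WetGrass = True | Rain)

def p_wet_grass():
    """Marginal P(WetGrass = True), summing over the parent's states."""
    return sum(
        (P_RAIN if rain else 1 - P_RAIN) * P_WET_GIVEN[rain]
        for rain in (True, False)
    )

print(p_wet_grass())  # 0.3 * 0.9 + 0.7 * 0.1 = 0.34
```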
The main problems with CEV include, first, the great difficulty of implementing such a program ("If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.") and, second, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there is a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live" and that his essay essentially conflated the two definitions.
ML4Good is a France-based field-building organisation that runs AI Safety bootcamps.
Scalable oversight is an approach to the problem of providing reliable supervision of outputs from AIs, even as they become smarter than humans.[1] Often groups of weaker AIs supervise a stronger AI, or AIs are set in a debate with each other.
People used to refer to scalable oversight as a set of AI alignment techniques, but these techniques usually work at the level of the incentives given to the AIs, and have less to do with architecture.
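As a purely illustrative sketch of the debate variant (the callables `strong_model` and `weak_judge` are hypothetical stand-ins, not a real API), the protocol roughly has two strong models argue opposite sides while a weaker judge only evaluates the arguments:

```python
# Hypothetical sketch of debate-style scalable oversight.
from typing import Callable

def debate(question: str,
           strong_model: Callable[[str], str],
           weak_judge: Callable[[str], str],
           rounds: int = 2) -> str:
    transcript = f"Question: {question}\n"
    for r in range(rounds):
        # The same strong model is prompted to argue each side in turn.
        transcript += f"A (round {r + 1}): " + strong_model(
            transcript + "\nArgue that the answer is YES.") + "\n"
        transcript += f"B (round {r + 1}): " + strong_model(
            transcript + "\nArgue that the answer is NO.") + "\n"
    # The weaker judge only has to evaluate the arguments,
    # not solve the original problem from scratch.
    return weak_judge(transcript + "\nWhich debater argued better, A or B?")
```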
A reasoning step is "logically valid" when that kind of step never produces a false conclusion from true premises. For example, in algebra, "Add 2 to both sides of the equation" is valid because it only produces true equations from true equations, while "Divide both sides by x" is invalid because x might be 0. So even if "2x = (y+1)x" is true, letting x = 0 and y = 2 makes the original equation true while "2 = y + 1" is false. But "2x + 2 = (y+1)x + 2" will be true in every semantic model where the original equation is true.
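The counterexample can be checked directly; this snippet (my own illustration) plugs in x = 0, y = 2:

```python
# With x = 0, y = 2: the premise "2x = (y+1)x" is true, "divide by x"
# yields a false conclusion, while "add 2 to both sides" preserves truth.
x, y = 0, 2
print(2 * x == (y + 1) * x)          # True: the premise holds
print(2 == y + 1)                    # False: dividing both sides by x failed
print(2 * x + 2 == (y + 1) * x + 2)  # True: adding 2 to both sides is valid
```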
More generally in life, there's a question of "did you execute each local step of reasoning correctly", which can be considered apart from "did you arrive at the correct conclusion". Validity is a local property of a reasoning step or sequence; we can (and should) evaluate each step's validity separately from whether we agree with the premises or end up agreeing with the conclusion. For near-logical domains, this asks "Does the next proposition follow (with very high probability, given other things usually believed about the world or explicitly introduced as premises) from the previous proposition?" For probabilistic reasoning, informal validity asks, "Given everything else believed or introduced as a premise, is this next step adjusting probabilities by the right amount?" or "Does this kind of reasoning step in general produce well-calibrated conclusions from well-calibrated premises?"
E.g., consider why the ad hominem fallacy should be seen as "invalid" or a "locally invalid reasoning step" from this viewpoint. Suppose you start out with well-calibrated probabilities (things you say "60%" for, happen around 60% of the time). You assign 60% probability that the sky is blue. Then somebody says, "Yeah, well, people who believe in blueskyism are ugly" and you nod and adjust your credence in blueskyism down to 40%. Your odds just went from 3:2 to 2:3, so by Bayes's Rule you should've heard evidence with a likelihood ratio of 4:9 to produce that probability shift. Unless you already believe that false propositions are 225% as likely as true propositions to be believed by ugly people, you should expect that believing an ad hominem argument is something that can produce ill-calibrated conclusions in expectation from well-calibrated premises.
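For the arithmetic, here is a worked check (my own addition, just restating the numbers in the example):

```python
# Going from 60% to 40% credence is a move from 3:2 odds to 2:3 odds,
# which by the odds form of Bayes's Rule requires a likelihood ratio of 4:9.
prior_odds = 0.60 / 0.40                 # 3:2
posterior_odds = 0.40 / 0.60             # 2:3
likelihood_ratio = posterior_odds / prior_odds
print(likelihood_ratio)                  # 0.444... = 4/9; inverted, 9/4 = 225%
```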
Main articles:
By Ruthenis (summarized; includes level 0):
Hey everyone! My name's Rishi. Hoping to explore more of the Rationalist community and float some of my ideas. Any initial reading recs? I'm mostly interested in the relation of rationalism to metaphysics.