Archetypal Transfer Learning (ATL) is a fine-tuning approach proposed by @whitehatStoic that uses "archetypal data" to embed "Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in GPT-2-XL after fine-tuning.
Religion is a complex group of human activities — involving commitment to a higher power, belief in belief, and a range of shared group practices such as worship meetings, rites of passage, etc.
"Goodfire is a research company using interpretability to understand, learn from, and design AI systems. Our mission is to build the next generation of safe and powerful AI—not by scaling alone, but by understanding the intelligence we're building."
The arguments about which entities to include or exclude seem to contradict each other, or don't really justify their positions. Examples:
The only argument that seems to me to have force is "avoid a slap-fight over who gets to rule the world". The argument for excluding particular (plausibly-)moral patients is that if you try to include them, you might be conquered by someone else who doesn't include them, and get a worse ultimate outcome.
Summaries of discussions, takeaways, etc. from LessWrong meetups that have already taken place.
Inkhaven is a 30-day residency where one has to publish posts every day, as part of an effort to grow stronger as a writer. While this has produced some excellent posts, it also produces a fair bit of noise, and many more hastily-written or experimental posts than usual.
Inkhaven-like posts emerge when other people try to imitate this manner on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints, or 'HalfHaven' where remote LessWrongers aim to post 30 posts over the course of two months).
Inkhaven is a 30-day residency where one has to publish posts every day. While this likely helps one in the longer term, the shorter-term effect is that posts are more likely to be written with less effort put into double-checking the arguments and, as a result, to have epistemic problems.
Inkhaven-like posts emerge when other people try to imitate this manner on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints).
See Omega (alien philosopher-troll) for a standard list of properties.
ML4Good is a France-based field-building organisation that runs AI Safety bootcamps.
Interp on DeepSeek's mHC architecture
For the purposes of Agent Foundations, Payor's lemma has been proposed as an alternative to Löb's theorem, both because it is simpler and because it may admit a probabilistic generalization in a way that breaks for Löb's theorem. If it works out, this would provide a way for agents to do the probabilistic version of logical decision theory type stuff, like cooperating in the Prisoner's Dilemma when given each other's source code, this time with uncertainty.
Löb's theorem states that, given any statement P, if Peano Arithmetic (PA for short) proves that it can be 'trusted' if it proves P (that is, Prove(P) implies P), then it actually just proves P. This means that PA cannot tell you that it can be trusted about P, unless it also just tells you P. It also holds for theories that contain PA.
As a consequence, whenever we try to prove a statement P, we can go ahead and just assume that P is provable, and then see if we can show that that implies that P is true. This might sound really stupid and contradictory at first glance - the important thing is to be really clear about what is proving what. In the condition, PA is saying that if it proves P, then P is true. In 'our' view (that is, in a metatheory), we see that if PA says that, then PA will also say that P is true.
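The "assume P is provable, then show P is true" move above is exactly the shape of the standard proof of Löb's theorem. A sketch of that proof, using the Hilbert–Bernays derivability conditions (necessitation, distribution, internal necessitation) and writing Prv for PA's provability predicate:

```latex
% Assume PA \vdash Prv(P) \to P. By the diagonal lemma, fix a sentence
% \Psi with PA \vdash \Psi \leftrightarrow (Prv(\Psi) \to P). Then:
\begin{align*}
1.\;& PA \vdash \Psi \to (Prv(\Psi) \to P) && \text{fixed point}\\
2.\;& PA \vdash Prv(\Psi \to (Prv(\Psi) \to P)) && \text{necessitation on 1}\\
3.\;& PA \vdash Prv(\Psi) \to Prv(Prv(\Psi) \to P) && \text{distribution on 2}\\
4.\;& PA \vdash Prv(\Psi) \to (Prv(Prv(\Psi)) \to Prv(P)) && \text{distribution on 3}\\
5.\;& PA \vdash Prv(\Psi) \to Prv(Prv(\Psi)) && \text{internal necessitation}\\
6.\;& PA \vdash Prv(\Psi) \to Prv(P) && \text{from 4, 5}\\
7.\;& PA \vdash Prv(\Psi) \to P && \text{from 6 and the assumption}\\
8.\;& PA \vdash \Psi && \text{from 7 and the fixed point}\\
9.\;& PA \vdash Prv(\Psi) && \text{necessitation on 8}\\
10.\;& PA \vdash P && \text{from 7 and 9}
\end{align*}
```

Note how step 7 is the metatheoretic move described above: once PA proves "if P is provable then P is true", the fixed-point machinery turns that into an outright proof of P.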
It became much less important after the invention/discovery of the Garrabrant Inductor. There is also work on using the similar Payor's Lemma, which possibly allows for a probabilistic version in a way that breaks for Löb's theorem.
In formal notation, let Prv stand for the standard provability predicate of PA. Then, Prv(T) is true if and only if there is a proof from the axioms and rules of inference of PA of T. Then what we would like PA to say is that Prv(S)⟹S for every sentence S.
But alas, PA suffers from a problem of self-trust.
Löb's theorem states that if PA⊢Prv(S)⟹S then PA⊢S. This immediately implies that if PA is consistent, then sentences of the form Prv(S)⟹S are not provable when S is false, even though according to our intuitive understanding of the standard model every sentence of this form must be true.
Thus, PA is incomplete, and fails to prove a particular set of sentences that would massively increase our confidence in it.
Notice that Gödel's second incompleteness theorem follows immediately from Löb's theorem: if PA is consistent, then by Löb's theorem PA⊬(Prv(0=1)⟹0=1), which by the propositional calculus implies PA⊬¬Prv(0=1).
Causal relationships are usually formalized as a directed acyclic graph from parent events to child events, saying how to compute the probability distribution over a child given the state of its parents.
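A minimal sketch of this formalization (illustrative only — the node names, probabilities, and helper function are invented for the example, not taken from any post): each node in the DAG stores its parent list and a conditional probability table mapping parent states to the probability that the node occurs.

```python
# A causal DAG as parent lists plus conditional probability tables (CPTs).
# Each node's CPT maps a tuple of parent truth-values to P(node = True).
dag = {
    "rain":      {"parents": [], "cpt": {(): 0.2}},
    "sprinkler": {"parents": ["rain"],
                  "cpt": {(True,): 0.01, (False,): 0.4}},
    "wet_grass": {"parents": ["rain", "sprinkler"],
                  "cpt": {(True, True): 0.99, (True, False): 0.8,
                          (False, True): 0.9, (False, False): 0.0}},
}

def prob_true(node, state):
    """P(node = True) given the states of its parents."""
    parent_values = tuple(state[p] for p in dag[node]["parents"])
    return dag[node]["cpt"][parent_values]

# Probability the grass is wet when it rained but the sprinkler was off:
print(prob_true("wet_grass", {"rain": True, "sprinkler": False}))  # 0.8
```

Acyclicity is what makes this well-defined: because no node is its own ancestor, one can compute a full joint distribution by visiting nodes in topological order, multiplying each node's CPT entry given its parents.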
By Ruthenis (summarized; includes level 0):
Hey everyone! My name's Rishi. Hoping to explore more of the Rationalist community and float some of my ideas. Any initial reading recs? I'm mostly interested in the relation of rationalism to metaphysics.