The Supervised Program for Alignment Research (SPAR) is a part-time, remote research program hosted by Kairos AI that pairs aspiring researchers with established researchers working to mitigate the risks posed by advanced AI, with a particular focus on AI safety and security, interpretability, biosecurity, AI policy, and societal impacts.
Metacognition is thinking about thinking: the capacity to monitor, evaluate, and govern one's own reasoning while it is happening, rather than only after the fact.
Training on narrow examples of misaligned behavior sometimes generalizes to broadly misaligned behavior, seemingly altering the assistant's goals or persona rather than merely instilling the specific trained behavior.
The agent then uses an unbounded proof search, which no current AI algorithm could tackle in reasonable time (although a human engineer could do it with a good deal of painstaking work).
"Current," here, is indexed to a decade ago, and can no longer be claimed confidently.
The Machine Alignment, Transparency, and Security (MATS) Program is an independent research and educational program that provides emerging researchers with mentorship, talks, and workshops, and connects them with the SF Bay Area and London AI safety research communities.
There are fundamental confusions about intelligent agents, that is, about minds that try to make the things they want happen. Some believe that resolving these confusions is necessary for AI alignment; others prefer more prosaic approaches, or different strategies entirely.
Here are some fundamental confusions that agent foundations tries to resolve:
Prosaic alignment is an approach to AI alignment that focuses on aligning AI systems built from current machine-learning techniques, without assuming that fundamentally new insights about intelligence will arrive first.
Payor's lemma is a theorem in mathematical logic that is similar to Löb's theorem.
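For reference, here are the two statements side by side in provability-logic notation, where $\Box P$ reads "$P$ is provable" (this is the standard formulation; consult the original sources for the exact statement used there):

$$\text{Löb's theorem:}\quad \text{if } \vdash \Box P \to P, \text{ then } \vdash P.$$

$$\text{Payor's lemma:}\quad \text{if } \vdash \Box(\Box P \to P) \to P, \text{ then } \vdash P.$$

The proof of Payor's lemma does not route through Löb's theorem, which is part of what leaves room for the probabilistic generalization discussed below.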
For the purposes of agent foundations, Payor's lemma has been proposed as an alternative to Löb's theorem, both because it is simpler and because it may admit a probabilistic generalization in a way that Löb's theorem does not. If this works out, it would give agents a way to carry out probabilistic versions of logical-decision-theory results, such as cooperating in the Prisoner's Dilemma when given each other's source code, now under uncertainty.
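As a rough sketch of the intended application, following the modal-fixpoint cooperation construction (simplified here; the symbols $a_i$ and $E$ are this sketch's notation, not necessarily the original's): let $E$ mean "everyone cooperates", and let each agent $i$ cooperate exactly when it can prove that provability of $E$ implies $E$:

$$a_i \leftrightarrow \Box(\Box E \to E), \qquad E \leftrightarrow \bigwedge_i a_i.$$

Since every $a_i$ unwinds to the same formula, $E \leftrightarrow \Box(\Box E \to E)$, so in particular $\vdash \Box(\Box E \to E) \to E$, and Payor's lemma yields $\vdash E$: cooperation is provable. The hoped-for probabilistic version would replace $\Box$ with a high-credence belief operator.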