AI Evaluations, or "Evals", focus on assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based...

Löb's Theorem is a theorem proved by Martin Hugo Löb, which states: ...

ChatGPT is a language model created by OpenAI...

[T]he orangutan effect: If you sit down with an orangutan and carefully explain to it one of your cherished ideas, you may leave behind a puzzled primate, but will yourself exit thinking more clearly.

Emotivism is a meta-ethical model. Its central claim is that moral statements do not express propositions but emotional attitudes.

Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique where the model's training signal uses human evaluations of the model's outputs, rather than labeled data or a ground truth reward signal.

An alignment tax (sometimes called a safety tax) is the additional cost incurred when making an AI aligned, relative to unaligned AI...

