by Ruby
1 min read · 7th Jul 2019



2 comments

- Chris_Leong: Counterfactuals are an Answer, Not a Question
- Davidmanheim: Hackable Rewards as a Safety Valve?
- Donald Hobson: Logical Counterfactuals and Proposition graphs, Part 3
- Grue_Slinky: What are concrete examples of potential "lock-in" in AI research?; Dissatisfactions with Garrabrant Induction
- Stuart_Armstrong: Best utility normalisation method to date?; Could I survive an ontology change?; Issues with conservative agency; Toy model; Simple and composite partial preferences; Is my result wrong? Maths vs intuition vs evolution in learning human preferences; Toy model piece #4: partial preferences, re-re-visited; Toy model piece #5: combining partial preferences
- TurnTrout: What You See Isn't Always What You Want
- Wei_Dai: AI Safety "Success Stories"; Counterfactual Oracles = online supervised learning with random selection of training episodes
- abergal: Conversation with Paul Christiano
- abramdemski: Do Sufficiently Advanced Agents Use Logic?
- evhub: Concrete experiments in inner alignment; Are minimal circuits deceptive?; Relaxed adversarial training for inner alignment
- johnswentworth: Probability as Minimal Map; How to Throw Away Information; Theory of Ideal Agents, or of Existing Agents?
- michaelcohen: Utility uncertainty vs. expected information gain
- paulfchristiano: The strategy-stealing assumption
- rohinmshah: [AN #63] How architecture search, meta learning, and environment design could lead to general intelligence; [AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning
- vlad_m: Utility ≠ Reward
- wdmacaskill: A Critique of Functional Decision Theory

Test comment. Does this show up?