







In this new essay collection, LessWrong writers seek to understand key elements of the art of rationality. The collection features essays from Eliezer Yudkowsky, Scott Alexander, Zvi Mowshowitz, and over 30 more LessWrong writers. Starting with the simple epistemic question of when and how to trust different sources of information, the essays in the books move through understanding the lens of incentives, an exploration of when and why complex systems become modular, and finally into a discussion of failure, both personal and civilizational.
What if you just modeled your partner well enough to do what they would want without needing to ask? And what if you tried this from the very first date?
Cross-posted from Putanumonit, this is the conclusion of a four-part sequence on selfless dating.
My last three posts talked about all the things wrong with dating today, from the narratives to the frameworks to the myriad ways you’re self-sabotaging your own romantic life. This post is about the opposite — how dating can go very, very right.
To me the best style of relationship is one where both partners treat the other’s real preferences with equal weight to their own, and make decisions on that basis and not based on pre-agreed rules. I call this ideal “selfless relationships”; others have...
(This is an unofficial explanation of Inner Alignment based on the MIRI paper Risks from Learned Optimization in Advanced Machine Learning Systems (which is almost identical to the LW sequence) and the Future of Life podcast with Evan Hubinger (MIRI/LW). It's meant for anyone who found the sequence too long/challenging/technical to read.)
Note that bold and italics means "this is a new term I'm introducing," whereas underline and italics is used for emphasis.
Let's start with an abridged guide to how Deep Learning works:
If the problem is "find a tool that can look at any image and decide whether or not it contains a cat," then each conceivable set of rules for...
I'm still a bit confused by the difference between inner alignment and out-of-distribution generalization. What's the fundamental difference between the cat-classifying problem and the maze problem? That the model itself is an optimizer in the latter case? But why is that special?
What if the neural network used to solve the maze problem just learns a mapping (but doesn't do any search)? Is that still an inner-alignment problem?
In this 2014 paper by Mike Robinson and Kent Berridge at the University of Michigan (see also this more theoretical follow-up discussion by Berridge and Peter Dayan), rats were raised in an environment where they were well-nourished, and in particular, where they were never salt-deprived—not once in their life. The rats were sometimes put into a test cage with a lever which, when it appeared, was immediately followed by a device spraying ridiculously salty water directly into their mouth. The rats were disgusted and repulsed by the extreme salt taste, and quickly learned to hate the lever—which from their perspective would seem to be somehow causing the saltwater spray. One of the rats went so far as to stay tight against the opposite...
But if your definition of alignment is "an AI that does things in a way such that all humans agree on its ethical choices," I think you're doomed from the start, so this counterintuition proves too much. I don't think there is an action an AI could take or a recommendation it could make that would satisfy that criterion (in fact, many people would say that the AI by its nature shouldn't be taking actions or making recommendations).
When I imagine an animal welfare EA group, I imagine views breaking down something like:
This seems like a super hard question, and not one that changes the importance of working to promote animal welfare, so naively it should have a 50/50 split within animal welfare circles. Possibly more effort should go into the net-positive view because it's more neglected (animal...
Your Boycott-itarianism could work just through market signals. As long as your diet makes you purchase less high-cruelty food and more low-cruelty food, you'll increase the average welfare of farm animals, right? Choosing a simple threshold and telling everyone about it is additionally useful for coordination and maybe sending farmers non-market signals, if you believe those work.
If you really want the diet to be robustly good with respect to the question of whether farm animals' lives are net-positive, you'd want to tune the threshold so as not to change... (read more)
Hello, I am curious about your most basic preferences. I would like you to state as many of them as possible.
You can use relation signs (<, >, ...), e.g.: true knowledge > ignorance > false knowledge, which would mean that you prefer true knowledge over ignorance, and ignorance over false knowledge.
Specific, replicable actions that lead to predictable, desired results > specific, replicable actions that lead to unpredictable, desired results
(with the understanding that you may need to grind unpredictability for a while until you get what you need to consistently achieve predictability)
(example being "building freelance career" vs. "maintaining freelance career")
(actions that lead to undesired results aren't even on the table for consideration, of course)
I've previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.
I will be awarding a $1000 prize for the best post that engages with the idea that counterfactuals may be circular in this sense. The winning entry may be one of the following (these categories aren't intended to be exclusive):
a) A post that attempts to draw out the consequences of this principle for decision theory
b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective
c) A review of relevant literature in philosophy or decision theory
d) A post that states already existing ideas in a clearer manner (I don't think this topic has been explored much on LW,...
Phenomenal experience is technically a subset of reality.
LessWrong isn't exactly founded on the map-territory model of truth, but it's definitely pretty core to the LessWrong worldview. The map-territory model implies a correspondence theory of truth. But I'd like to convince you that the map-territory model creates confusion and that the correspondence theory of truth, while appealing, makes unnecessary claims that infect your thinking with extraneous metaphysical assumptions. Instead, we can keep what's appealing about the map-territory metaphor while dropping most of it in favor of a more nuanced and less confused model of how we know about the world.
The map-territory metaphor goes something like this: a map is a representation of some part of the world—a territory. The mind that models the world via thoughts can be said to create a map of the...
I would surmise that we don't disagree about anything except what the term "view from nowhere" means. And I don't really know what "view from nowhere" means anyway, I was just guessing.
The larger context was: I think there's a universe, and that I live in it, and that claims about the universe can be true or false independently of what I or any other creature know and believe. And then (IIUC) G Gordon was saying that this perspective is wrong or incomplete or something, and in fact I'm missing out on insights related to AI alignment by having this perspect... (read more)
What are the most useful practical applications of Bayesian thinking that don't require the person to understand the math?
I think the unit square can be used to visualize this.
https://www.researchgate.net/figure/The-tree-diagram-and-the-unit-square-with-natural-frequencies_fig1_307569151
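As a concrete illustration of the natural-frequencies approach behind that unit-square diagram (the numbers below — a 1% base rate, 90% sensitivity, 9% false-positive rate — are made up for the example), the Bayesian update becomes plain counting, no probability algebra required:

```python
# Bayes via natural frequencies: imagine 1000 people, then just count.
# The rates used here are invented for illustration.

population = 1000
sick = population * 0.01             # 10 people have the condition
healthy = population - sick          # 990 do not

true_positives = sick * 0.90         # 9 sick people test positive
false_positives = healthy * 0.09     # 89.1 healthy people test positive

# P(sick | positive test) = true positives / all positives
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(round(p_sick_given_positive, 3))  # → 0.092
```

The unit square is just this calculation drawn as areas: the base rate splits the square vertically, the test accuracies split each column horizontally, and the answer is the ratio of one region to the two "positive" regions combined — which is why it works for people who would never write out Bayes' theorem.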

The most important thing I discovered in regards to my current partnership is that the relationship is the thing that exists between the partners. The relationship is the choices that are made by both people.
In every previous situation, the relationship was dysfunctional because there was an unwillingness to acknowledge "what I want it to be," "what the other person wants it to be," and "what actually exists between us".
(Think 500 Days of Summer, when Tom says "you can't say we're not a couple, we do all of the things couples do" and Summer says... (read more)