
The Engines of Cognition

Newly Published Essays by the LessWrong Community

In this new essay collection, LessWrong writers seek to understand key elements of the art of rationality. The collection features essays from Eliezer Yudkowsky, Scott Alexander, Zvi Mowshowitz, and over 30 more LessWrong writers. Starting with the simple epistemic question of when and how to trust different sources of information, the essays in the books move through understanding the lens of incentives, an exploration of when and why complex systems become modular, and finally into a discussion of failure, both personal and civilizational.


Recent Discussion

What if you just modeled your partner well enough to do what they would want without needing to ask? And what if you tried this from the very first date?

Cross-posted from Putanumonit, this is the conclusion of a four-part sequence on selfless dating.


My last three posts talked about all the things wrong with dating today, from the narratives to the frameworks to the myriad ways you’re self-sabotaging your own romantic life. This post is about the opposite — how dating can go very very right.

To me the best style of relationship is one where both partners treat the other’s real preferences with equal weight to their own, and make decisions on that basis and not based on pre-agreed rules. I call this ideal “selfless relationships”; others have...

The most important thing I discovered with regard to my current partnership is that the relationship is the thing that exists between the partners. The relationship is the choices that are made by both people.

In every previous situation, the relationship was dysfunctional because there was an unwillingness to distinguish between "what I want it to be," "what the other person wants it to be," and "what actually exists between us". 

(Think 500 Days of Summer, when Tom says "you can't say we're not a couple, we do all of the things couples do" and Summer say... (read more)

(This is an unofficial explanation of Inner Alignment based on the MIRI paper Risks from Learned Optimization in Advanced Machine Learning Systems (which is almost identical to the LW sequence) and the Future of Life podcast with Evan Hubinger (MIRI/LW). It's meant for anyone who found the sequence too long/challenging/technical to read.)

Note that bold and italics means "this is a new term I'm introducing," whereas underline and italics are used for emphasis.

What is Inner Alignment?

Let's start with an abridged guide to how Deep Learning works:

  1. Choose a problem
  2. Decide on a space of possible solutions
  3. Find a good solution from that space

If the problem is "find a tool that can look at any image and decide whether or not it contains a cat," then each conceivable set of rules for...
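The three-step recipe above can be made concrete with a toy sketch (my own illustration, not from the original explanation): the "space of possible solutions" is a set of candidate weight pairs for a linear rule, and "finding a good solution" is just searching that space for the candidate that scores best on the data. Random search stands in for gradient descent, and the two-feature inputs are made-up stand-ins for image features.

```python
import random

# Toy illustration of "deep learning as search over a solution space".
# Hypothetical setup: each candidate solution is a pair of weights (w1, w2)
# for a linear rule labeling a 2-feature input as cat (1) or not-cat (0).
data = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]

def accuracy(w):
    correct = 0
    for (x1, x2), label in data:
        prediction = 1 if w[0] * x1 + w[1] * x2 > 0.5 else 0
        correct += (prediction == label)
    return correct / len(data)

# Step 3: search the space for a good solution (random search stands in
# for gradient descent here).
random.seed(0)
best = max(((random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)),
           key=accuracy)
print(accuracy(best))  # the search finds a rule that fits all four examples
```

The point is only structural: deep learning is step 3 applied at scale, a search process that selects whichever candidate in the space performs well on the chosen problem.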

I'm still a bit confused by the difference between inner alignment and out-of-distribution generalization. What's the fundamental difference between the cat-classifying problem and the maze problem? Is it that the model itself is an optimizer in the latter case? But why is that special?

What if the neural network used to solve the maze problem just learns a mapping (but doesn't do any search)? Is that still an inner-alignment problem?
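One way to make the comment's distinction concrete is a toy sketch (my own, hypothetical, not from the original discussion) contrasting a maze model that performs explicit search at inference time with one that is just a fixed mapping from states to actions:

```python
from collections import deque

# Hypothetical contrast: two "models" that solve the same small maze, one
# by explicit search (it contains an optimizer), one by a fixed learned
# mapping (it doesn't).
maze = {  # adjacency list; start 'A', goal 'D'
    'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C'],
}

def search_model(start, goal):
    """Runs BFS at inference time: the model itself optimizes over paths."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in maze[path[-1]]:
            if nxt not in path:
                queue.append(path + [nxt])

def policy_model(state):
    """A plain mapping from state to action, as if baked in by training."""
    lookup = {'A': 'B', 'B': 'C', 'C': 'D'}
    return lookup[state]

print(search_model('A', 'D'))  # ['A', 'B', 'C', 'D']
state, trace = 'A', ['A']
while state != 'D':
    state = policy_model(state)
    trace.append(state)
print(trace)  # ['A', 'B', 'C', 'D']
```

Both produce the same behavior in-distribution; the worry specific to inner alignment is about the first kind, where the learned model pursues an internal objective of its own.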

Introduction: The Dead Sea Salt Experiment

In this 2014 paper by Mike Robinson and Kent Berridge at the University of Michigan (see also this more theoretical follow-up discussion by Berridge and Peter Dayan), rats were raised in an environment where they were well-nourished, and in particular, where they were never salt-deprived—not once in their life. The rats were sometimes put into a test cage with a lever which, when it appeared, was immediately followed by a device spraying ridiculously salty water directly into their mouth. The rats were disgusted and repulsed by the extreme salt taste, and quickly learned to hate the lever—which from their perspective would seem to be somehow causing the saltwater spray. One of the rats went so far as to stay tight against the opposite...

I still think this post is correct in spirit, and was part of my journey towards good understanding of neuroscience, and promising ideas in AGI alignment / safety.

But there are a bunch of little things that I got wrong or explained poorly. Shall I list them?

First, my "neocortex vs subcortex" division eventually developed into "learning subsystem vs steering subsystem", with the latter being mostly just the hypothalamus and brainstem, and the former being everything else, particularly the whole telencephalon and cerebellum. The main difference is that the "... (read more)

2Matt Goldenberg6hI don't think it's that weak?
1Samuel Shadrach1hHere's a countering intuition (which also seems weak to me, but shows why stronger intuitions are needed): Humans have disagreements on ethics, and have done so for millennia, so they're not 100% aligned.

But if your definition of alignment is "an AI that does things in a way such that all humans agree on its ethical choices," I think you're doomed from the start, so this counterintuition proves too much. I don't think there is an action an AI could take or a recommendation it could make that would satisfy that criterion (in fact, many people would say that the AI by its nature shouldn't be taking actions or making recommendations).

1Samuel Shadrach11hThanks. I'll check out the book. "Partial control" seems exactly what I'm referring to. Although the book does seem to be on a slightly different topic, and I haven't heard of the author. Do you by any chance have a link to a summary or review? I'm not criticising anything on LW btw (not here at least). It's just that - even if you assume the naturalist compatibilist stance that LW assumes - you still need a phrase to refer to things that feel like they're in or not in our control; you still need to talk about the first-person experience.

When I imagine an animal welfare EA group, I imagine views breaking down something like:

  • 50%: If factory farmed animals are moral patients, it's more likely that they have net-negative lives (i.e., it would be better for them not to exist than to live such terrible lives).
  • 50%: If factory farmed animals are moral patients, it's more likely that they have net-positive lives (i.e., their lives may be terrible, but they aren't so lacking in value that preventing the life altogether is a net improvement).

This seems like a super hard question, and not one that changes the importance of working to promote animal welfare, so naively it should have a 50/50 split within animal welfare circles. Possibly more effort should go into the net-positive view because it's more neglected (animal...

Your Boycott-itarianism could work just through market signals. As long as your diet makes you purchase less high-cruelty food and more low-cruelty food, you'll increase the average welfare of farm animals, right? Choosing a simple threshold and telling everyone about it is additionally useful for coordination and maybe sending farmers non-market signals, if you believe those work.

If you really want the diet to be robustly good with respect to the question of whether farm animals' lives are net-positive, you'd want to tune the threshold so as not to change... (read more)

2Rob Bensinger40mNote that there might be other crucial factors in assessing whether 'more factory farming' or 'less factory farming' is good on net — e.g., the effect on wild animals, including indirect effects like 'factory farming changes the global climate, which changes various ecosystems around the world, which increases/decreases the population of various species (or changes what their lives are like)'. It then matters a lot how likely various wild animal species are to be moral patients, whether their lives tend to be 'worse than death' vs. 'better than death', etc.

And regarding: I do think that most of EA's distinctive moral views are best understood as 'moves in the direction of utilitarianism' relative to the typical layperson's moral intuitions. This is interesting because utilitarianism seems false as a general theory of human value (e.g., I don't reflectively endorse being perfectly morally impartial between my family and a stranger). But utilitarianism seems to get one important core thing right, which is 'when the stakes are sufficiently high and there aren't complicating factors, you should definitely be impartial, consequentialist, scope-sensitive, etc. in your high-impact decisions'; the weird features of EA morality seem to mostly be about emulating impartial benevolent maximization in this specific way, without endorsing utilitarianism as a whole.

Like, an interest in human challenge trials is a very recognizably 'EA-moral-orientation' thing to do, even though it's not a thing EAs have traditionally cared about — and that's because it's thinking seriously, quantitatively, and consistently about costs and benefits, it's consequentialist, it's impartially trying to improve welfare, etc.

There's a general, very simple and unified thread running through all of these moral divergences AFAICT, and it's something like 'when choices are simultaneously low-effort enough and high-impact enough, and don't involve severe obvious violations of ordinary interpersonal e
2Rob Bensinger44mI'd guess the most controversial part of this post will be the claim 'it's not incredibly obvious that factory-farmed animals (if conscious) have lives that are worse than nonexistence'? But I don't see why. It's hard to be confident of any view on this, when we understand so little about consciousness, animal cognition, or morality. Combining three different mysteries doesn't tend to create an environment for extreme confidence — rather, you end up even more uncertain in the combination than in each individual component.

And there are obvious (speciesist) reasons people would tend to put too much confidence in 'factory-farmed animals have net-negative lives'. E.g., when we imagine the Holocaust, we imagine relatively rich and diverse experiences, rather than reducing concentration camp victims to a very simple thing like 'pain in the void'.

I would guess that humans' nightmarish experience in concentration camps was usually better than nonexistence; and even if you suspect this is false, it seems easy to imagine how it could be true, because there's a lot more to human experience than 'pain, and beyond that pain, darkness'. It feels like a very open question in the human case.

But just because chickens lack some of the specific faculties humans have, doesn't mean that (if conscious) chicken minds are 'simple', or simple in the particular ways people tend to assume. In particular, it's far from obvious (and depends on contingent theories about consciousness and cognition) that you need human-style language or abstraction in order to have 'rich' experience that just has a lot of morally important stuff going on. A blank map doesn't correspond to a blank territory; it corresponds to a thing we know very little about.

(For similar reasons, I think EAs in general worry far too little about whether chickens and other animals are utility monsters — this seems like a very live hypothesis to me, whether factory-farmed chickens have net-positive lives or net-negative o

Hello, I am curious about your most basic preferences. I would like you to state as many of them as possible.

You can use relation signs (<, >, ...), e.g.: true knowledge > ignorance > false knowledge, which would mean that you prefer true knowledge over ignorance, and ignorance over false knowledge.

Specific, replicable actions that lead to predictable, desired results > specific, replicable actions that lead to unpredictable, desired results 

(with the understanding that you may need to grind unpredictability for a while until you get what you need to consistently achieve predictability)

(example being "building freelance career" vs. "maintaining freelance career") 

(actions that lead to undesired results aren't even on the table for consideration, of course)


I've previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.

I will be awarding a $1000 prize for the best post that engages with the idea that counterfactuals may be circular in this sense. The winning entry may be one of the following (these categories aren't intended to be exclusive):

a) A post that attempts to draw out the consequences of this principle for decision theory

b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective

c) A review of relevant literature in philosophy or decision theory

d) A post that states already existing ideas in a clearer manner (I don't think this topic has been explored much on LW,...

2Chris_Leong1hPhenomenal experience with external reality.
1Samuel Shadrach39mAlso yeah sorry if I'm taking this convo on a different tangent, I don't see anything more to directly add on the topic of counterfactuals. Feel free to end convo if you feel like.
1Samuel Shadrach40mWhat if your phenomenal experience doesn't match external reality, which one decides truth? [Say you're experiencing phantom limb pain, is it a true statement that "you are in pain"]

Phenomenal experience is technically a subset of reality.

LessWrong isn't exactly founded on the map-territory model of truth, but it's definitely pretty core to the LessWrong worldview. The map-territory model implies a correspondence theory of truth. But I'd like to convince you that the map-territory model creates confusion and that the correspondence theory of truth, while appealing, makes unnecessary claims that infect your thinking with extraneous metaphysical assumptions. Instead we can see what's appealing about the map-territory metaphor but drop most of it in favor of a more nuanced and less confused model of how we know about the world.

The map-territory metaphor goes something like this: a map is a representation of some part of the world—a territory. The mind that models the world via thoughts can be said to create a map of the...

I would surmise that we don't disagree about anything except what the term "view from nowhere" means. And I don't really know what "view from nowhere" means anyway, I was just guessing.

The larger context was: I think there's a universe, and that I live in it, and that claims about the universe can be true or false independently of what I or any other creature know and believe. And then (IIUC) G Gordon was saying that this perspective is wrong or incomplete or something, and in fact I'm missing out on insights related to AI alignment by having this perspect... (read more)

1tomcatfish1hI think... (correct me if I'm wrong; trying to check myself here as well as responding) If you thought that "The snow is white" was true, but it turns out that the snow is, in fact, red, then your statement was false. In the anticipation-prediction model, "The snow is white" appears to look more like "I will find 'The snow is white' true to my perceptions," and it is therefore still true.
1tomcatfish1hWhy would the universe need to exist within the universe in order for it to exist? In the GOL example, why would the whole N×N bits have to be visible to some particular bit in order for them to exist?
1tailcalled1hThe bits exist, but the view of the bits doesn't exist. The map is not the territory.

What are the most useful practical applications of Bayesian thinking that don't require the person to understand the math?

I think the unit square can be used to visualize this. 

https://www.researchgate.net/figure/The-tree-diagram-and-the-unit-square-with-natural-frequencies_fig1_307569151
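For readers who want the arithmetic behind diagrams like that, here is a minimal natural-frequencies calculation (the numbers are invented for illustration; they are not taken from the linked figure):

```python
# A natural-frequencies calculation in the spirit of the unit-square
# diagram: translate probabilities into counts of people, then read the
# answer off the counts.
population = 1000
base_rate = 0.01        # 1% have the condition
sensitivity = 0.9       # P(test positive | condition)
false_positive = 0.05   # P(test positive | no condition)

sick = population * base_rate                            # 10 people
true_positives = sick * sensitivity                      # 9 people
false_positives = (population - sick) * false_positive   # 49.5 people

# "Of everyone who tests positive, what fraction is actually sick?"
posterior = true_positives / (true_positives + false_positives)
print(round(posterior, 3))  # 0.154
```

Counting people this way is exactly what the tree diagram and unit square depict, which is why they work for people who would balk at Bayes' theorem written as an equation.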

1Answer by kithpendragon1hIn this [https://www.youtube.com/watch?v=7GgLSnQ48os] video, Matt Parker and Hannah Fry perform a thought experiment from Bayes's original notes that uses no math at all. After that, they overlay a probability distribution on the experiment and show the certainty increasing, all without worrying about the math. Finally, they show the equation briefly near the end of the video. (Turns out Bayes didn't actually work out the math himself, anyway; Laplace did that work.) They don't really go into the math at all, but rather discuss the idea of updating beliefs based on new information. The whole video is about 13 minutes long (+ end of video stuff), but most of that is the experiment itself (which they do in real time).
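The updating process the video describes can also be sketched in code. This is a rough simulation under my own assumptions (the table is the interval [0, 1], the hidden first ball lands uniformly at random, and each later throw only reports whether it landed left or right of it), not a transcription of the video:

```python
import random

# Bayes-style thought experiment: locate a hidden ball from left/right
# reports alone, with no equation in sight.
random.seed(1)
hidden = random.random()  # where the unseen first ball landed

# Represent belief as many candidate positions; keep only those consistent
# with each observation (a crude particle-filter-style update).
candidates = [i / 10000 for i in range(10001)]
for _ in range(50):
    throw = random.random()
    went_left = throw < hidden
    candidates = [c for c in candidates if (throw < c) == went_left]

estimate = sum(candidates) / len(candidates)
print(round(estimate, 2))  # close to the hidden position
```

Each observation discards the candidate positions inconsistent with it, so the surviving set narrows around the hidden ball; that shrinking spread is the "certainty increasing" shown in the video's overlaid distribution.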