Recent Discussion

The standard formulation of Newcomb's problem has always bothered me, because it seemed like a weird hypothetical designed to make people give the wrong answer. When I first saw it, my immediate response was that I would two-box, because really, I just don't believe in this "perfect predictor" Omega. And while it may be true that Newcomblike problems are the norm, most real situations are not so clear cut. It can be quite hard to demonstrate why causal decision theory is inadequate, let alone build up an intuition about it. In fact, the closest I've seen to a real-worl... (Read more)

I don't know if these comments will be helpful or even pertinent to the underlying effort of posing and answering these types of problems. I do have a "why care" type of reaction to both the standard Newcomb's Paradox/Problem and the above formulation. I think that is because I fail to see how either really relates to anything I have to deal with in my life, so they seem to be "solutions in search of a problem". That could just be me though....

I do notice, for me at least, a subtle difference in the two settings. Newcomb see... (Read more)

2 · Dagon · 16h: In fact, current lie-detector technology isn't that good: it relies on a repetitive and careful mix of calibration and test questions, and even then isn't reliable enough for most real-world uses. The original ambiguity remains that the problem is underspecified: why should I believe that its accuracy for other people (probably mostly psych students) applies to my actions?


Aeon: Post-Empirical Science Is an Oxymoron, and It Is Dangerous:

There is no agreed criterion to distinguish science from pseudoscience, or just plain ordinary bullshit, opening the door to all manner of metaphysics masquerading as science. This is ‘post-empirical’ science, where truth no longer matters, and it is potentially very dangerous.

It’s not difficult to find recent examples. On 8 June 2019, the front cover of New Scientist magazine boldly declared that we’re ‘Inside the Mirrorverse’. Its editors bid us ‘Welcome to the parallel reality that’s hiding in plain sight’. […]

[Some physicis

... (Read more)
2 · Vanessa Kosoy · 15h: Well, I am a "semi-instrumentalist": I don't think it is meaningful to ask what reality "really is", except for the projection of reality onto the "normative ontology".
1 · TAG · 2h: But you still don't have an a priori guarantee that a computable model will succeed; that doesn't follow from the claim that the human mind operates within computable limits. You could be facing evidence that all computable models must fail, in which case you should adopt a negative belief about physicalism/naturalism, even if you don't adopt a positive belief in some supernatural model.

Well, you don't have a guarantee that a computable model will succeed, but you do have some kind of guarantee that you're doing your best, because computable models are all you have. If you're using incomplete/fuzzy models, you can have a "doesn't know anything" model in your prior, which is a sort of "negative belief about physicalism/naturalism", but it is still within the same "quasi-Bayesian" framework.

I remember seeing a talk by a synthetic biologist, almost a decade ago. The biologist used a genetic algorithm to evolve an electronic circuit, something like this:


He then printed out the evolved circuit, brought it to his colleague in the electrical engineering department, and asked the engineer to analyze the circuit and figure out what it did.

“I refuse to analyze this circuit,” the colleague replied, “because it was not designed to be understandable by humans.” He has a point - that circuit is a big, opaque mess.

This, the biologist argued, is the root pro... (Read more)
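The circuit-evolution setup the biologist described can be sketched in a few lines. This is a minimal, hypothetical genetic algorithm, not anything from the talk itself: the "circuit" is just a bitstring, and fitness counts how well its behaviour matches a target.

```python
import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for the desired circuit behaviour

def fitness(genome):
    """Count positions where the evolved 'circuit' matches the target."""
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if random.random() < rate else g for g in genome]

def evolve(pop_size=20, generations=200):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(TARGET):
            break
        survivors = pop[: pop_size // 2]          # keep the fitter half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
```

The evolved `best` solves the task, but nothing in the procedure rewards being understandable, which is exactly the engineer's complaint.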

This post really shocked me with the level of principle that apparently can be found in such systems.

If you're interested in this theme, I recommend reading up on convergent evolution, which I find really fascinating. Here's Dawkins in The Blind Watchmaker:

The primitive mammals that happened to be around in the three areas [of Australia, South America and the Old World] when the dinosaurs more or less simultaneously vacated the great life trades, were all rather small and insignificant, probably nocturnal, previously overshadowed and overpowered
... (Read more)
[Event] Helsinki Slate Star Codex Readers November Meetup
1 · Nov 26th · Mannerheimintie 5, Helsinki

See for more information.

If you don't want to bother with registration, that's fine, you are welcome to show up just because you're reading this post! Here is all the info you need:

Place: Restaurant Dubliner, Kaivopiha, Helsinki (note that the map pin is slightly wrong, click here for the exact location)
Time: Tuesday 2019-11-26, 18:00 onwards
How to recognize the group: We will be in the area called "Bushmills Corner". There will be a small wooden parrot at the table

2 · Vanessa Kosoy · 2h: Its utility function would have to say that all conscious AIs should run on Intel processors. There is self-reference there. But I have only rather low confidence that this idea is correct (whatever "correct" means here) or important.
2 · Vanessa Kosoy · 2h: The point is, if you put this "quasi-Bayesian" agent into an iterated Newcomb-like problem, it will learn to get the maximal reward (i.e. the reward associated with FDT). So, if you're judging it from the side, you will have to concede it behaves rationally, regardless of its internal representation of reality. Philosophically, my point of view is that it is an error to think counterfactuals have objective, observer-independent meaning. Instead, we can talk about some sort of consistency conditions between the different points of view. From the agent's point of view, it would reach Nirvana if it dodged the predictor. From Omega's point of view, if Omega predicted two-boxing and the agent one-boxed, the agent's reward would be zero (and the agent would learn its beliefs were wrong). From a third-person point of view, the counterfactual "Omega makes an error of prediction" is ill-defined; it's conditioning on an event of probability 0.
1 · Linda Linsefors · 2h: I agree that you can assign whatever belief you want (e.g. whatever is useful for the agent's decision-making process) for what happens in the counterfactual when Omega is wrong, in decision problems where Omega is assumed to be a perfect predictor. However, if you want to generalise to cases where Omega is an imperfect predictor (as you do mention), then I think you will (in general) have to put in the correct reward for Omega being wrong, because this is something that might actually be observed.

The method should work for imperfect predictors as well. In the simplest case, the agent can model the imperfect predictor as perfect predictor + random noise. So, it definitely knows the correct reward for Omega being wrong. It still believes in Nirvana if "idealized Omega" is wrong.

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.

2 · John_Maxwell · 13h: I'm not sure I'm familiar with the word "mixture" in the way you're using it.

I mean a weighted sum where weights add to unity.
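In that sense, a mixture of two models' predictions is just (a minimal sketch):

```python
def mixture(predictions, weights):
    """Weighted sum of predictions; the weights must add to unity."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, predictions))

combined = mixture([0.2, 0.8], weights=[0.75, 0.25])  # approximately 0.35
```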

Mosquito Net Fishing
9 · 2h · 1 min read

I recently saw a study claiming:

Distributed mosquito nets are intended to be used for malaria protection, yet increasing evidence suggests that fishing is a primary use for these nets, providing fresh concerns for already stressed coastal ecosystems.
  —The perverse fisheries consequences of mosquito net malaria prophylaxis in East Africa (Jones and Unsworth, 2019)

Mosquito nets are harmful fishing tools because (a) they're insecticide-treated and (b) with such small holes you catch a lot of immature fish before they've had a chance to reproduce. But how harmful this pract... (Read more)

Contrast these two expressions (hideously mashing C++ and pseudo-code):

  1. argmax_a R(a),
  2. argmax_a (*(&R))(a).

The first expression just selects the action a that maximises R(a) for some function R, intended to be seen as a reward function.

The second expression borrows from the syntax of C++; &R means the memory address of R, while *(&R) means the object at the memory address of R. How is that different from R itself? Well, it's meant to emphasise the ease of the agent wireheading in that scenario: all it has to do is overwrite whatever is written at memory location &R. Then

... (Read more)
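A rough Python analogue of the distinction (all names are illustrative, not from the post): a dict slot stands in for the memory location &R, so maximising the fixed function R is different from maximising whatever currently sits in the slot.

```python
def R(action):
    """The intended reward function: only 'work' pays off."""
    return 1.0 if action == "work" else 0.0

actions = ["work", "wirehead"]

# Expression 1: argmax over the fixed function R. Nothing to hack.
best_fixed = max(actions, key=R)

# Expression 2: argmax over *(&R), i.e. whatever is stored at R's "address".
memory = {"R": R}  # the mutable slot standing in for memory location &R

# The wireheading move: overwrite the slot with a function that loves wireheading.
memory["R"] = lambda a: 999.0 if a == "wirehead" else 0.0
best_hacked = max(actions, key=lambda a: memory["R"](a))
```

`best_fixed` is "work" while `best_hacked` is "wirehead": the two objectives look the same on paper, but diverge as soon as the agent can write to &R.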

Will black-boxing the reward function help, either physically or cryptographically? It should also obscure the boundary between the black box and the AI's internal computations, so that the AI will not know which data actually trigger the black-box reaction.

This is how the human reward function seems to work. It is well protected from internal hacking: if I imagine that I got 100 USD, it will not create as much pleasure as actually getting 100. When I send a mental image of 100 USD into my reward box, the box "knows"... (Read more)

3 · steve2152 · 16h: Another aspect / method is, let's call it, value hysteresis. If there are two functions which ambiguously both agree with the reward, then it's possible that the agent will come across one interpretation first, and then (motivated by that first goal) resist adopting the other interpretation. Like how drugs and family both give us dopamine, but if we start by caring about our family, we may shudder at the thought of abandoning family life for drugs, and vice versa! So maybe we need to have the target interpretation salient and simple to learn early in training (for the kind of introspective agent to which this consideration applies)? (Doesn't seem all that reliable as a safety measure, but maybe worth keeping in mind...)

In his AI Safety “Success Stories” post, Wei Dai writes:

[This] comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive success story. Is this conclusion actually justified?

I share Wei Dai's intuition that the Research Assistant path is neglected, and I want to better understand the safety problems involved in this path.

Specifically, I'm envisioning AI research assistants, built without any kind of reinforcement learning, that help AI alignment researchers identify, understand, and solve AI alignment problems. S

... (Read more)
2 · John_Maxwell · 13h: Do you have any thoughts on how specifically those failure modes might come about?

Those specific failure modes seem to me like potential convergent instrumental goals of arbitrarily capable systems that "want to affect the world" and are in an air-gapped computer.

I'm not sure whether you're asking about my thoughts on:

  1. how can '(un)supervised learning at arbitrarily large scale' produce such systems; or

  2. conditioned on such systems existing, why might they have convergent instrumental goals that look like those failure modes.

Pieces of time
34 · 2d · 2 min read

My friend used to have two ‘days’ each day, with a nap between—in the afternoon, he would get up and plan his day with optimism, whatever happened a few hours before washed away. Another friend recently suggested to me thinking of the whole of your life as one long day, with death on the agenda very late this evening. I used to worry, when I was very young, that if I didn’t sleep, I would get stuck in yesterday forever, while everyone else moved on to the new day. Right now, indeed some people have moved on to Monday, but I’m still winding down Sunday because I had a bad headache and couldn’t ... (Read more)

Oh yeah, I think I get something similar when my sleep schedule gets very out of whack, or for some reason when I moved into my new house in January, though it went back to normal with time. (Potentially relevant features there: bedroom didn't seem very separated from common areas, at first was sleeping on a pile of yoga mats instead of a bed, didn't get out much.)

3 · ESRogs · 16h: What about traveling in the Midwest gives you this feeling? Is it the travel? Is it the Midwest itself? Is it that you're in a non-urban part of the Midwest, but you're used to the hustle and bustle of a city?
[Event] SSC Madison: Neuroscience
1 · Nov 16th · Madison

Vegan food and discussion of SSC posts on Neuroscience

If you log into your credit card account you'll see a list of charges, each with a date, amount, and merchant. It would be helpful if this also included receipt data:

  • If you didn't recognize a charge, seeing what it was for could remind you.

  • If you needed a receipt for taxes or reimbursement one could be captured automatically.

  • Personal finance tools (or corporate equivalents for company cards) could track spending with higher granularity.

  • Because the credit card company knows what the items are, it can better detect fraud.

Receipt data isn't currently part of the protoc... (Read more)
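A hypothetical shape for that data, hung off the existing charge record (field names invented for illustration; no card-network standard is implied):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price_cents: int

@dataclass
class Charge:
    date: str
    merchant: str
    amount_cents: int
    items: List[LineItem] = field(default_factory=list)  # the proposed receipt data

    def itemized_total(self):
        return sum(i.quantity * i.unit_price_cents for i in self.items)

charge = Charge("2019-11-24", "Grocer", 1250,
                [LineItem("apples", 2, 300), LineItem("bread", 1, 650)])

# One check a personal-finance tool (or a fraud model) could run for free:
consistent = charge.itemized_total() == charge.amount_cents
```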

I work in the area. In the EMV specification the receipt content is already saved electronically and is somewhat standardized. What is missing is for consumers and point-of-sale owners to be able to access it easily. The receipt does not, by the way, identify the product sold, but it contains enough details to verify that the transaction occurred.

Of course, if your chipped card is stolen and pinless tap is supported for small purchases, no transactio... (Read more)

I am pondering a hypothetical scenario that I think is fascinating but quite unrealistic and involves knowledge across a wide variety of fields, of which IMO physics gets the better part.

I'm considering some sites that I know. Reddit has a sub called r/AskScienceDiscussion, but it is not very warm to this type of query. Quora has degraded so much that it doesn't even offer the option to expand the question beyond a mere 150 characters or so, which is utterly ridiculous. I'm not sure about Stack Exchange - should I post on their physics site? LessWrong boasts that people can ask... (Read more)

Oh, that very last sentence is something I didn't think about. I also discovered worldbuilding very recently, looks promising too. Thanks!

Full title: Is the Orthogonality Thesis Defensible if We Assume Both Valence Realism and Open Individualism? (a)

An excerpt:

The cleanest typology for metaphysics I can offer is: some theories focus on computations as the thing that’s ‘real’, the thing that ethically matters – we should pay attention to what the *bits* are doing. Others focus on physical states – we should pay attention to what the *atoms* are doing. I
... (Read more)
Levers error
20 · 5d · 2 min read

Anna writes about bucket errors. To gloss the idea: sometimes two facts are mentally tracked by only one variable; in that case, correctly updating the belief about one fact can also incorrectly update the belief about the other fact, so it can sometimes be epistemically protective to flinch away from the truth of the first fact (until you can create more variables to track the facts separately).

I think there's a sort of conjugate error: two actions are bound together in one "lever". An action is a class of motor outputs, and a lever is a thing actually available to the mind to decide to do or not.

For example, I

... (Read more)

This seems quite right to me: in our minds, things are often confused and conflated that don't need to be, and as a result we act in ways that fall short of what we think should be possible. It feels like doing what we really want is impossible, because we don't know how to separate the thing we want from the thing we don't want. One framing I've been excited about lately, for the mechanism that underlies the processes that clear these sorts of confusions, is... (Read more)

13 · 3d · 1 min read

Some things can be described only via experience.

  • Direct sensory experience (such as the color red)
  • Foreign untranslatable words and phrases
  • Rasas
  • Certain meditative states (such as kenshō and satori)

Other things cannot be precisely described at all.

  • Any particular noncomputable number

Indescribable things cannot be described in a finite number of words, because each one contains an infinite quantity of information. I don't mean they convey this information all at once (except for noncomputable numbers); rather, they open up a new channel of information.

Opening up a new channel of

... (Read more)

I guess this is a matter of opinion about how much explanation makes something "untranslatable". For example, maybe it takes 1000 words to give enough context to adequately convey the meaning of a word with a very precise meaning in another language. Is this word "translatable"? In a certain sense no, because making sense of it required giving the person a lot of new context they didn't have before, beyond simple reference to concepts they already had. Obviously the other end of the ... (Read more)

The Technique Taboo
36 · 14d · 1 min read

For a strange few decades that may just be starting to end, if you went to art school you'd be ostracised by your teachers for trying to draw good representational art. "Representational art" means pictures that look like real things. Art school actively discouraged students from getting better at drawing.

"Getting better at drawing" is off-topic at my weekly local drawing club too. I've literally never heard it discussed.

This taboo extends far beyond art. My nearest gym forbids weightlifters from using electronic systems to log their progress. I'm friends with programmers who can't touch type.

... (Read more)

You are mixing up two topics. Separating them does not provide any immediate clarity, but it's important to separate them. One topic is keeping records, observing progress, and trying to do better: the weightlifters record objective performance; math students try to see if they can do an exercise. Practice helps, but performance feedback doesn't say how to improve. The other topic, evoked in my mind by the word "technique," is breaking down a big skill into small skills: biology students learn the specific technique of how to use... (Read more)

Rohin Shah on reasons for AI optimismΩ
40 · 13d · 1 min read · Ω 9
Rohin Shah

I, along with several AI Impacts researchers, recently talked to Rohin Shah about why he is relatively optimistic about AI systems being developed safely. Rohin Shah is a 5th-year PhD student at the Center for Human-Compatible AI (CHAI) at Berkeley, and a prominent member of the Effective Altruism community.

Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention. His optimism was largely based on his belief that AI development will be relatively gradual and AI researchers will correct safety issues that come up.

He reported ... (Read more)

4 · ricraz · 14h: I predict that Rohin would say something like "the phrase 'approximately optimal for some objective/utility function' is basically meaningless in this context, because for any behaviour, there's some function which it's maximising". You might then limit yourself to the set of functions that define tasks that are interesting or relevant to humans. But that set includes a whole bunch of functions which define safe bounded behaviour, as well as a whole bunch which define unsafe unbounded behaviour, and we're back to being very uncertain about which case we'll end up in.
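That "some function which it's maximising" move is trivially constructible. Here is a toy illustration (my own, not ricraz's) that manufactures a utility function under which any given policy is exactly optimal:

```python
def rationalize(policy):
    """Return a utility function that scores 1 exactly on the policy's choices."""
    def utility(state, action):
        return 1.0 if action == policy(state) else 0.0
    return utility

# Any behaviour, however arbitrary, comes out "optimal" under its own utility:
policy = lambda state: state[::-1]  # an arbitrary rule: reverse the observation
u = rationalize(policy)
best = max(["ab", "ba"], key=lambda a: u("ab", a))
```

Since the construction works for every behaviour, "maximises some utility function" carries no information by itself; the set of admissible functions must be restricted before the claim bites.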

That would probably be part of my response, but I think I'm also considering a different argument.

The thing that I was arguing against was "(c): agents that we build are optimizing some objective function". This is importantly different from "mesa-optimisers [would] end up being approximately optimal for some objective/utility function" when you consider distributional shift.

It seems plausible that the agent could look like it is "trying to achieve" some simple utility function, and perhaps it would even be approximately ... (Read more)

The Credit Assignment ProblemΩ
50 · 6d · 9 min read · Ω 19

This post is eventually about partial agency. However, it's been a somewhat tricky point for me to convey; I take the long route. Epistemic status: slightly crazy.

I've occasionally said that everything boils down to credit assignment problems.

One big area which is "basically credit assignment" is mechanism design. Mechanism design is largely about splitting gains from trade in a way which rewards cooperative behavior and punishes uncooperative behavior. Many problems are partly about mechanism design:

  • Building functional organizations;
  • Designing markets to solve problems (suc
... (Read more)

(I don't speak for Abram but I wanted to explain my own opinion.) Decision theory asks, given certain beliefs an agent has, what is the rational action for em to take. But, what are these "beliefs"? Different frameworks have different answers for that. For example, in CDT a belief is a causal diagram. In EDT a belief is a joint distribution over actions and outcomes. In UDT a belief might be something like a Turing machine (inside the execution of which the agent is supposed to look for copies of emself). Learning theory allows us to gain insight through t

... (Read more)
What I’ll be doing at MIRIΩ
76 · 16h · 1 min read · Ω 28

Note: This is a personal post describing my own plans, not a post with actual research content.

Having finished my internship working with Paul Christiano and others at OpenAI, I’ll be moving to doing research at MIRI. I’ve decided to do research at MIRI because I believe MIRI will be the easiest, most convenient place for me to continue doing research in the near future. That being said, there are a couple of particular aspects of what I’ll be doing at MIRI that I think are worth being explicit about.

First, and most importantly, this decision does not represent any substantive change in my bel

... (Read more)

That you're working full-time on research, have a stable salary, and are in a geographical location conducive to talking with many other thoughtful people who think a lot about these topics - these are all very valuable things, and I'm pleased to hear they are happening for you :-)

On the subject of privacy: I was recently reading the career plan of a friend who was looking for jobs in AI alignment, and I wrote this:

Do not accept secrets lightly. If you accept one wrong secret, you will go the way of MIRI or Leverage or US government officials with a secur

... (Read more)