Decision Theory: Newcomb's Problem


I enjoyed this post, both for its satire of a bunch of people's thinking styles (including mine, at times), and because IMO (and in the author's opinion, I think), there are some valid points near here and it's a bit tricky to know which parts of the "jokes/poetry" may have valid analogs.

I appreciate the author for writing it, because IMO we have a whole bunch of different subcultures and styles of conversation and sets of assumptions colliding all of a sudden on the internet right now around AI risk, and noticing the existence of the others seems useful, and IMO the OP is an attempt to collide LW with some other styles.  Judging from the comments it seems to me not to have succeeded all that much; but it was helpful to me, and I appreciate the effort.  (Though, as a tactical note, it seems to me the approximate failure was due mostly to the piece's sarcasm, and I suspect sarcasm in general tends not to work well across cultural or inferential distances.)

Some points I consider valid, that also appear within [the vibes-based reasoning the OP is trying to satirize, and also to model and engage with]:

1) Sometimes, talking a lot about a very specific fear can bring about the feared scenario.  (An example I'm sure of: a friend's toddler stuck her hands in soap.  My friend said "don't touch your eyes."  The toddler, unclear on the word 'not,' touched her eyes.) (A possible example I'm less confident in: articulated fears of AI risk may have accelerated AI because humanity's collective attentional flows, like toddlers, have no reasonable implementation of the word "not.")  This may be something for an AI risk movement to watch out for.

(I think this is non-randomly reflected in statements like: "worrying has bad vibes.")

2) There are a lot of funny ways that attempting to control people or social processes can backfire.  (Example: lots of people don't like it when they feel like something is trying to control them.)  (Example: the prohibition of alcohol in the US from 1920 to 1933 is said to have fueled organized crime.)  (Example I'm less confident of:  Trying to keep e.g. anti-vax views out of public discourse leads some to be paranoid, untrusting of establishment writing on the subject.)  This is a thing that may make trouble for some safety strategies, and that seems to me to be non-randomly reflected in "trying to control things has bad vibes."

(Though, all things considered, I still favor trying to slow things!  And I care about trying to slow things.)

3) There're a lot of places where different Schelling equilibria are available, and where groups can, should, and do try to pick the equilibrium that is better.  In many cases this is done with vibes.  Vibes, positivity, attending to what is or isn't cool or authentic (vs boring), etc., are part of how people decide which company to congregate around, which subculture to bring to life, which approach to AI to do research within, etc.  -- and this is partly doing some real work discerning what can become intellectually vibrant (vs boring, lifeless, dissociated).

TBC, I would not want to use vibes-based reasoning in place of reasoning, and I would not want LW to accept vibes in place of reasons.   I would want some/many in LW to learn to model vibes-based reasoning for the sake of understanding the social processes around us.  I would also want some/many at LW to sometimes, if the rate of results pans out in a given domain, use something like vibes-based reasoning as a source of hypotheses that one can check against actual reasoning.  LW seems to me pretty solid on reasoning relative to other places I know on the internet, but only mediocre on generativity; I think learning to absorb hypotheses from varied subcultures (and from varied old books, from people who thought at other times and places) would probably help, and the OP is gesturing at one such subculture.

I'm posting this comment because I didn't want to post this comment for fear of being written off by LW, and I'm trying to come out of more closets.  Kinda at random, since I've spent large months or small years failing to successfully implement some sort of more planned approach.

The public early Covid-19 conversation (in like Feb-April 2020) seemed pretty hopeful to me -- decent arguments, slow but asymmetrically correct updating on some of those arguments, etc.  Later everything became politicized and stupid re: covid.

Right now I think there's some opportunity for real conversation re: AI.  I don't know what useful thing follows from that, but I do think it may not last, and that it's pretty cool.  I care more about the "an opening for real conversation" thing than about the changing Overton window as such, although I think the former probably follows from the latter (first encounters are often more real somehow).

This seems like a very off-distribution move from Eliezer—which I suspect is in large part the point: when your model predicts doom by default, you go off-distribution in search of higher-variance regions of outcome space.

That's not how I read it.  To me it's an attempt at the simple, obvious strategy of telling people ~all the truth he can about a subject they care a lot about and where he and they have common interests.  This doesn't seem like an attempt to be clever or explore high-variance tails.  More like an attempt to explore the obvious strategy, or to follow the obvious bits of common-sense ethics, now that lots of allegedly clever 4-dimensional chess has turned out stupid.

Thanks for the suggestion.  I haven't read it.  I'd thought from hearsay that it is rather lacking in "light" -- a bunch of people who're kinda bored and can't remember the meaning of life -- is that true?  Could be worth it anyway.

Not sure where you're going with this.  It seems to me that political methods (such as petitions, public pressure, threat of legislation) can be used to restrain the actions of large/mainstream companies, and that training models one or two OOM larger than GPT4 will be quite expensive and may well be done mostly or exclusively within large companies of the sort that can be restrained in this sort of way.

Maybe also: anything that bears on how an LLM, if it realizes it is not human and is among aliens in some sense, might want to relate morally to thingies that created it and aren't it.  (I'm not immediately thinking of any good books/similar that bear on this, but there probably are some.)

I was figuring GPT4 was already trained on a sizable fraction of the internet, and GPT5 would be trained on basically all the text (plus maybe some not-text, not sure).  Is this wrong?

In terms of what kinds of things might be helpful:

1. Object-level stuff:

Things that help illuminate core components of ethics, such as "what is consciousness," "what is love," "what is up in human beings with the things we call 'values', that seem to have some thingies in common with beliefs," "how exactly did evolution end up producing the thing where we care about stuff and find some things worth caring about," etc.

Some books I kinda like in this space: 

  • Martin Buber's book "I and Thou"
  • Christopher Alexander's writing, especially his "The Nature of Order" books
  • The Tao Te Ching (though this one I assume is thoroughly in any huge training corpus already)
  • (curious for y'all's suggestions)

2.  Stuff that aids processes for eliciting people's values, or for letting people elicit each other's values:

My thought here is that there're dialogs between different people, and between people and LLMs, on what matters and how we can tell.  Conversational methodologies for helping these dialogs go better seem maybe-helpful.  E.g. active listening stuff, or circling, or Gendlin's Focusing stuff, or ... [not sure what -- theory of how these sorts of fusions and dialogs can ever work, what they are, tips for how to do them in practice, ...]

3.  Especially, maybe: stuff that may help locate "attractor states" such that an AI, or a network of humans and near-human-level AIs, might, if it gets near this attractor state, choose to stay in this attractor state.  And such that the attractor state has something to do with creating good futures.

  • Confucius (? I haven't read him, but he at least shaped society for a long time in a way that was partly about respecting and not killing your ancestors?)
  • Hayek (he has an idea of "natural law" as sort of how you have to structure minds and economies of minds if you want to be able to choose at all, rather than e.g. making random mouth motions that cause random other things to happen that have nothing to do with your intent really, like what would happen if a monarch says "I want to abolish poverty" and then people try to "implement" his "decree").

It may not be possible to prevent GPT4-sized models, but it probably is possible to prevent GPT5-sized models, if the large companies sign on and don't want it to be public knowledge that they did it.  Right?
