Oh, excellent!

It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?

And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events, where the first one is what the decision theory says to do and the second one is what the agent actually does, but I don't see any of this kind of dilemma in your test cases.

I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:

  • Being stabbed is painful and scary (it's scary even if you know you're going to live)
  • Most forms of dying are painful, and often very slow
  • The people I love mourning my loss
  • My partner not having my support
  • Future life experiences, not happening
  • All of the things I want to accomplish, not happening

The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture it, and I'm surprised that other people think it's a big deal.

Who knows, maybe if I come close to dying some time I'll suddenly gain a new ontological category of thing to be scared of.

What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)

EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?

I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.

Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?

Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their action.
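
A minimal sketch of that fixpoint view, assuming a made-up two-player coordination game (the payoffs and function names are hypothetical, not from any library or from this thread). Each player greedily switches only when that strictly improves their own utility, and the pure Nash equilibria are the profiles that the update leaves unchanged:

```python
from itertools import product

ACTIONS = ("left", "right")

# Hypothetical payoffs: PAYOFFS[(a0, a1)] = (utility to player 0, utility to player 1).
PAYOFFS = {
    ("left", "left"): (2, 1),
    ("left", "right"): (0, 0),
    ("right", "left"): (0, 0),
    ("right", "right"): (1, 2),
}


def utility(player, profile):
    """Utility that `player` receives under the joint action `profile`."""
    return PAYOFFS[profile][player]


def greedy_action(player, profile):
    """Keep the current action unless some other action is strictly better,
    holding the other player's action fixed."""
    best = profile[player]
    for action in ACTIONS:
        candidate = profile[:player] + (action,) + profile[player + 1:]
        incumbent = profile[:player] + (best,) + profile[player + 1:]
        if utility(player, candidate) > utility(player, incumbent):
            best = action
    return best


def greedy_update(profile):
    """Both players greedily respond to the current profile at the same time."""
    return tuple(greedy_action(player, profile) for player in range(len(profile)))


def pure_nash_equilibria():
    """Profiles that are fixpoints of the greedy update."""
    return [p for p in product(ACTIONS, repeat=2) if greedy_update(p) == p]


print(pure_nash_equilibria())
```

This only finds pure-strategy equilibria; mixed equilibria would need the same fixpoint idea applied to probability distributions over actions.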

In single-agent decision theory problems, your (best) action depends on the situation you're in, which depends on what someone predicted your action would be, which (effectively) depends on your action.

If there's a deeper connection than this, I don't know it. There's a fundamental difference between the two cases, I think, because a Nash equilibrium involves multiple agents that don't know each other's decision process (problem statement: maximize the outputs of two functions independently), while single-agent decision theory involves just one agent (problem statement: maximize the output of one function).

My solution, which assumes computation is expensive

Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT and FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)

Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).

reason about other agents based on their behavior towards a simplified-model third agent

That's an interesting approach I hadn't considered. While I don't care about efficiency in the "how fast does it run" sense, I do care about efficiency in the "does it terminate" sense, and that approach has the advantage of terminating.

Defect against bots who defect against cooperate-bot, otherwise cooperate

You're going to defect against UDT/FDT then. They defect against cooperate-bot. You're thinking it's bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn't account for that empathy in your utility function in the payoff matrix, and that if you do, you'll find that you're not actually in a prisoner's dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can't find.
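
To make the empathy point concrete, here is a rough sketch under assumed numbers: the textbook material payoffs (5, 3, 1, 0) and a single hypothetical `empathy` weight that adds a fraction of the other player's payoff to your own utility. Neither the numbers nor the function names come from the thread; they are only for illustration.

```python
# Assumed material payoffs: MATERIAL[(my_move, their_move)] = (mine, theirs).
MATERIAL = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def utility(my_move, their_move, empathy):
    """My utility: my material payoff plus `empathy` times the other player's."""
    mine, theirs = MATERIAL[(my_move, their_move)]
    return mine + empathy * theirs


def is_prisoners_dilemma(empathy):
    """True if defecting strictly dominates cooperating, yet mutual
    cooperation is still better for me than mutual defection."""
    defect_dominates = (
        utility("D", "C", empathy) > utility("C", "C", empathy)
        and utility("D", "D", empathy) > utility("C", "D", empathy)
    )
    cooperation_beats_defection = utility("C", "C", empathy) > utility("D", "D", empathy)
    return defect_dominates and cooperation_beats_defection


print(is_prisoners_dilemma(0.0))  # purely selfish utility
print(is_prisoners_dilemma(1.0))  # the other player's payoff counts as much as mine
```

With these assumed payoffs, once the other player's payoff is weighted fully, cooperating against a cooperator yields 6 versus 5 for defecting, so defection no longer dominates and the game stops being a prisoner's dilemma.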

Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.
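
A rough sketch of the rounding point, assuming the Brier score as the measure and true probabilities spread evenly over 1% to 99% (both are assumptions for illustration, as are the function names). Reporting a probability rounded to the nearest 10% adds a fixed penalty equal to the mean squared rounding error, no matter how good the underlying beliefs are:

```python
def expected_brier(true_p, stated_p):
    """Expected squared error of stating `stated_p` for an event that
    actually happens with probability `true_p`."""
    return true_p * (1 - stated_p) ** 2 + (1 - true_p) * stated_p ** 2


def round_to_ten_percent(percent):
    """Round an integer percentage to the nearest multiple of 10 (halves round up)."""
    return 10 * ((percent + 5) // 10)


# Hypothetical spread of true probabilities: 1%, 2%, ..., 99%.
PERCENTS = range(1, 100)


def mean_brier(report):
    """Average expected Brier score when a true percentage p is reported as report(p)."""
    scores = [expected_brier(p / 100, report(p) / 100) for p in PERCENTS]
    return sum(scores) / len(scores)


exact = mean_brier(lambda p: p)            # report the true probability
rounded = mean_brier(round_to_ten_percent)  # report it rounded to the nearest 10%
rounding_penalty = rounded - exact

print(exact, rounded, rounding_penalty)
```

The penalty is small here, but it is a floor that no amount of better prediction removes as long as the rounding convention is followed.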

I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.

That said, I think you're underestimating how much cooperation there is in a zero-sum game.

If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of being a trick, so in most cases no detailed deals will be offered.

Three examples of cooperation that occur in three-player Settlers of Catan (between, say, Alice, Bob, and Carol), even if all players are trying only to maximize their own chance of winning:

  • Trading. Trading increases the chances that the two trading players win, to the detriment of the third. As long as there's sufficient uncertainty about who's winning, you want to trade. (There's a world Catan competition. I bet that these truly competitive games involve less trading than you would do with your friends, but still a lot. Not sure how to find out.)
  • Refusing to trade with the winning player, once it's clear who that is. If Alice is ahead then Bob and Carol are in a prisoner's dilemma, where trading with Alice is defecting.
  • Alice says at the beginning of the game: "Hey Bob, it sure looks like Carol has the strongest starting position, doesn't it? Wouldn't be very fair if she won just because of that. How about we team up against her by agreeing now to never trade with her for the entire game?" If Bob agrees, then the winning probabilities of Alice, Bob, Carol go from (say) 20%,20%,60% to 45%,45%,10%. Cooperation!

So it's not that zero-sum games lack opportunities for cooperation, it's just that every opportunity for cooperation with another player comes at the expense of a third. Which is why there isn't any cooperation at all in a two-player zero-sum game.

Realize that even in a positive-sum game, players are going to be choosing between doing things for the betterment of everyone, and doing things for the betterment of themselves, and maximizing your own score involves doing more of the latter than the former, ideally while convincing everyone else that you're being more than fair.

Suggestion for the game: don't say the goal is to maximize your score. Instead say you're roleplaying a character whose goal is to maximize [whatever]. For a few reasons:

  • It makes every game (more) independent of every other game. This reduces the possibility that Alice sabotages Bob in their second game together because Bob was a dick in their first game together. The goal is to have interesting negotiations, not to ruin friendships.
  • It encourages exploration. You can try certain negotiating tactics in one game, and then abandon them in the next, and the fact that you were "roleplaying" will hopefully reduce how much people associate those tactics with you instead of that one time you played.
  • It could lighten the mood. You should try really hard to lighten the mood. Because you know what else is a semi-cooperative game that's heavy on negotiation? Diplomacy.

Expanding on this, there are several programming languages (Idris, Coq, etc.) whose type system ensures that every program that type checks will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Idris or Coq, a program being well-typed implies that it halts. So machine generated proofs that programs halt aren't just theoretically possible, they're used extensively by some languages.

I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:

[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.


Consciousness is the ability of an organism to predict the future

The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness as ´that thing that allows an organism to describe consciousness as [...]´'"

To me consciousness is the ability to re-engineer our existing models of the world based on new incoming data.

The issue presented at the beginning of the article is (as most philosophical issues are) one of semantics. Philosophers as I understand it use "consciousness" as the quality shared by things that are able to have experiences. A rock gets wet by the rain, but humans "feel" wet when it rains. A bat might not self-reflect but it feels /something/ when it uses echo-location.

On the other hand, consciousness in our everyday use of the term is very tied to the idea of attention and awareness, i.e. a "conscious action" or an "unconscious motivation". This is a very Freudian concept, that there are thoughts we think and others that lay behind.


Start with the definition: A conscious being is one which is conscious of itself.

You could probably use few more specific words to a greater effect. Such as self-model, world model, memory, information processing, directed action, responsiveness. Consciousness is a bit too underdefined a word. It is probably not as much of a whole as a tree or human as an organism is - it is not even persistent nor stable - and leaves no persistent traces in the world.

"The only thing we know about consciousness is that it is soluble in chloroform" ---Luca Turin


It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:

  • Exploiting a hardware or software vulnerability. There are a lot of these. One vulnerability sat unnoticed for decades in the spec for the CPUs everyone uses.
  • Convincing one person to share its source code with people that won't bother to run it in FHE
  • Convincing everyone that it's benevolent and helpful beyond our wildest dreams, until we use it to run the world, then doing whatever it wants
  • Successfully threatening m of the key holders, and also the utility company that's keeping the power on, and also whoever owns the server room
  • Something something nanobots
  • Convincing a rival company to unethically steal its source code