It was partly to demonstrate that bad Nash equilibria affect even common-payoff games: there don't need to be dynamics of some agents singling out other agents to reward and punish.
It wasn't just that; it was also based on thinking I had more control over other people than I realistically had. It's probably partly latent personality factors, too. But a heroic-responsibility mindset will tend to cause people to think other people's actions are their fault if they could, potentially, have affected those actions through any sort of psychological manipulation (see also Against Responsibility).
I think I thought I was working on AI risk but wasn't taking heroic responsibility because I wasn't owning the whole problem. People around me encouraged me to...
In the round after the one where the 30 is played, the Schelling temperature increases to 100, and it's a Nash equilibrium for everyone to always select the Schelling temperature.
You can claim this is an unrealistic Nash equilibrium but I am pretty sure that unilateral deviation from the Schelling temperature, assuming everyone else always plays the Schelling temperature, never works out in anyone's favor.
If a mathematical model doesn't reflect the thing it's supposed to represent at all, it's not a good model. Saying "this is what the model predicts" isn't helpful.
There is absolutely zero incentive to anyone to put the temperature to 100 at any time. Even as deterrence, there is no reason for the equilibrium temperature to be an unsurvivable 99. It makes no sense, no one gains anything from it, especially if we assume communication between the parties (which is required for there to be deterrence and other such mechanisms in place). There is no reason to p...
Formally, it's an arbitrary strategy profile that happens to be a Nash equilibrium, since if everyone else plays it, they'll punish if you deviate from it unilaterally.
In terms of more realistic scenarios, there are some examples of bad "punishing non-punishers" equilibria that people have difficulty escaping. E.g. an equilibrium with honor killings, where parents kill their own children partly because they expect to be punished if they don't. Robert Trivers, an evolutionary psychologist, has studied these equilibria, as they are anomalous from an evolutionary psychology perspective.
I'm saying it's a Nash equilibrium, not that it's particularly realistic.
They push it to 100 because they expect everyone else to do so, and they expect that if anyone sets it to less than 100, the equilibrium temperature in the round after that will be 100 instead of 99. If everyone else is going to select 100, it's futile to individually deviate and set the temperature to 30, because that means in the next round everyone but you will set it to 100 again, and that's not worth being able to individually set it to 30 in this round.
Start by playing 99. If someone played less than they were supposed to last round, you're now supposed to play 100. Otherwise, you're now supposed to play 99.
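This rule can be simulated directly. As a sketch (the payoff structure below is my illustrative assumption, not something specified in the thread: realized room temperature is the average of all dials, and each agent's per-round loss is their distance from the ideal of 30), with 100 agents a unilateral deviation to 30 gains only 69/100 of a degree-round now but costs a full degree-round of punishment later:

```python
N = 100       # number of agents (illustrative)
ROUNDS = 10
IDEAL = 30    # everyone's preferred temperature

def next_supposed(prev_dials, prev_supposed):
    # The "supposed to" rule: if anyone played below the Schelling
    # temperature last round, it is 100 this round; otherwise it is 99.
    if prev_dials is None:
        return 99
    return 100 if any(d < prev_supposed for d in prev_dials) else 99

def total_loss(deviate_round=None):
    # Total loss for agent 0, assuming temperature = average of dials.
    supposed, prev, total = 99, None, 0.0
    for t in range(ROUNDS):
        supposed = next_supposed(prev, supposed)
        dials = [supposed] * N
        if t == deviate_round:
            dials[0] = IDEAL          # unilateral deviation to 30
        temp = sum(dials) / N
        total += abs(temp - IDEAL)
        prev = dials
    return total

conform, deviate = total_loss(), total_loss(deviate_round=0)
print(conform, deviate)  # deviating once is strictly worse (~690.0 vs ~690.31)
```

The 0.31 margin is exactly the 1-degree punishment round minus the 69/100 one-round gain from averaging; under this averaging assumption the punishment only deters when there are enough agents that the one-round gain is below 1.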
I think what people are missing (I know I am) is where does the "supposed to" come from? I totally understand the debt calculation to get altruistic punishment for people who deviate in ways that hurt you - that's just maximizing long-term expectation through short-term loss. I don't understand WHY a rational agent would punish someone who is BENEFITTING you with their deviant play.
I'd totally get it if you reacted to someone playing MORE than they were supposed to. But if someone plays less than, there's no debt or harm to punish.
This is specifically for Nash equilibria of iterated games. See the folk theorems Wikipedia article.
After someone chooses 30 once, they still get to choose something different in future rounds. In the strategy profile I claim is a Nash equilibrium, they'll set it to 100 next round like everyone else. If anyone individually deviates from setting it to 100, then the equilibrium temperature in the next round will also be 100. That simply isn't worth it, if you expect to be the only person setting it less than 100. Since in the strategy profile I am constructing everyone does set it to 100, that's the condition we need to check to determine whether it's a Nash equilibrium.
I rewrote part of the post to give an equilibrium that works with a discount rate as well.
"The way it works is that, in each round, there's an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower tem...
I am confused. Why does everyone else select the equilibrium temperature? Why would they push it to 100 in the next round? You never explain this.
I understand you may be starting from a theorem that I don’t know. To me the obvious course of action would be something like: the temperature is way too high, so I’ll lower the temperature. Wouldn’t others appreciate that the temperature is dropping and getting closer to their own preference of 30 degrees?
Are you saying what you’re describing makes sense, or are you saying that what you’re describing is a weird (and meaningless?) consequence of Nash theorem?
Ah, here's a short proof of a folk theorem: *
But it doesn't show that it's a subgame perfect equilibrium. This paper claims to prove it for subgame perfect equilibria, although I haven't checked it in detail.
I only skimmed the paper, it was linked from Wikipedia as a citation for one of the folk theorems. But, it's easy to explain the debt-based equilibrium assuming no discount rate. If you're considering setting the temperature lower than the equilibrium, you will immediately get some utility by setting the temperature lower, but you will increase the debt by 1. That means the equilibrium temperature will be 1 degree higher in 1 future round, more than compensating for the lower temperature in this round. So there is no incentive to set the temperature lower ...
I believe many of the theorems have known proofs (e.g. this paper). Here's an explanation of the debt mechanic:
Debt is initially 0. Equilibrium temperature is 99 if debt is 0, otherwise 100. For everyone who sets the temperature less than equilibrium in a round, debt increases by 1. Debt decreases by 1 per round naturally, unless it was already 0.
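A minimal sketch of this debt mechanic in code (the decay-then-add ordering within a round is my assumption; it reproduces the "1 degree higher in 1 future round" behavior described above):

```python
def equilibrium_temp(debt):
    # 99 when no debt is outstanding, 100 otherwise
    return 99 if debt == 0 else 100

def step(debt, num_undercutters):
    # Debt decays by 1 per round (floored at 0), then increases by 1
    # for each agent who set the temperature below equilibrium.
    return max(debt - 1, 0) + num_undercutters

debt, temps = 0, []
for t in range(5):
    temps.append(equilibrium_temp(debt))
    undercutters = 1 if t == 0 else 0   # one agent undercuts in round 0 only
    debt = step(debt, undercutters)

print(temps)  # [99, 100, 99, 99, 99]: one deviation buys one 100-degree round
```

So each degree of one-round gain from undercutting is paid back with a full degree-round of elevated equilibrium temperature, which is what removes the incentive to deviate.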
If you think you're responsible for everything, that means you're responsible for everything bad that happens. That's a lot of very bad stuff, some of which is motivated by bad intentions. An entity who's responsible for that much bad stuff couldn't be like a typical person, who is responsible for a modest amount of bad stuff. It's hard to conceptualize just how much bad stuff this hypothetical person is responsible for without supernatural metaphors; it's far beyond what a mere genocidal dictator like Hitler or Stalin is responsible for (at least, if you ...
I have a version of heroic responsibility in my head that I don’t think causes one to have false beliefs about supernatural phenomena, so I’m interested in engaging on whether the version in my head makes sense, though I don’t mean to invalidate your strongly negative personal experiences with the idea.
I think there’s a difference between causing something and taking responsibility for it. There’s a notion of “I didn’t cause this mess but I am going to clean it up.” In my team often a problem arises that we didn’t cause and weren’t expecting. A few months ...
“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” –HPMOR, chapter 75.
I think a typical-ish person actually doing this doesn't look like them risin...
Read The Doomsday Machine. The US Air Force is way less of a defensive or utilitarian actor than you are implying, e.g. for a significant period of time the only US nuclear war plan (which was hidden from Kennedy) involved bombing as many Chinese cities as possible even if it was Russia who had attacked the US. (In general I avoid giving the benefit of the doubt to dishonest violent criminals even if they call themselves "the government", but here we have extra empirical data)
...National Intelligence Estimate (NIE) 11-10-57, issued in December 1957, predicted that the Soviets would "probably have a first operational capability with up to 10 prototype ICBMs" at "some time during the period from mid-1958 to mid-1959." The numbers started to inflate. A similar report gathered only a few months later, NIE 11-5-58, released in August 1958, concluded that the USSR had "the technical and industrial capability... to have an operational capability with 100 ICBMs" some time in 1960 and perhaps 500 ICBMs "some time in 1961, or at the latest
There's a lot in this post that I agree with, but in the spirit of the advice in this post, I'll focus on where I disagree:
...If you are moving closer to truth—if you are seeking available information and updating on it to the best of your ability—then you will inevitably eventually move closer and closer to agreement with all the other agents who are also seeking truth.
But this can’t be right. To see why, substitute “making money on prediction markets” for “moving closer to truth”, “betting” for “updating”, and “trying to make money on prediction market
I think I would have totally agreed in 2016. One update since then is that I think progress scales with resources way less than I used to think it did. In many historical cases, a core component of progress was driven by a small number of people (which is reflected in citation counts and in who is actually taught in textbooks), and introducing lots of funding and scaling too fast can disrupt that by increasing the amount of fake work.
$1B in safety well-spent is clearly more impactful than $1B less in semiconductors, it's just that "well-spent" is doing a lot of work...
The probability a given solution is valid to a given problem is p. The probability that a given solution is invalid is 1 − p. The probability that all n candidate solutions are invalid is (1 − p)^n. The probability that not all are invalid (there is some solution) is 1 − (1 − p)^n.
The probability that all m problems are solved is (1 − (1 − p)^n)^m. The probability of doom (that not all problems are solved) is 1 − (1 − (1 − p)^n)^m.
For , , Google thinks this number is 0.
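Spelling out the calculation in code (the values of p, n, and m below are illustrative stand-ins; the originals were lost from the comment): with n candidate solutions per problem, each valid with probability p, and m problems, the doom probability is 1 − (1 − (1−p)^n)^m.

```python
def doom_probability(p, n, m):
    """1 - (1 - (1-p)^n)^m: chance that at least one of m problems
    has none of its n candidate solutions valid (each valid w.p. p)."""
    some_solution = 1 - (1 - p) ** n   # at least one candidate works
    return 1 - some_solution ** m      # not every problem gets solved

# A non-degenerate sanity check:
print(doom_probability(0.5, 4, 2))      # 0.12109375

# With large n the result underflows to exactly 0 in double precision,
# which is plausibly the sense in which a calculator reports 0:
print(doom_probability(0.5, 100, 100))  # 0.0
```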
I don't especially think AI capabilities increases are bad on the margin, but if I did, I would think of this as a multilateral disarmament problem where those who have the most capabilities (relative to something else, e.g. population or economy) and the worst technological coordination should disarm first, similar to nukes; that would currently indicate the US and UK over China etc. China has more precedent for government control over the economy than the West, so it could more easily coordinate an AI slowdown.
If the falsifying Turing-computable agent has access to the oracle A' and a different oracle A'', and A' and A'' give different answers on some Turing machine (which must never halt if A' and A'' are arbitration oracles), then there is some way to prove that by exhibiting this machine.
The thing I'm proving is that, given that the falsifier knows A' is an arbitration oracle, that doesn't help it falsify that oracle B satisfies property P. I'm not considering a case where there are two different putative arbitration oracles. In general it seems hard f...
Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?
Note the preceding
Let's first, within a critical agential ontology, disprove some very basic forms of determinism.
I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.
Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.
AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.
To expand on strawberries vs diamonds:
It seems to me that the strawberry problem is likely easier than the "turn the universe into diamond" problem. Immediate reasons:
...AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do. “AI smart enough to improve itself” is not a crucial threshold, AI systems will get gradually better at improving themselves. Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research), but I think this is mostly unjustified. If Eliezer doesn’t believe this, then his arguments about the alignment problem that hum
My sense is that we are on broadly the same page here. I agree that "AI improving AI over time" will look very different from "humans improving humans over time" or even "biology improving humans over time." But I think that it will look a lot like "humans improving AI over time," and that's what I'd use to estimate timescales (months or years, most likely years) for further AI improvements.
“myopia” (not sure who correctly named this as a corrigibility principle),
I think this is from Paul Christiano, e.g. this discussion.
I've been thinking recently about AI alignment perhaps being better thought of as a subfield of cognitive science than either AI (since AI focuses on artificial agents, not human values) or philosophy (since philosophy is too open-ended); cognitive science is a finite endeavor (due to the limited size of the human brain) compatible with executable philosophy.
It seems to me that an approach that would "work" for AI alignment (in the sense of solving or reframing it) would be to understand the human mind well enough to determine whether it has "values" / "be...
Yeah, it’s not going so well. It is in fact going so incredibly poorly that so far the whole thing is quite plausibly vastly net negative, with most funding that has gone into “AI Safety” efforts serving as de facto capabilities research that both speeds things up and divides them and is only serving to get us killed faster.
I'm pretty curious about your, Eliezer's, or others' opinions on when AI safety started being net negative. Was very early concern (by Minsky, I.J. Good, etc) net negative? What about Eliezer writing the parts of the Sequences that w...
OpenAI was the point where the turning point became visible; obviously the actual turn must have occurred before then. Arguably it was with DeepMind, since it looks (from way outside) like Demis has basically lost the struggle for reliable/final control of it inside Google.
Did Kant have the concept of knowledge that we are born with, which nonetheless is contingent on how the world happens to be?
I am not sure how much he has this. He grants that his philosophy applies to humans, not to other possible minds (e.g. God); it's a contingent fact that, unlike God, we don't produce things just by conceiving of them. And since he thinks spacetime is non-analytic, he grants that in a sense it "could" be otherwise, it's just that that "could" counterfactual structure must branch before we get empirical observations. But he doesn't ...
the axiomatisation of logic, geometry, and all of mathematics
Euclid's Elements predated Kant.
Nowadays we would say that the symmetries of the mathematical, geometric square are a theorem derivable from axioms, and that the symmetries of a wooden block are a physical property of a system that empirically satisfies such axioms.
I think the main problem with this is that it requires the wooden block rotation to be an empirical fact. It seems like with enough sense of space, it wouldn't require empirically observing rotating blocks to predict that a squa...
Why is space necessary? “External” seems like a good description of the relationship of objective stuff to minds, but that relationship doesn’t seem like it couldn’t be well-described in non-spatial terms. E.g. “reality is that which, when you stop believing in it, doesn’t stop affecting you”. (Though I had to modify the “go away”.)
Such a sense of reality might be external in the sense of "unpredictable" but not in the sense of "apparently outside me".
...I doubt or don’t understand this. I agree that time is the form of the inner sense, but it’s also the
See Bayesian Games, which handles agent types being sampled from a joint distribution over types.
There's a Haskell implementation of modal UDT.
Previous discussion of modal UDT: 1, 2.
FDT is a family of decision theories; modal UDT is a specific algorithm. As stated it requires a hypercomputer, though it has bounded variants.
You're right that the uploading case wouldn't necessarily require strong algorithmic insight. However, it's a kind of bounded technical problem where it's relatively easy to evaluate progress relative to the difficulty, e.g. based on ability to upload smaller animal brains, so it would lead to >40 year timelines absent large shifts in the field or large drivers of progress. It would also lead to a significant degree of alignment by default.
For copying culture, I think the main issue is that culture is a protocol that runs on human brains, not on computers. ...
Some ways of giving third parties Bayesian evidence that you have some secret without revealing it:
I'm not against "tenure" in this case. I don't think it makes sense for people to make their plans around the idea that person X has secret Y unless they have particular reason to think secret Y is really important and likely to be possessed by...
Wouldn't the AC unit have to intake cool air from the room (since it's expelling cold air into the room), and mix the cool air with the warm outside air? (Maybe the numbers work out differently in this condition but I'm not convinced yet, would have to see a calculation)
A two hose AC does take in both indoor and outdoor air, but they never mix. (The two hoses both carry outdoor air; indoor air is pumped through two vents in the AC.) The AC just pumps heat from the indoor air to the outdoor air. Similar to a fridge.
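A toy energy balance shows why the single-hose design loses efficiency (every number here is a made-up assumption, not a measurement): a single-hose unit exhausts indoor air outside, and that air is replaced by hot outdoor air leaking in, which eats into the net cooling; a dual-hose unit's exhaust loop uses only outdoor air, so there is no infiltration penalty.

```python
CP_AIR = 1005.0            # J/(kg*K), specific heat of air
T_OUT, T_IN = 35.0, 25.0   # hypothetical outdoor/indoor temperatures (C)
COOLING = 2000.0           # W of heat the unit pumps out of the indoor air
EXHAUST = 0.05             # kg/s of indoor air a single-hose unit expels

# Dual hose: exhaust loop is outdoor air only; indoor air is just cooled.
net_dual = COOLING

# Single hose: expelled indoor air is replaced by infiltrating outdoor air,
# adding a heat load proportional to the temperature difference.
infiltration_load = EXHAUST * CP_AIR * (T_OUT - T_IN)   # ~502.5 W
net_single = COOLING - infiltration_load

print(net_single / net_dual)   # ~0.75 of the dual-hose net cooling here
```

The size of the penalty depends entirely on the assumed exhaust flow and indoor–outdoor temperature difference, which is why estimates of the single-vs-dual efficiency gap vary so much.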
Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.
Here's my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.
I assumed EER did account for that based on:
All portable air conditioner’s energy efficiency is measured using an EER score. The EER rating is the ratio between the useful cooling effect (measured in BTU) to electrical power (in W). It’s for this reason that it is hard to give a generalized answer to this question, but typically, portable air conditioners are less efficient than permanent window units due to their size.
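As a worked example of that ratio (the unit's numbers are hypothetical):

```python
def eer(cooling_btu_per_hr, power_watts):
    """EER: useful cooling effect (BTU/h) per watt of electrical input."""
    return cooling_btu_per_hr / power_watts

# e.g. a hypothetical 10,000 BTU/h portable unit drawing 1,100 W
print(round(eer(10_000, 1_100), 2))  # 9.09
```

So a lower EER at the same rated BTU/h means more watts drawn for the same nominal cooling, which is the sense in which portables are "less efficient" than window units.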
Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out, overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions, and claims the efficiency difference is only about 12%.
Highlights:
...In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being
Well, at some point in the writing process there is an opportunity to edit the text; I specifically didn't for this post because I wanted to demonstrate the raw output of the process, so as to make it easier to judge the process itself. Also, my discount rate is such that short-term gains in quality-weighted writing output are somewhat valuable.
Are you saying it has a non-functional definition? What might that be, and would it allow for zombies? If it doesn't have a definition, how is it semantically meaningful?
I began reading this charitably (unaware of whatever inside baseball is potentially going on, and seems to be alluded to), but to be honest struggled after “X” seemed to really want someone (Eliezer) to admit they’re “not smart”? I’m not sure why that would be relevant.
I'm not sure exactly what is meant; one guess is that it's about centrality: making yourself more central (making more executive decisions, being more of a bottleneck on approving things, being more looked to as a leader by others, etc.) makes more sense the more correct you are about relevant thi...
Btw, there is some amount of philosophical convergence between this and some recent work I did on critical agential physics; both are trying to understand physics as laws that partially (not fully) predict sense-data starting from the perspective of a particular agent.
It seems like "infra-Bayesianism" may be broadly compatible with frequentism; extending Popper's falsifiability condition to falsify probabilistic (as opposed to deterministic) laws yields frequentist null hypothesis significance testing, e.g. Neyman Pearson; similarly, frequentism also attem...
The following is an edited partial chat transcript of a conversation involving me, Benquo, and an anonymous person (X). I am posting it in the hope that it has enough positive value-of-information compared to the attentional cost to be of benefit to others. I hope people can take this as "yay, I got to witness a back-room conversation" rather than "oh no, someone has made a bunch of public assertions that they can't back up"; I think it would be difficult and time-consuming to argue for all these points convincingly, although I can explain to some degree...
I think I found these lines especially confusing, if you want to explain:
Do you think it was clear to over 90% of readers that the part where he says "April fools, this is just a test!" is not a statement of truth?
What about objective priors?
With objective priors one can always ask "so what?" If it's not my subjective prior, then its posterior will not equal my subjective posterior. There isn't an obvious way to bound the difference between my subjective prior and the objective prior.
With frequentist methods it's possible to get guarantees like "no matter what prior over θ you start with, if you run this method, you'll correctly estimate θ to within ε with probability at least 1 − δ". It's clear that a subjective Bayesian (with imperfect knowledge of their prior)...
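A sketch of that kind of guarantee, using Hoeffding's inequality for estimating a Bernoulli parameter θ (the specific setup is my illustration): whatever θ is, i.e. whatever prior generated it, the empirical mean of n draws lands within ε of θ with probability at least 1 − 2·exp(−2nε²).

```python
import math
import random

def coverage(theta, n, eps, trials=2000, seed=0):
    # Fraction of trials where the empirical mean of n Bernoulli(theta)
    # draws lands within eps of the true theta.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < theta for _ in range(n)) / n
        hits += abs(mean - theta) <= eps
    return hits / trials

n, eps = 200, 0.1
bound = 1 - 2 * math.exp(-2 * n * eps ** 2)  # Hoeffding lower bound
for theta in (0.1, 0.5, 0.9):                # holds for every theta: no prior needed
    assert coverage(theta, n, eps) >= bound
print(round(bound, 4))  # 0.9634
```

The point being illustrated: the bound is a property of the method, quantified over all possible parameter values, rather than an expectation under some particular prior.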
I think those are hard to separate. Bad social circumstances can make people act badly. There's the "hurt people hurt people" truism and numerous examples of people being caused to act morally worse by their circumstances e.g. in war. I do think I have gone through extraordinary measures to understand the ways in which I act badly (often in response to social cues) and to act more intentionally well.