All of jessicata's Comments + Replies

I think those are hard to separate. Bad social circumstances can make people act badly. There's the "hurt people hurt people" truism and numerous examples of people being caused to act morally worse by their circumstances e.g. in war. I do think I have gone through extraordinary measures to understand the ways in which I act badly (often in response to social cues) and to act more intentionally well.

1Thoth Hermes1mo
Yes, but the point is that we're trying to determine if you are under "bad" social circumstances or not. Those circumstances will not be independent from other aspects of the social group, e.g. the ideology it espouses externally and things it tells its members internally.

What I'm trying to figure out is to what extent you came to believe you were "evil" on your own versus you were compelled to think that about yourself. You were and are compelled to think about ways in which you act "badly" - nearby or adjacent to a community that encourages its members to think about how to act "goodly." It's not a given, per se, that a community devoted explicitly to doing good in the world thinks that it should label actions as "bad" if they fall short of arbitrary standards. It could, rather, decide to label actions people take as "good" or "gooder" or "really really good" if it decides that most functional people are normally inclined to behave in ways that aren't necessarily un-altruistic or harmful to other people.

I'm working on a theory of social-group-dynamics which posits that your situation is caused by "negative-selection groups" or "credential-groups", which are characterized by their tendency to label only their activities as actually successfully accomplishing whatever it is they claim to do - e.g., "rationality" or "effective altruism." If it seems like the group's ideology or behavior implies that non-membership is tantamount to either not caring about doing well or being incompetent in that regard, then it is a credential-group.

Credential-groups are bad social circumstances, and in a nutshell, they act badly by telling members who they know not to be intentionally causing harm that they are harmful or bad people (or mentally ill).

It was partially to demonstrate that bad Nash equilibria even affect common-payoff games, there don't even need to be dynamics of some agents singling out other agents to reward and punish.

It wasn't just that, it was also based on thinking I had more control over other people than I realistically had. Probably it is partly latent personality factors. But a heroic responsibility mindset will tend to cause people to think other people's actions are their fault if they could, potentially, have affected them through any sort of psychological manipulation (see also, Against Responsibility).

I think I thought I was working on AI risk but wasn't taking heroic responsibility because I wasn't owning the whole problem. People around me encouraged me to... (read more)

1Thoth Hermes1mo
This is cool because what you're saying has useful information pertinent to model updates regardless of how I choose to model your internal state.

Here's why it's really important: You seem to have been motivated to classify your own intentions as "evil" at some point, based entirely on things that were not entirely under your own control. That points to your social surroundings as having pressured you to come to that conclusion (I am not sure it is very likely that you would have come to that conclusion on your own, without any social pressure).

So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?

In the round after the one where the 30 applies, the Schelling temperature increases to 100, and it's a Nash equilibrium for everyone to always select the Schelling temperature.

You can claim this is an unrealistic Nash equilibrium but I am pretty sure that unilateral deviation from the Schelling temperature, assuming everyone else always plays the Schelling temperature, never works out in anyone's favor.

If a mathematical model doesn't reflect at all the thing it's supposed to represent, it's not a good model. Saying "this is what the model predicts" isn't helpful.

There is absolutely zero incentive to anyone to put the temperature to 100 at any time. Even as deterrence, there is no reason for the equilibrium temperature to be an unsurvivable 99. It makes no sense, no one gains anything from it, especially if we assume communication between the parties (which is required for there to be deterrence and other such mechanisms in place). There is no reason to p... (read more)

Formally, it's an arbitrary strategy profile that happens to be a Nash equilibrium, since if everyone else plays it, they'll punish if you deviate from it unilaterally.

In terms of more realistic scenarios, there are some examples of bad "punishing non-punishers" equilibria that people have difficulty escaping, e.g. an equilibrium with honor killings, where parents kill their own children partly because they expect to be punished if they don't. Robert Trivers, an evolutionary psychologist, has studied these equilibria, as they are anomalous from an evolutionary psychology perspective.

5dr_s2mo
This doesn't really answer the question. If some prisoner turns the dial to 30, everyone gets higher utility the next round, with no downside. In order to have some reason to not put it to 30, they need some incentive (e.g. that if anyone puts it strictly below average they also get an electric shock or whatever).

I'm saying it's a Nash equilibrium, not that it's particularly realistic.

They push it to 100 because they expect everyone else to do so, and they expect that if anyone sets it to less than 100, the equilibrium temperature in the round after that will be 100 instead of 99. If everyone else is going to select 100, it's futile to individually deviate and set the temperature to 30, because that means in the next round everyone but you will set it to 100 again, and that's not worth being able to individually set it to 30 in this round.

1Aorou1mo
After giving it some thought, I do see a lot of real-life situations where you get to such a place. For instance: I was recently watching The Vow, the documentary about the NXIVM cult ("nexium"). In very broad strokes, one of the core fucked-up things the leader does is to gaslight the members into thinking that pain is good. If you resist him, don't like what he says, etc., there is something deficient in you. After a while, even when he's not in the picture, so it would make sense for everyone to suffer less and get some slack, people punish each other for being deficient or weak. And now that I wrote it about NXIVM, I imagine those dynamics are actually commonplace in everyday society too.
2Aorou2mo
Gotcha. Thanks for clarifying! 

Start by playing 99. If someone played less than they were supposed to last round, you're now supposed to play 100. Otherwise, you're now supposed to play 99.

I think what people are missing (I know I am) is where does the "supposed to" come from?  I totally understand the debt calculation to get altruistic punishment for people who deviate in ways that hurt you - that's just maximizing long-term expectation through short-term loss.  I don't understand WHY a rational agent would punish someone who is BENEFITTING you with their deviant play.

I'd totally get it if you reacted to someone playing MORE than they were supposed to.  But if someone plays less than, there's no debt or harm to punish.

This is specifically for Nash equilibria of iterated games. See the folk theorems Wikipedia article.

After someone chooses 30 once, they still get to choose something different in future rounds. In the strategy profile I claim is a Nash equilibrium, they'll set it to 100 next round like everyone else. If anyone individually deviates from setting it to 100, then the equilibrium temperature in the next round will also be 100. That simply isn't worth it, if you expect to be the only person setting it less than 100. Since in the strategy profile I am constructing everyone does set it to 100, that's the condition we need to check to verify that it's a Nash equilibrium.
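The deviation arithmetic can be checked with a small simulation. This is a sketch under assumptions taken from the surrounding thread: the felt temperature is the average of everyone's dials, per-round utility is minus that average, there are 100 players, and there is no discounting over a fixed horizon. The `play` helper and its parameters are illustrative, not from the original post.

```python
def play(rounds, deviant_moves, n=100):
    """Total utility of player 0 when everyone else follows the profile:
    play the equilibrium temperature (99, or 100 for one round after any
    sub-equilibrium dial last round). deviant_moves maps round -> dial for
    player 0; in rounds not listed, player 0 also conforms."""
    eq_temp = 99
    total = 0.0
    for t in range(rounds):
        my_dial = deviant_moves.get(t, eq_temp)
        temps = [eq_temp] * (n - 1) + [my_dial]
        total += -sum(temps) / n  # everyone suffers the average temperature
        # a dial below equilibrium triggers one punishment round at 100
        eq_temp = 100 if my_dial < eq_temp else 99
    return total

conform = play(10, {})
deviate = play(10, {0: 30})  # set the dial to 30 once, in round 0
print(conform, deviate, deviate < conform)
```

With these numbers, deviating to 30 lowers the average by 0.69 degrees for one round but raises it by a full degree in the punishment round, so the deviation comes out behind, matching the argument above.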

3tgb2mo
I guess the unstated assumption is that the prisoners can only see the temperatures of others from the previous round and/or can only change their temperature at the start of a round (though one tried to do otherwise in the story). Even with that it seems like an awfully precarious equilibrium since if I unilaterally start choosing 30 repeatedly, you'd have to be stupid to not also start choosing 30, and the cost to me is really quite tiny even while no one else ever 'defects' alongside me. It seems to be too weak a definition of 'equilibrium' if it's that easy to break - maybe there's a more realistic definition that excludes this case?

I rewrote part of the post to give an equilibrium that works with a discount rate as well.

"The way it works is that, in each round, there's an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower tem... (read more)

I am confused. Why does everyone else select the equilibrium temperature? Why would they push it to 100 in the next round? You never explain this.

I understand you may be starting off a theorem that I don’t know. To me the obvious course of action would be something like: the temperature is way too high, so I’ll lower the temperature. Wouldn’t others appreciate that the temperature is dropping and getting closer to their own preference of 30 degrees ?

Are you saying what you’re describing makes sense, or are you saying that what you’re describing is a weird (and meaningless?) consequence of Nash theorem?

Ah, here's a short proof of a folk theorem: *

But it doesn't show that it's a subgame perfect equilibrium. This paper claims to prove it for subgame perfect equilibria, although I haven't checked it in detail.

2jessicata2mo
I rewrote part of the post to give an equilibrium that works with a discount rate as well. "The way it works is that, in each round, there's an equilibrium temperature, which starts out at 99. If anyone puts the dial less than the equilibrium temperature in a round, the equilibrium temperature in the next round is 100. Otherwise, the equilibrium temperature in the next round is 99 again. This is a Nash equilibrium because it is never worth deviating from. In the Nash equilibrium, everyone else selects the equilibrium temperature, so by selecting a lower temperature, you cause an increase of the equilibrium temperature in the next round. While you decrease the temperature in this round, it's never worth it, since the higher equilibrium temperature in the next round more than compensates for this decrease."

I only skimmed the paper, it was linked from Wikipedia as a citation for one of the folk theorems. But, it's easy to explain the debt-based equilibrium assuming no discount rate. If you're considering setting the temperature lower than the equilibrium, you will immediately get some utility by setting the temperature lower, but you will increase the debt by 1. That means the equilibrium temperature will be 1 degree higher in 1 future round, more than compensating for the lower temperature in this round. So there is no incentive to set the temperature lower ... (read more)

3Dagon2mo
I understand the debt calculation as group-enforced punishment for defection.  It's a measure of how much punishment is due to bring the average utility of an opponent back down to expectation, after that opponent "steals" utility by defecting when they "should" cooperate.  It's not an actual debt, and not symmetrical around that average.  In fact, in the temperature example, it should be considered NEGATIVE debt for someone to unilaterally set their temperature lower.
2jessicata2mo
Ah, here's a short proof of a folk theorem: * [https://www.cs.ubc.ca/~kevinlb/teaching/cs532a%20-%202003-4/folk.pdf] But it doesn't show that it's a subgame perfect equilibrium. This paper [https://link.springer.com/article/10.1007/s00182-020-00735-z] claims to prove it for subgame perfect equilibria, although I haven't checked it in detail.

I believe many of the theorems have known proofs (e.g. this paper). Here's an explanation of the debt mechanic:

Debt is initially 0. Equilibrium temperature is 99 if debt is 0, otherwise 100. For everyone who sets the temperature less than equilibrium in a round, debt increases by 1. Debt decreases by 1 per round naturally, unless it was already 0.
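A toy implementation of this debt rule may make it concrete. This is a sketch: the decay-then-add ordering within a round, the 100-player count, and the round horizon are my assumptions, chosen so that a single deviation yields exactly one punishment round.

```python
def step(debt, dials, eq_temp):
    # Debt decays by 1 per round (floored at 0), then rises by 1 for each
    # player who set their dial below the equilibrium temperature.
    deviators = sum(1 for d in dials if d < eq_temp)
    return max(debt - 1, 0) + deviators

def run(deviations_by_round, n=100, rounds=6):
    """Equilibrium temperature per round; deviations_by_round maps
    round -> {player index: dial} for any off-profile dials."""
    debt, history = 0, []
    for t in range(rounds):
        eq_temp = 99 if debt == 0 else 100
        dials = [eq_temp] * n
        for player, dial in deviations_by_round.get(t, {}).items():
            dials[player] = dial
        history.append(eq_temp)
        debt = step(debt, dials, eq_temp)
    return history

print(run({}))            # no deviations: [99, 99, 99, 99, 99, 99]
print(run({0: {0: 30}}))  # one deviation in round 0: [99, 100, 99, 99, 99, 99]
```

One unit of debt becomes one round at 100 instead of 99, which is the "1 degree higher in 1 future round" compensation described in the parent comment.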

5Dagon2mo
Hmm.  A quick reading of that paper talks about punishment for defection, not punishment for unexpected cooperation.   Can you point to the section that discusses the reason for the "debt" concept as applied to deviations that benefit the player in question? Note that I'm going to have to spend a bit more time on the paper, because I'm fascinated by the introduction of a discount rate to make the punishment non-infinite.  I do not expect to find that it mandates punishment for unexpected gifts of utility.

If you think you're responsible for everything, that means you're responsible for everything bad that happens. That's a lot of very bad stuff, some of which is motivated by bad intentions. An entity who's responsible for that much bad stuff couldn't be like a typical person, who is responsible for a modest amount of bad stuff. It's hard to conceptualize just how much bad stuff this hypothetical person is responsible for without supernatural metaphors; it's far beyond what a mere genocidal dictator like Hitler or Stalin is responsible for (at least, if you ... (read more)

I have a version of heroic responsibility in my head that I don’t think causes one to have false beliefs about supernatural phenomena, so I’m interested in engaging on whether the version in my head makes sense, though I don’t mean to invalidate your strongly negative personal experiences with the idea.

I think there’s a difference between causing something and taking responsibility for it. There’s a notion of “I didn’t cause this mess but I am going to clean it up.” In my team often a problem arises that we didn’t cause and weren’t expecting. A few months ... (read more)

“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” –HPMOR, chapter 75.

I think a typical-ish person actually doing this doesn't look like them risin... (read more)

3Portia1mo
I share your concern and insight, yet I also strongly identify with what Eliezer calls heroic responsibility, and have found it an empowering concept. For me, it resonates with two groups of fundamental values and assumptions:

Group 1:

1. If something evil is happening, do not assume someone else has already stepped forward and is competently handling it unless proven otherwise. If everyone thinks someone is handling it, likely, no one is; step up, and verify. (Bystander effect: if you hear someone screaming faintly in the distance, and think, there are a hundred people between me and the screaming one, surely someone has alerted the authorities... stop assuming this, right now, verify.) In these scenarios, I will happily hand over to someone more qualified who will handle the thing better. But this often involves handling it while alerting the people who should, pushing them repeatedly until they actually show up, and staying on site and doing what you can until they do and you are sure they will actually take over.
2. New forms of evil often have no one who was assigned responsibility yet; someone needs to choose to take it - and on this point, see 1. (Relevant for relatively novel problems like AI alignment.)
3. Enormous forms of evil are too big for any one person to handle, so assume you need to chip in, even if responsible people exist. (E.g. politicians ought to handle the climate crisis; but they can't, so each of us needs to help.)
4. Existential evil is the responsibility of everyone, no matter how weak, yourself included. If you lived in Nazi Germany while the Jews were being exterminated, you had the responsibility to help, no matter who you were and what you did. There is no "this is not my job". If you are human, it is. There is something each of us can do, always. Start small - something is better than nothing - but do not stop building. Recognise contemporary
1Thoth Hermes1mo
Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too? I imagine that this conclusion must at least be partly based on latent personality factors as well. But if so, I'm very curious as to how these things jibe with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI risk and wanting to avert the destruction of the world?
0DivineMango2mo
I agree with this, thanks for the feedback! Edited.
4Ben Pace2mo
My other comment notwithstanding, I do think the HPMOR quote is not very helpful for someone's mental health when they're in pain, and it seems a bit odd placed atop a section on advice; advice at the wrong time can feel oppressive. The hero-licensing post feels much less likely to leave one feeling oppressed by every bad thing that happens in the world. And personally I found Anna's post [https://www.lesswrong.com/posts/mmHctwkKjpvaQdC3c/what-should-you-change-in-response-to-an-emergency-and-ai] linked earlier to be much more helpful advice that is related to and partially upstream of the sorts of changes in my life that have reduced a lot of anxiety. If it were me I'd probably put that at the top of the list there, perhaps along with Come to Your Terms [https://mindingourway.com/come-to-your-terms/] by Nate, which also resonates strongly with me.

(Looking further) I see, the point of that section isn't to be "the advice section", it's to be "the advice posts that don't talk about AI". I still think something about that is confusing. My first guess is that I'd structure a post like this like an FAQ: "Are you feeling X because Y? Then here's two posts that address this," and so on, so that people can find the bit that is relevant to their problem. But not sure.
5Ben Pace2mo
I can understand thinking of yourself as having evil intentions, but I don't understand believing you're a partly-demonic entity.

I think the way that the global market and culture can respond to ideas is strange and surprising, with people you don't know taking major undertakings based on your ideas, with lots of copying and imitation and whole organizations or people changing their lives around something you did without them ever knowing you. Like the way that Elon Musk met a girlfriend of his via a Roko's Basilisk meme, or one time someone on reddit I don't know believed that an action I'd taken was literally "the AGI" acting in their life (which was weird for me).

I think that one can make straightforward mistakes in earnestly reasoning about strange things (as is argued in this [https://astralcodexten.substack.com/p/contra-kavanaugh-on-fideism] Astral Codex Ten post that IIRC argues that conspiracy theories often have surprisingly good arguments for them that a typical person would find persuasive on their own merits). So I'm not saying that really trying to act on a global scale on a difficult problem couldn't cause you to have supernatural beliefs.

But you said it's what would happen to a 'typical-ish person'. If you believe a 'typical-ish person' trying to have an epistemology will reliably fail in ways that lead to them believing in conspiracies, then I guess yes, they may also come to have supernatural beliefs if they try to take action that has massive consequences in the world. But I think a person with just a little more perspective can be self-aware about conspiracy theories and similarly be self-aware about whatever other hypotheses they form, and try to stick to fairly grounded ones. It turns out that when you poke civilization the right way, it does a lot of really outsized and overpowered things sometimes. I imagine it was a trip for Doug Engelbart to watch everyone in the world get a personal computer, with a computer mouse and a graphical user-

Read The Doomsday Machine. The US Air Force is way less of a defensive or utilitarian actor than you are implying, e.g. for a significant period of time the only US nuclear war plan (which was hidden from Kennedy) involved bombing as many Chinese cities as possible even if it was Russia who had attacked the US. (In general I avoid giving the benefit of the doubt to dishonest violent criminals even if they call themselves "the government", but here we have extra empirical data)

1Gerald Monroe4mo
I am not arguing that. I know the government does bad things and I read other books on that era. I was really just noting the consequences of an alternate policy might not have been any better.

National Intelligence Estimate (NIE) 11-10-57, issued in December 1957, predicted that the Soviets would "probably have a first operational capability with up to 10 prototype ICBMs" at "some time during the period from mid-1958 to mid-1959." The numbers started to inflate. A similar report gathered only a few months later, NIE 11-5-58, released in August 1958, concluded that the USSR had "the technical and industrial capability... to have an operational capability with 100 ICBMs" some time in 1960 and perhaps 500 ICBMs "some time in 1961, or at the latest

... (read more)
1Gerald Monroe4mo
So by presenting a "potential" number of missiles as "the soviets had this many" what were the consequences? This led to more funding for the USA to build weapons, which in turn caused the soviets to build more? Or did it deter a "sneak attack" where the soviets could build in secret far more missiles and win a nuclear war? Basically until the USA had enough arms for "assured destruction" this was a risk. A more realistic view of how many missiles the Soviets probably had extends the number of years there isn't enough funding to pay for "assured destruction". Then again maybe the scenario where by the 1980s, civilization ending numbers of missiles were possessed by each side could have been avoided. My point here is that a policy of honesty may not really work in a situation where the other side is a bad actor.

There's a lot in this post that I agree with, but in the spirit of the advice in this post, I'll focus on where I disagree:

If you are moving closer to truth—if you are seeking available information and updating on it to the best of your ability—then you will inevitably eventually move closer and closer to agreement with all the other agents who are also seeking truth.

But this can’t be right. To see why, substitute “making money on prediction markets” for “moving closer to truth”, “betting” for “updating”, and “trying to make money on prediction market

... (read more)
1TAG5mo
In Popperian epistemology, it's a virtue to propose hypotheses that are easily disproven... which isn't the same thing as always incrementally moving towards truth: it's more like babble-and-prune. Of course, the instruction to converge on truth doesn't quite say "get closer to truth in every step - no backtracking" - it's just that Bayesians are likely to take it that way. And of course, epistemology is unsolved. No one can distill the correct theoretical epistemology into practical steps, because no one knows what it is in the first place.

I think I would have totally agreed in 2016. One update since then is that I think progress scales with resources far less than I used to think it did. In many historical cases, a core component of progress was driven by a small number of people (which is reflected in citation counts, and in who is actually taught in textbooks), and introducing lots of funding and scaling too fast can disrupt that by increasing the amount of fake work.

$1B in safety well-spent is clearly more impactful than $1B less in semiconductors, it's just that "well-spent" is doing a lot of work... (read more)

Oh I see how the formula follows from that assumption.

The probability a given solution is valid to a given problem is p. The probability that a given solution is invalid is 1 − p. The probability that all n candidate solutions are invalid is (1 − p)^n. The probability that not all are invalid (i.e., there is some valid solution) is 1 − (1 − p)^n.

The probability that all m problems are solved is (1 − (1 − p)^n)^m. The probability of doom (that not all problems are solved) is 1 − (1 − (1 − p)^n)^m.

For , , Google thinks this number is 0.
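As a numeric sketch of the formula: the particular values in the original comment didn't survive in this copy, so the p, n, and m below are stand-ins, not the values that produced the 0 mentioned above.

```python
# Hypothetical values: p = chance a given solution is valid,
# n = candidate solutions per problem, m = problems to solve.
p, n, m = 0.01, 1000, 10

p_some_solution = 1 - (1 - p) ** n   # at least one of the n solutions works
p_all_solved = p_some_solution ** m  # every one of the m problems is solved
p_doom = 1 - p_all_solved
print(p_doom)
```

With enough independent candidate solutions per problem, the doom probability can become tiny even when each individual solution is very unlikely to be valid.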

1Cleo Nardo6mo
Yep, the crux is: do we need a unique solution which solves all our problems, or can we accept that different problems are solved by different solutions? I somewhat lean to the former.

I don't especially think AI capabilities increases are bad on the margin, but if I did I would think of this as a multilateral disarmament problem where those who have the most capabilities (relative to something else, population/economy) and worst technological coordination should disarm first, similar to nukes; that would currently indicate US and UK over China etc. China has more precedent for government control over the economy than the West, so could more easily coordinate AI slowdown.

If the falsifying Turing-computable agent has access to the oracle A' and a different oracle A'', and A' and A'' give different answers on some Turing machine (which must never halt if A' and A'' are arbitration oracles), then there is some way to prove that by exhibiting this machine.

The thing I'm proving is that, given that the falsifier knows A' is an arbitration oracle, that doesn't help it falsify that oracle B satisfies property P. I'm not considering a case where there are two different putative arbitration oracles. In general it seems hard f... (read more)

2interstice10mo
Hmm, let me put it this way -- here's how I would (possible incorrectly) summarize the proof: "For a given oracle O with property P, you can't falsify that O uses only arbitration-oracle levels of hypercomputation, even with access to an arbitration oracle A'. This is because there could be a Turing machine T in the background, with access to an arbitration oracle A, which controls the output of both A' and O, in such a way that A' being an arbitration oracle and O having property P cannot be disproved in PA" I'm wondering whether we can remove the condition that the background Turing machine T controls the output of A', instead only allowing it to control the output of O, in such a way that "A' being an arbitration oracle and O having property P" cannot be disproved in PA for all possible arbitration oracles A'. This would be a scenario where there are two possibly-distinct arbitration oracles, the oracle A' used by the verifier and the oracle A used by the background Turing machine T.

Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?

2Chris_Leong10mo
Speedup on evolution? Maybe? Might work okayish, but doubt the best solution is that speculative.

Note the preceding

Let's first, within a critical agential ontology, disprove some very basic forms of determinism.

I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.

Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.

4Chris_Leong10mo
  I've noticed that issue as well. Counterfactuals are more a convenient model/story than something to be taken literally. You've grounded decision by taking counterfactuals to exist a priori. I ground them by noting that our desire to construct counterfactuals is ultimately based on evolved instincts and/or behaviours so these stories aren't just arbitrary stories but a way in which we can leverage the lessons that have been instilled in us by evolution. I'm curious, given this explanation, why do we still need choices to be actual?
1JBlack10mo
There isn't really any need for "choices", except in the sense of "internal states are also inputs that affect the outputs". Sufficiently complex agents can have one or more decision theories encoded into their internal state, and it seems like being able to communicate, evaluate, and update such theories would at least sometimes be a useful trait. It's easy to imagine agents that can't do that, but it's more interesting (and more reflective of "intelligence") to assume they can.

AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.

2Chris_Leong10mo
I commented directly on your post.
3shminux10mo
No disrespect to David Deutsch, but every time I try to make sense of constructor theory I run into objections like "there are solutions to the Einstein equations, like Godel universe and Kerr black hole, that contain closed timelike curves and so are not derivable with any constructor... meaning they are "impossible". But one can happily solve the covariant version, not the ADM-decomposed version and find these solutions without any transformations required by the constructor theory." I might be missing something here... on the other hand, Deutsch insists that the double-slit experiment is evidence of MWI (it's not), so some skepticism is warranted.

To expand on strawberries vs diamonds:

It seems to me that the strawberry problem is likely easier than the "turn the universe into diamond" problem. Immediate reasons:

  • the strawberry problem is bounded in space and time
  • strawberry materials can be conveniently placed close to the strawberry factory
  • turning the universe into diamond requires nanobots to burrow through a variety of materials
  • turning the universe into diamond requires overcoming all territorial adversaries trying to protect themselves from nanobots
  • turning the universe into diamond requires
... (read more)

I didn't write that reply (or this one) using the method. IMO it's more appropriate to longform.

AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do. “AI smart enough to improve itself” is not a crucial threshold, AI systems will get gradually better at improving themselves. Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research), but I think this is mostly unjustified. If Eliezer doesn’t believe this, then his arguments about the alignment problem that hum

... (read more)

My sense is that we are on broadly the same page here. I agree that "AI improving AI over time" will look very different from "humans improving humans over time" or even "biology improving humans over time." But I think that it will look a lot like "humans improving AI over time," and that's what I'd use to estimate timescales (months or years, most likely years) for further AI improvements.

“myopia” (not sure who correctly named this as a corrigibility principle),

I think this is from Paul Christiano, e.g. this discussion.

I've been thinking recently about AI alignment perhaps being better thought of as a subfield of cognitive science than either AI (since AI focuses on artificial agents, not human values) or philosophy (since philosophy is too open-ended); cognitive science is a finite endeavor (due to the limited size of the human brain) compatible with executable philosophy.

It seems to me that an approach that would "work" for AI alignment (in the sense of solving or reframing it) would be to understand the human mind well enough to determine whether it has "values" / "be... (read more)

3Jan1y
Great point! And thanks for the references :)  I'll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)

Yeah, it’s not going so well. It is in fact going so incredibly poorly that so far the whole thing is quite plausibly vastly net negative, with most funding that has gone into “AI Safety” efforts serving as de facto capabilities research that both speeds things up and divides them and is only serving to get us killed faster.

I'm pretty curious about your, Eliezer's, or others' opinions on when AI safety started being net negative. Was very early concern (by Minsky, I.J. Good, etc) net negative? What about Eliezer writing the parts of the Sequences that w... (read more)

OpenAI was the point where the turning point became visible; obviously the actual turn must have occurred before then.  Arguably it was with DeepMind, since it looks (from way outside) like Demis has basically lost the struggle for reliable/final control of it inside Google.

Did Kant have the concept of knowledge that we are born with, which nonetheless is contingent on how the world happens to be?

I am not sure how much he has this. He grants that his philosophy applies to humans, not to other possible minds (e.g. God); it's a contingent fact that, unlike God, we don't produce things just by conceiving of them. And since he thinks spacetime is non-analytic, he grants that in a sense it "could" be otherwise, it's just that that "could" counterfactual structure must branch before we get empirical observations. But he doesn't ... (read more)

the axiomatisation of logic, geometry, and all of mathematics

Euclid's Elements predated Kant.

Nowadays we would say that the symmetries of the mathematical, geometric square are a theorem derivable from axioms, and that the symmetries of a wooden block are a physical property of a system that empirically satisfies such axioms.

I think the main problem with this is that it requires the wooden block rotation to be an empirical fact. It seems like with enough sense of space, it wouldn't require empirically observing rotating blocks to predict that a squa... (read more)
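As an aside, "the symmetries of the square are a theorem derivable from axioms" can be checked mechanically: the eight symmetries of the square (rotations and reflections) are exactly the relabelings of its corners that preserve adjacency. A minimal sketch:

```python
from itertools import permutations

# Corners of the square labeled 0..3 in cyclic order; edges connect
# cyclically adjacent corners.
edges = {frozenset({0, 1}), frozenset({1, 2}),
         frozenset({2, 3}), frozenset({3, 0})}

# A symmetry is a permutation of corners that maps the edge set to itself.
symmetries = [p for p in permutations(range(4))
              if {frozenset({p[a], p[b]}) for a, b in
                  ((0, 1), (1, 2), (2, 3), (3, 0))} == edges]

print(len(symmetries))  # 8: the dihedral group D4
```

No empirical observation of wooden blocks enters the computation; the group of order 8 falls out of the adjacency structure alone.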

3Richard_Kennaway1y
The "sense of space" is empirical data. It is not derivable from Euclidean geometry, but results from the empirical fact that the space we live in is Euclidean (on human scales). Even if it's embedded in our nervous system at birth (I do not know if it is), it's still empirical data. If space were not like that, we would not have (evolutionarily) developed to perceive it like that. Did Kant have the concept of knowledge that we are born with, which nonetheless is contingent on how the world happens to be? I'll grant Coulomb's law for electromagnetism, but geology and chemistry were mainly catalogues of observations, and philosophical aspects of steam engines had to wait for 19th century thermodynamics. Geology was producing the idea of a discoverable timeline for the Earth's changes, and chemistry was groping towards the idea of elements, but that is small potatoes compared with their development in the 19th century for chemistry and the 20th for geology. The 19th century in geology was mainly filling in the timeline in more and more detail across more of the Earth, and some wrong estimates for its age.

Why is space necessary? “External” seems like a good description of the relationship of objective stuff to minds, but it seems that relationship could be well-described in non-spatial terms. E.g. “reality is that which, when you stop believing in it, doesn’t stop affecting you”. (Though I had to modify the “go away”.)

Such a sense of reality might be external in the sense of "unpredictable" but not in the sense of "apparently outside me".

I doubt or don’t understand this. I agree that time is the form of the inner sense, but it’s also the

... (read more)

See Bayesian Games, it handles agent types being sampled from a joint distribution over types.
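A minimal sketch of the relevant structure (types, payoffs, and strategies here are all hypothetical, for illustration only): each player's type is drawn from a joint, possibly correlated, distribution, each strategy maps own type to an action, and ex-ante expected utility averages over the joint distribution.

```python
# Joint distribution over (type1, type2); correlation between the
# players' types is allowed, unlike with independent type draws.
joint_types = {("hi", "hi"): 0.4, ("hi", "lo"): 0.1,
               ("lo", "hi"): 0.1, ("lo", "lo"): 0.4}

def payoff1(t1, t2, a1, a2):
    # Hypothetical payoff: player 1 wants matching actions exactly
    # when the types match.
    return 1.0 if (a1 == a2) == (t1 == t2) else 0.0

def expected_payoff1(strategy1, strategy2):
    # Ex-ante expectation over the joint type distribution.
    return sum(p * payoff1(t1, t2, strategy1[t1], strategy2[t2])
               for (t1, t2), p in joint_types.items())

s1 = {"hi": "A", "lo": "B"}  # play A on type "hi", B on type "lo"
s2 = {"hi": "A", "lo": "B"}
print(expected_payoff1(s1, s2))  # 1.0
```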

There's a Haskell implementation of modal UDT.

Previous discussion of modal UDT: 1, 2.

FDT is a family of decision theories; modal UDT is a specific algorithm. As stated it requires a hypercomputer, though it has bounded variants.

You're right that the uploading case wouldn't necessarily require strong algorithmic insight. However, it's a kind of bounded technical problem where progress is relatively easy to evaluate relative to the difficulty, e.g. based on the ability to upload smaller animal brains, so it would lead to >40-year timelines absent large shifts in the field or large drivers of progress. It would also lead to a significant degree of alignment by default.

For copying culture, I think the main issue is that culture is a protocol that runs on human brains, not on computers. ... (read more)

Some ways of giving third parties Bayesian evidence that you have some secret without revealing it:

  • Demos, show off the capability somehow
  • Have the idea evaluated by a third party who doesn't share it with the public
  • Do public work that is impressive the way you're claiming the secret is (so it's a closer analogy)

I'm not against "tenure" in this case. I don't think it makes sense for people to make their plans around the idea that person X has secret Y unless they have particular reason to think secret Y is really important and likely to be possessed by... (read more)

Wouldn't the AC unit have to intake cool air from the room (since it's expelling cold air into the room), and mix the cool air with the warm outside air? (Maybe the numbers work out differently in this condition but I'm not convinced yet, would have to see a calculation)

A two hose AC does take in both indoor and outdoor air, but they never mix. (The two hoses both carry outdoor air; indoor air is pumped through two vents in the AC.) The AC just pumps heat from the indoor air to the outdoor air. Similar to a fridge.
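A rough energy-balance sketch of the difference (all numbers hypothetical): a single-hose unit exhausts conditioned indoor air outside, which pulls an equal volume of warm outdoor air in through gaps in the building; a dual-hose unit uses outdoor air for both sides of the heat exchange, so there is no infiltration penalty.

```python
# Hypothetical numbers for illustration only.
gross_cooling_btu_h = 12000.0      # heat pumped out of the indoor air
exhaust_cfm = 200.0                # single-hose: indoor air blown outside
indoor_f, outdoor_f = 75.0, 95.0   # indoor / outdoor temperatures (F)

# Rule-of-thumb sensible heat load of infiltrating outdoor air:
# BTU/h ~= 1.08 * CFM * delta_T (standard air).
infiltration_btu_h = 1.08 * exhaust_cfm * (outdoor_f - indoor_f)

net_single_hose = gross_cooling_btu_h - infiltration_btu_h
net_dual_hose = gross_cooling_btu_h    # no indoor air exhausted

print(net_single_hose, net_dual_hose)  # 7680.0 12000.0
```

With these made-up numbers the single-hose unit loses 4320 BTU/h of its gross cooling to infiltration; the real penalty depends on how leaky the room is and on the indoor/outdoor temperature difference.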

Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.

Here's my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.
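A minimal sketch of the 5-and-10 setup under EDT with a policy that gives every action positive probability, so the conditional expectation E[U | action] never divides by zero (the ε-exploration here is my illustrative assumption, not necessarily the linked post's construction):

```python
# The 5-and-10 problem: choose between $5 and $10. A conditional-
# expectation agent only hits "division by zero" if some action has
# probability zero; epsilon-exploration rules that out.
epsilon = 0.01
policy = {"take_5": epsilon, "take_10": 1 - epsilon}  # all actions possible
utility = {"take_5": 5.0, "take_10": 10.0}

def conditional_eu(action):
    # P(action) > 0, so E[U | action] is well-defined; here U depends
    # only on the action, so the conditional expectation is just U(action).
    assert policy[action] > 0
    return utility[action]

best = max(policy, key=conditional_eu)
print(best)  # take_10
```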

I assumed EER did account for that based on:

All portable air conditioner’s energy efficiency is measured using an EER score. The EER rating is the ratio between the useful cooling effect (measured in BTU) to electrical power (in W). It’s for this reason that it is hard to give a generalized answer to this question, but typically, portable air conditioners are less efficient than permanent window units due to their size.
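For concreteness, the EER definition quoted above is just a ratio (numbers hypothetical):

```python
# EER = useful cooling effect (BTU/h) / electrical input power (W).
cooling_btu_h = 10000.0
power_w = 1000.0
eer = cooling_btu_h / power_w
print(eer)  # 10.0
```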

5habryka1y
This article explains the difference: https://www.consumeranalysis.com/guides/portable-ac/best-portable-air-conditioner/ [https://www.consumeranalysis.com/guides/portable-ac/best-portable-air-conditioner/] EER measures performance in BTUs, which are simply measuring how much work the AC performs, without taking into account any backflow of cold air back into the AC, or infiltration issues.

Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out, overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions, and claims the efficiency difference is only about 12%.

Highlights:

In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being

... (read more)
3habryka1y
EER does not account for heat infiltration issues, so this seems confused. CEER does, and that does suggest something in the 20% range, but I am pretty sure you can't use EER to compare a single-hose and a dual-hose system.

Well, at some point in the writing process there is an opportunity to edit the text; I specifically didn't for this post because I wanted to demonstrate the raw output of the process, so as to make it easier to judge the process itself. Also, my discount rate is such that short-term gains in quality-weighted writing output are somewhat valuable.

1Aleksi Liimatainen1y
Did you write this reply using a different method? It has a different feel than the original post. Partway through reading your post, I noticed that reading it felt similar to reading GPT-3-generated text. That quality seems shared by the replies using the technique. This isn't blinded so I can't rule out confirmation bias. ETA: If the effect is real, it may have something to do with word choice or other statistical features of the text. It takes a paragraph or two to build and shorter texts feel harder to judge.

Are you saying it has a non-functional definition? What might that be, and would it allow for zombies? If it doesn't have a definition, how is it semantically meaningful?

0TAG1y
It has a standard definition which you can look up in standard references works. It's unreasonable to expect a definition to answer every possible question by itself.

I began reading this charitably (unaware of whatever inside baseball is potentially going on, and seems to be alluded to), but to be honest struggled after “X” seemed to really want someone (Eliezer) to admit they’re “not smart”? I’m not sure why that would be relevant.

I'm not sure exactly what is meant, one guess is that it's about centrality: making yourself more central (more making executive decisions, more of a bottleneck on approving things, more looked to as a leader by others, etc) makes more sense the more you're more correct about relevant thi... (read more)

6Wei Dai1y
I don't think this is a valid argument. Counter-example: you could build an AGI by uploading a human brain onto an artificial substrate, and you don't "need to create very general analysis and engineering tools that generalize across these situations" to do this. More realistically, it seems pretty plausible that all of the necessary patterns/rules/heuristics/algorithms/forms of reasoning necessary for "being generally intelligent" can be found in human culture, and ML can distill these elements of general intelligence into a (language or multimodal) model that will then be generally intelligent. This also doesn't seem to require very general analysis and engineering tools. What do you think of this possibility?
2Ben Pace1y
Thanks so much for the one-paragraph summary of The Debtors’ Revolt, that was clarifying.

Btw, there is some amount of philosophical convergence between this and some recent work I did on critical agential physics; both are trying to understand physics as laws that partially (not fully) predict sense-data starting from the perspective of a particular agent.

It seems like "infra-Bayesianism" may be broadly compatible with frequentism; extending Popper's falsifiability condition to falsify probabilistic (as opposed to deterministic) laws yields frequentist null hypothesis significance testing, e.g. Neyman Pearson; similarly, frequentism also attem... (read more)

4Vanessa Kosoy1y
Thanks, I'll look at that! Yes! In frequentism, we define probability distributions as limits of frequencies. One problem with this is, what to do if there's no convergence? In the real world, there won't be convergence unless you have an infinite sequence of truly identical experiments, which you never have. At best, you have a long sequence of similar experiments. Arguably, infrabayesianism solves it by replacing the limit with the convex hull of all limit points. But, I view infrabayesianism more as a synthesis between bayesianism and frequentism. Like in frequentism, you can get asymptotic guarantees. But, like in bayesiansim, it makes sense to talk of priors (and even updates), and measure the performance of your policy regardless of the particular decomposition of the prior into hypotheses (as opposed to regret which does depend on the decomposition). In particular, you can define the optimal infrabayesian policy even for a prior which is not learnable and hence doesn't admit frequentism-style guarantees.

The following is an edited partial chat transcript of a conversation involving me, Benquo, and an anonymous person (X). I am posting it in the hope that it has enough positive value-of-information compared to the attentional cost to be of benefit to others. I hope people can take this as "yay, I got to witness a back-room conversation" rather than "oh no, someone has made a bunch of public assertions that they can't back up"; I think it would be difficult and time-consuming to argue for all these points convincingly, although I can explain to some degree... (read more)

3TekhneMakre1y
Glad this is shared. There's an awkward issue here, which is: how can there be people who are financially supported to do research on stuff that's heavily entangled with ideas that are dangerous to spread? It's true that there are dangerous incentive problems here, where basically people can unaccountably lie about their private insight into dangerous issues; on the other hand, it seems bad for ideas to be shared that are more or less plausible precursors to a world-ending artifact. My understanding about Eliezer and MIRI is basically, Eliezer wrote a bunch of public stuff that demonstrated that he has insight into the alignment problem, and professed his intent to solve alignment, and then he more or less got tenure from EA. Is that not what happened? Is that not what should have happened? That seems like the next best thing to directly sharing dangerous stuff. I could imagine a lot of points of disagreement, like 1. that there's such a thing as ideas that are plausible precursors to world-ending artifacts; 2. that some people should be funded to work on dangerous ideas that can't be directly shared / evidenced; 3. that Eliezer's public writing is enough to deserve "tenure"; 4. that the danger of sharing ideas that catalyze world-ending outweighs the benefits of understanding the alignment problem better and generally coordinating by sharing more. The issue of people deciding to keep secrets is a separate issue from how *other people* should treat these "sorcerers". My guess is that it'd be much better if sorcerers could be granted tenure without people trusting their opinions or taking instructions from them, when those opinions and instructions are based on work that isn't shared. (This doesn't easily mesh with intuitions about status: if someone should be given sorcerer tenure, isn't that the same thing as them being generally trusted? But no, it's not, it should be perfectly reasonable to believe someone is a good bet to do well within their cabal, but n

I began reading this charitably (unaware of whatever inside baseball is potentially going on, and seems to be alluded to), but to be honest struggled after "X" seemed to really want someone (Eliezer) to admit they're "not smart"? I'm not sure why that would be relevant. 

I think I found these lines especially confusing, if you want to explain:

  • "I just hope that people can generalize from "alignment is hard" to "generalized AI capabilities are hard".

    Is capability supposed to be hard for similar reasons as alignment? Can you expand/link? The only argument
... (read more)

Do you think it was clear to over 90% of readers that the part where he says "April fools, this is just a test!" is not a statement of truth?

5johnlawrenceaspden1y
It's not clear to me. Maybe it is an April fool joke! 

What about objective priors?

With objective priors one can always ask "so what?" If it's not my subjective prior, then its posterior will not equal my subjective posterior. There isn't an obvious way to bound the difference between my subjective prior and the objective prior.

With frequentist methods it's possible to get guarantees like "no matter what prior over θ you start with, if you run this method, you'll correctly estimate θ to within ε with probability at least 1−δ". It's clear that a subjective Bayesian (with imperfect knowledge of their prior)... (read more)
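A sketch of the kind of prior-free guarantee meant here (Hoeffding-style, with hypothetical ε and δ): estimating a coin's bias to within ε with probability at least 1−δ, no matter what the true bias is.

```python
import math
import random

# Hoeffding: with n >= ln(2/delta) / (2 * eps**2) samples, the empirical
# mean is within eps of the true bias with probability >= 1 - delta,
# for ANY true bias -- no prior over the bias is needed.
eps, delta = 0.05, 0.05
n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def covered(true_bias, rng):
    flips = [1 if rng.random() < true_bias else 0 for _ in range(n)]
    return abs(sum(flips) / n - true_bias) <= eps

rng = random.Random(0)
# Check empirical coverage across several "adversarial" true biases.
for bias in (0.1, 0.5, 0.9):
    hits = sum(covered(bias, rng) for _ in range(200))
    assert hits / 200 >= 1 - delta  # coverage holds regardless of bias
print(n)  # sample size required by the bound
```

The bound is loose (actual coverage is well above 1−δ), but the point is that the guarantee quantifies over all possible values of the parameter rather than averaging over a prior.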

Load More