All of Tom Shlomi's Comments + Replies

Context windows could make the claim from the post correct. Since the simulator can only consider a bounded amount of evidence at once, its P[Waluigi] has a lower bound. Meanwhile, it takes much less evidence than fits in the context window to bring its P[Luigi] down to effectively 0.

Imagine that, in your example, once Waluigi outputs B it will always continue outputting B (if he's already revealed to be Waluigi, there's no point in acting like Luigi). If there's a context window of 10, then the simulator's probability of Waluigi never goes below 1/1025, w... (read more)
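A minimal sketch of the Bayesian update behind the 1/1025 figure, assuming a 50/50 prior over Luigi vs. Waluigi, that Luigi always outputs A, and that an unrevealed Waluigi outputs A or B with equal probability each step (assumptions implied, but not all spelled out, above):

```python
# A sketch of the update described above (my assumptions: 50/50 prior over
# Luigi vs. Waluigi; Luigi always outputs A; an unrevealed Waluigi outputs
# A or B with equal probability at each step).

def p_waluigi_given_all_As(context_window: int, prior: float = 0.5) -> float:
    """Posterior P[Waluigi] after a context window containing only A's."""
    likelihood_luigi = 1.0                       # Luigi outputs A with certainty
    likelihood_waluigi = 0.5 ** context_window   # Waluigi output A every time by chance
    return (likelihood_waluigi * prior) / (
        likelihood_waluigi * prior + likelihood_luigi * (1 - prior)
    )

print(p_waluigi_given_all_As(10))  # ~0.000976, i.e. 1/1025: the floor mentioned above
```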

The Constitutional AI paper, in a sense, shows that a smart alien with access to an RLHFed helpful language model can figure out how to write text according to a set of human-defined rules. It scares me a bit that this works well, and I worry that this sort of self-improvement is going to be a major source of capabilities progress going forward.

Talking about what a language model "knows" feels confused. There's a big distinction between what a language model can tell you if you ask it directly, what it can tell you if you ask it with some clever prompting, and what a smart alien could tell you after only interacting with that model. A moderately smart alien that could interact with GPT-3 could correctly answer far more questions than GPT-3 can even with any amount of clever prompting.

As a sort-of normative realist wagerer (I used to describe myself that way, and still have mostly the same views, but no longer consider it a good way to describe myself), I really enjoyed this post, but I think it misses the reasons the wager seems attractive to me.

To start, I don't think of the wager as being "if normative realism is true, things matter more, so I should act as if I'm a normative realist", but as being "unless normative realism is true, I don't see how I could possibly determine what matters, and so I should act as if I'm a normative re... (read more)

I don't know if I count as a nihilist, as it's unclear what precisely is meant by the term in this and other contexts. I don't think there are stance-independent normative facts, and I don't think anything "matters" independently of it mattering to people, but I find it strange to suggest that if nothing matters in the former sense, then nothing matters in the latter sense.

Compare all this to gastronomic realism and nihilism. A gastronomic realist may claim there are facts about what food is intrinsically tasty or not tasty that are true of that food i... (read more)

4Dagon1y
I don't know if I qualify or not. I neither like nor agree with the object-level morality of many who call themselves "nihilists", but I'm definitely an anti-realist in that I don't think there's an objective or observable "truth" to be had on ethical issues (or meta-ethical issues). I'm with you on disregarding Martha's question, and I think the problem of "what is true" goes as deep as you like in meta-meta-meta-etc. ethics. "If nihilism is true..." is not a valid start to a proposition. It's NOT true, nor is it false. It's a different dimension than truth. But there are still pretty strong "should" statements to be made. They're based on common preferences and observations of working equilibria, not directly testable. There IS truth in expressed and observed preferences and interactions among moral actors in various contexts (modern societies and subcultures). It is actually the case that some equilibria seem to work OK, and some seem less so, and it's very reasonable to have pro-social preferences about one's own and others' behavior. My "should" comes from observations and extensions of what things seem to make for a more pleasant/attractive world. Everyone SHOULD avoid burning people, even for a lot of money. I like the world where that's the common choice much better than the world where it isn't. That doesn't make it "true", and I can imagine contexts where there would be different common preferences and equilibria.

I really love this idea! Thanks for sharing this, I'm excited to try Calibrate.

How can list sorting be O(n)? There are n! ways to sort a list, which means that it's impossible to have a list sorting algorithm faster than O(log(n!)) = O(n*log(n)).
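As a quick illustration of where the n*log(n) figure comes from (the bound itself only applies to comparison-based sorts, which is the point the replies below pick up on), here is a numerical check that log2(n!) grows like n*log2(n), per Stirling's approximation:

```python
# Numerical check that log2(n!) grows like n*log2(n) (Stirling's approximation),
# which is where the comparison-sort lower bound comes from. The bound only
# applies to comparison-based sorts, as the replies below point out.
import math

for n in (10, 1_000, 1_000_000):
    log2_factorial = math.lgamma(n + 1) / math.log(2)  # log2(n!) without overflow
    print(n, round(log2_factorial), round(n * math.log2(n)))
```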

1deepthoughtlife1y
Morpheus is incorrect; I am actually referring to a generalization of 'counting sort' that I independently reinvented (where you are no longer limited to small positive numbers). I had never heard of sub-n-log-n sorts at that time either. It's actually quite simple. In fact, it's how humans prefer to sort things: if you look at the object and just know where it goes, put it there. The downside is just that there has to be a known place to put it. The O(n log n) lower bound is only for comparison sorts. On a computer, we can just make a place to put it.

If each value is unique, or we're sorting numbers, or values that are the same do not need further sorting, we can implement this in constant time per object being sorted. It also uses linear space, O(m + n), where n is the number of objects being sorted and m is the range of values they can hold. This only works well if m is reasonable in size, not being too much larger than n. For instance, you might be sorting integers between 0 and 100, requiring only a hundred and one extra spots in memory, while sorting 1,000,000,000 of them. (This is conceptually related to the bucket sort mentioned by Morpheus, but they are distinct, and this is truly linear.) The worst case is usually that you are sorting things whose values are unknown 32-bit integers, in which case you would need about 16GB of memory just for the possible values. This is normally unjustifiable, but if you were sorting a trillion numbers (and had a large amount of RAM), it's nothing. (And the fact that it is linear would be quite a godsend.)

Another positive is that this sort has virtually no branching, and is thus much easier on a per-instruction basis too. (Branching kills performance.) It can also be integrated as a subroutine (one that requires very little memory) in an overall sorting routine similar to (but better than) quicksort, and then it is more similar to bucket sort, but kind of reversed.
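A minimal sketch of the kind of non-comparison sort being described, i.e. the textbook O(n + m) counting sort with an offset so it isn't limited to small positive numbers (the commenter's exact generalization isn't spelled out, so this is only an approximation of it):

```python
# A sketch of counting sort, the O(n + m) non-comparison sort discussed above
# (n = number of items, m = size of the key range). The commenter's exact
# generalization isn't given, so this is just the textbook version, with an
# offset so it isn't limited to small positive numbers.

def counting_sort(values, lo, hi):
    """Sort integers known to lie in [lo, hi] in O(n + m) time and space."""
    counts = [0] * (hi - lo + 1)             # one slot per possible value
    for v in values:                         # O(n): tally each value
        counts[v - lo] += 1
    result = []
    for offset, count in enumerate(counts):  # O(m + n): emit values in order
        result.extend([lo + offset] * count)
    return result

print(counting_sort([3, -1, 2, 3, 0, -1], lo=-1, hi=3))  # [-1, -1, 0, 2, 3, 3]
```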
2Morpheus1y
He probably means bucket sort, or radix sort, which is basically the same concept. You might call that cheating, but if the range of values you sort is small, then you can think of it as linear.

Thanks for linking the post! I quite liked it, and I agree that computational complexity doesn't pose a challenge to general intelligence. I do want to dispute your notion that "if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it". The world is filled with evidence, and it's unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is prett... (read more)

1deepthoughtlife1y
Obviously, if computational complexity really forbade general intelligence...then we wouldn't have it. Technically, list sorting is O(n) if done a certain way...we just do it the O(n log n) way because it uses slightly less memory and is simpler for us (which is kind of related to Gwern's rant, and he does technically mention a specialized one, though he doesn't realize what it means, and puts it in a section about giving up generality for speed). I actually invented an example of that type of algorithm when I was teaching myself C++ using sorting algorithms and wanted to beat the built-in algorithms. In practice, sorting algorithms are often multiple different algorithms of different complexity classes mashed together...that then end up still being O(n log n) with better constant factors. We already optimize the hell out of constant factors for important problems.

I agree with you rather than Gwern. In my experience, complexity class really is a good indicator of how much compute something takes. It's not a perfect proxy, and can definitely be Goodharted, but it is still effective. Often, the constant factors are closely related to the complexity class too. Even describing an 'NP' problem is vastly more difficult than describing a 'P' problem, because there is just so much more to it. I would claim that difficulty describing the process is actually a good proxy for how large the constant factors are likely to be, since the constant factors are what you have to keep in mind even for problem size 0, where we aren't even trying to solve the issue, just keep it in mind.

I'm definitely only talking about probabilities in the range of >90%. A probability of >50% is justifiable without a strong argument for the disjunctivity of doom.

I like the self-driving car analogy, and I do think the probability in 2015 that a self-driving car would ever kill someone was between 50% and 95% (mostly because of a >5% chance that AGI comes before self-driving cars).

There's still the problem of successor agents and self-modifying agents, where you need to set up incentives to create successor agents with the same utility functions and to not strategically self-modify, and I think a solution to that would probably also work as a solution to normal dishonesty.

I do expect that in a case where agents can also see each other's histories, we can make bargaining go well with the bargaining theory we know (given that the agents try to bargain well; there are of course possible agents which don't try to cooperate well).

3Daniel Kokotajlo1y
In the cases I'm thinking about you don't just read their minds now, you read their entire history, including predecessor agents. All is transparent. (Fictional but illustrative example: the French AGI and the Russian AGI are smart like Sherlock Holmes; they can deduce pretty much everything that happened in great detail leading up to and during the creation of each other + also they are still running on human hardware at human institutions and thanks to constant leaking and also the offense/defense balance favoring offense, they can see logs of what each other is and was thinking the entire time, including through various rounds of modification-to-successor agent.)

I'm really glad that this post is addressing the disjunctivity of AI doom, as my impression is that it is more of a crux than any of the reasons in https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities.

Still, I feel like this post doesn't give a good argument for disjunctivity. To show that the arguments for a scenario with no outside view are likely, it takes more than just describing a model which is internally disjunctive. There needs to be some reason why we should strongly expect there to not be some external variables ... (read more)

Depends on what you mean by very high. If you mean >95% I agree with you. If you mean >50% I don't.

Deep learning hits a wall for decades: <5% chance. I'm being generous here.
Moore's law comes to a halt: Even if the price of compute stopped falling tomorrow, it would only push my timelines back a few years. (It would help a lot for >20 year timeline scenarios, but it wouldn't be a silver bullet for them either.)
Anti-tech regulation being sufficiently strong, sufficiently targeted, and happening sufficiently soon that it actually prevents doom:... (read more)

You're welcome!

The main one that comes to mind for me is, there are many possible solutions/equilibria/policy-sets that get to pareto-optimal outcomes, but they differ in how good they are for different players, and so it's not enough that players be aware of a solution, and it's also not even enough that there be one solution which stands out as extra salient--because players will be hoping to achieve a solution that is more favorable to them and might do various crazy things to try to achieve that.

This seems like it's solved by just not letting your oppo... (read more)

3Daniel Kokotajlo1y
Huh. I don't worry much about the problem of incentivizing honesty myself, because the cases I'm most worried about are cases where everyone can read everyone else's minds (with some time lag). Do you think there's basically no problem then, in those cases?

It's a combination of evidential reasoning and norm-setting. If you're playing the ultimatum game over $10 with a similarly-reasoning opponent, then deciding to only accept an (8, 2) split mostly won't increase the chance that they give in; rather, it will increase the chance that they also only accept an (8, 2) split, and so you'll end up with $2 in expectation. The point of an idea of fairness is that, at least so long as there's common knowledge of no hidden information, both players should agree on the fair split. So if, while bargaining with a similarly-reasoning ... (read more)

3Dagon1y
I see the norm-setting, which is exactly what I'm trying to point out.  Norm-setting is outside the game, and won't actually work with a lot of potential trading partners.  I seem to be missing the evidential reasoning component, other than figuring out who has more power to "win" the race. Again, this requirement weakens the argument greatly.  It's my primary objection - why do we believe that our correspondent is sufficiently similarly-reasoning for this to hold?  If it's set up long in advance that all humans can take or leave an 8,2 split, then those humans who've precommitted to reject that offer just get nothing (as does the offerer, but who knows what motivated that ancient alien)?

Thanks. I know it's that algorithm, I just want a more detailed and comprehensive description of it, so I can look at the whole thing and understand the problems with it that remain.

It's really a class of algorithms, depending on how your opponent bargains, such that if the fair bargain (by your standard of fairness) gives X utility to you and Y utility to your partner, then you refuse to accept any other solution which gives your partner at least Y utility in expectation. So if they give you a take-it-or-leave-it offer which gives you positive utility and... (read more)

2Daniel Kokotajlo1y
Thanks, this was super helpful! What would you say are the remaining problems that need to be solved, if we assume everyone has a way to accurately estimate everyone else's utility function? The main one that comes to mind for me is, there are many possible solutions/equilibria/policy-sets that get to pareto-optimal outcomes, but they differ in how good they are for different players, and so it's not enough that players be aware of a solution, and it's also not even enough that there be one solution which stands out as extra salient--because players will be hoping to achieve a solution that is more favorable to them and might do various crazy things to try to achieve that. (This is a vague problem statement though, perhaps you can do better!)
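A toy illustration of the problem being described (the game and payoffs are my own example, not from the comment): in a split-$10 game, every complete split is Pareto-optimal, yet the two players rank those splits in opposite orders, so "reach a Pareto-optimal outcome" doesn't by itself pin down which one to aim for.

```python
# Toy illustration of the problem above (game and payoffs are my own example):
# every way of splitting $10 is Pareto-optimal, but the players rank those
# outcomes in opposite orders, so Pareto-optimality alone picks no solution.

splits = [(x, 10 - x) for x in range(11)]  # (player A's share, player B's share)

def is_pareto_optimal(outcome, outcomes):
    """No alternative is at least as good for both players and better for one."""
    a, b = outcome
    return not any(a2 >= a and b2 >= b and (a2 > a or b2 > b) for a2, b2 in outcomes)

print(all(is_pareto_optimal(s, splits) for s in splits))  # True: all 11 splits qualify
print(max(splits, key=lambda s: s[0]))  # (10, 0): player A's favorite
print(max(splits, key=lambda s: s[1]))  # (0, 10): player B's favorite
```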

I believe it's the algorithm from https://www.lesswrong.com/posts/z2YwmzuT7nWx62Kfh/cooperating-with-agents-with-different-ideas-of-fairness. Basically, if you're offered an unfair deal (and the other trader isn't willing to renegotiate), you should accept the trade with a probability just low enough that the other trader does worse in expectation than if they offered a fair trade. For example, if you think that a fair deal would provide $10 to both players over not trading and the other trader offers a deal where they get $15 and you get $4, then you shou... (read more)
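A sketch of that rejection rule with the comment's numbers (the threshold formula and the epsilon are my reading of the rule as stated, not a quote from the linked post): accept the $15/$4 offer with probability just under 10/15, so the other trader expects slightly less than the $10 they would have gotten from the fair deal.

```python
# A sketch of the probabilistic-rejection rule described above, using the
# comment's numbers. The threshold formula and epsilon are my reading of the
# rule as stated, not a quote from the linked post.

def acceptance_probability(their_fair_gain, their_offered_gain, epsilon=1e-6):
    """Accept an unfair offer just rarely enough that the offerer expects
    slightly less than they would have gotten from the fair deal."""
    if their_offered_gain <= their_fair_gain:
        return 1.0  # the offer doesn't favor them beyond the fair point
    return their_fair_gain / their_offered_gain - epsilon

p = acceptance_probability(their_fair_gain=10, their_offered_gain=15)
print(p, 15 * p)  # ~0.667 acceptance; they expect just under $10, so offering fair beats this
```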

6Daniel Kokotajlo1y
Thanks. I know it's that algorithm, I just want a more detailed and comprehensive description of it, so I can look at the whole thing and understand the problems with it that remain. "Any Pareto bargaining method is vulnerable..." Interesting, thanks! I take it there is a proof somewhere of this? Where can I read about this? What is a pareto bargaining method? I feel like arguably "My bargaining protocol works great except that it incentivises people to try to fool each other about what their utility function is" .... is sorta like saying "my utopian legal system solves all social problems except for the one where people in power are incentivised to cheat/defect/abuse their authority." Though maybe that's OK if we are talking about AI bargaining and they have magic mind-reading supertechnology that lets them access each other's original (before strategic modification) utility functions?

But because Eliezer would never precommit to probably turn down a rock with an un-Shapley offer painted on its front (because non-agents bearing fixed offers created ex nihilo cannot be deterred or made less likely through any precommitment) there's always some state for Bot to stumble into in its path of reflection and self-modification where Bot comes out on top.

This is exactly why Eliezer (and I) would turn down a rock with an unfair offer. Sure, there's some tiny chance that it was indeed created ex nihilo, but it's far more likely that it was produced... (read more)