The Constitutional AI paper, in a sense, shows that a smart alien with access to an RLHFed helpful language model can figure out how to write text according to a set of human-defined rules. It scares me a bit that this works well, and I worry that this sort of self-improvement is going to be a major source of capabilities progress going forward.
Talking about what a language model "knows" feels confused. There's a big distinction between what a language model can tell you if you ask it directly, what it can tell you if you ask it with some clever prompting, and what a smart alien could tell you after only interacting with that model. A moderately smart alien that could interact with GPT-3 could correctly answer far more questions than GPT-3 can, even with any amount of clever prompting.
As a sort-of normative realist wagerer (I used to describe myself that way, and still have mostly the same views, but no longer consider it a good way to describe myself), I really enjoyed this post, but I think it misses the reasons the wager seems attractive to me.
To start, I don't think of the wager as being "if normative realism is true, things matter more, so I should act as if I'm a normative realist", but as being "unless normative realism is true, I don't see how I could possibly determine what matters, and so I should act as if I'm a normative re...
I don't know if I count as a nihilist, as it's unclear what precisely is meant by the term in this and other contexts. I don't think there are stance-independent normative facts, and I don't think anything "matters" independently of it mattering to people, but I find it strange to suggest that if nothing matters in the former sense, nothing matters in the latter sense.
Compare all this to gastronomic realism and nihilism. A gastronomic realist may claim there are facts about what food is intrinsically tasty or not tasty that are true of that food i...
How can list sorting be O(n)? There are n! possible orderings of a list, which means no comparison-based sorting algorithm can be faster than O(log(n!)) = O(n*log(n)).
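A quick numeric sanity check of that bound (a throwaway sketch, not from the original comment): log2(n!) really does track n*log2(n) as n grows.

```python
import math

# Any comparison sort must distinguish n! orderings, so it needs at
# least log2(n!) binary comparisons. Compare that to n*log2(n):
for n in [10, 100, 1000]:
    log2_fact = math.log2(math.factorial(n))  # exact lower bound
    n_log_n = n * math.log2(n)                # the familiar O(n log n) form
    print(n, round(log2_fact, 1), round(n_log_n, 1))
```

The ratio between the two columns approaches 1 as n grows, which is Stirling's approximation at work.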
Thanks for linking the post! I quite liked it, and I agree that computational complexity doesn't pose a challenge to general intelligence. I do want to dispute your notion that "if you hear that a problem is in a certain complexity class, that is approximately zero evidence of any conclusion drawn from it". The world is filled with evidence, and it's unlikely that closely related concepts give approximately zero evidence for each other unless they are uncorrelated or there are adversarial processes present. Hearing that list-sorting is O(n*log(n)) is prett...
I'm definitely only talking about probabilities in the range of >90%. >50% is justifiable without a strong argument for the disjunctivity of doom.
I like the self-driving car analogy, and I do think the probability in 2015 that a self-driving car would ever kill someone was between 50% and 95% (mostly because of a >5% chance that AGI comes before self-driving cars).
There's still the problem of successor agents and self-modifying agents: you need to set up incentives to create successor agents with the same utility functions and not to strategically self-modify. I think a solution to that would probably also work as a solution to ordinary dishonesty.
I do expect that in a case where agents can also see each other's histories, we can make bargaining go well with the bargaining theory we already know, given that the agents try to bargain well (there are of course possible agents which don't try to cooperate well).
I'm really glad that this post is addressing the disjunctivity of AI doom, as my impression is that it is more of a crux than any of the reasons in https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities.
Still, I feel like this post doesn't give a good argument for disjunctivity. To show that the arguments for a scenario with no outside view are likely, it takes more than just describing a model which is internally disjunctive. There needs to be some reason why we should strongly expect there to not be some external variables ...
Depends on what you mean by very high. If you mean >95% I agree with you. If you mean >50% I don't.
Deep learning hits a wall for decades: <5% chance. I'm being generous here.

Moore's law comes to a halt: Even if the price of compute stopped falling tomorrow, it would only push my timelines back a few years. (It would help a lot for >20 year timeline scenarios, but it wouldn't be a silver bullet for them either.)

Anti-tech regulation being sufficiently strong, sufficiently targeted, and happening sufficiently soon that it actually prevents doom:...
You're welcome!
The main one that comes to mind for me is that there are many possible solutions/equilibria/policy-sets that reach Pareto-optimal outcomes, but they differ in how good they are for different players. So it's not enough that players be aware of a solution, and it's not even enough that one solution stands out as extra salient--because players will be hoping to achieve a solution that is more favorable to them, and might do various crazy things to try to achieve that.
This seems like it's solved by just not letting your oppo...
It's a combination of evidential reasoning and norm-setting. If you're playing the ultimatum game over $10 with a similarly-reasoning opponent, then deciding to only accept an (8, 2) split mostly won't increase the chance they give in; it will increase the chance that they also only accept an (8, 2) split, and so you'll end up with $2 in expectation. The point of an idea of fairness is that, at least so long as there's common knowledge of no hidden information, both players should agree on the fair split. So if, while bargaining with a similarly-reasoning ...
Thanks. I know it's that algorithm; I just want a more detailed and comprehensive description of it, so I can look at the whole thing and understand the problems with it that remain.
It's really a class of algorithms, depending on how your opponent bargains, such that if the fair bargain (by your standard of fairness) gives X utility to you and Y utility to your partner, then you refuse to accept any other solution which gives your partner at least Y utility in expectation. So if they give you a take-it-or-leave-it offer which gives you positive utility and...
I believe it's the algorithm from https://www.lesswrong.com/posts/z2YwmzuT7nWx62Kfh/cooperating-with-agents-with-different-ideas-of-fairness. Basically, if you're offered an unfair deal (and the other trader isn't willing to renegotiate), you should accept the trade with a probability just low enough that the other trader does worse in expectation than if they offered a fair trade. For example, if you think that a fair deal would provide $10 to both players over not trading and the other trader offers a deal where they get $15 and you get $4, then you shou...
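The acceptance-probability cap from that example can be sketched in a few lines (an illustrative sketch of the rule as described above; the function name and interface are my own):

```python
def max_accept_prob(fair_other, offered_other):
    """Highest acceptance probability at which the unfair proposer still
    does no better in expectation than they would under the fair deal.

    fair_other: what the other trader would get under a fair deal
    offered_other: what the other trader gets under their unfair offer
    """
    # Accept with probability p such that p * offered_other <= fair_other,
    # so proposing unfairly never beats proposing fairly in expectation.
    return fair_other / offered_other

# The example from the comment: fair deal gives each side $10 over not
# trading, and the other trader offers themselves $15 and you $4.
p = max_accept_prob(10, 15)
print(p)       # 2/3 -- accept with probability just below this
print(p * 15)  # at the cap, the proposer's expected gain is exactly $10
```

Accepting with probability strictly below this cap makes the unfair offer worse for the proposer than the fair one, which is what removes their incentive to make it.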
But because Eliezer would never precommit to probably turn down a rock with an un-Shapley offer painted on its front (because non-agents bearing fixed offers created ex nihilo cannot be deterred or made less likely through any precommitment) there's always some state for Bot to stumble into in its path of reflection and self-modification where Bot comes out on top.
This is exactly why Eliezer (and I) would turn down a rock with an unfair offer. Sure, there's some tiny chance that it was indeed created ex nihilo, but it's far more likely that it was produced...
Context windows could make the claim from the post correct. Since the simulator can only consider a bounded amount of evidence at once, its P[Waluigi] has a lower bound. Meanwhile, it takes much less evidence than fits in the context window to bring its P[Luigi] down to effectively 0.
Imagine that, in your example, once Waluigi outputs B it will always continue outputting B (if he's already revealed to be Waluigi, there's no point in acting like Luigi). If there's a context window of 10, then the simulator's probability of Waluigi never goes below 1/1025, w...
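One way a number like 1/1025 can fall out of this (a minimal Bayesian sketch under assumptions of my own: a uniform 1/2 prior over {Luigi, Waluigi}, a Luigi who always outputs A, and an unrevealed Waluigi who outputs A with probability 1/2 per token):

```python
def p_waluigi(prior, luigi_tokens_in_window):
    """Posterior probability of Waluigi after seeing only Luigi-like tokens.

    Luigi outputs A with probability 1 each step; an unrevealed Waluigi
    outputs A with probability 1/2 each step (illustrative assumptions).
    """
    like_waluigi = 0.5 ** luigi_tokens_in_window  # P(window | Waluigi)
    like_luigi = 1.0                              # P(window | Luigi)
    return (prior * like_waluigi) / (prior * like_waluigi + (1 - prior) * like_luigi)

# With a context window of 10, at most 10 Luigi-like tokens are ever
# visible, so the posterior can never fall below this value:
print(p_waluigi(0.5, 10))  # 1/1025
```

Under these assumptions the context window caps the evidence for Luigi, giving exactly the 1/1025 floor: 2^-10 / (2^-10 + 1).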