Humans, despite being fully general, have vastly varying ability to do various tasks, e.g. they're much better at climbing mountains than at playing Go, it seems. Humans also routinely construct entire technology bases to enable them to do tasks that they cannot do themselves. This is, in some sense, a core human economic activity: the construction of artifacts that can do tasks better/faster/more efficiently than humans can themselves. It seems like by default, you should expect a similar dynamic with "fully general" AIs. That is, AIs trained to do semic...
Not literally the best, but retargetable algorithms are at the far end of the spectrum from "fully specialized" to "fully general", and I expect most tasks we train AIs to do to have heuristics that enable solving them much faster than "fully general" algorithms, so there's decently strong pressure toward the "specialized" side.
I also think that heuristics are going to be closer to multiplicative speed-ups than additive ones, so it's going to be closer to "general algorithms just can't compete" than "it's just a little worse". E.g. random search is te...
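To make the multiplicative point concrete, here's a toy comparison (my example, not from the original discussion): on the trivial "maximize the number of 1-bits" problem, a one-line hill-climbing heuristic reliably finds the optimum within a small budget, while random search with the same budget gets nowhere close, and the gap widens exponentially with problem size.

```python
import random

def onemax(bits):
    """Objective: count of 1-bits; maximized by the all-ones string."""
    return sum(bits)

def random_search(n, budget):
    """Fully general: sample uniformly at random, keep the best score seen."""
    best = 0
    for _ in range(budget):
        best = max(best, onemax([random.randint(0, 1) for _ in range(n)]))
    return best

def hill_climb(n, budget):
    """Cheap specialized heuristic: flip one bit, keep the flip if it helps."""
    bits = [random.randint(0, 1) for _ in range(n)]
    score = onemax(bits)
    for _ in range(budget):
        i = random.randrange(n)
        bits[i] ^= 1
        if onemax(bits) >= score:
            score = onemax(bits)
        else:
            bits[i] ^= 1  # revert a harmful flip
    return score

n, budget = 60, 2000
print("random search:", random_search(n, budget))  # hovers a bit above n/2
print("hill climbing:", hill_climb(n, budget))     # almost always reaches n
```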
oops thanks
yeah, should be x AND y.
One of the main reasons I expect this to not work is that optimization algorithms that are the best at optimizing some objective given a fixed compute budget seem like they basically can't be generally retargetable. E.g. if you consider something like Stockfish, it's a combination of search (which is retargetable), sped up by a series of very specialized heuristics that only work for winning. If you wanted to retarget Stockfish to "maximize the max number of pawns you ever have", you would not be able to use [specialized for telling whether a mo...
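As a minimal sketch of that decomposition (my toy code, not Stockfish's actual architecture): the bare search below is retargetable because the objective is just an argument, but any pruning rule tuned to "winning" (e.g. "discard moves that hang material") would be baked into the search and actively wrong for the pawn objective.

```python
def plan(state, depth, evaluate, legal_moves, apply_move):
    """Plain exhaustive search: swap `evaluate` and it optimizes something else."""
    if depth == 0:
        return evaluate(state), []
    best_score, best_plan = evaluate(state), []  # "stop here" is always an option
    for move in legal_moves(state):
        score, rest = plan(apply_move(state, move), depth - 1,
                           evaluate, legal_moves, apply_move)
        if score > best_score:
            best_score, best_plan = score, [move] + rest
    return best_score, best_plan

# Toy game: the state is an integer; moves add 1 to it or double it.
legal_moves = lambda s: ["add", "double"]
apply_move = lambda s, m: s + 1 if m == "add" else s * 2

# Retargeting is just swapping the objective function:
print(plan(1, 5, lambda s: s, legal_moves, apply_move))            # maximize value
print(plan(1, 5, lambda s: -abs(s - 7), legal_moves, apply_move))  # land exactly on 7
```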
Flagging that I don't think your description of what ELK is trying to do is that accurate, e.g. we explicitly don't think that you can rely on using ELK to ask your AI if it's being deceptive, because it might just not know. In general, we're currently quite comfortable with not understanding a lot of what our AI is "thinking", as long as we can get answers to a particular set of "narrow" questions we think is sufficient to determine how good the consequences of an action are. More in “Narrow” elicitation and why it might be sufficient.
Separately, I think ...
If powerful AIs are deployed in worlds mostly shaped by slightly less powerful AIs, you basically need competitiveness to be able to take any "pivotal action" because all the free energy will have been eaten by less powerful AIs.
The humans presumably have access to the documents being summarized.
Here's a conversation that I think is vaguely analogous:
Alice: Suppose we had a one-way function, then we could make passwords better by...
Bob: What do you want your system to do?
Alice: Well, I want passwords to be more robust to...
Bob: Don't tell me about the mechanics of the system. Tell me what you want the system to do.
Alice: I want people to be able to authenticate their identity more securely?
Bob: But what will they do with this authentication? Will they do good things? Will they do bad things?
Alice: IDK I just think the world is likely to be gener...
Bob: Oh OK, we're just going to create this user authentication technology and hope people use it for good?
Some common issues with alignment plans, on Eliezer's account, include:
Seems to me that the answer "I hope people will use it for good" is quite okay for authentication, but not okay for alignment. Doing good is outside the scope of authentication, but is kinda the point of alignment.
Isn't there an equilibrium where people assume other people's militaries are as strong as they can demonstrate, and people just fully disclose their military strength?
Yep, thanks. Fixed.
See https://www.nickbostrom.com/aievolution.pdf for a discussion about why such arguments probably don't end up pushing timelines forward that much.
From my perspective, ELK is currently very much "a problem we don't know how to solve, where we think rapid progress is being made (as we're still building out the example-counterexample graph, and are optimistic that we'll find an example without counterexamples)". There's some question of what "rapid" means, but I think we're on track for what we wrote in the ELK doc: "we're optimistic that within a year we will have made significant progress either towards a solution or towards a clear sense of why the problem is hard."
We've spent ~9 months on the proble...
The official deadline for submissions is "before I check my email on the 16th", which I tend to do around 10 am PST.
Before I check my email on Feb 16th, which I will do around 10am PST.
The high-level reason is that the 1e12N model is not that much better at prediction than the 2N model. You can correct for most of the correlation even with only a vague guess at how different the AI and human probabilities are, and most AI and human probabilities aren't going to be that different in a way that produces a correlation the human finds suspicious. I think that the largest correlations are going to be produced by the places the AI and the human have the biggest differences in probabilities, which are likely also going to be the places where th...
I agree that i does slightly worse than t on consistency checks, but i also does better on other regularizers you're (maybe implicitly) using, like speed/simplicity, so as long as i doesn't do too much worse it'll still beat out the direct translator.
One possible thing you might try is some sort of lexicographic ordering of regularization losses. I think this rapidly runs into other issues with consistency checks, like the fact that the human is going to be systematically wrong about some correlations, so i is potentially more consistent than t.
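A quick sketch of what that lexicographic selection rule would look like, and why it fails in the case above (the names and loss numbers are invented for illustration):

```python
# Lexicographic regularization: minimize consistency loss first, and only
# break ties using the speed/simplicity penalty.

candidates = [
    # (name, consistency_loss, speed_penalty)
    ("direct_translator", 0.03, 9.0),
    ("human_imitator",    0.01, 2.0),  # systematically-wrong human correlations
]                                      # can make the imitator MORE consistent

# Python tuples compare lexicographically, so the selection rule is one line:
best = min(candidates, key=lambda c: (c[1], c[2]))
print(best[0])  # -> "human_imitator": lexicographic ordering doesn't help
                # once i beats t outright on the first criterion
```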
I think LaTeX renders if you're using the Markdown editor, but if you're using the other editor then it only works if you use the equation editor.
I feel mostly confused by the way that things are being framed. ELK is about the human asking for various poly-sized fragments and the model reporting what those actually were instead of inventing something else. The model should accurately report all poly-sized fragments the human knows how to ask for.
The thing that seems weird to me here is that you can't simultaneously require that the elicited knowledge be "relevant" and "comprehensible" and also cover these sorts of obfuscated-debate-like scenarios.
I don't know what you mean by "relevant" or ...
I don’t think I understand your distinction between obfuscated and non-obfuscated knowledge. I generally think of non-obfuscated knowledge as NP or PSPACE. The human judgement of a situation might only theoretically require a poly-sized fragment of an exp-sized computation, but there’s no poly-sized proof that this poly-sized fragment is the correct fragment, and there are different poly-sized fragments for which the human will evaluate differently, so I think of ELK as trying to elicit obfuscated knowledge.
We would prefer submissions be private until February 15th.
We generally assume that we can construct questions sufficiently well that there's only one unambiguous interpretation. We also generally assume that the predictor "knows" which world it's in, because it can predict how humans would respond to hypothetical questions about various situations involving diamonds and sensors, and that humans would say that, in theory, Q1 and Q2 could be different.
More concretely, our standard for judging proposals is exhibiting an unambiguous failure. If it were plausible you asked the wrong question, or the AI didn't know what you mean...
I think we would be trying to elicit obfuscated knowledge in ELK. In our examples, you can imagine that the predictor's Bayes net works "just because", so an argument that is convincing to a human for why the diamond is in the room has to argue that the Bayes net is a good explanation of reality + argue that it implies the diamond is in the room, which is the sort of "obfuscated" knowledge that debate can't really handle.
Note that this has changed to February 15th.
The dataset is generated with the human Bayes net, so it's sufficient to map to the human Bayes net. There is, of course, an infinite set of "human" simulators that use slightly different Bayes nets that give the same answers on the training set.
Does this mean that the method needs to work for ~arbitrary architectures, and that the solution must use substantially the same architecture as the original?
Yes, approximately. If you can do it for only e.g. transformers, but not other things, that would be interesting.
Does this mean that it must be able to deal with a broad variety of questions, so that we cannot simply sit down and think about how to optimize the model for getting a single question (e.g. "Where is the diamond?") right?
Yes, approximately. Thinking about how to get one question rig...
We generally imagine that it’s impossible to map the predictor’s net directly to an answer, because the predictor is thinking in terms of different concepts, so it has to map to the human’s nodes first in order to answer human questions about diamonds and such.
The SmartFabricator seems basically the same. In the robber example, you might imagine the SmartVault is the one that puts up the screen to conceal the fact that it let the diamond get stolen.
Looks good to me.
Yes. The section "Strategy: have a human operate the SmartVault and ask them what happened" describes what I think you're asking about.
A different way of phrasing Ajeya's response, which I think is roughly accurate, is that if you have a reporter that gives consistent answers to questions, you've learned a fact about the predictor, namely "the predictor was such that when it was paired with this reporter it gave consistent answers to questions." If there were 8 predictors for which this fact was true, then "it's the [7th] predictor such that when it was paired with this reporter it gave consistent answers to questions" is enough information to uniquely determine the reporter, e.g. the previ...
There is a distinction between the way that the predictor is reasoning and the way that the reporter works. Generally, we imagine that the predictor is trained the same way as the "unaligned benchmark" we're trying to compare to, and the reporter is the thing that we add on top of that to "align" it (perhaps by only training another head on the model, perhaps by finetuning). Hopefully, the cost of training the reporter is small compared to the cost of the predictor (maybe like 10% or something).
In this frame, doing anything to train the way the pred...
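To make the division of labor concrete, here's a rough sketch of the "train another head" version (the sizes and layers are placeholders, not anything from the report):

```python
import torch
import torch.nn as nn

# The predictor is the expensive model, trained exactly like the unaligned
# benchmark; the reporter is a comparatively tiny head on top of it.
predictor = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
for p in predictor.parameters():
    p.requires_grad = False  # leave the predictor's training untouched

reporter = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 2))

obs = torch.randn(8, 512)                 # a batch of (encoded) observations
answer_logits = reporter(predictor(obs))  # e.g. "is the diamond still there?"

# Only the reporter's parameters go to the optimizer, so the added alignment
# cost stays small relative to the (already-paid) cost of the predictor.
opt = torch.optim.Adam(reporter.parameters(), lr=1e-3)
```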
I think that problem 1 and problem 2 as you describe them are potentially talking about the same phenomenon. I'm not sure I'm understanding correctly, but I think I would make the following claims:
Thanks for your proposal! We have considered similar strategies in the past. The main points of the breaker response would be:
My point is either that:
Thanks for your proposal! I'm not sure I understand how the "human is happy with experiment" part is supposed to work. Here are some thoughts:
We don't think that real humans are likely to be using Bayes nets to model the world. We make this assumption for much the same reasons that we assume models use Bayes nets, namely that it's a test case where we have a good sense of what we want a solution to ELK to look like. We think the arguments given in the report will basically extend to more realistic models of how humans reason (or rather, we aren't aware of a concrete model of how humans reason for which the arguments don't apply).
If you think there's a specific part of the report where the human Bayes net assumption seems crucial, I'd be happy to try to give a more general form of the argument in question.
Agreed, but the thing you want to use this for isn’t simulating a long reflection, which will fail (in the worst case) because HCH can’t do certain types of learning efficiently.
I want to flag that HCH was never intended to simulate a long reflection. Its main purpose (which it fails to achieve in the worst case) is to let humans be epistemically competitive with the systems you’re trying to train.
The way that you would think about NN anchors in my model (caveat that this isn't my whole model):
My model is something like:
In general, Baumol-type effects (spending decreasing in sectors where productivity goes up) mean that we can have scenarios in which the economy is growing extremely fast on "objective" metrics like energy consumption, but GDP has stagnated, because that energy is being spent on extremely marginal increases in goods being bought and sold.
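A toy numerical version of this (my numbers, purely illustrative): productivity explodes in sector A, its relative price collapses, and measured GDP ends up tracking the stagnant sector B even though physical output has grown a thousandfold.

```python
# Two-sector Baumol toy: A's productivity doubles yearly and competition
# pushes its price down proportionally; B grows 2%/year at a stable price.
years = 10
output_a, output_b = 1.0, 1.0  # physical output (think: energy, compute)
price_a, price_b = 1.0, 1.0

for _ in range(years):
    output_a *= 2.0   # explosive productivity growth in sector A
    output_b *= 1.02  # near-stagnant sector B
    price_a /= 2.0    # A's relative price falls as it gets abundant

gdp = price_a * output_a + price_b * output_b
print(f"sector A physical output: {output_a:.0f}x")  # 1024x
print(f"measured GDP: {gdp:.2f} (started at 2.00)")  # ~2.22, driven by B
```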
A similar point is made by Korinek in his review of "Could Advanced AI Drive Explosive Economic Growth?":
...My first reaction to the framing of the paper is to ask: growth in what? It’s important to keep in mind that concepts like “gross domestic product” and “world gross domestic product” were defined from an explicit anthropocentric perspective - they measure the total production of final goods within a certain time period. Final goods are what is either consumed by humans (e.g. food or human services) or what is invested into “capital goods” that last for m
Yeah that seems like a reasonable example of a good that can't be automated.
I think I'm mostly interested in whether these sorts of goods that seem difficult to automate will be a pragmatic constraint on economic growth. It seems clear that they'll eventually be the ultimate binding constraints as long as we don't get massive population growth, but it's a separate question whether they'll start being constraints early enough to prevent rapid AI-driven economic growth.
Related: https://forum.effectivealtruism.org/posts/GfWJCF3rqdW48D7k8/be-specific-about-your-career
You might consider cross-posting this to the EA forum.
Thanks! I will try, although they will likely stay very intermittent.
My house implemented such a tax.
Re 1, we ran into some of the issues Matthew brought up, but all other COVID policies are implicitly valuing risk at some dollar amount (possibly inconsistently), so the Pigouvian tax seemed like the best option available.
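For concreteness, the mechanics of such a tax can be as simple as the sketch below (the dollar-per-microCOVID rate here is hypothetical, not what my house actually charged):

```python
# Pigouvian house tax on COVID risk: estimate each activity's risk in
# microCOVIDs (e.g. via microcovid.org), then charge a flat dollar rate
# per microCOVID so housemates internalize the risk they impose.

DOLLARS_PER_MICROCOVID = 0.01  # hypothetical rate: implies valuing one
                               # expected infection at $10,000

def covid_tax(activity_microcovids: float) -> float:
    return activity_microcovids * DOLLARS_PER_MICROCOVID

print(covid_tax(100))  # indoor hangout estimated at 100 uCoV -> $1.00
```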
I'd be interested to see the rest of this list, if you're willing to share.
I think competitiveness matters a lot even if there's only a moderate amount of competitive pressure. The gaps in efficiency I'm imagining are less "10x worse" and more like "I only had support vector machines and you had SGD".