By "reliable" I mean it in the same way as we think of it for self-driving cars. A self-driving car that is great 99% of the time and fatally crashes 1% of the time isn't really "high skill and unreliable" - part of having "skill" in driving is being reliable.
In the same way, I'm not sure I would want to employ an AI software engineer that was great 99% of the time, but 1% of the time had totally weird, inexplicable failure modes that you'd never see with a human. It would just be stressful to supervise it, to limit its potential harmful impact to the company, etc. So it seems to me that AIs won't be given control of lots of things, and therefore won't be transformative, until that reliability threshold is met.
Two possibilities have most of the "no AGI in 10 years" probability mass for me:
Well sure, but the interesting question is the minimum value of P at which you'd still push.
I also agree with the statement. I'm guessing most people who haven't been sold on longtermism would too.
When people say things like "even a 1% chance of existential risk is unacceptable", they are clearly valuing the long term future of humanity a lot more than they are valuing the individual people alive right now (assuming that the 99% in that scenario above is AGI going well & bringing huge benefits).
Related question: You can push a button that will, with probability P, cure aging and make all current humans immortal. But with probability 1-P, all humans die. How high does P have to be before you push? I suspect that answers to this question are highly correlated with AI caution/accelerationism.
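As a toy illustration of why answers diverge: under a simple expected-value framing, the threshold P falls out of how you value the three outcomes. All the numbers below are made-up placeholders, not claims about anyone's actual values:

```python
# Toy expected-value model for the button thought experiment.
# All valuations are arbitrary placeholders chosen for illustration.

def push_threshold(v_immortality: float, v_extinction: float, v_status_quo: float) -> float:
    """Smallest P at which pushing beats not pushing, i.e. where
    P * v_immortality + (1 - P) * v_extinction >= v_status_quo."""
    return (v_status_quo - v_extinction) / (v_immortality - v_extinction)

# Someone valuing mainly people alive today (extinction ~ everyone dying anyway, eventually):
print(push_threshold(v_immortality=10, v_extinction=0, v_status_quo=1))      # 0.1
# Someone counting the long-term future weighs extinction as catastrophically negative:
print(push_threshold(v_immortality=10, v_extinction=-1000, v_status_quo=1))  # ~0.991
```

The second caller needs P above 99% before pushing, while the first is happy at 10% - which is the correlation with caution/accelerationism the comment suggests.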
Not sure I understand; if model runs generate value for the creator company, surely they'd also create value that lots of customers would be willing to pay for. If every model run generates value, and there's ability to scale, then why not maximize revenue by maximizing the number of people using the model? The creator company can just charge the customers, no? Sure, competitors can use it too, but does that really override losing an enormous market of customers?
I won't argue with the basic premise that at least on some metrics that could be labeled as evolution's "values", humans are currently doing very well.
But, the following are also true:
Examples of such actions in (3) could be:
None of those actions is guaranteed to happen. But if I were creating an AI, and I found that it was enough smarter than me that I no longer had any way to control it, and if I noticed that it was considering total-value-destroying actions as reasonable things to maybe do someday, then I would be extremely concerned.
If the claim is that evolution has "solved alignment", then I'd say you need to argue that the alignment solution is stable against arbitrary gains in capability. And I don't think that's the case here.
That's great. "The king can't fetch the coffee if he's dead"
Wow. When I use GPT-4, I've had a distinct sense of "I bet this is what it would have felt like to use one of the earliest computers". Until this post I didn't realize how literal that sense might be.
This is a really cool and apt analogy - computers and LLM scaffolding really do seem like the same abstraction. Thinking this way seems illuminating as to where we might be heading.
I always assumed people were using "jailbreak" in the computer sense (e.g. jailbreak your phone/ps4/whatever), not in the "escape from prison" sense.
Jailbreak (computer science), a jargon expression for (the act of) overcoming limitations in a computer system or device that were deliberately placed there for security, administrative, or marketing reasons
I think the definition above is a perfect fit for what people are doing with ChatGPT.
I would say:
A theory always takes the following form: "given [premises], I expect to observe [outcomes]". The only way to say that an experiment has falsified a theory is to correctly observe/set up [premises] but then not observe [outcomes].
If an experiment does not correctly set up [premises], then that experiment is invalid for falsifying or supporting the theory. The experiment gives no (or nearly no) Bayesian evidence either way.
In this case, [premises] are the assumptions we made in determining the theoretical pendulum period; things like "the string length doesn't change", "the pivot point doesn't move", "gravity is constant", "the pendulum does not undergo any collisions", etc. The fact that (e.g.) the pivot point moved during the experiment invalidates the premises, and therefore the experiment does not give any Bayesian evidence one way or another against our theory.
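For concreteness, the theoretical period here is presumably the standard small-angle result, which is derived under exactly those kinds of premises (small amplitude, constant string length, fixed pivot, constant gravity, no drag or collisions):

```latex
% Small-angle pendulum period, valid only under the stated premises
T = 2\pi \sqrt{\frac{L}{g}}
```

Each premise on the list corresponds to a step in the derivation, which is why violating one of them (a moving pivot, say) means the experiment is no longer testing this formula at all.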
Then the students could say:
"But you didn't tell us that the pivot point couldn't move when we were doing the derivation! You could just be making up new "necessary premises" for your theory every time it gets falsified!"
In which case I'm not 100% sure what I'd say. Obviously we could have listed out more assumptions than we did, but where do you stop? "The universe will not explode during the experiment"...?