Even if that's the case, the number of 0-days out there (and the generally shitty infosec landscape) is enough to pwn almost any valuable target.
While I'd appreciate some help to screen out the spammers and griefers, this doesn't make me feel safe existentially.
Eliezer believes humans aligning superintelligent AI to serve human needs is as unsolvable as perpetual motion.
I'm confused. He said many times that alignment is merely hard*, not impossible.
I'm arriving at the same conclusions.
Think of a company like Google: building the biggest and best model is immensely valuable in a global, winner-takes-all market like search.
And this is in a world where Google has already announced that they're going to build an even bigger model of their own.
We are not, and won't be for some* time.
I doubt that any language less represented than English (or JS/Python) would fare better, since the amount of good data to ingest would be much smaller for them.
When we evo-pressure visibly negative traits out of progressively more capable AIs using RLHF (or honeypots, or whatever, it doesn't matter), we are also training them for better evasion. And what we can't see and root out will remain in the trait pool. Over time it will become progressively harder to spot deceit, and the capability for it will accumulate at an increasing rate.
And there's another problem here: deceit may be linked to traits that are actually useful (for alignment and in general), and since those would get gimped too, the resulting models would be that much less capable...
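To make the selection argument concrete, here is a toy simulation (entirely my own illustration; the two-trait model and all numbers are made-up assumptions, not something from the discussion above). We repeatedly filter a population on *detected* misbehavior only, refill it from the survivors, and watch evasion get selected for as a side effect:

```python
import random

random.seed(0)

POP, GENS, MUT = 200, 30, 0.05


def clip(x):
    return max(0.0, min(1.0, x))


# Each "model" has two hidden traits: how much it misbehaves, and how well
# it hides that misbehavior from whatever filter we apply (RLHF, honeypots, ...).
pop = [{"misbehavior": random.random(), "evasion": random.random()}
       for _ in range(POP)]

for gen in range(GENS):
    # The filter only catches *detected* misbehavior: bad behavior that
    # isn't masked by evasion.
    survivors = [a for a in pop
                 if random.random() > a["misbehavior"] * (1 - a["evasion"])]

    def offspring():
        # Copy a surviving "model" with small mutations -- a crude stand-in
        # for training the next, somewhat different model on what passed.
        parent = random.choice(survivors)
        return {k: clip(parent[k] + random.gauss(0, MUT))
                for k in ("misbehavior", "evasion")}

    pop = [offspring() for _ in range(POP)]

    if gen % 10 == 0 or gen == GENS - 1:
        avg = lambda key: sum(a[key] for a in pop) / POP
        print(f"gen {gen:2d}: misbehavior={avg('misbehavior'):.2f}, "
              f"evasion={avg('evasion'):.2f}")
```

Mean misbehavior drops (the filter does work), but mean evasion climbs generation after generation, because the filter can't tell "doesn't misbehave" from "misbehaves and hides it".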
Confused how?.. The only thing that comes to mind is that it's FOOM sans the F. Asking for a 0.2 FOOMs limit seems reasonable given the current trajectory 😅
I think the proper narrative in the rocket alignment post is "We have cannons and airplanes. Now, how do we land a man on the Moon?", not just "rocketry is hard":
We’re worried that if you aim a rocket at where the Moon is in the sky, and press the launch button, the rocket may not actually end up at the Moon.
So the failure modes look less like "we misplaced a booster tank and the thing exploded" and more like "we've built a huge-ass rocket, but it missed its objective and the astronauts are en route to the Oort cloud".
Most domains of human endeavor aren't like computer security, as illustrated by just how counterintuitive most people find the security mindset.
But some of the most impactful ones are: lawmaking, economics, and various others where one ought to think about incentives and the "other side", or to do pre-mortems. Perhaps this could be stretched as far as "security mindset is an invaluable part of the rationality toolbox".
...If security mindset were a productive frame for tackling a wide range of problems outside of security, then many more people would have experience w...
By this yardstick most humans aren't GIs either. And if it does pass the bar, then humans are indeed screwed and it is too late for alignment.
That's entirely expected. Hallucinating is a typical habit of language models. They do that unless some prompt engineering has been applied.
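For what "some prompt engineering" might look like in practice, here's a minimal sketch (my own assumption about a typical mitigation, not something from the thread; `call_llm` is a hypothetical stand-in for whatever chat API is in use, and this reduces hallucination rather than eliminating it):

```python
# Wrap a question in a prompt that lets the model abstain instead of guessing.
ABSTAIN_SYSTEM = (
    "Answer only from the CONTEXT below. "
    'If the answer is not in the context, reply exactly: "I don\'t know." '
    "Do not invent names, numbers, or citations."
)


def build_prompt(context: str, question: str) -> list[dict]:
    return [
        {"role": "system", "content": ABSTAIN_SYSTEM},
        {"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {question}"},
    ]


# Usage (call_llm is hypothetical):
# answer = call_llm(build_prompt(retrieved_docs, "Who wrote the 2019 report?"))
```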
Why do you think it can't?
PS: Mimicry is a fine art too, check this out: https://www.deepmind.com/publications/creating-interactive-agents-with-imitation-learning
Some tasks improve others, some don't:
Therefore, transfer from image captioning or visual grounded question answering tasks is possible. We were not able to observe any benefit from pretraining on boxing.
In order to seek, in good faith, to help improve an interlocutor's beliefs
But SE isn't about any specific set of beliefs. Think of it as "improving credence calibration by accounting for the justification methodology". What would your advice be then?
Getting stuff formally specified is insanely difficult, thus impractical, thus pervasive verified software is impossible without some superhuman help. Here we go again.
Even going from "one simple spec" to "two simple specs" is a huge complexity jump: https://www.hillelwayne.com/post/spec-composition/
And real-world software has a huge state envelope.
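To make "huge state envelope" concrete, here's a toy sketch of my own (the lamp/counter components are invented for illustration; they are not from Hillel's post): composing two trivially simple state machines by interleaving already yields the full Cartesian product of their states, and every extra component multiplies it again.

```python
# Two toy "specs", each trivially simple on its own: a lamp and a mod-5 counter.
def lamp_step(s):
    return {"off": ["on"], "on": ["off", "broken"], "broken": []}[s]


def counter_step(s):
    return [(s + 1) % 5]


def reachable(steps, init):
    """Worklist search over the interleaved composition: one component moves at a time."""
    seen, todo = {init}, [init]
    while todo:
        state = todo.pop()
        for i, step in enumerate(steps):
            for nxt in step(state[i]):
                succ = state[:i] + (nxt,) + state[i + 1:]
                if succ not in seen:
                    seen.add(succ)
                    todo.append(succ)
    return seen


print(len(reachable([lamp_step], ("off",))))                  # 3 states
print(len(reachable([lamp_step, counter_step], ("off", 0))))  # 15: the full 3 x 5 product
```

Three states times five is already fifteen; add one more ten-state component and you're at 150, and real components have far more than three states each.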