AI alignment researcher supported by MIRI and LTFF. Working on the learning-theoretic agenda. Based in Israel. See also LinkedIn.
E-mail: vanessa DOT kosoy AT {the thing reverse stupidity is not} DOT org
Can you explain what your definition of "accuracy" is? (the 87.7% figure)
Does it correspond to some proper scoring rule?
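To illustrate the distinction I have in mind (a toy sketch with made-up numbers, not data from the post): threshold accuracy is not a proper scoring rule, while the Brier and log scores are.

```python
import math

# Toy illustration (hypothetical forecasts, not real data): p is the
# forecast probability of a binary event, y is the realized outcome.
forecasts = [(0.9, 1), (0.7, 1), (0.6, 0), (0.95, 1)]

# Threshold accuracy: fraction of forecasts on the right side of 0.5.
# Not proper: once past the threshold, it doesn't reward reporting your
# true credence rather than an overconfident 0 or 1.
accuracy = sum((p >= 0.5) == (y == 1) for p, y in forecasts) / len(forecasts)

# Brier score: mean squared error of the probabilities (lower is better).
# Proper: expected score is minimized by reporting honest credences.
brier = sum((p - y) ** 2 for p, y in forecasts) / len(forecasts)

# Log score: mean log-probability assigned to the outcome (higher is
# better). Also proper; it strictly penalizes confident misses.
log_score = sum(math.log(p if y else 1.0 - p) for p, y in forecasts) / len(forecasts)

print(f"accuracy={accuracy:.3f}  brier={brier:.3f}  log={log_score:.3f}")
```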
(just for fun)
Rings true. Btw, I've heard people with experience in senior roles make "ha ha only serious" jokes many times about how obviously any manager would hire more underlings if only you let them. I also feel the pull of this motivation myself, although usually I prefer other kinds of status (of the sort "people liking/admiring me" rather than "me having power over people").
You're ignoring the part where making something cheaper is a real benefit. For example, it's usually better to have a world where everyone can access a slightly lower-quality version of a thing than a world where only a small elite can access a slightly higher-quality version.
I think that some people are massively missing the point of the Turing test. The Turing test is not about understanding natural language. The idea of the test is: if an AI can behave indistinguishably from a human, as far as any other human can tell, then it obviously has at least as much mental capability as humans do. For example, if humans are good at some task X, then you can ask the AI to solve the same task, and if it does poorly, that's a way to distinguish it from a human.
The only real questions are how long the test should take and how qualified the judge should be. Intuitively, it feels plausible that if an AI can withstand (say) a few hours of grilling by an expert judge, then it would do well even on tasks that take a human years. That's not obvious, but it's at least plausible. And I don't think existing AIs are especially close to passing this.
Here's a sketch of an AIT toy-model theorem saying that, in complex environments without traps, applying selection pressure reliably produces learning agents. I view it as an example of Wentworth's "selection theorem" concept.
Consider any environment μ of infinite Kolmogorov complexity (i.e. uncomputable). Fix a computable reward function
$$r : (A \times O)^* \to [0,1]$$
Suppose that there exists a policy π∗ of finite Kolmogorov complexity (i.e. computable) that's optimal for μ in the slow discount limit. That is,
$$\lim_{\gamma \to 1} \,(1-\gamma)\left(\max_{\pi} \mathbb{E}_{\mu\pi}\!\left[\sum_{n=0}^{\infty} \gamma^n r_n\right] - \mathbb{E}_{\mu\pi^*}\!\left[\sum_{n=0}^{\infty} \gamma^n r_n\right]\right) = 0$$
Then, μ cannot be the only environment with this property. Otherwise, this property could be used to define μ using a finite number of bits, which is impossible[1]. Since μ requires infinitely many more bits to specify than π∗ and r, there have to be infinitely many environments with the same property[2]. Therefore, π∗ is a reinforcement learning algorithm for some infinite class of hypotheses.
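To spell out the counting step in symbols (my paraphrase of the argument above; the set M(π∗, r) is notation I'm introducing, not from the original): define
$$\mathcal{M}(\pi^*, r) := \left\{\mu' \,\middle|\, \lim_{\gamma \to 1}(1-\gamma)\left(\max_{\pi} \mathbb{E}_{\mu'\pi}\!\left[\sum_{n=0}^{\infty}\gamma^n r_n\right] - \mathbb{E}_{\mu'\pi^*}\!\left[\sum_{n=0}^{\infty}\gamma^n r_n\right]\right) = 0\right\}$$
If |M(π∗, r)| = N < ∞, then each member, μ included, is pinned down by programs for π∗ and r together with an index below N, so
$$K(\mu) \;\le\; K(\pi^*) + K(r) + \log N + O(1) \;<\; \infty,$$
contradicting K(μ) = ∞. Hence M(π∗, r) must be infinite.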
Moreover, there are natural examples of μ as above. For instance, let's construct μ as an infinite sequence of finite communicating infra-RDP refinements that converges to an unambiguous (i.e. "not infra") environment. Since each refinement involves some arbitrary choice, "most" such μ have infinite Kolmogorov complexity. In this case, π∗ exists: it can be any learning algorithm for finite communicating infra-RDPs with an arbitrary number of states.
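Schematically, one way to picture the construction (my notation, and I'm reading each infra-RDP as the credal set of environments compatible with it; that reading is an assumption on my part):
$$M_0 \supseteq M_1 \supseteq M_2 \supseteq \cdots, \qquad \bigcap_{k=0}^{\infty} M_k = \{\mu\},$$
where each M_k is given by a finite communicating infra-RDP and each refinement step makes one of several admissible choices. Infinitely many independent choices enter the limit, which is why a "generic" μ obtained this way has infinite Kolmogorov complexity, while π∗ only needs to be a learning algorithm for the finite levels.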
Besides making this a rigorous theorem, there are many additional questions for further investigation:
- Probably, making this argument rigorous requires replacing the limit with a particular regret bound (see the sketch after this list). I ignore this for the sake of simplifying the core idea.
- There is probably something more precise that can be said about how "large" this family of environments is. For example, maybe it must be uncountable.
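On the first bullet, here is a plausible shape such a bound could take (my guess at the form; the rate function ε is hypothetical, not something claimed above): for every environment μ′ in the hypothesis class,
$$\max_{\pi} \mathbb{E}_{\mu'\pi}\!\left[(1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n\right] - \mathbb{E}_{\mu'\pi^*}\!\left[(1-\gamma)\sum_{n=0}^{\infty}\gamma^n r_n\right] \;\le\; \epsilon(\gamma), \qquad \lim_{\gamma\to 1}\epsilon(\gamma) = 0,$$
with ε uniform over the class, so that the counting argument can quantify over environments satisfying the bound rather than the bare limit.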