I am not one of them - I was wondering the same thing, and was hoping you had a good answer.
If I were trying to answer this question, I would probably try to figure out, for each year, what fraction of all economically valuable labor was cognitive, the breakdown of tasks that comprise that labor, and the year-on-year productivity increases on those tasks, then use that to compute the percentage of economically valuable labor being automated that year.
Concretely, to get a number for the US in 1900, I might take a weighted average of productivity increases across cognitive tasks in that year, in an approach similar to how CPI is computed; my guess is that this would put the share of cognitive labor automated in 1900 at roughly 1%. By the same methodology, I would probably estimate closer to 5% for 2024.
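Spelled out as a toy calculation (the task list, labor shares, and productivity gains below are made up purely to show the shape of the arithmetic, not actual 1900 data):

cognitive_tasks_1900 = {
    # task: (share of cognitive labor, year-on-year productivity gain from automation)
    "bookkeeping":     (0.20, 0.03),
    "correspondence":  (0.30, 0.01),
    "calculation":     (0.10, 0.05),
    "clerical_filing": (0.40, 0.00),
}

def fraction_automated(tasks: dict[str, tuple[float, float]]) -> float:
    # Weighted average of per-task productivity gains, weighted by labor share,
    # read as "what fraction of this year's cognitive labor got automated away".
    return sum(share * gain for share, gain in tasks.values())

print(f"{fraction_automated(cognitive_tasks_1900):.1%}")  # ~1.4% with these made-up numbers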
Again, though, I am not associated with Open Phil and am not sure if they think about cognitive task automation in the same way.
Oh, indeed, I was getting confused between those. So, as a concrete instance of your proof sketch, we could consider the following degenerate case:
def f(N: int) -> int:
    # Degenerate "hard-to-invert" function: 1 on a single hard-coded input, 0 elsewhere.
    if N == 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a:
        return 1
    else:
        return 0

def check(x: int, y: float) -> bool:
    return f(x) >= y

def argsat(y: float, max_search: int = 2**64) -> int | None:
    # We postulate that we have this function because P=NP.
    # (max_search is unused here, since the degenerate f lets us hard-code the answer.)
    if y > 1:
        return None  # f never exceeds 1, so no x satisfies f(x) >= y
    elif y <= 0:
        return 0     # f(x) >= 0 holds for every x
    else:
        return 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a
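Just as a sanity check that these definitions fit together:

x = argsat(0.5)
assert x is not None and check(x, 0.5)  # f(x) = 1 >= 0.5
assert argsat(1.5) is None              # no x satisfies f(x) >= 1.5
assert check(argsat(0.0), 0.0)          # f(0) = 0 >= 0.0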
We could also replace our degenerate f with e.g. sha256, along the lines of the sketch below.
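A minimal version of that variant might look like this (hashing the big-endian byte encoding of N; that encoding convention is my own assumption, not part of your argument):

import hashlib

def f_sha(N: int) -> int:
    # sha256 variant of f: 1 iff N's byte encoding hashes to a fixed target digest.
    target = bytes.fromhex(
        "855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a"
    )
    if N < 0:
        return 0
    n_bytes = N.to_bytes(max(1, (N.bit_length() + 7) // 8), "big")
    return 1 if hashlib.sha256(n_bytes).digest() == target else 0

With this f, implementing argsat without the postulated P=NP oracle would amount to inverting sha256.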
Is that the gist of your proof sketch?
Is Amodei forecasting that, in 3 to 6 months, AI will produce 90% of the value derived from written code, or just that AI will produce 90% of code by volume? It would not surprise me if 90% of new "art" (defined as non-photographic, non-graph images) by volume is already AI-generated, and I would not be surprised to see the same thing happen with code.
And in the same way that "AI produces 90% of art-like images" is not the same thing as "AI has solved art", I expect "AI produces 90% of new lines of code" is not the same thing as "AI has solved software".
I'm skeptical.
Did the Sakana team publish the code that their scientist agent used to write the compositional regularization paper? The post says
For our choice of workshop, we believe the ICBINB workshop is a highly relevant choice for the purpose of our experiment. As we wrote in the main text, we selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning, unlike most workshops with a narrow focus on one topic.
This workshop focuses particularly on understanding limitations of deep learning methods applied to real world problems, and encourages participants to study negative experimental outcomes. Some may criticize our choice of a workshop that encourages discussion of “negative results” (implying that papers discussing negative results are failed scientific discoveries), but we disagree, and we believe this is an important topic.
and while it is true that "negative results" are important to report, "we report a negative result because our AI agent put forward a reasonable and interesting hypothesis, competently tested the hypothesis, and found that the hypothesis was false" looks a lot like "our AI agent put forward a reasonable and interesting hypothesis, flailed around trying to implement it, had major implementation problems, and wrote a plausible-sounding paper describing its failure as a fact about the world rather than a fact about its skill level".
The paper has a few places with giant red flags, where it seems that the reviewer assumes there were solid results that the author of the paper was simply not reporting skillfully, for example in section B2.
I favor an alternative hypothesis: the Sakana agent determines where a graph belongs, what would be on that graph's x and y axes, what it expects the graph to look like, and how to generate it. It then generates the graph and writes the caption the graph would have if its hypothesis were correct. The agent has no particular ability to notice that its description doesn't match the graph it actually produced.
Has anyone trained a model to, given a prompt-response pair and an alternate response, generate an alternate prompt which is close to the original and causes the alternate response to be generated with high probability?
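To make the shape of the task concrete, the training examples I'm imagining would look roughly like this (purely illustrative; the format and field names here are mine):

def make_training_example(prompt: str, response: str,
                          alt_response: str, alt_prompt: str) -> dict:
    # Input: the original (prompt, response) pair plus the desired alternate response.
    # Target: a nearby prompt known to elicit that alternate response.
    return {
        "input": (
            f"Original prompt: {prompt}\n"
            f"Original response: {response}\n"
            f"Target response: {alt_response}\n"
            "Write a minimally-edited prompt that elicits the target response:"
        ),
        "target": alt_prompt,
    }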
I ask this because
A quick search found some vaguely adjacent research, but nothing I'd rate as a super close match.
If this research really doesn't exist, I'd find that quite surprising, since it's a pretty obvious thing to do and there are O(100,000) ML researchers in the world. And it is entirely possible that it does exist and I just failed to find it with a cursory lit review.
Anyone familiar with similar research / deep enough in the weeds to know that it doesn't exist?
I think the ability to "just look up this code" is a demonstration of fluency - if your way of figuring out "what happens when I invoke this library function" is "read the source code", that indicates that you are able to fluently read code.
That said, fluently reading code and fluently writing code are somewhat different skills, and the developers who are most fluent with a given toolchain can do both in that toolchain.
As someone whose strong upvote is newly worth +1, I disagree, though I feel that this change reflects the level of care and attention to detail that I expect out of EA.