faul_sname

As a newly-minted +1 strong upvote, I disagree, though I feel that this change reflects the level of care and attention to detail that I expect out of EA.

I am not one of them - I was wondering the same thing, and was hoping you had a good answer.

If I were trying to answer this question, I would probably try to figure out what fraction of all economically-valuable labor each year was cognitive, the breakdown of which tasks comprise that labor, and the year-on-year productivity increases on those tasks, then use that to compute the percentage of economically-valuable labor being automated that year.

Concretely, to get a number for the US in 1900 I might use a weighted average of productivity increases across cognitive tasks in 1900, in an approach similar to how CPI is computed:

  • Look at the occupations listed in the 1900 census records
  • Figure out which ones are common, and then sample some common ones and make wild guesses about what those jobs looked like in 1900
  • Classify those tasks as cognitive or non-cognitive
  • Estimate that record-keeping tasks make up around a quarter to a half of all cognitive labor
  • Notice that typewriters were starting to become more popular - about 100,000 typewriters sold per year
  • Note that those 100k typewriters were going to the people who would save the most time by using them
  • As such, estimate 1-2% productivity growth in record-keeping tasks in 1900
  • Multiply that productivity growth by the fraction of cognitive labor that record-keeping tasks represent (technically the displaced fraction is 1 - 1/(1 + productivity growth) rather than the growth rate itself, but when the growth is small the difference is negligible)
  • Estimate that 0.5% of cognitive labor was automated by specifically typewriters in 1900
  • Figure that's about half of all cognitive labor automation in 1900

and thus I would estimate ~1% of all cognitive labor was automated in 1900. By the same methodology I would probably estimate closer to 5% for 2024.
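To make the arithmetic explicit, here is that back-of-envelope calculation as a quick script (a sketch only; every number in it is one of the wild guesses from the list above, not data):

# Back-of-envelope estimate of cognitive labor automated in the US in 1900
productivity_growth = 0.015            # ~1-2% productivity growth in record-keeping from typewriters
record_keeping_share = 0.375           # record-keeping as ~1/4 to ~1/2 of all cognitive labor
typewriter_share_of_automation = 0.5   # typewriters as ~half of all cognitive automation in 1900

# Fraction of record-keeping labor displaced: 1 - 1/(1 + growth),
# which is close to the growth rate itself when the growth is small
displaced_fraction = 1 - 1 / (1 + productivity_growth)

automated_by_typewriters = displaced_fraction * record_keeping_share
total_cognitive_automation = automated_by_typewriters / typewriter_share_of_automation

print(f"~{automated_by_typewriters:.2%} of cognitive labor automated by typewriters in 1900")
print(f"~{total_cognitive_automation:.2%} of cognitive labor automated overall in 1900")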

Again, though, I am not associated with Open Phil and am not sure if they think about cognitive task automation in the same way.

What fraction of economically-valuable cognitive labor is already being automated today?

Did e.g. a telephone operator in 1910 perform cognitive labor, by the definition we want to use here?

Oh, indeed I was getting confused between those. So as a concrete example of your proof, we could consider the following degenerate case:

def f(N: int) -> int:
    # Degenerate objective: 1 on exactly one 256-bit input, 0 everywhere else
    if N == 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a:
        return 1
    else:
        return 0

def check(x: int, y: float) -> bool:
    # True iff f(x) clears the threshold y
    return f(x) >= y

def argsat(y: float, max_search: int = 2**64) -> int | None:
    # We postulate that we have this function because P=NP; here it just
    # hardcodes the one input on which f is nonzero
    if y > 1:
        return None
    elif y <= 0:
        return 0
    else:
        return 0x855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a

but we could also replace our degenerate f with e.g. sha256.
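For concreteness, a sha256 version of that same degenerate f could look something like this (a sketch; I'm reusing the hex constant above as the target digest and hashing N's big-endian byte representation, both of which are arbitrary choices):

import hashlib

TARGET_DIGEST = "855bdad365f9331421ab4b13737917cf97b5e8d26246a14c9af1adb060f9724a"

def f_sha256(N: int) -> int:
    # 1 iff N's bytes hash to the target digest; finding such an N is a preimage search
    n_bytes = N.to_bytes(max(1, (N.bit_length() + 7) // 8), "big")
    return 1 if hashlib.sha256(n_bytes).hexdigest() == TARGET_DIGEST else 0

check stays the same, and an argsat for this f would have to actually find a sha256 preimage rather than return a hardcoded constant.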

Is that the gist of your proof sketch?

Finding the input x = argmax(f) (i.e. an x for which f(x) is maximal) is left as an exercise for the reader, though.

Is Amodei forecasting that, in 3 to 6 months, AI will produce 90% of the value derived from written code, or just that AI will produce 90% of code, by volume? It would not surprise me if 90% of new "art" (defined as non-photographic, non-graph images) by volume is currently AI-generated, and I would not be surprised to see the same thing happen with code.

And in the same way that "AI produces 90% of art-like images" is not the same thing as "AI has solved art", I expect "AI produces 90% of new lines of code" is not the same thing as "AI has solved software".

I'm skeptical.

Did the Sakana team publish the code that their scientist agent used to write the compositional regularization paper? The post says:

For our choice of workshop, we believe the ICBINB workshop is a highly relevant choice for the purpose of our experiment. As we wrote in the main text, we selected this workshop because of its broader scope, challenging researchers (and our AI Scientist) to tackle diverse research topics that address practical limitations of deep learning, unlike most workshops with a narrow focus on one topic.

This workshop focuses particularly on understanding limitations of deep learning methods applied to real world problems, and encourages participants to study negative experimental outcomes. Some may criticize our choice of a workshop that encourages discussion of “negative results” (implying that papers discussing negative results are failed scientific discoveries), but we disagree, and we believe this is an important topic.

and while it is true that "negative results" are important to report, "we report a negative result because our AI agent put forward a reasonable and interesting hypothesis, competently tested the hypothesis, and found that the hypothesis was false" looks a lot like "our AI agent put forward a reasonable and interesting hypothesis, flailed around trying to implement it, had major implementation problems, and wrote a plausible-sounding paper describing its failure as a fact about the world rather than a fact about its skill level".

The paper has a few places with giant red flags where the reviewer seems to assume that there were solid results which the author of the paper simply failed to report skillfully, for example in section B2.

I favor an alternative hypothesis: the Sakana agent determines where a graph belongs, what should be on the X and Y axes of that graph, what it expects the graph to look like, and how to generate it. It then generates the graph and writes the caption the graph would have if its hypothesis were correct. The agent has no particular ability to notice that its description doesn't match the graph it actually produced.

Plausibly going off into the woods decreases the median output while increasing the variance.

Has anyone trained a model to, given a prompt-response pair and an alternate response, generate an alternate prompt which is close to the original and causes the alternate response to be generated with high probability?

I ask this because

  1. It strikes me that many of the goals of interpretability research boil down to "figure out why models say the things they do, and under what circumstances they'd say different things instead". If we could reliably ask the model and get an intelligible and accurate response back, that would almost trivialize this sort of research.
  2. This task seems like it has almost ideal characteristics for training on: unlimited synthetic data, a granular loss metric, and it's easy for a human to spot-check outputs and see if the model is doing some weird reward-hacky thing (a rough sketch of the kind of setup I have in mind is below).
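As a gesture at that setup, here is a minimal sketch of the synthetic-data loop, assuming a small HuggingFace causal LM as the "subject" model; the model name, the helper names, and the loss combination are placeholders I made up rather than anything from existing work:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SUBJECT_NAME = "gpt2"  # placeholder: the model whose behavior we want to explain
tok = AutoTokenizer.from_pretrained(SUBJECT_NAME)
subject = AutoModelForCausalLM.from_pretrained(SUBJECT_NAME)

def sample_response(prompt: str, max_new_tokens: int = 32) -> str:
    # (prompt, response) pairs are free to generate, so training data is unlimited
    inputs = tok(prompt, return_tensors="pt")
    out = subject.generate(**inputs, do_sample=True, max_new_tokens=max_new_tokens,
                           pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def avg_logprob_of_response(prompt: str, response: str) -> float:
    # Granular training signal: average log-probability per response token under the subject model
    full = tok(prompt + response, return_tensors="pt")
    prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels = full["input_ids"].clone()
    labels[:, :prompt_len] = -100  # score only the response tokens
    with torch.no_grad():
        return -subject(**full, labels=labels).loss.item()

# A training example for the prompt-rewriter would be (original_prompt, original_response,
# alternate_response) -> proposed alternate_prompt, with a loss built from
# -avg_logprob_of_response(alternate_prompt, alternate_response) plus some penalty for
# straying too far from original_prompt.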

A quick search found some vaguely adjacent research, but nothing I'd rate as a super close match.

If this research really doesn't exist I'd find that surprising, since it's a pretty obvious thing to do and there are O(100,000) ML researchers in the world. But it is entirely possible that it does exist and I just failed to find it with a cursory lit review.

Anyone familiar with similar research / deep enough in the weeds to know that it doesn't exist?

I think the ability to "just look up this code" is a demonstration of fluency - if your way of figuring out "what happens when I invoke this library function" is "read the source code", that indicates that you are able to fluently read code.

That said, fluently reading code and fluently writing code are somewhat different skills, and the very best developers relative to their toolchain can do both with that toolchain.
