There's something completely different going on for tasks longer than 1 minute, clearly not explained by the log-linear fit.
Perhaps humans generating training data are, for longer tasks, taking cognitive steps which are opaque to these models, or at least relatively more difficult to learn?
I'd wager 1:1 that this sort of abstraction-domain mismatch between human training data and LLMs is causing more of the HCAST weirdness than skewed finetuning investment.
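For concreteness, here's a minimal sketch of the kind of fit I'm referring to (my own reading and made-up numbers, not METR's code): success rate modeled as logistic in log task length, i.e. linear in log-minutes before the sigmoid is applied.

```python
# Sketch only: toy data, assumed functional form (logistic in log task length).
import numpy as np
from scipy.optimize import curve_fit

def logistic(log_minutes, log_horizon, slope):
    """P(success) for a task, decreasing in log task length."""
    return 1.0 / (1.0 + np.exp(slope * (log_minutes - log_horizon)))

# Toy data: task lengths (minutes) and observed success rates for one model.
task_minutes = np.array([0.5, 1, 2, 4, 8, 15, 30, 60, 120, 240])
success_rate = np.array([0.98, 0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.30, 0.15, 0.05])

(log_horizon, slope), _ = curve_fit(
    logistic, np.log(task_minutes), success_rate, p0=[np.log(30), 1.0]
)
print(f"50% time horizon ≈ {np.exp(log_horizon):.0f} minutes")
```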
Interesting!
What do we see if we apply interpretability tools to the filler tokens or repeats of the problem?
I would be especially interested in how this evolves through training, perhaps by training a more accessible model to do math / code classification with many filler tokens.
Overall, these results demonstrate a case where LLMs can do (very basic) meta-cognition without CoT.
Can you clarify what you mean by meta-cognition? I'm intuiting that these LLMs are using the extra embedding positions afforded by the appended tokens to do more parallel operations, which does not sound like meta-cognition to me.
I am aiming all of my resources at this, which for now looks externally like saving/investing personal capital, writing biological (molecular, NN) simulations, and searching for advice. Feel free to message me on Signal at (+1)-478-456-9667 if you want specific examples of my ideas; I expect that the entities I'm worried about will access my research only after (and if) it becomes legibly useful.
Awesome! I'm looking forward to reading many of these while traveling in the coming weeks.
Might I suggest, though, that you add to the importance score instead of multiplying? It doesn't make sense to multiply a non-log term by a log-space term.
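To spell out why (my own notation; I'm assuming the importance score is roughly a log-probability $\log p$ and the other factor is a linear-scale weight $w$):

$$w \cdot \log p = \log\left(p^{w}\right), \qquad \log p + \log w = \log(w \cdot p),$$

so multiplying by $w$ in log space corresponds to raising to the power $w$ in the original space, while adding $\log w$ is the log-space analogue of scaling by $w$.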
And a fiat decision to stay sane, implemented by refusing to tell myself that any particular stupidity or failure will be my reaction to future stress.
I have not implemented the other two, but I made this decision during HPPD-like psychosis; yes, for some it is a learnable skill.
How much would you say (3) supports (1) on your model? I'm still pretty new to AIS and am updating from your model.
I agree that marginal improvements are good for fields like medicine, and perhaps for AIS too. E.g. I can imagine self-other overlap scaling to near-ASI, though I'm doubtful about its stability under reflection. I'll put 35% on us finding a semi-robust solution sufficient to not kill everyone.
Given my model, I think 20% generalizability is worth a person's time. Given yours, I'd say 1% is enough.
I think the distribution of success probabilities for typical optimal-from-our-perspective solutions is very wide under both of the ways we describe generalizability; within that, we should weight generalizability more heavily than my understanding of your model does.
Earlier:
Designing only best-worst-case subproblem solutions while waiting for Alice would be like restricting strategies in a game to ones agnostic to the opponent's moves
Is this saying people should coordinate in case valuable solutions aren't in the a priori generalizable space?
I strongly think cancer research has a huge space and can't think of anything more difficult within biology.
I was being careless / unreflective about the size of the cancer solution space by splitting the solution spaces of alignment and cancer differently, and I don't know enough about cancer to make such claims anyway. I split the space into immunotherapies, things which target epigenetics / stem cells, and "other", where in retrospect the last probably contains the optimal solution. This groups many small problems with possibly weakly-general solutions into a "bottleneck", as you mentioned:
aging may be a general factor to many diseases, but research into many of the things aging relates to is composed of solving many small problems that do not directly relate to aging, and defining solving aging as a bottleneck problem and judging generalizability with respect to it doesn't seem useful.
Later:
Define the baseline distribution generalizability is defined on.
For a given problem, generalizability is how likely a given sub-solution is to be part of the final solution, assuming you solve the whole problem. You might choose to model expected utility, if that differs between full solutions; I chose not to here because I natively separate generality from power.
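In symbols (my own notation, just to pin the definition down): for a problem $P$ with final solution $S^{*}$, the generalizability of a sub-solution $s$ is

$$\mathrm{gen}(s \mid P) = \Pr\left[\, s \in S^{*} \;\middle|\; P \text{ gets solved} \,\right].$$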
Give a little intuition about why a threshold is meaningful, rather than a linear "more general is better".
I agree that "more general is better" with a linear or slightly superlinear (because you can make plans which rely heavier on solution) association with success probability. We were already making different value statements about "weakly" vs "strongly" general, where putting concrete probabilities / ranges might reveal us to agree w.r.t the baseline distribution of generalizability and disagree only on semantics.
I.e. thresholds are only useful for communication.
Perhaps a better way to frame this is in ratios of tractability (how hard a solution is to identify and solve) and usefulness (conditional on the solution working) between solutions with different levels of generalizability. E.g. suppose some solution A is 5x less general than another solution B. Then you expect, for the types of problems and solutions humans encounter, that A will be more than 5x as tractable * useful as B.
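In the notation above: if $\mathrm{gen}(A) = \mathrm{gen}(B)/5$, the claim would be

$$\mathrm{tractability}(A) \cdot \mathrm{usefulness}(A) > 5 \cdot \mathrm{tractability}(B) \cdot \mathrm{usefulness}(B).$$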
I disagree in expectation, meaning for now I target most of my search at general solutions.
My model of the central AIS problems:
I'd be extremely interested to hear anyone's take on my model of the central problems.
I think general solutions are especially important for fields with big solution spaces / few researchers, like alignment. If you were optimizing for, say, curing cancer, it might be different (I think both the paradigm- and subproblem-spaces are smaller there).
From my reading of John Wentworth's Framing Practicum sequence, implicit in his (and my) model is that solution spaces for these sorts of problems are a priori enormous. We (you and I) might also disagree on what a priori feasibility counts as "weakly" vs. "strongly" generalizable; I think my transition is around 15-30%.
Shoot, thanks. Hopefully it's clearer now.
Bounties (fractional funds distributed in good faith if you solve part of a problem):