Sure, but coming up with what to try, which hyperparameters to adjust, which heuristics to apply, etc. is not something for which we have a meaningful programme. You can’t brute force taste!

(Even if we eventually CAN, we’ve got a long way to go before we can disregard the intentions of the researchers entirely. You can’t Goodhart taste either!)

Jessica Rumbelow, 3y

“being able to reorganise a question in the form of a model-appropriate game” seems like something we have already built a set of reasonable heuristics around: categorising different types of problems and their appropriate translations into ML-able tasks. There are well-established ML approaches to, e.g., image captioning, time-series prediction, audio segmentation, etc. Is the bottleneck you’re concerned with the lack of breadth and granularity of these problem sets, OP? And can we mark progress (to some extent) by the number of these problem sets we have robust ML translations for?

aog, 3y

There are well-established ML approaches to, e.g., image captioning, time-series prediction, audio segmentation, etc. Is the bottleneck you’re concerned with the lack of breadth and granularity of these problem sets, OP? And can we mark progress (to some extent) by the number of these problem sets we have robust ML translations for?

I think this is an important problem. Going from progress on ML benchmarks to progress on real-world tasks is very difficult. For example, years after models reached human-level performance on ImageNet, we still have a lot of trouble with real-world applications of computer vision such as self-driving cars and medical diagnostics. That's because ImageNet isn't a directly valuable real-world task; it is built to be amenable to supervised learning models that output a single class label for each input.

While scale will improve performance within established paradigms, putting real-world problems into ML paradigms remains squarely a problem for human research taste.

[Crosspost] AlphaTensor, Taste, and the Scalability of AI
