One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes had simply been more numerous than failures, many algorithms would also have imitated them, even with null reward.
I think the "fraction of training compute" going towards agency vs non-agency will be lower in video models than in LLMs, and LLMs will likely continue to be bigger, so video models will stay behind LLMs in overall agency.
Helpfulness finetuning might make these models more capable when they're on the correct side of the debate. Sometimes RLHF(-like) models simply perform worse on tasks they're finetuned to avoid, even when they don't refuse or give up. Would be nice to try base model debaters.
A core advantage of bandwidth limiting over other cybersec interventions is that it's a simple system we can make stronger arguments about, implemented on a simple processor, without the complexity and uncertainty of modern processors and OSes.
No, clock speed stays the same, but the latency of communication between regions, measured in clock cycles, increases. Just like CPUs require more clock cycles to access memory than they used to.
Do we have any reason to believe that particular election won't be close?
I'd expect artificial sweeteners are already very cheap, and most people want chemicals that have been tested more thoroughly.
I'd be interested in experiments with more diverse data. Maybe this only works because the passages are very short, simple, and uniform, and use very superposition-y information that wouldn't exist in longer and more diverse text.
I've recently gotten into partner dancing and I think it's a pretty superior activity