Comments

Tao Lin1mo10

I've recently gotten into partner dancing and I think it's a pretty superior activity

Tao Lin1mo19

One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes were more numerous than failures, many algorithms would have imitated those as well with null reward.

Tao Lin2mo50

I think the "fraction of training compute" going towards agency vs. non-agency will be lower in video models than in LLMs, and LLMs will likely continue to be bigger, so video models will stay behind LLMs in overall agency.

Tao Lin2mo70

Helpfulness finetuning might make these models more capable when they're on the correct side of the debate. Sometimes RLHF-like models simply perform worse on tasks they're finetuned to avoid, even when they don't refuse or give up. It would be nice to try base model debaters.

Tao Lin2mo53

A core advantage of bandwidth limiting over other cybersec interventions is that it's a simple system we can make stronger arguments about, implemented on a simple processor, without the complexity and uncertainty of modern processors and OSes.

Tao Lin3mo10

No, clock speed stays the same, but the latency of communication between regions, measured in clock cycles, increases. This is just like how CPUs require more clock cycles to access memory than they used to.

Tao Lin3mo30

Do we have any reason to believe that particular election won't be close?

Tao Lin3mo85

I'd expect artificial sweeteners are already very cheap, and most people want more tested chemicals.

Tao Lin3mo10

I'd be interested in experiments with more diverse data. Maybe this only works because the passages are very short, simple, and uniform, and are using very superposition-y information that wouldn't exist in longer and more diverse text.
