This is, by far, the alignment approach I’m most optimistic about—more so than mechanistic interpretability, which feels too narrow to reliably constrain a sufficiently sophisticated actor.
I’ve been thinking about datasets where the reward function is explicitly coupled to the well-being of an external entity, not merely to semantic or linguistic correctness.
At this point, we aren’t really looking for systems that are better at language. If anything, we appear to be asymptoting on those benchmarks already.
What matters is that there are countless things ...
Actually, there are dozens of ADHD-type stimulants with meaningfully distinct properties that have been prescribed (or studied) in humans. Far from having picked the low-hanging fruit, the FDA just... stopped picking. For example, before ketamine was approved, the last time the FDA approved an antidepressant with a new mechanism of action was over 50 years ago.
Most of the limits we operate under are self-imposed. Whimsy is the breaking of those bonds.
As I understand it, the AI capabilities necessary for intelligence amplification via BCI already exist; we simply need to show people how to start using them, and encourage them to do so.
Suppose a person were to provide a state-of-the-art model with a month's worth of the data typically collected by their eyes and ears, along with the ability to interject in real time in conversations via earbuds or a speaker.
Such an intervention wouldn't be the superhuman "team of geniuses in your datacenter," but it would be more helpful than even some of the best personal assistants (and 10...
Beautifully sad and honest.
I’ve been sitting with a similar dilemma: spending so much of my time reading, thinking, and caring about rationality (and adjacent topics) has led me to live a much lonelier life than I otherwise might have. But for better or worse, I love it, and I’m unlikely to change anytime soon.