David Rein

Posts

101 · METR Research Update: Algorithmic vs. Holistic Evaluation · 1mo · 7
27 · NYU Code Debates Update/Postmortem · 1y · 4
1 · phone.spinning's Shortform · 3y · 19

Comments

METR Research Update: Algorithmic vs. Holistic Evaluation
David Rein · 17d

Important to caveat that these results are pretty small—I wouldn't take the absolute numbers too seriously beyond the general takeaway that "algorithmic scoring may often overestimate software capabilities".

Will compute bottlenecks prevent a software intelligence explosion?
David Rein · 2mo

Hmm, I actually kind of lean towards it being rational, and labs just underspending on labor vs. capital for contingent historical/cultural reasons. I do think a lot of the talent juice is in "banal" progress like efficiently running lots of experiments and iterating on existing ideas straightforwardly (as opposed to something like "only a few people have the deep brilliance/insight to make progress"), but that doesn't change the upshot IMO.

Will compute bottlenecks prevent a software intelligence explosion?
David Rein · 2mo

Salaries have indeed now gotten pretty high—it seems like they're within an OOM of compute spend (at least at Meta). 

NYU Code Debates Update/Postmortem
David Rein · 1y

That's indeed what I meant! 

When can we trust model evaluations?
David Rein · 2y

> the existence of predicates on the world that are easier to evaluate than generate examples of (in the same way that verifying the answer to a problem in NP can be easier than generating it) guarantees that the model should be better at distinguishing between evaluation and deployment than any evaluator can be at tricking it into thinking it's in deployment

Where does the guarantee come from? Why do we know that for this specific problem (generating vs. evaluating whether the model is deployed) it's easier to evaluate than to generate? For many problems the two are equally difficult, right?

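To make the asymmetry concrete in a case where it clearly does hold, here's a toy factoring sketch (my own illustration, nothing from the post):

```python
import math

def verify_factorization(n: int, p: int, q: int) -> bool:
    """Checking a claimed factorization: one multiplication."""
    return p > 1 and q > 1 and p * q == n

def find_factorization(n: int):
    """Producing a factorization: naive trial division takes on the
    order of sqrt(n) steps, and no known classical algorithm is fast
    for large n."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d, n // d
    return None  # n is prime (or 1)

# verify_factorization(2021, 43, 47) -> True, instantly
# find_factorization(2021)           -> (43, 47), only after searching
```

The question is whether "am I currently being evaluated?" actually has this NP-like structure for the model, rather than evaluating and generating being comparably hard, as they are for many other problems.
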
Anthropic Fall 2023 Debate Progress Update
David Rein · 2y

Given that the judge that selects the best argument for BoN is the same as the one that chooses the winner, what is your main takeaway from the fact that Elo increases as you increase N? I see this as mainly a sanity check, but want to check if I'm missing something.

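To spell out why I see it as a sanity check, a toy sketch (my own construction, not code from the paper) of why the win rate, and hence Elo, should rise mechanically with N when the BoN selector and the final judge share the same preferences:

```python
import random

def judge_score(argument: str) -> float:
    """Stand-in for the judge model's preference for an argument.
    Deterministic per argument, so BoN selection and final judging
    share exactly the same preferences."""
    return random.Random(argument).random()

def best_of_n(candidates: list[str]) -> str:
    """BoN selection: the judge picks its own favorite argument."""
    return max(candidates, key=judge_score)

def win_rate_vs_single_sample(n: int, trials: int = 2000) -> float:
    """How often a best-of-n debater beats a single-sample debater
    when the same judge_score also decides the head-to-head winner."""
    wins = 0
    for t in range(trials):
        pool = [f"arg-{t}-{i}" for i in range(n + 1)]
        bon_arg = best_of_n(pool[:n])   # debater using best-of-n
        single_arg = pool[n]            # debater using one sample
        wins += judge_score(bon_arg) >= judge_score(single_arg)
    return wins / trials

# win_rate_vs_single_sample(1) ≈ 0.5, and the rate climbs toward 1.0
# as n grows, because selector and judge are the same model.
```

With independent uniform scores, the expected win rate of best-of-n against a single sample is n/(n+1), so it climbs toward 1 purely from the shared preference, without the arguments getting any better.
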
Debate helps supervise human experts [Paper]
David Rein · 2y

Another author here! Regarding the 74% vs. 84% numbers specifically - a key takeaway that our error analysis is intended to communicate is that we think a large fraction of the errors judges made in debates were pretty easily avoidable with more careful judges, whereas this didn't feel like it was the case with consultancy.

For example, Julian and I both had 100% accuracy as judges on the 36 human debates we judged, which was ~20% of all correct human debate judgments. So I'd guess that more careful judges overall could increase debate accuracy to at least 90%, maybe higher, although at that point we start hitting measurement limits from the questions themselves being noisy.

Models Don't "Get Reward"
David Rein · 3y

The issue with early finetuning is that there's not much that humans can actually select on, because the models aren't capable enough - it's really hard for me to say that one string of gibberish is better or worse than another.

What specific thing would you do with AI Alignment Research Assistant GPT?
David Rein · 3y

I think the issue with the more general “neocortex prosthesis” is that if AI safety/alignment researchers make this and start using it, every other AI capabilities person will also start using it.

Debate update: Obfuscated arguments problem
David Rein · 3y
> • unreasonable ^5

I think there's a typographical error - this doesn't link to any footnote for me, and there doesn't appear to be a fifth footnote at the end of the post
