Adam Scholl

Researcher at Missing Measures

Comments

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 9d

Yeah, certainly there are other possible forms of bias besides financial conflicts of interest; as you say, I think it's worth trying to avoid those too.

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 10d

Sure, but humanity currently has so little ability to measure or mitigate AI risk that I doubt it will be obvious in any given case that the survival of the human race is at stake, or that any given action would help. And I think even honorable humans tend to be vulnerable to rationalization amidst such ambiguity, which (as I model it) is why society generally prefers that people in positions of substantial power not have extreme conflicts of interest.

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 10d

> I’m going to try to make sure that my lifestyle and financial commitments continue to make me very financially comfortable both with leaving Anthropic, and with Anthropic’s equity (and also: the AI industry more broadly – I already hold various public AI-correlated stocks) losing value, but I recognize some ongoing risk of distorting incentives, here.

Why do you feel comfortable taking equity? It seems to me that one of the most basic precautions one ought ideally to take when accepting a job like this (e.g. evaluating Claude's character/constitution/spec) is to ensure you won't personally stand to lose huge sums of money should your evaluation suggest further training or deployment is unsafe.

(You mention already holding AI-correlated stocks—I do also think it would be ideal if folks with influence over risk assessment at AGI companies divested from these generally, though I realize this is difficult given how entangled they are with the market as a whole. But I'd expect AGI company staff typically have much more influence over their own company's value than that of others, so the COI seems much more extreme.)

leogao's Shortform
Adam Scholl · 20d

They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?

The "Length" of "Horizons"
Adam Scholl · 22d

I agree AI minds might be very different, and best described with different measures. But I think we currently have little clue what those differences are, and so for now humans remain the main source of evidence we have about agents. Certainly human-applicability isn't a necessary condition for measures of AI agency; it just seems useful as a sanity check to me, given the context that nearly all of our evidence about (non-trivial) agents so far comes from humans.

The "Length" of "Horizons"
Adam Scholl · 1mo

Sorry—looking again at the messiness factors, fewer of them are about brute force than I remembered; will edit.

But they do indeed all strike me as quite narrow external validity checks, given that the validity in question is whether the benchmark predicts when AI will gain world-transforming capabilities. 

> “messiness” factors—factors that we expect to (1) be representative of how real-world tasks may systematically differ from our tasks

I felt very confused reading this claim in the paper. Why do you think they are representative? It seems to me that real-world problems obviously differ systematically from these factors, too—e.g., solving them often requires having novel thoughts.

The "Length" of "Horizons"
Adam Scholl · 1mo

I think there is more empirical evidence of robust scaling laws than of robust horizon length trends, but broadly I agree—I think it's also quite unclear how scaling laws should constrain our expectations about timelines.

(Not sure I understand what you mean about the statistical analyses, but fwiw they focused only on very narrow checks for external validity—mostly just on whether solutions were possible to brute force).

The "Length" of "Horizons"
Adam Scholl · 1mo

I agree it seems plausible that AI could accelerate progress by freeing up researcher time, but I think the case for horizon length predicting AI timelines is even weaker in such worlds. Overall I expect the benchmark would still mostly have the same problems—e.g., that the difficulty of tasks (even simple ones) is poorly described as a function of time cost; that benchmarkable proxies differ critically from their non-benchmarkable targets; that labs probably often use these benchmarks as explicit training targets, etc.—but would also face the additional (imo major) uncertainty about how much freeing up researcher time would accelerate progress.

Cole Wyeth's Shortform
Adam Scholl · 1mo

Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder-to-detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.

CFAR update, and New CFAR workshops
Adam Scholl · 2mo

I'm really excited to hear this, and wish you luck :)

My thinking benefited a lot from hanging around CFAR workshops, so for whatever it's worth I do recommend attending them; my guess is that most people who like reading LessWrong but haven't tried attending a workshop would come away glad they did.

Posts

The "Length" of "Horizons" (182 karma, 1mo, 27 comments)
Pay Risk Evaluators in Cash, Not Equity (222 karma, 1y, 19 comments)
Non-Disparagement Canaries for OpenAI (290 karma, 1y, 51 comments)
OMMC Announces RIP (190 karma, 2y, 5 comments)
Safetywashing (Ω, 264 karma, 3y, 20 comments)
Matt Botvinick on the spontaneous emergence of learning algorithms (Ω, 155 karma, 5y, 87 comments)
At what point should CFAR stop holding workshops due to COVID-19? (Q, 29 karma, 6y, 11 comments)
CFAR: Progress Report & Future Plans (63 karma, 6y, 5 comments)
Why are the people who could be doing safety research, but aren’t, doing something else? (Q, 27 karma, 6y, 19 comments)
adam_scholl's Shortform (1 karma, 6y, 34 comments)