Adam Scholl

Researcher at Missing Measures

Comments

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 9d

Yeah, certainly there are other possible forms of bias besides financial conflicts of interest; as you say, I think it's worth trying to avoid those too.

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 10d

Sure, but humanity currently has so little ability to measure or mitigate AI risk that I doubt it will be obvious in any given case that the survival of the human race is at stake, or that any given action would help. And I think even honorable humans tend to be vulnerable to rationalization amidst such ambiguity, which (as I model it) is why society generally prefers that people in positions of substantial power not have extreme conflicts of interest.

Leaving Open Philanthropy, going to Anthropic
Adam Scholl · 10d

> I’m going to try to make sure that my lifestyle and financial commitments continue to make me very financially comfortable both with leaving Anthropic, and with Anthropic’s equity (and also: the AI industry more broadly – I already hold various public AI-correlated stocks) losing value, but I recognize some ongoing risk of distorting incentives, here.

Why do you feel comfortable taking equity? It seems to me that one of the most basic precautions one ought ideally to take when accepting a job like this (e.g. evaluating Claude's character/constitution/spec) is to ensure you won't personally stand to lose huge sums of money should your evaluation suggest further training or deployment is unsafe.

(You mention already holding AI-correlated stocks—I do also think it would be ideal if folks with influence over risk assessment at AGI companies divested from these generally, though I realize this is difficult given how entangled they are with the market as a whole. But I'd expect AGI company staff typically have much more influence over their own company's value than that of others, so the COI seems much more extreme.)

leogao's Shortform
Adam Scholl · 20d

They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?

The "Length" of "Horizons"
Adam Scholl · 22d

I agree AI minds might be very different, and best described with different measures. But I think we currently have little clue what those differences are, and so for now humans remain the main source of evidence we have about agents. Certainly human-applicability isn't a necessary condition for measures of AI agency; it just seems useful as a sanity check to me, given the context that nearly all of our evidence about (non-trivial) agents so far comes from humans.

The "Length" of "Horizons"
Adam Scholl · 1mo

Sorry—looking again at the messiness factors, fewer of them are about brute force than I remembered; will edit.

But they do indeed all strike me as quite narrow external validity checks, given that the validity in question is whether the benchmark predicts when AI will gain world-transforming capabilities. 

> “messiness” factors—factors that we expect to (1) be representative of how real-world tasks may systematically differ from our tasks

I felt very confused reading this claim in the paper. Why do you think they are representative? It seems to me that real-world problems obviously differ systematically from these factors, too—e.g., solving them often requires having novel thoughts.

The "Length" of "Horizons"
Adam Scholl · 1mo

I think there is more empirical evidence of robust scaling laws than of robust horizon length trends, but broadly I agree—I think it's also quite unclear how scaling laws should constrain our expectations about timelines.

(Not sure I understand what you mean about the statistical analyses, but fwiw they focused only on very narrow checks for external validity—mostly just on whether solutions were possible to brute force).

The "Length" of "Horizons"
Adam Scholl · 1mo

I agree it seems plausible that AI could accelerate progress by freeing up researcher time, but I think the case for horizon length predicting AI timelines is even weaker in such worlds. Overall I expect the benchmark would still mostly have the same problems—e.g., that the difficulty of tasks (even simple ones) is poorly described as a function of time cost; that benchmarkable proxies differ critically from their non-benchmarkable targets; that labs probably often use these benchmarks as explicit training targets, etc.—but would also face the additional (imo major) uncertainty about how much freeing up researcher time would accelerate progress.

Cole Wyeth's Shortform
Adam Scholl · 1mo

Fwiw, in my experience LLMs lie far more than early Wikipedia or any human I know, and in subtler and harder-to-detect ways. My spot checks for accuracy have been so dismal/alarming that at this point I basically only use them as search engines to find things humans have said.

CFAR update, and New CFAR workshops
Adam Scholl · 2mo

I'm really excited to hear this, and wish you luck :)

My thinking benefited a lot from hanging around CFAR workshops, so for whatever it's worth I do recommend attending them; my guess is that most people who like reading LessWrong but haven't tried attending a workshop would come away glad they did.

Posts

The "Length" of "Horizons" (182 karma, 1mo, 27 comments)
Pay Risk Evaluators in Cash, Not Equity (222 karma, 1y, 19 comments)
Non-Disparagement Canaries for OpenAI (290 karma, 1y, 51 comments)
OMMC Announces RIP (190 karma, 2y, 5 comments)
Safetywashing (Ω, 264 karma, 3y, 20 comments)
Matt Botvinick on the spontaneous emergence of learning algorithms (Ω, 155 karma, 5y, 87 comments)
At what point should CFAR stop holding workshops due to COVID-19? (Q, 29 karma, 6y, 11 comments)
CFAR: Progress Report & Future Plans (63 karma, 6y, 5 comments)
Why are the people who could be doing safety research, but aren’t, doing something else? (Q, 27 karma, 6y, 19 comments)
adam_scholl's Shortform (1 karma, 6y, 34 comments)