Nikola Jurkovic

contact: jurkovich.nikola@gmail.com

Comments

Mourning a life without AI
Nikola Jurkovic · 8d

I don't see why either of those things stop you from having a family. 

I think we might be using different operationalizations of "having a family" here. I was imagining it to mean something that at least includes "raise kids from the age of ~0 to 18". If x-risk were to materialize within the next ~19 years, I would be literally stopped from "having a family" by all of us getting killed. 

But under a definition of "have a family" which just means "raise a child from the age of ~0 to 1", then yeah, I think P(doom) is <20% in the next 2 years and I'm probably not literally getting stopped.
 

Also to be clear, my P(ASI within our lifetimes) is like 85%, and my P(doom) is like 2/3. 

Mourning a life without AI
Nikola Jurkovic · 9d

This is because the correct answer is option three: try to modify the button to lower the 60 and raise the 15, until such time as a 1-in-5 chance of survival is a net improvement relative to your default situation.

 

Yes, the counterfactual I was imagining in this button world was just living a normal life and dying at the end. If indeed there's a way to shift around the probabilities, I'd devote my life to it. Which is what we're doing!
 

It's been honestly very freeing to be able to discuss these things somewhere other than this community.

I agree. This year I've had the policy of being very direct about what I think about crazy AI futures, even with people outside of the AI safety community. I gave a PowerPoint presentation to my close family members about AGI and AI safety and how the world is going to be crazy in the coming decades. When my relatives ask me about having kids, I say "By the time I'd have had kids, if humanity is even around, who knows what the concept of kids will look like. Maybe we'll be growing them in vats. Maybe we'll all be uploaded."

Of course, I don't say all of that every time. Most of the time people aren't in the mood for those sorts of discussions. But people have started taking these arguments more seriously as AI has had more and more of an effect and appeared more and more in the news.

How to survive until AGI
Nikola Jurkovic · 12d

that's unfair; if there's no utopia, none of the other interventions work either,

You're so right! Thanks for catching this.

I think I probably want to be clearer about the units of measurement being slightly different. Every intervention except the cryonics one naively reduces acute micromorts, which can be converted into "microutopias" by multiplying by P(utopia). The cryonics one directly increases microutopias, because the counterfactual benefit accrues ~purely in utopian worlds.
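
To make the unit conversion concrete, here's a minimal sketch (the P(utopia) value is an illustrative placeholder, not a number from the post):

```python
# Minimal sketch of the micromort -> "microutopia" conversion described above.
# P_UTOPIA is an illustrative placeholder, not a number from the post.
P_UTOPIA = 0.15

def micromorts_to_microutopias(micromorts_averted: float, p_utopia: float = P_UTOPIA) -> float:
    """Averted acute death risk only pays off in worlds where utopia actually
    happens, so scale the micromorts averted by P(utopia)."""
    return micromorts_averted * p_utopia

# An intervention that averts 1,000 micromorts of acute risk:
print(micromorts_to_microutopias(1_000))  # 150.0 microutopias under these assumptions
```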

nikola's Shortform
Nikola Jurkovic · 19d

Anthropic wrote a pilot risk report where they argued that Opus 4 and Opus 4.1 present very low sabotage risk. METR independently reviewed their report and we agreed with their conclusion. 

During this process, METR got more access than in any other evaluation we've historically done, and we were able to review the arguments and evidence Anthropic presented in a lot of detail. I think this is a very exciting milestone in third-party evaluations!

I also think that the risk report itself is the most rigorous document of its kind. AGI companies will need a lot more practice writing similar documents, so that they can be better at assessing risks once AI systems become very capable. 

I'll paste the twitter thread below (twitter link)

We reviewed Anthropic's unredacted report and agreed with its assessment of sabotage risks. We want to highlight the greater access and transparency into redactions that Anthropic provided, which represents a major improvement in how developers engage with external reviewers. Reflections:

To be clear: this kind of external review differs from holistic third-party assessment, where we independently build up a case for risk (or safety). Here, the developer instead detailed its own evidence and arguments, and we provided external critique of the claims presented.

Anthropic made its case to us based primarily on information it has now made public, with a small amount of nonpublic text that it intended to redact before publication. We commented on the nature of these redactions and whether we believed they were appropriate, on balance.

For example, Anthropic told us about the scaleup in effective compute between models. Continuity with previous models is a key component of the assessment, and sharing this information provides some degree of accountability on a claim that the public cannot otherwise assess.

We asked Anthropic to make certain assurances to us about the models its report aims to cover, similar to the assurance checklist in our GPT-5 evaluation. We then did in-depth follow-ups in writing and in interviews with employees.

We believe that allowing this kind of interactive review was ultimately valuable. In one instance, our follow-ups on the questions we asked helped Anthropic notice internal miscommunications about how its training methods might make chain-of-thought harder to monitor reliably.

A few key limitations. We have yet to see any rigorous roadmap for addressing sabotage risk from AI in the long haul. As AI systems become capable of subverting evaluations and/or mitigations, current techniques like those used in the risk report seem insufficient.

Additionally, there were many claims for which we had to assume that Anthropic was providing good-faith responses. We expect that verification will become more important over time, but that our current practices would not be robust to a developer trying to game the review.

Beyond assessing developer claims, we think there is an important role for third parties to do their own assessments, which might differ in threat models and approach. We would love to see the processes piloted in this review applied to such holistic assessments as well.

Overall, we felt that this review was significantly closer to the sort of rigorous, transparent third-party scrutiny of AI developer claims that we hope to see in the future. Full details on our assessment: https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report_metr_review.pdf

Hyperbolic trend with upcoming singularity fits METR capabilities estimates.
Nikola Jurkovic · 3mo

Thanks for the post!

Nikola Jurkovic suggests that as soon as a model can do a month-long task with 80% accuracy, it should be able to do any cognitive human task

I didn't mean to suggest this in my original post. My claim was something more like "a naive extrapolation means that 80% time horizons will be at least a month by the end of the decade, but I think they'll be much higher and we'll have AGI"
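
For concreteness, the naive extrapolation is just repeated doubling of the time horizon. Here's a rough sketch; the starting 80% horizon, the doubling time, and the date arithmetic are assumed placeholder values, not official METR figures:

```python
# Rough sketch of the "naive extrapolation": repeated doubling of the 80% time
# horizon. All numbers below are assumed placeholders, not official METR figures.
h0_hours = 0.5            # assumed current 80% time horizon, in hours
doubling_months = 7.0     # assumed doubling time
months_to_eoy_2030 = 64   # roughly "end of the decade" from mid-2025
work_month_hours = 167    # ~one month of full-time work

horizon_hours = h0_hours * 2 ** (months_to_eoy_2030 / doubling_months)
print(f"Extrapolated 80% horizon: ~{horizon_hours:.0f} hours "
      f"(~{horizon_hours / work_month_hours:.1f} work-months)")
```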

Daniel Kokotajlo's Shortform
Nikola Jurkovic · 3mo

it has to go to infinity when we get AGI / superhuman coder.

 

This isn't necessarily true. Even an AGI or a superhuman coder might get worse at tasks-that-take-humans-longer compared to tasks-that-take-humans-shorter (this seems pretty likely given constant-error-rate considerations). That means an extremely capable AI might be, say, 99.999% reliable on 1-hour tasks but only 99.9% reliable on 10,000-hour tasks, so the logistic fit still crosses 50% somewhere; it's just at a very high number.

In order for the 50% intercept to approach infinity, you'd need a performance curve which approaches a flat line, and this seems very hard to pull off and probably requires wildly superhuman AI.
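
A quick way to see the "finite but very high" point: fit a logistic in log task length through the two hypothetical reliability numbers above and solve for where it crosses 50%. (These are the illustrative numbers from this comment, not a fit to real data.)

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

# Hypothetical reliabilities from the comment, as (log10 of task length in hours, success rate)
x1, p1 = math.log10(1), 0.99999        # 1-hour tasks
x2, p2 = math.log10(10_000), 0.999     # 10,000-hour tasks

# Logistic in log10(hours): success = sigmoid(a + b * log10(hours))
b = (logit(p2) - logit(p1)) / (x2 - x1)   # slope in logits per decade of task length
a = logit(p1) - b * x1
x50 = -a / b                              # log10(hours) at which success drops to 50%

print(f"50% time horizon ≈ 10^{x50:.1f} hours")  # finite, just astronomically large
```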

Thoughts on extrapolating time horizons
Nikola Jurkovic · 3mo

I think a 1-year 50% time horizon is very likely not enough to automate AI research. The reason I still think AI research will be automated by EOY 2028 is speedups from partial automation, as well as the possibility of additional breakthroughs occurring naturally.

A few considerations that make me think the doubling time will get faster:

  1. AI speeding up AI research probably starts making a dent in the doubling time (making it at least 10% faster) by the time we hit 100hr time horizons (although it's pretty hard to reason about the impacts here)
  2. I place some probability on the "inherently superexponential time horizons" hypothesis. The reason is that, to me, 1-month coherence, 1-year coherence, and 10-year coherence (of the kind exhibited by humans) seem like extremely similar skills that will thus be learned in quick succession.
  3. It's plausible reasoning models decreased the doubling time from 7 months to 4 months. It's plausible we get another reasoning-shaped breakthrough.

So my best guesses for the 50% and 80% time horizons at EOY 2028 are more like 10yrs and 3yrs or something. But past ~2027 I care more about how much AI R&D is being automated than about the time horizon itself (partially because I have FUD about what N-year tasks should even look like by definition).
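
As a toy illustration of the "doubling time gets faster" picture (not my actual forecasting procedure), here's a sketch where the doubling time shrinks a little with every doubling; the starting horizon, initial doubling time, shrink factor, and month count are all made-up placeholders:

```python
def project_horizon(h0_hours: float, months: int,
                    doubling_months: float, shrink_per_doubling: float = 0.95) -> float:
    """Advance the time horizon month by month, letting the doubling time
    shrink a little for every doubling that happens (superexponential growth)."""
    h, d = h0_hours, doubling_months
    for _ in range(months):
        h *= 2 ** (1 / d)                    # one month of growth at the current rate
        d *= shrink_per_doubling ** (1 / d)  # 1/d of a doubling happens this month
    return h

# Placeholder inputs: ~2-hour 50% horizon now, 4-month doubling time, ~38 months to EOY 2028
h_2028 = project_horizon(h0_hours=2.0, months=38, doubling_months=4.0)
work_year_hours = 2000
print(f"Projected 50% horizon at EOY 2028: ~{h_2028:,.0f} hours "
      f"(~{h_2028 / work_year_hours:.1f} work-years)")
```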

nikola's Shortform
Nikola Jurkovic · 4mo

Grok 4 is slightly above SOTA on 50% time horizon and slightly below SOTA on 80% time horizon: https://x.com/METR_Evals/status/1950740117020389870


nikola's Shortform
Nikola Jurkovic · 4mo

I heard it from someone who works at xAI

nikola's Shortform
Nikola Jurkovic · 4mo

xAI's safety team is 3 people.

Wikitag contributions: Inflection.ai (2 years ago)
Posts

Are AI time horizons inherently superexponential? (10 points · 5d · 1 comment)
How likely is dangerous AI in the short term? (26 points · 7d · 3 comments)
Mourning a life without AI (164 points · 9d · 61 comments)
How to survive until AGI (29 points · 13d · 3 comments)
Thoughts on extrapolating time horizons (56 points · 3mo · 7 comments)
Forbes: Fear Of Super Intelligent AI Is Driving Harvard And MIT Students To Drop Out (19 points · 3mo · 0 comments)
Outcomes of the Geopolitical Singularity (62 points · 6mo · 5 comments)
Survey of Multi-agent LLM Evaluations (9 points · 6mo · 0 comments)
Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast] (35 points · 7mo · 0 comments)
Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers (41 points · 9mo · 1 comment)