Roman Leventov

An independent researcher/blogger/philosopher working on intelligence and agency (esp. Active Inference), alignment, ethics, the interaction of the AI transition with sociotechnical risks (epistemics, economics, human psychology), collective mind architecture, and research strategy and methodology.

Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication). I'm open to collaborations and work.

Presentations at meetups, workshops and conferences, some recorded videos.

I'm a founding member of the Gaia Consortium, on a mission to create a global, decentralised system for collective sense-making and decision-making, i.e., civilisational intelligence. Drop me a line if you want to learn more about it and/or join the consortium.

You can help boost my sense of accountability and give me the feeling that my work is valued by becoming a paid subscriber of my Substack (though I don't post anything paywalled; in fact, on that blog I just syndicate my LessWrong writing).

For Russian speakers: the Russian-language AI safety network, Telegram group.

Sequences

A multi-disciplinary view on AI safety

Posts (sorted by new)

Roman Leventov's Shortform · 2 karma · 3y · 5 comments

Comments (sorted by newest)
Roman Leventov's Shortform
Roman Leventov · 9d

I don't understand why people rave so much about Claude Code etc., nor how they really use these agents. The problem is not capability: sure, today's agents can go far without stumbling or losing the plot. The problem is that they won't go in the direction I want.

It's because my product vision, architectural vision, and code-quality "functions" are complex: very tedious to express in CLAUDE.md/AGENTS.md, and often hardly expressible in language at all. "I know it when I see it." Hence I keep the agent "on a short leash" (Karpathy), in Cursor.

This makes me think that, at least in coding (and probably some other types of engineering, design, soon perhaps content creation, deep research, etc.), agents are hobbled by alignment, not capability. I predict that in the next few months the concept of "agentic alignment" will be taken over by AI engineers from AI safety and will become a trendy topic/area of focus in AI engineering, like "context engineering" or "memory" are today.

When agentic alignment is largely solved (likely with a mix of specific model post-training and harness work), agents will be unhobbled in a big way in engineering, research, business, etc., akin to how RLHF (an "alignment technique") unhobbled LLM chatbots.

An “Optimistic” 2027 Timeline
Roman Leventov · 3mo

But then the possibilities for 2027 branch on whether there are reliable agents, which doesn't seem knowable either way right now.

Very reliable, long-horizon agency is already within the capability overhang of Gemini 2.5 Pro, perhaps even of the previous tier of models (Gemini 2.0 exp, Sonnet 3.5/3.7, GPT-4o, Grok 3, DeepSeek R1, Llama 4). It's just a matter of harness/agent-wrapping logic and inference-time compute budget.

Agency engineering is currently in the brute-force stage. Agent engineers over-rely on a single LLM rollout being robust, and often use LLM APIs that lack certain nitty-gritty affordances for implementing reliable agency, such as "N completions" with timely self-consistency pruning, and perhaps scaling N up again when the model's own uncertainty is high.
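
To make this concrete, here is a rough sketch of the kind of N-completions logic I mean, assuming an OpenAI-style chat-completions client that supports the `n` parameter; the function name, thresholds, and retry policy are illustrative, not any particular vendor's API:

```python
from collections import Counter

def self_consistent_answer(client, model, prompt, n=5, max_n=25, min_agreement=0.6):
    """Sample N completions, keep the modal answer, and scale N up when agreement is low."""
    while True:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            n=n,               # ask for N completions in one call
            temperature=0.8,   # some diversity is needed for self-consistency to be meaningful
        )
        answers = [choice.message.content.strip() for choice in resp.choices]
        best, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= min_agreement or n >= max_n:
            return best        # agreement is high enough (or budget exhausted): accept
        n = min(max_n, n * 2)  # the model is uncertain: scale N up and retry
```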

This somewhat reminds me of the early LLM scale-up era, when LLM engineers over-relied on "stack more layers" without digging into the architectural details. The best example is perhaps Megatron, a trillion-parameter model from 2021 whose performance is probably abysmal relative to 2025 models of ~10B parameters (perhaps even 1B).

So, the current agents (such as Cursor, Claude Code, Replit, Manus) are in the "Megatron era" of efficiency. In four years, even with the same raw LLM capability, agents will be very reliable.

To give a more specific example of when robustness is a matter of spending more on inference, consider Gemini 2.5 Pro: contrary to the hype, it often misses crucial considerations or acts strangely stupidly even on modestly sized contexts (less than 50k tokens). However, seeing these omissions, it's obvious to me that if someone took ~1k-token chunks of that context, paired each chunk with 2.5 Pro's output, and asked a smaller LLM (Flash or Flash-Lite) "did this part of the context properly inform that output?", Flash would answer No exactly where 2.5 Pro had indeed missed something important from that part of the context. A No should trigger a fallback: N completions, 2.5 Pro self-review over smaller pieces of the context, breaking down the context hierarchically, etc.
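
A rough sketch of what I mean by this chunk-level cross-check, again assuming an OpenAI-compatible client; the model name, the character-based chunking, and the verifier prompt are all stand-ins:

```python
def chunk(text, size_chars=4000):
    # ~1k tokens, crudely approximated by character count
    return [text[i:i + size_chars] for i in range(0, len(text), size_chars)]

def find_missed_chunks(client, context, output, verifier_model="gemini-2.5-flash-lite"):
    """Ask a cheap verifier model, chunk by chunk, whether the output reflects the context."""
    missed = []
    for piece in chunk(context):
        resp = client.chat.completions.create(
            model=verifier_model,
            messages=[{
                "role": "user",
                "content": (
                    f"Context excerpt:\n{piece}\n\n"
                    f"Draft answer:\n{output}\n\n"
                    "Did the draft answer properly take this excerpt into account? "
                    "Answer only Yes or No."
                ),
            }],
            temperature=0,
        )
        if resp.choices[0].message.content.strip().lower().startswith("no"):
            missed.append(piece)
    # A non-empty `missed` list is what should trigger the fallbacks mentioned above:
    # N completions, self-review over smaller context slices, hierarchical breakdown, etc.
    return missed
```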

Roman Leventov's Shortform
Roman Leventov · 4mo

It seems that a lot of white-collar jobs will become (are already becoming) positional goods, akin to aristocratic titles, at least for a few years, possibly longer.

AI will do 100% of the "meat" of the job better than almost all humans, and ~equally well for every user (prompting won't matter much).

But businesses will still demand accountability for results, and that workers can claim they understand and attest to AI outputs (these claims themselves won't be tested, though, nor would it really matter in the grand scheme of things). At the same time, the productivity of these jobs will increase by more than businesses can absorb, at least for a few years (and then perhaps fully automated companies will ensue). Thus, fewer white-collar workers will be needed in total.

When skill doesn't really matter and demand decreases, the jobs will become highly contested, and credentials, prestige (pedigree), connections, and "soft skills" (primarily, of passing interviews) will decide these contests rather than "hard skills". Of the hard skills, only the ability to understand sophisticated AI outputs and potentially fix the remaining issues with them will really matter, but the marginal difference between workers who are good and bad at this will be relatively small for a company's bottom line, and testing candidates for it will be too hard.

The above straightforwardly applies to all "digital"/online/IT/analyst/manager jobs.

I don't buy takes like Steve Yegge's https://sourcegraph.com/blog/revenge-of-the-junior-developer and similar, with their projections of white-collar workers becoming 10x or 100x more productive than today. Backlogs are not that deep, and the marginal value to companies of churning through 99% of these backlog issues is ~0.

I also don't believe in Jevons-paradox wonders of increased demand for "digital" work, again at least for a few years (realistically, 10+ years) until the economy goes through a deeper transformation (including geographically). In the meantime, the economy already looks ~saturated (or even oversaturated) with IT/digitalization, marketing, compliance, legal proceedings, analysis, educational materials, and other similar outputs of white-collar work.


 

Gradual Disempowerment, Shell Games and Flinches
Roman Leventov · 5mo

> Even for those not directly employed by AI labs, there are similar dynamics in the broader AI safety community. Careers, research funding, and professional networks are increasingly built around certain ways of thinking about AI risk. Gradual disempowerment doesn't fit neatly into these frameworks. It suggests we need different kinds of expertise and different approaches than what many have invested years developing. Academic incentives also currently do not point here - there are likely less than ten economists taking this seriously, trans-disciplinary nature of the problem makes it hard sell as a grant proposal.

I agree this is unfortunate, but it also seems irrelevant? Academic economics (as well as sociology, political science, anthropology, etc.) is almost completely irrelevant to shaping major governments' AI policies. "Societal preparedness" and "governance" teams at major AI labs and BigTech giants seem to have approximately no influence on the concrete decisions and strategies of their employers.

The last economist who significantly influenced the economic and policy trajectory was perhaps Milton Friedman?

If not research, what can affect the economic and policy trajectory at all in a deliberate way (disqualifying the unsteerable memetic and cultural drift forces), apart from powerful leaders themselves (Xi, Trump, Putin, Musk, etc.)? Perhaps the way we explore the "technology tree" (see https://michaelnotebook.com/optimism/index.html): the internet, social media, blockchain, the form factors of AI models, etc. I don't hold out much hope here, but this looks to me like the only plausible lever.

Gradual Disempowerment, Shell Games and Flinches
Roman Leventov · 5mo

> My quick impression is that this is a brutal and highly significant limitation of this kind of research. It's just incredibly expensive for others to read and evaluate, so it's very common for it to get ignored.
>
> I'd predict that if you improved the arguments by 50%, it would lead to little extra uptake.

I think this is wrong. The introduction of the GD paper takes no more than 10 minutes to read and no significant cognitive effort to grasp, really. I don't think there is more than 10% room for making it any clearer or more approachable.

The Failed Strategy of Artificial Intelligence Doomers
Roman Leventov · 5mo

https://gradual-disempowerment.ai/ is mostly about institutional progress, not narrow technical progress.

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Roman Leventov · 9mo

Undermind.ai, I think, is much more useful for searching for concepts and ideas in papers than for extracting tabular info à la Elicit. Nominally, Elicit can do the former too, but it is quite bad at it in my experience.

The Great Data Integration Schlep
Roman Leventov · 9mo

https://openmined.org/ develops Syft, a framework for "private computation" in secure enclaves. It potentially reduces the barriers to data integration, both within particularly bureaucratic orgs and across orgs.

My motivation and theory of change for working in AI healthtech
Roman Leventov · 9mo

Thanks for the post, I agree with it!

I just wrote a post with the differential knowledge interconnection thesis, where I argue that it is on net beneficial to develop AI capabilities such as:

  • Federated learning, privacy-preserving multi-party computation, and privacy-preserving machine learning.
  • Federated inference and belief sharing.
  • Protocols and file formats for data, belief, or claim exchange and validation.
  • Semantic knowledge mining and hybrid reasoning on (federated) knowledge graphs and multimodal data.
  • Structured or semantic search.
  • Datastore federation for retrieval-based LMs.
  • Cross-language (such as English/French) retrieval, search, and semantic knowledge integration. This is especially important for languages with a low online presence.

In a dedicated section, I discuss whether knowledge interconnection exacerbates or abates the risk of industrial dehumanization on net. It's a challenging question, but I reach the tentative conclusion that AI capabilities which favor obtaining and leveraging "interconnected" rather than "isolated" knowledge are on net risk-reducing. This is because the "human economy" is more complex than the hypothetical "pure machine-industrial economy", and "knowledge interconnection" capabilities support that greater complexity.

Would you agree or disagree with this?

There Should Be More Alignment-Driven Startups
Roman Leventov · 1y

I think the model of a commercial R&D lab would often suit alignment work better than a "classical" startup company. Conjecture and AE Studio come to mind. Answer.AI, founded by Jeremy Howard (of Fast.ai and Kaggle) and Eric Ries (Lean Startup), elaborates on this business and organisational model here: https://www.answer.ai/posts/2023-12-12-launch.html.

More posts:

Personal agents · 9 karma · 25d · 1 comment
Differential knowledge interconnection · 6 karma · 9mo · 0 comments
The AI Revolution in Biology · 13 karma · 1y · 0 comments
The two-tiered society · 5 karma · 1y · 9 comments
From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models · 8 karma · 1y · 1 comment
AI alignment as a translation problem · 22 karma · 1y · 2 comments
[Question] Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 21 karma · 1y · 5 comments
Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science · 8 karma · 1y · 0 comments
Gaia Network: An Illustrated Primer · 3 karma · 1y · 2 comments
Worrisome misunderstanding of the core issues with AI transition · 5 karma · 1y · 2 comments
Wikitag Contributions:

Open Agency Architecture · 1y · (+133)
Reinforcement learning · 2y · (+16/-4)
Free Energy Principle · 2y
GFlowNets · 2y · (+1418)
Deceptive Alignment · 2y · (+11)