Roman Leventov

An independent researcher/blogger/philosopher working on intelligence and agency (esp. Active Inference), alignment, ethics, the interaction of the AI transition with sociotechnical risks (epistemics, economics, human psychology), collective mind architecture, and research strategy and methodology.

Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication). I'm open to collaborations and work.

Presentations at meetups, workshops and conferences, some recorded videos.

I'm a founding member of the Gaia Consortium, on a mission to create a global, decentralised system for collective sense-making and decision-making, i.e., civilisational intelligence. Drop me a line if you want to learn more about it and/or join the consortium.

You can help boost my sense of accountability and give me the feeling that my work is valued by becoming a paid subscriber of my Substack (though I don't post anything paywalled; in fact, on that blog I just syndicate my LessWrong writing).

For Russian speakers: the Russian-language AI safety network, Telegram group.

Sequences

A multi-disciplinary view on AI safety

Posts (sorted by new)

Roman Leventov's Shortform · 2 karma · 3y · 5 comments

Comments (sorted by newest)
Roman Leventov's Shortform
Roman Leventov · 9d

I don't understand why people rave so much about Claude Code etc., nor how they really use these agents. The problem is not capability: sure, today's agents can go far without stumbling or losing the plot. The problem is that they won't go in the direction I want.

It's because my product vision, architectural vision, and code-quality "functions" are complex: very tedious to express in CLAUDE.md/AGENTS.md, and often hardly expressible in language at all. "I know it when I see it." Hence I keep the agent "on a short leash" (Karpathy), in Cursor.

This makes me think that, at least in coding (and probably some other types of engineering, design, soon perhaps content creation, deep research, etc.), agents are hobbled by alignment, not capability. I predict that in the next few months the concept of "agentic alignment" will be taken over by AI engineers from AI safety and will become a trendy topic/area of focus in AI engineering, like "context engineering" or "memory" are today.

When agentic alignment is largely solved (likely with a mix of specific model post-training and harness work), agents will be unhobbled in a big way in engineering, research, business, etc., akin to how RLHF (an "alignment technique") unhobbled LLM chatbots.

An “Optimistic” 2027 Timeline
Roman Leventov · 3mo

But then the possibilities for 2027 branch on whether there are reliable agents, which doesn't seem knowable either way right now.

Very reliable, long-horizon agency is already within the capability overhang of Gemini 2.5 Pro, perhaps even of the previous tier of models (Gemini 2.0 exp, Sonnet 3.5/3.7, GPT-4o, Grok 3, DeepSeek R1, Llama 4). It's just a matter of harness/agent-wrapping logic and inference-time compute budget.

Agency engineering is currently in the brute-force stage. Agent engineers over-rely on a single LLM rollout being robust, and often use LLM APIs that lack certain nitty-gritty affordances for implementing reliable agency, such as "N completions" with timely self-consistency pruning, and perhaps scaling N up again when the model's own uncertainty is high.
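
To make this concrete, here is a rough sketch of the kind of N-completions logic I mean, assuming an OpenAI-style chat-completions client that supports the `n` parameter; the function name, thresholds, and retry policy are illustrative, not any particular vendor's API:

```python
from collections import Counter

def self_consistent_answer(client, model, prompt, n=5, max_n=25, min_agreement=0.6):
    """Sample N completions, keep the modal answer, and scale N up when agreement is low."""
    while True:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            n=n,               # ask for N completions in one call
            temperature=0.8,   # some diversity is needed for self-consistency to be meaningful
        )
        answers = [choice.message.content.strip() for choice in resp.choices]
        best, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= min_agreement or n >= max_n:
            return best        # agreement is high enough (or budget exhausted): accept
        n = min(max_n, n * 2)  # the model is uncertain: scale N up and retry
```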

This somewhat reminds me of the early LLM scale-up era, when LLM engineers over-relied on "stack more layers" without digging into the architectural details. The best example is perhaps Megatron, a trillion-parameter model from 2021 whose performance is probably abysmal relative to 2025 models of ~10B parameters (perhaps even 1B).

So, the current agents (such as Cursor, Claude Code, Replit, Manus) are in the "Megatron era" of efficiency. In four years, even with the same raw LLM capability, agents will be very reliable.

To give a more specific example of when robustness is a matter of spending more on inference, consider Gemini 2.5 Pro: contrary to the hype, it often misses crucial considerations or acts strangely stupidly even on modestly sized contexts (less than 50k tokens). However, seeing these omissions, it's obvious to me that if someone took ~1k-token chunks of that context, paired each chunk with 2.5 Pro's output, and asked a smaller LLM (Flash or Flash-Lite) "did this part of the context properly inform that output?", Flash would answer No exactly where 2.5 Pro had indeed missed something important from that part of the context. A No should trigger a fallback: N completions, 2.5 Pro self-review over smaller pieces of the context, breaking down the context hierarchically, etc.
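
A rough sketch of what I mean by this chunk-level cross-check, again assuming an OpenAI-compatible client; the model name, the character-based chunking, and the verifier prompt are all stand-ins:

```python
def chunk(text, size_chars=4000):
    # ~1k tokens, crudely approximated by character count
    return [text[i:i + size_chars] for i in range(0, len(text), size_chars)]

def find_missed_chunks(client, context, output, verifier_model="gemini-2.5-flash-lite"):
    """Ask a cheap verifier model, chunk by chunk, whether the output reflects the context."""
    missed = []
    for piece in chunk(context):
        resp = client.chat.completions.create(
            model=verifier_model,
            messages=[{
                "role": "user",
                "content": (
                    f"Context excerpt:\n{piece}\n\n"
                    f"Draft answer:\n{output}\n\n"
                    "Did the draft answer properly take this excerpt into account? "
                    "Answer only Yes or No."
                ),
            }],
            temperature=0,
        )
        if resp.choices[0].message.content.strip().lower().startswith("no"):
            missed.append(piece)
    # A non-empty `missed` list is what should trigger the fallbacks mentioned above:
    # N completions, self-review over smaller context slices, hierarchical breakdown, etc.
    return missed
```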

Roman Leventov's Shortform
Roman Leventov · 4mo

It seems that a lot of white-collar jobs will become (are already becoming) positional goods, akin to aristocratic titles, at least for a few years, possibly longer.

AI will do 100% of the "meat" of the job better than almost all humans, and ~equally well for every user (prompting won't matter much).

But businesses will still demand accountability for results, and that workers can claim they understand and attest to AI outputs (these claims themselves won't be tested, though, nor would it really matter in the grand scheme of things). At the same time, the productivity of these jobs will increase by more than businesses can absorb, at least for a few years (and then perhaps fully automated companies will ensue). Thus, fewer white-collar workers will be needed in total.

When skill doesn't really matter and demand decreases, the jobs will become highly contested, and credentials, prestige (pedigree), connections, and "soft skills" (primarily, of passing interviews) will decide these contests rather than "hard skills". Of the hard skills, only the ability to understand sophisticated AI outputs and potentially fix the remaining issues with them will really matter, but the marginal difference between workers who are good and bad at this will be relatively small for a company's bottom line, and testing candidates for it will be too hard.

The above straightforwardly applies to all "digital"/online/IT/analyst/manager jobs.

I don't buy takes like Steve Yegge's https://sourcegraph.com/blog/revenge-of-the-junior-developer and similar, with their projections of white-collar workers becoming 10x or 100x more productive than today. Backlogs are not that deep, and the marginal value to companies of churning through 99% of these backlog issues is ~0.

I also don't believe in Jevons-paradox wonders of increased demand for "digital" work, again at least for a few years (realistically, 10+ years) until the economy goes through a deeper transformation (including geographically). In the meantime, the economy already looks ~saturated (or even oversaturated) with IT/digitalization, marketing, compliance, legal proceedings, analysis, educational materials, and other similar outputs of white-collar work.


 

Gradual Disempowerment, Shell Games and Flinches
Roman Leventov · 5mo

> Even for those not directly employed by AI labs, there are similar dynamics in the broader AI safety community. Careers, research funding, and professional networks are increasingly built around certain ways of thinking about AI risk. Gradual disempowerment doesn't fit neatly into these frameworks. It suggests we need different kinds of expertise and different approaches than what many have invested years developing. Academic incentives also currently do not point here - there are likely less than ten economists taking this seriously, trans-disciplinary nature of the problem makes it hard sell as a grant proposal.

I agree this is unfortunate, but it also seems irrelevant? Academic economics (as well as sociology, political science, anthropology, etc.) is almost completely irrelevant to shaping major governments' AI policies. "Societal preparedness" and "governance" teams at major AI labs and BigTech giants seem to have approximately no influence on the concrete decisions and strategies of their employers.

The last economist who significantly influenced the economic and policy trajectory was perhaps Milton Friedman?

If not research, what can affect the economic and policy trajectory at all in a deliberate way (disqualifying the unsteerable memetic and cultural drift forces), apart from powerful leaders themselves (Xi, Trump, Putin, Musk, etc.)? Perhaps the way we explore the "technology tree" (see https://michaelnotebook.com/optimism/index.html): the internet, social media, blockchain, the form factors of AI models, etc. I don't hold out much hope here, but this looks to me like the only plausible lever.

Gradual Disempowerment, Shell Games and Flinches
Roman Leventov · 5mo

> My quick impression is that this is a brutal and highly significant limitation of this kind of research. It's just incredibly expensive for others to read and evaluate, so it's very common for it to get ignored.
>
> I'd predict that if you improved the arguments by 50%, it would lead to little extra uptake.

I think this is wrong. The introduction of the GD paper takes no more than 10 minutes to read and no significant cognitive effort to grasp, really. I don't think there is more than 10% room for making it any clearer or more approachable.

The Failed Strategy of Artificial Intelligence Doomers
Roman Leventov · 5mo

https://gradual-disempowerment.ai/ is mostly about institutional progress, not narrow technical progress.

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Roman Leventov · 9mo

Undermind.ai, I think, is much more useful for searching for concepts and ideas in papers than for extracting tabular info à la Elicit. Nominally, Elicit can do the former too, but it is quite bad at it in my experience.

The Great Data Integration Schlep
Roman Leventov · 9mo

https://openmined.org/ develops Syft, a framework for "private computation" in secure enclaves. It potentially reduces the barriers to data integration, both within particularly bureaucratic orgs and across orgs.

My motivation and theory of change for working in AI healthtech
Roman Leventov · 9mo

Thanks for the post, I agree with it!

I just wrote a post with the differential knowledge interconnection thesis, where I argue that it is on net beneficial to develop AI capabilities such as:

  • Federated learning, privacy-preserving multi-party computation, and privacy-preserving machine learning.
  • Federated inference and belief sharing.
  • Protocols and file formats for data, belief, or claim exchange and validation.
  • Semantic knowledge mining and hybrid reasoning on (federated) knowledge graphs and multimodal data.
  • Structured or semantic search.
  • Datastore federation for retrieval-based LMs.
  • Cross-language (such as English/French) retrieval, search, and semantic knowledge integration. This is especially important for languages with a low online presence.

In a dedicated section, I discuss whether knowledge interconnection exacerbates or abates the risk of industrial dehumanization on net. It's a challenging question, but I reach the tentative conclusion that AI capabilities which favor obtaining and leveraging "interconnected" rather than "isolated" knowledge are on net risk-reducing. This is because the "human economy" is more complex than the hypothetical "pure machine-industrial economy", and "knowledge interconnection" capabilities support that greater complexity.

Would you agree or disagree with this?

There Should Be More Alignment-Driven Startups
Roman Leventov · 1y

I think the model of a commercial R&D lab would often suit alignment work better than a "classical" startup company. Conjecture and AE Studio come to mind. Answer.AI, founded by Jeremy Howard (of Fast.ai and Kaggle) and Eric Ries (Lean Startup), elaborates on this business and organisational model here: https://www.answer.ai/posts/2023-12-12-launch.html.

More posts:

Personal agents · 9 karma · 25d · 1 comment
Differential knowledge interconnection · 6 karma · 9mo · 0 comments
The AI Revolution in Biology · 13 karma · 1y · 0 comments
The two-tiered society · 5 karma · 1y · 9 comments
From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models · 8 karma · 1y · 1 comment
AI alignment as a translation problem · 22 karma · 1y · 2 comments
[Question] Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 21 karma · 1y · 5 comments
Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science · 8 karma · 1y · 0 comments
Gaia Network: An Illustrated Primer · 3 karma · 1y · 2 comments
Worrisome misunderstanding of the core issues with AI transition · 5 karma · 1y · 2 comments
Wikitag Contributions:

Open Agency Architecture · 1y · (+133)
Reinforcement learning · 2y · (+16/-4)
Free Energy Principle · 2y
GFlowNets · 2y · (+1418)
Deceptive Alignment · 2y · (+11)