
Mateusz Bagiński

Agent foundations, AI macrostrategy, civilizational sanity, human enhancement.

I endorse and operate by Crocker's rules.

I have not signed any agreements whose existence I cannot mention.

Comments

2 · Mateusz Bagiński's Shortform · 3y · 26 comments
Daniel Kokotajlo's Shortform
Mateusz Bagiński · 2h

How does Poland fit into this picture?

Cole Wyeth's Shortform
Mateusz Bagiński · 13h

I find this reply broadly reasonable, but I'd like to see some systematic investigations of the analogy between gradual adoption and rising utility of electricity and gradual adoption and rising utility of LLMs (as well as other "truly novel technologies").

Female sexual attractiveness seems more egalitarian than people acknowledge
Mateusz Bagiński · 3d

will occasionally have conversations that sound like this

FWIW I'm a straight man and the last conversation like that I recall having was like 11 years ago.

Benito's Shortform Feed
Mateusz Bagiński · 4d

4: good reasoning/argument, wrong conclusion 

Zach Stein-Perlman's Shortform
Mateusz Bagiński · 4d

[Writing based on the quick take; I haven't looked into the linked thing.]

Google didn't necessarily even break a commitment? The commitment mentioned in the article is to "publicly report model or system capabilities." That doesn't say it has to be done at the time of public deployment.

I think this statement lends itself to being by default interpreted as "inform the public about model/system capabilities in time" (because if they don't do it "in time", then what's the point?), and the most "natural" "in-time" time is the time of deployment?

I announce that I'm committing to X. I can expect that most people will understand this to mean "I commit to Y" where X→Y is a natural (~unconscious?) inference for a human to make. And then I don't do Y and defend myself by saying, "The only thing I committed to was X, why all the fuss about me not Y-ing?".

Other companies are doing far worse in this dimension. At worst Google is 3rd-best in publishing eval results. Meta and xAI are far worse.

There might be a contextualizer-y justification for picking on Google more because they are further ahead than Meta and xAI, AI-wise.

Rauno's Shortform
Mateusz Bagiński · 13d

Wikipedia isn't a blog, and I don't think Rauno would say that he's not interested in wiki-like stuff if someone suggested to him that the notion of Long Content he's using here is overly restricted to ~blogs.

Agent foundations: not really math, not really science
Mateusz Bagiński · 16d*

Thanks for writing this up! I strong-upvoted because, as you say, these ideas are not well-communicated, and this post contributes an explanation that I expect to be clarifying to a significant subset of people confused about agent foundations.

Initially, I wasn't quite buying the claim that we don't need any experiments (or, more generally, additional empirics) to understand agency, and that all we need is to "just crunch" math and philosophy. The image I had in mind was something like "This theorem proves something non-trivial — or even significantly surprising — about a class of agents that includes humans, and we are in a position to verify it experimentally, so we should do it, to ensure that we're not fooling ourselves."

Then, this passage made it click for me and I saw the possibility that maybe we are in a position where armchairs, whiteboards, tons of paper, and Lean code are sufficient.

It's noteworthy that humanity did indeed deliberately invent the first Turing-complete programming languages before building Turing-complete computers, and we have also figured out a lot of the theory of quantum computing before building actual quantum computers.

When Alan Turing figured out computability theory, he was not doing pure math for math's sake; he was trying to grok the nature of computation so that we could actually build better computers. And he was not doing typical science, either. He obviously had considerable experience with computers, but I seriously doubt that, for example, work on his 1936 paper involved running into issues which were resolved by doing experiments. I would say agent foundations researchers have similarly considerable experience with agents.

(Another example/[proof of concept]/[existence proof of the reference class] is Einstein's Arrogance.)

However, the reference class that includes the theory of computation is only one possible reference class for the theory of agents.[1] For all (I think) we know, the reference class we are in might instead be (or look more like) complex systems studies, where you can prove a bunch of neat things, but a lot of the behavior is not computationally reducible, and instead you need to observe, simulate, and crunch the numbers. Moreover, noticing surprising real-world phenomena can serve as a guide to your attempts to explain the observed phenomena in ~mathematical terms (e.g., how West et al. explained (or re-derived) Kleiber's law from the properties of intra-organismal resource supply networks[2]).
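
(For concreteness, and as my own paraphrase of the standard statement rather than a quote from West et al.: Kleiber's law is the empirical regularity that basal metabolic rate scales with body mass roughly as a 3/4-power law,

$$B \approx B_0\, M^{3/4}$$

where $B$ is metabolic rate, $M$ is body mass, and $B_0$ is a normalization constant that varies between taxa. West et al.'s move was to derive the $3/4$ exponent from the geometry of space-filling, fractal-like resource supply networks rather than to take it as a brute empirical fact.)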

I don't know what the theory will look like; to me, its shape remains an open a posteriori question.

  1. ^

    Or whatever theory we need in order to understand agents, as the theory that we need in order to understand agents need not itself be a theory of agents (but maybe something broader, like, IDK, adaptivity or powerful optimization processes, or maybe there's a new ontology that cuts across our intuitive notion of agency and kinda dissolves it for the purpose of joint-carving understanding).

  2. ^

    The explanation of their proof that I was able to understand is the one in this textbook.

Four Axes of Hunger
Mateusz Bagiński · 16d

I like this analysis a lot. This is the kind of applied rationality stuff and "epistemics for modeling mundane stuff better" that I'd like to see much more of on LessWrong.

Reward is not the optimization target
Mateusz Bagiński · 17d*

@Olli Järviniemi Care to elaborate why you no longer endorse this review?

kh's Shortform
Mateusz Bagiński · 19d

so this version of entanglement with action is really a very weak criterion

Yeah, exactly, and hence the question: what are some counterexamples, i.e., ~concepts that clearly are not tied to action in any way? E.g., I could imagine metaphysical philosophizing connecting to action via contributing to a line of thinking that eventually produces a useful insight about how to do science or something. Is it about "being/remaining open to using it in new ways"?

I think I want to expand my notion of "tautological statements" to include statements like "In the HPMoR universe, X happens". You can also pick any empirical truth "X" and turn it into a tautological one by saying "In our universe, X". Though I agree it seems a bit weird.

I'm inclined to think that your generalized tautological statements are about something like "playing games according to ~rules in (~confined to) some mind-like system". This is in contrast to (canonically) empirical statements that involve throwing a referential bridge across the boundary of the system.

  • I think sth is not meaningful if there's no connection between a belief and your main belief pool. So "a puffy is a flippo" is perhaps not meaningful to you because those concepts don't relate to anything else you know? (But that's a different kind of meaningful from what errors people mostly make.)

K:

  • yea. tho then we could involve more sentences about puffies and flippos and start playing some game involving saying/thinking those sentences and then that could be fun/useful/whatever

[Thinking out loud.]

Intuitively, it does seem to me that if you start with a small set of elements isolated from the rest of your understanding, then they are meaningless. But as you grow this set of elements and add more relations/functions/rules/propositions with high implicative potential, the network becomes increasingly meaningful, even though it remains completely disconnected from the rest of your understanding and your life, except for playing this domain/subnetwork-specific game.

Is it (/does it seem) meaningful just because I could throw a bridge between it and the rest of my understanding? Well, one could build a computer with only this game installed (+ ofc the bare minimum to make it work: OS and stuff) and I would still be inclined to think it meaningful, although perhaps I would be imposing, and the meaningfulness would be co-created by the eye/mind of the beholder.

This leads to the question: What criteria do we want our (explicated) notion of meaningfulness to satisfy?

[For completeness, the concept of meaningfulness may need to be splintered or even eliminated (/factored out in a way that doesn't leave anything clearly serving its role), though I think the latter rather unlikely.]

Posts

23 · Counter-considerations on AI arms races · 4mo · 0 comments
14 · Comprehensive up-to-date resources on the Chinese Communist Party's AI strategy, etc? [Question] · 5mo · 6 comments
35 · Goodhart Typology via Structure, Function, and Randomness Distributions · 5mo · 1 comment
24 · Bounded AI might be viable · 6mo · 4 comments
51 · Less Anti-Dakka · 1y · 5 comments
9 · Some Problems with Ordinal Optimization Frame · 1y · 0 comments
7 · What are the weirdest things a human may want for their own sake? [Question] · 1y · 16 comments
26 · Three Types of Constraints in the Space of Agents [Ω] · 2y · 3 comments
35 · 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata · 2y · 8 comments
112 · Charbel-Raphaël and Lucius discuss interpretability [Ω] · 2y · 7 comments
Wikitag Contributions

5-and-10 · 2mo
Alien Values · 3mo · (+23/-22)
Corrigibility · 5mo · (+119)
Corrigibility · 5mo · (+12/-13)
AI Services (CAIS) · 7mo · (+7/-8)
Tool AI · 8mo
Quantilization · 9mo · (+111/-5)
Tiling Agents · 1y · (+15/-9)
Tiling Agents · 1y · (+3/-2)
Updateless Decision Theory · 2y · (+9/-4)