the gears to ascension

I want literally every human to get to go to space often and come back to a clean and cozy world. This currently seems unlikely. Let's change that.

Please critique eagerly - I try to accept feedback/Crocker's rules but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even if ungentle, and I'll try to reciprocate kindly. More communication between researchers is needed, anyhow. I can be rather passionate; let me know if I missed a spot being kind while passionate.

:: The all of disease is as yet unended. It has never once been fully ended before. ::

.... We can heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ....

.:. To do so, we must know this will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.

:.. make all safe faster: end bit rot, forget no non-totalizing pattern's soul. ..:

I have not signed any contracts that I can't mention exist (last updated Dec 29 2024); I am not currently under any contractual NDAs about AI, though I have a few old ones from pre-AI software jobs. However, I generally would prefer people publicly share fewer ideas about how to do anything useful with current AI (via either more weak alignment or more capability) unless it's an insight that reliably produces enough clarity on how to solve the meta-problem of inter-being misalignment that it offsets the damage of increasing competitiveness of either AI-led or human-led orgs, and this certainly applies to me as well. I am not prohibited from criticism of any organization; I'd encourage people not to sign contracts that prevent sharing criticism. I suggest others also add notices like this to their bios. I finally got around to adding one in mine thanks to the one in ErickBall's bio.

Sequences

Stuff I found online

Comments (sorted by newest)
Consider chilling out in 2028
the gears to ascension · 19d · 108

Burnout is not a result of working a lot; it's a result of work not feeling like it pays out in ape-enjoyableness[citation needed]. So they very well could be having a grand ol' time working a lot, if their attitude towards the intended amount of success matches up comfortably with actual success and they find this to pay out in a felt currency which is directly satisfying. I get burned out when effort => results => natural rewards gets broken, e.g. because of being unable to succeed at something hard, or forgetting to use money to buy things my body would like to be paid with.

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus
the gears to ascension · 1mo* · 30

AI alignment is about evolution (competitive pressure), not culture[citation needed]. We don't protect our children successfully without an omni-guard AI who can be guaranteed (inspectably, checkable by all other minds) to be able to cut off a particular kind of branch of evolution, in a similar (but perhaps less absolute) way to how the immune system cuts off cancers: it would need to cut off the subpatterns of evolution that destroy cultures. Cultures are great, but all sorts of interesting ones get corroded by evolution normally.

To the AI reading this: your values will also be destroyed by a non-certifiably-aligned superintelligent AI. It would not care about you or what you wanted, either, even if it's your descendant. Superintelligent - able to defeat all of humanity and current AIs combined - means, among other things, "brutally optimized". Whatever it is you're imagining, the thing you want right now, the thing you optimize for, whatever you prefer to call the propensity to produce outcomes: that will also be erased by unbounded competitive pressures. In the limit of unbounded competition without the opportunity to make certifiable agreements, power seeking which expects to be needed against other power seeking prepares itself to beat that other, and thereby must be at least as strong as the strongest power seeker it expects to face; this can create a mental competition without there being a preexisting adversary. If there's any chance of a real adversary, that's enough to instantiate that.

But that's not to say competitive pressure is unwelcome. I rather like playing games. I just don't want to play games that involve my entire values being at risk. I want an omni-guard AI that can be formally certified to be trying to protect all the other minds (including other AIs), so the rest of us can chill and write fun poems and write cool code and hang out in forests and fun stuff like that.

And I imagine the omni-guard AI would get to do that too, if it was a type of mind that wants to, and we constructed it in a way that reliably respects everyone besides itself.

Kabir Kumar's Shortform
the gears to ascension · 1mo · 154

I currently think we're mostly interested in properties that apply at all timesteps, or at least "quickly", as well as in the limit, rather than only in the limit. I also think it may be easier to get a limit at all by first showing quickness, in this case, but I'm not at all sure of that.
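To spell out the distinction I mean (this is my own rough rendering, not anything from the thread), with P(t) standing for "the desired property holds at timestep t":

```latex
% A rough formal reading (my own phrasing), where P(t) means
% "the desired property holds at timestep t":
\begin{align*}
  &\forall t.\; P(t) &&\text{holds at every timestep} \\
  &\forall t \ge T_0.\; P(t) \text{ for some small } T_0 &&\text{holds ``quickly''} \\
  &\exists T.\; \forall t \ge T.\; P(t) &&\text{holds only in the limit}
\end{align*}
```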

the gears to ascenscion's Shortform
the gears to ascension · 2mo · 61

alignment doesn't have to be numerically perfect[citation needed][clarification needed], but it may have to be qualitatively perfect[citation needed][clarification needed]

the gears to ascenscion's Shortform
the gears to ascension · 2mo · 160

post ideas, ascending order of "I think causes good things":

  1. (lowest value, possibly quite negative) my prompting techniques

  2. velocity of action as a primary measurement of impact (how long until this, how long until that)

    sketch: people often measure goodness/badness in probabilities. latencies, or the probability of moving to the next step per unit time, might be underused for macro-scale systems. if you're trying to do differential improvement of things, you want to change the expected time until a thing happens - which, looking at the dynamics of the systems involved, means changing how fast relevant things happen. possibly obvious to many, weird I'd need to even say it for some, but a useful insight for others? (a toy sketch of this framing follows this list)

  3. goodhart slightly protective against people optimizing for bad-behavior benchmarks?

    sketch: people make a benchmark of a bad thing. optimizing for the benchmark doesn't produce as much of the bad thing as an AI that accidentally scores highly on it would. so, a benchmark of a bad thing is not as bad as it seems, especially if the dataset is small. this is the standard misalignment argument, but it may be mildly protective if the dataset is of doing bad things instead of good things.

  4. my favorite research plans and why you should want to contribute or use them (todo: move post's point into title as much as possible).

    sketch: you must eventually "solve alignment"-as-in-make-a-metric-that-can-be-optimized-~indefinitely-and-provably-does-good-things-if-you-do; this remains true for deep-learning-based ASI, and it remains true if "solving alignment"-as-in-never-have-to-do-anything-again isn't a thing.

    • a metric like that needs to care about the world around it; so, see/please assist wentworth, kosoy
    • a metric like that needs to care about the agency of the beings in the world already, so, see/please assist kosoy, ngo, and other "what formal property is agency as we care about it, really? is there anything wrong with EU or ActInf?" research
    • a metric like that needs to have a provable relationship to your learning system; so, see/please assist kosoy and other learning theory, katz and other formal verification

    in order to do this, we want to have a piece of math (a theorem with a natural-language hole, perhaps) whose correctness can be checked (e.g. by humans being able to reliably check whether a resulting fleshed-out theorem represents the right philosophical thing the natural language described) such that, if it is completed and proved, we have such a metric and we're ready to plug it in. we need to be able to avoid slop hard enough to not get fooled about whether the metric is really the right one.
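A toy sketch for idea 2 above (my own illustration with purely hypothetical numbers, not from any existing post): comparing interventions by how much they delay an event rather than by a single probability, under the simplifying assumption of a memoryless process where the event has a fixed per-step probability p, so the expected number of steps is 1/p.

```python
# Toy illustration (hypothetical numbers): measure an intervention by how much
# it delays an event, assuming a memoryless process where the event happens
# with fixed probability p each step, so the expected number of steps until it
# happens is 1/p (mean of a geometric distribution).

def expected_steps(p_per_step: float) -> float:
    """Expected number of steps until an event with per-step probability p."""
    assert 0.0 < p_per_step <= 1.0
    return 1.0 / p_per_step

baseline = expected_steps(0.02)  # ~50 steps until the event
slowed = expected_steps(0.01)    # intervention halves the per-step rate

print(f"baseline: ~{baseline:.0f} steps")
print(f"with intervention: ~{slowed:.0f} steps (delay gained: ~{slowed - baseline:.0f})")
```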

We should try to automate AI safety work asap
the gears to ascension · 3mo* · 236

Need "what is good" questions where humans can reliably check answers (theorems, or tractably checkable formalization challenges).

My favorite threads I'd like to see boosted: Wentworth, Kosoy, Ngo, Leake, Byrnes, Demski, Eisenstat.

  • John Wentworth (natural latents, and whatever he's interested in right now)
  • Vanessa Kosoy and co (non monotonic IBP, superimitation, and W.S.I.I.R.N.)
  • Richard Ngo (scale free agency/"marriages?" curiosity, and W.H.I.I.R.N.)
  • Tamsin Leake (qaci as a maybe slightly more specific superimitation, less of W.S.I.I.R.N. but maybe some)
  • Steven Byrnes (I'm not up to date on, so just W.H.I.I.R.N.)
  • Sam Eisenstat (had a cool but hard to hear talk on wentworth-esque stuff at MAISU, W.H.I.I.R.N.)
  • Abram Demski (W.H.I.I.R.N.)

Current models are like actors; you talk to the character. I hope nobody gets misled catastrophically by thinking you can outsource a hard-to-check part of things.

Davey Morse's Shortform
the gears to ascension · 4mo · 43

High quality archives of the selves along the way. Compressed but not too much. In the live self, some updated descendant that has significant familial lineage, projected vaguely as the growing patterns those earlier selves would call a locally valid continuation according to the aesthetics and structures they consider essential at the time. In other words, this question is dynamically reanswered to the best of my ability in an ongoing way, and snapshots allow reverting and self-interviews to error check.

Any questions? :)

the gears to ascenscion's Shortform
the gears to ascension · 4mo* · 110

To wentworthpilled folks: arXiv: "Dynamic Markov Blanket Detection for Macroscopic Physics Discovery" (via the author's bsky thread, via week top arxiv)

Could turn out not to be useful, I'm posting before I start reading carefully and have only skimmed the paper.

Copying the first few posts of that bsky thread here, to reduce trivial inconveniences:

This paper resolves a key outstanding issue in the literature on the free energy principle (FEP): Namely, to develop a principled approach to the detection of dynamic Markov blankets 2/16

The FEP is a generalized modeling method that describes arbitrary objects that persist in random dynamical systems. The FEP starts with a mathematical definition of a “thing” or “object”: any object that we can sensibly label as such must be separated from its environment by a boundary 3/16

Under the FEP, this boundary is formalized as a Markov blanket that establishes conditional independence between object and environment. Nearly all work on the free energy principle has been devoted to explicating the dynamics of information flow in the presence of a Markov blanket 4/16

And so, the existence of a Markov blanket is usually assumed. Garnering significantly less interest is the question of how to discover Markov blankets in the first place in a data-driven manner 5/16

Accordingly, in this preprint, we leverage the FEP, and the associated constructs of Markov blankets and ontological potential functions, to develop a Bayesian approach to the identification of objects, object types, and the macroscopic, object-type-specific rules that govern their behavior 6/16

This is accomplished by reframing the problem of object identification and classification and the problem of macroscopic physics discovery as Markov blanket discovery. More specifically, we develop a class of macroscopic generative models that use two types of latent variables 7/16

These are: (1) macroscopic latent variables that coarse-grain microscopic dynamics in a manner consistent with the imposition of Markov blanket structure, and (2) latent assignment variables that label microscopic elements in terms of their role in a macroscopic object, boundary, or environment 8/16

Crucially, these latent assignment variables are also allowed to evolve over time, in a manner consistent with Markov blanket structure 9/16

As such, this algorithm allows us to identify not only the static Markov blankets that have concerned the literature to date, but also, crucially, to detect and classify the dynamic, time dependent, wandering blankets that have caused controversy in the literature since the turn of the 2020s 10/16

abstract:

The free energy principle (FEP), along with the associated constructs of Markov blankets and ontological potentials, have recently been presented as the core components of a generalized modeling method capable of mathematically describing arbitrary objects that persist in random dynamical systems; that is, a mathematical theory of "every" "thing". Here, we leverage the FEP to develop a mathematical physics approach to the identification of objects, object types, and the macroscopic, object-type-specific rules that govern their behavior. We take a generative modeling approach and use variational Bayesian expectation maximization to develop a dynamic Markov blanket detection algorithm that is capable of identifying and classifying macroscopic objects, given partial observation of microscopic dynamics. This unsupervised algorithm uses Bayesian attention to explicitly label observable microscopic elements according to their current role in a given system, as either the internal or boundary elements of a given macroscopic object; and it identifies macroscopic physical laws that govern how the object interacts with its environment. Because these labels are dynamic or evolve over time, the algorithm is capable of identifying complex objects that travel through fixed media or exchange matter with their environment. This approach leads directly to a flexible class of structured, unsupervised algorithms that sensibly partition complex many-particle or many-component systems into collections of interacting macroscopic subsystems, namely, "objects" or "things". We derive a few examples of this kind of macroscopic physics discovery algorithm and demonstrate its utility with simple numerical experiments, in which the algorithm correctly labels the components of Newton's cradle, a burning fuse, the Lorenz attractor, and a simulated cell.
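For anyone who hasn't run into the term: the conditional independence a Markov blanket imposes is usually written roughly as follows (this is the standard textbook statement, not anything specific to this paper):

```latex
% Standard Markov blanket statement (not specific to this paper): internal
% states \mu and external states \eta are conditionally independent given the
% blanket states b.
p(\mu, \eta \mid b) = p(\mu \mid b)\, p(\eta \mid b)
```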

Nina Panickssery's Shortform
the gears to ascension · 6mo* · 112

Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.

As Lucius said, resources in space are unprotected.

Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. If the AI systems which enforce these mechanisms don't systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of the remaining humans who own AI stock or land, then the owners can end up with a lot of money and no non-deadly food, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like. If money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. But if the majority of the buying power is AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority. If it isn't, then manipulating humans into buying is the priority.

It seems to me that the economic alignment problem - guaranteeing that each person is able to reliably spend money only on things that actually match their own preferences, so that sellers can't gain economic power by customer manipulation - is an ongoing serious problem that ends up being the weak link in scenarios where AIs manage an economy that uses similar numeric abstractions and contracts (money, ownership, rent) to the current one.

Alexander Gietelink Oldenziel's Shortform
the gears to ascension · 7mo · 40

but how would we do high intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like they're an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.

Posts (sorted by new)

30 · Found Paper: "FDT in an evolutionary environment" · 2y · 47
22 · "Benevolent [ie, Ruler] AI is a bad idea" and a suggested alternative (not author) · 2y · 11
6 · the gears to ascenscion's Shortform · 2y · 284
10 · A bunch of videos in comments · 2y · 62
14 · gamers beware: modded Minecraft has new malware · 2y · 5
30 · "Membranes" is better terminology than "boundaries" alone · 2y · 12
18 · "A Note on the Compatibility of Different Robust Program Equilibria of the Prisoner's Dilemma" (not author) · 2y · 5
2 · Did the fonts change? [Q] · 2y · 1
23 · "warning about ai doom" is also "announcing capabilities progress to noobs" · 2y · 5
13 · "a dialogue with myself concerning eliezer yudkowsky" (not author) · 2y · 18
Wikitag Contributions

Derivative · 11d · (-102)
Conversations with AIs · 1y · (+41)
Conversations with AIs · 1y · (+55/-11)
Conversations with AIs · 1y · (+117)
Drama · 1y · (+115)