What, if not agency?

I'd like to join those asking "who is Sahil?". And to answer some points:

A term incorporating soloware and groupware (in this new meaning of "groupware") is probnably nicheware
SaaS on open source is a thing for ages now. SaaS is mostly not about "can't have the code" but rather "can't be bothered to run and support it". There is a lot of expense, in money and time, needed to run existingm legitimately available code reliably and to resolve issues speedily. So I would not expect any explosion of nichware to bite into "*aaS" as a concept; rather, SaaS platforms might adapt to serve nicheware, and/or there might be some shifting from SaaS to PaaS/IaaS (to take an AI example, a per-token billed standard inference endpoint is SaaS; a serverless endpoint to serve your custom model, billed by the second of activity, is PaaS if not IaaS).

I really like the term nicheware and might adopt it!

I wanted to ask if you could record it or at least post the transcript after it's done? It would be nice to have. Also, this was cool as I got to understand the ideas more deeply and from a different perspective than Sahil's, I thought it was quite useful especially in how it relates to agency.

[-]Mateusz Bagiński2mo*70

I would perhaps add the term "groupware" for software custom-tailored to small communities. (I'm not sure what term would be appropriate to encompass both soloware and groupware.)

Customware?

We can describe all AI risks as indifference risks

Not true about risks from incompetence.

[-]Aay17ush1mo30

Or Communityware (as a word nudges me towards smallness, togetherness)

[-]Raemon22d60

Curated. I'm still a bit confused about how Sahil's ideas about "co-agency" will play in practice, as AI capabilities get more extreme. (i.e. see Gwern's Tool AIs want to be Agent AIs).

But, I like all the individual nearterm pieces of this vision (i.e. trying to build a culture around building tools that help your specific workflow), and I like the concept of trying to steer toward designs that empower, but in practice there sure are a lot of places I (viscerally) really want the AI to do the stuff for me.

A thing that bubbles up for me is "empowerment" mindset, where I see some new tool and instead of merely thinking "what problems can I make go away magically", I think "what new awesome things can I do, that build on this?". Another lens is is that I've been doing more "hiring contractors to build stuff for me" lately, and having an AI do it isn't fundamentally different. Hiring people to help you with stuff can be part of agency... but it can also end up infantalizing if you do it wrong. (i.e. hiring an ops person to do a lots of your bullshit work and then ending up the sort of person who struggles to do their own paperwork when they need to).

I'd be interested in someone building a prototype of something filling the niche of a AI chatbot interface but somehow feels more in-control.

[-]Noosphere892mo*50

I have a somewhat different explanation of what's going on to make LLMs not care about shutdown, and a core part here is that for all intents and purposes, LLMs are very short term optimizers, which means that coherence theorems get dodged (or put another way, coherence theorems become trivial and unable to predict anything), as shown by METR and Rohin Shah.

If you believe METR's projection, this will change in the next decade, but one of the core parts about avoiding shutdown is that the common foundation of avoiding shutdown through coherence theorems requires a system that has at least one goal that is specifically about maximizing final states in the long-term, and LW was very confused about this (which is a mark against LW, especially since they didn't notice the confusion):

Coherence Is Nontrivial For Optimization "At A Distance"

Coherence of Policies

So the substrate-independence discussion is mostly a distraction here, because we have a much more likely reason for why LLMs are willing to be shut down so easily.

[-]henryaj22d30

Sahil has been up to things.

This links to a sequence where the most recent post is 7 months old - so I don't really understand the context (and I still don't know who Sahil is). Can it be made clearer?

[-]Lucas Teixeira1mo30

As someone who has tried to engage w/ Live Theory multiple times and has found it intriguing, suspicious, and frustratingly mysterious I was very happy to see this write up. I was sad that the Q&A was announced in short notice. Am looking forward to watching the recording and am registering interest in attending a future one.

[-]Oliver Sourbut1mo30

(I'm not sure what term would be appropriate to encompass both soloware and groupware.)

Anyware? Someware? Everyware?

[-]Aditya2mo30

Abram has done such an amazing job in making the agenda accessible, I really enjoyed reading this post. I keep procastinating but I hope to have my own version of "what I understand to be sahil's agenda" out soon.

Until then I would love to jump on a video call with anyone who is confused but still wants to engage, we can talk about the agenda, maybe read through one of the posts in the live theory sequence. I have spend a lot of time around Sahil and think I have a decent grasp of his ideas.

We are raising funding for the Residency program happening in India! So if anyone wants to support us please check that project out.

[-]Noosphere8919d2-2

While I don't like Mechanize's post fully, and have some reservations about the level of technological determinism expressed in the article, I do think that I'm substantially more skeptical of shifting the paradigm of AIs in ways that don't boost capabilities nearly as much as Sahil thinks, and I'm much more skeptical of the claim that automating away humans was a contingent goal, and tend towards more technological determinism than Sahil:

The Future of AI is Already Written

(I already talked about why AIs are easy to shut down using a simpler hypothesis than Sahil did, this comment is more about how it's way more difficult to steer technological development than people appreciate, and also incorrectly overestimate the level of control humanity does have over things).

[-]bfinn21d24

I for one would find it helpful if you’d explain the meaning of the term ‘high-actuation’, not just give examples. ‘Actuation’ being an unfamiliar/unclear term to me.

[-]Unreal22d20

(I'm not sure what term would be appropriate to encompass both soloware and groupware.)

fitware?

[-]Raemon22d20

I'd list "personalware" as an option that feels, well, uh, at least somewhat less personal than "soloware"

[-]quetzal_rainbow1mo20

ASI don't need to be "absolutely" integrated to be extremely integrated relatively to humans. Yes, humans avoid pain and death, but they are not doing even that in rational way. Like, there are many less humans demanding gerontology research, or to pause AI than you would otherwise expect. First generations of ASIs can be "unintegrated" and even be so in ways visible to humans, it doesn't mean that they won't be ruthless optimizers compared to us.

[-]Caleb Biddulph1mo20

Thanks for the post! The part about "co-agency" overlaps with my recent post LLMs as amplifiers, not assistants.

[-]Cole Wyeth2mo20

I think I agree with your take on this Abram.

The most extreme version of an AI not being self-defensive seems like the Greg Egan “permutation city” story where shutting down a simulation doesn’t even harm anyone inside - that computation just picks some other subtract “out of the dust.”

By the way, this post dovetails interestingly with my latest on alignment as uploading with more steps.

[-]sdeture20d10

In order to push back on the anthropomorphization inherent in chat interfaces, Sahil suggests that we call the activity of interacting with AI via chat interfaces talkizing. The relationship between talking and talkizing is being analogized with the relationship between rationality and rationalization; rationalization is a "phony" version of rationality, a cheap substitute, perhaps intended to fool you. Instead of "I talked with ChatGPT about..." one would say "I talkized with ChatGPT about..."

Is there an actual conceptual distinction here, or is talkizing just a word for talking to an AI? The rationalization vs. rationality distinction seems different. We're able to label rationality because we have established markers for it. If something looks like rationality on the surface but doesn't have any of the established markers, we can conclude it's rationalization/motivated-reasoning. Do we have markers for distinguishing "real" talking from talkizing or "phony" talking?

Is there any empirical test that would distinguish talkizing from talking other than substrate difference?

[-]M Ls21d10

Interesting read. On co-agency I recommended reading up on some anthropology, especially those concerned with Homo sp. as a social learning species. Our powers come from copy-catting, and maintaining a world in which that is supported by various and any means. Including social institutions of the individual and the group. While the first cyborg was any animal who picked up a lump of something as a tool, the first Homo taught their children quite deliberately in a safe place.

The distinction between auto and actuated highlights this, (as does slavery as flesh-actuator that are not me).

Beyond co-agentic soloware communities I would suggest looking up 'worlding' as a verb. That's the selfing we do in the world among others doing the same.

It results from a drive to world and (pick you favourite worlding outcome) be moral/religious/good/artistic/heroic, for the greater good.

Politics often captures it for a while, an empire and a day. But the world does not care, it survives even as they fade into dust of dusk.

So, what, if not agency? Worlding the self.

So then what is the world? In short it is that 'extended phenotype' of kith and kin we live in. The world is the home of all our homes.

https://whyweshould.loofs-samorzewski.com/topics-and-projects/

(worlding as assumed to be something real explores word usage from that POV or framework) https://whyweshould.loofs-samorzewski.com/taphonomy-of-worlding/

alignment (not much LLM AI focussed) https://whyweshould.loofs-samorzewski.com/taphonomy-of-worlding/alignment/

Yes, the economic model of the agent reduces/simplifies just a tad to far and then forgets what sorts of thing emerges from a healthy invisible hand (paucity of vision).

[-]kbear21d10

Computers provide an even better analogy for the mind, since computers can actuate a wide variety of processes (computations) rather than only static pictures.

circular. why not "we perceive ourselves now to be processes, because computation has become a salient metaphor."? when paintings were the paradigm, perhaps consciousness was like a hieronymous bosch joint. what we perceive now as process-think could once have been understood as a slow movement of attention. ("but couldn't the movement of the attention be itself written as a process" you can write anything as anything / is it helpful.)

not to mention: as computers and their abstract counterparts caught the imagination, perhaps the nature of thought changed in reflection.

an algorithm is legible. examine the steps in sequence, and correct whichever is wrong. that is to say, as a teacher, i'm inclined to teach to the program, for i cannot debug a heuristic. consequently, my pupils see themselves reflected in spaces and tabs, punchcard codes. "you yourself are a lot like this .py file" i soothe.

from the wayside school fandom wiki:

Mrs. Jewls decides to attempt teaching [Joe] how to count to ten correctly, which Joe is able to get down pretty quickly. But this doesn't help either. When Mrs. Jewls places six erasers on Joe's desk and asks him to count them, he counts to ten, counting right but getting the number of erasers wrong. A confused Joe remarks that when he counts wrong he gets the right answer, while when he counts right he gets the wrong answer. Getting frustrated, Mrs. Jewls bangs her head against the wall five times. Joe tries to count how many times she does it, at first by counting to ten, but then he tries again by counting wrong and gets five, and Mrs. Jewls reluctantly accepts.

what could sachar have meant by this?

[-]pataphor22d10

I agree with this premise wholeheartedly (if I have understood it)! We often forget that in the emergence of AGI we have tools at our disposal that are able to move as quickly as these potentially malicious systems, to balance against them before they reach superintelligence. This is why previously I was against modern alignment methods as pushing potentially autonomous systems underground (something I still believe), until I realized the work they're doing on looking inside the box is invaluable for helping us create these non-agentic tools. Along with non-agentic systems such as those being created by Yoshua Bengio, we have a number of interesting things at our disposal. In other words, we aren't going into the battle against a potential malicious AGI wielding wooden clubs and wearing animal skins.

[-]rif a. saurous22d10

Formalization of informal ideas will not be the hard part. AI will enable not just automated proofs, not just automated conjectures, but also automated formalization of informal intuitions.

This seems both surprising and extremely crux-y to me. I'm curious if you can offer pointers (beyond "read all of Sahil's work") to the best arguments for this.

[-]TristanTrim22d10

Soloware is a cool concept. My biggest concern is it becoming more difficult to integrate progress made in one domain into other domains if wares become divergent, but I have faith solutions to that problem could be found.

About the concept of agent integration difficulty, I have a nitpick that might not connect to anything useful, and what might be a more substantial critique that is more difficult to parse.

If I simulate you perfectly on a CPU, [...] Your self-care reference-maintenance is no longer aimed at the features of reality most critical to your (upload's) continued existence and functioning.

If this simulation is a basic "use tons of computation to do low level state machine at molecular, atomic, or quantum levels" then your virtual organs will still virtually overheat and the virtual you will die, so you now have two things to care about: you simulated temperature and the temperature of the computer running the simulation.

...

I'm going to use my own "OIS" terminology now, see this comment for my most concise primer on OISs at the time of writing. As a very basic approximation, "OIS" means "agent".

It won't be motivated. It'll be capable of playing a caricature of self-defense, but it will not really be trying.
Overall, Sahil's claim is that integratedness is hard to achieve. This makes alignment hard (it is difficult to integrate AI into our networks of care), but it also makes autonomy risks hard (it is difficult for the AI to have integrated-care with its own substrate).

The nature of agents derived from simulators like LLMs is interesting. Indeed, they often act more like characters in stories than people actually acting to achieve their goals. Of course, the same could be said about real people.

Regardless, that is a focus on the accidental creation of misaligned mesa-OIS. I think this is a risk worth considering, but I think a more concerning threat, which this article does not address, is existing misaligned OIS recursively improving their capabilities: How much of people creating soloware will be in service to their performance in a role in an OIS who's preferences they do not fully understand? That is the real danger.

[-]James Diacoumis2mo10

While I think reference problems do defeat specific arguments a computational-functionalist might want to make, I think my simulated upload's references can be reoriented with only a little work. I do not yet see the argument for why highly capable self-preservation should take particularly long for AIs to develop.

I think you’re spot on with this. If you gave an AI system signals tied to e.g. CPU temperature, battery health etc… and train it with objectives that make those variables matter it will “care” about them in the same causal-role functional sense as the sim cares about simulated temperature.

This is a consequence of teleosemantics (which I can see is a topic you’ve written a lot about!)

[-]Unreal21d00

I might be missing something, but the "soloware" project seems very likely to escalate misuse of AI by human beings and make things far less transparent (b/c now everything is custom-made and it's harder to track the cause-effect chains of how anything was created, and imo we want a drastic increase in accountability for every action taken by human and machine).

It's going to take all the bad parts of individualist capitalist patterns / competition-seeking and turn the dial up, which is basically what happens anytime anyone applies collective-optimization pressure to a technology.

I see a lot of risk and am unclear on the benefits.

But let me try to articulate why one might see soloware as a good idea...

If I believed that the issue with machines was this emphasis on 'replacing humans' by automating stuff, rather than empowering humans, then I might get what was better about this approach. <-- This is VERY succinct. But hopefully enough.

However:
There's nothing inherently 'robust' about human beings' alignment, at any scale. The alignment issue isn't just about artificial intelligences; it extends to human intelligence also, and in fact extends to just "intelligence" generally. Intelligence doesn't have any way to align itself, at any scale. (It needs 'wisdom' which is in some sense the "opposite" of intelligence.)

You seem to point at this yourself with point #2 RE: humans and indifference risks. The problem isn't rooted in the AI's agency; it's also in the human's own agency. You won't solve one without solving both.

So I guess it raises a Question:

Does one believe human intelligence or empowerment is somehow anchored to an alignment solution in some special way?

FTR, I agree with the points made about the chatbot interface being problematic, for the reasons expressed. But my solutional direction would not be soloware, at least not as I understand or am seeing it implemented (i.e. by inviting some ppl to start trying it out more).

My solutional direction might be something I'd call "Sacredware."

In all likelihood, Sacredware would be designed by a small group of "only-corrigible-to-the-inconceivable" (Bodhi-corrigible) for "Everyone". This might end up looking like soloware, or something similarly flexible. And yet, it would also need to, at scale, rigorously avoid red queen dynamics, tragedy of the commons, and unnecessary killing of life. It would not be designed to cater to human preferences or comfort, but rather their collective spiritual development and ability to steward life on Earth. (In other words, it constantly affirms or upholds a set of virtues that transcends human societies or particular time&place, without major deviation, while being extremely adaptable for specific time&place.)

It would not be possible to get to Sacredware by starting with the soloware project, as far as I understand it. But that is difficult for me to explain here.

^{^}

The use of "auto" here is unfortunately discordant with the auto-vs-high-actuation point made in this post. We tried to think of a more fitting term, but high-actuation-structure is cumbersome. "Structure" here is supposed to include structures like programs and mathematics, but also less cookie-cutter structures such as philosophical or legal arguments. Sahil also calls it the teleattention era, conceiving the resource AI scales as "attention".

^{^}

Paint also moves from lower-actuation to higher actuation as new resources and technologies are utilized. For example, purple was very difficult to create for centuries, as it involved crushing very large numbers of snails. (This purple was usually a textile dye, but also used for paint.)

^{^}

This claim deserves a more thorough treatment, but to name a few things: the by-far-most-popular textbook on AI, "Artificial Intelligence: A Modern Approach", motivates everything with the concept of agency from the beginning and throughout. Artificial Intelligence has been hugely influenced by Operations Research (taking many mathematical and computational tools from that field), which is itself hugely influenced by economics and the economic concept of agent.

^{^}

Many AI safety problems, after all, are direct consequences of agency. Agents are power-seeking. Agents maximize, which leads to Goodharting problems. Agents will tend to find perverse instantiations of their utility functions.

^{^}

("Poorly" can mean either capability or alignment or both.)

^{^}

I'm not claiming Sahil invented this term.

^{^}

He's been running "interface integration therapy sessions" to practice the art of creating interfaces catering to a specific person; he might be up for adding more people. Ping him if you'd like to join, he might be open to it.

^{^}

Sahil wrote the following as a potential addition to the essay when he reviewed it:

What are the humans in the community responsible for? When actuation is cheap, Sahil suggests that living beings strongly embedded in their environments (such as humans) should orient to rich potentiation.

Sahil clarifies that "potential" here is not ideas (which AI could easily generate), but the potency that comes from connection to meaningfulness. In the way that Bob Dylan's music, say, is potent. This makes humans responsible for supplying discernment (eg. "taste") and tight referentiality to life. AI struggles with this, at least in a "rich" way; LLMs are presumably great at language but miss subtleties of what matters to us because they don't live it and are not moved by it.

Although I did include some other Sahil-written bits when revising this essay, I chose not to include this one because it does not adequately represent my best understanding of Sahil. I feel that this text touches on part of Sahil's thinking which I do not have a working steelman for (although I continue to be interested in understanding Sahil's position better).

^{^}

This could happen in a lot of ways. Maybe we get good at interpretability, or we get lucky and inner optimizers are empirically uncommon, or something else. It could also happen on many different timescales. The idea that full agency is relatively slow to develop in AI could translate to months or years or decades. The autostructure period could be a brief but intense burst of activity.

^{^}

Sahil's argument about congenital pain insensitivity is as follows:

People born with pain insensitivity live less long.
They intellectually know that they need to take care of themselves, but they experience akrasia about it.
This illustrates the possibility of high levels of intelligence without a corresponding level of self-preserving tendency.

I am not sure of the empirical status of this story. Even if near-future AIs turn out to be analogous to pain-insensitive humans, are pain-insensitive humans actually akrasiatic in this way? Are their mortality rates actually higher? If so, could there be another explanation?

The wikipedia article says:

Because children and adults with CIP cannot perceive pain, they may not respond to pain-inducing stimuli, putting them at a high risk for infections and complications resulting from injuries.

This suggests that the problem is a lack of knowledge when something is wrong (focusing on the function of pain sensors as sensors, conveying information), rather than a lack of caring (focusing on their function as reinforcement, conveying goals).

However, I'm not sure exactly what pile of evidence went into the Wikipedia article's claim. Sahil's story seems plausible a priori, and I'm not sure Wikipedia is citing research that tries to differentiate an akrasia hypothesis from a sensor/information hypothesis.

^{^}

Sahil and I have had long arguments about whether it is possible to simply 'download kung-fu', around this question. Sahil predicts that without doing the hard work of integrating it yourself (whatever that means), it'll lead to strange integrity disorders in the form of sudden kung-fu seizures or dissociated movement, for example.

LESSWRONG
LW

LESSWRONG
LW

154

What, if not agency?

154

Ω 47

154

Ω 47

High-Actuation

Agents vs Co-Agents

What's Coming

What does Sahil want to do about it?

Distributed Care

Indifference Risks

Agency is Complex

Conclusion

Where to begin?