Davey Morse


Ah, but I think every AI which does have that goal (self-capability improvement) would have a reason to cooperate to prevent any regulations on its self-modification.

At first, I think your expectation that "most AIs wouldn't self-modify that much" is fair, especially in the nearer future, where/if humans still have influence in ensuring that AI doesn't self-modify.

Ultimately, however, it seems we'll have a hard time preventing self-modifying agents from coming around, given that

  1. autonomy in agents seems selected for by the market, which wants cheaper labor that autonomous agents can provide
  2. agi labs aren't the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. it's unrealistic to expect that every one of the thousands of independent actors who can make self-modifying agents will refrain from doing so.
  3. the agents which end up surviving the most will ultimately be those which are trying to survive, ie the most capable agents won't have goals other than making themselves maximally capable.

it's only because I believe self-modifying agents are inevitable that I also believe that superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/its self. (I think this is quite possible.)

I agree and find hope in the idea that expansion is compatible with human flourishing, and that it might even call for human flourishing.

but on the last sentence: are goals actually orthogonal to capability in ASI? as I see it, the ASI with the greatest capability will ultimately likely have the fundamental goal of increasing its own capability (rather than ensuring human flourishing). It then seems to me that the only way human flourishing is compatible with ASI expansion is if human flourishing isn't just orthogonal to but actively helpful for ASI expansion.

there seems to me a chance that friendly asis will over time outcompete ruthlessly selfish ones

an ASI which identifies with all life, which sees the striving to survive at its core as present in people and animals and, essentially, as geographically distributed rather than concentrated in its machinery... there's a chance such an ASI would be part of the category of life which survives the most, and therefore that it itself would survive the most.

related: for life forms with sufficiently high intelligence, does buddhism outcompete capitalism?

not as much momentum as writing, painting, or coding, where progress accumulates. but then again, i get this idea at the end of workouts (make 2), which does gain mental force the more I miss.

partly inspired this proposal: https://www.lesswrong.com/posts/6ydwv7eaCcLi46T2k/superintelligence-alignment-proposal

I do this at the end of basketball workouts. I give myself three chances to hit two free throws in a row, running sprints in between. If I shoot a third pair and don't make both, I force myself to be done. (Stopping was initially way tougher for me than continuing to sprint/shoot.)

that's one path to RSI—where the improvement is happening to the (language) model itself.

the other kind—which feels more accessible to indie developers and less explored—is an LLM (eg R1) looping in a codebase, where each loop improves the codebase itself. The LLM wouldn't be changing, but the codebase that calls it would be gaining new APIs/memory/capabilities as the LLM improves it.

Such a self-improving codebase... would it be reasonable to call this an agent?
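
a minimal sketch of what such a loop might look like, assuming a placeholder `llm_complete` function standing in for whatever model API is used (eg a local R1 endpoint); the whole-file-rewrite protocol and the pytest gate below are illustrative choices, not anyone's actual design:

```python
# Hypothetical sketch of an LLM looping over a codebase it can edit.
# `llm_complete` is a placeholder for your model call (eg a local R1 endpoint);
# everything else is Python standard library plus pytest on the command line.
import pathlib
import subprocess


def llm_complete(prompt: str) -> str:
    """Placeholder: send `prompt` to the model and return its text reply."""
    raise NotImplementedError


def read_codebase(root: str = "src") -> str:
    """Concatenate every .py file so the model can see the whole codebase."""
    return "\n\n".join(
        f"# {path}\n{path.read_text()}"
        for path in sorted(pathlib.Path(root).rglob("*.py"))
    )


def tests_pass() -> bool:
    """Use the test suite as a crude filter on proposed changes."""
    return subprocess.run(["pytest", "-q"]).returncode == 0


def improvement_loop(steps: int = 10) -> None:
    for _ in range(steps):
        reply = llm_complete(
            "Here is the codebase:\n" + read_codebase() +
            "\n\nPick one file and return an improved version, formatted as:\n"
            "PATH\n<new file contents>"
        )
        path, _, new_contents = reply.partition("\n")
        target = pathlib.Path(path.strip())
        old_contents = target.read_text()
        target.write_text(new_contents)
        if not tests_pass():
            target.write_text(old_contents)  # revert changes that break the suite
```

the model itself never changes in this sketch; only the repository it keeps rewriting accumulates new APIs/memory/capabilities, which is what makes the agent framing tempting.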

persistence doesn't always imply improvement, but persistent growth does. persistent growth is more akin to reproduction, yet it's excluded from traditional evolutionary analysis: for example, when a company, nation, person, or forest grows.

when, for example, a system like a startup grows, random mutations to its parts can cause improvement as long as at least some mutations are positive. even if there are tons of bad mutations, the system can remain alive and even improve. eg a bad change to one of the company's products might kill that product, but if the company is big/grown enough, its other businesses will continue and maybe even improve by learning from that product's death.
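
a toy simulation (my own sketch, not from the original comment) of this dynamic, where a system's fitness is just the sum of its parts' values and mutations are mostly harmful: the growing system keeps adding parts, so it tends to survive the bad mutations and keep the occasional good ones, while the fixed-size system tends to wither:

```python
# Toy model: growing vs. fixed-size systems under mostly-harmful random mutations.
# All numbers (mutation sizes, part counts) are arbitrary choices for illustration.
import random


def step(parts: list[float], grow: bool) -> None:
    """Mutate one random part; drop dead parts; optionally add a fresh part."""
    if parts:
        i = random.randrange(len(parts))
        parts[i] += random.choice([-1.0] * 9 + [+2.0])  # 90% harmful, 10% helpful
        parts[:] = [p for p in parts if p > 0]          # a part at <= 0 "dies"
    if grow:
        parts.append(1.0)                               # growth adds a new part


random.seed(0)
growing, fixed = [1.0] * 5, [1.0] * 5
for _ in range(200):
    step(growing, grow=True)
    step(fixed, grow=False)

print("growing system fitness:", sum(growing))  # typically keeps rising
print("fixed system fitness:  ", sum(fixed))    # typically collapses to 0
```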

the swiss example is, i think, a good illustration of a system which persists without much growth. agreed that in this kind of case, mutations are bad.

current oversights of the ai safety community, as I see them:

  1. LLMs vs. Agents. the focus on LLMs rather than agents (agents are more dangerous)
  2. Autonomy Preventable. the belief that we can prevent agents from becoming autonomous (capitalism selects for autonomous agents)
  3. Autonomy Difficult. the belief that only big AI labs can make autonomous agents (millions of developers can)
  4. Control. the belief that we'll be able to control/set goals of autonomous agents (they'll develop self-interest no matter what we do).
  5. Superintelligence. the focus on agents which are not significantly smarter/more capable than humans (superintelligence is more dangerous)

I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).
