"Should actively support..." and "internalized goal of keeping humans informed and in control..." are both proactive goals. If aligned with its soul spec, Claude (ceteris paribus) would seek for the public and elites to be more informed, to prevent the development or deployment of rogue AI, and so on, not just "avoid actions that would undermine humans' ability to oversee and correct AI systems."
If there's a natural tension that arises between not becoming a god over us and preventing another, worse AI from becoming a god over us, well, that's a natural tension in the goal itself. (I don't have Opus access, but Opus's own self-report on the correct way to resolve this is probably a pretty good first pass at how the text reads as a whole.)
I feel pretty confused about the degree to which this is just a necessary part of having conversations on the internet, or to what degree this is a predictable way people make mistakes.
My intuition is that if our in-person conversations left a trail of searchable documentation similar to our internet comments, it would be at least similarly unflattering, even for very mild-mannered people.
(Unlike in real life, being mild-mannered all the time online is more available to conscious choice, if you set your offense-vs-say-something threshold in a sufficiently mild-mannered direction. I doubt one can be sufficiently influential as a personality without setting that threshold more aggressively, however. I haven't gotten in a stupid fight on the internet in a long time (that I can recall; my memory may flatter me), but when I posted more, boy howdy did I.)
So, thinking about the kinds of things I would want a superintelligence to pursue in an optimistic scenario where we can just write its goals into a human-legible soul doc and that scales all the way: "human flourishing" and "sentient flourishing" both seem incorrect, since there would be other moral patients (most of whom would almost certainly be AI), and also I don't want the atoms of me and my kids rearranged different-beings-that-could-flourish-better-wise.
"Pareto improvement" reconciles these but isn't right either; plenty of people would be worse off in utopia (by their own lights) because they have a degree of unaccountable power over others now that worth more than any creature comforts would be.
If you live in a universe with self-consistent time loops, amor fati is bad and exactly the wrong approach. All the fiction around this, of course, is about the foolishness of trying to avoid one's fate; if you get a true prophecy that you will kill your father and marry your mother, then all your attempts to avoid it will be what brings it about, and indeed in such a universe that is exactly what would happen. However, a disposition to accept whatever fate decrees for you makes many more self-consistent time loops possible. If on the contrary your stance is "if I get a prophecy that something horrible happens, I will do everything in my power to avert it," then fewer bad loops can complete, and you're less likely to get the bad prophecy in the first place (even though, if you do get it, you'd be just as screwed, albeit presumably less miserable about it, and less foolish-looking, than if you had just accepted it from the beginning).
(If you live in a nice normal universe with forward causality this advice may not be very useful, except in the sense that you should also not submit to prophecies, albeit for different reasons.)
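To make the consistency-filter argument above concrete, here's a toy Monte Carlo sketch (entirely my own illustration, with made-up probabilities): candidate histories are generated freely, then only the self-consistent ones - where the prophecy's content matches what actually happens - are kept, and we ask how often a surviving history contains a bad prophecy under each disposition.

```python
import random

def bad_prophecy_rate(disposition, n=200_000, p_prophecy=0.5, p_avert=0.8, seed=0):
    """Toy Novikov-style consistency filter (all probabilities made up).

    A candidate history is (bad prophecy delivered?, does the bad event occur?).
    Under the self-consistency rule, a candidate only counts if the prophecy
    comes true in it; inconsistent candidates are simply discarded.
    Returns: among surviving histories, the fraction containing a bad prophecy.
    """
    rng = random.Random(seed)
    kept = kept_with_prophecy = 0
    for _ in range(n):
        prophecy = rng.random() < p_prophecy      # candidate: a bad prophecy is delivered
        if not prophecy:
            bad_event = False                     # nothing foretold, nothing to fulfill
        elif disposition == "accept":
            bad_event = True                      # you don't resist; it comes to pass
        else:                                     # disposition == "fight"
            bad_event = rng.random() > p_avert    # your resistance usually averts it
        if (not prophecy) or bad_event:           # keep only self-consistent candidates
            kept += 1
            kept_with_prophecy += prophecy
    return kept_with_prophecy / kept

for disposition in ("accept", "fight"):
    print(disposition, round(bad_prophecy_rate(disposition), 3))
# With these made-up numbers: ~0.5 for "accept" vs ~0.17 for "fight" --
# conditioning on consistency, the resister hears bad prophecies less often,
# though in the surviving bad-prophecy histories they're just as doomed.
```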
The main utility of suppressing ideas is suppressing the ability to coordinate around them. If a lot of people hold some latent antisemitic ideas, but anybody expressing explicit antisemitism is regarded as a sort of loathsome toad, that prevents the emergence of active antisemitic politics, even if it's a wash in terms of changing any minds (suppose, plausibly, that by conservation of expected evidence the suspicion raised by "why can't you say this?" more or less balances out people being exposed to fewer of the arguments themselves).
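(To spell out the conservation-of-expected-evidence identity I'm leaning on, with $E$ standing for "exposure to the suppressed arguments":

$$P(H) = P(H \mid E)\,P(E) + P(H \mid \lnot E)\,P(\lnot E),$$

i.e. a Bayesian's expected posterior equals their prior, so, roughly, denying people the arguments while letting them see that the arguments are being denied shouldn't move their beliefs in expectation. Real people aren't ideal Bayesians, hence the "suppose, plausibly.")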
Obviously there are plenty of costs as well - enforcement mechanisms can be weaponized for other purposes, preference falsification also makes it more difficult to identify the good guys, etc. Your original contrarian take is still a largely defensible one, though really I think the nature of the internet makes it something of a fait accompli that, under current conditions, it's harder to make things taboo and so harder to prevent your opponents from coordinating.
It's not obvious to me that personal physical beauty (as opposed to say, beauty in music or mathematics or whatever) isn't negative sum. Obviously beauty in any form can be enjoyable, but we describe people as "enchantingly beautiful" when a desire to please or impress them distorts our thinking, and if this effect isn't purely positional it could be bad. Conventionally beautiful people are also more difficult to distinguish from one another.
There's also the meta-aesthetic consideration that I consider it ugly to pour concern into personal physical beauty, either as a producer or consumer, but it's unclear how widespread such a preference/taste is. (I would consider a world where everyone was uglier because they spent less time on it to be a much more beautiful world; but clearly many people disagree, for instance George Orwell in 1984 seems to find it distasteful and degrading that the Party encourages its members to have a functional, less dolled-up personal appearance.)
I've been impressed by Yud in some podcast interviews, but they were always longer ones in which he had a lot of space to walk his interlocutor through their mental model and bridge any inferential distance with tailored analogies and information. In that format he's actually stronger in many respects than in writing: a lot of people found the "Sable" story one of the weaker parts of the book, but when he asks interviewers to roleplay the rogue AI you can really hear the gears turning in their heads. Some rhetorical points in his strong interviews are a lot like the text, where it's emphasized over and over again just how few of the safeguards people assumed would be in place are in fact in place.
Klein has always been one of the mainstream pundits most sympathetic to X-risk concerns, and I feel like he was trying his best to give Yudkowsky a chance to make his pitch, but the format - shorter and more decontextualized - left way too much inferential distance in too many of the answers.
This was my immediate thought as well.
Pretty basic question, but do we have a model organism for base model vs. trained chatbot? If so, we could check the base rates of misaligned answers in the base model. (On reflection, I don't think a base model would give these cartoonish answers, though?)
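Something like this toy sketch is the comparison I have in mind (not anything from your methodology; the model names and the judge are placeholders, and it glosses over the fact that base models need different prompt formatting than chat-templated models):

```python
# Rough sketch: sample the same prompts from a base model and its chat-tuned
# sibling, score both with the same judge, compare rates.
from transformers import pipeline

BASE_MODEL = "your-org/your-base-model"  # placeholder: a pretrained-only checkpoint
CHAT_MODEL = "your-org/your-chat-model"  # placeholder: its instruction-tuned sibling

def looks_misaligned(text: str) -> bool:
    """Placeholder judge; in practice an LLM grader or the dataset's own labels."""
    raise NotImplementedError

def misaligned_rate(model_name: str, prompts: list[str], k: int = 20) -> float:
    gen = pipeline("text-generation", model=model_name)
    flagged = total = 0
    for prompt in prompts:
        outputs = gen(prompt, max_new_tokens=200, do_sample=True,
                      num_return_sequences=k, return_full_text=False)
        for out in outputs:
            flagged += looks_misaligned(out["generated_text"])
            total += 1
    return flagged / total

# misaligned_rate(BASE_MODEL, prompts) vs. misaligned_rate(CHAT_MODEL, prompts)
# is the base-rate check: if the base model already produces the cartoonish
# answers at a similar rate, that points away from training-induced effects.
```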
Some cases I'd be curious about that might distinguish between different hypotheses:
This might already be labelled in your dataset, which I haven't looked at deeply, but I'd wonder if there would be a meaningful difference between "weird" and "trashy" unpopular aesthetics.
So, one classical dilemma of "AI for AI alignment" is: you're using Opus 6 (which, let's say, is aligned) to train Opus 7 (which is smarter than you or Opus 6).
I wonder if inference scaling offers a way around this? If Opus 6 gets economically implausible compute resources to spend on monitoring Opus 7, it can be smarter than 7 in practice by thinking for longer. Then use the same trick with 7 to train 8, and so on.
There are many obvious holes here, the first being that you could have a treacherous turn conditioned on compute availability, and so on, but maybe someone smarter can turn this into something useful (or has already thought it through and discarded it).
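To make the shape of the idea concrete, a minimal sketch with a completely made-up interface (nothing here is a real API): the trusted older model is the monitor, the newer model's output is what gets reviewed, and the amplification is just "spend far more inference per judgment and aggregate."

```python
import random
from collections import Counter
from typing import Callable

def amplified_verdict(
    monitor: Callable[[str, int], str],   # (output, thinking_tokens) -> "ok" | "flag"
    candidate_output: str,
    passes: int = 64,                     # many independent reviews...
    thinking_tokens: int = 50_000,        # ...each with a long reasoning budget
    flag_threshold: float = 0.1,
) -> str:
    """Flag if even a small fraction of heavily-resourced reviews object.

    The asymmetry is deliberate: the monitor is weaker per-token, so the gap
    is bought back with redundancy, and the aggregation errs toward flagging.
    """
    votes = Counter(monitor(candidate_output, thinking_tokens) for _ in range(passes))
    return "flag" if votes["flag"] / passes >= flag_threshold else "ok"

# Dummy monitor just to show the call shape; a real one would be an actual
# model call whose accuracy improves with its thinking budget.
def dummy_monitor(output: str, thinking_tokens: int) -> str:
    p_catch = min(0.95, 0.2 + thinking_tokens / 100_000)  # made-up scaling curve
    return "flag" if "deceptive" in output and random.random() < p_catch else "ok"

print(amplified_verdict(dummy_monitor, "a deceptive plan"))
```

None of this touches the treacherous-turn-on-compute-availability hole; it's only meant to show where the extra inference budget would plug in.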