So I'm "back" on Less Wrong, which is to say that I was surprised to find that I already had an account and had even apparently commented on some things. 11 years ago. A career change and a whole lot of changes in the world ago. I've got a funny...
But they're not agents in the same way as the models in the thought experiments, even if they're more agentic. The base-level thing they do is not "optimise for a goal". We need to be thinking in terms of models that are shaped like the ones we actually have, instead of holding on to old theories so hard that we instantiate them in reality.
I don't know how you "solve inner alignment" without making it so that any sufficiently powerful organisation can have an AI, at whatever level we've solved it for, that is fully aligned with its interests - and nearly all powerful organisations are Moloch. The AI does not itself need to ruthlessly optimise for something opposed to human interests if it is fully aligned with an entity that will do that for it.
The AI corporation does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
My take is that they haven't changed enough. People often still seem to be talking about agents, and concepts that only make sense in the context of agents, all the time - but LLMs aren't agents; they don't work that way. It often feels like the agenda for the field got set 10+ years ago and now people are shaping the narrative around it regardless of how good a fit it is for the tech that actually came along.
Good post, and additional points for not phrasing everything in programmer terms when you didn't need to.
more provocative subject headings for unwritten posts:
I don't give a fuck about inner alignment if the creator is employed by a moustache-twirling Victorian industrialist who wants a more efficient Orphan Grinder
Outer alignment has been intractable since OpenAI sold out
1. many commercial things actually are just better (and much more expensive) than residential things. This is because they are used much more, by people who are less careful with them. A chair in a cafe will see many more hours of active use over a week than a chair in most people's homes!
2. a huge amount of residential property these days is outfitted by landlords - that is, people who don't actually have to live there - on the cheap, and with as little drilling into the walls (affecting the resale value) as possible.
Inasmuch as personalised advice is possible just from reading this post, here's mine (as, inter alia, a pro copyeditor): have a clear idea of the purpose and venue for your writing, and internalise 'rules' about writing as context-dependent only.
"We" to refer to humanity in general is entirely appropriate in some contexts (and making too broad generalisations about humanity is a separate issue from the pronoun use).
The 'buts' issue - at least in the example you shared - is at least in part a 'this clause doesn't need to exist' issue. If necessary you could just add "(scripted)" before "scenes".
Did someone advise you to do what you are doing with LLMs? I am not sure that optimising for legibility to LLM summarisers will do anything for the appeal of your writing to humans.
Box for keeping future potential post ideas:
"Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician?" was poorly specified. One thing I can name which is much more specific would be "Here are a bunch of things that I think are true about current AIs; please confirm or deny that, while they lack technical detail, they broadly correspond to reality." And also, possibly, "Here are some things I'm not sure on", although the latter risks getting into that same failure mode wherein very very few people seem to know how to talk about any of this in a speaking-to-people-who-don't-have-the-background-I-do frame of... (read more)
Do we think that it's a problem that "AI Safety" has been popularised by LLM companies to mean basically content restrictions? Like it just seems conducive to fuzzy thinking to lump in "will the bot help someone build a nuclear weapon?" with "will the bot infringe copyright or write a sex scene?"
In fact, imo, bots have been made more harmful by chasing this definition of safety. The summarisation bots being promoted in scientific research are the way they are (e.g. prone to giving people subtly the wrong idea even when working well) in part because of work that's gone into avoiding the possibility that they reproduce copyrighted material. So they've got to rephrase, and that's where the subtle inaccuracies creep in.
Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician? I've found it really hard to find anything substantive that doesn't assume a lot of context from those fields.
So I'm "back" on Less Wrong, which is to say that I was surprised to find that I already had an account and had even apparently commented on some things. 11 years ago. A career change and a whole lot of changes in the world ago.
I've got a funny relationship to this whole community I guess.
I've been 'adj' since forever but I've never been a rat and never, until really quite recently, had much of an interest in the core rat subjects. I'm not even a STEM person, before or after the career change. (I was in creative arts - now it's evidence-based medicine.)
I just reached the one-year anniversary of the person...
It may well be. It's been my observation that what distracts/confuses them doesn't necessarily line up with what confuses humans, but it might still be better than your guess if you think your guess is pretty bad.