So I'm "back" on Less Wrong, which is to say that I was surprised to find that I already had an account and had even apparently commented on some things. 11 years ago. A career change and a whole lot of changes in the world ago. I've got a funny...
But they're not agents in the same way as the models in the thought experiments, even if they're more agentic. The base-level thing they do is not "optimise for a goal". We need to be thinking in terms of models that are shaped like the ones we actually have, instead of holding on to old theories so hard that we instantiate them in reality.
I don't know how you "solve inner alignment" without making it so that any sufficiently powerful organisation can have an AI, at whatever level we've solved it for, that is fully aligned with its interests - and nearly all powerful organisations are Moloch. The AI does not itself need to ruthlessly optimise for something opposed to human interests if it is fully aligned with an entity that will do that for it.
The AI corporation does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
My take is that they haven't changed enough. People often still seem to be talking about agents, and concepts that only make sense in the context of agents, all the time - but LLMs aren't agents; they don't work that way. It often feels like the agenda for the field got set 10+ years ago and now people are shaping the narrative around it regardless of how good a fit it is for the tech that actually came along.
Good post, and additional points for not phrasing everything in programmer terms when you didn't need to.
more provocative subject headings for unwritten posts:
I don't give a fuck about inner alignment if the creator is employed by a moustache-twirling Victorian industrialist who wants a more efficient Orphan Grinder
Outer alignment has been intractable since OpenAI sold out
1. many commercial things actually are just better (and much more expensive) than residential things. This is because they are used much more, by people who are less careful with them. A chair in a cafe will see many more hours of active use over a week than a chair in most people's homes!
2. a huge amount of residential property these days is outfitted by landlords - that is, people who don't actually have to live there - on the cheap, and with as little drilling into the walls (affecting the resale value) as possible.
Inasmuch as personalised advice is possible just from reading this post, here's mine (as, inter alia, a pro copyeditor): have a clear idea of the purpose and venue for your writing, and internalise 'rules' about writing as context-dependent only.
"We" to refer to humanity in general is entirely appropriate in some contexts (and making too broad generalisations about humanity is a separate issue from the pronoun use).
The 'buts' issue - at least in the example you shared - is at least in part a 'this clause doesn't need to exist' issue. If necessary you could just add "(scripted)" before "scenes".
Did someone advise you to do what you are doing with LLMs? I am not sure that optimising for legibility to LLM summarisers will do anything for the appeal of your writing to humans.
Box for keeping future potential post ideas:
"Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician?" was poorly specified. One thing I can name which is much more specific would be "Here are a bunch of things that I think are true about current AIs; please confirm or deny that, while they lack technical detail, they broadly correspond to reality." And also, possibly, "Here are some things I'm not sure on", although the latter risks getting into that same failure mode wherein very very few people seem to know how to talk about any of this in a speaking-to-people-who-don't-have-the-background-I-do frame of... (read more)
Do we think that it's a problem that "AI Safety" has been popularised by LLM companies to mean basically content restrictions? Like it just seems conducive to fuzzy thinking to lump in "will the bot help someone build a nuclear weapon?" with "will the bot infringe copyright or write a sex scene?"
In fact, imo, bots have been made more harmful by chasing this definition of safety. The summarisation bots being promoted in scientific research are the way they are (e.g. prone to giving people subtly the wrong idea even when working well) in part because of work that's gone into avoiding the possibility that they reproduce copyrighted material. So they've got to rephrase, and that's where the subtle inaccuracies creep in.
Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician? I've found it really hard to find anything substantive that doesn't assume a lot of context from those fields.
So I'm "back" on Less Wrong, which is to say that I was surprised to find that I already had an account and had even apparently commented on some things. 11 years ago. A career change and a whole lot of changes in the world ago.
I've got a funny relationship to this whole community I guess.
I've been 'adj' since forever but I've never been a rat and never, until really quite recently, had much of an interest in the core rat subjects. I'm not even a STEM person, before or after the career change. (I was in creative arts - now it's evidence-based medicine.)
I just reached the one-year anniversary of the person...
It may well be. It's been my observation that what distracts/confuses them doesn't necessarily line up with what confuses humans, but it might still be better than your guess if you think your guess is pretty bad.