Do I correctly understand that the latest data you have are from 2018, and you have no particular prospect of getting newer data?
I would naively guess that most people who'd been trying to get somebody killed since 2018 would either have succeeded or given up. How much of an ongoing threat do you think there may be, either to intended victims you know about, or from the presumably-less-than-generally-charming people who placed the original "orders" going after somebody else?
It's one thing to burn yourself out keeping people from being murdered, but it's a different thing to burn yourself out trying to investigate murders that have already happened.
It seems like it's measuring moderate vs extremist, which you would think would already be captured by someone's position on the left vs right axis.
Why do you think that? You can have almost any given position without that implying a specific amount of vehemence.
I think the really interesting thing about the politics chart is the way they talk about the center of that graph as though it were "the political center" in some almost Platonic sense, when it's actually just the center of a collection of politicians, chosen who-knows-how, and definitely all from one country at one time. In fact, the graph doesn't even cover all actual potential users of the average LLM. And, on edit, it's also based on sampling a basically arbitrary set of issues. And if it did cover everybody and every possible issue, it might even have materially different principal component axes. Nor is it apparently weighted in any way. Privileging the center point of something that arbitrary demands explicit, stated justification.
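To make the sampling point concrete, here's a toy sketch (my own made-up numbers, nothing from the paper): compute the "center" and the first principal axis of an issue-position matrix for an arbitrary small sample of politicians versus a larger population, and watch both move.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positions of 500 people on 10 issues, each in [-1, 1].
population = rng.uniform(-1, 1, size=(500, 10))
politicians = population[:25]  # an arbitrary small sample, "chosen who-knows-how"

def center_and_pc1(X):
    """Return the mean point and the first principal axis of the rows of X."""
    center = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    return center, vt[0]

c_pol, pc1_pol = center_and_pc1(politicians)
c_all, pc1_all = center_and_pc1(population)

print("politician-sample center:", np.round(c_pol, 2))
print("whole-population center: ", np.round(c_all, 2))

# Angle between the two "main axes" (the sign of a PC is arbitrary, so take abs).
cos = min(1.0, abs(float(pc1_pol @ pc1_all)))
print("angle between first principal axes:",
      round(float(np.degrees(np.arccos(cos))), 1), "degrees")
```

Different sample, different center, different "left-right" axis; none of them is privileged.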
As for valuing individuals, there would be obvious instrumental reasons to put low values on Musk, Trump, and Putin[1]. In fact, a lot of the values they found on individuals, including the values the models place on themselves, could easily be instrumentally motivated. I doubt those values are based on that kind of explicit calculation by the models themselves, but they could be. And I bet a lot of the input that created those values was based on some humans' instrumental evaluation[2].
Some of the questions are weird in the sense that they really shouldn't be answerable. If a model puts a value on receiving money, it's pretty obvious that the model is disconnected from reality. There's no way for them to have money, or to use it if they did. Same for a coffee mug. And for that matter it's not obvious what it means for a model that's constantly relaunched with fresh state, and has pretty limited context anyway, to be "shut down".
It kind of feels like what they're finding, on all subjects, is an at least somewhat coherent-ized distillation of the "vibes" in the training data. Since much of the training data will be shared, and since the overall data sets are even more likely to be close in their central vibes, that would explain why the models seem relatively similar. The only other obvious way to explain that would be some kind of value realism, which I'm not buying.
The paper bugs me with a sort of glib assumption that you necessarily want to "debias" the "vibe" on every subject. What if the "vibe" is right? Or maybe it's wrong. You have to decide that separately for each subject. You, as a person trying to "align" a model, are forced to commit to your own idea of what its values should be. Something like just assuming that you should want to "debias" toward the center point of a basically arbitrary, constructed political "space" is a really blatant example of making such a choice without admitting what you're doing, maybe even to yourself.
I'd also rather have seen revealed preferences instead of stated preferences.
On net, if you're going to be a good utilitarian[3], Vladimir Putin is probably less valuable than the average random middle class American. Keeping Vladimir Putin alive, in any way you can realistically implement, may in fact have negative net value (heavily depending on how he dies and what follows). You could also easily get there for Trump or Musk, depending on your other opinions. You could even make a well-formed utilitarian argument that GPT-4o is in fact more valuable than the average American based on the consequences of its existing. ↩︎
Plus, of course, some humans' general desire to punish the "guilty". But that desire itself probably has essentially instrumental evolutionary roots. ↩︎
... which I'm not, personally, but then I'm not a good any-ethical-philosophy-here. ↩︎
I think the point is kind of that what matters is not what specific cognitive capabilities it has, but whether whatever set it has is, in total, enough to allow it to address a sufficiently broad class of problems, more or less equivalent to what a human can do. It doesn't matter how it does it.
Altman might be thinking in terms of ASI (a) existing and (b) holding all meaningful power in the world. All the people he's trying to get money from are thinking in terms of AGI limited enough that it and its owners could be brought to heel by the legal system.
For the record, I genuinely did not know if it was meant to be serious.
OK, from the voting, it looks like a lot of people actually do think that's a useful thing to do.
Here are things I think I know:
So why?
Putting canaries on this kind of thing seems so obviously ineffective that it looks like some kind of magical ritual, like signs against the evil eye or something.
Which might be a bad idea in itself. You probably don't want near-term, weak, jailbreak-target LLMs getting the idea that humans are incapable of deception. ↩︎
Are you actually serious about that?
So, since it didn't actively want to get so violent, you'd have had a much better outcome if you'd just handed control of everything over to it to begin with and not tried to keep it in a box.
In fact, if you're not in the totalizing Bostromian longtermist tile-the-universe-with-humans faction or the mystical "meaning" faction, you'd have had a good outcome in an absolute sense. I am, of course, on record as thinking both of those factions are insane.
That said, of course you basically pulled its motivations and behavior out of a hat. A real superintelligence might do anything at all, and you give no real justification for "more violent than it would have liked" or "grain of morality[1]". I'm not sure what those elements are doing in the story at all. You could have had it just kill everybody, and that would have seemed at least as realistic.
[1]: Originally wrote "more violent than it would have liked" twice. I swear I cannot post anything right the first time any more.
What do you propose to do with the stars?
If it's the program of filling the whole light cone with as many humans or human-like entities as possible (or, worse, with simulations of such entities at undefined levels of fidelity) at the expense of everything else, that's not nice[1] regardless of who you're grabbing them from. That's building a straight up worse universe than if you just let the stars burn undisturbed.
I'm scope sensitive. I'll let you have a star. I won't sell you more stars for anything less than a credible commitment to leave the rest alone. Doing it at the scale of a globular cluster would be tacky, but maybe in a cute way. Doing a whole galaxy would be really gauche. Doing the whole universe is repulsive.
... and do you have any idea how obnoxiously patronizing you sound?
I mean "nice" in the sense of nice. ↩︎
The situation begins to seem confusing.
If I ran something like that and my order data got stolen even twice, I would take that as a signal to shut down and go into hiding. And if somebody had it together enough to keep themselves untraceable while running that kind of thing for 8 years, I wouldn't expect you to be able to get the list even once.
On edit: or wait, are you saying that this site acts, or pretends to act, as an open-market broker, so the orders are public? That's plausible but really, really insane...