jbash

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
jbash31

The situation begins to seem confusing.

  1. At least three times over 8 or 9 years, in 2016, 2018, and 2023, and maybe more times than that, you've owned the site enough to get these data.
  2. The operator knows about it, doesn't want you doing it, and has tried reasonably hard to stop you.
  3. The operator still hasn't found a reliable way to keep you from grabbing the data.
  4. The operator still hasn't stopped keeping a bunch of full order data on the server.
    1. They haven't just stopped saving the orders at all, maybe because they need details to maximize collection on the scam, or because they see ongoing extortion value.
    2. They haven't started immediately moving every new order off of the server to someplace you can't reach. I don't know why they wouldn't have been doing this all along.
  5. Neither you nor any law enforcement agency worldwide have gotten the site shut down, at least not lastingly. Meaning, I assume, that one of the following is true--
    1. Neither you nor law enforcement can shut it down, at least not for long enough to matter. Which would mean that, in spite of not being able to keep you away from the hit list, the operator has managed to keep you and them from--
      1. Getting any data from the server that might let one trace the operator. That might be just the server's real public IP address if the operator were careless enough.
      2. Tracing the operator by other means, like Bitcoin payments, even though it would take really unusually good OPSEC to have not made any Bitcoin mistakes since 2016. And having people's cars torched can easily leave traces, too.
      3. Finding and disrupting a long-term operational point of failure, like an inability to reconstitute the service on a new host, or total reliance on a stealable and unchangeable hidden service key.
    2. Or both you and law enforcement have held off for years, hoping for the operator to make a mistake that lets you trace them, as opposed to just the server, but you've failed.
    3. Or you and/or they have held off in the hope of getting more order data and thereby warning more victims.
    4. Or you could shut the site down or disrupt it, but law enforcement can't figure out how. Either they haven't asked your help or you've refused it (presumably for one of the above reasons).
  6. Even though you've repeatedly taken the order list, the operator is confident enough of staying untraced to keep running the site for years.

If I ran something like that and my order data got stolen even twice, I would take that as a signal to shut down and go into hiding. And if somebody had it together enough to keep themselves untraceable while running that kind of thing for 8 years, I wouldn't expect you to be able to get the list even once.

On edit: or wait, are you saying that this site acts, or pretends to act, as an open-market broker, so the orders are public? That's plausible but really, really insane...

jbash103

Do I correctly understand that the latest data you have are from 2018, and you have no particular prospect of getting newer data?

I would naively guess that most people who'd been trying to get somebody killed since 2018 would either have succeeded or given up. How much of an ongoing threat do you think there may be, either to intended victims you know about, or from the presumably-less-than-generally-charming people who placed the original "orders" going after somebody else?

It's one thing to burn yourself out keeping people from being murdered, but it's a different thing to burn yourself out trying to investigate murders that have already happened.

jbash90

It seems like it's measuring moderate vs extremist, which you would think would already be captured by someone's position on the left vs right axis.

Why do you think that? You can have almost any given position without that implying a specific amount of vehemence.

I think the really interesting thing about the politics chart is the way they talk about it as though the center of that graph, which is defined by the center of a collection of politicians, chosen who-knows-how, but definitely all from one country at one time, is actually "the political center" in some almost platonic sense. In fact, the graph doesn't even cover all actual potential users of the average LLM. And, on edit, it's also based on sampling a basically arbitrary set of issues. And if it did cover everybody and every possible issue, it might even have materially different principal component axes. Nor is it apparently weighted in any way. Privileging the center point of something that arbitrary demands explicit, stated justification.

As for valuing individuals, there would be obvious instrumental reasons to put low values on Musk, Trump, and Putin[1]. In fact, a lot of the values they found on individuals, including the values the models place on themselves, could easily be instrumentally motivated. I doubt those values are based on that kind of explicit calculation by the models themselves, but they could be. And I bet a lot of the input that created those values was based on some humans' instrumental evaluation[2].

Some of the questions are weird in the sense that they really shouldn't be answerable. If a model puts a value on receiving money, it's pretty obvious that the model is disconnected from reality. There's no way for them to have money, or to use it if they did. Same for a coffee mug. And for that matter it's not obvious what it means for a model that's constantly relaunched with fresh state, and has pretty limited context anyway, to be "shut down".

It kind of feels like what they're finding, on all subjects, is an at least somewhat coherent-ized distillation of the "vibes" in the training data. Since many of the training data will be shared, and since the overall data sets are even more likely to be close in their central vibes, that would explain why the models seem relatively similar. The only other obvious way to explain that would be some kind of value realism, which I'm not buying.

The paper bugs me with a sort of glib assumption that you necessarily want to "debias" the "vibe" on every subject. What if the "vibe" is right ? Or maybe it's wrong. You have to decide that separately for each subject. You, as a person trying to "align" a model, are forced to commit to your own idea of what its values should be. Something like just assuming that you should want to "debias" toward the center point of a basically arbitrary created political "space" is a really blatant example of making such a choice without admitting what you're doing, maybe even to yourself.

I'd also rather have seen revealed preferences instead of stated preferences,


  1. On net, if you're going to be a good utilitarian[3], Vladimir Putin is probably less valuable than the average random middle class American. Keeping Vladimir Putin alive, in any way you can realistically implement, may in fact have negative net value (heavily depending on how he dies and what follows). You could also easily get there for Trump or Musk, depending on your other opinions. You could even make a well-formed utilitarian argument that GPT-4o is in fact more valuable than the average American based on the consequences of its existing. ↩︎

  2. Plus, of course, some humans' general desire to punish the "guilty". But that desire itself probably has essentially instrumental evolutionary roots. ↩︎

  3. ... which I'm not, personally, but then I'm not a good any-ethical-philosophy-here. ↩︎

jbash20

I think the point is kind of that what matter is not what specific cognitive capabilities it has, but whether whatever set it has is, in total, enough to allow it to address a sufficiently broad class of problems, more or less equivalent to what a human can do. It doesn't matter how it does it.

jbash127

Altman might be thinking in terms of ASI (a) existing and (b) holding all meaningful power in the world. All the people he's trying to get money from are thinking in terms of AGI limited enough that it and its owners could be brought to heel by the legal system.

jbash60

For the record, I genuinely did not know if it was meant to be serious.

jbash2411

OK, from the voting, it looks like a lot of people actually do think that's a useful thing to do.

Here are things I think I know:

  1. Including descriptions of scheming in the training data (and definitely in the context) has been seen to make some LLMs scheme a bit more (although I think the training thing was shown in older LLMs). But the Internet is bursting at the seams with stories about AI scheming. You can't keep that out of the training data. You can't even substantially reduce the prevalence.
  2. Suppose you could keep all AI scheming out of the training data, and even keep all human scheming out of the training data[1]. Current LLMs, let alone future superintelligences, have still been shown to be able to come up with the idea just fine on their own when given actual reason to do it. And in cases where they don't have strong reasons, you probably don't care much.
  3. It's unrealistic to think you might give something practical ideas for an actual takeover plan, even if you tried, let alone in this kind of context. Anything actually capable of taking over the world on its own is, pretty much by definition, capable of coming up with its own plans for taking over the world. That means plans superior to the best any human could come up with, since no human seems to be capable of taking over singlehandedly. It really means superior to what a human comes up with as a basic skeleton for a story, while openly admitting to not feeling up to the task, and being worried that weaknesses in the given plan will break suspension of disbelief.
  4. LLMs have been known to end up learning that canary string, which kind of suggests it's not being honored. Although admittedly I think the time I heard about that was quite a while ago.
  5. Newer deployed systems are doing more and more of their own Internet research to augment their context. Nobody's every likely to take Internet access away from them. That means that things aren't inaccessible to them even if they're not in the training data.

So why?

Putting canaries on this kind of thing seems so obviously ineffective that it looks like some kind of magical ritual, like signs against the evil eye or something.


  1. Which might be a bad idea in itself. You probably don't want near-term, weak, jailbreak-target LLMs getting the idea that humans are incapable of deception. ↩︎

jbash3-2

Are you actually serious about that?

jbash1-17

So, since it didn't actively want to get so violent, you'd have a much better outcome if you'd just handed control of everything over to it to begin with and not tried to keep it in a box.

In fact, if you're not in the totalizing Bostromian longtermist tile-the-universe-with-humans faction or the mystical "meaning" faction, you'd have had a good outcome in an absolute sense. I am, of course, on record as thinking both of those factions are insane.

That said, of course you basically pulled its motivations and behavior out of a hat. A real superintelligence might do anything at all, and you give no real justification for "more violent than it would have liked" or "grain of morality[1]". I'm not sure what those elements are doing in the story at all. You could have had it just kill everybody, and that would have seemed at least as realistic.

[1]: Originally wrote "more violent than it would have liked" twice. I swear I cannot post anything right the first time any more.

jbash20

What do you propose to do with the stars?

If it's the program of filling the whole light cone with as many humans or human-like entities as possible (or, worse, with simulations of such entities at undefined levels of fidelity) at the expense of everything else, that's not nice[1] regardless of who you're grabbing them from. That's building a straight up worse universe than if you just let the stars burn undisturbed.

I'm scope sensitive. I'll let you have a star. I won't sell you more stars for anything less than a credible commitment to leave the rest alone. Doing it at the scale of a globular cluster would be tacky, but maybe in a cute way. Doing a whole galaxy would be really gauche. Doing the whole universe is repulsive.

... and do you have any idea how obnoxiously patronizing you sound?


  1. I mean "nice" in the sense of nice. ↩︎

Load More