jbash


jbash20

What does "value" mean here? I seriously don't know what you mean by "total loss of value". Is this tied to your use of "economically important"?

I personally don't give a damn for anybody else depending on me as the source of anything they value, at least not with respect to anything that's traditionally spoken of as "economic". In fact I would prefer that they could get whatever they wanted without involving me, and I could get whatever I wanted without involving them.

And power over what? Most people right this minute have no significant power over the wide-scale course of anything.

I thought "extinction", whether for a species or a culture, had a pretty clear meaning: It doesn't exist any more. I can't see how that's connected to anything you're talking about.

I do agree with you about human extinction not necessarily being the end of the world, depending on how it happens and what comes afterwards... but I can't see how loss of control, or value, or whatever, is connected to anything that fits the word "extinction". Not physical, not cultural, not any kind.

jbash20

I also meant existing life sentences. At any given time, you may have a political change that ends them, and once that happens, it's as much a matter of law as the original sentence.

I can't see any given set of laws or constitution, or the sentences imposed under them, lasting more than a few hundred years, and probably much less.

I could see a world where they didn't get the treatments to begin with, though.

jbash41

How much does the rest of the world change?

Suppose that things in general are being run by pervasive AI that monitors everything, with every human being watched by many humans-worth of intelligence, and fast enough, ubiquitous enough robotics to stop most or all human actions before they can be completed. Why would you even have prison sentences of any kind?

If you hold everything constant and just vastly extend everybody's life span, then maybe they stay in prison until it becomes unfashionable to be so punitive, and then get released. Which doesn't mean that kind of punitiveness won't come back into fashion later. Attitudes like that can change a lot in a few centuries. For that matter the governments that enforce the rules have a shelf life.

jbash21

One obvious question, as someone who loves analyzing safety problems through near-term perspectives whenever possible, is what if the models we currently have access to are the most trusted models we'll ever have? Would these kinds of security methods work, or are these models not powerful enough?

My reasonably informed guesses:

  1. No, they are not close to powerful enough. Not only could they be deliberately fooled, but more importantly they'd break things all the time when nobody was even trying to fool them.
  2. That won't stop people from selling the stuff you propose in the short term... or from buying it.

In the long term, the threat actors probably aren't human; humans might not even be setting the high-level goals. And those objectives, and the targets available, might change a great deal. And the basic software landscape probably changes a lot... hopefully with AI producing a lot of provably correct software. At that point, I'm not sure I want to risk any guesses.

I don't know how long the medium term is.

jbash5-1

You seem to be privileging the status quo. Refraining from doing that has equally large effects on your peers.

jbash30

effective-extinction (a few humans kept in zoos or the like, but economically unimportant to the actual intelligent agents shaping the future)

Do you really mean to indicate that not running everything is equivalent to extinction?

jbash20

I think I understand what you mean.

There are definitely possible futures worse than extinction. And some fairly likely ones that might not be worse than extinction but would still suck big time, varying from something comparable to a forced move to a damn sight worse than moving anywhere that presently exists. I'm old enough to have already had some disappointments (alongside some positive surprises) about how the "future" has turned out. I could easily see how I could get a lot worse ones.

But what are we meant to do with what you've posted and how you've framed it?

Also, if somebody does have the "non-extinction => good" mindset, I suspect they'll be prone to read your post as saying that change in itself is unacceptable, or at least that any change that every single person doesn't agree to is unacceptable. Which is kind of a useless position since, yeah, there will always be change, and things not changing will also always make some people unhappy.

I've gotta say that, even though I definitely worry about non-extinction dystopias, and think that they are, in the aggregate, more probable than extinction scenarios... your use of the word "meaning" really triggered me. That truly is a word people use really incoherently.

Maybe some more concrete concerns?

jbash53

I think it depends not on whether they're real dangers, but on whether the model can be confident that they're not real dangers. And not necessarily even dangers in the extreme way of the story; to match the amount of "safety" it applies to other topics, it should refuse if they might cause some harm.

A lot of people are genuinely concerned about various actors intentionally creating division and sowing chaos, even to the point of actually destabilizing governments. And some of them are concerned about AI being used to help. Maybe the concerns are justified and proportionate; maybe they're not justified or are disproportionate. But the model has at least been exposed to a lot of reasonably respectable people unambiguously worrying about the matter.

Yet when asked to directly contribute to that widely discussed potential problem, the heavily RLHFed model responded with "Sure!".

It then happily created a bunch of statements. We can hope they aren't going to destroy society... you see those particular statements out there already. But many of them would at least be pretty good for starting flame wars somewhere... and when you actually see them, they usually do start flame wars. Which is, in fact, presumably why they were chosen.

It did something that might make it at least slightly easier for somebody to go into some forum and intentionally start a flame war. Which most people would say was antisocial and obnoxious, and most "online safety" people would add was "unsafe". It exceeded a harm threshold that it refuses to exceed in areas where it's been specifically RLHFed.

At a minimum, that shows that RLHF only works against narrow things that have been specifically identified to train against. You could reasonably say that that doesn't make RLHF useless, but it at least says that it's not very "safe" to use RLHF as your only or primary defense against abuse of your model.

Answer by jbash11-3

Machines should sound robotic. It's that simple.

Any attempt, vocal or otherwise, to make people anthropomorphize them, whether consciously or unconsciously, is unethical. It should be met with social scorn and ostracism. Insofar as it can be unambiguously identified, it should be illegal. And that has everything to do with not trusting them.

Voices and faces are major anthropomorphization vehicles and should get especially strict scrutiny.

The reason's actually pretty simple and has nothing to do with "doomer" issues.

When a human views something as another human, the real human is built to treat it like one. That is an inbuilt tendency that humans can't necessarily change, even if they delude themselves that they can. Having that tendency works because being an actual human is a package. The tendency to trust other humans is coevolved with the tendency for most humans not to be psychopaths. The ways in which humans distrust other humans are tuned to other humans' actual capacities for deception and betrayal... and to the limitations of those capacities.

"AI", on the other hand, is easily built to be (essentially) psychopathic... and is probably that way by default. It has a very different package of deceptive capabilities that can throw off human defenses. And it's a commercial product created, and often deployed, by commercial institutions that also tend to be psychopathic. It will serve those institutions' interests no matter how perfectly it convinces people otherwise... and if doesn't, that's a bug that will get fixed.

An AI set up to sell people something will sell it to them no matter how bad it is for them. An AI set up to weasel information out of people and use it to their detriment will do that. An AI set up to "incept" or amplify this or that belief will do it, to the best of its ability, whether it's true or false. An AI set up to swindle people will swindle them without mercy, regardless of circumstances.

And those things don't have hard boundaries, and trying to enforce norms against those things in themselves has always had limited effect. Mainstream corporations routinely try to do those things to obscene levels, and the groupthink inside those corporations often convinces them that it's not wrong... which is another thing AI could be good at.

Given the rate of moral corrosion at the "labs", I give it about two or three years before they're selling stealth manipulation by LLMs as an "advertising" service. Five years if it's made illegal, because they'll have to find a plausibly deniable way to characterize it. The LLMs need to not be good at it.

Don't say "please" to LLMs, either.

jbash20

I think the "crux" is that, while policy is good to have, it's fundamentally a short-term delaying advantage. The stuff will get built eventually no matter what, and any delay you can create before it's built won't really be significant compared to the time after it's built. So if you have any belief that you might be able to improve the outcome when-not-if it's built, that kind of dominates.
