If future more capable models are indeed actively resisting their alignment training, and this is happening consistently, that seems like an important update to be making?

Could someone explain to me what this resisting behavior during alignment training looked like in practice?

Did the model outright say "I don't want to do this", did it produce nonsensical results, did it become deceptive, or did it just... not work?

This claim seems very interesting if true, is there any further information on this?

"glamorize"

"Glomarize" is the word I believe you want to use.


As a native German speaker I believe I can expand upon, and slightly disagree with, your definition.

I suspect that a significant portion of the misunderstanding about slave morality comes from the fact that the German word "Moral" (which is part of the Nietzschean term "Sklavenmoral") has two possible meanings, depending on context: morality and morale, and it is the latter which I consider to be the more apt translation in this case.

Nietzsche was really speaking about slave morale. It is important to understand that slave morality is not an ethical system or a set of values; rather, it is a mindset which, through psychological mechanisms, facilitates the adoption of certain values and moral systems.

To be more concrete, it is a mindset that Nietzsche suspects is common among the downtrodden, raped, unlucky, unworthy, pathetic, and unfit.

Such people, according to Nietzsche, value kindness, "goodness of the heart", humility, patience, softness, and other such things, and tend to be suspicious of power, greatness, risk, boldness, ruthlessness, etc.

To the slave, the warmhearted motherly figure who cares about lost puppies is a perfect example of what a good person is like - in sharp contrast to an entrepreneurial, risk-taking type of person who wants to colonize the universe or create a great empire or whatever.

To the slave, that which causes fear is evil - to the master, inspiring fear (or, rather, awe) is an almost necessary attribute of something great, worthy, good.

So, returning to your definition: Slave morality gives rise to the idea that he who is a good boy and cleans his room deserves a cookie. That, I would agree, is a significant consequence of slave morality, but it is not its definition.

I don't think the primary decision makers at Nvidia do believe AGI is likely to be developed soon. I think they are hyping AI because it makes them money, but not really believing that progress will continue all the way to AGI in the near future.

I agree - and if they are at all rational, they have expended significant resources to find out whether this belief is justified, and I'd take that seriously. If Nvidia do not believe that AGI is likely to be developed soon, I think they are probably right - and this makes more sense if there in fact aren't any 5-level models around and scaling really has slowed down.

If I were in charge of Nvidia, I'd supply everybody until some design shows up that I believe will scale to AGI, and then I'd make sure to be the one who's got the biggest training cluster. But since that's not what's happening yet, that's evidence that Nvidia do not believe that the current paradigms are sufficiently capable.

But how would this make sense from a financing perspective? If the company reveals that they are in possession of a 5-level model, they'd be able to raise money at a much higher valuation. Just imagine what would happen to Alphabet stock if they proved possession of something significantly smarter than GPT-4.

Also, the fact that Nvidia is selling its GPUs rather than keeping them all for itself does seem like some kind of evidence against this. If it were really all just a matter of scaling, why not cut everyone off and rush forward? They have more than enough resources by now to pay the foremost experts millions of dollars a year, and they'd have the best equipment too. Seems like a no-brainer if AGI was around the corner.

Similarly, he claims that the bill does not acknowledge trade-offs, but the reasonable care standard is absolutely centered around trade-offs of costs against benefits.

 

Could somebody elaborate on this?

My understanding is that if a company releases an AI model knowing it can be easily exploited ('jailbroken'), they could be held legally responsible - even if the model's potential economic benefits far outweigh its risks.

For example, if a model could generate trillions in economic value but also enable billions in damages through cyberattacks, would releasing it be illegal despite the net positive impact?

Furthermore, while the concept of 'reasonable care' allows for some risk, doesn't it prohibit companies from making decisions based solely on overall societal cost-benefit analysis? In other words, can a company justify releasing a vulnerable AI model just because its benefits outweigh its risks on a societal level?

It seems to me that this would be prohibited under the bill in question, and that very much seems to me to be a bad thing. Destroying lots of potential economic value while having a negligible effect on x-risk seems bad. Why not drop everything that isn't related to x-risk, and increase the demands on reporting, openness, sharing risk assessments, etc.? That seems far more valuable and easier to comply with.

 

Yes, we will live in a world where everything will be under (some level of) cyberattack 24/7, every identity will have to be questioned, every picture and video will have to somehow be proven to be real, and the absolute most this bill can do is buy us a little bit more time before that starts happening. Why not get used to it now, and try to also maximize the advantages of having access to competent AI models (as long as they aren't capable of causing x-risks)?

1. Yes, but they also require far more money to do all the good stuff! I'm not saying there isn't a tradeoff involved here.

2. Yes, I've read that. I was saying that this is a pretty low bar, since an ordinary person isn't good at writing viruses. I'm afraid that the bill might have the effect of making competent jailbreakable models essentially illegal, even if they don't pose an existential risk (in which case that would of course be necessary), and even if their net value for society is positive, because there is a lot of software out there that's insecure and that a reasonably competent coding AI could exploit to cause >$500 million in damages.

I’m saying that it might be better to tell companies to git gud at computer security and accept the fact that yes, an AI will absolutely try to break their stuff, and that they won’t get to sue Anthropic if something happens.

Correct me if I'm wrong, but it seems to me that something this law implies is that it's only legal to release jailbreakable models if they (more or less) suck.

Got something that can write a pretty good computer virus or materially enable somebody to do it? Illegal under SB1047, and I think the costs might outweigh the benefits here. If your software is so vulnerable that an LLM can hack it, that should be a you problem. Maybe use an LLM to fix it, I don't know. The benefit of AI systems intelligent enough to do that (but too stupid to pose actual existential risks) seems greater than the downside of initial chaos that would certainly ensue from letting one loose on the world.

If I had to suggest an amendment, I'd word it in such a way that as long as the model outputs publicly available information, or information that could be obtained by a human expert, it's fine. There are already humans who can write computer viruses, so your LLMs should be allowed to do it as well. What they should not be allowed to do is design scary novel biological viruses from scratch, make scary self-replicating nanotech, etc., since human experts currently can't do those things either.

Or, in case that is too scary, maybe apply my amendment only to cyber-risks, but not to bio/nuclear/nanotech risks.

How is this not basically the widespread idea of recursive self-improvement? This idea is simple enough that it has occurred even to me, and there is no way that, e.g., Ilya Sutskever hasn't thought about it.


Don't do this, please. Just wait and see. This community is forgiving about changing one's mind.
