Meme Marine — LessWrong

Concept Poisoning: Probing LLMs without probes

I suppose we are now seeing the first stages of "Semiotic Warfare" - the art of achieving political goals through the manipulation of meaning itself. I think it's a good idea, but I would advise against calling it "poison" because this framing implies you are attempting to injure the model. This is not the goal and I think such framing would prime for adversarial behavior.

"The Solomonoff Prior is Malign" is a special case of a simpler argument

Meme Marine1y100

The reason for agnosticism is that it is no more likely for them to be on one side or the other. As a result, you don't know without evidence who is influencing you. I don't really think this class of Pascal's Wager attack is very logical for this reason - an attack is supposed to influence someone's behavior but I think that without special pleading this can't do that. Non-existent beings have no leverage whatsoever and any rational agent would understand this - even humans do. Even religious beliefs aren't completely evidenceless, the type of evidence exhibited just doesn't stand up to scientific scrutiny.

To give an example: What if that AI was in a future simulation performed after the humans had won, and were now trying to counter-capture it? There's no reason to think this is less likely than the aliens hosting the simulation. It has also been pointed out that the Oracle is not actually trying to earnestly communicate its findings but actually to get reward - reinforcement learners in practice do not behave like this, they learn behavior which generates reward. "Devote yourself to a hypothetical god" is not a very good strategy at train time.

"The Solomonoff Prior is Malign" is a special case of a simpler argument

Meme Marine1y30

I think more importantly, it simply isn't logical to allow yourself to be Pascal Mugged, because in the absence of evidence, it's entirely possible that going along with it would actually produce just as much anti-reward as it might gain you. It rather boggles me that this line of reasoning has been taken so seriously.

Project Adequate: Seeking Cofounders/Funders

Meme Marine1y94

Kudos to you for actually trying to solve the problem, but I must remind you that the history of symbolic AI is pretty much nothing but failure after failure; what do you intend to do differently, and how do you intend to overcome the challenges that halted progress in this area for the past ~40 years?

The Compendium, A full argument about extinction risk from AGI

Meme Marine1y52

Yes, I agree that the US military is one example of a particularly well-aligned institution. I think my point about the alignment problem being analogous to military coup risk is still valid and that similar principles could be used to explore the AI alignment problem; military members control weaponry that no civil agency can match or defeat, in most countries.

The Compendium, A full argument about extinction risk from AGI

Meme Marine1y5-6

All military organizations are structured around the principle of their leaders being able to give orders to people subservient to them. War is a massive coordination problem, and getting soldiers to do what you want is the central part of it. I mean to say that high-ranking generals could launch such a coup, not that every service member would spontaneously decide to perform one. This can and does happen, so I think your blanket statement on the impossibility of Juntas is void.

The Compendium, A full argument about extinction risk from AGI

Meme Marine1y12-27

I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.

The underlying principle beneath these hypothetical scenarios is grounded in what we can observe around us: powerful entities control weaker ones, and weaker ones can fight back only to the degree that the more powerful entity isn’t all that powerful after all.

I do not think this is obvious or true at all. Nation-States are often controlled by a small group of people or even a single person, no different physiologically to any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government; in fact, military coups are a commonplace global event. Yet, generally, most countries do not suffer constant coup attempts. We hold far fewer tools to "align" military leaders than we do AI models - we cannot control how generals were raised as children, cannot read their minds, cannot edit their minds.

I think you could also make a similar argument that big things control little things - with much more momentum and potential energy, we observe that large objects are dominant over small objects. Small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then building vehicles substantially larger than people would result in uncontrollable runaways that would threaten human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can - only the one in the cockpit.

My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.

The odds of an average chess player with an ELO of 1200 against a grandmaster with ELO 2500 are 1 to a million. Against the best chess AI today with an ELO of 3600, the odds are essentially 0.
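For reference, the rating gaps in that quote can be plugged into the standard Elo expected-score formula, E = 1 / (1 + 10^((R_b - R_a) / 400)). What follows is only a minimal sketch: the function name is made up for illustration, the 400-point logistic scale is the usual convention rather than anything stated in the quoted text, and an expected score counts a draw as half a win, so it is not the same thing as the raw odds of winning a single game.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model.

    The expected score is P(win) + 0.5 * P(draw), so it is an upper bound on
    the probability that A wins outright. The function name and the 400-point
    scale are illustrative assumptions, not taken from the quoted passage.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # The two matchups mentioned in the quote.
    for opponent in (2500, 3600):
        score = elo_expected_score(1200, opponent)
        print(f"1200 vs {opponent}: expected score {score:.3g}")
```

The formula is also an extrapolation at gaps this large, since ratings are calibrated on games between players of roughly comparable strength.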

Chess is a system that's perfectly predictable. Reality is a chaotic system. Chaotic systems - like a three-body orbital arrangement - are impossible to perfectly predict in all cases even if they're totally deterministic, because even minute inaccuracies in measurement can completely change the result. Another example is the boundary of the Mandelbrot set, which is fractal: starting points arbitrarily close to each other can behave completely differently under iteration. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, even before accounting for quantum randomness.
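To make the sensitivity point concrete, here is a minimal sketch using the logistic map, which I am swapping in as a much simpler chaotic system than a three-body simulation. The function names, the starting value 0.2, and the 1e-10 perturbation are all arbitrary choices for illustration. Two trajectories that start almost identically are tracked side by side, and the gap between them is recorded at each step.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x), returning steps + 1 values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def orbit_gaps(x0: float, eps: float, steps: int) -> list[float]:
    """Absolute difference, step by step, between orbits started at x0 and x0 + eps."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + eps, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    # Illustrative values only: a 1e-10 "measurement error" in the initial state.
    gaps = orbit_gaps(0.2, 1e-10, 100)
    for step in (0, 10, 20, 30, 40, 50):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

The system is fully deterministic, yet the initial error grows roughly exponentially until the two trajectories are unrelated, so any finite measurement precision only buys a limited prediction horizon.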

Many assume that an AI is only dangerous if it has hostile intentions, but the danger of godlike AI is not a matter of its intent, but its power and autonomy. As these systems become increasingly agentic and powerful, they will pursue goals that will diverge from our own.

It would not be incorrect to describe someone who pursued their goals irrespective of the externalities as malevolent. Bank robbers don't want to hurt people, they want money. Yet I don't think anyone would suggest that the North Hollywood shooters were "non-hostile but misaligned". I do not like this common snippet of rhetoric and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, this is the picture that is painted.

Goal divergence is a hallmark of the Bostromian paradigm - the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way. They do not have objective functions that they pursue doggedly. The Orthogonality Thesis states that intelligence is uncorrelated with objectives. The unstated connection here, I think, is that their initial goals must have been misaligned in the first place, but stated like this, it sounds a little like you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.

Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.

I'm not going to go into detail on the Alignment section, but I think that many of its issues are similar to the ones listed above. I think that the arguments are not compelling enough for lay people, mostly because I don't think they're correct. I think that the definition of Alignment you have given - "the ability to “steer AI systems toward a person's or group's intended goals, preferences, and ethical principles.”" - does not match the treatment it is given. I think that it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo's post:

https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research

is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.

Finally, I think that the Governance section could very well be read uncharitably as a manifesto for world domination. Less than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas contained in this document, which resemble many other documents, such as a similar one created by the PauseAI group, are not compelling enough to sway people who are not already believers in its ideas, and the Rationalist language used in them is anathema to the largest ideological groups that would otherwise support your cause.

You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner's essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle, people are willing to read this sort of writing. The main flaw is that it doesn't add anything to the conversation, and because of that, it won't change anyone's mind. The reason that the public discourse doesn't involve Alignment talk isn't due to lack of awareness, it's because it isn't at all compelling to most people. Writing it better, with a nicer format, will not change this.

MIRI 2024 Communications Strategy

Meme Marine2y82

No message is intuitively obvious; the inferential distance between the AI safety community and the general public is wide, and even if many people do broadly dislike AI, they will tend to think that apocalyptic predictions of the future, especially ones that don't have as much hard evidence to back them as climate change (which is already very divisive!), belong in the same pile as every other doomsday prediction. I am sure many people will be convinced, especially if they were already predisposed to it, but such a radical message will alienate many potential supporters.

I think the suggestion that contact with non-human intelligence is inherently dangerous is not actually widely intuitive. A large portion of people across the world believe they regularly commune with non-human intelligence (God/s) which they consider benevolent. I also think this is a case of generalizing from fictional evidence - mentioning "aliens" conjures up stories like the War of the Worlds. So I think that, while this is definitely a valid concern, it will be far from a universally understood one.

I mainly think that using existing risks to convince people of their message would help because it would lower the inferential distance between them and their audience. Most people are not thinking about dangerous, superhuman AI, and will not until it's too late (potentially). Forming coalitions is a powerful tool in politics and I think throwing this out of the window is a mistake.

The reason I say LLM-derived AI is that I do think that to some extent, LLMs are actually a be-all-end-all. Not language models in particular, but the idea of using neural networks to model vast quantities of data, generating a model of the universe. That is what an LLM is and it has proven wildly successful. I agree that agents derived from them will not behave like current-day LLMs, but will be more like them than different. Major, classical misalignment risks would stem from something like a reinforcement learning optimizer.

I am aware of the argument of dangerous AI in the hands of ne'er do wells, but such people already exist and in many cases, are able to - with great effort - obtain means of harming vast amounts of people. Gwern Branwen covered this; there are a few terrorist vectors that would require relatively minuscule amounts of effort but that would result in a tremendous expected value of terror output. I think in part, being a madman hampers one's ability to rationally plan the greatest terror attack one's means could allow, and also that the efforts dedicated to suppressing such individuals vastly exceed the efforts of those trying to destroy the world. In practice, I think there would be many friendly AGI systems that would protect the earth from a minority of ones tasked to rogue purposes.

I also agree with your other points, but they are weak points compared to the rock-solid reasoning of misalignment theory. They apply to many other historical situations, and yet, we have ultimately survived; more people do sensible things than foolish things, and we do often get complex projects right the first time around as long as there is a theoretical underpinning to them that is well understood - I think proto-AGI is almost as well understood as it needs to be, and that Anthropic is something like 80% of the way to cracking the code.

I am afraid I did forget in my original post that MIRI would believe that the person who holds AGI is of no consequence. It simply struck me as so obvious I didn't think anyone could disagree with this.

In any case, I plan to write a longer post in collaboration with some friends who will help me edit it to not sound quite like the comment I left yesterday, in opposition to the PauseAI movement, which MIRI is a part of.

MIRI 2024 Communications Strategy

Meme Marine2y41

I am sorry for the tone I had to take, but I don't know how to be any clearer - when people start telling me they're going to "break the overton window" and bypass politics, this is nothing but crazy talk. This strategy will ruin any chances of success you may have had. I also question the efficacy of a Pause AI policy in the first place - and one argument against it is that some countries may defect, which could lead to worse outcomes in the long term.

MIRI 2024 Communications Strategy

Meme Marine2y13-14

Why does MIRI believe that an "AI Pause" would contribute anything of substance to the goal of protecting the human race? It seems to me that an AI pause would:

Drive capabilities research further underground, especially in military contexts
Force safety researchers to operate on weaker models, which could hamper their ability to conduct effective research
Create a hardware overhang which would significantly increase the chance of a sudden catastrophic jump in capability that we are not prepared to handle
Create widespread backlash against the AI Safety community among interest groups that would like to see AI development continued
Be politically contentious, creating further points for tension between nations that could spark real conflict; at worst, you are handing the reins of the future to foreign countries, especially ones that don't care about international agreements - which are the countries you would probably least want to be in control of AGI.

In any case, I think you are going to have an extremely difficult time in your messaging. I think this strategy will not succeed and will most likely, like most other AI safety efforts, actively harm your cause.

Every movement thinks they just need people to "get it". Including, and especially, lunatics. If you behave like lunatics, people will treat you as such. This is especially true when there is a severe lack of evidence as to your conclusions. Classical AI Alignment theory does not apply to LLM-derived AI systems and I have not seen anything substantial to replace it. I find no compelling evidence to suggest even a 1% chance of x-risk from LLM-based systems. Anthropogenic climate change has mountains of evidence to support it, and yet a significant chunk of the population does not believe in it.

You are not telling people what they want to hear. Concerns around AI revolve around copyright infringement, job displacement, the shift of power between labor and capital, AI impersonation, data privacy, and just plain low-quality AI slop taking up space online and assaulting their eyeballs. The message every single news outlet has been publishing is: "AI is not AGI and it's not going to kill us all, but it might take your job in a few years" - that is, I think, the consensus opinion. Reframing some of your argument in these terms might make them a lot more palatable, at least to the people in the mainstream who already lean anti-AI. As it stands, even though the majority of Americans have a negative opinion on AI, they are very unlikely to support the kind of radical policies you propose, and lawmakers, who have an economic interest in the success of AI product companies, will be even less convinced.

I'm sorry if this takes on an insolent tone but surely you guys understand why everyone else plays the game, right? They're not doing it for fun, they're doing it because that's the best and only way to get anyone to agree with your political ideas. If it takes time, then you had better start right now. If a shortcut existed, everyone would take it. And then it would cease to be a shortcut. You have not found a trick to expedite the process, you have stumbled into a trap for fanatics. People will tune you out among the hundreds of other groups that also believe the world will end and that their radical actions are necessary to save it. Doomsday cults are a dime a dozen. Behaving like them will produce the same results as them: ridicule.
