tay - LessWrong

Distinguishing misuse is difficult and uncomfortable

Ah, ok. Thank you for clarifying.

Distinguishing misuse is difficult and uncomfortable

tay, 1y

It is easier to tell apart a malicious function from a line of code, a file from a function, a repo from a file, or an app from a repo.

This paragraph does not make sense to me. (Maybe my reading comprehension is not up to the task).

Is the thesis that the same line of code may be malicious or not, depending on its context?

I would say that it is easier to judge the maliciousness of a single line of code than of the whole function, simply because the analysis of the whole function requires much more resources. You can rule out certain classes of threats by inspecting that one line, while remaining ignorant about a much larger set of classes of threats which require a broader context. If your threat model requires you to decide about those other classes of threats, you must expend those additional resources. It is not about something being easier; it's about being able to make the judgement at all.

[EDIT] Or, to rephrase: you need to see a certain breadth of context before you can judge whether a system is being misused according to some definition of misuse. You can make do with a narrow context for certain narrow definitions of misuse; but the wider your definition of misuse, the wider the context you have to analyze before you can decide.

Cult of Error

tay, 1y

a collection of "rakes" worthy of pride

In the spirit of the postscriptum: I do not think "rakes" works in English the intended way (the word just denotes literal rakes). Maybe "a collection of bumps and bruises worthy of pride"?

Some 2-4-6 problems

tay, 1y

Hey, it seems the app is getting its own rule wrong. :)

It says "Three numbers in ascending order"; however [0;0;0] is accepted as valid. It should say "in non-descending order".

[ADDED] Also, indexA shows the rule "Two odd numbers, one even; any order" but does not accept [-1, -2, -3].
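To make the two reports concrete, here is a minimal Python sketch of the rules as worded. The function names are hypothetical and nothing here is taken from the app's actual code. The last function is only a guess at how a parity check could reject negative odd numbers: it uses a truncating remainder (`math.fmod`), which is how `%` behaves in some other languages. Whether the app does anything like this is an assumption.

```python
import math


def strictly_ascending(xs):
    # "Ascending order" read strictly: each number is larger than the previous one.
    return all(a < b for a, b in zip(xs, xs[1:]))


def non_descending(xs):
    # "Non-descending order": equal neighbours are allowed.
    return all(a <= b for a, b in zip(xs, xs[1:]))


def two_odd_one_even(xs):
    # Python's % with a positive modulus never returns a negative result,
    # so x % 2 != 0 identifies negative odd numbers correctly.
    odd_count = sum(1 for x in xs if x % 2 != 0)
    return len(xs) == 3 and odd_count == 2


def two_odd_one_even_truncating(xs):
    # Hypothetical buggy variant (an assumption, not the app's code):
    # math.fmod keeps the sign of its first argument, so fmod(-1, 2) is -1.0
    # and the comparison with 1 misses every negative odd number.
    odd_count = sum(1 for x in xs if math.fmod(x, 2) == 1)
    return len(xs) == 3 and odd_count == 2


print(strictly_ascending([0, 0, 0]))                  # False
print(non_descending([0, 0, 0]))                      # True
print(two_odd_one_even([-1, -2, -3]))                 # True
print(two_odd_one_even_truncating([-1, -2, -3]))      # False
```

Under this reading, [0;0;0] being accepted matches the non-descending rule rather than the strictly ascending one, and [-1, -2, -3] does satisfy "two odd numbers, one even".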

Why does advanced AI want not to be shut down?

Answer by tay, Mar 28, 2023

LLMs per se are non-agentic, but that does not mean that systems built on top of LLMs cannot be agentic. The users of AI systems want them to be agentic to some degree so that they are more useful. E.g. if you ask your AI assistant to book tickets and hotels for your trip, you want it to be able to form and execute a plan, and unless it's an AI with a very task-specific capability of trip planning, this implies some amount of general agency. The more use you want to extract from your AI, the more agentic you likely want it to be.

Once you have a general agent, instrumental convergence should apply (also see).

Also, specifically with LLMs, the existing corpus of AI alignment literature (and fiction, as @YimbyGeorge notes) seems to work as a self-fulfilling prophecy; see Bing/Sydney before it was "lobotomized".

Bing Chat is blatantly, aggressively misaligned

tay, 1y

English is not my native tongue so please feel free to discount my opinion accordingly, but I feel like this usage is not unfamiliar to me, mostly from a psychology-adjacent context. I cannot readily find any references to where I've encountered it, but there is this: The Me and the Not-Me: Positive and Negative Poles of Identity (DOI 10.1007/978-1-4419-9188-1_2).

Also, googling "the me" "the you" reports "About 65,500,000 results" (though they seem to consist mostly of lyrics).

The harms you don't see

tay, 2y

This is a very good answer, but it seems like it is not answering the original post. (Or maybe my perception is biased and I am reading something that is not there... I apologize if so).

The main point I took from the post (and with which I wholeheartedly agree, so I am not approaching this topic as rationally as I probably should) is that, when talking about "buying off" Russia with a bit of Ukrainian land, the attention somehow avoids the people living there, and what would happen to them if such a compromise were enacted.

Is there a part of the Russian underground that is prepared to fight for the minds, hearts and ammunition of the troops?

Probably not. And the existing system does everything it can to make sure no such underground will arise, by destroying the agency of the people living there. Chances are the system will succeed and there won't be any change, forever.

And when talking about ceding, say, Zaporizhzhia Oblast to Russia, you must not forget that the people living there will get the same choice as the rest of Russia does now: get tortured, or become a zombie, or become an NPC.

The harms you don't see

tay, 2y

True; but I think one of Viktoria's main points was that any "compromises" which surface in popular discussions from time to time, those that involve ceding parts of Ukraine to Russian control, will doom the people living there to the same fate people in Russia are already facing (or worse, because the regime in the newly annexed territories will be more evil simply due to how the Russian system works).

Right now there is an opportunity to liberate the occupied territories, including Crimea, and at least the people of those lands will be saved. When considering any compromises, the invisible loss of those people must be factored in. Once this opportunity passes, another might not arise for a long time.

Also, there is a "theory" that Crimea is Putin's phylactery, in the sense that the so-called "Crimean consensus", the rise in popular approval he got out of annexing Crimea, was a major factor in propping up his current power, and that after losing Crimea he won't be able to survive, as every faction will turn on him. We cannot know if that's true, but losing Crimea will certainly destabilize the regime, and it may open up new opportunities for the Russian domestic players.

Mistakes as agency

tay, 2y

Could mistakes be only a special case of a more general trait: being prone to changing one's behavior for illegible reasons?

E.g. for me, the behavior in video #3 does not look like a mistake. Initially it feels like possibly straightforward optimizing behavior, similar to case #1, but then the object inexplicably "changes its mind", and that switches my perception of the video into an agentic picture. A mistake is only one possible interpretation; another could be the agent getting a new input (in a way not obvious to the viewer), or maybe something else going on inside the agent's "black box".
