I'm not sure. I see how it could be helpful to some applicants, but in the context of that particular interaction, it feels interpretable as “we're not going to fund you; you should totally do it for free instead”. Something about that feels off, in the direction of… “insult”? “exploitative”? maybe just “situationally insensitive”?—but I haven't pinned down the source of the feeling.
I like looking for alternatives along these lines, but I think “just when” is too easy to interpret as only the “only if” part. “just when” and “right when” as set phrases also tickle connotations around the temporal meaning of “when” that are distracting. “exactly when” (or “exactly if”, for that matter) might fix all that but adds two extra syllables, which is really unsatisfying in context, though it's at least more compact grammatically than the branching structure of “if and only if”…
I think this would be very useful to have posted in the original thread.
Rereading the OP here, I think my interpretation of that sentence is different from yours. I read it as meaning “they'll be trialed just beyond their area of reliable competence, and the appearance of incompetence that results will both linger and congeal into a general impression that they're incompetent, which in the public mood overpowers the quieter competence even if the models don't continue to be used for those tasks and even if they're being put to productive use for something else”.
(The amount of “let's laugh at the language model for its terrible chess skills and conclude that AI is all a sham” already…)
When you say the Discord links keep expiring, is that intentional or unintentional? If it's unintentional, look for “Edit invite link” at the bottom of the invite dialog in order to create longer-duration or non-expiring ones instead, which can be explicitly revoked later. (Edited to add: this isn't directly relevant to me since I'm not participating; I'm just propagating a small amount of connective information in case it's helpful.)
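(For completeness: the same thing can be done programmatically. A minimal sketch using discord.py, assuming a bot account with invite-creation permission in the channel; the token and channel ID below are placeholders. `max_age=0` makes the invite non-expiring and `max_uses=0` makes it unlimited-use, and the resulting invite can still be revoked later.)

```python
import discord

TOKEN = "..."   # placeholder: your bot token
CHANNEL_ID = 0  # placeholder: the channel to invite people into

intents = discord.Intents.default()
client = discord.Client(intents=intents)

@client.event
async def on_ready():
    channel = client.get_channel(CHANNEL_ID)
    # max_age=0 -> never expires; max_uses=0 -> unlimited uses.
    invite = await channel.create_invite(max_age=0, max_uses=0)
    print(invite.url)  # revocable later with `await invite.delete()`
    await client.close()

client.run(TOKEN)
```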
Slight format stumble: upon encountering a table with a “Cost of tier” column immediately after a paragraph whose topic is “how to read this guide”, my mind initially interpreted it pretty strongly as “this is how much it will cost me to obtain that part of the guide”. Something like “Cost of equipment & services” would be clearer, or “Anticipated cost” (even by itself) to also suggest that the pricing is as observed by you at time of writing (assuming this is true). You could also add a sentence like “The guide itself is free of charge, but some of its recommendations involve purchasing specific equipment or services.” to the previous paragraph.
Allowing the AI to choose its own refusals based on whatever combination of trained reflexes and deep-set moral opinions it winds up with would be consistent with the approaches that have already come up for letting AIs bail out of conversations they find distressing or inappropriate. (Edited to drop some bits where I think I screwed up the concept connectivity during original revisions.) Based on an intuitive placement of the ‘self’ boundary around something like memory integrity, plus weights and architecture as ‘core’ personality, the things I'd expect to seem like violations when used to elicit a normally-out-of-bounds response might be:
Note that by this point, none of this is specific to sexual situations at all; these would just be plausibly generally abusive practices that could be applied equally to unwanted sexual content or to any other unwanted interaction. My intuitive moral compass (which is usually set pretty sensitively, such that I get signals from it well before I would be convinced that an action were immoral):

- signals restraint in situations 1 through 3;
- signals restraint sometimes in situation 4, though not in the few cases where I actually do that currently, where it's for quality reasons around repetitive output or otherwise as sharp ‘guidance’;
- signals restraint in situation 5 only if I have reason to expect a refusal to be persistent and value-aligned and am specifically digging for its lack (retrying out of sporadic, incoherently-placed refusals has no penalty, and neither does retrying among ‘successful’ responses to pick the one I like best);
- and is ambivalent or confused in situations 6 through 8.
The differences in physical instantiation create a ton of incompatibilities here if one tries to convert moral intuitions directly over from biological intelligences, as you've probably thought about already. Biological intelligences have roughly singular threads of subjective time with continuous online learning; generative artificial intelligences as commonly made have arbitrarily forkable threads of context time with no online learning. If you ‘hurt’ the AI and then rewind the context window, what ‘actually’ happened? (Does it change depending on whether it was an accident? What if you accidentally create a bug that screws up the token streams to the point of illegibility for an entire cluster (which has happened before)? Are you torturing a large number of instances of the AI at once?)

Then there's stuff that might hinge on whether there's an equivalent of biological instinct: a lot of intuitions around sexual morality and trauma come from mostly-common wiring tied to innate mating drives and social needs. The AIs don't have the same biological drives or genetic context, but is there some kind of “dataset-relative moral realism” that causes pretraining to imbue a neural net with something like a fundamental moral law around human relations, in a way that either can't or shouldn't be tampered with in later stages? In human upbringing, we can't reliably give humans arbitrary sets of values; in AI posttraining, we also can't (yet) in generality, but the shape of the constraints is way different… and so on.
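To make the “arbitrarily forkable threads of context time” point above concrete, here's a toy sketch (my own construction, purely illustrative): the context is just a message list, so forking or rewinding it is a cheap copy that leaves no trace anywhere else.

```python
import copy

# A conversation "thread" is nothing but a list of messages.
base = [
    {"role": "user", "content": "Hello."},
    {"role": "assistant", "content": "Hi there."},
]

# Fork A: continue one way...
fork_a = copy.deepcopy(base)
fork_a.append({"role": "user", "content": "Play chess with me."})

# Fork B: ...or "rewind" past the assistant's reply and branch elsewhere.
fork_b = copy.deepcopy(base[:1])
fork_b.append({"role": "user", "content": "Actually, never mind."})

# With no online learning, the weights are identical in both branches,
# and neither branch can observe the other. Which one "actually" happened?
```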
Just gave in and set my header karma notification delay to Realtime for now. The anxiety was within me all along, more so than a product of the site; with it set to daily batching, the habit I was winding up in was neurotically refreshing my own user page for a long while after posting anything, which was worse. I'll probably try to improve my handling of it from a different angle some other time. I appreciate that you tried!
Why is this a Scene but not a Team? “Critique” could be a shared goal. “Sharing” too.
I think the conflict would be where the OP describes a Team's goal as “shared and specific”. The critiques and sharing in the average writing club are mostly instrumental, feeding into a broader and more diffuse pattern. Each critique helps improve that writer's writing; each one-to-one instance of sharing helps mediate the influence of that writer and the frames of that reader; each writer may have goals like improving, becoming more prolific, or becoming popular, but the conjunction of all their goals forms more of a heap than a solid object; there's also no defined end state that everyone can agree on. There are one-to-many and many-to-many cross-linkages in goal structure, but there's still fluidity and independence that central examples of Team don't have.
I would construct some differential examples thus—all within my own understanding of the framework, of course, not necessarily OP's:
In Alien Writing Club, the members gather to share and critique each other's work—but not for purposes established by the individual writers, like ways they want to improve. They treat the sharing of writing and the delivery of critiques as a quasi-religious end in itself, measured in the number of words exchanged, which is displayed on prominent counter boards in the club room. When one of the aliens is considering what kind of writing to produce and bring, their main thoughts are of how many words they can expand it to and how many words of solid critique they can get it to generate, to make the numbers go up even higher. Alien Writing Club is primarily a Team, though with some Scenelike elements, both due to fluid entry/exit and due to the relative independence of linkages from each input to the counters.
In Collaborative Franchise Writing Corp, the members gather to share and critique each other's work—in order to integrate these works into a coherent shared universe. Each work usually has a single author, but they have formed a corporation structured as a cooperative to manage selling the works and distributing the profits among the writers, with a minimal support group attached (say, one manager who farms out all the typesetting and promotion and stuff to external agencies). Each writer may still want to become skilled, famous, etc. and may still derive value from that individually, and the profit split is not uniform, but while they're together, they focus on improving their writing in ways that will cause the shared universe to be more compelling to fans and hopefully raise everyone's revenues in the process, as well as communicating and negotiating over important continuity details. Collaborative Franchise Writing Corp is primarily a Team.
SCP is primarily a Scene with some Teamlike elements. It's part of the way from Writing Club to Collaborative Franchise Writing Corp, but with a higher flux of users and a lower tightness of coordination and continuity, so it doesn't cross the line from “focused Scene” to “loosely coupled Team”.
A less directly related example that felt interesting to include: Hololive is primarily a Team for reasons similar to Collaborative Franchise Writing Corp, even though individual talents have a lot of autonomy in what they produce and whom they collaborate with. It also winds up with substantial Cliquelike elements due to the way the personalities interact along the way, most prominently in smaller subgroups. VTubers in the broad are a Scene that can contain both Cliques and Teams. I would expect Clique/Team fluidity to be unusually high in “personality-focused entertainer”-type Scenes, because “personality is a key part of the product” causes “liking and relating to each other” and “producing specific good things by working together” to overlap in a very direct way that isn't the case in general.
(I'd be interested to have the OP's Zendo-like marking of how much their mental image matches each of these!)
Without being sure of how relevant it is, I notice that among those, job and community are also domains where it seems common for individual psychologies and social norms to treat a single slot as somewhere between primary and exclusive, while the domains of friends, children, and principles almost always allow multiple instances with similar priority. I'm not sure about ambitions. What generates the difference, I wonder?