LESSWRONG
LW

Luk27182 — LessWrong

Replying toWould you pay for a search engine limited to rationalist sites?

Luk27182Aug 03, 2023

Would you pay for a search engine limited to rationalist sites?

When I am looking for rationalist content and can't find it, using Metaphor (free) usually finds what I want (sometimes even without a rationalist-specific prompt. Could be the data it was trained on? In any case, it does what I want.)

Don't there already exist extensions for google that you can use to whitelist certain websites (parental locks and such)? I'd think you could just copy paste a list of rationalist blogs into something like that? This seems like what you are proposing to create, unless I misunderstand.

Replying toMech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo

Luk271823y

Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo

Answer: (decode with rot13)

Should this link instead go to rot13.com?

Good puzzle, thanks!

Replying toWhat money-pumps exist, if any, for deontologists?

Luk271823y

What money-pumps exist, if any, for deontologists?

Weird things CAN happen if others can cause you to kill people with your bare hands (See Lexi-Pessimist Pump here). But assuming you can choose to never be in a world where you kill someone with your bare hands, I also don't think there are problems? The world states may as well just not exist.

(Also, not money pump, but consider: Say I have 10^100 perfectly realistic mannequin robots and one real human captive. I give the constrained utilitarian the choice between choking one of the bodies with their bare hands or let me wipe out humanity. Does the agent really choose to not risk killing someone themself?)

Replying toAlignment works both ways

Luk271823y

Alignment works both ways

I didn't want this change, it just happened.

I might be misunderstanding- isn't this what the question was? Whether we should want (/be willing to) change our values?

Sometimes I felt like a fool afterward, having believed in stupid things

The problem with this is: If I change your value system in any direction, the hypnotized "you" will always believe that the intervention was positive. If I hypnotized you to believe that being carnivorous was more moral by changing your underlying value system to value animal suffering, then that version of you would view the current version of yourself as foolish and immoral.

There are essentially two different beings: carnivorous-Karl, and vegan-Karl. But only one of... (read more)

Replying toAlignment works both ways

Luk271823y

Alignment works both ways

My language was admittedly overly dramatic, but I don't think it make rational sense to want to change your values for the sake of just having the new value. If I wanted to value something, then by definition I would already value that thing. That said, I might not take actions based on that value if:

There was social/economic pressure not to do so
I already had the habit of acting a different way
I didn't realize I was acting against my value
etc.

I think that actions like becoming vegan are more like overcoming the above points than fundamentally changing your values. Presumably, you already valued things like "the absence of death and suffering" before becoming... (read more)

Replying toAlignment works both ways

Luk271823y

Alignment works both ways

If I were convinced to value things, I would no longer be myself. Changing values is suicide.

You might somehow convince me through hypnosis that eating babies is actually kind of fun, and after that, that-which-inhabits-my-body would enjoy eating babies. However, that being would no longer be me. I'm not sure what a necessary and sufficient condition is for recognizing another version of myself, but sharing values is at least part of the necessary condition.

Replying toA Telepathic Exam about AI and Consequentialism

Luk271823y

A Telepathic Exam about AI and Consequentialism

I'd think the goal for 1,2,3 is to find/fix the failure modes? And for 4 to find a definition of "optimizer" that fits evolution/humans, but not paperclips? Less sure about 5,6, but there is something similar to the others about "finding the flaw in reasoning"

Here's my take on the prompts:

The first AI has no incentive to change itself to be more like the second- it can just decide to start working on the wormhole if it wants to make the wormhole. Even more egregious, the first AI should definitely not change its utility function to be more like the second! That would essentially be suicide, the first AI ceases to be itself.

Luk271823y

Escape Velocity from Bullshit Jobs

This is besides the point of your own comment, but “how big are bullshit jobs as % of GDP” is exactly 0 by definition!

-1

A Bite Sized Introduction to ELK

Luk27182

Epistemic Status: This post is mainly me learning in public as my final project for the ML Safety Scholars Program. Errors are very possible, corrections very welcome.

Purpose of this Post

From my limited perspective as a curious undergrad student, ELK research seems like a perfect way for the relatively inexperienced to try out AI safety research:

There is essentially no necessary background knowledge beyond a basic understanding of ML
There are no costs for conducting research (e.g. renting GPUs).
Several cash prizes have been offered in the recent past to bring people into the field, so it seems likely that there will be more in the future.

It seems possible that one of the biggest barriers to... (read 1545 more words →)

Replying toFraming Practicum: Stable Equilibrium

Luk27182Apr 15, 2022

Framing Practicum: Stable Equilibrium

Most metrics of productivity/success are at a stable equilibrium in my life. For example:

The work I get done in a day (month?) is fairly constant. If I work hard throughout the day, I eventually feel satisfied and relax for a while. If I relax for too long, I start feeling sluggish and want to get back to working. Sometimes this happens more so on the scale of an entire month (an incredibly productive week followed by a very sluggish week).
The amount of socialization I partake in each week is also constant. When I socialize too much my battery is drained and I draw back into myself. If I spend too long cooped

... (read more)