LESSWRONG

486
Canaletto

Comments

1 Canaletto's Shortform
1y
33
On Fleshling Safety: A Debate by Klurl and Trapaucius.
Canaletto 2h 10

I think the Correct Alien should have made a few more funny errors.

Like, it names "love" and "respect" and "imitation" as alternatives to corrigibility, but all of them are kinda right? It should have thrown in some funny wrong guesses, like "cosplay" or "compulsive role play of behaviors your progenitors did".

Or, for example, considering that the alien had already thought about how short-lived humans are, "error correcting/defending/preserving the previous progenitors' advice". That way of relating to your progenitors should have made it impossible for the Inebriated Alien to overwrite human motivations, because by now they would be self-preserving wrong ones.

Come to think of it, those too are kind of right. I'm bad at making plausible errors.

leogao's Shortform
Canaletto 2d 10

>slowing down for 1,000 years here in order to increase the chance of success by like 1 percentage point is totally worth it in expectation.

Is it? What meaning of "worth it" is used here? If you put it to a vote as an option, I expect it would lose. People don't care that much about the happiness of distant future people.

Crisis of Faith
Canaletto 12d 10

And above all, the rule:

>Put forth the same level of desperate effort that it would take for a theist to reject their religion.

Because if you aren’t trying that hard, then—for all you know—your head could be stuffed full of nonsense as bad as religion.

 

I don't think it was particularly hard for me to part ways with religion? 15-year-old me just accumulated too strong a sense that it's total bullshit. It was important enough to be promoted to my direct attention, but wrong enough for me to recognize it as such.

Hmmm. Maybe I was just not that invested in the boons that a religious worldview gives you: that there is somebody who is looking out for you, that everything goes according to a good plan after all. I was not emotionally attached to this for some reason.

Am I just emotionally invested in different kinds of stuff or am I just good at discarding wrong beliefs? Or maybe there is something wrong with the "emotional attachment" part of me.

Lies Told To Children
Canaletto 13d 10

Hmm. Yeah, it sure looks rigged as hell to be resolved by self consistency/reflection to the side of "care about everyone", but surely there is some percentage of kids who come out of this with reflectively stable redhead hatred? Or, I don't know, "nobody deserves care, not just reds, but you should pretend to care in polite society"?

Drawing Less Wrong: Technical Skill
Canaletto 17d -10

I'm not sure what the point of learning to draw like that is. You could just as well close one eye and imagine that you are tracing a photograph.

Draw whatever, I'd rather see people reinvent techniques than learn them. 

Plans A, B, C, and D for misalignment risk
Canaletto 17d 10

How about more uhh soft uncontrollability? Like, not "it subverted our whole compute and feeds us lies" but more "we train it to do A, which it sees as only telling it to do A, and does A, but its motivations are completely untouched".

Canaletto's Shortform
Canaletto 18d 11

Morality as a Coordination Subsidy and Morality as a Public Good.

A night-watchman state, distributed and embedded into heads, vs. doing something a lot of people want done, regardless of whether it's cleaner streets or children having homes.

The first thing seems to have done a lot of flaking, transferring into the second one. Or maybe it didn't; maybe it was a process that shaped desires compatible with #1 out of assorted #2-type things.

Towards a Typology of Strange LLM Chains-of-Thought
Canaletto 19d 30

Anthropic, GDM, and xAI say nothing about whether they train against Chain-of-Thought (CoT) while OpenAI claims they don't

https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=z7sxf8vGEu7E2Y5uW 

Accelerando as a "Slow, Reasonably Nice Takeoff" Story
Canaletto 24d 10

It sounds more like there is some kind of moderator who throttles smart things in an intelligent, targeted way. Which is my headcanon.

A non-review of "If Anyone Builds It, Everyone Dies"
Canaletto 1mo 30

Overall I agree with this framing, but I think that even in Before, sufficiently bad mistakes can kill you, and in After, sufficiently small mistakes wouldn't. So it's mostly a claim about how strongly mistakes would start to be amplified at some point.

3 Self propagating story.
7mo
0
10 Favorite colors of some LLMs.
10mo
3
11 Self location for LLMs by LLMs: Self-Assessment Checklist.
1y
0
-4 Examine self modification as an intuition provider for the concept of consciousness
1y
2
1 Canaletto's Shortform
1y
33
15 LLMs could be as conscious as human emulations, potentially
2y
15