Lorxus

Mathematician, agent foundations researcher, doctor. A strange primordial spirit left over from the early dreamtime, the conditions for the creation of which no longer exist; a creature who was once told to eat math and grow vast and who took that to heart; an escaped feral academic.

Reach out to me on Discord and tell me you found my profile on LW if you've got something interesting to say; you have my explicit permission to try to guess my Discord handle if so. You can't find my old abandoned-for-being-mildly-infohazardously-named LW account but it's from 2011 and has 280 karma.

A Lorxus Favor is worth (approximately) one labor-day's worth of above-replacement-value specialty labor, given and received in good faith, and used for a goal approximately orthogonal to one's desires, and I like LessWrong because people here will understand me if I say as much.

Apart from that, and the fact that I am under no NDAs, including NDAs whose existence I would have to keep secret or lie about, you'll have to find the rest out yourself.

Comments

Seven-ish Words from My Thought-Language
Lorxus · 3d · 4 · 1

Spoons are for doing ordinary things, doing things at all. [Blood] is for doing difficult things, doing them more or harder, doing them at significant internal cost.

Selection Has A Quality Ceiling
Lorxus · 4d · 5 · 0

Several years on from this post and @Raemon's comment, where do we stand? From my own starkly limited ground-level perspective, it looks like pretty much everyone who's tried to figure out the secrets of training has smashed into the same flavor of mixed success, shattered, and given up on it as a bad job. This seems sad and desperately misguided, but what do I know? I haven't exactly tried of my own accord, and I showed up too late to the party for the height of CFAR and the like. I know a few people are still trying to poke at questions like this one from various angles, but everyone I've talked to about the topic (5 or so of them?) seems extremely burned out on the whole enterprise, so maybe it's all for the best that I never went.

What would it look like, to actually try again in these latter days? If the model in this post is correct - and I think that it is, that training is vastly more valuable than selection - then there are gains to be had. It'd start - I think - by figuring out what went wrong previously, and by noticing and carefully attending to all the skulls without drowning in despair about them. The next step might be some careful trials - maybe randomized ones, maybe they need to be thoughtfully tailored to the student - and tests after the fact at a few time intervals. Something about the zone of proximal development is tickling the back of my mind here; so is something about crystallization and application of the skill, especially in a way that makes it psychologically register that you Have The Skill, which probably calls for some ordeal? Hard to say. I only have scattered thoughts about this. If anyone wants to try to elicit more of them from me, I welcome the attempt.

Lorxus's Shortform
Lorxus · 14d · 1 · 0

I didn't realize it when I posted this, but the anvil problem points more sharply at what I want to argue about when I say that making the NAS blind to its own existence will make it give wrong answers; I don't think that the wrong answers would be limited to just such narrow questions, either.

Directly Try Solving Alignment for 5 weeks
Lorxus · 14d · 2 · 0

How did this turn out?

Intellectual lifehacks repo
Lorxus · 15d · 0 · 0

Flip a coin if you are struggling to decide between options in a situation where the stakes are relatively low. This immediately exposes your gut instinct, which is more than good enough most of the time, and it is far faster than logically working out an answer.

Better yet, if you subscribe to many-worlds and you do actually care about trying both options, use a quantum coin. Don't take one option - take both of them.

Jan_Kulveit's Shortform
Lorxus · 15d · 1 · 0

...benign scenarios in which AIs get legal rights and get hired to run our society fair and square. A peaceful AI takeover would be good, IMO.

...humans willingly transfer power to AIs through legal and economic processes. I think this second type will likely be morally good, or at least morally neutral.

Why do you believe this? For my part, one of the major ruinous scenarios on my mind is one where humans delegate control to AIs that then goal-misgeneralize, breaking complex systems in the process; another is one where AIs outcompete ~all human economic efforts "fair and square" and end up owning everything, including (e.g.) rights to all water, partially because no one felt strongly enough about ensuring an adequate minimum baseline existence for humans. What makes those possibilities so unlikely to you?

We Choose To Align AI
Lorxus · 18d · 6 · 1

The more EAs I meet, the more I realize that wanting the challenge is a load-bearing pillar of sanity when working on alignment.

When people first seriously think about alignment, a majority freak out. Existential threats are terrifying. And when people first seriously look at their own capabilities, or the capabilities of the world, to deal with the problem, a majority despair. This is not one of those things where someone says “terrible things will happen, but we have a solution ready to go, all we need is your help!”. Terrible things will happen, we don’t have a solution ready to go, and even figuring out how to help is a nontrivial problem. When people really come to grips with that, tears are a common response.

… but for someone who wants the challenge, the emotional response is different. The problem is terrifying? Our current capabilities seem woefully inadequate? Good; this problem is worthy. The part of me which looks at a rickety ladder 30 feet down into a dark tunnel and says “let’s go!” wants this. The part of me which looks at a cliff face with no clear path up and cracks its knuckles wants this. The part of me which looks at a problem with no clear solution and smiles wants this. The response isn’t tears, it’s “let’s fucking do this”.

"Problems worthy of attack prove their worth by fighting back."

Which is to say - despite a lot of other tragedies about me, there is a core part of me, dinged-up and bruised but still fighting, that looks at a beautiful core mystery and says - "No, unacceptable - we must know. We will know. I am hungry, and will chase this truth down, and it will not evade my jaws for long." (Sometimes it even gets what it wants.)

plex's Shortform
Lorxus · 20d · 1 · 0

I'm confused - I don't see any? I certainly have some details of arguable value though.

Hospitalization: A Review
Lorxus · 20d · 15 · 3

Holy heck! I'm glad you're alright. I would never have thought to make a LW post out of an experience like that. Winning personality, indeed.

ISO: Name of Problem
Lorxus · 20d · 3 · 0

I'd propose "(the problem of) abstraction layer underspecification" or maybe just "engineering dyslocation" (in the sense that engineering is about layered abstractions, and it would be a category error to try to do pure materials science to make a spaceship).

Posts

7 · A Sketch of Helpfulness Theory With Equivocal Principals · 2d · 1
58 · Seven-ish Words from My Thought-Language · 4d · 13
15 · How I Wrought a Lesser Scribing Artifact (You Can, Too!) · 1y · 0
3 · Lorxus's Shortform · 1y · 19
16 · (Geometrically) Maximal Lottery-Lotteries Are Probably Not Unique · 1y · 1
13 · (Geometrically) Maximal Lottery-Lotteries Exist · 1y · 11
11 · My submission to the ALTER Prize · 2y · 0
29 · Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math] · 2y · 22