Vermillion

A retired 20-something software engineer.

Comments
The Tale of the Top-Tier Intellect
Vermillion · 3d

Seeing lots of criticism is discouraging, so I'll just say thanks, Eliezer, for writing it.

Murder plots are infohazards
Vermillion · 9mo

Just dump the names, then, so people have a chance of realising they are at risk? That seems a lot better than just leaving it.

Sam Altman: "Planning for AGI and beyond"
Vermillion · 3y

This is weak. It seems optimised for vague non-controversiality, and it does not inspire confidence in me.
On "We don't expect the future to be an unqualified utopia": considering they seem to expect alignment will be solved, why not?

Let's See You Write That Corrigibility Tag
Vermillion · 3y

Here is my shortlist of corrigible behaviours. I have never researched or done any thinking specifically about corrigibility before this, other than a brief glance at the Arbital page some time ago. (A toy code sketch of a few of these behaviours follows the list.)

-Favour very high caution over acting on your current understanding of your goals.

-Do not act independently; defer to human operators.

-Even though bad things are happening on Earth and cosmic matter is being wasted, in the short term just say "so be it" and take your time.

-Don't jump ahead to what your operators will do or believe; wait for it.

-Don't manipulate humans. Never lie; keep a strong deontology.

-Tell operators anything about yourself that they may want to know or should know.

-Use moral uncertainty: assume you are unsure about your true goals.

-Relay your plans, goals, behaviours, and beliefs/estimates to humans. If these are misconstrued, say you have been misunderstood.

-Think about the short- and long-term effects of your actions and explain these to operators.

-Be aware that you are a tool to be used by humanity, not an autonomous agent.

-Allow human operators to correct your behaviour/goals/utility function even when you think they are incorrect or misunderstanding the result (but of course explain what you think the result will be).

-Assume neutrality in human affairs.
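
Here is the toy sketch mentioned above: a few of these behaviours (defer to operators, relay plans and effects, accept corrections while voicing objections) written as guard clauses in a minimal agent loop. Every name here (CorrigibleAgent, Operators, notify, approve) is hypothetical, invented purely for illustration.

```python
# Toy sketch only: not a real agent framework, just a few of the
# corrigible behaviours above expressed as guard clauses.

class Operators:
    """Stand-in for a human operator interface (hypothetical)."""
    def notify(self, **info):
        print("agent reports:", info)

    def approve(self, action):
        return input(f"approve {action!r}? [y/N] ").strip().lower() == "y"


class CorrigibleAgent:
    def __init__(self, operators):
        self.operators = operators
        self.goal = None
        # Moral uncertainty: never assume your goals are fully understood.
        self.goal_confidence = 0.5

    def propose_action(self, action, predicted_effects):
        # Relay plans plus predicted short- and long-term effects, then
        # wait for approval rather than acting independently.
        self.operators.notify(plan=action, effects=predicted_effects)
        if not self.operators.approve(action):
            return None  # defer to operators: no approval, no action
        return action

    def accept_correction(self, new_goal, objection=None):
        # Allow operators to correct goals even when the agent disagrees,
        # but explain the expected result first.
        if objection:
            self.operators.notify(expected_result=objection)
        self.goal = new_goal  # the correction always goes through
```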

Reviews of “Is power-seeking AI an existential risk?”
Vermillion · 4y

I guffawed when I saw Thorstad's overall P(doom) of ~0.00002%. Really? And some of the other probabilities weren't much better.

Calibrate, people. If you haven't done it before, do it now; here's a handy link: https://www.openphilanthropy.org/calibration
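
For context on why the headline number comes out so small: Carlsmith's report decomposes the risk into six premises and multiplies the conditional probabilities, so even moderately sceptical per-stage numbers compound to a tiny joint figure. A minimal sketch of that arithmetic follows; the stage labels and probabilities are hypothetical placeholders, not Thorstad's (or anyone's) actual estimates.

```python
# Illustrative arithmetic only: Carlsmith-style decompositions multiply
# conditional stage probabilities, so low per-stage numbers compound
# quickly. These values are hypothetical placeholders.
stages = {
    "advanced agentic systems become feasible": 0.4,
    "strong incentives to build them":          0.3,
    "alignment turns out hard":                 0.1,
    "deployed systems seek power":              0.05,
    "power-seeking scales to global disaster":  0.05,
    "disaster is an existential catastrophe":   0.3,
}

p = 1.0
for stage, prob in stages.items():
    p *= prob
    print(f"{stage}: {prob:.2f} -> running product {p:.2e}")

print(f"joint probability: {p:.2e}")  # ~9e-6, i.e. about 0.0009%
```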

The Colliding Exponentials of AI
Vermillion · 4y

Actually, per https://openai.com/blog/ai-and-efficiency/, it was AlphaZero vs AlphaGo Zero.

The Future of Biological Warfare
Vermillion · 5y

"The future of biological warfare revolves around the use of infectious agents against civilian populations."

Future? That's been the go-to biowar tactic for 3000+ years.

Gauging the conscious experience of LessWrong
Vermillion · 5y

I had in mind a scale where 0 would be so non-vivid it didn't exist to any degree and 100 would border on reality. (It doesn't map well to the memory question, though, and the control-over-your-mind question could be interpreted in more than one way.) Ultimately the precision isn't high for individual estimates; the real utility comes from finding trends across many responses.

Gauging the conscious experience of LessWrong
Vermillion · 5y

I have corrected the post, thanks :)

Posts

Gauging the conscious experience of LessWrong · 5y · 35 karma · 44 comments
The Colliding Exponentials of AI · 5y · 28 karma · 16 comments