To repurpose a quote from The Cincinnati Enquirer: The saying "AI X-risk is just one damn cruelty after another," is a gross overstatement. The damn cruelties overlap.

When I saw the title, I thought, "Oh no. Of course there would be a tradeoff between those two things, if for no other reason than precisely because I hadn't even thought about it and I would have hoped there wasn't one." Then as soon as I saw the question in the first header, the rest became obvious.

Thank you so much for writing this post. I'm glad I found it, even if months later. This tradeoff has a lot of implications for policy and outreach/messaging, as well as how I sort and internalize news in those domains.

Without having thought about it enough for an example: It sounds correct to me that in some contexts, appreciating both kinds of risk drives response in the same direction (toward more safety overall). But I have to agree now that in at least some important contexts, they drive in opposite directions.

I don't have any ontological qualms with the idea of gene editing / opt-in eugenics, but I have a lot of doubt about our ability to use that technology effectively and wisely.

I am moderately in favor of gene treatments that could prevent potential offspring / zygotes / fetuses / people in general from being susceptible to specific diseases or debilitating conditions. If we gain a robust understanding of the long-term effects and there are no red flags, I expect to update to strongly in favor (though it could take a lifetime to get the necessary data if we aren't able to have extremely high confidence in the theory).

In contrast, I think non-medical eugenics is likely to be a net negative, for many of the same reasons already outlined by others.

I am a smaller donor (<$10k/yr) who has given to the LTFF in the past. As a data point, I would be very interested in giving to a dedicated AI Safety fund.

The thing that made AI risk "real" for me was a report of an event that turned out not to have happened (seemingly just a miscommunication). My brain was already very concerned, but my gut had not caught up until then. That said, I do not think this should be taken as a norm, for three reasons:

  1. Creating hoaxes in support of a cause is a good way to turn a lot of people against a cause
  2. In general, if you feel a need to fake evidence for your position, that is itself weak evidence against your position
  3. I don't like dishonesty

If AI capabilities continue to progress and if AI x-risk is a real problem (which I think it is, credence ~95%), then I hope we get a warning shot. But I think a false flag "warning shot" has negative utility.

Hello! I'm not really sure which facts about me are useful in this introduction, but I'll give it a go:
I am a Software QA Specialist / SDET, I used to write songs as a hobby, and my partner thinks I look good in cyan.

I have found myself drawn to LessWrong for at least three reasons:

  1. I am very concerned about existential and extinction risk from advanced AI
  2. I enjoy reading about interesting topics and broadening and filling out my world model
  3. I would very much like to be a more rational person

Lots of words about thing 1: In the past few months, I have deliberately changed how I spend my productive free time, which I now mostly occupy by trying to understand and communicate about AI x-risk, as well as helping with related projects.
I have only a rudimentary / layman's understanding of Machine Learning, and I have failed pretty decisively in the past when attempting mathematical research, so I don't see myself ever being in an alignment research role. I'm focused on helping in small ways with things like outreach, helping build part of the alignment ecosystem, and directing a percentage of my income to related causes.
(If I start writing music again, it will probably either be because I think alignment succeeded or because I think that we are already doomed. Either way, I hope I make time for dancing. ...Yeah. There should be more dancing.)

Some words about thing 2: I am just so glad to have found a space on the internet that holds its users to a high standard of discourse. Reading LessWrong posts and comments tends to feel like being served a wholesome meal prepared by a professional chef. It's a welcome break from the home cooking of my friends, my family, and myself, and especially from the fast food (or miscellaneous hard drugs) of many other platforms.

Frankly just a whole sack of words about thing 3: For my whole life until a few short years ago, I was a conservative evangelical Christian, a creationist, a wholesale climate science denier, and generally a moderately conspiratorial thinker. I was sincere in my beliefs and held truth as the highest virtue. I really wanted to get everything right (including understanding and leaving space for the fact that I couldn't get everything right). I really thought that I was a rational person and that I was generally correct about the nature of reality.
Some of my beliefs were updated in college, but my religious convictions didn't begin to unravel until a couple years after I graduated. It wasn't pretty. The gradual process of discovering how wrong I was about an increasingly long list of things that were important to me was roughly as pleasant as I imagine a slow death to be. Eventually coming out to my friends and family as an atheist wasn't a good time, either. (In any case, here I still am, now a strangely fortunate person, all things considered.)
The point is, I have often been caught applying my same old irrational thought patterns to other things, so I have been working to reduce the frequency of those mistakes. If AI risk didn't loom large in my mind, I would still greatly appreciate this site and its contributors for the service they are doing for my reasoning. I'm undoubtedly still wrong about many important things, and I'm hoping that over time and with effort, I can manage to become slightly less wrong. (*roll credits)

I like your observation. I didn't realize at first that I had seen it before, from you during the critique-a-thon! (Thank you for helping out with that, by the way!)

A percentage or ratio of the "amount" of alignment left to the AI sounds useful as a fuzzy heuristic in some situations, but I think it is probably a little too fuzzy to get at the failure mode(s) of a given alignment strategy. My suspicion is that which parts of alignment are left to the AI will have much more to say about the success of alignment than how many of those checkboxes are checked. Where I think this proposed heuristic succeeds is when the ratio of human/AI responsibility in solving alignment is set very low. By my lights, that is an indication that the plan is more holes than cheese.

(How much work is left to a separate helper AI might be its own category. I have some moderate opinions on OpenAI's Superalignment effort, but those are very tangential thoughts.)

Thank you for sharing this! I am fascinated by others' internal experiences, especially when they are well-articulated.

Some of this personally resonates with me, as well. I find it very tempting to implement simple theories and pursue simple goals. Simplicity can be elegant and give the appearance of insight, but it can also be reductionist and result in overfitting to what is ultimately just a poor model of reality. Internally self-modifying to overfit a very naive self-model is an especially bad trip, and one I have taken multiple times (usually in relatively small ways, usually brought on by moments of hypomania).

It took me a long time to build epistemic humility about myself and to foster productive self-curiosity. Now I tend to use description more than prescription to align myself to my goals. I rule myself with a light hand.

Here is a rough sketch of how I think that works in my own mind:

Somewhere in my psychology is a self-improvement mechanism that I can conceptualize as a function. It takes my values and facts about myself and the world as inputs and returns my actions as outputs. (I'm not completely sure how it got there, but as long as it exists, even if just a seedling, I expect it to grow over time due to its broad instrumental utility.) I don't understand this function very well, so I can't reliably dictate to myself exactly how to improve. I also don't fully understand my values, so I can't list them cleanly and force-feed them into the function. However, this self-improvement mechanism is embedded in the rest of my psychology, so it automatically has weak access to my values and other facts. Just by giving it a little conscious attention and more accurate information about myself and the world, the mechanism tends to do useful things, without a lot of forceful steering.

If someone did want you to delete the tweet, they might first need to understand the original intent behind creating it and the roles it now serves.


I'm not sure about the laugh react, since it can be easily abused in cases of strong disagreement.

More generally: low-quality replies can be downvoted, but as I understand it, low-quality reactions are given equal weight and visibility. Limiting the available vectors of toxicity may be more generally desirable than increasing the available vectors of light-heartedness.