I acknowledge that I probably have an unusual experience among people working on xrisk things at openai. From what I've heard from other people I trust, there probably have been a bunch of cases where someone was genuinely blocked from publishing something about xrisk, and I just happen to have gotten lucky so far.
i wouldn't comment this confidently if i didn't
openai explicitly encourages safety work that is also useful for capabilities. people at oai think of it as a positive attribute when safety work also helps with capabilities, and are generally confused when i express the view that not advancing capabilities is a desirable property of safety work.
i think we as a community have a definition of the word safety that diverges more from the layperson definition than the openai definition does. i think our definition is more useful to focus on for making the future go well, but i wouldn't say it's the most widely accepted one.
i think openai deeply believes that doing things in the real world is more important than publishing academic things. so people get rewarded more for putting interventions into the world than for putting papers in the hands of academics.
i have never experienced pushback when publishing research that draws attention to xrisk. it's more that people are not incentivized to work on xrisk research in the first place. also, for mundane safety work, my guess is that modern openai just values shipping things into prod a lot more than writing papers.
most of the x-risk relevant research done at openai is published? the stuff that's not published is usually more on the practical risks side. there just isn't that much xrisk stuff, period.
well, in academia, if you do quality work anyways and ignore incentives, you'll get a lot less funding to do that quality work, and possibly perish.
unfortunately, academia is not a sufficiently well designed system to extract useful work out of grifters.
i think of the idealized platonic researcher as the person who has chosen ultimate (intellectual) freedom over all else. someone who really cares about some particular thing that nobody else does - maybe because they see the future before anyone else does, or maybe because they just really like understanding everything about ants or abstract mathematical objects or something. in exchange for the ultimate intellectual freedom, they give up vast amounts of money, status, power, etc.
one thing that makes me sad is that modern academia is, as far as I can tell, not this. when you opt out of the game of the Economy, giving up real money, status, and power, what you get from Academia in exchange is another game of money, status, and power, with different rules, much lower stakes, and everyone being more petty about everything.
at the end of the day, what's even the point of all this? to me, it feels like sacrificing everything for nothing if you eschew money, status, and power, and then just write a terrible irreplicable p-hacked paper that reduces the net amount of human knowledge by adding noise and advances your career so you can write more terrible useless papers. at that point, why not just leave academia, go to industry, do something equally useless for human knowledge, and get paid stacks of cash for it?
ofc there are people in academia who do good work, but it often feels like the incentives force most work to be this kind of horrible slop.
in my worldview this is very easily explained. if you do Jungian therapy, your self model starts incorporating Jungian concepts for explaining your own brain. You didn't change the way your brain works fundamentally, you just changed your own model of your brain. In the same way, if you read a book on the biology of plants you'll start viewing them through the lens of cells, and if you read a book on the ancient spirits associated with each plant you'll start thinking of plants as being animated by the ghosts of our ancestors.
The big mistake happens when people think of their self model as actual, genuine introspection. Then, you might think that you've changed the shape of your mind instead of only changing your understanding of your mind.
Instead, I think the right way to figure out if your self model is correct is to make predictions about your future behavior and see if they come true; act based on your self model and see if you become more successful at life, or whether you mysteriously repeatedly fail in some way.
i haven't looked into this deeply, but how strong is the evidence for (lack of) oxidative damage? the SSC post is somewhat unsatisfying in that it doesn't really consider outcomes other than literal Parkinson's, and just kind of says the animal model results are confusing.
it's also worth noting that I am far out in the tail of the distribution of people willing to ignore incentive gradients if I believe it's correct not to follow them. (I've gotten somewhat more pragmatic about this over time, because sometimes not following the gradient is just dumb, and as a human being it's impossible not to care a little bit about status and money and such. but I still have a very strong tendency to ignore local incentives if I believe something is right in the long run.) like I'm aware I'll get promoed less and be viewed as less cool and not get as much respect and so on if I do the alignment work I think is genuinely important in the long run.
I'd guess for most people, the disincentives for working on xrisk alignment make openai a vastly less pleasant place. so whenever I say I don't feel like I'm pressured not to do what I'm doing, this does not necessarily mean the average person at openai would agree if they tried to work on my stuff.