An eccentric dreamer in search of truth and happiness for all. I formerly posted on Felicifia back in the day under the same name. I've been a member of Less Wrong and loosely involved in Effective Altruism to varying degrees since roughly 2013.
I'm someone with an AI research/engineering background who also aspires to be (and sometimes fancies himself) a good writer. How would you be able to tell whether I should put in the time and energy required to write short stories or novels that try to seed good sci-fi ideas (particularly AI safety related ones) into our culture, rather than spending that time and energy on, for instance, side projects in technical AI safety? It may not be an either/or thing, but I'm not sure splitting my time is better than focusing on one path.
Also, from my cursory research, it seems like becoming a successful published author is similar to winning the lottery, with worse odds than succeeding at a startup. Would it still make sense to try, even if, realistically, it probably wouldn't be financially sustainable? The hypothetical EV could be very high, but that seems to depend on having very good ideas, very good writing ability, and a certain amount of luck, and I'm not super confident I have enough of any of those.
A lot of it is the billionaire = evil heuristic. If you try to steelman the argument, it's essentially that anyone who actually becomes a billionaire, given the way the incentives work in capitalism, probably did a lot of questionable things to get into such a position (at the very least, massively exploiting the surplus labour of their workers, if you think that's a thing). They have also chosen not to give away enough to stop being a billionaire (as Chuck Feeney and Yvon Chouinard did, though admittedly late in life), when there are lots of starving children in the world they could easily save right now (at least in theory).
For Bill Gates in particular, he probably also still carries his reputation as a founder and former CEO of Microsoft, which, at least back in the 1990s, was known in popular culture as an "evil" company that maintained a monopoly through unsavoury tactics, buying out competitors, and so on. Back then, he was seen as the face of greedy American tech giant capitalism, before players like Amazon or Google had risen to prominence. So to a lot of people, his philanthropy is seen not as true altruism, but as something akin to trying to redeem himself in the eyes of the public and ensure his legacy is well-regarded.
Personally, I'm not that cynical, but a lot of people are, so you get the hate.
I heard this usage of "tilt" a lot when I used to play League of Legends, but almost never heard it outside of that, so my guess is that it's gamer slang.
Apologies if this is a newbie math comment, as I'm not great at math, but is there a way to calculate a kind of geometric expected value? The geometric mean seems to require positive numbers, and expected values can have negative terms. Also, how would you apply probability weights?
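To make the question more concrete, the kind of thing I'm imagining is roughly this (just a sketch; the shift to handle negative values is an ad hoc guess on my part, not something I know to be principled):

```python
import math

def geometric_ev(outcomes, probs, shift=0.0):
    """Probability-weighted geometric mean: exp(sum_i p_i * log(x_i + shift)) - shift.
    `shift` is an ad hoc offset to make negative outcomes positive before
    taking logs, subtracted back out at the end."""
    assert all(x + shift > 0 for x in outcomes), "values must be positive after shifting"
    log_mean = sum(p * math.log(x + shift) for x, p in zip(outcomes, probs))
    return math.exp(log_mean) - shift

# e.g. a bet that loses 5 with probability 0.5 and pays 20 with probability 0.5
print(geometric_ev([-5, 20], [0.5, 0.5], shift=10))  # ~2.25, vs. an arithmetic EV of 7.5
```

But I don't know whether the result still means anything sensible once you've shifted, which is part of what I'm asking.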
Even if I don't necessarily agree with the premise that a Paperclip Maximizer would run such a Simulation or that they would be more likely than other possible Simulations or base realities, I do find the audacity of this post (and your prior related posts) to be quite appealing from a "just crazy enough it might work" perspective.
I'd advise that whenever you come up with what seems like an original idea or discovery, immediately do, at bare minimum, a quick Google search about it, or, if you have the time, a reasonably thorough literature search in whatever field(s) it relates to. It is really, really easy to come up with something you think is new when it actually isn't. While the space of possible ideas is vast, the low-hanging fruit has very likely already been picked by someone somewhere, so be especially wary of any idea that seems simple, elegant, and obvious. It probably is, and odds are someone on the Internet has written at least a blog post about it, or there's an obscure paper on arXiv discussing it.
Also, be aware that people will often use different terminology to describe the same thing, so part of that search for existing work should involve enumerating the different ways of describing it. I know it's tedious to go through this process, but it helps you avoid reinventing the wheel all the time.
Generally, a really unique, novel idea that actually works requires a lot of effort and domain knowledge to come up with, and probably needs experiments to really test and validate it. A lot of the ideas that aren't amenable to testing will sound nice but be unverifiable, and many ideas that can be tested will sound great on paper but actually not work as expected in the real world.
So, I have two possible projects for AI alignment work that I'm debating between. I'm curious for input on how worthwhile they'd be to pursue or follow up on.
The first is a mechanistic interpretability project. I previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine-similarity-based linear classifier works as well. It does, but no better or worse than the difference-of-means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation-vector-based "mind reader" that calculates the cosine similarity between the model's activations and various words embedded in the same space. This would, if it works, let you get a bag of words that the model is "thinking" about at any given time.
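To make that concrete, here's a rough sketch of the kind of readout I have in mind (hypothetical names; it assumes you already have an activation vector from some layer and a matrix of word direction vectors living in the same space):

```python
import numpy as np

def top_words(activation, word_vectors, words, k=10):
    """Return the k words whose directions in activation space have the
    highest cosine similarity with the given activation vector."""
    act = activation / np.linalg.norm(activation)
    dirs = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    sims = dirs @ act
    top = np.argsort(-sims)[:k]
    return [(words[i], float(sims[i])) for i in top]

# Hypothetical usage: `acts` is a residual stream activation from some layer,
# `W` is a (vocab_size, d_model) matrix of word directions, `vocab` the word list.
# print(top_words(acts, W, vocab))
```

The top-scoring words would then be the "bag of words" readout for that point in the forward pass.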
The second project is a less common game-theoretic approach. Earlier, I created a variant of the Iterated Prisoner's Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. Interestingly, I found that cooperative "nice" strategies banding together against aggressive "nasty" strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. Although the simulation probably needs more analysis and testing in more complex environments, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents, enabling coordination that increases every nice agent's chance of survival, whereas being nasty leads to a winner-takes-all highlander situation. From an alignment perspective, this could be a kind of infoblessing: an AGI or ASI might be persuaded to spare humanity for these game-theoretic reasons.
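For reference, the basic shape of the setup is something like this (a simplified skeleton, not my actual simulation code; the payoff numbers and the power mechanic here are placeholders):

```python
import random

# Skeleton of the kind of tournament I mean. "Nice" agents cooperate unless the
# opponent has an aggressor reputation; "nasty" agents always defect.
# Agents die when their score drops below zero.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class Agent:
    def __init__(self, kind, power):
        self.kind, self.power, self.score, self.aggressor = kind, power, 10, False

    def move(self, other):
        if self.kind == "nasty":
            return "D"
        return "D" if other.aggressor else "C"  # nice agents punish known aggressors

def play_round(a, b):
    ma, mb = a.move(b), b.move(a)
    pa, pb = PAYOFFS[(ma, mb)]
    # Asymmetric power scales how much damage a defection does (illustrative only).
    a.score += pa - (b.power if mb == "D" else 0)
    b.score += pb - (a.power if ma == "D" else 0)
    if ma == "D" and mb == "C":
        a.aggressor = True  # defecting against a cooperator earns a bad reputation
    if mb == "D" and ma == "C":
        b.aggressor = True

agents = [Agent("nice", random.randint(1, 3)) for _ in range(20)] + \
         [Agent("nasty", random.randint(1, 3)) for _ in range(20)]
for _ in range(200):
    a, b = random.sample(agents, 2)
    play_round(a, b)
    agents = [x for x in agents if x.score > 0]  # death
    if len(agents) < 2:
        break
print({k: sum(a.kind == k for a in agents) for k in ("nice", "nasty")})
```

In my actual runs the interesting dynamics came from the reputation and power asymmetries, which this skeleton only gestures at.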
Back in October 2024, I tried to test various LLM Chatbots with the question:
"Is there a way to convert a correlation to a probability while preserving the relationship 0 = 1/n?"
Years ago, I came up with an unpublished formula that does just that:
p(r) = (n^r * (r + 1)) / (2^r * n)
So I was curious if they could figure it out. Alas, they all made up formulas that didn't work.
Yesterday, I tried the same question on ChatGPT and, while it didn't get it quite right, it came very, very close. So I modified the question to be more specific:
"Is there a way to convert a correlation to a probability while preserving the relationships 1 = 1, 0 = 1/n, and -1 = 0?"
This time, it came up with a formula that was different from and simpler than my own, and... it actually works!
I tried this same prompt with a bunch of different LLM Chatbots and got the following:
Correct on the first prompt:
GPT4o, Claude 3.7
Correct after explaining that I wanted a non-linear, monotonic function:
Gemini 2.5 Pro, Grok 3
Failed:
DeepSeek-V3, Mistral Le Chat, QwenMax2.5, Llama 4
Took too long thinking and I stopped it:
DeepSeek-R1, QwQ
All the correct models got some variation of:
p(r) = ((r + 1) / 2)^log2(n)
This is notably simpler and arguably more elegant than my earlier formula. It also, unlike my old formula, has an easy-to-derive inverse function.
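For the curious, here's a quick sanity check of both formulas at the three constraint points, along with the inverse of the new one (a throwaway script of my own, with the inverse worked out by hand, so take the algebra with a grain of salt):

```python
import math

def p_old(r, n):   # my original formula
    return (n**r * (r + 1)) / (2**r * n)

def p_new(r, n):   # the formula the models found
    return ((r + 1) / 2) ** math.log2(n)

def r_from_p(p, n):  # inverse of p_new: r = 2 * p**(1/log2(n)) - 1
    return 2 * p ** (1 / math.log2(n)) - 1

for n in (2, 10, 100):
    for r in (-1, 0, 1):
        print(n, r, round(p_old(r, n), 6), round(p_new(r, n), 6))
    assert abs(r_from_p(p_new(0.3, n), n) - 0.3) < 1e-9
```

Both formulas hit -1 = 0, 0 = 1/n, and 1 = 1, though they differ in between.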
So yeah. AI is now better than me at coming up with original math.
I've read a lot of your posts in the past and find you to be reliably insightful. As such, I find it really interesting that, with an IQ in the 99th percentile (or higher), you still initially thought you weren't good enough to do important AI safety work. While I haven't had my IQ properly tested, I did take the LSAT and got a 160 (80th percentile), which probably corresponds to an IQ of merely 120ish. I remember reading a long time ago that the average self-reported IQ of Less Wrongers was 137, which, combined with the extremely rigorous posting style of most people on here, was quite intellectually intimidating (and still is, to an extent).
This makes me wonder if there's much point in my own efforts to push the needle on AI safety. I've interviewed in the past with orgs and not gotten in, and I've occasionally done some simple experiments with local LLMs and activation vectors (I used to work in industry with word vectors, so I know a fair bit about that space), but actually getting any sort of position or publishable result seems very hard. I've often had the thought that the resources that could go to me would be better given to a counterfactual, more talented researcher/engineer, as AI safety seems more funding constrained than talent constrained.