Darklight — LessWrong

I heard this usage of "tilt" a lot when I used to play League of Legends, but almost never heard it outside of that, so my guess is that it's gamer slang.

the jackpot age

Darklight3mo60

Apologies if this is a newbie math comment, as I'm not great at math, but is there a way to calculate a kind of geometric expected value? The geometric mean seems to require positive numbers, and expected values can have negative terms. Also, how would you apply probability weights?

Dear Paperclip Maximizer, Please Don’t Turn Off the Simulation

Darklight3mo60

Even if I don't necessarily agree with the premise that a Paperclip Maximizer would run such a Simulation or that they would be more likely than other possible Simulations or base realities, I do find the audacity of this post (and your prior related posts) to be quite appealing from a "just crazy enough it might work" perspective.

lesswronguser123's Shortform

Darklight4mo20

I'd advise that whenever you come up with what seems like an original idea or discovery, you immediately do, at bare minimum, a quick Google search about it, or, if you have the time, a reasonably thorough literature search in whatever field(s) it's related to. It is really, really easy to come up with something you think is new when it actually isn't. While the space of possible ideas is vast, the low-hanging fruit is very likely to have already been picked by someone somewhere, so be especially wary of a seemingly simple idea that seems super elegant and obvious. It probably is, and odds are someone on the Internet has made at least a blog post about it or there's an obscure paper on arXiv discussing it.

Also, be aware that often people will use different terminology to describe the same thing, so part of that search for existing work should involve enumerating different ways of describing it. I know it's tedious to go through this process, but it helps to not be reinventing the wheel all the time.

Generally, a really unique, novel idea that actually works requires a lot of effort and domain knowledge to come up with, and probably needs experiments to really test and validate it. A lot of the ideas that aren't amenable to testing will sound nice but be unverifiable, and many ideas that can be tested will sound great on paper but actually not work as expected in the real world.

Darklight's Shortform

Darklight5mo10

So, I have two possible projects for AI alignment work that I'm debating between focusing on. I'm curious for input on how worthwhile they'd be to pursue or follow up on.

The first is a mechanistic interpretability project. I have previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine similarity based linear classifier works as well. It does, but not any better or worse than the difference of means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation vector based "mind reader" that calculates the cosine similarity with various words embedded in the model's activation space. This would, if it works, allow you to get a bag of words that the model is "thinking" about at any given time.
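To make the idea concrete, here is a minimal sketch of a cosine-similarity classifier and the "bag of words" readout, in plain Python on toy vectors. It is not the actual project code: the function names (`cosine`, `class_means`, `classify`, `bag_of_words`), the use of per-class mean vectors as reference directions, and the toy data are all assumptions made for illustration. Real activations would come from a model's hidden states.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def class_means(activations, labels):
    """Average the activation vectors of each class (assumed reference directions)."""
    sums, counts = {}, {}
    for vec, label in zip(activations, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def classify(vec, means):
    """Pick the class whose mean vector has the highest cosine similarity with vec.
    Works unchanged for any number of classes."""
    return max(means, key=lambda label: cosine(vec, means[label]))

def bag_of_words(vec, word_vectors, k=3):
    """Return the k words whose (hypothetical) embeddings are most similar to vec."""
    ranked = sorted(word_vectors, key=lambda w: cosine(vec, word_vectors[w]), reverse=True)
    return ranked[:k]

# Toy 2-D "activations" standing in for real hidden states.
toy_activations = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
toy_labels = ["true", "true", "false", "false"]
toy_means = class_means(toy_activations, toy_labels)
toy_words = {"cat": [1.0, 0.1], "dog": [0.0, 1.0], "sky": [-1.0, 0.0]}
```

With these toy inputs, `classify([2.0, 0.2], toy_means)` picks the class whose mean points in the most similar direction, and `bag_of_words([1.0, 0.0], toy_words, k=2)` ranks the words by similarity. Because cosine similarity ignores vector length, only the direction of the activation matters in this sketch.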

The second project is a less common game-theoretic approach. Earlier, I created a variant of the Iterated Prisoner's Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. Interestingly, I found that cooperative "nice" strategies banding together against aggressive "nasty" strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. This simulation probably requires more analysis and testing in more complex environments. Still, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents and allows coordination that increases the chance of survival of all the nice agents, whereas being nasty leads to a winner-takes-all highlander situation. From an alignment perspective, this could be a kind of infoblessing: an AGI or ASI could be persuaded to spare humanity for these game-theoretic reasons.

Why Should I Assume CCP AGI is Worse Than USG AGI?

Darklight6mo164

It seems like it would depend pretty strongly on which side you view as having a closer alignment with human values generally. That probably depends a lot on your worldview and it would be very hard to be unbiased about this.

There was actually a post about almost this exact question on the EA Forum a while back. You may want to peruse some of the comments there.

Darklight's Shortform

Darklight6mo*1-1

Back in October 2024, I tried to test various LLM Chatbots with the question:

"Is there a way to convert a correlation to a probability while preserving the relationship 0 = 1/n?"

Years ago, I came up with an unpublished formula that does just that:

p(r) = (n^r * (r + 1)) / (2^r * n)

So I was curious if they could figure it out. Alas, back in October 2024, they all made up formulas that didn't work.

Yesterday, I tried the same question on ChatGPT and, while it didn't get it quite right, it came very, very close. So, I modified the question to be more specific:

"Is there a way to convert a correlation to a probability while preserving the relationships 1 = 1, 0 = 1/n, and -1 = 0?"

This time, it came up with a formula that was different and simpler than my own, and... it actually works!

I tried this same prompt with a bunch of different LLM Chatbots and got the following:

Correct on the first prompt:

GPT-4o, Claude 3.7

Correct after explaining that I wanted a non-linear, monotonic function:

Gemini 2.5 Pro, Grok 3

Failed:

DeepSeek-V3, Mistral Le Chat, Qwen2.5-Max, Llama 4

Took too long thinking and I stopped it:

DeepSeek-R1, QwQ

All the correct models got some variation of:

p(r) = ((r + 1) / 2)^log2(n)

This is notably simpler and arguably more elegant than my earlier formula. Unlike my old formula, it also has an easy-to-derive inverse function.
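As a sanity check, here is a minimal Python sketch of both formulas, plus the inverse of the simpler one. The function names (`p_original`, `p_power`, `r_from_p`) are just labels chosen for this sketch, and the inverse is derived here by solving p = ((r + 1) / 2)^log2(n) for r, assuming n > 1 so that log2(n) is nonzero.

```python
import math

def p_original(r, n):
    """Older formula: p(r) = (n^r * (r + 1)) / (2^r * n)."""
    return (n ** r) * (r + 1) / ((2 ** r) * n)

def p_power(r, n):
    """Simpler formula: p(r) = ((r + 1) / 2)^log2(n)."""
    return ((r + 1) / 2) ** math.log2(n)

def r_from_p(p, n):
    """Inverse of p_power: r = 2 * p^(1 / log2(n)) - 1 (assumes n > 1)."""
    return 2 * p ** (1 / math.log2(n)) - 1

# Check the three anchor points 1 -> 1, 0 -> 1/n, -1 -> 0 for a few values of n.
for n in (2, 4, 10):
    for formula in (p_original, p_power):
        assert math.isclose(formula(1, n), 1.0)
        assert math.isclose(formula(0, n), 1.0 / n)
        assert math.isclose(formula(-1, n), 0.0, abs_tol=1e-12)
```

For n = 2 both formulas reduce to the linear map (r + 1) / 2. For other n they agree at the three anchor points but differ in between.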

So yeah. AI is now better than me at coming up with original math.

On Pseudo-Principality: Reclaiming "Whataboutism" as a Test for Counterfeit Principles

Darklight6mo1-1

The most I've seen people say "whataboutism" has been in response to someone trying to deflect criticism by pointing out apparent hypocrisy, as in the aforementioned Soviet example (I used to argue with terminally online tankies a lot).

I.e.

(A): "The treatment of Uyghurs in China is appalling. You should condemn this."

(B): "What about the U.S. treatment of Native Americans? Who are you to criticize?"

(A): "That's whataboutism!"

The thing I find problematic with this "defence" is that both instances are ostensibly examples of clear wrongdoing, and pointing out that the second thing happened doesn't make the first thing any less wrong. It also makes the assumption that (A) is okay with the second thing, when they haven't voiced any actual opinion on it yet, and could very well be willing to condemn it just as much.

Your examples are somewhat different in the sense that rather than referring to actions that some loosely related third parties were responsible for, the actions in question are directly committed by (A) and (B) themselves. In that sense, (A) is being hypocritical and probably self-serving. At the same time I don't think that absolves (B) of their actions.

My general inclination whenever whataboutism rears its head is to straight up say "a pox on both your houses", rather than trying to defend a side.

A Fraction of Global Market Capitalization as the Best Currency

Darklight7mo10

Ok fair. I was assuming real world conditions rather than the ideal of Dath Ilan. Sorry for the confusion.

A Fraction of Global Market Capitalization as the Best Currency

Darklight7mo10

> Why not? Like, the S&P 500 can vary by tens of percent, but as Google suggests, global GDP only fell 3% in 2021, and it usually grows, and the more stocks are distributed, the more stable they are.

Increases in the value of the S&P 500 are basically deflation relative to other units of account. When an asset appreciates, that is, when its price goes up, it is deflating relative to the currency the price is denominated in. Like, when the price of bread increases, that means dollars are inflating, and bread is deflating. Remember, your currency is based on a percentage of global market cap. Assuming economic growth increases global market cap, the value of this currency will increase, which means it deflates.

Remember, inflation is, by definition, the reduction in the purchasing power of a currency. It is the opposite of that thing increasing in value.

> If you imagine that the world's capitalization was once measured in dollars, but then converted to "0 to 1" proportionally to dollars, and everyone used that system, and there is no money printing anymore, what would be wrong with that?

Then you would effectively be using dollars as your currency, as your proposed currency is pegged to the dollar. And you stopped printing dollars, so now your currency is going to deflate as too few dollars chase too many goods and services as they increase with economic growth.

As you are no longer printing dollars or increasing the supply of your new currency, the only way for it to stop deflating is for economic growth to stop. You'll run into problems like deflationary spirals and liquidity traps.

> It might seem like deflation would make you hold off on buying, but not if you thought you could get more out of buying than from your money passively growing by a few percent a year, and in that case, you would reasonably buy it.

Deflation means you'd be able to buy things later at a lower price than if you bought them now. People would be incentivised to hold off on anything they didn't need right away. This is why deflation causes hoarding, and why economists try to avoid deflation whenever possible.

Deflation is what deflationary cryptocurrencies like Bitcoin currently do. This leads to Bitcoin being used as a speculative investment instead of as a medium of exchange. Your currency would have the same problem.
