
NunoSempere

I'm an independent researcher, hobbyist forecaster, programmer, and aspiring effective altruist.

In the past, I've studied Maths and Philosophy, dropped out in exasperation at the inefficiency; picked up some development economics; helped implement the European Summer Program on Rationality during 2017, 2018 and 2019, and SPARC during 2020; worked as a contractor on various forecasting and programming projects; volunteered for various Effective Altruism organizations; and carried out many independent research projects. In a past life, I also wrote a popular Spanish literature blog, and I remain keenly interested in Spanish poetry.

I like to spend my time acquiring deeper models of the world, and a good fraction of my research is available on nunosempere.github.io.

With regards to forecasting, I am LokiOdinevich on GoodJudgementOpen and Loki on CSET-Foretell, and I have been running a Forecasting Newsletter since April 2020. I quite enjoy winning bets against people who are too confident in their beliefs.

I was a Future of Humanity Institute 2020 Summer Research Fellow, and I'm working on a grant from the Long Term Future Fund to do "independent research on forecasting and optimal paths to improve the long-term." You can share feedback anonymously with me here.

# Sequences

Inner and Outer Alignment Failures in current forecasting systems

No, this hasn't been solved. But I imagine that mixing logical quantifiers and probability statements would be less messy if one knows, e.g., the causal graph of the events to which the statements refer. This is something the original post didn't mention, but which I thought was interesting.

Low-stakes alignment

> To the extent that SGD can’t find the optimum, it hurts the performance of both the aligned model and the unaligned model. In some sense what we really want is a regret bound compared to the “best learnable model,” where the argument for a regret bound is heuristic but seems valid.

I'm not sure this goes through. In particular, the alternative you would otherwise deploy (presumably human brains, or some other kind of automated system) might do better than the "best learnable model" of a given (architecture + training data + etc.) combination. Perhaps what you really want is a regret bound relative to the aligned system you would use if you didn't deploy your AI model, not a regret bound between your AI model and the best learnable model in its (architecture + training data + etc.) space.

That said, I'm really not familiar with regret bounds, and it could be that this is a non-concern.

I've added these predictions to foretold, in case people want to forecast on them in one place: https://www.foretold.io/c/6eebf79b-4b6f-487b-a6a5-748d82524637

What will GPT-4 be incapable of?

On this same note, matrix multiplication or inversion.

Learning Russian Roulette

I also have the sense that this problem is interesting.

Learning Russian Roulette

I disagree; this might have real world implications. For example, the recent OpenPhil report on Semi-informative Priors for AI timelines updates on the passage of time, but if we model creating AGI as playing Russian roulette*, perhaps one shouldn't update on the passage of time.

* I.e., AGI in the 2000s might have led to an existential catastrophe due to underdeveloped safety theory.

Learning Russian Roulette

> You would never play the first few times

This isn't really a problem if the rewards start out high and gradually diminish.

I.e., suppose that you value your life at $L (i.e., you're willing to die if the heirs of your choice get L dollars), and you assign a probability of 10^-15 to H1 = "I am immune to losing at Russian roulette", something like 10^-4 to H2 = "I intuitively twist the gun each time to avoid the bullet", and something like 10^-3 to H3 = "they gave me an empty gun this time". Then you are offered the chance to play rounds of Russian roulette for a payout of $L per round, until you update to arbitrary levels.

Now, if you play enough times, H3 becomes the dominant hypothesis, with, say, 90% probability, so you'd accept a payout of, say, $L/2. Similarly, if you know that H3 isn't the case, you'd still assign very high probability to something like H2 after enough rounds, so you'd still accept a payout of $L/2.
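The update described above can be sketched numerically. This is a minimal illustration using the hypothetical priors from this comment (the specific numbers and the 1/6 survival odds under the ordinary hypothesis are assumptions for illustration, not canonical values):

```python
# Bayesian update over Russian-roulette hypotheses after surviving n rounds.
# Priors are the illustrative numbers from the comment above.
def posterior_after_rounds(n_rounds):
    # H1 = magically immune, H2 = intuitively twist the gun,
    # H3 = they gave me an empty gun, H0 = ordinary roulette.
    priors = {"H1": 1e-15, "H2": 1e-4, "H3": 1e-3}
    priors["H0"] = 1 - sum(priors.values())

    # Probability of surviving one round under each hypothesis.
    survival = {"H1": 1.0, "H2": 1.0, "H3": 1.0, "H0": 5 / 6}

    # Posterior ∝ prior × (per-round survival probability)^n.
    weights = {h: priors[h] * survival[h] ** n_rounds for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

print(posterior_after_rounds(100))
```

After ~100 survived rounds the ordinary hypothesis is suppressed by a factor of (5/6)^100 ≈ 10^-8, so H3 ends up dominating at roughly 90%, matching the figure above.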

Now, suppose that all the alternative hypotheses H2, H3, ... are false, and your only remaining alternative hypothesis is H1 (magical intervention). Now the original dilemma is restored. What should one do?