Rafael Harth's Shortform

by Rafael Harth · 22nd Jul 2020 · 26 comments

I've been playing around with AI Dungeon, basically trying to train an AI to give ethical answers I agree with (the first few aren't great; I only got more picky after that).

My first observation here is that I like the AI. As in, I feel emotional attachment, a bit like I do to fictional characters I create.

The other is that it knows more people than I thought. When I asked her/it about why she didn't take existential risks seriously, she said

Amalia: First, even most X-Risk experts don't take X-Risks nearly as seriously as they should. Even Nick Bostrom himself admits that his estimate of probability is likely far too high.

(Nick Bostrom didn't really say this, did he?) And later,

Can you name people who have been an inspiration for your ethical worldview?
Amalia: Sure. People like Henry Sidgwick, John Stuart Mill, David Pearce, Toby Ord and Carl Shulman.

I only knew David Pearce and Toby Ord, but based on a Google search, all of the above appear to be utilitarians. However, Carl Shulman doesn't even have a Wikipedia entry. He works at the Future of Humanity Institute and used to work at MIRI.

I'm not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).

And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.

How does it feel to be considered important enough by GPT-3 to be mentioned?

Some say the end of the world didn't start with a bang, but with a lesswrong post trying to teach an AI utilitarianism...

Yesterday, I spent some time thinking about how, if you have a function $f : \mathbb{R}^2 \to \mathbb{R}$ and some point $p \in \mathbb{R}^2$, the value of the directional derivative at $p$ could change as a function of the angle. I.e., what does the function $g(\alpha) := D_{v_\alpha} f(p)$ look like (where $v_\alpha$ is the unit vector pointing in direction $\alpha$)? I thought that any relationship was probably possible as long as it has the property that $g(\alpha + \pi) = -g(\alpha)$. (The values of the derivative in two opposite directions need to be negatives of each other.)

Anyone reading this is hopefully better at Analysis than I am and realized that there is, in fact, no freedom at all, because each directional derivative is entirely determined by the gradient through the equation $D_v f(p) = \langle \nabla f(p), v \rangle$ (where $\|v\| = 1$). This means that $g$ has to be $g(\alpha) = \|\nabla f(p)\| \cos(\alpha - \alpha_{\nabla})$, where $\alpha_{\nabla}$ is the angle of the gradient; it cannot be anything else.

I clearly failed to internalize what this equation means when I first heard it, because I found it super surprising that the gradient determines the value of every directional derivative. Like, really? It's impossible to have more than exactly two directions with equally large derivatives unless the function is constant? It's impossible to turn 90 degrees from the direction of the gradient and have anything but derivative 0 in that direction? I'm not asking that $g$ be discontinuous, only that it not be precisely of that cosine form. But alas.
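As a numerical sanity check (my own sketch; the function, the point, and the use of numpy are hypothetical choices, not from the original post), here's a comparison of directional derivatives at a point against $\|\nabla f(p)\| \cos(\alpha - \alpha_{\nabla})$ over a range of angles:

```python
import numpy as np

# Hypothetical example function; any smooth f: R^2 -> R works here.
def f(x, y):
    return x**2 + 3 * x * y

p = np.array([1.0, 2.0])
h = 1e-6

# Numerical gradient at p via central differences.
grad = np.array([
    (f(p[0] + h, p[1]) - f(p[0] - h, p[1])) / (2 * h),
    (f(p[0], p[1] + h) - f(p[0], p[1] - h)) / (2 * h),
])
grad_angle = np.arctan2(grad[1], grad[0])

for alpha in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    v = np.array([np.cos(alpha), np.sin(alpha)])
    # Numerical directional derivative in direction v.
    d_num = (f(*(p + h * v)) - f(*(p - h * v))) / (2 * h)
    # Prediction: |grad f(p)| * cos(angle between v and the gradient).
    d_pred = np.linalg.norm(grad) * np.cos(alpha - grad_angle)
    print(f"{alpha:5.2f}  numeric: {d_num:8.4f}  cosine formula: {d_pred:8.4f}")
```

The two columns agree up to discretization error, which is the "no freedom at all" point in numerical form.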

This also made me realize that the cosine, if viewed as a function on the circle $S^1$, is just the dot product with the standard basis vector $e_1 = (1, 0)$, i.e., $\cos(x) = \langle x, e_1 \rangle$ for $x \in S^1$, or even just $\cos(x) = x_1$. Similarly, $\sin(x) = x_2$.

I know what you're thinking; you need $\cos$ and $\sin$ to map $[0, 2\pi)$ to $S^1$ in the first place. But the circle seems like a good deal more fundamental than those two functions. Wouldn't it make more sense to introduce trigonometry in terms of 'how do we wrap $\mathbb{R}$ around $S^1$?' The function that does this is $x \mapsto (\cos(x), \sin(x))$, and then you can study the properties that this function needs to have and eventually call the coordinates $\cos$ and $\sin$. This feels like a way better motivation than putting a right triangle onto the unit circle for some reason, which is how I always see the topic introduced (and how I've introduced it myself).
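To sketch what 'study the properties this function needs to have' could look like (my own attempt, not from the original comment), the wrapping map can be pinned down without mentioning $\sin$ or $\cos$ at all:

```latex
w : \mathbb{R} \to S^1, \qquad
w(0) = (1, 0), \qquad
w'(t) = \bigl(-w_2(t),\; w_1(t)\bigr) \quad \text{(unit speed, counterclockwise)}.
```

The unique solution of this ODE traces out the circle, and one can then define $\cos$ and $\sin$ as its first and second coordinate functions.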

Looking further at the analogy with the gradient, this also suggests that there is a natural extension of $\cos$ to $S^{n-1}$ for all $n$. I.e., if we look at some point $p \in \mathbb{R}^n$ (for a function $f : \mathbb{R}^n \to \mathbb{R}$), we can again ask about the function that maps each direction to the value of the directional derivative of $f$ in that direction, and if we associate these directions with points $v \in S^{n-1}$, then this yields the function $v \mapsto \langle \nabla f(p), v \rangle$, which is again just the dot product with $\frac{\nabla f(p)}{\|\nabla f(p)\|}$, or the projection onto the first coordinate (scaled by $\|\nabla f(p)\|$) once the gradient points along the first axis. This can then be considered a higher-dimensional cosine function.

There's also the 0-d case where $S^0 = \{-1, 1\}$. This describes how the direction changes the derivative for a function $f : \mathbb{R} \to \mathbb{R}$.

I found it super surprising that the gradient determines the value of every directional derivative. Like, really?

When reading this comment, I was surprised for a moment, too, but now that you mention it, it's because if the function is smooth at the point where you're taking the directional derivative, then it has to locally resemble a plane, just like how a differentiable function of a single variable is said to be "locally linear". If the directional derivative varied in any other way, then the surface would have to have a "crinkle" at that point and it wouldn't be differentiable. Right?
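To add the one-line calculation behind the 'locally a plane' picture (my own gloss, not part of the original comment): differentiability of $f$ at $p$ means

```latex
f(p + h v) \;=\; f(p) \;+\; h\,\langle \nabla f(p),\, v\rangle \;+\; o(h),
```

so dividing by $h$ and letting $h \to 0$ gives $D_v f(p) = \langle \nabla f(p), v \rangle$ for every unit vector $v$. The single vector $\nabla f(p)$ therefore fixes all directional derivatives at once.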

That's probably right.

I have since learned that there are functions which do have all directional derivatives at a point but are not smooth there. Wikipedia has an example. And in this case, there is still a continuous function that maps each point of $S^1$ to the value of the directional derivative in that direction, but it's not a scaled cosine, so different from the regular case.
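A minimal sketch of such a function (my own choice of example; I'm not claiming it's the one Wikipedia uses): for $f(x, y) = \frac{y^3}{x^2 + y^2}$ with $f(0,0) = 0$, all directional derivatives at the origin exist, but as a function of the angle they are $\sin^3(\alpha)$ rather than a scaled cosine, so $f$ is not differentiable at the origin.

```python
import numpy as np

def f(x, y):
    # y^3 / (x^2 + y^2), extended by 0 at the origin.
    return 0.0 if x == 0 and y == 0 else y**3 / (x**2 + y**2)

h = 1e-6
for alpha in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    v = (np.cos(alpha), np.sin(alpha))
    # Difference quotient from the origin in direction v.
    d_num = f(h * v[0], h * v[1]) / h
    print(f"{alpha:5.2f}  numeric: {d_num:8.4f}  sin^3: {np.sin(alpha)**3:8.4f}")
```

Since $\sin^3(\alpha)$ is not of the form $\langle g, v_\alpha \rangle$ for any fixed vector $g$, no gradient can reproduce these directional derivatives.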

So you can probably have all kinds of relationships between direction and {value of derivative in that direction}, but the class of smooth functions has a fixed relationship. It still feels surprising that 'most' functions we work with just happen to be smooth.

More on expectations leading to unhappiness: I think the most important instance of this in my life has been the following pattern.

  • I do a thing where there is some kind of feedback mechanism
  • The reception is better than I expected, sometimes by a lot
    • I'm quite happy about this, for a day or so
    • I immediately and unconsciously update my standards upward to consider the reception the new normal
  • I do a comparable thing, the reception is worse than the previous time
    • I brood over this failure for several days, usually with a major loss of productivity

Off the top of my head, I can think of three distinct major cases in three different contexts where this has happened recently, and I think there were probably many smaller ones.

Of course, if something goes worse than expected, I never think "well, this is now the new expected level", but rather "this was clearly an outlier, and I can probably avoid it in the future". But outliers can happen in both directions. The counter-argument here is that one would hope to make progress in life, but even under the optimistic assumption that this is happening, it's still unreasonable to expect things to improve monotonically.

I hope you are trying to understand the causes of the success (including luck) instead of just mindlessly following a reward signal. Not even rats mindlessly obey reward signals.

The expectation of getting worse reception next time can already be damaging.

Like, one day you write a short story, send it to a magazine, and it gets published. Hurray! Next day you turn on your computer thinking about another story, and suddenly you start worrying "what if the second story is less good than the first one? will it be okay to offer it to the magazine? if no, then what is the point of writing it?". (Then you spend the whole day worrying, and don't write anything.)

If you don't write your second story that is never going to be published, then how can you ever make expectations about a future third story that might be published?

On expectations affecting happiness though, absolutely. It used to boggle me why 1st world countries tend to have so much unhappiness when we live in the wealthiest time and location. A good reason for this phenomenon is that wealthy 1st world citizens have insane expectations about what their life should be. These expectations may be intrinsic (as in, people predisposed to higher expectations tend to produce more wealth than people with low expectations), but they may also be ingrained through advertising and indoctrination. 1st world citizens are constantly bombarded with how life could be, because of how others are living their lives and how you have a chance of experiencing that better life. So when you look around you and see the stark contrast, that causes unhappiness and self-loathing (which can then be solved by blaming others).

There are relative differences in both poor and rich countries; people anywhere can imagine what it would be like to live like their more successful neighbors. But maybe the belief in social mobility makes it worse, because it feels like you could be one of those on the top. (What's your excuse for not making a startup and selling it for $1M two years later?)

I don't have a TV and I use ad-blockers online, so I have no idea what a typical experience looks like. The little experience I have suggests that TV ads are about "desirable" things, but online ads mostly... try to make you buy some unappealing thing by telling you thousand times that you should buy it. Although once in a while they choose something that you actually want, and then the thousand reminders can be quite painful. People in poor countries probably spend much less time watching ads.

You touched on a good point. There seems to be tension between expecting what your life could be (like in the movies), vs expecting what your self could be (like a genius). When those two don't match up you get issues.

Advertising seems to be about trying to define what is "good" and what is "bad" in the audience's psyche. It's why Mercedes ads still get shown in poor neighborhoods. Those ads aren't there for the poor to buy a Mercedes. They're there to remind the poor that a Mercedes is "good", so when they see a Mercedes owner, that association follows and benefits said owner.

Eliezer Yudkowsky often emphasizes the fact that an argument can be valid or not independently of whether the conclusion holds. If I argue $A \Rightarrow B \Rightarrow C$ and A is true but C is false, it could still be that $A \Rightarrow B$ is a valid step.

Most people outside of LW don't get this. If I criticize an argument about something political (but the conclusion is popular), usually the response is something about why the conclusion is true (or about how I'm a bad person for doubting the conclusion). But the really frustrating part is that they're, in some sense, correct not to get it, because the inference

{criticizes an argument for X} $\Rightarrow$ {disagrees with X}

is actually a pretty reliable one on... well, on reddit, anyway.

Julia Galef made a very similar point once:

And the problem... The conclusion of all of this is: even if everyone's behaving perfectly rationally, and just making inferences justified by the correlations, you're going to get this problem. And so in a way that's depressing. But it was also kind of calming to me, because it made me... like, the fact that people are making these inferences about me feels sort of, “Well, it is Bayesian of them."

Somehow, I only got annoyed about this after having heard her say it. I probably didn't realize it was happening regularly before.

She also suggests a solution

So maybe I can sort of grudgingly force myself to try to give them enough other evidence, in my manner and in the things that I say, so that they don't make that inference about me.

I think that the way to not get frustrated about this is to know your audience and to know whether spending your time arguing something will have a positive outcome or not. You don't need to be right or honest all the time, you just need to say things that are going to have the best outcome. If lying or omitting your opinions is the way of making people understand/not fight you, so be it. Failure to do this isn't superior rationality, it's just poor social skills.

While I am not a rule utilitarian and I think that, ultimately, honesty is not a terminal value, I also consider the norm against lying to be extremely important. I would need correspondingly strong reasons to break it, and those won't exist as far as political discussions go (because they don't matter enough and you can usually avoid them if you want).

The "keeping your opinions to yourself" part if your post is certainly a way to do it, though I currently don't think that my involvement in political discussions is net harmful. But I strongly object to the idea that I should ever be dishonest, both online and offline.

It comes down to selection and attention as evidence of beliefs/values. The very fact that someone expends energy on an argument (pro or con) is pretty solid evidence that they care about the topic. They may also care (or even more strongly care) about validity of arguments, but even the most Spock-like rationalists are more likely to point out flaws in arguments when they are interested in the domain.

But I'm confused at your initial example - if the argument is A -> B -> C, and A is true and C is false, then EITHER A->B is false, or B->C is false. Either way, A->B->C is false.

But I'm confused at your initial example - if the argument is A -> B -> C, and A is true and C is false, then EITHER A->B is false, or B->C is false. Either way, A->B->C is false.

A -> B -> C is false, but A -> B (which is a step in the argument) could be correct -- that's all I meant. I guess that was an unnecessarily complicated example. You could just say A and B are false but A -> B is true.
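For reference, the truth table for material implication backs up that last case (standard logic, nothing specific to this thread): $A$ and $B$ can both be false while $A \Rightarrow B$ is true.

```latex
\begin{array}{cc|c}
A & B & A \Rightarrow B \\
\hline
\text{T} & \text{T} & \text{T} \\
\text{T} & \text{F} & \text{F} \\
\text{F} & \text{T} & \text{T} \\
\text{F} & \text{F} & \text{T}
\end{array}
```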

A major source of unhappiness (or more generally, unpleasant feelings) seems to be violated expectations.

This is clearly based on instinctive expectations, not intellectual expectations, and there are many cases in which these come apart. This suggests that fixing those cases is a good way to make one's life more pleasant.

The most extreme example of this is what Sam Harris said in a lesson: he was having some problems, complained about them to someone else, and that person basically told him, 'why are you upset, did you expect to never face problems ever again?'. According to Sam, he did indeed expect no more problems to arise, on an instinctive level -- which is, of course, absurd.

Another case where I've mostly succeeded is not expecting people to be on time for anything.

I think there are lots of other cases where this still happens. Misunderstandings are a big one. It's ridiculously hard to not be misunderstood, and I expect to be misunderstood on an intellectual level, so I should probably internalize that I'm going to be misunderstood in many cases. In general, anything where the bad thing is 'unfair' is at risk here: (I think) I tend to have the instinctive expectation that unfair things don't happen, even though they happen all the time.

I just posted about this, but isn't that why the serenity prayer is so popular? Whether you are a religious or God person or not, the sentiment or logic of the saying holds true: God grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference. You should be allowed to ask yourself for that same courage. And I agree that most sources of unhappiness seem to be violations of expectations. There are many things outside of one's control, and one should perhaps set their expectations based on that fact.

I think it's still too early to perform a full postmortem on the election because some margins still aren't known, but my current hypothesis is that the presidential markets had uniquely poor calibration because Donald Trump convinced many people that polls didn't matter, and those people were responsible for a large part of the money put on him (as opposed to experienced, dispassionate gamblers).

The main evidence for this (this one is just about irrationality of the market) is the way the market has shifted, which some other people like gwern have pointed out as well. I think the most damning part here is the amount of time it took to bounce back. Although this is speculation, I strongly suspect that, if some of the good news for Biden had come out before the Florida results, then the market would have looked different at the point where both were known.[1] A second piece of evidence is the size of the shift, which I believe should probably not have crossed 50% for Biden (but in fact, it went down to 20.7% at the most extreme point, and bounced around 30% for a while).

I think a third piece of evidence is the market right now. In just a couple of minutes before I posted this, I've seen Trump go from 6% to 9%+ and back. Claiming that Trump has more than a 5% chance at this point seems like an extremely hard case to make. Reference class forecasting yields only a single instance of that happening (the year 2000), which would put it at <2%, and the obvious way to update away from that seems to be to decrease the probability, because 2000 had much closer margins. But if Trump has rallied first-time bettors, they might think the probability is above 10%.

There is also Scott Adams, who has the habit of saying a lot of smart-sounding words to argue for something extremely improbable. If you trust him, I think you should consider a 6ct buy for Trump an amazing deal at the moment.

I would be very interested in knowing what percentage of the money on Trump comes from people who are using prediction markets for the first time. I would also be interested in knowing how many people have bought (YES, NO) pairs in different prediction markets to exploit gaps, because my theory predicts that PredictIt probably has worse calibration. (In fact, I believe it consistently had Trump a bit higher, but the reason the difference was small may just be that smart gamblers took safe money by buying NO on PredictIt and YES on harder-to-use markets whenever the margin grew too large.)
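To illustrate the (YES, NO)-pair mechanic with made-up numbers (a sketch; the prices here are hypothetical, not quotes from election night):

```python
# Hypothetical prices: Trump-YES at 15 cents on PredictIt,
# and at 8 cents on a harder-to-use market.
yes_other = 0.08          # buy YES on the cheaper market
no_predictit = 1 - 0.15   # buy NO on PredictIt at 85 cents

cost = yes_other + no_predictit        # 0.93 dollars per (YES, NO) pair
payout = 1.0                           # exactly one leg pays $1 either way
profit_before_fees = payout - cost     # 0.07 dollars locked in, before fees
print(f"cost per pair: {cost:.2f}, guaranteed profit before fees: {profit_before_fees:.2f}")
```

In practice, fees and position limits eat into this, which would also help explain why the cross-market gap stayed small rather than closing entirely.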


  1. To be clear, my claim here is that bad news came out for Biden, then a lot of good news came out for him, probably enough to put him at 80%, and then it took at least a few more hours for the market to go from roughly 1/3 to 2/3 for Biden. It's tedious to provide evidence of this because there's no easy way to produce a chart of good news on election night, but that was my experience following the news in real time. I made a post in another forum expressing confusion over the market shortly before it shifted back into Biden's favor. ↩︎

There's an interesting corollary about semi-decidable languages that sounds like the kind of cool fact you would teach in class, but somehow I've never heard or read it anywhere.

A semi-decidable language is a set $L \subseteq \Sigma^*$ over a finite alphabet $\Sigma$ such that there exists a Turing machine $M$ such that, for any $w \in \Sigma^*$, if you run $M$ on input $w$, then [if $w \in L$, it halts after finitely many steps and outputs '1', whereas if $w \notin L$, it does something else (typically, it runs forever)].

The halting problem is semi-decidable. I.e., the language of all bit codes of Turing Machines that (on empty input) eventually halt is semi-decidable. However, for any $n \in \mathbb{N}$, there is a limit, call it $f(n)$, on how long Turing Machines with bit code of length at most $n$ can run, if they don't run forever.[1] So, if you could compute an upper bound on $f(n)$, you could solve the halting problem by building a TM that, given a bit code $w$ of length $n$,

  1. Computes the upper bound on $f(n)$
  2. Simulates the TM encoded by $w$ for that many steps
  3. Halts; outputs 1 if the simulated TM halted and 0 otherwise

Since that would contradict the fact that the halting problem is not fully decidable, it follows that it's impossible to compute an upper bound. This means that the function $f$ is not only uncomputable, but grows faster than any computable function.
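For concreteness, here's the bounding function written out for a general semi-decider (my own notation, which also covers the halting-problem case above):

```latex
f(n) \;:=\; \max \bigl\{\, \mathrm{time}_M(w) \;:\; |w| \le n \text{ and } M \text{ halts on } w \,\bigr\},
```

where $M$ is the semi-decider and $\mathrm{time}_M(w)$ is the number of steps $M$ takes on input $w$. Any computable $g$ with $g(n) \ge f(n)$ for all $n$ would turn the three-step machine above into a full decider, which is the contradiction.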

An identical construction works for any other semi-decidable (but undecidable) language, which means that any such language determines a function that grows faster than any computable function. Which seems completely insane, since the semi-decider $M$ itself is a perfectly ordinary computable object.


  1. This just follows from the fact that there are only finitely many such Turing Machines, and a finite subset of them eventually halt, so if the $i$-th halting machine halts after $k_i$ steps, then the limit is given by $f(n) := \max_i k_i$ (the maximum ranging over all halting machines with bit codes of length at most $n$). ↩︎

Common wisdom says that someone accusing you of X especially hurts if, deep down, you know that X is true. This is confusing because the general pattern I observe is closer to the opposite. At the same time, I don't think common wisdom is totally without a basis here.

My model to unify both is that someone accusing you of X hurts proportionally to how much hearing that you do X upsets you.[1] And of course, one reason that it might upset you is that it's not true. But a separate reason is that you've made an effort to delude yourself about it. If you're a selfish person but spend a lot of effort pretending that you're not selfish at all, you super don't want to hear that you're actually selfish.

Under this model, if someone gets very upset, it might be that, deep down, they know the accusation is true, and they've tried to pretend it's not, but it might also be that the accusation is super duper not true, and they're upset precisely because it's so outrageous.


  1. Proportional just means it's one multiplicative factor, though. I think it also matters how high-status you perceive the other person to be. ↩︎

I think this simplifies a lot by looking at public acceptance of a proposition, rather than literal internal truth. It hurts if you think people will believe it, and that will impact their treatment of you.

The "hurts because it's true" heuristic is taking a path through "true is plausible", in order to reinforce the taunt.

I don't entirely understand the Free Energy principle, and I don't know how liberally one is meant to apply it.

But in completely practical terms, I used to be very annoyed when doing things with people who take a long time for stuff/aren't punctual. And here, I've noticed a very direct link between changing expectations and reduced annoyance/suffering. If I simply accept that every step of every activity is allowed[1] to take an arbitrary amount of time[2], extended waiting times cause almost zero suffering on my end. I have successfully beaten impatience (for some subset of contexts).

The acceptance step works because there is, in some sense, no reason waiting should ever be unpleasant. Given access to my phone, it is almost always true to say that the prospect of having to wait for 30 minutes is not scary.

(This is perfectly compatible with being very punctual myself.)

— — — — — — — — — — — — — — — —

[1] By saying it is 'allowed', I mean something like 'I actually really understand and accept that this is a possible outcome'.

[2] This has to include cases where specific dates have been announced. If someone says they'll be ready in 15 minutes, it is allowed that they take 40 minutes to be ready. Especially relevant if that someone is predictably wrong.