
so the maximum "downside" would be the sum of the differences between that reference populations lives and those without the variant for all variants you edit (plus any effects from off-targets)

I don't think that's true? It assumes the variants don't interact with each other. Your reference population would only have 0.01% of people with (the rarest) 2 variants at once, 0.0001% with 3 variants, and so on, so it tells you very little about people who carry all of the edited variants together.
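To make the arithmetic concrete, here is a rough sketch. It assumes (my numbers, not anything from the post) that each of the rarest variants is carried by about 1% of people, that variants are inherited independently, and that the reference cohort has 500,000 people. The function name and both constants are hypothetical.

```python
def joint_carrier_fraction(variant_freq: float, n_variants: int) -> float:
    """Fraction of people expected to carry all n variants at once.

    Assumes every variant has the same carrier frequency and that
    variants are inherited independently (a simplifying assumption).
    """
    return variant_freq ** n_variants


VARIANT_FREQ = 0.01    # hypothetical: 1% of people carry each rare variant
COHORT_SIZE = 500_000  # hypothetical size of the reference population

for n in (2, 3, 5):
    fraction = joint_carrier_fraction(VARIANT_FREQ, n)
    expected_people = fraction * COHORT_SIZE
    print(f"{n} variants: {fraction:.8%} of people, "
          f"about {expected_people:.4g} in the cohort")
```

Under these assumptions the number of people carrying several rare variants at once drops off geometrically, so the reference population can say a lot about each variant alone and almost nothing about their combinations.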

Yes, but this is exactly the case you have in mind when you say "This would be useful for trying out different variations on a phrase to see what those small variations change about the implied meaning", and it is also where the output can be particularly misleading, because the LLM is contrasting with the previous version, which the humans reading or hearing the final version don't know about.

So it would be more useful for that purpose to use a new chat.

But the screenshot says "if i instead say the words...". This seems like it has to be in the same chat with the "matters" version.

but speak only the truth to other Parselmouths and (by implication) speak only truth to Quakers.

I would merely like to note that the implication seems contrary to the source of the name: I expect Quirrell and most historical Parselmouths in HPMOR would very much lie to Quakers (Quirrell would maybe derive some entertainment from not saying factually false things while misleading them).

Or to put it another way: in the full post you say

There is some evidence he has higher-than-normal narcissistic traits, and there’s a positive correlation between narcissistic traits and DAE. I think there is more evidence of him having DAE than there is of him having narcissistic traits

but to me it looks like you could have equally replaced DAE with "narcissistic traits" in Theories B and C, and provided the same list of evidence.

(1) Convicted criminals are more likely to have narcissistic traits.

(2) "extreme disregard for protecting his customers" is also evidence for narcissistic traits.

Etc. And then you could repeat the exercise with "sociopathy" and so on.

So there are two possibilities, as far as I can see:

  1. One or more things on the list are in fact not evidence for narcissistic traits.
  2. They are stronger evidence for DAE than for narcissistic traits. 

But it isn't clear which of these you believe, and about which parts of the list in particular. (With the exception of (4) and (11), of course, but they point in opposite directions.)

Yes, it's evidence. My question is how strong or weak this evidence is (and my expectation is that it's weak). Your comparison relies on "wet grass is typically substantial evidence for rain".

Based on the full text:

Some readers may think that this sounds circular: if I’m trying to explain why someone would do what SBF did, how is it valid to use the fact that he did it as a piece of evidence for the explanation? But treating the convictions as evidence for SBF’s DAE is valid in the same way that, if you were trying to explain why the grass is wet, it would be valid to use the fact that the grass is wet as evidence for the hypothesis that it rained recently (since wet grass is typically substantial evidence for rain).

But a lot of your pro-DAE evidence seems to me to fail this test. E.g., OK, he lied to customers and to Congress; why is this substantial evidence of DAE in particular?

oh, FTX doesn’t have a bank account, I guess people can wire to Alameda’s to get money on FTX….3 years later…oh fuck it looks like people wired $8b to Alameda and oh god we basically forgot about the stub account that corresponded to that, and so it was never delivered to FTX.

This seems like evidence in favor of Theory A and against DAE, if you look at those as competing explanations? That is, he commingled funds (or at least claims that in this particular case he did) for reasons unrelated to DAE.

In November 2022, he also tweeted these statements

It seems likely he believed at that point that if a run could be avoided, he would have enough assets; so making these statements could help most customers, and not making them could hurt most of them, even if it helped a few lucky and quick ones. Not evidence of decreased empathy at all (in my view).

(3) There are multiple sources suggesting that he has a tendency and willingness to lie and deceive others.

Everything under this seems to fail the rain test, at least: very many people have this willingness, and most of them don't have DAE (simply based on the prevalence you mention). Is this particular "style" of dishonesty characteristic of DAE?
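What I mean by the rain test can be put in rough Bayesian terms: evidence is substantial only if it is much more likely under the hypothesis than without it. Below is a minimal sketch with made-up numbers. None of the probabilities come from the post; they are only there to show the shape of the argument.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' rule: P(hypothesis | evidence)."""
    true_and_evidence = prior * p_evidence_if_true
    false_and_evidence = (1.0 - prior) * p_evidence_if_false
    return true_and_evidence / (true_and_evidence + false_and_evidence)


def likelihood_ratio(p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """How many times more likely the evidence is if the hypothesis holds."""
    return p_evidence_if_true / p_evidence_if_false


# Hypothetical: wet grass is common after rain and rare otherwise.
rain = posterior(prior=0.20, p_evidence_if_true=0.95, p_evidence_if_false=0.05)

# Hypothetical: willingness to deceive is common with DAE,
# but also fairly common among people without it.
dae = posterior(prior=0.01, p_evidence_if_true=0.90, p_evidence_if_false=0.30)

print(f"rain given wet grass: {rain:.3f} (likelihood ratio {likelihood_ratio(0.95, 0.05):.0f})")
print(f"DAE given deception:  {dae:.3f} (likelihood ratio {likelihood_ratio(0.90, 0.30):.0f})")
```

With numbers like these, wet grass moves rain from unlikely to likely, while a trait that many people without DAE also share barely moves the estimate. Whether the real numbers look like this is exactly what I am asking.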

(4) is actual evidence for DAE, great.

(5) and (10): for the rain test, you need to provide a reason to believe that most manipulative people have DAE.


For decreased affective guilt, the situation seems to be worse: as far as I can see, no evidence for it is presented, just evidence that there is some reported guilt, and then

In the context of the large amounts of evidence for his lack of affective empathy, it seems more likely that the quote above is an example of cognitive guilt rather than affective guilt.

This seems to require a very large correlation between DAEmpathy and DAGuilt. Why couldn't he have one but not the other?

When I wrote the above, I was just going by your stated definition of DAE. After going to the page you linked, which I should have done earlier, I see that a lot of your evidence seems to cover facets of psychopathy other than DAE. You could argue they are correlated, but it seems that replacing DAE with psychopathy (as defined there) in Theories B and C would make the evidence fit strictly better.

I feel like people like Scott Aaronson who are demanding a specific scenario for how AI will actually kill us all... I hypothesize that most scenarios with vastly superhuman AI systems coexisting with humans end in the disempowerment of humans and either human extinction or some form of imprisonment or captivity akin to factory farming

Aaronson in that quote is "demanding a specific scenario" for how GPT-4.5 or GPT-5 in particular will kill us all. Do you believe they will be vastly superhuman?

The quoted section more seems like instrumental convergence than orthogonality to me?

The second part of the sentence, yes. The bolded one seems to acknowledge AIs can have different goals, and I assume that version of EY wouldn't count "God" as a good goal.

Another more relevant part:

Obviously, if the AI is going to be capable of making choices, you need to create an exception to the rules - create a Goal object whose desirability is not calculated by summing up the goals in the justification slot.

Presumably this goal object can be anything.

But in order to accept that, one needs to accept the orthogonality thesis. 

I agree that EY rejected the argument because he accepted OT. I very much disagree that this is the only way to reject the argument. In fact, all four positions seem quite possible:

  1. Accept OT, accept the argument: sure, AIs can have different goals, but this (starting an AI without explicit goals) is how you get an AI which would figure out the meaning of life.
  2. Reject OT, reject the argument: you can think "figure out the meaning of life" is not a possible AI goal.
  3. and 4. EY's positions at different times.


In addition, OT can itself be a reason to charge ahead with creating an AGI: since it says an AGI can have any goal, you "just" need to create an AGI which will improve the world. It says nothing about setting an AGI's goal being difficult.

In fact it seems that the linked argument relies on a version of the orthogonality thesis instead of being refuted by it:

For almost any ultimate goal - joy, truth, God, intelligence, freedom, law - it would be possible to do it better (or faster or more thoroughly or to a larger population) given superintelligence (or nanotechnology or galactic colonization or Apotheosis or surviving the next twenty years).

Nothing about the argument contradicts "the true meaning of life" -- which seems in that argument to be effectively defined as "whatever the AI ends up with as a goal if it starts out without a goal" -- being e.g. paperclips.
