All of Viktor Rehnberg's Comments + Replies

You can see twin birth rates fell sharply in the late 90s

Shouldn't this be triplet birthrates? Twin birthrates look pretty stable in comparison.

Hmm, yeah, it's a bit hard to try stuff when there's no good preview. Usually I'd recommend a rot13 cipher if all else fails, but for number sequences that makes less sense.

I knew about the 2-4-6 problem from HPMOR, so I really liked the opportunity to try it out myself. These are my results on the four other problems:

Index A

Number of guesses:

8 guesses, of which 3 were valid and 5 invalid

Guess:

"A sequence of integers whose sum is non-negative"

Result: Failure

Index B

Number of guesses:

39 guesses, of which 23 were valid and 16 invalid

Guess:

"Three ordered real numbers where the absolute difference between neighbouring numbers is decreasing."

Result: Success
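For concreteness, a minimal sketch of the Index B guess as a predicate; the probe triples are hypothetical examples, not the sequences actually tried in the game, and it assumes "decreasing" means strictly decreasing:

```python
def matches_guess(triple):
    """Stated guess for Index B: three ordered real numbers where the
    absolute difference between neighbouring numbers is decreasing."""
    a, b, c = triple
    return abs(b - a) > abs(c - b)

# Hypothetical probes:
print(matches_guess((1, 4, 5)))   # True: |4 - 1| = 3 > |5 - 4| = 1
print(matches_guess((1, 2, 10)))  # False: 1 < 8
```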

Index C

Number of guesses:

21 guesses, of which 15 were valid and 6 invalid

Guess... (read more)

1 · Caridorc Tergilti · 3mo
I tried both and neither works

These problems seemed to me similar to the problems at the International Physicist's Tournament. If you want more problems check out https://iptnet.info

In case anyone else is looking for a source a good search term is probably the Beal Effect. From the original paper by Beal and Smith:

Once the effect is pointed out, it does not take long to arrive at the conclusion that it arises from a natural correlation between a high branching factor in the game tree and having a winning move available. In other words, mobility (in the sense of having many moves available) is associated with better positions
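A minimal sketch of the mechanism (a toy model of my own, not Beal and Smith's setup): give a position b legal moves, score each successor with an i.i.d. random evaluation, and back up the maximum. The backed-up score rises with b simply because the maximum of more random draws tends to be larger, which is the mobility-strength correlation described above.

```python
import random

def backed_up_value(branching, samples=100_000):
    """Average one-ply negamax value when each of `branching` successor
    positions gets an i.i.d. Uniform(0, 1) evaluation."""
    return sum(max(random.random() for _ in range(branching))
               for _ in range(samples)) / samples

for b in (1, 2, 4, 8, 16):
    print(b, round(backed_up_value(b), 3))
# Prints values climbing towards 1 as the branching factor grows
# (the exact expectation is b / (b + 1)).
```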

Or a counterexample from the other direction would be that you can't describe a uniform distribution of the empty set either (I think). And that would feel even weirder to call "bigger".

Why would this property mean that it is "bigger"? You can construct a uniform distribution on an uncountable set through a probability density as well. However, using the same measure on a countably infinite subset of the uncountable set would show that the countable set has measure 0.
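Spelling out the measure-theoretic point (standard facts, stated informally): a uniform distribution on a countably infinite set {x1, x2, …} would have to give every point the same mass p, but ∑n p is 0 if p = 0 and diverges if p > 0, so it can never equal 1 and no such distribution exists. By contrast, under the uniform density on [0,1] a countable subset C gets measure λ(C) = ∑n λ({xn}) = ∑n 0 = 0.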

1 · Viktor Rehnberg · 6mo
Or a counterexample from the other direction would be that you can't describe a uniform distribution of the empty set either (I think). And that would feel even weirder to call "bigger".

So we have that

[...] Richard Jeffrey is often said to have defended a specific one, namely the ‘news value’ conception of benefit. It is true that news value is a type of value that unambiguously satisfies the desirability axioms.

but at the same time

News value tracks desirability but does not constitute it. Moreover, it does not always track it accurately. Sometimes getting the news that X tells us more than just that X is the case because of the conditions under which we get the news.

And I can see how starting from this you would get that . ... (read more)

Skimming the methodology, it seems to be a definite improvement and does tackle the shortcomings mentioned in the original post, to some degree at least.

Isn't that just a question of whether you assume expected utility or not? In the general case it is only utility, not expected utility, that matters.

1 · cubefox · 9mo
I'm not sure this is what you mean, but yes, in the case of acts, it is indeed so that only the utility of an action matters for our choice, not the expected utility, since we don't care about probabilities of, or assign probabilities to, possible actions when we choose among them, we just pick the action with the highest utility. But only some propositions describe acts. I can't choose (make true/certain) that the sun shines tomorrow, so the probability of the sun shining tomorrow matters, not just its utility. Now if the utility of the sun shining tomorrow is the maximum amount of money I would pay for the sun shining tomorrow, is that plausible? Assuming the utility of sunshine tomorrow is a fixed value x, wouldn't I pay less money if sunshine is very likely anyway, and more if sunshine is unlikely? On the other hand, I believe (but am uncertain) the utility of a proposition being true moves towards 0 as its probability rises. (Which would correctly predict that I pay less for sunshine when it is likely anyway.) But I notice I don't have a real understanding of why or in which sense this happens! Specifically, we know that tautologies have utility 0, but I don't even see how to prove that it follows that all propositions with probability 1 (even non-tautologies) have utility 0. Jeffrey says it as if it's obvious, but he doesn't actually give a proof. And then, more generally, it also isn't clear to me why the utility of a proposition would move towards 0 as its probability moves towards 1, if that's the case. I notice I'm still far from having a good level of understanding of (Jeffrey's) utility theory...
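For what it's worth, the step asked about here does seem to follow in one line from the desirability axiom together with U(⊤)=0, at least under the assumption that U(¬A) is defined and finite when P(¬A)=0: 0 = U(⊤) = U(A∨¬A) = (P(A)U(A)+P(¬A)U(¬A))/(P(A)+P(¬A)) = (1⋅U(A)+0⋅U(¬A))/1 = U(A). The delicate point is whether U(¬A) is defined at all when ¬A has probability 0.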

Anyway, someone should do a writeup of our findings, right? :)

Sure, I've found it to be an interesting framework to think in so I suppose someone else might too. You're the one who's done the heavy lifting so far so I'll let you have an executive role.

If you want me to write up a first draft I can probably do it end of next week. I'm a bit busy for at least the next few days.

1 · cubefox · 9mo
I think I will write a somewhat longer post as a full introduction to Jeffrey-style utility theory. But I'm still not quite sure on some things. For example, Bradley suggests that we can also interpret the utility of some proposition as the maximum amount of money we would pay (to God, say) to make it true. But I'm not sure whether that money would rather track expected utility (probability times utility) -- or not. Generally the interpretation of expected utility versus the interpretation of utility is not yet quite clear to me, yet. Have to think a bit more about it...

Lol. Somehow made it more clear that it was meant as a hyperbole than did.

You might want to consider cross-posting this to EA forum to reach a larger audience.

2 · Lao Mein · 9mo
Thanks for reminding me. I did, and it's under moderator review.

I've been thinking about Eliezer's take on the Second Law of Thermodynamics, and while I can't think of a succinct comment to drop with it, I think it could bring value to this discussion.

Well, I'd say that the difference between your expectations of the future, having lived a variant of it or not, is only in degree, not in kind. Therefore I think there are situations where the needs of the many can outweigh the needs of the one, even under uncertainty. But I understand that not everyone would agree.

I agree with P(A↔(s1∨s2∨…))=1 as a sufficient criterion to only sum over s∈S; the other steps I'll have to think about before I get them.


I found this newer paper https://personal.lse.ac.uk/bradleyr/pdf/Unification.pdf and, having skimmed it, it seemed like it had similar premises, but they defined U(A|B) (instead of deriving it).

2 · cubefox · 9mo
Thanks for the Bradley reference. He does indeed work in Jeffrey's framework. On conditional utility ("conditional desirability", in Jeffrey terminology) Bradley references another paper from 1999 [https://doi.org/10.1023/A:1004977019944] where he goes into a bit more detail on the motivation: (With DesXY he means U(X∧Y).) I also found a more recent (2017) book [https://www.cambridge.org/core/books/decision-theory-with-a-human-face/D3670FE43E561F415EB416675E1D5272] from him, where he defines U(A|B):=U(A∧B)−U(B) and where he uses the probability axioms, Jeffrey's desirability axiom, and U(⊤)=0 as axioms. So pretty much the same way we did here. So yeah, I think that settles conditional utility. In the book Bradley has also some other interesting discussions, such as this one: Anyway, someone should do a writeup of our findings, right? :)

GovAI is probably one of the densest places to find that. You could also check out FHI's AI Governance group.

There is no consensus about what constitutes a moral patient and I have seen nothing convincing to rule out that an AGI could be a moral patient.

However, when it comes to AGI some extreme measures are needed.

I'll try with an analogy. Suppose that you traveled back in time to Berlin in 1933. Hitler has yet to do anything significantly bad, but you still expect his actions to have some really bad consequences.

Now I guess that most wouldn't feel terribly conflicted about removing Hitler's right to privacy, or even his life, to prevent the Holocaust.

For a longtermist the ris... (read more)

2 · Paul Tiplady · 9mo
Thanks, this is what I was looking for: Mind Crime [https://www.lesswrong.com/tag/mind-crime]. As you suggested, S-Risks [https://www.lesswrong.com/tag/risks-of-astronomical-suffering-s-risks] links to some similar discussions too. I'd bite that bullet, with the information we have ex post. But I struggle to see many people getting on board with that ex ante, which is the position we'd actually be in.

Didn't you use that . I can see how to extend the derivation for more steps but only if . The sums

and

for arbitrary are equal if and only if .

The other alternative I see is if (and I'm unsure about this) we assume that and for .


What I would think that U(A|B) would mean is U(A) after we've updated probabilities and utilities from the fact that B is certain. I think that would be the first one but I'm not sure. I can't tell which one that would be.

1 · cubefox · 9mo
Yeah, you are right. I used the fact that A↔((A∧B)∨(A∧¬B)). This makes use of the fact that B and ¬B are both mutually exclusive and exhaustive, i.e. (B∧¬B)↔⊥ and (B∨¬B)↔⊤. For S={s1,s2}, where s1 and s2 are mutually exclusive but not exhaustive, A is not equivalent to (A∧s1)∨(A∧s2), since A can be true without either of s1 or s2 being true. It should however work if P(A↔(s1∨s2))=1, since then P((A∧s1)∨(A∧s2))=1. So for U(A)=∑s∈S(P(s|A)U(s∧A)) to hold, S would have to be a "partition" of A, exhaustively enumerating all the incompatible ways it can be true. -------------------------------------------------------------------------------- Regarding conditional utility, I agree. This would mean that U(A∧B)=U(A|B) if P(B)=1. I found an old paper [https://www.jstor.org/stable/41797540] by someone who analyzes conditional utility in detail, though with zero citations according to Google Scholar. Unfortunately the paper is hard to read because of its eccentric notation, and because the author, an economist, was apparently only aware of Savage's more complicated utility theory (which has acts, states of the world, and prospects), he doesn't work in Jeffrey's simpler and more general theory. But his conclusions seem intriguing, since he e.g. also says that U(A|A)=0, despite, as far as I know, Savage not having an axiom which demands utility 0 for certainty. Unfortunately I really don't understand his notation and I'm not quite an expert on Savage either...
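A minimal numeric sketch of this condition, under the assumption that on a finite outcome space the Jeffrey utility of an event is the probability-weighted average of atom utilities conditional on the event (which satisfies the desirability axiom); the atoms, probabilities and utilities are made up:

```python
# Toy model: four atomic outcomes with probabilities and atom utilities.
P = {'w1': 0.1, 'w2': 0.2, 'w3': 0.3, 'w4': 0.4}
u = {'w1': 5.0, 'w2': -1.0, 'w3': 2.0, 'w4': -2.0}

def prob(event):
    return sum(P[w] for w in event)

def U(event):
    # Jeffrey-style utility of an event: conditional expectation of the
    # atom utilities given the event.
    return sum(P[w] * u[w] for w in event) / prob(event)

def partition_sum(A, S):
    # sum over s in S of P(s|A) * U(s ∧ A), skipping empty intersections.
    total = 0.0
    for s in S:
        sA = A & s
        if sA:
            total += (prob(sA) / prob(A)) * U(sA)
    return total

A = {'w1', 'w2', 'w3'}
S_covers_A = [{'w1'}, {'w2', 'w3'}]   # mutually exclusive and exhausts A
S_partial  = [{'w1'}, {'w2'}]         # mutually exclusive but misses w3

print(round(U(A), 3), round(partition_sum(A, S_covers_A), 3))  # 1.5 1.5: identity holds
print(round(U(A), 3), round(partition_sum(A, S_partial), 3))   # 1.5 0.5: identity fails
```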

General S (even if mutually exclusive) is tricky; I'm not sure the expression is as nice then.

1 · cubefox · 10mo
But we have my result above, i.e. which does not rely on the assumption of ∑s∈S(P(s)U(s)) being equal to 0. After all, I only used the desirability axiom for the derivation, not the assumption U(⊤)=0. So we get a "nice" expression anyway as long as our disjunction is mutually exclusive. Right? (Maybe I misunderstood your point.) Regarding U(A|B), I am now no longer sure that U(A|B):=U(A∧B)−U(B) is the right definition. Maybe we instead have E[U(A|B)]:=E[U(A∧B)]−E[U(B)]. In which case it would follow that U(A|B):=(P(A∧B)U(A∧B)−P(B)U(B))/P(A|B). They are both compatible with U(A|A)=0, and I'm not sure which further plausible conditions would have to be met and which could decide which is the right definition.
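Spelling out the U(A|A)=0 check for both candidates (my own arithmetic, using the two definitions exactly as stated): under the first, U(A|A) = U(A∧A)−U(A) = U(A)−U(A) = 0; under the second, U(A|A) = (P(A∧A)U(A∧A)−P(A)U(A))/P(A|A) = 0/1 = 0. So the check indeed doesn't discriminate between them.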

∑ω∈Ω(P(ω)U(ω))=0: that was one of the premises, no? You expect 0 utility from your prior.

1 · cubefox · 10mo
Oh yes, of course! (I probably thought this was supposed to be valid for our S as well, which is assumed to be mutually exclusive, but, unlike Ω, not exhaustive.)

Some first reflections on the results before I go into examining all the steps.

Hmm, yes, my expression seems wrong when I look at it a second time. I think I still confused the timesteps and should have written U(a)=∑ω∈Ω(P(ω|a)U(ω∧a)−P(ω)U(ω)).

The extra negation comes from a reflex from when not using Jeffrey's decision theory. With Jeffrey's decision theory it reduces to your expression, as the negated terms sum to 0. But, still I probably should learn not to guess at theorems and properly do all steps in the future. I suppose that is a point in favor f... (read more)

1 · cubefox · 10mo
I don't understand what you mean in the beginning here, how is ∑ω∈Ω(P(ω|a)U(ω∧a)−P(ω)U(ω)) the same as ∑ω∈Ω(P(ω|a)U(ω∧a))?

Ah, those timestep subscripts are just what I was missing. I hadn't realised how much I needed that grounding until I noticed how good it felt when I saw them.

So to summarise (below, all sets have mutually exclusive members): in Jeffrey-ish notation we have the axiom

and normally you would want to indicate what distribution you have over in the left-hand side. However, we always renormalize such that the distribution is our current prior. We can indicate this by labeling the utilities from what timestep (and agent should probabl... (read more)

1 · cubefox · 10mo
Regarding the time stamp: Yeah, this is the right way to think about it, at least in the case of subjective utility theory, where utilities represent desires and probabilities represent beliefs, and it is also the right way to think about it for Bayesianism (subjective probability theory). U and P only represent the subjective state of an agent at a particular point in time. They don't say anything about how they should be changed over time. They only say that at any point in time, these functions (the agents) should satisfy the axioms. Rules for change over time would need separate assumptions. In Bayesian probability theory this is usually the rule of classical conditionalization or the more general rule of Jeffrey conditionalization. (Bayes' theorem alone doesn't say anything about updating. Bayes' rule = classical conditionalization + Bayes' theorem.) Regarding the utility of a, you write the probability part in the sum is P(ω|a)−P(ω). But it is actually just P(ω|a)! To see this, start with the desirability axiom: U(A∨B)=(P(A)U(A)+P(B)U(B))/(P(A)+P(B)). This doesn't tell us how to calculate U(A), only U(A∨B). But we can write A as the logically equivalent (A∧B)∨(A∧¬B). This is a disjunction, so we can apply the desirability axiom: U(A)=U((A∧B)∨(A∧¬B))=(P(A∧B)U(A∧B)+P(A∧¬B)U(A∧¬B))/(P(A∧B)+P(A∧¬B)). This is equal to U(A)=(P(A∧B)U(A∧B)+P(A∧¬B)U(A∧¬B))/P(A). Since P(A∧B)/P(A)=P(B|A), we have U(A)=P(B|A)U(A∧B)+P(¬B|A)U(A∧¬B). Since A was chosen arbitrarily, it can be any proposition whatsoever. And since in Jeffrey's framework we only consider propositions, all actions are also described by propositions, presumably of the form "I now do x". Hence, U(a)=P(B|a)U(a∧B)+P(¬B|a)U(a∧¬B) for any B. This proof could also be extended to longer disjunctions between mutually exclusive propositions apart from B and ¬B. Hence, for a set S of mutually exclusive propositions s, U(a)=∑s∈S P(s|a)U(a∧s). The set Ω, the "set of all outcomes", is a special case of S where the mutually exclusive elements ω of Ω
1 · cubefox · 10mo
Interesting! I have a few remarks, but my reply will have to wait a few days as I have to finish something.

Well, deciding to do action a would also make its utility 0 (edit: or close enough considering remaining uncertainties) even before it is done. At least if you're committed to the action, and then you could just as well consider the decision to be the same as the action.

It would mean that a "perfect" utility maximizer always does the action with utility 0 (edit: but the decision can have positive utility(?)). Which isn't a problem in any way except that it is alien to how I usually think about utility.

Put in another way. While I'm thinking about which possib... (read more)

2 · cubefox · 10mo
The way I think about it: The utility maximizer looks for the available action with the highest utility and only then decides to do that action. A decision is the event of setting the probability of the action to 1, and, because of that, its utility to 0. It's not that an agent decides for an action (sets it to probability 1) because it has utility 0. That would be backwards. There seems to be some temporal dimension involved, some "updating" of utilities. Similar to how assuming the principle of conditionalization Pt2(H)=Pt1(H|E) formalizes classical Bayesian updating when something is observed. It sets Pt2(H) to a new value, and (or because?) it sets Pt2(E) to 1. A rule for utility updating over time, on the other hand, would need to update both probabilities and utilities, and I'm not sure how it would have to be formalized.

Oh, I think I see what confuses me. In the subjective utility framework the expected utilities are shifted to 0 after each Bayesian update?

So then utility of doing action a to prevent a Doom is (P(Doom|a)−P(Doom))U(Doom)+(P(¬Doom|a)−P(¬Doom))U(¬Doom). But when action a has been done then the utility scale is shifted again.

1 · cubefox · 10mo
I'm not perfectly sure what the connection with Bayesian updates is here. In general it is provable from the desirability axiom that U(a)=P(Doom|a)U(Doom∧a)+P(¬Doom|a)U(¬Doom∧a). This is because any A (e.g. a) is logically equivalent to (A∧B)∨(A∧¬B) for any B (e.g. Doom), which also leads to the "law of total probability". Then we have a disjunction which we can use with the desirability axiom. The denominator cancels out and gives us P(Doom|a) in the numerator instead of P(Doom∧a), which is very convenient because we presumably don't know the prior probability of an action P(a). After all, we want to figure out whether we should do a (= make P(a)=1) by calculating U(a) first. It is also interesting to note that a utility maximizer (an instrumentally rational agent) indeed chooses the actions with the highest utility, not the actions with the highest expected utility, as is sometimes claimed. Yes, after you do an action you become certain you have done it; its probability becomes 1 and its utility 0. But I don't see that as counterintuitive, since "Doing it again", or "continuing to do it", would be a different action which does not have utility 0. Is that what you meant?
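A worked example with made-up numbers, just to see the formula and the "pick the highest-utility action" point in action: suppose there are two candidate actions with P(Doom|a)=0.1 and P(Doom|a′)=0.4, and with U(Doom∧⋅)=−100 and U(¬Doom∧⋅)=10 for both. Then U(a) = 0.1⋅(−100) + 0.9⋅10 = −1 and U(a′) = 0.4⋅(−100) + 0.6⋅10 = −34, so the utility maximizer does a.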

Ok, so this is a lot to take in, but I'll give you my first takes as a start.

My only disagreement prior to your previous comment seems to be in the legibility of the desirability axiom, which I think should contain some reference to the actual probabilities of A and B.

Now, I gather that this disagreement probably originates from the fact that I defined while in your framework .

Something that appears problematic to me is if we consider the tautology (in Jeffrey notation) A∨¬A. This would mea... (read more)

1 · Viktor Rehnberg · 10mo
Oh, I think I see what confuses me. In the subjective utility framework the expected utilities are shifted to 0 after each Bayesian update? So then utility of doing action a to prevent a Doom is (P(Doom|a)−P(Doom))U(Doom)+(P(¬Doom|a)−P(¬Doom))U(¬Doom). But when action a has been done then the utility scale is shifted again.

What I found confusing with U(A∨¬A) was that to me this reads as E[U(A∨¬A)], which should always(?) depend on P(A), but with this notation it is hidden to me. (Here I picked ¬A as the mutually exclusive event B, but I don't think it should remove much from the point).

That is also why I want some way of expressing that in the notation. I could imagine writing as that is the cleanest way I can come up with to satisfy both of us. Then with expected utility .

When we accept the expected utility hypothesis then we can always write it as a e... (read more)

2 · cubefox · 10mo
Well, the "expected value" of something is just the value multiplied by its probability. It follows that, if the thing in question has probability 1, its value is equal to the expected value. Since A∨¬A is a tautology, it is clear that E[U(A∨¬A)]=P(A∨¬A)U(A∨¬A)=U(A∨¬A). Yes, this fact is independent of P(A), but this shouldn't be surprising I think. After all, we are talking about the utility of a tautology here, not about the utility of A itself! In general, P(A∨B) is usually not 1 (A and B are only presumed to be mutually exclusive, not necessarily exhaustive), so its utility and expected utility can diverge. In fact, in his book "The Logic of Decision" Richard Jeffrey proposed for his utility theory that the utility of any tautology is zero: U(⊤)=0. This should make sense, since learning a tautology has no value for us, neither positive not negative. This assumption also has other interesting consequences. Consider his "desirability axiom", which he adds to the usual axioms of probability to obtain his utility theory: If A and B are mutually exclusive, then U(A∨B)=P(A)U(A)+P(B)U(B)P(A)+P(B). (Alternatively, this axiom is provable from the expected utility hypothesis I posted a few days ago, by dividing both sides of the equation by P(A∨B)=P(A)+P(B).) If we combine this axiom with the assumption U(⊤)=0 (tautologies have utility zero), it is provable that if P(A)=1 then U(A)=0. Jeffrey explains this as follows: Interpreting utility subjectively as degree of desire, we can only desire things we don't have, or more precisely, things we are not certain are true. If something is certain, the desire for it is already satisfied, for better or for worse. Another way to look at it is that the "news value" of a certain proposition is zero. If the utility of a proposition is how good or bad it would be if we learned that it is true, then learning a certain proposition doesn't have any value, positive or negative, since we knew it all along. So it should be assigned the v

Hmm, I usually don't think too deeply about the theory so I had to refresh some things to answer this.

First off, the expected utility hypothesis is apparently implied by the VNM axioms, so that is not something that needs to be added on. To be honest I usually only think of a coherent preference ordering and expected utilities as two separate things and hadn't realized that VNM combines them.

About notation, with U(A) I mean the utility of getting A with certainty and with U(pA) I mean the utility of getting A with probability p. If you don't have the expected utility h... (read more)

1 · cubefox · 10mo
Ah, thanks. I still find this strange, since in your case A and ω are events, which can be assigned specific probabilities and utilities, while X is apparently a random variable. A random variable is, as far as I understand, basically a set of mutually exclusive and exhaustive events. E.g. X = The weather tomorrow = {good, neutral, bad}. Each of those events can be assigned a probability (and they must sum to 1, since they are mutually exclusive and exhaustive) and a utility. So it seems it doesn't make sense to assign X itself a utility (or a probability). But I might be just confused here... Edit: It would make more sense, and in fact agree with the formula I posted in my last comment, if a random variable X would correspond to an event that is the disjunction of its possible values. E.g. X = weather will be good or neutral or bad. In which case the probability of a random variable will always be 1, such that the expected utility of the disjunction is just its utility, and my formula above is identical to yours.

Having read some of your other comments, I expect you to ask if the top preference of a thermostat is its goal temperature. And to this I have no good answer.

For things like a thermostat and a toy robot you can obviously see that there is a behavioral objective which we could use to infer preferences. But is the reason that thermostats are not included in utility calculations that the behavioral objective does not actually map to a preference ordering, or that their weight when aggregated is 0?

Perhaps most don't have this in the back of their mind when they think of utility. But for me this is what I'm thinking about. The aggregation is still confusing to me, but as a simple case example: if I want to maximise total utility and am in a situation that only impacts a single entity, then increasing utility is the same to me as getting this entity into states that are more preferable for them.

2 · Viktor Rehnberg · 10mo
Having read some of your other comments, I expect you to ask if the top preference of a thermostat is its goal temperature. And to this I have no good answer. For things like a thermostat and a toy robot you can obviously see that there is a behavioral objective which we could use to infer preferences. But is the reason that thermostats are not included in utility calculations that the behavioral objective does not actually map to a preference ordering, or that their weight when aggregated is 0?

The expected utility hypothesis is that U(pA)=pU(A). To make it more concrete, suppose that for p=1 outcome A is worth x for you. Then getting A with probability p is worth px. This is not necessarily true, there could be an entity that prefers outcomes comparatively more if they are probable/improbable. The name comes from the fact that if you assume it to be true you can simply take expectations of utils and be fine. I find it very agreeable for me.
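A minimal sketch of the kind of entity mentioned here; the specific "longshot" weighting (an exponent on the probability) is made up purely for illustration:

```python
def eu_value(p, utility_of_A):
    """Expected-utility-hypothesis valuation: U(pA) = p * U(A)."""
    return p * utility_of_A

def longshot_value(p, utility_of_A, gamma=2.0):
    """A hypothetical risk-seeking agent that overweights improbable wins,
    valuing 'A with probability p' at p**(1/gamma) * U(A)."""
    return p ** (1.0 / gamma) * utility_of_A

u_A = 100.0
for p in (0.01, 0.25, 1.0):
    print(p, eu_value(p, u_A), round(longshot_value(p, u_A), 1))
# At p = 0.01 the longshot agent assigns 10.0 instead of 1.0: it prefers
# improbable outcomes comparatively more, so U(pA) = pU(A) fails for it.
```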

2 · cubefox · 10mo
I'm probably missing something here, but how is U(pA) a defined expression? I thought U takes as inputs events or outcomes or something like that, not a real number like something which could be multiplied with p? It seems you treat A not as an event but as some kind of number? (I get pU(A) of course, since U returns a real number.) The thing I would have associated with "expected utility hypothesis": If A and B are mutually exclusive, then E[U(A∨B)]=P(A∨B)U(A∨B)=P(A)U(A)+P(B)U(B).

You could perhaps argue that "preference" is a human concept. You could extend it with something like coherent extrapolated volition to be what the entity would prefer if it knew all that was relevant, had all the time needed to think about it and was more coherent. But, in the end if something has no preference, then it would be best to leave it out of the aggregation.

Utility when it comes to a single entity is simply about preferences.

The entity should have

  1. For any two outcomes/states of the world the entity should prefer one over the other or consider them equally preferable
  2. The entity should be coherent in its preferences such that if it prefers A to B and B to C, then the entity prefers A to C (a small checker for 1 and 2 is sketched after this list)
  3. When it comes to probabilities, if the entity prefers A to B then the entity prefers A with probability p to B with probability p, all else equal. Furthermore, there exists a probability p such that and is equally p
... (read more)
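A minimal sketch checking conditions 1 and 2 on a made-up finite preference table (the outcome names and the preferences themselves are hypothetical):

```python
from itertools import combinations, permutations

# Hypothetical preference data: prefer[(x, y)] is True if x is strictly
# preferred to y, False otherwise.
outcomes = ['sunny', 'cloudy', 'rainy']
prefer = {
    ('sunny', 'cloudy'): True,  ('cloudy', 'sunny'): False,
    ('cloudy', 'rainy'): True,  ('rainy', 'cloudy'): False,
    ('sunny', 'rainy'): True,   ('rainy', 'sunny'): False,
}

def weakly_prefers(x, y):
    return x == y or prefer[(x, y)]

# Condition 1 (completeness): for every pair, at least one direction holds.
complete = all(weakly_prefers(x, y) or weakly_prefers(y, x)
               for x, y in combinations(outcomes, 2))

# Condition 2 (transitivity): if x >= y and y >= z then x >= z.
transitive = all(not (weakly_prefers(x, y) and weakly_prefers(y, z))
                 or weakly_prefers(x, z)
                 for x, y, z in permutations(outcomes, 3))

print(complete, transitive)  # True True for this toy ordering
```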
1 · cubefox · 10mo
Could you explain the "expected utility hypothesis"? Where does this formula come from? Very intriguing!
1 · SurvivalBias · 10mo
So utility theory is a useful tool, but as far as I understand it's not directly used as a source of moral guidance (although I assume once you have some other source you can use utility theory to maximize it). Whereas utilitarianism as a metaethics school is concerned exactly with that, and you can hear people in EA talking about "maximizing utility" as the end in and of itself all the time. It was in this latter sense that I was asking.
2 · Viktor Rehnberg · 10mo
You could perhaps argue that "preference" is a human concept. You could extend it with something like coherent extrapolated volition [https://arbital.com/p/cev/] to be what the entity would prefer if it knew all that was relevant, had all the time needed to think about it and was more coherent. But, in the end if something has no preference, then it would be best to leave it out of the aggregation.

Could someone who disagrees with the above statement help me by clarifying what the disagreement is?

Seeing as it has -7 on the agreement vote, I'd think the disagreement should be obvious, but it isn't to me.

Due to this, he concludes the cause area is one of the most important LT problems and primarily advises focusing on other risks due to neglectedness.

This sentence is confusing me, should I read it as:

  1. Due to this, he concludes the cause area is one of the most important LT problems but primarily advises focusing on other risks anyway due to neglectedness.
  2. Due to this, he concludes the cause area is not one of the most important LT problems and primarily advises focusing on other risks due to neglectedness.

From this summary of the summary I get the th... (read more)

3 · Zoe Williams · 1y
Good point, thank you - I've had a re-read of the conclusion and replaced the sentence with "Due to this, he concludes that climate change is still an important LT area - though not as important as some other global catastrophic risks (eg. biorisk), which outsize on both neglectedness and scale." Originally I think I'd mistaken his position a bit based on this sentence: "Overall, because other global catastrophic risks are so much more neglected than climate change, I think they are more pressing to work on, on the margin." (and in addition I hadn't used the clearest phrasing). But the wider conclusion fits the new sentence better.

I agree with 1 (but then it is called alignment forum, not the more general AI Safety forum). But I don't see that 2 would do much good.

All narratives I can think of where 2 plays a significant part sound like strawmen to me; perhaps you could help me?

1 · Koen.Holtman · 1y
Not sure what makes you think 'strawmen' at 2, but I can try to unpack this more for you. Many warnings about unaligned AI start with the observation that it is a very bad idea to put some naively constructed reward function, like 'maximize paper clip production', into a sufficiently powerful AI. Nowadays on this forum, this is often called the 'outer alignment' [https://www.lesswrong.com/tag/outer-alignment] problem. If you are truly worried about this problem and its impact on human survival, then it follows that you should be interested in doing the Hard Thing of helping people all over the world write less naively constructed reward functions to put into their future AIs. John writes: This pattern of outsourcing the Hard Part to the AI is definitely on display when it comes to 2 above. Academic AI/ML research also tends to ignore this Hard Part entirely, and implicitly outsources it to applied AI researchers, or even to the end users.

I suppose I would just like to see more people start at an earlier level and from that vantage point you might actually want to switch to a path with easier parts.

There’s something very interesting in this graph. The three groups have completely converged by the end of the 180 day period, but the average bank balance is now considerably higher.

Weren't the groups selected for having currently low income? Shouldn't we expect some regression towards the mean, i.e. an increase in average bank balance? Was there any indication of whether the observed effect was larger or smaller than expected?
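A toy simulation of the selection effect being asked about (my own sketch; the balance and noise parameters are invented, and the model contains no treatment effect at all):

```python
import random

random.seed(0)

# Each person has a stable 'typical' balance plus month-to-month noise.
n = 100_000
typical = [random.gauss(1000, 300) for _ in range(n)]
at_enrollment = [t + random.gauss(0, 400) for t in typical]
six_months_later = [t + random.gauss(0, 400) for t in typical]

# Select the people who look poorest at enrollment.
cutoff = sorted(at_enrollment)[n // 10]
selected = [i for i in range(n) if at_enrollment[i] <= cutoff]

before = sum(at_enrollment[i] for i in selected) / len(selected)
after = sum(six_months_later[i] for i in selected) / len(selected)
print(round(before), round(after))
# The later average is noticeably higher purely from regression to the
# mean, even though nothing was done to anyone.
```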

2 · ADifferentAnonymous · 1y
Note that after day 120 or so, all three groups' balances decline together. Not sure what that's about.

Tackle the [Hamming Problems](https://www.lesswrong.com/posts/Thwfy4gNFx9kHgvov/research-hamming-questions), Don't Avoid Them

I agree with that statement and this statement

Far and away the most common failure mode among self-identifying alignment researchers is to look for Clever Ways To Avoid Doing Hard Things [...]

seems true as well. However, there was something in this section that didn't seem quite right to me.

Say that you have identified the Hamming Problem at the lowest resolution to be getting the outcome "AI doesn't cause extinction or worse". However, if ... (read more)

2 · johnswentworth · 1y
This is exactly right, and those are the things which I would call Hamming Problems or the Hard Parts.

I don't think an "actual distribution" over the activations is a thing? The distribution depends on what inputs you feed it.

This seems to be what Thomas is saying as well, no?

[...] look at the network activations at each layer for a bunch of different inputs. This gives you a bunch of activations sampled from the distribution of activations. From there, you can do density estimation to estimate the actual distribution over the activations.

In the same way that you can talk about the actual training distribution underlying the samples in the training set, it should b... (read more)
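If it helps to make the quoted procedure concrete, here is a minimal sketch; the "layer" is a made-up random ReLU map, and the kernel density estimate over a 2-D slice stands in for whatever density estimator you would actually use on real recorded activations:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Stand-in "layer": a random linear map plus ReLU. With a real network
# you would instead record the activations of the layer you care about.
W = rng.normal(size=(10, 4))
def layer_activations(x):
    return np.maximum(W @ x, 0.0)

# Feed a bunch of inputs and collect the resulting activation samples.
inputs = rng.normal(size=(5000, 4))
acts = np.array([layer_activations(x) for x in inputs])  # shape (5000, 10)

# Density estimation over (a 2-D slice of) the sampled activations:
# the "estimate the distribution over activations" step.
kde = gaussian_kde(acts[:, :2].T)
print(kde.logpdf(np.array([[0.5], [0.5]])))  # estimated log-density at (0.5, 0.5)
```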

1 · Lucius Bushnaq · 1y
Thanks for clarifying for me, see the edit in the parent comment.

It seems like it could be worthwhile for you to contact someone in connection with the AGI Safety Communications Initiative. Or at the very least check out the post I linked.

3 · Darren McKee · 1y
Yes, thank you. I shall.  I should probably also cross-post to the EA Forum.

Others that I find worth mentioning are channels for opportunities to get started in AI Safety. I know both AGI Safety Fundamentals and AI Safety Camp have Slack channels for participants. Invitation needed, and you probably need to be a participant to get invited.

There is also an 80,000 Hours Google group for technical AI safety. Invitation is needed; I can't find that they've broadcast how to get in, so I won't share it. But they mention it on their website, so I guess it is okay to include it here.

I've also heard about research groups in AI safety havi... (read more)

To me it seems that the disagreement around this question comes from thinking of different questions.

Has DL produced a significant amount of economic value?

Yes, and I think this has been quite established already. It is still possible to argue about what is meant by significant, but I think that disagreement is probably better resolved by asking a different question.

(I can imagine that many of these examples technically exist, but not at the level that I mean).

From this and some comments, I think there is a confusion that would be better resolved by asking:

Why doesn't DL in consumer products seem as amazing as what I see from presentations of research results?

I have not seen anyone do something like this but it sounds like something Anders Sandberg (FHI) would do. If you want a lead or want to find someone that might be interested in researching it, he might be it.

I haven't followed your arguments all the way here but I saw the comment

If I am understanding correctly, you are saying if the sleeping beauty problem does not use a coin toss, but measures the spin of an election instead, then the answer would be different.

and would just jump in and say that others have made similar arguments. The one written example I've seen is this Master's Thesis

I'm not sure if I'm convinced, but at least I buy that, depending on how the particular selection goes about, there can be instances where the difference between probabilitie... (read more)

1 · dadadarren · 1y
The link points back to this post. But I also remember reading similar arguments from halfers before, that the answer changes depending on if it is true quantum randomness; could not remember the source though. But the problem remains the same: can Halfers keep the probability of a coin yet to be tossed at 1/2, and remain Bayesian? Michael Titelbaum showed it cannot be true as long as the probability of "Today is Tuesday" is valid and non-zero. If Lewisian Halfers argue that, unlike true quantum randomness, a coin yet to be tossed can have a probability differing from half, such that they can endorse self-locating probability and remain Bayesian, then the question can simply be changed to using quantum measurements (or a quantum coin for ease of expression). Then Lewisian Halfers face the counter-argument again: either the probability is 1/2 at waking up and remains at 1/2 after learning it is Monday, therefore non-Bayesian; or the probability is indeed 1/3 and updates to 1/2 after learning it is Monday, therefore non-halving. The latter effectively says SSA is only correct in non-quantum events and SIA is correct only for quantum events. But differentiating the cases between quantum and non-quantum events is no easy job. A detailed analysis of a simple coin toss result can lead to many independent physical causes, which can very well depend on quantum randomness. What shall we do in these cases? It is a very assumption-heavy argument for an initially simple Halfer answer. Edit: Just gave the linked thesis a quick read. The writer seems to be partial to MWI and thinks it gives a more logical explanation to anthropic questions. He is not keen on the notion of treating probability/chance as that a randomly possible world becomes actualized, but considers all possible worlds ARE real (many-worlds), that the source of probability (or "the illusion of probability" as the writer says) is from which branch-world "I" am in. My problem with that is the "I" in such statement

An area where I think there is an important difference between doing explicit search and optimisation through piles of heuristics is in clustering NNs à la Filan et al. (link TBD).

A use case I've been thinking about is to use that kind of technique to help identify mesa-optimisation, or more particularly mesa-objectives (with the help of interpretability tools guided by the clustering of the NN).

In the case of explicit search I would expect that it would be more common than not to be able to find a specific part of the network evaluating world states in terms o... (read more)

Do you have a poster that can be put up on campuses to spread the information?

3 · Aris · 1y
 Yes! Apologies, none of my links carried over to this post, so I'll edit them in now. The poster will be linked in the final notes section. 

Oh, I hadn't noticed that. I've got some connections to them and can reach out.

Olle Häggström had three two-hour lectures on AI Safety earlier this spring. The original description was:

This six-hour lecture series will treat basics and recent developments in AI risk and long-term AI safety. The lectures are meant to be of interest to Ph.D. students and researchers in AI-related fields, but no particular prerequisites will be assumed.

Lecture 1, Lecture 2 and Lecture 3. Perhaps you can find something there; I expect he would be happy to help if you reach out to him.

3 · [anonymous] · 1y
Any chance you have contact with the people who uploaded that? I suspect the reason I hadn't seen it is that it is marked as being for kids. Because of that I can't add it to a playlist. I'm also going to attempt to contact them directly about this.
1 · Aryeh Englander · 1y
Thanks, looks useful!