Hmm, yeah, it's a bit hard to try stuff when there's no good preview. Usually I'd recommend a rot13 cipher if all else fails, but for number sequences that makes less sense.
I knew about the 2-4-6 problem from HPMOR, so I really liked the opportunity to try it out myself. These are my results on the four other problems:
Number of guesses:
8, of which 3 were valid and 5 non-valid
Guess:
"A sequence of integers whose sum is non-negative"
Result: Failure
Number of guesses:
39, of which 23 were valid and 16 non-valid
Guess:
"Three ordered real numbers where the absolute difference between neighbouring numbers is decreasing."
Result: Success
Number of guesses:
21, of which 15 were valid and 6 non-valid
Guess...
See the FAQ for spoiler tags; it seems the mods haven't seen your request. https://www.lesswrong.com/faq#How_do_I_insert_spoiler_protections_
These problems seem to me similar to the problems at the International Physicists' Tournament. If you want more problems, check out https://iptnet.info
In case anyone else is looking for a source, a good search term is probably the Beal Effect. From the original paper by Beal and Smith:
Once the effect is pointed out, it does not take long to arrive at the conclusion that it arises from a natural correlation between a high branching factor in the game tree and having a winning move available. In other words, mobility (in the sense of having many moves available) is associated with better positions
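To illustrate the quoted point, here is a minimal simulation sketch (my own toy model, not from the paper) of how a random static evaluation makes positions with more legal moves look better:

```python
import random

# Minimal sketch (not from the paper): under a random static evaluation,
# a position's one-ply lookahead value is the max over its moves of random
# leaf values, so positions with more legal moves tend to score higher.
def one_ply_value(num_moves: int) -> float:
    return max(random.random() for _ in range(num_moves))

def average_value(num_moves: int, trials: int = 10_000) -> float:
    return sum(one_ply_value(num_moves) for _ in range(trials)) / trials

if __name__ == "__main__":
    for branching_factor in (2, 5, 10, 20):
        print(branching_factor, round(average_value(branching_factor), 3))
    # The average value increases with the branching factor, i.e. mobility
    # correlates with apparently better positions.
```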
Or a counterexample from the other direction would be that you can't describe a uniform distribution over the empty set either (I think). And that would feel even weirder to call "bigger".
Why would this property mean that it is "bigger"? You can construct a uniform distribution over an uncountable set through a probability density as well. However, using the same measure on a countably infinite subset of the uncountable set would show that the countable set has measure 0.
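To spell that out with the uniform distribution on $[0,1]$ as the uncountable set (standard measure theory, just for reference):

```latex
% Uniform distribution on [0,1]: density f(x) = 1 on [0,1].
% For any countably infinite subset Q = \{q_1, q_2, \dots\} of [0,1]:
\mu(Q) \;=\; \sum_{i=1}^{\infty} \mu(\{q_i\}) \;=\; \sum_{i=1}^{\infty} 0 \;=\; 0 .
```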
So we have that
[...] Richard Jeffrey is often said to have defended a specific one, namely the ‘news value’ conception of benefit. It is true that news value is a type of value that unambiguously satisfies the desirability axioms.
but at the same time
News value tracks desirability but does not constitute it. Moreover, it does not always track it accurately. Sometimes getting the news that X tells us more than just that X is the case because of the conditions under which we get the news.
And I can see how starting from this you would get that . ...
Skimming the methodology, it seems to be a definite improvement and does tackle the shortcomings mentioned in the original post, at least to some degree.
Isn't that just a question of whether you assume expected utility or not? In the general case it is only utility, not expected utility, that matters.
Anyway, someone should do a writeup of our findings, right? :)
Sure, I've found it to be an interesting framework to think in so I suppose someone else might too. You're the one who's done the heavy lifting so far so I'll let you have an executive role.
If you want me to write up a first draft I can probably do it end of next week. I'm a bit busy for at least the next few days.
I've been thinking about Eliezer's take on the Second Law of Thermodynamics, and while I can't think of a succinct comment to drop with it, I think it could bring value to this discussion.
Well, I'd say that the difference between your expectations of the future, having lived a variant of it or not, is only one of degree, not of kind. Therefore I think there are situations where the needs of the many can outweigh the needs of the one, even under uncertainty. But I understand that not everyone would agree.
I agree with  as a sufficient criterion to only sum over ; the other steps I'll have to think about before I get them.
I found this newer paper https://personal.lse.ac.uk/bradleyr/pdf/Unification.pdf and, having skimmed it, it seems like it has similar premises, but they defined it (instead of deriving it).
GovAI is probably one of the densest places to find that. You could also check out FHI's AI Governance group.
There is no consensus about what constitutes a moral patient and I have seen nothing convincing to rule out that an AGI could be a moral patient.
However, when it comes to AGI some extreme measures are needed.
I'll try with an analogy. Suppose that you traveled back in time to Berlin, 1933. Hitler has yet to do anything significantly bad, but you still expect his actions to have some really bad consequences.
Now, I guess that most wouldn't feel terribly conflicted about removing Hitler's right to privacy, or even his life, to prevent the Holocaust.
For a longtermist the ris...
Didn't you use that ? I can see how to extend the derivation to more steps, but only if . The sums
and
for arbitrary are equal if and only if .
The other alternative I see is if (and I'm unsure about this) we assume that and for .
What I would think that would mean is  after we've updated probabilities and utilities from the fact that  is certain. I think it would be the first one, but I'm not sure; I can't tell which one it would be.
Some first reflections on the results before I go into examining all the steps.
Hmm, yes my expression seems wrong when I look at it a second time. I think I still confused the timesteps and should have written
The extra negation comes from a reflex from when I'm not using Jeffrey's decision theory. With Jeffrey's decision theory it reduces to your expression, as the negated terms sum to . But still, I should probably learn not to guess at theorems and properly do all the steps in the future. I suppose that is a point in favor f...
Ah, those timestep subscripts are just what I was missing. I hadn't realised how much I needed that grounding until I noticed how good it felt when I saw them.
So to summarise (below, all sets have mutually exclusive members): in Jeffrey-ish notation we have the axiom
and normally you would want to indicate what distribution you have over  on the left-hand side. However, we always renormalize such that the distribution is our current prior. We can indicate this by labeling the utilities with what timestep (and agent should probabl...
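For reference, Jeffrey's desirability axiom for mutually exclusive A and B is standardly written as (I assume this is the form intended above):

```latex
U(A \lor B) \;=\; \frac{P(A)\,U(A) + P(B)\,U(B)}{P(A) + P(B)}, \qquad A \land B = \bot .
```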
Well, deciding to do action  would also make its utility 0 (edit: or close enough, considering remaining uncertainties) even before it is done. At least if you're committed to the action, in which case you could just as well consider the decision to be the same as the action.
It would mean that a "perfect" utility maximizer always does the action with utility  (edit: but the decision can have positive utility(?)), which isn't a problem in any way except that it is alien to how I usually think about utility.
Put in another way. While I'm thinking about which possib...
Oh, I think I see what confuses me. In the subjective utility framework the expected utilities are shifted to after each Bayesian update?
So then the utility of doing action  to prevent a Doom is . But when the action has been done, the utility scale is shifted again.
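To make the shift concrete, here is a toy renormalisation with made-up numbers (assuming the convention that the tautology always has utility 0):

```latex
% Before acting: P(\text{Doom}) = 0.5 with U(\text{Doom}) = -10, otherwise Safe with U(\text{Safe}) = 0.
U(\top) = 0.5 \cdot (-10) + 0.5 \cdot 0 = -5
% Renormalising so that U(\top) = 0 shifts everything up by 5:
U(\text{Doom}) = -5, \qquad U(\text{Safe}) = +5
% After the preventive action succeeds and Safe is certain, we renormalise again:
U(\top) = U(\text{Safe}) = 0
```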
Ok, so this is a lot to take in, but I'll give you my first takes as a start.
My only disagreement prior to your previous comment seems to be about the legibility of the desirability axiom, which I think should contain some reference to the actual probabilities of  and .
Now, I gather that this disagreement probably originates from the fact that I defined while in your framework .
Something that appears problematic to me is if we consider the tautology (in Jeffrey notation) . This would mea...
What I found confusing with  was that to me this reads as , which should always(?) depend on , but with this notation it is hidden from me. (Here I picked  as the mutually exclusive event , but I don't think that should detract much from the point.)
That is also why I want some way of expressing that in the notation. I could imagine writing as that is the cleanest way I can come up with to satisfy both of us. Then with expected utility .
When we accept the expected utility hypothesis then we can always write it as a e...
Hmm, I usually don't think too deeply about the theory, so I had to refresh some things to answer this.
First off, the expected utility hypothesis is apparently implied by the VNM axioms, so that is not something that needs to be added on. To be honest, I usually only think of a coherent preference ordering and expected utilities as two separate things and hadn't realized that VNM combines them.
About notation, with I mean the utility of getting with certainty and with I mean the utility of getting with probability . If you don't have the expected utility h...
Having read some of your other comments, I expect you to ask whether the top preference of a thermostat is its goal temperature. And to this I have no good answer.
For things like a thermostat or a toy robot you can obviously see that there is a behavioral objective which we could use to infer preferences. But is the reason that thermostats are not included in utility calculations that the behavioral objective does not actually map to a preference ordering, or that their weight when aggregated is 0?
Perhaps most don't have this in the back of their minds when they think of utility, but for me this is what I'm thinking about. The aggregation is still confusing to me, but as a simple example: if I want to maximise total utility and am in a situation that only impacts a single entity, then increasing utility is the same to me as getting this entity into states that are more preferable for them.
The expected utility hypothesis is that . To make it more concrete, suppose that outcome  is worth  for you. Then getting  with probability  is worth . This is not necessarily true; there could be an entity that prefers outcomes comparatively more if they are probable/improbable. The name comes from the fact that if you assume it to be true you can simply take expectations of utils and be fine. I find it very agreeable myself.
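As a worked toy example (my own numbers):

```latex
% Assuming the expected utility hypothesis:
U(X \text{ with probability } p) \;=\; p \cdot U(X)
% e.g. with U(X) = 10 utils and p = 0.3:
U(X \text{ with probability } 0.3) \;=\; 0.3 \cdot 10 \;=\; 3 \text{ utils}
```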
You could perhaps argue that "preference" is a human concept. You could extend it with something like coherent extrapolated volition to be what the entity would prefer if it knew all that was relevant, had all the time needed to think about it and was more coherent. But, in the end if something has no preference, then it would be best to leave it out of the aggregation.
Utility when it comes to a single entity is simply about preferences.
The entity should have
Could someone who disagrees with the above statement help me by clarifying what the disagreement is?
Seeing as it has -7 on the agreement vote, it makes me think it should be obvious, but it isn't to me.
Due to this, he concludes the cause area is one of the most important LT problems and primarily advises focusing on other risks due to neglectedness.
This sentence is confusing me. Should I read it as:
From this summary of the summary I get the th...
I agree with 1 (but then, it is called the Alignment Forum, not the more general AI Safety forum). But I don't see that 2 would do much good.
All narratives I can think of where 2 plays a significant part sound like strawmen to me; perhaps you could help me?
I suppose I would just like to see more people start at an earlier level, and from that vantage point you might actually want to switch to a path with easier parts.
There’s something very interesting in this graph. The three groups have completely converged by the end of the 180 day period, but the average bank balance is now considerably higher.
Weren't the groups selected for having currently low income? Shouldn't we expect some regression towards the mean, i.e. an increase in average bank balance? Was there any indication of whether the observed effect was larger or smaller than expected?
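As a minimal sketch of the selection effect (made-up numbers, nothing to do with the actual study's data):

```python
import random

# Toy model (made-up numbers, not the study's data): each person's balance is a
# stable component plus independent month-to-month noise. Selecting people whose
# balance is currently low over-samples negative noise, so the same people are
# expected to look better later even with no intervention at all.
def simulate(n_people: int = 100_000, cutoff: float = -1.0) -> None:
    stable = [random.gauss(0, 1) for _ in range(n_people)]
    now = [s + random.gauss(0, 1) for s in stable]
    later = [s + random.gauss(0, 1) for s in stable]

    selected = [i for i in range(n_people) if now[i] < cutoff]
    mean_now = sum(now[i] for i in selected) / len(selected)
    mean_later = sum(later[i] for i in selected) / len(selected)
    print(f"selected group, mean now:   {mean_now:.2f}")
    print(f"selected group, mean later: {mean_later:.2f}  (regressed towards 0)")

if __name__ == "__main__":
    simulate()
```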
Tackle the [Hamming Problems](https://www.lesswrong.com/posts/Thwfy4gNFx9kHgvov/research-hamming-questions), Don't Avoid Them
I agree with that statement and this statement
Far and away the most common failure mode among self-identifying alignment researchers is to look for Clever Ways To Avoid Doing Hard Things [...]
seems true as well. However, there was something in this section that didn't seem quite right to me.
Say that you have identified the Hamming Problem at the lowest resolution to be getting the outcome "AI doesn't cause extinction or worse". However, if ...
I don't think an "actual distribution" over the activations is a thing? The distribution depends on what inputs you feed it.
This seems to be what Thomas is saying as well, no?
[...] look at the network activations at each layer for a bunch of different inputs. This gives you a bunch of activations sampled from the distribution of activations. From there, you can do density estimation to estimate the actual distribution over the activations.
In the same way you can talk about the actual training distribution underlying the samples in the training set, it should b...
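A minimal sketch of the procedure described in the quote, assuming a PyTorch model and using a scikit-learn KDE (the function and argument names are my own placeholders):

```python
import numpy as np
import torch
from sklearn.neighbors import KernelDensity

# Minimal sketch (placeholders, not anyone's actual pipeline): record a layer's
# activations for a bunch of inputs, then fit a density estimate to those samples.
# Note the estimated distribution is induced by whatever input distribution you
# chose to feed the network, which is the point under discussion.
def estimate_activation_density(model, layer, inputs: torch.Tensor) -> KernelDensity:
    activations = []

    def hook(_module, _inp, out):
        activations.append(out.detach().flatten(start_dim=1).cpu().numpy())

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(inputs)
    handle.remove()

    samples = np.concatenate(activations, axis=0)   # (n_inputs, n_features)
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
    kde.fit(samples)
    return kde
```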
It seems like it could be worthwhile for you to contact someone connected to the AGI Safety Communications Initiative, or at the very least check out the post I linked.
Others that I find worth mentioning are channels with opportunities for getting started in AI Safety. I know both AGI Safety Fundamentals and AI Safety Camp have Slack channels for participants. An invitation is needed, and you probably need to be a participant to get invited.
There is also an 80,000 Hours Google group for technical AI safety. An invitation is needed; I can't find that they've broadcast how to get in, so I won't share it. But they mention it on their website, so I guess it is okay to include it here.
I've also heard about research groups in AI safety havi...
To me it seems that the disagreement around this question comes from thinking of different questions.
Has DL produced a significant amount of economic value?
Yes, and I think this has been quite well established already. It is still possible to argue about what is meant by significant, but I think that disagreement is probably better resolved by asking a different question.
(I can imagine that many of these examples technically exist, but not at the level that I mean).
From this and some comments, I think there is a confusion that would be better resolved by asking:
Why doesn't DL in consumer products seem as amazing as what I see from presentations of research results?
I have not seen anyone do something like this, but it sounds like something Anders Sandberg (FHI) would do. If you want a lead, or want to find someone who might be interested in researching it, he might be it.
I haven't followed your arguments all the way here but I saw the comment
If I am understanding correctly, you are saying if the sleeping beauty problem does not use a coin toss, but measures the spin of an electron instead, then the answer would be different.
and would just jump in and say that others have made similar arguments. The one written example I've seen is this Master's Thesis.
I'm not sure if I'm convinced, but at least I buy that, depending on how the particular selection is carried out, there can be instances where the difference between probabilitie...
An area where I think there is an important difference between doing explicit search and optimisation through piles of heuristics is in clustering NNs à la Filan et al. (link TBD).
A use case I've been thinking about is using that kind of technique to help identify mesa-optimisation, or more particularly mesa-objectives (with the help of interpretability tools guided by the clustering of the NN).
In the case of explicit search I would expect that it would be more common than not to be able to find a specific part of the network evaluating world states in terms o...
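For concreteness, here is a minimal sketch of the kind of clustering I have in mind: spectral clustering of the absolute-weight graph between units. The simplifications and library choices are my own, not necessarily those of Filan et al.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Minimal sketch (my own simplification): treat every unit in a small MLP as a
# graph node, connect units in adjacent layers with edge weight |w|, and cluster
# the resulting graph. Modules found this way are candidates to inspect with
# interpretability tools, e.g. when looking for mesa-objectives.
def cluster_units(weight_matrices: list[np.ndarray], n_clusters: int = 4) -> np.ndarray:
    sizes = [weight_matrices[0].shape[1]] + [w.shape[0] for w in weight_matrices]
    offsets = np.cumsum([0] + sizes)
    n_units = offsets[-1]

    adjacency = np.zeros((n_units, n_units))
    for layer, w in enumerate(weight_matrices):      # w has shape (out, in)
        rows = slice(offsets[layer + 1], offsets[layer + 2])
        cols = slice(offsets[layer], offsets[layer + 1])
        adjacency[rows, cols] = np.abs(w)
        adjacency[cols, rows] = np.abs(w).T          # make the graph undirected

    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", assign_labels="discretize"
    ).fit_predict(adjacency)
    return labels                                    # cluster id per unit
```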
Olle Häggström gave three two-hour lectures on AI Safety earlier this spring. The original description was:
This six-hour lecture series will treat basics and recent developments in AI risk and long-term AI safety. The lectures are meant to be of interest to Ph.D. students and researchers in AI-related fields, but no particular prerequisites will be assumed.
Lecture 1, Lecture 2 and Lecture 3. Perhaps you can find something there, I expect he would be happy to help if you reach out to him.
Shouldn't this be triplet birthrates? Twin birthrates look pretty stable in comparison.