Cool! It wrote and executed code to solve the problem, and it got it right.
Are you using chat-GPT-4? I thought it can't run code?
Interesting, I find what you are saying here broadly plausible, and it is updating me (at least toward greater uncertainity/confusion). I notice that I don't expect the 10x effect, or the Von Neumann effect, to be anywhere close to purely genetic. Maybe some path-dependency in learning? But my intuition (of unknown quality) is that there should be some software tweaks which make the high end of this more reliably achievable.
Anyway, to check that I understand your position, would this be a fair dialogue?:
...Person: "The jump from chimps to hu
In your view, who would contribute more to science -- 1000 Einsteins, or 10,000 average scientists?[1]
"IQ variation is due to continuous introduction of bad mutations" is an interesting hypothesis, and definitely helps save your theory. But there are many other candidates, like "slow fixation of positive mutations" and "fitness tradeoffs[2]".
Do you have specific evidence for either:
Or do you believe these things just because ...
It would still be interesting to know whether you were surprised by GPT-4's capabilities (if you have played with it enough to have a good take)
POV: I'm in an ancestral environment, and I (somehow) only care about the rewarding feeling of eating bread. I only care about the nice feeling which comes from having sex, or watching the birth of my son, or being gaining power in the tribe. I don't care about the real-world status of my actual son, although I might have strictly instrumental heuristics about e.g. how to keep him safe and well-fed in certain situations, as cognitive shortcuts for getting reward (but not as terminal values).
Would such a person sacrifice themselves for their children (in situations where doing so would be a fitness advantage)?
Isn't going from an average human to Einstein a huge increase in science-productivity, without any flop increase? Then why can't there be software-driven foom, by going farther in whatever direction Einstein's brain is from the average human?
Of course, my argument doesn't pin down the nature or rate of software-driven takeoff, or whether there is some ceiling. Just that the "efficiency" arguments don't seem to rule it out, and that there's no reason to believe that science-per-flop has a ceiling near the level of top humans.
You could use all of world energy output to have a few billion human speed AGI, or a millions that think 1000x faster, etc.
Isn't it insanely transformative to have millions of human-level AIs which think 1000x faster?? The difference between top scientists and average humans seems to be something like "software" (Einstein isn't using 2x the watts or neurons). So then it should be totally possible for each of the "millions of human-level AIs" to be equivalent to Einstein. Couldn't a million Einstein-level scientists running at 1000x speed ...
In your view, is it possible to make something which is superhuman (i.e. scaled beyond human level), if you are willing to spend a lot on energy, compute, engineering cost, etc?
Any idea why "cheese Euclidean distance to top-right corner" is so important? It's surprising to me because the convolutional layers should apply the same filter everywhere.
See Godel's incompleteness theorems. For example, consider the statement "For all A, (ZFC proves A) implies A", encoded into a form judgeable by ZFC itself. If you believe ZFC to be sound, then you believe that this statement is true, but due to Godel stuff you must also believe that ZFC cannot prove it. The reasons for believing ZFC to be sound are reasons from "outside the system" like "it looks logically sound based on common sense", "it's never failed in practice", and "no-one's found a valid issue". Godel's theorems let us conv...
Agreed. To give a concrete toy example: Suppose that Luigi always outputs "A", and Waluigi is {50% A, 50% B}. If the prior is {50% luigi, 50% waluigi}, each "A" outputted is a 2:1 update towards Luigi. The probability of "B" keeps dropping, and the probability of ever seeing a "B" asymptotes to 50% (as it must).
This is the case for perfect predictors, but there could be some argument about particular kinds of imperfect predictors which supports the claim in the post.
Context windows could make the claim from the post correct. Since the simulator can only consider a bounded amount of evidence at once, its P[Waluigi] has a lower bound. Meanwhile, it takes much less evidence than fits in the context window to bring its P[Luigi] down to effectively 0.
Imagine that, in your example, once Waluigi outputs B it will always continue outputting B (if he's already revealed to be Waluigi, there's no point in acting like Luigi). If there's a context window of 10, then the simulator's probability of Waluigi never goes below 1/1025, w...
In section 3.7 of the paper, it seems like the descriptions ("6 in 5", etc) are inconsistent across the image, the caption, and the paragraph before them. What are the correct labels? (And maybe fix the paper if these are typos?)
It would help if you specified which subset of "the community" you're arguing against. I had a similar reaction to your comment as Daniel did, since in my circles (AI safety researchers in Berkeley), governance tends to be well-respected, and I'd be shocked to encounter the sentiment that working for OpenAI is a "betrayal of allegiance to 'the community'".
To be clear, I do think most people who have historically worked on "alignment" at OpenAI have probably caused great harm! And I do think I am broadly in favor of stronger community norms against working at AI capability companies, even in so called "safety positions". So I do think there is something to the sentiment that Critch is describing.
In ML terms, nearly-all the informational work of learning what “apple” means must be performed by unsupervised learning, not supervised learning. Otherwise the number of examples required would be far too large to match toddlers’ actual performance.
I'd guess the vast majority of the work (relative to the max-entropy baseline) is done by the inductive bias.
Beware, though; string theory may be what underlies QFT and GR, and it describes a world of stringy objects that actually do move through space
I think this contrast is wrong.[1] IIRC, strings have the same status in string theory that particles do in QFT. In QM, a wavefunction assigns a complex number to each point in configuration space, where state space has an axis for each property of each particle.[2] So, for instance, a system with 4 particles with only position and momentum will have a 12-dimensional configuration space.[3] I...
As I understand Vivek's framework, human value shards explain away the need to posit alignment to an idealized utility function. A person is not a bunch of crude-sounding subshards (e.g. "If
food nearbyandhunger>15, then be more likely togo to food") and then also a sophisticated utility function (e.g. something like CEV). It's shards all the way down, and all the way up.[10]
This read to me like you were saying "In Vivek's framework, value shards explain away .." and I was confused. I now think you mean "My take on Vivek's is that value s...
"Well, what if I take the variables that I'm given in a Pearlian problem and I just forget that structure? I can just take the product of all of these variables that I'm given, and consider the space of all partitions on that product of variables that I'm given; and each one of those partitions will be its own variable.
How can a partition be a variable? Should it be "part" instead?
ETA: Koen recommends reading Counterfactual Planning in AGI Systems before (or instead of) Corrigibility with Utility Preservation
Update: I started reading your paper "Corrigibility with Utility Preservation".[1] My guess is that readers strapped for time should read {abstract, section 2, section 4} then skip to section 6. AFAICT, section 5 is just setting up the standard utility-maximization framework and defining "superintelligent" as "optimal utility maximizer".
Quick thoughts after reading less than half:
AFAICT,[2] this is a mathematica...
To be more specific about the technical problem being mostly solved: there are a bunch of papers outlining corrigibility methods that are backed up by actual mathematical correctness proofs
Can you link these papers here? No need to write anything, just links.
- Try to improve my evaluation process so that I can afford to do wider searches without taking excessive risk.
Improve it with respect to what?
My attempt at a framework where "improving one's own evaluator" and "believing in adversarial examples to one's own evaluator" make sense:
Yeah, the right column should obviously be all 20s. There must be a bug in my code[1] :/
I like to think of the argmax function as something that takes in a distribution on probability distributions on with different sigma algebras, and outputs a partial probability distribution that is defined on the set of all events that are in the sigma algebra of (and given positive probability by) one of the components.
Take the following hypothesis :

If I add this into with weight , then the middle column is still near...
Now, let's consider the following modification: Each hypothesis is no longer a distribution on , but instead a distribution on some coarser partition of . Now is still well defined
Playing around with this a bit, I notice a curious effect (ETA: the numbers here were previously wrong, fixed now):

The reason the middle column goes to zero is that hypothesis A puts 60% on the rightmost column, and hypothesis B puts 40% on the leftmost, and neither cares about the middle column specifically.
But philosophically, what d...
most egregores/epistemic networks, which I'm completely reliant upon, are much smarter than me, so that can't be right
*Egregore smiles*
Another way of looking at this question: Arithmetic rationality is shift invariant, so you don't have to know your total balance to calculate expected values of bets. Whereas for geometric rationality, you need to know where the zero point is, since it's not shift invariant.
Some results related to logarithmic utility and stock market leverage (I derived these after reading your previous post, but I think it fits better here):
Tl;dr: We can derive the optimal stock market leverage for an agent with utility logarithmic in money. We can also back-derive a utility function from any constant leverage[1], giving us a nice class of utility functions with different levels of risk-aversion. Logarithmic utility is recovered a special case, and has additional nice properties which the others may or may not have.
For an agent i...
Results on logarithmic utility and stock market leverage: https://www.lesswrong.com/posts/DMxe4XKXnjyMEAAGw/the-geometric-expectation?commentId=yuRie8APN8ibFmRJD
A framing I wrote up for a debate about "alignment tax":
If the system is modular, such that the part of the system representing the goal is separate from the part of the system optimizing the goal, then it seems plausible that we can apply some sort of regularization to the goal to discourage it from being long term.
What kind of regularization could this be? And are you imagining an AlphaZero-style system with a hardcoded value head, or an organically learned modularity?
Pr π∈ξ [U(⌈G⌉,π) ≥ U(⌈G⌉,G*)]is the probability that, for a random policyπ∈ξ, that policy has worse utility than the policyG*its program dictates; in essence, how goodG's policies are compared to random policy selection
What prior over policies?
given
g(G|U), we can infer the probability that an agentGhas a given utility functionU, asPr[U] ∝ 2^-K(U) / Pr π∈ξ [U(⌈G⌉,π) ≥ U(⌈G⌉,G*)])where∝means "is proportional to" andK(U)is the kolmogorov complexity of utility functionU.
Suppose the prior over policies is max-entropy (uniform over all action seq...
Consider this quote by Alan Watts:
There are basically two kinds of philosophy. One’s called prickles, the other’s called goo. And prickly people are precise, rigorous, logical. They like everything chopped up and clear. Goo people like it vague. For example, in physics, prickly people believe that the ultimate constituents of matter are particles. Goo people believe it’s waves.
*facepalm*
If one has a technical understanding of QFT[1] (or even half of technical understanding, like me), this sounds totally silly. There's no real question as to whet...
The directional derivative is zero, so the change is zero to first order. The second order term can exist. (Consider what happens if you move along the tangent line to a circle. The distance from the circle goes ~quadratically, since the circle looks locally parabolic.) Hence .
I have seen one person be surprised (I think twice in the same convo) about what progress had been made.
ETA: Our observations are compatible. It could be that people used to a poor and slow-moving state of interpretability are surprised by the recent uptick, but that the absolute progress over 6 years is still disappointing.
All in all, I don't think my original post held up well. I guess I was excited to pump out the concept quickly, before the dust settled. Maybe this was a mistake? Usually I make the ~opposite error of never getting around to posting things.
The perspective and the computations that are presented here (which in my opinion are representative of the mathematical parts of the linked posts and of various other unnamed posts) do not use any significant facts about neural networks or their architecture.
You're correct that the written portion of the Information Loss --> Basin flatness post doesn't use any non-trivial facts about NNs. The purpose of the written portion was to explain some mathematical groundwork, which is then used for the non-trivial claim. (I did not know at the time ...
Note that, for rational *altruists* (with nothing vastly better to do like alignment), voting can be huge on CDT grounds -- if you actually do the math for a swing state, the leverage per voter is really high. In fact, I think the logically counterfactual impact-per-voter tends to be lower than the impact calculated by CDT, if the election is very close.
[Replying to this whole thread, not just your particular comment]
"Epistemic humility" over distributions of times is pretty weird to think about, and imo generally confusing or unhelpful. There's an infinite amount of time, so there is no uniform measure. Nor, afaik, is there any convergent scale-free prior. You must use your knowledge of the world to get any distribution at all.
You can still claim that higher-entropy distributions are more "humble" w.r.t. to some improper prior. Which begs the question "Higher entropy w.r.t. what m...
When you describe the "emailing protein sequences -> nanotech" route, are you imagining an AGI with computers on which it can run code (like simulations)? Or do you claim that the AGI could design the protein sequences without writing simulations, by simply thinking about it "in its head"?