All of johnswentworth's Comments + Replies

Shulman and Yudkowsky on AI progress

I read this line:

I also correctly bet that vaccine approval and deployment would be historically unprecedentedly fast and successful due to the high demand.

... and I was like "there is no way in hell that this was unprecedentedly fast". The first likely-counterexample which sprang to mind was the 1957 influenza pandemic, so I looked it up. The timeline goes roughly like this:

The first cases were reported in Guizhou of southern China, in 1956[6][7] or in early 1957.[1][3][8][9] They were soon reported in the neighbouring province of Yunnan in late February o

... (read more)

My understanding is that the correct line is something like, "The COVID-19 vaccines were developed and approved unprecedentedly fast, excluding influenza vaccines." If you want to find examples of short vaccine development, you don't need to go all the way back to the 1957 influenza pandemic. For the 2009 Swine flu pandemic,

Analysis of the genetic divergence of the virus in samples from different cases indicated that the virus jumped to humans in 2008, probably after June, and not later than the end of November,[38] likely around September 2008... By 19 No

... (read more)
Christiano, Cotra, and Yudkowsky on AI progress

I'm not particularly enthusiastic about betting at 75%, that seems like it's already in the right ballpark for where the probability should be. So I guess we've successfully Aumann agreed on that particular prediction.

Finding the Central Limit Theorem in Bayes' rule

[Don't feel like you have to answer these - they're more just me following up on thoughts I got from your comment]

I accept your affordance, and thank you, this will make me more likely to comment on your posts in the future.

Hypotheses about Finding Knowledge and One-Shot Causal Entanglements

Meta commentary: this post is a great example of how to do the very earliest stages of conceptual research. Well done.

Finding the Central Limit Theorem in Bayes' rule

You've got a solid talent for math research.

Your reasoning here is basically correct; this is why Laplace's approximation typically works very well on large datasets. One big catch is that it requires that the number of data points be large relative to the dimension of the variables. The real world is decidedly high dimensional, so in practice the conditions for Gaussianity usually hold when we pick some small set of "features" to focus on and then get a bunch of data on those (e.g. as is typically done in academic statistics).
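(A numerical sketch of the point above, mine rather than part of the original exchange: for a 1-d coin-bias posterior with a uniform prior, Laplace's approximation replaces the exact Beta posterior with a Gaussian centered at the posterior mode, with variance set by the log-posterior curvature. The approximation error shrinks as the data count grows — the "CLT in Bayes' rule" effect.)

```python
import numpy as np
from math import lgamma

def beta_logpdf(p, a, b):
    # Log-density of Beta(a, b), written out to avoid a scipy dependency.
    return ((a - 1) * np.log(p) + (b - 1) * np.log(1 - p)
            + lgamma(a + b) - lgamma(a) - lgamma(b))

def laplace_rel_error(n, frac_heads=0.3):
    # Posterior after n flips with k heads and a uniform prior: Beta(k+1, n-k+1).
    k = int(n * frac_heads)
    a, b = k + 1, n - k + 1
    mode = (a - 1) / (a + b - 2)
    # Laplace variance = -1 / (second derivative of the log posterior at the mode).
    var = 1 / ((a - 1) / mode**2 + (b - 1) / (1 - mode)**2)
    p = np.linspace(0.01, 0.99, 999)
    exact = np.exp(beta_logpdf(p, a, b))
    gauss = np.exp(-(p - mode) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Worst-case density error, relative to the posterior's peak height.
    return np.max(np.abs(exact - gauss)) / np.max(exact)

errors = [laplace_rel_error(n) for n in (10, 100, 1000)]
print(errors)  # shrinks as n grows
```

The skew of the Beta posterior is what the Gaussian can't capture, and it dies off roughly like 1/sqrt(n) — which is exactly the central-limit story.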

There's also another more subtle c... (read more)

3Maxwell Peterson6d
Thank you! I hadn't actually heard of Laplace's approximation - definitely relevant! The catch about the dimension is a good one. In the large causal model, is the issue just that

  • there is one multiplication per variable
  • some dependence chains don't have very many variables in them
  • in those few-variable chains, we might not get enough multiplications to converge?

If that is the issue, weird nasty operations occur to me, like breaking variables up into sub and sub-sub variables, to get more multiplications, which might get more Gaussian. (For example, splitting the node "Maxwell finishes writing this comment" into "his computer doesn't run out of battery" and "the police don't suddenly bust into his apartment"). Whether or not it's worth doing, I wonder - would this actually work to make things more Gaussian? Or is there some... conservation of convergence... that makes it so you can't get closer to Gaussian by splitting variables up?

[Don't feel like you have to answer these - they're more just me following up on thoughts I got from your comment.]
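(A quick numerical sketch of the splitting question — my own, with made-up numbers, and only covering the case where the split is purely definitional: if the sub-variables are defined so that they multiply back to exactly the original variable, the product over the chain is literally unchanged, so no convergence is gained. Extra Gaussianity would have to come from genuinely new independent factors, not from relabeling.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical 3-node chain of independent positive factors.
chain = rng.uniform(0.5, 1.0, size=(3, n))
product_before = chain.prod(axis=0)

# "Split" the first variable X into sub-variables Y and Z with Y * Z == X:
# Y is an arbitrary factor, and Z is whatever is left over.
y = rng.uniform(0.5, 1.0, size=n)
z = chain[0] / y

# The chain now has 4 nodes, but the product is numerically identical,
# so its distribution -- and its distance from lognormality -- is unchanged.
product_after = y * z * chain[1] * chain[2]
print(np.allclose(product_before, product_after))  # True
```

So there does seem to be a "conservation of convergence" in the definitional case; the interesting question is whether a real-world split ever buys you factors that are more independent than the variable they replace.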
Christiano, Cotra, and Yudkowsky on AI progress

My understanding is that Sputnik was a big discontinuous jump in "distance which a payload (i.e. nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.

3Vaniver8dSo it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I'm not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.) [And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France's distance instead, then the same strategic considerations would have been hit much sooner.]
Christiano, Cotra, and Yudkowsky on AI progress

My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).

This is super helpful, thanks. Good explanation.

With this formulation of the "continuous view", I can immediately think of places where I'd bet against it. The first wh... (read more)

6rohinmshah7d
I agree that when you know about a critical threshold, as with nukes or orbits, you can and should predict a discontinuity there. (Sufficient specific knowledge is always going to allow you to outperform a general heuristic.) I think that (a) such thresholds are rare in general and (b) in AI in particular there is no such threshold. (According to me (b) seems like the biggest difference between Eliezer and Paul.) Some thoughts on aging:

  • It does in fact seem surprising, given the complexity of biology relative to physics, if there is a single core cause and core solution that leads to a discontinuity.
  • I would a priori guess that there won't be a core solution. (A core cause seems more plausible, and I'll roll with it for now.) Instead, we see a sequence of solutions that intervene on the core problem in different ways, each of which leads to some improvement on lifespan, and discovering these at different times leads to a smoother graph.
  • That being said, are people putting in a lot of effort into solving aging in mice? Everyone seems to constantly be saying that we're putting in almost no effort whatsoever. If that's true then a jumpy graph would be much less surprising.
  • As a more specific scenario, it seems possible that the graph of mouse lifespan over time looks basically flat, because we were making no progress due to putting in ~no effort. I could totally believe in this world that someone puts in some effort and we get a discontinuity, or even that the near-zero effort we're putting in finds some intervention this year (but not in previous years) which then looks like a discontinuity.

If we had a good operationalization, and people are in fact putting in a lot of effort now, I could imagine putting my $100 to your $300 on this (not going beyond 1:3 odds simply because you know way more about aging than I do).
3Vaniver8dWhile I think orbit is the right sort of discontinuity for this, I think you need to specify 'flight range' in a way that clearly favors orbits for this to be correct, mostly because about a month before was the manhole cover launched/vaporized with a nuke. [] [But in terms of something like "altitude achieved", I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]
The bonds of family and community: Poverty and cruelty among Russian peasants in the late 19th century

It sounds like you're thinking about "adaptivity" in terms of what's good for the group, not the individual. In a Malthusian equilibrium, the world is largely zero-sum, so uprooting the trees of slightly more well-off neighbors could plausibly increase the odds of survival for one's own offspring. It's the next best thing to eating the neighbor's babies, as far as evolutionary fitness goes. And over time, it's the families with the most individual fitness which will dominate the constituency of the group.

(On the other hand, the fact that there was space to... (read more)

8Kaj_Sotala9dThe phrase "good for the group, not the individual" feels ambiguous to me; I usually interpret it to mean something that hurts some individuals while improving the group's chances to survive (e.g. norms that make some individuals sacrifice themselves to make the rest of the group better off). That at least wasn't what I meant; by "more adaptive" I meant something like an approximate Pareto improvement (in the long term) for the people adopting it. E.g. if everyone - including spouses! - is stealing from each other all the time, then it seems hard to believe that it's advantageous for people to marry while it not being advantageous to commit to a no-theft policy at least when dealing with your spouse. Even if the village was largely zero-sum, it still seems like being able to reliably cooperate with one person would give you an advantage in trying to steal things from everyone else. Or if things are so zero-sum that it's not even beneficial to cooperate with your spouse, why is there still an institution of marriage? I would think that the fact that people are socially interacting in a village in the first place implies that the world is not perfectly zero-sum and that there are gains to be had from cooperation. If that wasn't the case, I think the optimal strategy would be for one family to try to murder or enslave everyone else? I read this as indicating disagreement with my comment, but isn't it expressing the same thought as the dictatorless dystopia example and my remark that no rule requires cultures to hit particularly good local optimums?
Why Study Physics?

That is indeed a meme. Though if the physicists' attempts consistently failed, then biologists would not joke about physicists being like gunslingers.

How To Get Into Independent Research On Alignment/Agency

My main modification to that plan would be "writing up your process is more important than writing up your results"; I think that makes a false negative much less likely.

8 weeks seems like it's on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:

  • If your goal is write up your background models and strategy well enough to see if grantmakers want to fund your work based on t
... (read more)
Frame Control

I like the rule, and if it's possible to come up with engagement guidelines that have asymmetrical results for frame control I would really like that.

Some thoughts, based on one particular framing of the problem...

Claim/frame: in general, the most robust defense against abuse is to foster independence in the corresponding domain. The most robust defense against emotional abuse is to foster emotional independence, the most robust defense against financial abuse is to foster financial independence, etc. The reasoning is that, if I am in not independent in so... (read more)

5jmh9dWhen I started reading my first thought was, not independence but competitive alternatives. Then of course you pointed to the same. However, I'm wondering if that is really where it stops. First I want to say I did not give the OP a full read, and second that there are important parts of what I did read that I have not fully digested. Given that, I have to wonder if the issue of frame control as raised by the author here is fully solved in the same way we think of economic problems being solved by competitive supply and demand settings. Am I really in a good place personally just because I can pick and choose among those controlling my frame? Or, put differently, is having multiple support options (i.e., being able to expose one's self to multiple other frames) certain to eliminate the problem of frame control for that person? Something is nudging me in the direction of "not quite sure about that". Then again, maybe what we have is that one never escapes frame control, so we're always talking about the best of a bunch of "bad" options.

'Monopoly provider of meaning' also helps me understand why this is more widespread in spiritual scenes.

Frame Control

I think it would be helpful for the culture to be more open to persistent long-running disagreements that no one is trying to resolve.

 +1 to this. I have an intuition that the unwillingness-to-let-disagreements-stand leads to a bunch of problems in subtle ways, including some of the things you point out here, but haven't sat down to think through what's going on there.

How To Get Into Independent Research On Alignment/Agency

Man, this is a tough question. Evaluating the quality of research in the field is already a tough problem that everybody disagrees on, and as a result people disagree on what sort of people are well-suited to the work. Evaluating it for yourself without already being an expert in the field is even harder. With that in mind, I'll give an answer which I think a reasonably-broad chunk of people would agree with, but with the caveat that it is very very incomplete.

I had a chat with Evan Hubinger a few weeks ago where we were speculating on how our evaluations ... (read more)

3toonalfrink10dBased on your comment, I'm more motivated to just sit down and (actually) try to solve AI Safety for X weeks, write up my results and do an application. What is your 95% confidence interval for what X needs to be to reduce the odds of a false negative (i.e. my grant gets rejected but shouldn't have been) to a single digit? I'm thinking of doing maybe 8 weeks. Maybe more if I can fall back on research engineering so that I haven't wasted my time completely.
Why Study Physics?

That's a natural hypothesis. A couple reasons to expect otherwise:

  • High dimensional world: to find something as useful as e.g. Fourier methods by brute-force guess-and-check would require an exponentially massive amount of search, and is unlikely to have ever happened at all. Therefore we should expect that it was produced by a method which systematically produces true/useful things more often than random chance, not just by guess-and-check with random guessing. (Einstein's Arrogance is saying something similar.)
  • Physicists have a track record of successfull
... (read more)
6Capybasilisk10dI thought the meme was that physicists think they can ride into town and make sweeping contributions with a mere glance at the problem, but reality doesn't pan out that way. Relevant XKCD [].
4adamShimi10dI don't think this contradicts the hypothesis that "Physicists course-correct by regularly checking their answers". After all, the reason Fourier methods and other tricks kept being used is because they somehow worked a lot of the time. Similarly, I expect (maybe wrongly) that there was a bunch of initial fiddling before they got the heuristics to work decently. If you can't check your answer, the process of refinement that these ideas went through might be harder to replicate. The second point sounds stronger than the first, because the first can be explained by the fact that biological systems (for example) are made of physical elements, but not the other way around. So you should expect that biology has not that much to say about physics. Still, one could say that it's not obvious physics would have relevant things to say about biology because of the complexity and the abstraction involved. This makes me wonder if the most important skill of physicists is to have strong enough generators to provide useful criticism in a wide range of fields?
0TAG10dIf you suppose that physics tries to deal with the whole of reality in one gulp, then its ability to come up with simple general rules would be remarkable. But actually, physics deals with extremely simplified and idealised situations... frictionless planes, free fall in a vacuum, and so on. Even experiments strive to simplify the natural messiness of reality into something where only one parameter changes at a time.
Study Guide

Meta-note: I'd usually recommend complementing a course with a book by someone else, in order to get a different perspective. However, some professors are uniquely good at teaching their particular thing, and I'd include both Uri Alon and Stephen Boyd (the convex optimization guy) in that list. In those cases it more often makes sense to use materials from the one professor.

Christiano, Cotra, and Yudkowsky on AI progress

Some thinking-out-loud on how I'd go about looking for testable/bettable prediction differences here...

I think my models overlap mostly with Eliezer's in the relevant places, so I'll use my own models as a proxy for his, and think about how to find testable/bettable predictions with Paul (or Ajeya, or someone else in their cluster).

One historical example immediately springs to mind where something-I'd-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve. The original Phillips curve was based on just fitting a curve to ... (read more)

The "continuous view" as I understand it doesn't predict that all straight lines always stay straight. My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).

In its application to AI, this is combined with a prediction that... (read more)

3amc11dI was under the impression that GPT-4 would be gigantic, according to this quote from this Wired article [] :
8Eliezer Yudkowsky13dI don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that. As for their ability to then make algorithmic progress, depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won't work, but maybe they've got enough people trying things to find some? But it's hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.
Yudkowsky and Christiano discuss "Takeoff Speeds"

FWIW, I did not find this weirdly uncharitable, only mildly uncharitable. I have extremely wide error bars on what you have and have not read, and "Eliezer has not read any of the things on that list" was within those error bars. It is really quite difficult to guess your epistemic state w.r.t. specific work when you haven't been writing about it for a while.

(Though I guess you might have been writing about it on Twitter? I have no idea, I generally do not use Twitter myself, so I might have just completely missed anything there.)

The "weirdly uncharitable" part is saying that it "seemed like" I hadn't read it vs. asking.  Uncertainty is one thing, leaping to the wrong guess another.

Yeah, even I wasn't sure you'd read those three things, Eliezer, though I knew you'd at least glanced over 'Takeoff Speeds' and 'Biological Anchors' enough to form opinions when they came out. :)

johnswentworth's Shortform

Everybody's been talking about Paxlovid, and how ridiculous it is to both stop the trial since it's so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don't think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.

Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then... (read more)

Early stopping is a pretty standard p-hacking technique.

It was stopped after a pre-planned interim analysis; that means they're calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.
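(The difference between naive peeking and a pre-planned interim analysis is easy to see in simulation — a sketch of the general statistical point, mine, not tied to the Paxlovid trial's actual design. Under the null, testing once at the planned sample size gives the nominal 5% false-positive rate; also testing at an early peek and stopping on "significance" inflates it well above 5%, which is exactly what the sequential-analysis correction exists to fix.)

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 20_000

# Simulate many null experiments: 100 observations each, true mean 0.
data = rng.normal(0, 1, size=(trials, 100))

# Fixed design: a single z-test at the full n=100.
z_full = data.mean(axis=1) * np.sqrt(100)
rate_fixed = np.mean(np.abs(z_full) > 1.96)

# Peeking: also test at n=30 and declare victory early if "significant".
z_peek = data[:, :30].mean(axis=1) * np.sqrt(30)
rate_peeking = np.mean((np.abs(z_peek) > 1.96) | (np.abs(z_full) > 1.96))

print(rate_fixed)    # close to the nominal 0.05
print(rate_peeking)  # noticeably inflated above 0.05
```

A pre-planned interim analysis uses stricter-than-1.96 stopping boundaries at the peek so that the *combined* false-positive rate stays at 5%, which is why early stopping under such a design isn't evidence of p-hacking.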

Yudkowsky and Christiano discuss "Takeoff Speeds"

I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling.

So, there's this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then peoples' brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream g... (read more)

So, there's this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then peoples' brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won't immediately see how they're relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alig

... (read more)
Ngo and Yudkowsky on AI capability gains

I'm guessing that a lot of the hidden work here and in the next steps would come from asking stuff like:

  • do I need to alter the bucket for each new idea, or does it instead fit in its current form each time?
  • does the mental act of finding that an idea fits into the bucket remove some confusion and clarify things, or is it just a mysterious answer?
  • does the bucket become simpler and more elegant with each new idea that fits in it?

Sounds like you should try writing it.

How To Get Into Independent Research On Alignment/Agency

Oh cool, that is what you were asking. I guess Steve's got you covered, then; I don't really know any more about it.

1kylefox118dYes. Thank you for your time and replying to my question.
Corrigibility Can Be VNM-Incoherent

Does broad corrigibility imply VNM-incoherence?

Yes, unless the state reward function is constant and we only demand weak corrigibility to all policies.

Given that this is the main result, I feel like the title "Corrigibility Can Be VNM-Incoherent" is rather dramatically understating the case. Maybe something like "Corrigibility Is Never Nontrivially VNM-Coherent In MDPs" would be closer. Or maybe just drop the hedging and say "Corrigibility Is Never VNM-Coherent In MDPs", since the constant-utility case is never interesting anyway.

I worded the title conservatively because I only showed that corrigibility is never nontrivially VNM-coherent in this particular MDP. Maybe there's a more general case to be proven for all MDPs, using more realistic (non-single-timestep) reward aggregation schemes.

How To Get Into Independent Research On Alignment/Agency

Oh perfect, I hadn't seen that. Strong upvote, very helpful.

How To Get Into Independent Research On Alignment/Agency

It's a very short discussion: there is no independent researcher institute. There are independent researchers, and we have no institute; that's what independent research (in the most literal sense) means.

... ok, actually, there is kind of an independent researcher institute. It's called the Ronin Institute. I'm not affiliated with them at all, and don't really know much about them. My understanding is that they provide an Official-Sounding Institute for independent researchers (in basically any academic field) to affiliate with, and can provide a useful so... (read more)

3kylefox118dAlmost there. My question was actually concerning the expected benefits from affiliating with an independent researcher institute. For example, an independent researcher would expect to receive grant administration (if funded) and virtual infrastructure services as benefits from Theiss Research in exchange for their affiliation. Please let me know if there is a need for further clarification.
Ngo and Yudkowsky on AI capability gains

Potentially important thing to flag here: at least in my mind, expected utility theory (i.e. the property Eliezer was calling "laser-like" or "coherence") and consequentialism are two distinct things. Consequentialism will tend to produce systems with (approximate) coherent expected utilities, and that is one major way I expect coherent utilities to show up in practice. But coherent utilities can in-principle occur even without consequentialism (e.g. conservative vector fields in physics), and consequentialism can in-principle not be very coherent (e.g. if... (read more)

My model of Eliezer says that there is some deep underlying concept of consequentialism, of which the "not very coherent consequentialism" is a distorted reflection; and that this deep underlying concept is very closely related to expected utility theory. (I believe he said at one point that he started using the word "consequentialism" instead of "expected utility maximisation" mainly because people kept misunderstanding what he meant by the latter.)

I don't know enough about conservative vector fields to comment, but on priors I'm pretty skeptical of this being a good example of coherent utilities; I also don't have a good guess about what Eliezer would say here.

Ngo and Yudkowsky on AI capability gains

To be clear, this part:

It's one of those predictions where, if it's false, then we've probably discovered something interesting - most likely some place where an organism is spending resources to do something useful which we haven't understood yet.

... is also intended as a falsifiable prediction. Like, if we go look at the anomaly and there's no new thing going on there, then that's a very big strike against expected utility theory.

This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but "ther... (read more)

6adamShimi19dThat's a great way of framing it! And a great way of thinking about why these are not failures that are "worrisome" at first/in most cases.
Ngo and Yudkowsky on alignment difficulty

I do think alignment has a relatively-simple core. Not as simple as intelligence/competence, since there's a decent number of human-value-specific bits which need to be hardcoded (as they are in humans), but not enough to drive the bulk of the asymmetry.

(BTW, I do think you've correctly identified an important point which I think a lot of people miss: humans internally "learn" values from a relatively-small chunk of hardcoded information. It should be possible in-principle to specify values with a relatively small set of hardcoded info, similar to the way ... (read more)

Thinking about it more, it seems that messy reward signals will lead to some approximation of alignment that works while the agent has low power compared to its "teachers", but at high power it will do something strange and maybe harm the "teachers" values. That holds true for humans gaining a lot of power and going against evolutionary values ("superstimuli"), and for individual humans gaining a lot of power and going against societal values ("power corrupts"), so it's probably true for AI as well. The worrying thing is that high power by itself seems suf... (read more)

Ngo and Yudkowsky on AI capability gains

Oh, I can just give you a class of nontrivial predictions of expected utility theory. I have not seen any empirical results on whether these actually hold, so consider them advance predictions.

So, a bacterium needs a handful of different metabolic resources - most obviously energy (i.e. ATP), but also amino acids, membrane lipids, etc. And often bacteria can produce some metabolic resources via multiple different paths, including cyclical paths - e.g. it's useful to be able to turn A into B but also B into A, because sometimes the environment will have lots... (read more)

Thanks! I think that this is a very useful example of an advance prediction of utility theory; and that gathering more examples like this is one of the most promising way to make progress on bridging the gap between Eliezer's and most other people's understandings of consequentialism.

Thanks John for this whole thread!

(Note that I only read the whole Epistemology section of this post and skimmed the rest, so I might be saying stuff that is repeated/resolved elsewhere. Please point me to the relevant parts/quotes if that's the case. ;) )

Einstein's arrogance sounds to me like an early pointer in the Sequences for that kind of thing, with a specific claim about General Relativity being that kind of theory.

That being said, I still understand Richard's position and difficulty with this whole part (or at least what I read of Richard's diffic... (read more)

Ngo and Yudkowsky on AI capability gains

It feels to me like the typical Other within EA has no experience with discovering unexpected order, with operating a generalization that you can expect will cover new cases even when that isn't immediately obvious, with operating that generalization to cover those new cases correctly, with seeing simple structures that generalize a lot and having that be a real and useful and technical experience; instead of somebody blathering in a non-expectation-constraining way about how "capitalism is responsible for everything wrong with the world", and being able t

... (read more)

Like, there's a certain kind of theory/model which generalizes well to many classes of new cases and makes nontrivial predictions in those new cases, and those kinds-of-theories/models have a pattern to them which is recognizable.

Could I ask you to say more about what you mean by "nontrivial predictions" in this context? It seems to me like this was a rather large sticking point in the discussion between Richard and Eliezer (that is, the question of whether expected utility theory--as a specific candidate for a "strongly generalizing theory"--produces "non... (read more)

I second the kudos to Richard, by the way.  In a lot of ways he's an innocent bystander while I say things that aren't aimed mainly at him.

A positive case for how we might succeed at prosaic AI alignment

Abstracting out one step: there is a rough general argument that human-imitating AI is, if not perfectly safe, then at least as safe as the humans it's imitating. In particular, if it's imitating humans working on alignment, then it's at least as likely as we are to come up with an aligned AI. Its prospects are no worse than our prospects are already. (And plausibly better, since the simulated humans may have more time to solve the problem.)

For full strength, this argument requires that:

  • It emulate the kind of alignment research which the actual humans woul
... (read more)
Study Guide

Real analysis, abstract algebra, and topology are often the hardest and most advanced courses in the undergraduate math catalog. Those are considered the capstone courses of an undergraduate degree in pure mathematics. You reference them as introductory classes or prereqs which seems not correct.

Yeah, fair. Harvey Mudd is probably unusual in this regard - it's a very-top-tier exclusively-STEM school, so analysis and abstract algebra were typically late-sophomore-year/early-junior-year courses for the math majors (IIRC). I guess my corresponding advice for ... (read more)

Study Guide

I personally covered the relevant parts of measure theory and a lot of stochastic processes in math finance, which I think is a good way to do it. I did take an OR class which spent about half the time on Markov chains, but I consider that stuff pretty straightforward if you have a good grounding in linear algebra.

Analysis/abstract/topology are exactly the sort of prereqs I recommend skipping. The intro classes usually spend a bunch of time on fairly boring stuff; intermediate-level classes will usually review the actually-useful parts as-needed.

The crypto... (read more)

3 · jamal · 20d: You recommend the basic math courses: linear algebra, probability, a standard calculus sequence. You just don't recommend the more pure-math-type courses. In your view, pure math courses spend too much time digging into boring, tedious details, and you advise more applied courses instead. That's an entirely valid perspective, and it may be the most productive tactic. Real analysis, abstract algebra, and topology are often the hardest and most advanced courses in the undergraduate math catalog; they are considered the capstone courses of an undergraduate degree in pure mathematics. You reference them as introductory classes or prereqs, which seems not correct. At almost any university, Real Analysis is the more advanced, theoretical, and difficult version of calculus. Did you study martingales or stopped Brownian motion? Are those useful or recommended? They seem relevant to finance and applied probability. I really enjoyed this post, and thank you for the awesome reply.
Some Remarks on Regulator Theorems No One Asked For

"why does anyone care isn't this trivial"

FWIW, I totally endorse this take on the original Good Regulator Theorem, and at least somewhat endorse it even on my own version.

Ngo and Yudkowsky on alignment difficulty

Personally, I'd consider a Fusion Power Generator-like scenario a more central failure mode than either of these. It's not about the difficulty of getting the AI to do what we asked, it's about the difficulty of posing the problem in a way which actually captures what we want.

4 · Steven Byrnes · 21d: I agree that that is another failure mode. (And there are yet other failure modes too—e.g. instead of printing the nanobot plan, it prints "Help me I'm trapped in a box…" :-P . I apologize for sloppy wording that suggested the two things I mentioned were the only two problems.) I disagree about "more central". I think that's basically a disagreement on the question of "what's a bigger deal, inner misalignment or outer misalignment?" with you voting for "outer" and me voting for "inner, or maybe tie, I dunno". But I'm not sure it's a good use of time to try to hash out that disagreement. We need an alignment plan that solves all the problems simultaneously. Probably different alignment approaches will get stuck on different things.
A positive case for how we might succeed at prosaic AI alignment

I don't think the assemblage is the point. I think the idea here is that "myopia" is a property of problems: a non-myopic problem is (roughly) one which inherently requires doing things with long time horizons. I think Eliezer's claim is that (1) a (good) pivotal act is probably a non-myopic problem, and (2) you can't solve a nontrivial nonmyopic problem with a myopic solver. Part (2) is what I think TekhneMakr is gesturing at and Eliezer is endorsing.

My guess is that you have some idea of how a myopic solver can solve a nonmyopic problem (by having it out... (read more)

8 · evhub · 22d: Yeah, that's right, I definitely agree with (1) and disagree with (2). I tend to think that HCH is not dangerous, but I agree that it's likely insufficiently capable. To solve that problem, we have to go to a myopic objective that is more powerful. But that's not that hard, and there's lots of them that can incentivize good non-myopic behavior and are safe to optimize for as long as the optimizer is myopic. AI safety via market making is one example, but it's a very tricky one, so maybe not the best candidate for showcasing what I mean. In particular, I suspect that a myopic optimizer given the goal of acting as a trader or market-maker in such a setup wouldn't act deceptively, though I suspect they would Goodhart on the human approval signal in unsafe ways (which is less bad of a problem than deception, and could potentially be solved via something like my step (6), but still a pretty serious problem). Maybe a better example would be something like imitative generalization. If imitating HCH is insufficient, we can push further by replacing "imitate HCH" with "output the hypothesis which maximizes HCH's prior times the hypothesis's likelihood," which gets you substantially farther and I think is still safe to optimize for given a myopic optimizer (though neither is safe for a non-myopic optimizer).
Worth checking your stock trading skills

Indeed there is nothing contradictory about that.

I was being a bit lazy earlier - when I said "EV", I was using that as a shorthand for "expected discounted value", which in hindsight I probably should have made explicit. The discount factor is crucial, because it's the discount factor which makes risk aversion a thing: marginal dollars are worth more to me in worlds where I have fewer dollars, therefore my discount factor is smaller in those worlds.

The person making the margin loan does accept a lower expected return in exchange for lower risk, but their ... (read more)
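To make the role of the discount concrete, here is a toy sketch (my own illustration, not from the comment): two gambles with identical expected dollar value, where a concave (log) utility, standing in for "marginal dollars are worth more in worlds where I have fewer dollars," prefers the lower-variance option. The specific numbers are hypothetical.

```python
import math

# Two gambles with the same expected dollar value (100):
# A: a certain 100; B: a 50/50 coin flip between 20 and 180.
gamble_a = [(1.0, 100)]
gamble_b = [(0.5, 20), (0.5, 180)]

def expected_value(gamble):
    """Plain expected dollar value: sum of probability * payoff."""
    return sum(p * x for p, x in gamble)

def expected_log_utility(gamble):
    """Expected utility under log utility, where marginal dollars
    matter more when you have fewer of them (risk aversion)."""
    return sum(p * math.log(x) for p, x in gamble)

# Equal EV...
print(expected_value(gamble_a), expected_value(gamble_b))
# ...but the concave utility prefers the certain outcome:
print(expected_log_utility(gamble_a) > expected_log_utility(gamble_b))
```

The comparison on the last line comes out true: log(100) ≈ 4.61 exceeds 0.5·log(20) + 0.5·log(180) ≈ 4.09, even though both gambles have an expected value of exactly 100.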

Ngo and Yudkowsky on alignment difficulty

The main issue with this sort of thing (on my understanding of Eliezer's models) is Hidden Complexity of Wishes. You can make an AI safe by making it only able to fulfill certain narrow, well-defined kinds of wishes where we understand all the details of what we want, but then it probably won't suffice for a pivotal act. Alternatively, you can make it powerful enough for a pivotal act, but unfortunately a (good) pivotal act probably has to be very big, very irreversible, and very entangled with all the complicated details of human values. So alignment is l... (read more)

6 · cousin_it · 19d: This is tricky. Let's say we have a powerful black box that initially has no knowledge or morals, but a lot of malleable computational power. We train it to give answers to scary real-world questions, like how to succeed at business or how to manipulate people. If we reward it for competent answers while we can still understand the answers, at some point we'll stop understanding answers, but they'll continue being super-competent. That's certainly a danger and I agree with it. But by the same token, if we reward the box for aligned answers while we still understand them, the alignment will generalize too. There seems no reason why alignment would be much less learnable than competence about reality. Maybe your and Eliezer's point is that competence about reality has a simple core, while alignment doesn't. But I don't see the argument for that. Reality is complex, and so are values. A process for learning and acting in reality can have a simple core, but so can a process for learning and acting on values. Humans pick up knowledge from their surroundings, which is part of "general intelligence", but we pick up values just as easily and using the same circuitry. Where does the symmetry break?
Ngo and Yudkowsky on alignment difficulty

Good point, I wasn't thinking about that mechanism.

However, I don't think this creates an information bottleneck in the sense needed for the original claim in the post, because the marginal cost of storing more information in the genome does not increase via this mechanism as the amount-of-information-passed increases. Each gene just needs to offer a large enough fitness advantage to counter the noise on that gene; the requisite fitness advantage does not change depending on whether the organism currently has a hundred information-passing genes or a hundre... (read more)

3 · darius · 21d: Here's the argument I'd give for this kind of bottleneck. I haven't studied evolutionary genetics; maybe I'm thinking about it all wrong. In the steady state, an average individual has n children in their life, and just one of those n makes it to the next generation. (Crediting a child 1/2 to each parent.) This gives log2(n) bits of error-correcting signal to prune deleterious mutations. If the genome length times the functional bits per base pair times the mutation rate is greater than that log2(n), then you're losing functionality with every generation. One way for a beneficial new mutation to get out of this bind is by reducing the mutation rate. Another is refactoring the same functionality into fewer bits, freeing up bits for something new. But generically a fitness advantage doesn't seem to affect the argument that the signal from purifying selection gets shared by the whole genome.
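darius's inequality is easy to sanity-check numerically. The sketch below uses hypothetical placeholder numbers (offspring count, functional fraction, mutation rate); they are illustrative only, not claims about real genomes.

```python
import math

# Selection budget: log2(n) bits of error-correcting signal per generation,
# where n is the average number of children per individual (hypothetical).
n_children = 4
selection_bits = math.log2(n_children)

# Mutation load: genome length * functional bits per base pair * mutation rate.
# All three numbers below are rough placeholders for illustration.
genome_length_bp = 3e9      # order of the human genome size
functional_fraction = 0.01  # assumed fraction of functional base pairs
bits_per_bp = 2             # a base pair can encode at most 2 bits
mutation_rate = 1e-8        # assumed per-base-pair, per-generation rate

corrupted_bits = genome_length_bp * functional_fraction * bits_per_bp * mutation_rate

print(f"selection budget: {selection_bits:.2f} bits/generation")
print(f"mutation load:    {corrupted_bits:.2f} bits/generation")
print("losing functionality" if corrupted_bits > selection_bits else "sustainable")
```

With these particular placeholder numbers the load (0.6 bits/generation) sits under the budget (2 bits/generation); tweaking the functional fraction or mutation rate upward flips the comparison, which is the knife-edge the argument turns on.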
Ngo and Yudkowsky on alignment difficulty

So here's one important difference between humans and neural networks: humans face the genomic bottleneck which means that each individual has to rederive all the knowledge about the world that their parents already had. If this genetic bottleneck hadn't been so tight, then individual humans would have been significantly less capable of performing novel tasks.

I disagree with this in an interesting way. (Not particularly central to the discussion, but since both Richard & Eliezer thought the quoted claim is basically-true, I figured I should comment on ... (read more)

Large genomes have (at least) 2 kinds of costs. The first is the energy and other resources required to copy the genome whenever your cells divide. The existence of junk DNA suggests that this cost is not a limiting factor. The other cost is that a larger genome will have more mutations per generation. So maintaining that genome across time uses up more selection pressure. Junk DNA requires no maintenance, so it provides no evidence either way. Selection pressure cost could still be the reason why we don't see more knowledge about the world being translate... (read more)

5 · TekhneMakre · 22d: My guess is that this is a total misunderstanding of what's meant by "genomic bottleneck". The bottleneck isn't the amount of information storage, it's the fact that the genome can only program the mind in a very indirect, developmental way, so that it can install stuff like "be more interested in people" but not "here's how to add numbers".
Study Guide

Oh yeah, I did read Li and Vitanyi pretty early on. I completely forgot about that.

Discussion with Eliezer Yudkowsky on AGI interventions

An example: when I first heard the Ought experiments described, I was pretty highly confident how they'd turn out - people would mostly fail to coordinate on any problem without an already-very-obvious factorization. (See here for the kinds of evidence informing that high confidence, though applied to a slightly different question. See here and here for the more general reasoning/world models which underlie that prediction.) From what I've heard of the experiments, it seems that that is indeed basically what happened; therefore the experiments provided app... (read more)

5 · rohinmshah · 24d: That one makes sense (to the extent that Eliezer did confidently predict the results), since the main point of the work was to generate information through experiments. I thought the "predictable" part was also meant to apply to a lot of ML work where the main point is to produce new algorithms, but perhaps it was just meant to apply to things like Ought.
Being the (Pareto) Best in the World

I can choose to read the Wikipedia overviews of 1,000,000 different fields, which will allow me to reach the Pareto frontier in this 1,000,000-dimensional graph. However, this isn’t practically useful.

That... actually sounds extremely useful, this is a great idea. The closest analogue I've done is read through a college course catalogue from cover to cover, which was extremely useful. Very good way to find lots of unknown unknowns.

7 · AllAmericanBreakfast · 25d: To both of you, I say "useful relative to what?" Opportunity cost is the baseline for judging that. Are you excited to read N field overviews over your next best option?
Discussion with Eliezer Yudkowsky on AGI interventions

I don't mean to say that there's critique of prosaic alignment specifically in the sequences. Rather, a lot of the generators of the Yudkowsky-esque worldview are in there. (That is how the sequences work: it's not about arguing specific ideas around alignment, it's about explaining enough of the background frames and generators that the argument becomes unnecessary. "Raise the sanity waterline" and all that.)

For instance, just the other day I ran across this:

Of this I learn the lesson:  You cannot manipulate confusion.  You cannot make clever pl

... (read more)
Discussion with Eliezer Yudkowsky on AGI interventions

... I find that most people working on alignment are trying far harder to justify why they expect their work to matter than EY and the old-school MIRI team ever did.

You've had a few comments along these lines in this thread, and I think this is where you're most severely failing to see the situation from Yudkowsky's point of view.

From Yudkowsky's view, explaining and justifying MIRI's work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the ... (read more)

7 · adamShimi · 1mo: Thanks for the pushback! My memory of the sequences is that they're far more about defending and explaining the alignment problem than criticizing prosaic AGI (maybe because the term couldn't have been used years before Paul coined it?). Could you give me the best pointers to prosaic alignment criticism in the sequences? (I've read the sequences, but I don't remember every single post, and my impression from memory is what I've written above.) I feel also that there might be a discrepancy between who I think of when I think of prosaic alignment researchers and what the category means in general/to most people here. My category mostly includes AF posters, people from a bunch of places like EleutherAI/OpenAI/DeepMind/Anthropic/Redwood, and people from CHAI and FHI. I expect most of these people to actually have read the sequences and tried to understand MIRI's perspective. Maybe someone could point out a list of other places where prosaic alignment research is being done that I'm missing, especially places where people probably haven't read the sequences? Or maybe I'm overestimating how many of the people in the places I mentioned have read the sequences?
Discussion with Eliezer Yudkowsky on AGI interventions

In the abstraction formalism I use, it can be ambiguous whether any particular thing "is a rose", while still having a roughly-unambiguous concept of roses. It's exactly like clustering: a cluster can have unambiguous parameters (mean, variance, etc), but it's still ambiguous whether any particular data point is "in" that cluster.
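The clustering analogy can be made concrete with a toy sketch (my own illustration, not from the comment): two one-dimensional Gaussian clusters whose parameters are perfectly well-defined, yet where membership of an individual data point can be genuinely ambiguous.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Two clusters with completely unambiguous parameters:
cluster_a = {"mean": 0.0, "std": 1.0}
cluster_b = {"mean": 4.0, "std": 1.0}

def membership_prob(x):
    """Posterior probability that x belongs to cluster A (equal priors)."""
    pa = gaussian_pdf(x, **cluster_a)
    pb = gaussian_pdf(x, **cluster_b)
    return pa / (pa + pb)

print(membership_prob(0.0))  # near 1: clearly in cluster A
print(membership_prob(2.0))  # exactly 0.5: ambiguous, though both clusters are well-defined
```

The point x = 2.0 is maximally ambiguous between the two clusters even though the clusters' means and variances are specified exactly, mirroring how "roses" can be a roughly-unambiguous concept while any particular thing's rose-hood is ambiguous.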

4 · Charlie Steiner · 1mo: Good point. I was more thinking that not only could it be ambiguous for a single observer, but different observers could systematically decide differently, and that would be okay. Are there any concepts that don't merely have continuous parameters, but are actually part of continuous families? Maybe the notion of "1 foot long"?
Study Guide

I don't have recommendations for courses or principles to select them beyond what's in the post. (Otherwise I would have put them in the post.)

I don't have any sense of what the math of agency and alignment is like, and I hope to get a feel for it sometime in the next year, but I can't right now — by the way, any recommendations on how to do that?

I don't think you're going to find anybody with existing good answers. The embedded agency sequence is the best articulation of the problems which I currently know of. (Even there I disagree with the degree of emp... (read more)

Study Guide

To Aysajan's answer, I would add that "number of calculations a program needs to run" usually comes from a big-O estimate for the data structures involved, and the size of the data we're using. So, for instance, if I'm looping over a list with 1k items and doing a thing to each, then that should take ~1k operations. (Really the thing I'm doing to each will probably take more than one operation, but this is a Fermi estimate, so we just need to be within an order of magnitude.) If I'm looping over all pairs of items from two lists, then the number of operations will be the product of their sizes.
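A minimal sketch of the two loop shapes and their Fermi-style operation counts (the 1k-item list is from the comment; the 500-item second list is a hypothetical addition for illustration):

```python
# Fermi-style operation counts: order-of-magnitude only.
items = list(range(1000))
other = list(range(500))

# Looping once over one list: ~len(items) operations,
# treating whatever we do to each item as a single operation.
single_loop_ops = len(items)

# Looping over all pairs from two lists: the product of their sizes.
pair_loop_ops = len(items) * len(other)

print(single_loop_ops)  # ~1e3
print(pair_loop_ops)    # ~5e5
```

For a Fermi estimate we only need to be within an order of magnitude, so the constant factor hidden in "do a thing to each item" is safely ignored.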

Study Guide

That's a good point. College did match my natural learning style pretty well (albeit with a larger-than-usual technical courseload, and a lot of textbooks/lectures on the side).

I find your 2x estimate plausible, though obviously very highly dependent on the person and the details; it's definitely not something I'd expect to work for everyone or even most people.

Study Guide

Starting from the high-school level, most of the material in this post took me about 5-6 years (a year or two of high school plus four years of college).

I don't think more than a year or two could be shaved off without somebody creating much better study material. (I do think a lot better study material could be made - the framing practica are an attempt at a prototype of that - but I found it very time intensive to make such things.) On the other side, I covered far more ground in college than the vast majority of people I know, and I don't know how much ... (read more)

6 · toonalfrink · 1mo: I'm more interested in the time this would take if one wasn't constrained by being in college. My intuition is that you can go 2x faster on your own if the topic and the pace aren't being imposed on you, but maybe college just matched your natural learning style. Thanks for the data point in any case.
Study Guide

I roughly agree with that value prop for physics. I'd add that physics is the archetype of the sciences, and gets things right that haven't necessarily been made a legible part of "the scientific method" yet, so it's important to study physics to get an intuitive idea of science-done-right beyond what we already know how to explain well. (Gears-level models are a good example here - physics is a good way to gain an intuition for "gears" and their importance, even if that's not explicitly brought to attention or made legible. Your point about how we use sym... (read more)

1 · TAG · 1mo: I would argue that physics can make very accurate quantitative predictions under the right circumstances... and that it nonetheless poses philosophical challenges much more than other quantitative sciences.