
Variables Don't Represent The Physical World (And That's OK)

You are misunderstanding the post. There are no "extra bits of information" hiding anywhere in reality; where the "extra bits of information" are lurking is within the implicit assumptions you made when you constructed your model the way you did.

As long as your model is making use of abstractions--that is, using "summary data" to create and work with a lower-dimensional representation of reality than would be obtained by meticulously tracking every variable of relevance--you are implicitly making a choice about what information you are summarizing and how you are summarizing it.

This choice is forced to some extent, in the sense that there are certain ways of summarizing the data that barely simplify computation at all compared to using the "full" model. But even conditioning on a usefully simplifying (natural) abstraction having been selected, there will still be degrees of freedom remaining, and those degrees of freedom are determined by you (the person doing the summarizing). This is where the "extra information" comes from; it's not because of inherent uncertainty in the physical measurements, but because of an unforced choice that was made between multiple abstract models summarizing the same physical measurements.

Of course, in reality you are also dealing with measurement uncertainty. But that's not what the post is about; the thing described in the post happens even if you somehow manage to get your hands on a set of uncertainty-free measurements, because the moment you pick a particular way to carve up those measurements, you induce a (partially) arbitrary abstraction layer on top of the measurements. As the post itself says:

If there’s only a limited number of data points, then this has the same inherent uncertainty as before: sample mean is not distribution mean. But even if there’s an infinite number of data points, there’s still some unresolvable uncertainty: there are points which are boundary-cases between the “tree” cluster and the “apple” cluster, and the distribution-mean depends on how we classify those. There is no physical measurement we can make which will perfectly tell us which things are “trees” or “apples”; this distinction exists only in our model, not in the territory. In turn, the tree-distribution-parameters do not perfectly correspond to any physical things in the territory.

This implies nothing about determinism, physics, or the nature of reality ("illusory" or otherwise).

Often, enemies really are innately evil.

This does not strike me as a psychologically realistic model of sadism, and (absent further explanation/justification) counts in my opinion as a rather large strike against mistake theory (or at least, it would if I took as given that a plurality of self-proclaimed "mistake theorists" would in fact endorse the statement you made).

What will 2040 probably look like assuming no singularity?

Neither of those seem to me like the right questions to be asking (though for what it's worth the answer to the first question has been pretty clearly "yes" if by "Chinese government" we're referring specifically to post-2001 China).

Having said that, I don't think outside-viewing these scenarios using coarse-grained reference classes like "the set of mid-term goals China has set for itself in the past" leads to anything useful. Well-functioning countries in general (and China in particular) tend to set goals for themselves that they view as achievable, so if they're well-calibrated it's necessarily the case that they'll end up achieving (a large proportion of) the goals they set for themselves. This being the case, you don't learn much from finding out that China manages to consistently meet its own goals, other than that they've historically done a pretty decent job of assessing their own capabilities. Nor does this allow you to draw conclusions about a specific goal they have, which may be easier or more difficult to achieve than their average goal.

In the case of Taiwan: by default, China is capable of taking Taiwan by force. What I mean by this is that China's maritime capabilities well exceed Taiwan's defensive capacity, such that Taiwan's continued sovereignty in the face of a Chinese invasion is entirely reliant on the threat of external intervention (principally from the United States, but also from allies in the region). Absent that threat, China could invade Taiwan tomorrow with a roughly 100% chance of taking the island. Even if allies get involved, there's a non-negligible probability China wins anyway, and the trend going forward only favors China more.

Of course, that doesn't mean China will invade Taiwan in the near future. As long as its victory isn't assured, it stands to lose substantially more from a failed invasion than it stands to gain from a successful one. At least for the near future, so long as the United States doesn't send a clear signal about whether it will defend Taiwan, I expect China to mostly play it safe. But there's definitely a growing confidence within China that they'll retake Taiwan eventually, so the prospect of an invasion is almost certainly on the horizon unless current trends w.r.t. the respective strengths of the U.S. and Chinese militaries reverse for some reason. That's not out of the question (the future is unpredictable), but there's also no particular reason to expect said trends to reverse, so assuming they don't, China will almost certainly try to occupy Taiwan at some point, regardless of what stance the U.S. takes on the issue.

(Separately, there's the question of whether the U.S. will take a positive stance; I'm not optimistic that it will, given its historical reluctance to do so, as well as the fact that all of the risks and incentives responsible for said reluctance will likely only increase as time goes on.)

Making Vaccine

A simple Google search shows thousands of articles addressing this very solution.

The solution in the paper you link is literally the solution Eliezer described trying, without success:

As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked.

(Note that the "little lightbox" in question was very likely one of these, which you may notice mostly have ratings of 10,000 lux rather than the 2,500 cited in the paper. So, significantly brighter, and it still didn't work.)

It does sound like you misunderstood, in other words. Light exposure is indeed a known, effective treatment for SAD; this is why Eliezer tried light boxes to begin with. The point of that excerpt is that this "known solution" did not work for his wife, and the obvious next step of scaling up the amount of light used was not investigated anywhere in the clinical literature.

But taking a step back, the "Chesterton’s Absence of a Fence" argument doesn't apply here because the circumstances are very different. The entire world is desperately looking for a way to stop COVID. If SAD suddenly occurred out of nowhere and affected the entire economy, you would be sure that bright lights would be one of the first things to be tested.

This is simply a (slightly) disguised variation of your original argument. Absent strong reasons to expect to see efficiency, you should not expect to see efficiency. The "entire world desperately looking for a way to stop COVID" led to bungled vaccine distribution, delayed production, supply shortages, the list goes on and on. Empirically, we do not observe anything close to efficiency in this market, and this should be obvious even without the aid of Dentin's list of bullet points (though naturally those bullet points are very helpful).

(Question: did seeing those bullet points cause you to update at all in the direction of this working, or are you sticking with your 1-2% prior? The latter seems fairly indefensible from an epistemic standpoint, I think.)

Not only is the argument above flawed, it's also special pleading with respect to COVID. Here is the analogue of your argument with respect to SAD:

Around 7% of the population has severe Seasonal Affective Disorder, and another 20% or so has weak Seasonal Affective Disorder. Around 50% of tested cases respond to standard lightboxes. So if the intervention of stringing up a hundred LED bulbs actually worked, it could provide a major improvement to the lives of 3% of the US population, costing on the order of $1000 each (without economies of scale). Many of those 9 million US citizens would be rich enough to afford that as a treatment for major winter depression. If you could prove that your system worked, you could create a company to sell SAD-grade lighting systems and have a large market.

SAD is not an uncommon disorder. In terms of QALYs lost, it's... probably not directly comparable with COVID, but it's at the very least in the same ballpark--certainly to the point where "people want to stop COVID, but they don't care about SAD" is clearly false.

And yet, in point of fact, there are no papers describing the unspeakably obvious intervention of "if your lights don't seem to be working, use more lights", nor are there any companies predicated on this idea. If Eliezer had followed your reasoning to its logical conclusion, he might not have bothered testing more light... except that his background assumptions did not imply the (again, fairly indefensible, in my view) heuristic that "if no one else is doing it, the only possible explanation is that it must not work, else people are forgoing free money". And as a result, he did try the intervention, and it worked, and (we can assume) his wife's quality of life improved significantly as a result.

If there's an argument that (a) applies in full generality to anything other people haven't done before, and (b) if applied, would regularly lead people to forgo testing out their ideas (and not due to any object-level concerns, either, e.g. maybe it's a risky idea to test), then I assert that that argument is bad and harmful, and that you should stop reasoning in this manner.

Making Vaccine

This is a very in-depth explanation of some of the constraints affecting pharmaceutical companies that (mostly) don't apply to individuals, and is useful as an object-level explanation for those interested. I'm glad this comment was written, and I upvoted accordingly.

Having said that, I would also like to point out that a detailed explanation of the constraints shouldn't be needed to address the argument in the grandparent comment, which simply reads:

Why are established pharmaceutical companies spending billions on research and using complex mRNA vaccines when simply creating some peptides and adding it to a solution works just as well?

This question inherently assumes that the situation with commercial vaccine-makers is efficient with respect to easy, do-it-yourself interventions, and the key point I want to make is that this assumption is unjustified even if you don't happen to have access to a handy list of bullet points detailing the ways in which companies and individuals differ on this front. (Eliezer wrote a whole book on this at one point, from which I'll quote a relevant section:)

My wife has a severe case of Seasonal Affective Disorder. As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked. SAD’s effects were crippling enough for it to be worth our time to consider extreme options, like her spending time in South America during the winter months. And indeed, vacationing in Chile and receiving more exposure to actual sunlight did work, where lightboxes failed.

From my perspective, the obvious next thought was: “Empirically, dinky little lightboxes don’t work. Empirically, the Sun does work. Next step: more light. Fill our house with more lumens than lightboxes provide.” In short order, I had strung up sixty-five 60W-equivalent LED bulbs in the living room, and another sixty-five in her bedroom.

Ah, but should I assume that my civilization is being opportunistic about seeking out ways to cure SAD, and that if putting up 130 LED light bulbs often worked when lightboxes failed, doctors would already know about that? Should the fact that putting up 130 light bulbs isn’t a well-known next step after lightboxes convince me that my bright idea is probably not a good idea, because if it were, everyone would already be doing it? Should I conclude from my inability to find any published studies on the Internet testing this question that there is some fatal flaw in my plan that I’m just not seeing?

We might call this argument “Chesterton’s Absence of a Fence.” The thought being: I shouldn’t build a fence here, because if it were a good idea to have a fence here, someone would already have built it. The underlying question here is: How strongly should I expect that this extremely common medical problem has been thoroughly considered by my civilization, and that there’s nothing new, effective, and unconventional that I can personally improvise?

Eyeballing this question, my off-the-cuff answer—based mostly on the impressions related to me by every friend of mine who has ever dealt with medicine on a research level—is that I wouldn’t necessarily expect any medical researcher ever to have done a formal experiment on the first thought that popped into my mind for treating this extremely common depressive syndrome. Nor would I strongly expect the intervention, if initial tests found it to be effective, to have received enough attention that I could Google it.

The grandparent comment is more or less an exact example of this species of argument, and is the first of its kind that I can recall seeing "in the wild". I think examples of this kind of thinking are all over the place, but it's rare to find a case where somebody explicitly deploys an argument of this type in such a direct, obvious way. So I wanted to draw attention to this, with further emphasis on the idea that such arguments are not valid in general.

The prevalence of this kind of thinking is why (I claim) at-home, do-it-yourself interventions are so uncommon, and why this particular intervention went largely unnoticed even among the rationalist community. It's a failure mode that's easy to slip into, so I think it's important to point these things out explicitly and push back against them when they're spotted (which is the reason I wrote this comment).

IMPORTANT NOTE: This should be obvious enough to anyone who read Inadequate Equilibria, but one thing I'm not saying here is that you should just trust random advice you find online. You should obviously perform an object-level evaluation of the advice, and put substantial effort into investigating potential risks; such an assessment might very well require multiple days' or weeks' worth of work, and end up including such things as the bulleted list in the parent comment. The point is that once you've performed that assessment, it serves no further purpose to question yourself based only on the fact that others aren't doing the thing you're doing; this is what Eliezer would call wasted motion, and it's unproductive at best and harmful at worst. If you find yourself thinking along these lines, you should stop, in particular if you find yourself saying things like this (emphasis mine):

That being said, I'm extremely skeptical that this will work, my belief is that there's a 1-2% chance here that you've effectively immunized yourself from COVID.

You cannot get enough Bayesian evidence from the fact that [insert company here] isn't doing [insert intervention here] to reduce your probability of an intervention being effective all the way down to 1-2%. That 1-2% figure almost certainly didn't come from any attempt at a numerical assessment; rather, it came purely from an abstract intuition that "stuff that isn't officially endorsed doesn't work". This is the kind of thinking that (I assert) should be noticed and stamped out.

Syntax, semantics, and symbol grounding, simplified

With regard to GPT-n, I don't think the hurdle is groundedness. Given a sufficiently vast corpus of language, GPT-n will achieve a level of groundedness where it understands language at a human level but lacks the ability to make intelligent extrapolations from that understanding (e.g. invent general relativity), which is rather a different problem.

The claim in the article is that grounding is required for extrapolation, so these two problems are not in fact unrelated. You might compare e.g. the case of a student who has memorized by rote a number of crucial formulas in calculus, but cannot derive those formulas from scratch if asked (and by extension obviously cannot conceive of or prove novel theorems either); this suggests an insufficient level of understanding of the fundamental mathematical underpinnings of calculus, which (if I understood Stuart's post correctly) is a form of "ungroundedness".

[Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology

I don't think it's particularly impactful from an X-risk standpoint (at least in terms of first-order consequences), but in terms of timelines I think it represents another update in favor of shorter timelines, in a similar vein to AlphaGo/AlphaZero.

Message Length

Since the parameters in your implementation are 32-bit floats, you assign a complexity cost of 32 ⋅ 2^n bits to n-th order Markov chains, and look at the sum of fit (log loss) and complexity.

Something about this feels wrong. The precision of your floats shouldn't be what determines the complexity of your Markov chain; the expressivity of an nth-order Markov chain will almost always be worse than that of an (n+1)th-order Markov chain, even if the former has access to higher-precision floats than the latter. Also, in the extreme case where you're working with real numbers, you'd end up with the conclusion that every Markov chain has infinite complexity, which is obviously nonsensical.

This does raise the question of how to assign complexity to Markov chains; it's clearly going to be linear in the number of parameters (and hence exponential in the order of the chain), which means the general form k ⋅ 2^n seems correct... but the value you choose for the coefficient k seems underdetermined.
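To make the underdetermination concrete, here's a minimal sketch (in Python, with hypothetical function names of my own choosing, not anything from the post) of the two-part score for binary Markov chains: complexity of k ⋅ 2^n bits plus log loss under the fitted chain, with k exposed as a free parameter. The example sequence and add-one smoothing are illustrative choices, not taken from the original implementation:

```python
import math
from collections import defaultdict

def description_length(seq, order, bits_per_param=32):
    """Two-part code for a binary order-n Markov chain:
    complexity (bits_per_param bits for each of the 2**order contexts)
    plus log loss (in bits) of the fitted chain on the sequence."""
    # Count next-symbol frequencies for each length-`order` context.
    counts = defaultdict(lambda: [0, 0])
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][int(seq[i])] += 1
    # Log loss under the fitted parameters, with add-one smoothing
    # so that unseen transitions don't produce log(0).
    log_loss = 0.0
    for i in range(order, len(seq)):
        zeros, ones = counts[seq[i - order:i]]
        p_one = (ones + 1) / (zeros + ones + 2)
        prob = p_one if seq[i] == "1" else 1 - p_one
        log_loss += -math.log2(prob)
    complexity = bits_per_param * 2 ** order  # one parameter per context
    return complexity + log_loss

# A sequence with clear first-order structure: alternating bits.
seq = "01" * 200
best = min(range(4), key=lambda n: description_length(seq, n))
```

On this sequence the order-1 chain wins: the order-0 chain pays ~1 bit of log loss per symbol, while higher orders pay extra complexity for no reduction in loss. Lowering `bits_per_param` shifts the tradeoff toward higher orders, which is exactly the sense in which the coefficient k is underdetermined: the chosen model depends on it.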

Alignment By Default

I like this post a lot, and I think it points out a key crux between what I would term the "Yudkowsky" side (which seems to mostly include MIRI, though I'm not too sure about individual researchers' views) and "everybody else".

In particular, the disagreement seems to crystallize over the question of whether "human values" really are a natural abstraction. I suspect that if Eliezer thought that they were, he would be substantially less worried about AI alignment than he currently is (though naturally all of this is my read on his views).

You do provide some reasons to think that human values might be a natural abstraction, both in the post itself and in the comments, but I don't see these reasons as particularly compelling ones. The one I view as most compelling is the argument that humans seem to be fairly good at identifying and using natural abstractions, and therefore any abstract concept that we seem capable of grasping fairly quickly has a strong chance of being a natural one.

However, I think there's a key difference between abstractions that are developed for the purposes of prediction, and abstractions developed for other purposes (by which I mostly mean "RL"). To the extent that a predictor doesn't have sufficient computational power to form a low-level model of whatever it's trying to predict, I definitely think that the abstractions it develops in the process of trying to improve its prediction will to a large extent be natural ones. (You lay out the reasons for this clearly enough in the post itself, so I won't repeat them here.)

It seems to me, though, that if we're talking about a learning agent that's actually trying to take actions to accomplish things in some environment, there's a substantial amount of learning going on that has nothing to do with learning to predict things with greater accuracy! The abstractions learned in order to select actions from a given action-space in an attempt to maximize a given reward function--these, I see little reason to expect will be natural. In fact, if the computational power afforded to the agent is good but not excellent, I expect mostly the opposite: a kludge of heuristics and behaviors meant to address different subcases of different situations, with not a whole lot of rhyme or reason to be found.

As agents go, humans are definitely of the latter type. And, therefore, I think the fact that we intuitively grasp the concept of "human values" isn't necessarily an argument that "human values" are likely to be natural, in the way that it would be for e.g. trees. The latter would have been developed as a predictive abstraction, whereas the former seems to mainly consist of what I'll term a reward abstraction. And it's quite plausible to me that reward abstractions are only legible by default to agents which implement that particular reward abstraction, and not otherwise. If that's true, then the fact that humans know what "human values" are is merely a consequence of the fact that we happen to be humans, and therefore have a huge amount of mind-structure in common.

To the extent that this is comparable to the branching pattern of a tree (which is a comparison you make in the post), I would argue that it increases rather than lessens the reason to worry: much like a tree's branch structure is chaotic, messy, and overall high-entropy, I expect human values to look similar, and therefore not really encompass any kind of natural category.

The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics)

Here's the actual explanation for this: https://twitter.com/nickwalton00/status/1289946861478936577

This seems to have been an excellent exercise in noticing confusion; in particular, figuring this one out properly would have required one to recognize that this behavior does not accord with one's pre-existing model, rather than simply coming up with an ad hoc explanation to fit the observation.

I therefore award partial marks to Rafael Harth for not proposing any explanations in particular, as well as Viliam in the comments:

I assumed that the GPT's were just generating the next word based on the previous words, one word at a time. Now I am confused.

Zero marks to Andy Jones, unfortunately:

I am fairly confident that Latitude wrap your Dungeon input before submitting it to GPT-3; if you put in the prompt all at once, that'll make for different model input than putting it in one line at a time.

Don't make up explanations! Take a Bayes penalty for your transgressions!

(No one gets full marks, unfortunately, since I didn't see anyone actually come up with the correct explanation.)
