Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I go to Amazon, search for “air conditioner”, and sort by average customer rating. There’s a couple pages of evaporative coolers (not what I’m looking for), one used window unit (?), and then this:

Average rating: 4.7 out of 5 stars.

However, this air conditioner has a major problem. Take a look at this picture:

Key thing to notice: there is one hose going to the window. Only one.

Why is that significant?

Here’s how this air conditioner works. It sucks in some air from the room. It splits that air into two streams, and pumps heat from one stream to the other - making some air hotter, and some air cooler. The cool air, it blows back into the room. The hot air, it blows out the window.

See the problem yet?

Air is blowing out the window. In order for the room to not end up a vacuum, air has to come back into the room from outside. In practice, houses are very not airtight (we don’t want to suffocate), so air from outside will be pulled in through lots of openings throughout the house. And presumably that air being pulled in from outside is hot; one typically does not use an air conditioner on cool days.

The actual effect of this air conditioner is to make the space right in front of the air conditioner nice and cool, but fill the rest of the house with hot outdoor air. Probably not what one wants from an air conditioner!

Ok, that’s amusing, but the point of this post is not physics-101 level case studies in how not to build an air conditioner. The real fact of interest is that this is apparently the top rated new air conditioner on Amazon. How does such a bad design end up so popular?

One aspect of the story, presumably, is fake reviews. That phenomenon is itself a rich source of insight, but not the point of this post, and definitely not enough to account for the popularity of this air conditioner. The reviews shown on the product page are all “verified purchase”, and mostly 5-stars. There are only 4 one-star reviews (out of 104). If most customers noticed how bad this air conditioner is, I do not think a 4.7 rating would be sustainable. Customers actually do like this air conditioner.

And hey, this air conditioner has a lot going for it! There’s wheels on the bottom, so it’s very portable. Setup is super easy - only one hose to the window, much less fiddly than those two-hose designs where you attach one hose and the other pops off.

Sure, the air conditioner has a major problem, but it’s not a major problem which most people will notice. They may notice that most of the house is still hot, but the space right in front of the air conditioner will be cool, so obviously the air conditioner is doing its job. Very few people will realize that the air conditioner is drawing hot air into the rest of the house. (Indeed, I saw zero reviews which mentioned that the air conditioner pulls hot air into the house - even the 1-star reviewers apparently did not realize why the air conditioner was so bad.)

[EDIT: several commenters seem to think that I'm claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. If the air inside is all perfectly mixed together, it will still end up cooler with the air conditioner than without. The point is not that it doesn't work at all. The point is that it's stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.]


Major problems are only fixed when those problems are obvious. Problems which most people won’t notice (or won’t attribute correctly) tend to stick around. There’s no economic incentive to fix them.

And in practice, there are plenty of problems which most people won’t notice. A few more examples:

  • Most charities have pretty mediocre impact. But the actual impact is very-not-visible to the person making donations, so people keep donating. (Also people care about things besides impact, but nonetheless I doubt low-impact charities would survive if their ineffectiveness were generally obvious.)
  • Medical research has a replication rate below 50%. But when the effect sizes are expected to be small anyways, it’s hard to tell whether it’s working, so doctors (and patients) keep using crap treatments.
  • Based on my firsthand experience with the B2B software industry, success is mostly determined by how good the product looks to managers making the decision to purchase. Successful B2B software (think “enterprise software”) is usually crap, but has great salespeople and great dashboards for the managers.

… and presumably this extends to lots of other industries which I’m less familiar with.

Two points to highlight here:

  • Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians ultimately appointing people can distinguish real from fake expertise, which is hard in general.
  • Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few years. Problems do not automatically become obvious over time.

How Does This Relate To Takeoff Speeds?

There’s a common view that, as long as AI does not take off too quickly, we’ll have time to see what goes wrong and iterate on it. It's a view with a lot of intuitive outside-view appeal: AI will work just like other industries. We try stuff, see what goes wrong, fix it. It worked like that in all the other industries, presumably it will work like that in AI too.

The point of the air conditioner is that other industries do not, in fact, work like that. Other industries are absolutely packed with major problems which are not fixed because they’re not obvious. Even assuming that AI does not take off quickly (itself a dubious assumption at best), we should expect the same to be true of AI.

… But Won’t Big Problems Be Obvious?

Most industries have major problems which aren’t fixed because they’re not obvious. But these problems can only be so bad. If they were really disastrous, the disasters would be obvious. Why not expect the same from AI?

Because AI will eventually be far more capable than human industries. It will, by default, optimize way harder than human industries are capable of optimizing.

What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem? Well, it looks really good. But all the resources are spent on looking good, not on actually being good. It’s “Potemkin village world”: a world designed to look amazing, but with nothing behind the facade. Maybe not even any living humans behind the facade - after all, even generally-happy real humans will inevitably sometimes appear less-than-maximally “good”.

… But Isn’t Solving The Obvious Problems Still Valuable?

The nonobvious problems are the whole reason why AI alignment is hard in the first place.

Think about the “game tree” of alignment - the basic starting points, how they fail, what strategies address the failures, how those fail, etc. The most basic starting points are generally of the form “collect data from humans on which things are good/bad, then train something to do good stuff and avoid bad stuff”. Assuming such a strategy could be implemented efficiently, why would it fail? Well:

  • In cases where humans label bad things as “good”, the trained system will also be selected to label bad things as “good”. In other words, the trained AI will optimize for things which look “good” to humans, even when those things are not very good.
  • The trained system will likely end up implementing strategies which do “good”-labeled things in the training environment, but those strategies will not necessarily continue to do the things humans would consider “good” in other environments.

(Somewhat more detail on these failure modes here.) Optimizing for things which look “good” to humans obviously raises exactly the sort of failure which the air conditioner points to. Failure of systems to generalize in “good” ways is less centrally about obviousness, but note that if it were obvious that the system were going to generalize badly, this would also be a pretty easy issue to solve: just don’t deploy the system if it will generalize badly. Problem is, we can’t tell whether a system will do what we want in deployment just by looking at what it does in training; we can’t tell by looking at the system's behavior whether there’s problems in there.

Point is: problems which are highly visible to humans are already easy, from an alignment perspective. They will probably be solved by default. There’s not much marginal value in dealing with them. The value is in dealing with the problems which are hard to recognize.

Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are. And the ability to iterate does not make that hard part easier. Iteration mainly helps on the parts of the problem which were already easy anyway.

So I don't really care about takeoff speeds. The technical problems are basically similar either way.

... though admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries. Fortunately, there was no shortage of examples to hammer the idea into my head.



129 comments

I agree that people can easily fail to fix alignment problems, and can instead paper over them, even given a long time to iterate. But I'm not really convinced about your analogy with single-hose air conditioners.


The air coming out of the exhaust is often quite a bit hotter than the outside air. I've never checked myself, but just googling has many people reporting 130+ degree temperatures coming out of exhaust from single-hose units. I'm not sure how hot this unit's exhaust is in particular, but I'd guess it's significantly hotter than outside air.

If exhaust is 130 and you are trying to cool from 100 to 70 you'd then only be losing 50% efficiency. Most people won't be cooling by 30 degrees so the efficiency losses would be smaller. In practice I think the actual efficiency loss relative to a 2-hose unit is more like 25-30% (see stats on top wirecutter picks below).
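
The arithmetic above can be sketched in a few lines. This is one reading of the estimate, not a formula stated in the comment: it assumes every unit of exhausted air is replaced one-for-one by outdoor air, counts only sensible heat, and treats the loss as the outdoor-indoor temperature gap divided by the exhaust-indoor gap. The function name and parameters are hypothetical.

```python
def single_hose_loss_fraction(t_indoor, t_outdoor, t_exhaust):
    """Fraction of cooling lost to infiltration in a simple sensible-heat model.

    Assumption (not from the original comment): each unit of air blown out at
    t_exhaust is replaced one-for-one by outdoor air at t_outdoor.
    """
    if t_exhaust <= t_indoor:
        raise ValueError("exhaust must be hotter than indoor air")
    heat_removed = t_exhaust - t_indoor    # heat carried out per unit of exhaust air
    heat_returned = t_outdoor - t_indoor   # heat brought back in by replacement air
    return heat_returned / heat_removed


# Cooling from 100 to 70 with 130-degree exhaust: half of the cooling is lost.
print(single_hose_loss_fraction(70, 100, 130))  # 0.5
# A milder day (85 outside) loses less.
print(single_hose_loss_fraction(70, 85, 130))   # 0.25
```

Under this model the loss shrinks as the outdoor temperature approaches the indoor setpoint, which is why the comment expects smaller losses for people cooling by less than 30 degrees.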


I actually think that this factor (sucking in hot air from the outside) is probably already included in the SACC (seasonally adjusted cooling capacity) and hence CEER reported for this air conditioner. I don't really know anything about air conditioners but it's discussed extensively in the definition of... (read more)

My overall take on this post and comment (after spending like 1.5 hours reading about AC design and statistics): 

Overall I feel like both the OP and this reply say some wrong things. The top Wirecutter recommendation is a dual-hose design. The testing procedure of Wirecutter does not seem to address infiltration in any way, and indeed the whole article does not discuss infiltration as it relates to cooling-efficiency. 

Overall efficiency loss from going from dual-hose to single-hose is something like 20-30%, which I do think is much lower than the OP implied, though it is also quite substantial. Indeed, most of the top-ranked Amazon listings do not use any of the updated measurements that Paul is talking about, so consumers likely do end up deceived about that.

The top-rated AC Wentworth links to is really very weak if you take into account those losses, and I would be surprised if it adequately cooled people's homes. 

My current model: Wirecutter is doing OK but really not great here (with an actively confused testing procedure), Amazon ratings are indeed performing quite badly, and basically display most of the problems that Wentworth talks about. It's unclea... (read more)

Update: I too have now spent like 1.5 hours reading about AC design and statistics, and I can now give a reasonable guess at exactly where the I-claim-obviously-ridiculous 20-30% number came from. Summary: the SACC/CEER standards use a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air.
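
To see how much that weighting softens the test, here is a minimal sketch of the implied average outdoor-indoor temperature gap. The 80% weight and the 3°F gap come from the summary above; the 15°F gap for the second, hotter test condition is an assumption that the comment does not state, and the variable names are hypothetical.

```python
# (weight, outdoor-minus-indoor gap in °F) for each test condition
mild_condition = (0.8, 3.0)    # from the summary above
hot_condition = (0.2, 15.0)    # assumed gap for the remaining 20% of the weight

weighted_gap = sum(weight * gap for weight, gap in (mild_condition, hot_condition))
print(f"{weighted_gap:.1f}")   # 5.4
```

With these inputs the rating behaves as if outdoor air were only about 5.4°F hotter than indoor air, so any penalty that scales with that gap is correspondingly small.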

The whole backstory of the DOE's SACC/CEER rating rules is here. Single-hose air conditioners take center stage. The comments on the DOE's rule proposals can basically be summarized as:

  • Single-hose AC manufacturers very much did not want infiltration air to be accounted for, and looked for any excuse to ignore it
  • Electric companies very much did want infiltration air to be accounted for, and in particular wanted SACC to be measured at peak temperatures
  • The DOE did its best to maintain a straight face in front of all this obvious bullshitting, and respond to it with legibly-reasonable arguments and statistics.

This quote in particular stands out:

De’ Longhi [an AC manufacturer] expressed concern that modifying the AHAM PAC-1-2014 method to account for infiltration air would disproportionately impact single-duct portable AC

... (read more)

I still think the 25-30% estimate in my original post was basically correct. I think the typical SACC adjustment for single-hose air conditioners ends up being 15%, not 25-30%. I agree this adjustment is based on generous assumptions (5.4 degrees of cooling whereas 10 seems like a more reasonable estimate). If you correct for that, you seem to get to more like 25-30%. The Goodhart effect is much smaller than this 25-30%; I still think 10% is plausible.

I admit that in total I’ve spent significantly more than 1.5 hours researching air conditioners :) So I’m planning to check out now. If you want to post something else, you are welcome to have the last word.

SACC for 1-hose AC seems to be 15% lower than similar 2-hose models, not 25-30%:

  • This site argues for 2-hose ACs being better than 1-hose ACs and cites SACC being 15% lower.
  • The top 2-hose AC on amazon has 14,000 BTU that gets adjusted down to 9500 BTU = 68%.  This similarly-sized 1-hose AC is 13,000 BTU and gets adjusted down to 8000 BTU = 61.5%, about 10% lower.
  • This site does a comparison of some unspecified pair of ACs and gets 10/11.6 ≈ 0.86, i.e. a 14% reduction.
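
The ratios in these bullets can be checked with a few lines of arithmetic. The helper name is hypothetical; the BTU figures are the ones quoted above.

```python
def sacc_ratio(rated_btu, sacc_btu):
    """Return SACC as a fraction of the nominal rated capacity."""
    return sacc_btu / rated_btu


two_hose = sacc_ratio(14_000, 9_500)
one_hose = sacc_ratio(13_000, 8_000)

# Relative shortfall of the 1-hose ratio compared with the 2-hose ratio.
relative_gap = 1 - one_hose / two_hose
print(f"2-hose: {two_hose:.1%}, 1-hose: {one_hose:.1%}, gap: {relative_gap:.1%}")
# 2-hose: 67.9%, 1-hose: 61.5%, gap: 9.3%

# The unspecified pair from the last bullet.
pair_reduction = 1 - 10 / 11.6
print(f"{pair_reduction:.1%}")  # 13.8%
```

Note that the gap here compares the two adjustment ratios with each other, which is a relative difference of about 9%, not a difference of 9 percentage points.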

I agree the DOE estimate is too generous to 1-hose AC, though I think it’s ... (read more)

If you wouldn't mind one last question before checking out: where did that formula you're using come from?
From [], "What is a good CEER rating?": From the Pro Breeze single-hose AC product description on Amazon: I haven't looked into the % efficiency loss measurements, but I think it's interesting that you can still figure out that this is a crap AC if you're willing to trust this website.
Portable units have to meet a much weaker standard. I actually pushed for a more stringent standard on these products when I was consulting for the Appliance Standards Awareness Project.
  • The top wirecutter recommendation is roughly 3x as expensive as the Amazon AC being reviewed. The top budget pick is a single-hose model.
  • People usually want to cool the room they are spending their time in. Those ACs are marketed to cool a 300 sq ft room, not a whole home. That's what reviewers are clearly doing with the unit. 
  • I'd guess that in extreme cases (where you care about the room with AC no more than other rooms in the house + rest of house is cool) consumers are overestimating efficiency by ~30%. On average in reality I'd guess they are overestimating value-added by the air conditioner by more like ~10% (since the AC'd room will be cooler and they care less about other rooms).
  • I think the OP is misleading if 10% is what's at stake and there are real considerations on the other side.
  • I think there is very little chance that the wirecutter reviewers don't understand that infiltration affects heating efficiency. However I agree that your preferences about AC, and the interpretation of their tests, depend on how hot the rest of the building is (and how much you care about keeping it cool). I'm 50-50 on whether someone from the wirecutter would be able to explain that issue
... (read more)
The infiltration factor of a well-functioning woodstove is far less than a one hose air conditioner, because the air is heated to much higher temperatures. However, it can be significant for fireplaces.

Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians ultimately appointing people can distinguish real from fake expertise, which is hard in general.

It seems like the DOE decided to adopt energy-efficiency standards that take into account infiltration. They could easily have made a different decision (e.g. because of pressure from portable AC manufacturers, or because it's legitimately unclear how to define the standard, or because it makes measurement harder), but it wouldn't be because the issue wasn't obvious (I think it's not even anywhere close to the "failure because the issue wasn't obvious" regime).

Overall I agree with the bottom line that regulation is unlikely to help that much with alignment. But I don't think this seems like the right model of why that is or how you could fix it.

Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few

... (read more)

Obviously the point about air conditioners doesn't matter

I'd like to remark that, at least for me, the facts-of-the-matter about whether this particular air conditioner works by Goodharting consumer preferences actually affect my views on AI. The OP quite surprised my world model, which did not expect one of the most popular AC units on Amazon to work by deceiving consumers. If lots of the modern world works this way, then John's intuition that advanced ML systems are almost certain to work by Goodharting our preferences seems much more likely. Before seeing the above comment and jbash's comment, I was in the process of updating my views, not because I thought the OP was an enlightening allegory, but because it actually changed what I thought the world was like.

Conversely, the world model "sometimes the easiest way to achieve some objective is to actually do the intended thing instead of Goodharting" would predict that air conditioner example was wrong somehow, a prediction which seems to have been right (if Paul's and jbash's comments are correct, that is). I was quite impressed by this, and am now more confident in the "Goodharting isn't omnipresent" world model.

In any case, my main point is that I actually do care about what's going on in this air conditioning example (and I encourage further discussion on whether the OP's characterization of it is accurate or not).

I can’t believe I’m about to write a comment about air conditioners on a thread about world-ending AI, but having bought one of these one-hose systems for my apartment during a particularly hot summer I can say I was pretty disappointed with its performance.

The main drawback to the one hose system is the cool air never makes it outside the room with the unit. I tried putting a bunch of fans to blow the air to the rest of the house, but as you can imagine that didn’t work very well.

I had no idea why until I zoned out one day while thinking about the air conditioner and realized it was sucking the cold air into the intake and blowing it out of the house. And I did indeed read a bunch of reviews from Costco customers before I bought the unit, none of which mentioned the problem.

Wow, the air conditioner systematically sucking the cold air it's generated back into the intake sort of seems like another problem with this design. (Possibly the same problem in another guise, thermodynamically, but in any case, different in terms of actual produced experience.)

I apologize if this is piling on, but I would like to note that this error strikes me as very similar to another one made by the same author in this comment, and which I believe is emblematic of a certain common failure mode within the rationalist community (of which I count myself a part). This common failure mode is to over-value our own intelligence and under-value institutional knowledge (whether from the scientific community or the Amazon marketplace), and thus not feel the need to tread carefully when the two come into conflict.

In the comment in question, johnswentworth asserts, confidently, that there is nothing but correlational evidence of the role of amyloid-β in Alzheimer's disease. However, there is extensive, strong causal evidence for its role: most notably, that certain mutations in the APP, PSEN1, and PSEN2 genes deterministically (as in, there are no known exceptions for anyone living to their 80's) cause Alzheimer's disease, and the corresponding proteins are well understood structurally and functionally to be key players in the production of amyloid-β. Furthermore, the specific mutations in question are shown through multiple lines of evidence (structural analysi... (read more)

I think one reason that this error occurs is that there's a mistaken assumption that the available literature captures all institutional knowledge on a topic, so if one simply spends enough time reading the literature, they'll have all requisite knowledge needed for policy recommendations. I realize that this statement could apply equally to your own claims here, but in my experience I see it happen most often when someone reads a handful of the most recently released research papers and from just that small sample of work tries to draw conclusions that are broadly applicable to the entire field. Engineering claims are particularly suspect because institutional knowledge (often in the form of proprietary or confidential information held by companies and their employees) is where the difference between what is theoretically efficient and what is practically more efficient is found. It doesn't even need to be protected information though -- it can also just be that due to manufacturing reasons, or marketing reasons, or some type of incredibly aggravating constraint like "two hoses require a larger box and the larger box pushes you into a shipping size with much higher per-volume / mass costs so the overall cost of the product needs to be non-linearly higher than what you'd expect would be needed for a single hose unit, and that final per-unit cost is outside of what people would like to pay for an AC unit, unless you then also make drastic improvements to the motor efficiency, thermal efficiency, and reduce the sound level, at which point the price is now even higher than before, but you have more competitive reasons to justify it which will be accepted by a large enough % of the market to make up for the increased costs elsewhere, except the remaining % of the market can't afford that higher per-unit cost at all, so we're back to still making and selling a one-hose unit for them".
Concrete example while we're on the AC unit debate -- there's a very simple way to increase efficiency of portable AC units, and it's to wrap the hot exhaust hose with insulating duct wrap so that less of the heat on that very hot hose radiates directly back into the room you're trying to cool. Why do companies not sell their units with that wrap? Probably for one of any of the following reasons -- A.) takes up a lot of space, B.) requires a time investment to apply to the unit which would dissuade buyers who think they can't handle that complexity, C.) would cost more money to sell and no longer be profitable at the market's price point, D.) has to be applied once the AC unit is in place, and generally is thick enough that the unit is no longer "portable" which during market testing was viewed as a negative by a large % of surveyed people, or E.) some other equally trivial sounding reason that nonetheless means it's more cost effective for companies to NOT sell insulating duct wrap in the same box as the portable AC unit.  Example of an AC company that does sell an insulating wrap as an optional add-on: []
A priori, before having clicked on your links, my guess would be that the studies in question generally diagnose Alzheimer's by the presence of amyloid-β deposits. (That's generally been the case in similar studies I've looked into in the past, although I haven't checked the exact studies you link.) If they're diagnosing based on the presence of amyloid-β, then obviously amyloid-β producing mutations will cause an Alzheimer's diagnosis. The problem is that this diagnosis doesn't reflect real Alzheimer's, i.e. it doesn't necessarily involve dementia. We would expect such things to find strong, extensive evidence of causality. The problem is that it's extensive evidence of the mutations causing amyloid-β plaques, not dementia. (Also, a warning: this is exactly the sort of detail which overview articles tend to overlook and misstate - e.g. an overview article will say something like "so-and-so found that blah causes dementia" when in fact so-and-so were diagnosing amyloid plaques, not dementia. One does need to check the original papers.)
A distinction is made in the literature between preclinical Alzheimer's (the presence of neuropathology such as amyloid-β, without clinically detectable cognitive symptoms) and clinical Alzheimer's (a particular cluster of cognitive symptoms along with the neuropathologies of Alzheimer's). It's currently believed that Alzheimer's has a 15-20 year preclinical phase, the duration of which, however, can vary based on genetic and other factors. In the case of the mutations I mentioned (which are early-onset causing), clinically-detectable cognitive decline typically starts around the age of 45, and nearly always by the age of 60. One of the only known examples in which symptoms didn't start until a person was in her 70's was so surprising that an entire, highly-cited paper was written about it: Arboleda-Velasquez et al (2019). Resistance to autosomal dominant Alzheimer’s disease in an APOE3 Christchurch homozygote: a case report []. Note, however, that the typical cluster of symptoms did eventually occur. Honestly, these particular mutations are so pervasively discussed in the literature, precisely due to their significance to the causal question, that I can tell you have not really engaged with the literature by your unawareness of their existence and the effects that they have on people. I will readily acknowledge, by the way, that by themselves they don't close the book on the causal question: someone could argue that early-onset, autosomal dominant Alzheimer's due to these mutations is essentially a different disease than the much more prevalent late-onset, sporadic Alzheimer's. While I don't think this argument ultimately goes through, and I'd be happy to discuss why, my main point is not that there's no residual question about the etiology of the disease, but that the research community has intensely, intelligently, and carefully studied the distinction between correlative and causal evidence, as well as the distinct
I'd be interested to read that. (Apologies for lack of citations in the below, I don't have them readily on hand and don't want to go digging right at the moment.) You're right that I never went that deep into the Alzheimer's literature; it's certainly plausible that I overlooked a cluster of actually-competently-executed studies tying Aβ-related genetic mutations to robust dementia outcomes. I did look deeply into at least one study which made that claim (specifically the study which I most often found at the root of citation chains) and it turned out to diagnose using the presence of plaques, not dementia. But that was a paper from the early 90's, so maybe better results have come along since then. However, the absence of evidence for Aβ causing Alzheimer's was not the only thing pinning down my beliefs here. I've also seen papers with positive evidence that Aβ doesn't cause Alzheimer's - i.e. removing plaques doesn't eliminate dementia. And of course there's been literally hundreds of clinical trials with drugs targeting Aβ, and they pretty consistently do not work. So if there is a cluster of genetic studies establishing that Aβ-related mutations are causal for dementia, then the immediate question is how that squares with all the evidence against causality of Aβ for dementia. If the early-onset autosomal dominant version of the disease is in fact a different disease, that would answer the question, but you apparently think otherwise, so I'm curious to hear your case.
In brief, the main reason I don't think the argument works that autosomal-dominant Alzheimer's has a different etiology than sporadic Alzheimer's is that they look, in so many respects, like essentially the same disease, with the same sequence of biomarkers and clinical symptoms: 1. Amyloid pathology starts in the default mode network, and gradually spreads throughout the brain over 15-20 years. 2. It eventually reaches the medial temporal region, where Primary Age-Related Tauopathy is lying in wait. 3. At this point, tau pathology, a prion-like pathology which in Alzheimer's has a very specific conformation, starts spreading from there. The tau protein misfolds in the exact same way in both forms of the disease (Falcon et al (2018). Tau filaments from multiple cases of sporadic and inherited Alzheimer’s disease adopt a common fold []), however it misfolds in a different way in the large majority of other known tau pathologies, of which there are a dozen or so (Shi et al (2021). Structure-based classification of tauopathies []). 4. Then, neurodegeneration follows in lockstep throughout the brain with the presence of tau pathology, with cognitive deficits matching those expected from the affected brain regions. In particular, since the hippocampal formation is located in the medial temporal region, anterograde amnesia is typically the first symptom in both types of Alzheimer's (unlike many other forms of neurodegeneration, in which other clinical symptoms dominate in the early stages). It's as if two bank robberies occurred two hours apart in the same town, conducted in almost exactly the same manner, and in one we can positively ID the culprit on camera. It's a reasonable conclusion that the culprit in the other case is the same. Some further evidence: * There has been extensive causal mediation modeling, e.g. Ha
I happened to be reading this post today, as Science has just published a story on a fabrication scandal regarding an influential paper on amyloid-β: [] I was wondering if this scandal changes the picture you described at all?
Not a ton. I'd also recommend this article [], including the discussion in the comments by researchers in the field. A crucial distinction I'd emphasize which is almost always lost in popular discussions is that between the toxic amyloid oligomer hypothesis, that aggregates of amyloid beta are the main direct cause of neurodegeneration; and the ATN hypothesis I described in this thread, that amyloid pathology causes tau pathology and tau pathology causes neurodegeneration. The former is mainly what this research concerns and has been largely discredited in my opinion since approximately 2012; the latter has a mountain of evidence in favor as I've described, and that hasn't really changed now that it's turned out that one line of evidence for an importantly different hypothesis was fabricated.
Thanks, that was helpful!
Update today: Biogen/Eisai have reported results from Lecanemab’s phase 3 trial: a slowing of cognitive decline by 27% with a p-value of 0.00005 on the primary endpoint. All other secondary endpoints, including cognitive ones, passed with p-values under 0.01.
Note I've edited the third-to-last paragraph in the above to remove an overly-strong claim about the four antibodies I didn't discuss in detail.
Ben Pace (1y):
In general corrections are good contributions, thanks for your object-level points.
After this comment there was a long thread about AC efficiency. Summarizing:

* I said: "In practice I think the actual efficiency loss relative to a 2-hose unit is more like 25-30%" (for cooling from 85 to 70).
* John said that this was ridiculous.
* After the dust settled, our best estimate on paper is 40% rather than 25-30%. The reasons for the adjustments were roughly:
    * [x2] I estimated exhaust temperature at 130 degrees, but it's more like 100 degrees if the indoor air is 70.
    * [x1/2] I thought that all depressurization was compensated for by increased infiltration. But probably half of depressurization is offset by reduced exfiltration instead (see here).
    * [x3/2] I only considered sensible heat. But actually humidity is a huge deal, because the exhaust is heated but not humidified (see here).

John also attempted to measure the loss empirically, but I'd summarize as "too hard to measure":

* With 1-hose the indoor temp was 68 vs 88 outside, while with 2-hose the indoor temp was 66 vs 88 outside (using the same amount of energy).
* We both agree that 10% is an underestimate for the efficiency loss (e.g. due to room insulation, other cooling in the building, and the improvised 2-hose setup).
* I don't think we have a plausible way to extract a corrected estimate.
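As a rough check on the arithmetic in this summary, the three adjustment factors can be multiplied through the original 25-30% range. This is only a sketch: the variable names are made up for illustration, and treating the corrections as independent multipliers is the same simplification the summary uses.

```python
# Hypothetical names; the numbers are the ones quoted in the summary above.
initial_loss_range = (0.25, 0.30)  # original on-paper estimate (85F outside, 70F inside)

adjustments = {
    "exhaust is ~100F rather than ~130F": 2.0,
    "about half of depressurization is offset by reduced exfiltration": 0.5,
    "humidity (latent heat) counted as well as sensible heat": 1.5,
}

# Multiply the correction factors together.
combined_factor = 1.0
for factor in adjustments.values():
    combined_factor *= factor

adjusted_range = tuple(loss * combined_factor for loss in initial_loss_range)

print(f"combined factor: {combined_factor}")
print(f"adjusted loss: {adjusted_range[0]:.1%} to {adjusted_range[1]:.1%}")
```

Under these assumptions the combined factor is 1.5, so the 25-30% range becomes roughly 37.5-45%, which brackets the 40% figure quoted in the summary.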
I endorse this summary.
On the physics: to be clear, I'm not saying the air conditioner does not work at all. It does make the room cooler than it started, at equilibrium. I also am not surprised (in this particular example) to hear that various expert sources already account for the inefficiency in their evaluations; it is a problem which should be very obvious to experts. Of course that doesn't apply so well to e.g. the example of medical research replication failures. The air conditioner example is not meant to be an example of something which is really hard to notice for humanity as a whole; it's meant to be an example of something which is too hard for a typical consumer to notice, and we should extrapolate from there to the existence of things which people with more expertise will also not notice (e.g. the medical research example). Also, it's a case-in-point that experts noticing a problem with some product is not enough to remove the economic incentive to produce the product. When the argument specifically includes reasons to expect people to not notice the problem, it seems obviously correct to discount reported experiences. Of course there are still ways to gain evidence from reported experience - e.g. if someone specifically said "this unit cooled even the far corners of the house", then that would partially falsify our theory for why people will overlook the one-hose problem. But we should not blindly trust reports when we have reasons to expect those reports to overlook problems. In this particular case, I indeed do not think the conflict is worth the cost of exploring - it seems glaringly obvious that people are buying a bad product because they are unable to recognize the ways in which it is bad. Positive reports do not contradict this; there is not a conflict here. The model already predicts that there will be positive reports - after all, the air conditioner is very convenient and pumps lots of cool air out the front in very obvious ways.

In this particular case, I indeed do not think the conflict is worth the cost of exploring - it seems glaringly obvious that people are buying a bad product because they are unable to recognize the ways in which it is bad.

The wirecutter recommendation for budget portable ACs is a single-hose model. Until very recently their overall recommendation was also a single-hose model.

The Wirecutter recommendations (and other pages discussing these tradeoffs) are based on a combination of "how cold does it make the room empirically?" and quantitative estimates of cooling that take into account infiltration. This issue is discussed extensively, with quantitative detail, by people who quite often end up recommending 1-hose designs for small rooms (like the one this AC is advertised for).

One AC unit tested by the wirecutter is convertible between 2-hose and 1-hose. They write:

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a re

... (read more)

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a real difference

After having looked into this quite a bit, it does really seem like the Wirecutter testing process had no ability to notice infiltration issues, so it seems like the Wirecutter crew themselves is kind of confused here? 

The... Wirecutter article does also not seem to discuss the issue of infiltration of hot air in any reasonable way. Instead it just says that: 

This produces a slight vacuum effect, which pulls in “infiltration air” from anywhere it can in order to equalize the pressure. In the presence of a gas-powered device such as a furnace, that negative pressure creates a backdraft or downdraft, which can cause the machine to malfunction—or worse, fill the room with gas fumes and carbon monoxide. We don’t think that most people plan to use their portable AC in such a room, but if your home is set up in such a way that you’re concerned ab

... (read more)
They measure the temperature in the room, which captures the effect of negative pressure pulling in hot air from the rest of the building. It underestimates the costs if the rest of the building is significantly cooler than the outside (I'd guess by the ballpark of 20-30% in the extreme case where you care equally about all spaces in the building, the rest of your building is kept at the same temp as the room you are cooling, and a negligible fraction of air exchange with the outside is via the room you are cooling). I think that paragraph is discussing a second reason that infiltration is bad.
Yeah, sorry, I didn't mean to imply the section is saying something totally wrong. The section just makes it sound like that is the only concern with infiltration, which seems wrong, and my current model of the author of the post is that they weren't actually thinking through heat-related infiltration issues (though it's hard to say from just this one paragraph, of course). 
I roll to disbelieve. I think it is much more likely that something is wrong with their test setup than that the difference between one-hose and two-hose is negligible. Just on priors, the most obvious problem is that they're testing somewhere which isn't hot outside the room - either because they're inside a larger air-conditioned building, or because it's not hot outdoors. Can we check that? Well, they apparently tested it in April 2022, i.e. nowish, which is indeed not hot most places in the US, but can we narrow down the location more? The photo is by Michael Hession, who apparently operates near Boston. Daily high temps currently in the 50's to 60's (Fahrenheit). So yeah, definitely not hot there. Now, if they're measuring temperature delta compared to the outdoors, it could still be a valid test. On the other hand, if it's only in the 50's to 60's outside, I very much doubt that they're trying to really get a big temperature delta from that air conditioner - they'd have to get the room down below freezing in order to get the same temperature delta as a 70 degree room on a 100 degree day. If they're only trying to get a tiny temperature delta, then it really doesn't matter how efficient the unit is. For someone trying to keep a room at 70 on a 100 degree day, it's going to matter a lot more. So basically, I am not buying this test setup. It does not look like it is actually representative of real usage, and it looks nonrepresentative in basically the ways we'd expect from a test that found little difference between one and two hoses. Generalizable lesson/heuristic: the supposed "experts" are also not even remotely trustworthy. (Also, I expect it to seem like I am refusing to update in the face of any evidence, so I'd like to highlight that this model correctly predicted that the tests were run someplace where it was not hot outside. Had that evidence come out different, I'd be much more convin

(Also, I expect it to seem like I am refusing to update in the face of any evidence, so I'd like to highlight that this model correctly predicted that the tests were run someplace where it was not hot outside. Had that evidence come out different, I'd be much more convinced right now that one hose vs two doesn't really matter.)

From how we tested:

Over the course of a sweltering summer week in Boston, we set up our five finalists in a roughly 250-square-foot space, taking notes and rating each model on the basic setup process, performance, portability, accessories, and overall user experience.

ETA: it's not clear that's the same testing setup used in the other tests they described. But they do talk about how the 1-vs-2 convertible unit "struggled to make the room any cooler than 70 degrees" which sounds like it was probably reasonably hot.

Alright, I am more convinced than I was about the temperature issue, but the test setup still sounds pretty bad. First, Boston does not usually get all that sweltering. I grew up in Connecticut (close to Boston and similar weather), where summer days usually peaked in the low 80's. Even if they waited for a really hot week, it was probably in the 90's. A quick google search confirms this: typical July daily high temp is 82, and google says "Overall during July, you should expect about 4-6 days to reach or exceed 90 F (32C) while the all-time record high for Boston was 103 F (39.4C)". It's still a way better test than April (so I'm updating from that), but probably well short of keeping a room at 70 on a 100 degree day. I'm guessing they only had about half that temperature delta. Second, their actual test procedure (thank you for finding that, BTW): Three feet and six feet away? That sure does sound like they're measuring the temperature right near the unit, rather than the other side of the room where we'd expect infiltration to matter. I had previously assumed they were at least measuring the other side of the room (because they mention for the two-hose recommendation "In our tests, it was also remarkably effective at distributing the cool air, never leaving more than a 1-degree temperature difference across the room"), but apparently "across the room" actually meant "6 feet away" based on this later quote: ... which sure does sound more like what we'd expect. So I'm updating away from "it was just not hot outside" - probably a minor issue, but not a major one. That said, it sure does sound like they were not measuring temperature across the room, and even just between 3 and 6 feet away the two-hose model apparently had noticeably less drop-off in effectiveness.
Boston summers are hotter than the average summers in the US, and I'd guess are well above the average use case for an AC in the US. I agree having two hoses is more important the larger the temperature difference, and by the time you are cooling from 100 to 70 the difference is fairly large (though there is basically nowhere in the US where that difference is close to typical). I'd be fine with a summary of "For users who care about temp in the whole house rather than just the room with the AC, one-hose units are maybe 20% less efficient than they feel. Because this factor is harder to measure than price or the convenience of setting up a one-hose unit, consumers don't give it the attention it deserves. As a result, manufacturers don't make as many cheap two-hose units as they should."

Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.

Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room, with temperature sensors more like 15 feet away, in a city or country that's very hot outside, and comparing this with (say) Wirecutter's top pick with two hoses. Confirm?

... I actually already started a post titled "Preregistration: Air Conditioner Test (for AI Alignment!)". My plan was to use the one-hose AC I bought a few years ago during that heat wave, rig up a cardboard "second hose" for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day. Maybe we can have an air conditioner test party.

Predictions: the claim which I most do not believe right now is that going from one hose to two hose with the same air conditioner makes only a 20%-30% difference. The main metric I'm interested in is equilibrium difference between average room temp and outdoor temp (because that was the main metric relevant when I was using that AC during the heat wave). I'm at about 80% chance that the difference will be over 50%.

(Back-of-the-envelope math a few years ago said it should be roughly a factor-of-two difference, and my median expectation is close to that.)

I also expect (though less strongly) that, assuming the room's doors and windows are closed, corners of the room opposite the AC in single-hose mode will be closer to outdoor temp than to the temp 3 ft away from the AC, and that this will not be the case ... (read more)

I would have thought that the efficiency lost is roughly (outside temp - inside temp) / (exhaust temp - inside temp). And my guess was that exhaust temp is ~130. I think the main way the effect could be as big as you are saying is if that model is wrong or if the exhaust is a lot cooler than I think. Those both seem plausible; I don't understand how AC works, so don't trust that calculation too much. I'm curious what your BOTEC was / if you think 130 is too high an estimate for the exhaust temp?  If that calculation is right, and exhaust is at 130, outside is 100, and house is 70, you'd have 50% loss. But you can't get 50% in your setup this way, since your 2-hose AC definitely isn't going to get the temp below 65 or so. Maybe most plausible 50% scenario would be something like 115 exhaust, 100 outside, 85 inside with single-hose, 70 inside with double-hose. I doubt you'll see effects that big. I also expect the improvised double hose will have big efficiency losses. I think that 20% is probably the right ballpark (e.g. 130/95/85/82). If it's >50% I think my story above is called into question. (Though note that the efficiency lost from one hose is significantly larger than the bottom line "how much does people's intuitive sense of single-hose AC quality overstate the real efficacy?") Your AC could also be unusual. My guess is that it just wasn't close to being able to cool your old apartment and that single vs double-hoses was a relatively small part of that, in which case we'd still see small efficiency wins in this experiment. But it's conceivable that it is unreasonably bad in part because it has an unreasonably low exhaust temp, in which case we might see an unreasonably large benefit from a second hose (though I'd discard that concern if it either had similarly good Amazon reviews or a reasonable quoted SACC).
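The proposed loss formula can be checked against the scenarios mentioned in this comment. A minimal sketch, assuming the formula exactly as stated; the function name is made up, and reading "130/95/85/82" as exhaust/outside/single-hose indoor/double-hose indoor is my interpretation rather than something the comment spells out.

```python
def single_hose_loss(t_outside, t_inside, t_exhaust):
    """Fraction of cooling lost to infiltration under the comment's simple model:
    (outside temp - inside temp) / (exhaust temp - inside temp)."""
    return (t_outside - t_inside) / (t_exhaust - t_inside)

# Exhaust 130, outside 100, house 70: the "50% loss" case.
loss_hot_day = single_hose_loss(t_outside=100, t_inside=70, t_exhaust=130)

# Exhaust 115, outside 100, 85 inside with a single hose: the other 50% scenario.
loss_cool_exhaust = single_hose_loss(t_outside=100, t_inside=85, t_exhaust=115)

# Exhaust 130, outside 95, 85 inside with a single hose: the "20% ballpark" case.
loss_mild_day = single_hose_loss(t_outside=95, t_inside=85, t_exhaust=130)

print(loss_hot_day, loss_cool_exhaust, round(loss_mild_day, 2))
```

The first two cases come out to 0.5 and the third to about 0.22, matching the figures quoted in the comment. The formula is sensitive to the exhaust temperature, which is why the later revision of that number from 130 to around 100 matters so much.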
I don't remember what calculation I did then, but here's one with the same result. Model the single-hose air conditioner as removing air from the room, and replacing it with a mix of air at two temperatures: $T_C$ (the temperature of cold air coming from the air conditioner), and $T_H$ (the temperature outdoors). If we assume that $T_C$ is constant and that the cold and hot air are introduced in roughly 1:1 proportions (i.e. the flow rate from the exhaust is roughly equal to the flow rate from the cooling outlet), then we should end up with an equilibrium average temperature of $\frac{T_C + T_H}{2}$. If we model the switch to two-hose as just turning off the stream of hot air, then the equilibrium average temperature should drop to $T_C$. Some notes on this:

* It's talking about equilibrium temperature rather than power efficiency, because equilibrium temperature on a hot day was mostly what I cared about when using the air conditioner.
* The assumption of roughly-equal flow rates seems to be at least the right order of magnitude based on seeing this air conditioner in operation, though I haven't measured carefully. If anything, it seemed like the exhaust had higher throughput.
* The assumption of constant $T_C$ is probably the most suspect part.
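The mixing model in this comment can be written out as a small calculation. This is a sketch only: the function name and the example temperatures (cold outlet at 60, outdoors at 100) are illustrative assumptions, not measurements from the thread.

```python
def equilibrium_temp(t_cold, t_outdoor, hot_fraction):
    """Equilibrium room temperature when replacement air is a mix of
    cold AC output and infiltrating outdoor air.

    hot_fraction is the share of replacement air coming from outdoors:
    0.5 for the 1:1 single-hose assumption, 0.0 for the idealized two-hose case.
    """
    return (1 - hot_fraction) * t_cold + hot_fraction * t_outdoor

T_COLD, T_OUTDOOR = 60.0, 100.0  # illustrative values only

one_hose = equilibrium_temp(T_COLD, T_OUTDOOR, hot_fraction=0.5)
two_hose = equilibrium_temp(T_COLD, T_OUTDOOR, hot_fraction=0.0)

# Temperature drop relative to outdoors in each configuration.
one_hose_delta = T_OUTDOOR - one_hose
two_hose_delta = T_OUTDOOR - two_hose

print(one_hose, two_hose, two_hose_delta / one_hose_delta)
```

With a constant cold-outlet temperature and 1:1 flows, the two-hose drop below outdoor temperature is exactly twice the one-hose drop whatever values are plugged in, which is where the factor-of-two estimate comes from. Everything hinges on the constant outlet temperature assumption flagged in the last note.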
Ok, I think that ~50% estimate is probably wrong. Happy to bet about outcome (though I think someone with working knowledge of air conditioners will also be able to confirm). I'd bet that efficiency and Delta t will be linearly related and will both be reduced by a factor of about (exhaust - outdoor) / (exhaust - indoor) which will be much more than 50%.
I assume you mean much less than 50%, i.e. (T_outside - T_inside) averaged over the room will be less than 50% greater with two hoses than with one? I'm open to such a bet in principle, pending operational details. $1k at even odds? Operationally, I'm picturing the general plan I sketched four comments upthread. (In particular note the three bulleted conditions starting with "The day being hot enough and the room large enough that the AC runs continuously..."; I'd consider it a null result if one of those conditions fails.) LMK if other conditions should be included. Also, you're welcome to come to the Air Conditioner Testing Party (on some hot day TBD). There's a pool at the apartment complex, could swim a bit while the room equilibrates.

I studied the impact of infiltration because of clothes dryers when I was doing energy efficiency consulting. The nonobvious thing that is missing from this discussion is that the infiltration flow rate does not equal the flow rate of the hot air out the window. Basically absent the exhaust flow, there is an equilibrium of infiltration through the cracks in the building equaling the exfiltration through the cracks in the building. When you have a depressurization, this increases the infiltration but also decreases the exfiltration. If the exhaust flow is a small fraction of the initial infiltration, the net impact on infiltration is approximately half as much as the exhaust flow. The rule of thumb for infiltration is it produces about 0.3 air changes per hour, but it depends on the temperature difference to the outside and the wind (and the leakiness of the building). I would guess that if you did this in a house, the exhaust flow would be relatively small compared to the natural infiltration. So roughly the impact due to the infiltration is about half as much as the calculations indicate. But if you were in a tiny tight house, then the exhaust flow would overwhelm the natural infi... (read more)
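The "half as much" rule of thumb can be illustrated with a toy flow balance. This sketch assumes something the comment does not state explicitly: that depressurization raises infiltration and lowers exfiltration by equal amounts until exfiltration reaches zero. The function name and flow numbers are hypothetical, and units are arbitrary (e.g. cubic feet per minute).

```python
def extra_infiltration(exhaust_flow, natural_infiltration):
    """Extra outdoor air pulled in when an exhaust flow is added.

    Toy model: without exhaust, infiltration equals exfiltration
    (both natural_infiltration). Adding exhaust shifts the two by equal
    and opposite amounts so that inflow = outflow + exhaust. Exfiltration
    cannot go negative, so a large exhaust must be made up entirely
    by infiltration.
    """
    half_split = exhaust_flow / 2
    if half_split <= natural_infiltration:
        # Leaky building: half the exhaust is extra infiltration,
        # the other half is reduced exfiltration.
        return half_split
    # Tight building: exfiltration is fully suppressed.
    return exhaust_flow - natural_infiltration

leaky_house = extra_infiltration(exhaust_flow=100, natural_infiltration=400)
tight_house = extra_infiltration(exhaust_flow=300, natural_infiltration=50)

print(leaky_house, tight_house)
```

In the leaky case the extra infiltration is half the exhaust flow (50 of 100), while in the tight case nearly all of the exhaust (250 of 300) has to be replaced by outdoor air, which is the contrast the comment draws between an ordinary house and a tiny tight one.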

Thanks! It's amusing that we had this whole discussion and the one commenter who knew what they were talking about got just one upvote :) It sounds very plausible that exhaust is small relative to natural infiltration and I believe you that (extra infiltration) = 50% (exhaust). In the other direction, it looks like I was wrong about 130 degrees and we're looking at more like 100 (alas, googling random forum comments is an imperfect methodology, though I do feel it's plausible that John's AC has unusually cold exhaust). If the building is ending up around 70, that means I'm underestimating the exhaust quantity by about 2x. But then apparently the extra infiltration is only about half of the exhaust. So sounds like the errors cancel out and my initial estimate happens to be roughly right?
$T_c$ does seem like a bad assumption. I tried instead assuming a constant difference between the intake and the cold output, and the result surprised me. (The rest of this comment assumes this model holds exactly, which it definitely doesn't.) Let $T_r$ be the temperature of the room (also the intake temperature for a one-hose model). Then at equilibrium,

$T_r = (T_c + T_h)/2$
$T_r = ((T_r - \Delta) + T_h)/2$
$2T_r = T_r + T_h - \Delta$
$T_r = T_h - \Delta$

i.e. no loss in cooling power at all! (Energy efficiency and time to reach equilibrium would probably be much worse, though.) In the case of an underpowered ($\Delta = 15$) one-hose unit handling a heat wave ($T_h = 100$), you'd get $T_r = 85$ and $T_c = 70$: nice and cool in front of the unit but uncomfortably hot in the rest of the room, just as you observed. Adding a second hose would resolve this disparity in the wrong direction, making $T_r = T_c = 85$. So if you disproportionately care about the area directly in front of the AC, adding the second hose could be actively harmful.
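The algebra above can be sanity-checked numerically by iterating the mixing rule until it settles. This is a sketch under the comment's own assumptions (1:1 mixing, constant intake-to-output difference); the function name and the iteration scheme are mine, and the loop is a fixed-point check rather than a model of how the room temperature actually evolves over time.

```python
def one_hose_equilibrium(t_outdoor, delta, steps=200):
    """Iterate T_r <- ((T_r - delta) + T_h) / 2 starting from outdoor temperature.

    Returns (room temperature, cold-output temperature) at the fixed point.
    """
    t_room = t_outdoor
    for _ in range(steps):
        t_cold = t_room - delta             # AC output is a fixed delta below intake
        t_room = (t_cold + t_outdoor) / 2   # 1:1 mix of cold output and outdoor air
    return t_room, t_room - delta

t_room, t_cold = one_hose_equilibrium(t_outdoor=100.0, delta=15.0)
print(round(t_room, 6), round(t_cold, 6))
```

The iteration settles at a room temperature of 85 with a cold output of 70, agreeing with the closed form $T_r = T_h - \Delta$ and the heat-wave example in the comment.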
Also, like, Berkeley heat waves may just be significantly different from, like, Reno heat waves. My current read is that part of the issue here is that a lot of places don't actually get that hot, so having less robustly good air conditioners is fine.
I bought my single-hose AC for the 2019 heat wave in Mountain View (which was presumably basically similar to Berkeley). When I was in Vegas, summer was just three months of permanent extreme heat during the day; one does not stay somewhere without built-in AC in Vegas.
I think labeling requirements are based on the expectation of cooling from 95 to 80 (and I expect typical use cases for portable AC are more like that). Actually hot places will usually have central air or window units.
Ben Pace (1y):
Sweet! I could also perform a replication I guess.
Or you could get to it before I do and I could perform a replication.
It is important to note that the current top wirecutter pick is a 2-hose unit, though one that combined the two hoses into one big hose. I guess maybe that is recent, but it does seem important to acknowledge here (and it wouldn't surprise me that much if Wirecutter went through reasoning pretty similar to the one in this post, and then updated towards the two-hose unit because of concerns about infiltration and looking at more comprehensive metrics like SACC). 

Here is the wirecutter discussion of the distinction for reference:

Starting in 2019, we began comparing dual- and single-hose models according to the same criteria, and we didn’t dismiss any models based on their hose count. Our research, however, ultimately steered us toward single-hose portable models—in part because so many newer models use this design. In fact, we found no compelling new double-hose models from major manufacturers in 2019 or 2020 (although a few new ones cropped up in 2021, including our new top pick). Owner reviews indicate that most people prefer single-hose models, too, since they’re easier to set up and don’t look quite as much like a giant octopus trash sculpture. Although our testing has shown that dual-hose models tend to outperform some single-hose units in extremely hot or muggy weather, the difference is usually minimal, and we don’t think it outweighs the convenience of a single hose.

The one major exception, however, is if you plan on setting up your portable AC in a room with a furnace or hot water heater or anything else that uses combustion. When a single-hose AC model forces air out through its exhaust hose, it can create negative pressure in the

... (read more)
A/Cs primarily work by using electricity to drive a pressure differential between the cool, low-pressure indoor refrigerant and the hot, high-pressure outdoor refrigerant. It's not just moving air around. PV = nRT! Here's a video explainer. Read carefully, the post doesn't ignore the effect of the evaporator and condenser... ... But it is written in such a way that the reader might come away with the impression that the single-hose A/C has zero net effect on the household temperature. Even the edited-in caveat makes it sound like it might be cooling off the room in which it's located, at the expense of heating up the rest of the house. This reading is reinforced by using the A/C as an analogy for a truly zero-value or destructive AI: We'd need to imagine an A/C that does nothing to net temperature, or that actively heats up the house on net for this analogy to work. Given that I expect more readers here will know about this hypothesis than about the practical details of how an A/C works, I worry they're more likely to see AI as a metaphor for this A/C than this A/C as a metaphor for AI! Note also that regulation could totally fix this particular problem. We could ban single-hose A/Cs; there's a whole nation of HVAC experts who could convey this information, and they're licensed in the USA, so there's already a legal framework for identifying the relevant experts. Waiting also might fix the problem, especially if these people have metered electricity. It's easily possible that they'll notice their high summer electric bill, consider efficiency improvements, look into the A/C, do 10 seconds of research, and invest in the two-hose unit the next time around. When discussing AI, it seems valuable to distinguish more clearly between three scenarios: * Individual AI products truly analogous to an A/C. They are specific services, which can indeed be more or less efficient, and can be chosen badly by ill-informed co

To me this is a metaphor for Alignment research, and LW-style rationality in general, but with an opposite message.

To start, I have this exact AC in my window, and it made a huge difference during last year's heat dome. (I will use metric units in the following, because eff imperial units.) It was around 39-40C last summer, some 15C above average, for a few days, and the A/C cooled the place down by about 10C, which made a difference between livable and unlivable. It was cooler all through the place, not just in the immediate vicinity of the unit. 

How could this happen, in an apparent contradiction to the laws of physics?

Well, three things: 

  • I live in an apartment, so the air coming in is not quite as hot in the hallway as outside, though still pretty warm.
  • The air coming out of the AC exhaust is pretty hot, hotter than the outside most of the time, so there is a definite cooling that happens despite the air influx from outside.
  • The differential air pressure in the hallway is positive regardless of the AC (partly because of the exhaust vents that are always on), so adding AC does not significantly change the air flow.

So, physics is safe! What isn't safe is the theoretical re... (read more)

Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory. However, I don't think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed yet, was predicting this problem accurately and convincingly enough that people like myself updated. 15 years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was then. Now the general gist of the message has absolutely been proven out. Machine learning is now a big impressive thing in the world, and scary outcomes are right around the corner. Forecasting that now doesn't win you nearly as many points as forecasting that 15 or 20 years ago. Now we are finally close enough that it makes sense to move from theorizing to experimentation. That doesn't mean the theorizing was useless. It laid an incredible amount of valuable groundwork. It gave the experimental researchers a survey of what they are up against. It laid out the scope of the problem, and made helpful pointers towards important characteri... (read more)

Hmm, I agree that Eliezer, MIRI and its precursors did a lot of good work raising the profile of this particular x-risk. However, I am less certain of their theoretical contributions, which you describe as  I guess they did highlight a lot of dead ends, gotta agree with that. I am not sure how much the larger AI/ML community values their theoretical work. Maybe the practitioners haven't caught up yet. Well, whatever the fraction, it certainly seems like it's time to rebalance it, I agree. I don't know if MIRI has the know-how to do experimental work at the level of the rapidly advancing field.
I mostly agree that relying on real world data is necessary for better understanding our messy world and that in most cases this approach is favorable. There's a part of me that thinks AI is a different case though, since getting it even slightly wrong will be catastrophic. Experimental alignment research might get us most of the way to aligned AI, but there will probably still be issues that aren't noticeable because the AIs we are experimenting on won't be powerful enough to reveal them. Our solution to the alignment problem can't be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure. My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
I agree that, given MIRI's model of AGI emergence, getting it slightly wrong would be catastrophic. But that's my whole point: experimenting early is strictly better than not, because it reduces the odds of getting something big wrong, as opposed to something small along the way. I had mentioned in another post that [] so that there are no "immense optimization pressures". I think that's what Eliezer says, as well, hence his pessimism and focus on "dying with dignity". But we won't know if this intuition is correct without actually testing it experimentally and repeatedly. It might not help because "there is no fire alarm for superintelligence", but the alternative is strictly worse, because the problem is so complex.
This is fine for other fields, but the problem with superintelligent alignment is that the things in "move fast and break things" is, like, us. We only have one chance to get superhuman alignment right, which is why we have to design it carefully once. Misaligned systems will try to turn you off to make sure you can't turn them off. I think Eliezer has even said that if we could simply revert the universe after each time we mess up alignment, he would be way less pessimistic. Further, experiments with systems that are not capable enough to kill us yet can provide us with valuable information, but the problem is that much of the difficulty comes up around superintelligence levels. Things are going to break in weird ways that we couldn't have anticipated just by extrapolating out the trends from current systems. So if we just wait for evidence that less robust solutions are not enough, then we will see that less robust solutions seem to work really well on weak current models, pat ourselves on the back for figuring out that actually alignment wouldn't be that hard in X area and then as we approach superintelligence we start noticing X break down (if we're lucky! if X is something like deception, it will try to hide from us and actively avoid being caught by our interpretability tools or whatever) and at that point it will be too late to try and fix the problem, because none of the technical or political solutions are viable in a very short time horizon. Again, to be very clear, I'm not arguing that there is no use at all for empirical experiments today, it's just that there are specific failure cases that are easy to fall into whenever you try to conclude something of the form "and therefore this is some amount of evidence that superintelligence will be more/less likely to be Y and Z"
I agree that we cannot be cavalier about it, but not experimenting is strictly worse than experimenting (not at the expense of theoretical work), because humans are bad at pure theory.
1Alexander Gietelink Oldenziel1y
The statement 'humans are bad at pure theory' seems to be clearly falsified by the extraordinary theoretical advances of the past, e.g. Einstein. Whether theoretical or experimental approaches will prove most successful for AI alignment is an open question.
It is actually confirmed by this particular case. Special Relativity took some 50 years to form after Maxwell's equations were written. General relativity took 500 years to be written down after Galileo's experiment with equal acceleration of falling bodies. AND it took a once-in-a-millennium genius to do that. (Twice, actually, Newton was the other one in physics.)
1Alexander Gietelink Oldenziel1y
This doesn't look like a serious reply. I fail to see how the achievements of Newton, Maxwell, Einstein do not illustrate the power of theory.
I have nothing to add to my previous message, other than 500 years to come up with a theory is a long time.
Just added a clarification to the post: I do believe the analysis in the post is in fact correct, and the success of this air conditioner is primarily due to consumers not recognizing the problem. Would you have spent an extra, say, $30 on a two-hose air conditioner if you had noticed the issue in advance? (BTW, I also bought a one-hose air conditioner off Amazon a few years back, which is where this example came from. When I realized there was only one hose, I was absolutely flabbergasted that anyone would even bother to build such an obviously stupid thing. And it indeed did a pretty shitty job cooling my apartment!)
Looks like you did the usual iterative approach: bought an AC, saw that it doesn't work as expected, did the analysis, figured out what is wrong, and corrected your model of what works in your situation, then bought a better AC.
I read John as saying steps two and three here were reversed. He bought an AC, realized before trying it that it wouldn't work, then tested and saw that (as expected) it didn't work.
That's true! When I opened the box, I first dug around looking for the second hose. Then I thought they must have made a mistake and not sent the second hose. Then eventually I noticed that the AC only had one hose-slot, and the pictures only had one hose, and I was just very confused as to why on earth someone would build a portable air conditioner with only one hose.

Um, the single-hose air conditioners do in fact work passably, probably because they're designed to minimize the volume of air exhausted compared to the amount circulated. The air you're blowing out is way hotter than the air you're drawing in. This makes the heat pump work harder, but it reduces the air exchange problem.

And a lot of structures already have huge amounts of air exchange going on anyhow. And, by the way, a lot of uncooled structures actually do run hotter on the inside than the temperature of the environment, so the air you're drawing in may not be all that hot depending on where it's coming from and when you run the machine.

And the market has noticed that the single hose design is inefficient, which is why there are two-hose ones available. In fact, if I were writing a review, I probably wouldn't bother to mention the matter because I'd assume everybody already knew about the issue. That's even though I do in fact buy two-hose models for exactly the reasons you describe.

Perhaps people are dumb, but they are not as dumb as you are making them out to be. And I think I have to add that an awful lot of "rationalists" are very fond of talking about how everything is stupid, without in fact having studied the matters in question closely enough to really be allowed opinions...

The fact that you chose to use your superior knowledge to buy the much better air conditioner, while also choosing to not leave a review explaining this, is an illustration of OP's point, and not a refutation.

If I needed a more compact unit, I might buy a one-hoser. If I had a more limited budget (and didn't expect to run the thing all the time) I might buy a one-hoser. If I had very limited space in which to run the hose, I might buy a one-hoser. Absolute maximum efficiency may just not be at the top of most people's lists. ... and you don't really have much of a basis to say that I do have "superior knowledge" compared to most other potential buyers. Looking at the reviews doesn't really get you there, since you don't know that clue is independent of tendency to write reviews, and many of the reviews are probably fake.
You didn't say any of that before. And you didn't show that any of that justified the Amazon reviews either, as is necessary to refute OP, and you will have a hard time doing so given that none of the reviews explain the disadvantage those advantages may or may not offset. OK, so let's say you don't have any superior knowledge when you state things about air conditioners. (I am willing to agree that no one should believe your claims if you want to claim that.) Then why do you believe all the things you said about single-hose air conditioners working or about the exchanging or about what reviewers (and non-reviewers) do or do not know, especially when, as you agree with OP, none of the reviews mention this? This again goes to illustrate OP and not refute it.
Yes, that's true, I didn't go into detail on irrelevant side issues, or take up space saying things that a reasonable reader would have assumed anyhow. That's for much the same reason that I wouldn't waste people's time with Amazon reviews saying nothing but "Well, ACKCHUALLY, two-hose air conditioners are MUCH more efficient (you ignorant plebs).". The most reasonable hypothesis is that people know their own priorities better than I do, so I shouldn't make an ass of myself. Also, of course, even if they're wrong, I'm unlikely to persuade them, but that's a separate matter. You might want to think about why you did not just naturally assume those things. I have adequate knowledge. I am willing to assume that most other buyers of air conditioners also have, or will independently seek out, adequate knowledge. Adequate knowledge includes everything I said. I was claiming that OP was pontificating based on inadequate knowledge. Less than my own and very possibly less than that of the average Amazon air conditioner buyer. Basically getting a nasty case of engineer's disease and treating a basic first-order understanding as if it were special expertise, in a context where most other people might very well have understanding superior to OP's own. And if buyers did not have "adequate" understanding in the area of efficiency, one very strong candidate explanation for that would be that they didn't care so much about efficiency compared to other things. Especially because, even if they didn't understand the whole airflow pattern, they could act on the published efficiency ratings, which take airflow into account. If they're buying something with a lower numerical headline efficiency rating, then you have to assume that they're real idiots to arrive at the idea that efficiency is their main concern. ... all of which I would have let pass if smug, arrogant, supercilious dismissiveness were not a common problem that alienates a lot of people from "rationalists" and their
Just added this clarification to the post:

Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out, overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions, and claims the efficiency difference is only about 12%.


In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being is because if the area you want to cool is on the larger side, the unit will have to work much harder to cool the space.

So how does it work? The single-hose air conditioner yanks warm air and moisture from the room and expels it outside through the exhaust. A negative pressure is created when the air is pushed out of the room, the air needs to be replaced. In turn, any opening in the house like doors, windows, and cracks will draw outside hot air into the room to replace the missing air. The air is cooled by the unit and ejected into the room.


Additionally, the single-hose versions are usually less expensive than their dual-hose

... (read more)
EER does not account for heat infiltration issues, so this seems confused. CEER does, and that does suggest something in the 20% range, but I am pretty sure you can't use EER to compare a single-hose and a dual-hose system.
I assumed EER did account for that based on:
This article explains the difference: [] EER measures performance in BTUs, which are simply measuring how much work the AC performs, without taking into account any backflow of cold air back into the AC, or infiltration issues.

I think I'm missing the most important part of this debate. How does the second hose help? The air outside is hot; with one hose, hot air enters the house because of the vacuum effect; with two hoses, the second hose explicitly sucks in air from the outside... which is still hot. Where is the difference?

With two hoses, the air sucked in never mixes with the cool air in the room; it's kept completely separate. Only heat is exchanged by the AC, not air.

From the Wirecutter test conditions above it sounds like these are also meant to dehumidify: With one hose, you presumably get that for free if the air inside is more humid than the air that replaces it. With two hoses, since you're not mixing air, do you still dehumidify? (If anything I'd expect the opposite, since the same amount of water vapor is apparently higher humidity in a cool room than a hot room.)
Wouldn't the AC unit have to intake cool air from the room (since it's expelling cold air into the room), and mix the cool air with the warm outside air? (Maybe the numbers work out differently in this condition but I'm not convinced yet, would have to see a calculation)

A two hose AC does take in both indoor and outdoor air, but they never mix. (The two hoses both carry outdoor air; indoor air is pumped through two vents in the AC.) The AC just pumps heat from the indoor air to the outdoor air. Similar to a fridge.
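
To put rough numbers on the one-hose versus two-hose difference discussed in this thread, here is a minimal sketch using the standard sensible-heat relation Q = ρ · V̇ · c_p · ΔT. It assumes that every bit of air a single-hose unit exhausts is replaced by outdoor air that ends up in the cooled space, and it ignores humidity (latent heat) and cases like the hot attic mentioned elsewhere in the comments. The function names, flow rate, temperatures and rated capacity are all hypothetical placeholders, not measurements of any real unit.

```python
# Approximate physical constants for air near room temperature.
AIR_DENSITY = 1.2          # kg/m^3
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K), dry air

def infiltration_load_watts(exhaust_flow_m3_s, outdoor_temp_c, indoor_temp_c):
    """Heat carried in by outdoor air that replaces the exhausted indoor air."""
    mass_flow = AIR_DENSITY * exhaust_flow_m3_s  # kg/s
    return mass_flow * AIR_SPECIFIC_HEAT * (outdoor_temp_c - indoor_temp_c)

def net_cooling_watts(rated_cooling_w, exhaust_flow_m3_s, outdoor_temp_c, indoor_temp_c):
    """Rated cooling minus the infiltration load (hypothetical helper)."""
    return rated_cooling_w - infiltration_load_watts(
        exhaust_flow_m3_s, outdoor_temp_c, indoor_temp_c
    )

# Placeholder inputs: 0.04 m^3/s of exhaust, 33 C outside, 25 C inside, 3000 W rated.
load = infiltration_load_watts(0.04, 33.0, 25.0)
net = net_cooling_watts(3000.0, 0.04, 33.0, 25.0)
print(f"infiltration load: {load:.1f} W, net cooling: {net:.1f} W")

# A two-hose unit exhausts no indoor air, so in this model its load is zero.
print(f"two-hose infiltration load: {infiltration_load_watts(0.0, 33.0, 25.0):.1f} W")
```

The size of the penalty depends entirely on the assumed exhaust flow and temperature difference, which is more or less what the commenters above are disagreeing about.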

The nonobvious problems are the whole reason why AI alignment is hard in the first place.

I disagree with the implication that there’s nothing to worry about on the “obvious problems” side.

An out-of-control AGI self-reproducing around the internet, causing chaos and blackouts etc., is an “obvious problem”. I still worry about it.

After all, consider this: an out-of-control virus self-reproducing around the human population, causing death and disability etc., is also an “obvious problem”.  We already have this problem; we’ve had this problem for millennia! And yet, we haven’t solved it!

(It’s even worse than that—it’s an obvious problem with obvious mitigations, e.g. end gain-of-function research, and we’re not even doing that.)

There is an important difference here between "obvious in advance" and "obvious in hindsight", but your basic point is fair, and the virus example is a good one. Humanity's current state is indeed so spectacularly incompetent that even the obvious problems might not be solved, depending on how things go.
8Steven Byrnes1y
I would say “Humanity's current state is so spectacularly incompetent that even the obvious problems with obvious solutions might not be solved”. If humanity were not spectacularly incompetent, then maybe we wouldn't have to worry about the obvious problems with obvious solutions. But we would still need to worry about the obvious problems with extremely difficult and non-obvious solutions.

I find it funny that there's more discussion in the comments section of the details of how single-hose air conditioners work compared to the object-level claims made in the post about the difficulty distribution of problems that are likely to come up in AI alignment.

I interpreted the air conditioning story as a fable meant to illustrate a point, not as Bayesian evidence for us to use in order to update towards a particular view. Are people here reading the post through a different lens?

No, they're trying to avoid generalizing from fictional evidence. John is offering the Fable of the Air Conditioners as an example of a particular phenomenon that he says also applies to the AI alignment problem. If his chosen example of this phenomenon is not in fact a good example of the phenomenon, then one might reasonably be less inclined to believe that the phenomenon is as common and as important as he suggests it is, and/or less inclined to believe what he says about the phenomenon.

7Ege Erdil1y
Why would you generalize from a cherry-picked example to begin with? The fact that you're able to find some pretty example to illustrate your point is pretty much no evidence at all in favor of it; and if your cherry-picked example ends up being invalid, that's more of a reflection of your lack of attention or clarity on how the example was supposed to work than a reflection of the actual difficulty of coming up with examples. It seems to me like you're agreeing that people are reading this fable as Bayesian evidence in favor of some view, even though it's obviously cherry-picked and therefore shouldn't be evidence in favor of anything even if it were true. In that case, why did you say "no"? Then why not ask him how prevalent he thinks the phenomenon actually is? I agree it's not good that his example isn't as good as it might seem at first, but that's honestly pretty weak evidence against his general point in the context of alignment.
I think people are reading this as intended to be Bayesian evidence in favour of some view. I probably shouldn't have said "no, ...", therefore; my apologies.
3Alex Vermillion1y
Jumping off this (and aware of what you said below), this post makes me uncomfortable in the number of people who are earnestly debating a surface analogy. I get it if people are just having fun and blowing off steam, but it's pretty weird for me to see people acting as if (and explicitly stating they are!) a metaphor to bad products on Amazon somehow changes whether or not alignment is an issue. I'm confused by something happening here. I refuse to fall over onto the " [] is a reliable marker for the seriousness of alignment", but it seems most people here are. What gives?
  • I think it's more relevant in the genre of "rationalists evaluating civilization's adequacy" than "alignment metaphor." It's a big running question how correct these critiques are. (As alignment metaphor I feel it's more like fable than evidence, though I think others may read more into that.)
  • There's something compelling about picking on someone's cherry-picked example of inadequacy. Weaknesses in it feel at least as compelling as weaknesses in "random piece of evidence that established their current view about inadequacy."
  • My initial overly-detailed comments were largely caused by browsing while my wife took a nap on vacation (leaving me unusually likely to follow random impulses about what to do without regard for usefulness).
  • From there I think the conversation was in part sustained by the usual arguing-on-internet-energy.
3Alexander Gietelink Oldenziel1y
As a datapoint: I am usually on the Wentworth side of the Paul-John spectrum but I found Paul's internet arguing about ACs compelling and have updated very slightly to Paul's side. :)

What gives?

Some people may simply have been nerd-sniped, but the OP does seem to present the air conditioner thing as a real piece of evidence, not just a shallow illustrative analogy. When they get literal at the end, they say:

admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries.

Also, given that the example was presented with such high confidence, and took up a significant portion of a post that was otherwise only moderately detailed, I don't think it's unreasonable for people's confidence in the poster and the post to drop if the example turns out to be built on a misunderstanding. 

(I'm not suggesting the OP was right or wrong, I have no object-level knowledge here.)

I think to some degree it makes sense to debate it. John Wentworth is offering the example of the air conditioner as indicative of a broader problem with society. Of course the air conditioner can't prove whether or not there is this broader problem, so it's not evidence in itself, but we could take John Wentworth's post to say that he has seen many things like the air conditioner, and that these many things tell him society may fail to fix major problems if they are hard to notice. Per Aumann's agreement theorem, this then becomes strong evidence that society has lots of cases where people fail to fix major problems if they are hard to notice. But hold on - Aumann's agreement theorem assumes that people are rational, that they correctly interpret evidence. Is this assumption correct for John Wentworth? By providing the example of the air conditioner, he lets us see how he might interpret the evidence about whether or not society may fail to notice major flaws. But this also makes the air conditioner example fairly important if it fails to hold.

Side note: I think that most people are clueless enough of the time that Aumann should mostly be ignored. This also holds for people updating off of what I think: I do not think most readers actually have enough bits of evidence about the reliability of my reasoning that they should Aumann-style update off of it. Instead, I try to make my own reasoning process as legible as possible in my writing, so that people can directly follow the gears and update based on the inside view, rather than just trust my judgement.

1Ege Erdil1y
This is probably the best argument for why you should care about the surface-level analogy, but I still don't find it compelling because you need quite different kinds of domain expertise when thinking about how air conditioners work compared to what you need for AI alignment work.
Many critics seem to be concerned about whether people working in alignment have their heads lost in clouds of abstraction or get proper contact with reality. This intuitively seems like it would be tested by whether the examples provided lack direct experience.
Even just a little bit of domain expertise is useful! I understand your point, and even agree to some extent, but I think it's also great that others are discussing the object-level details of "the surface-level analogy". Both the argument using the analogy, and the analogy itself, seem like potentially fruitful topics to discuss.
Is this a rhetorical question? If not, it would help if you provided quotes and got the attention of the specific commenters you are referencing.

What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem?

I think it looks exactly like it does now; with a lot of people getting very upset that local optimization often looks un-optimized from the global perspective.

If I needed an air-conditioner for working in my attic space, which is well-insulated from my living space and much, much hotter than either my living space or the outside air in the summer, the single-vent model would be more efficient.  Indeed, it is effectively combining the m... (read more)

The Youtube channel Technology Connections explored the disadvantages of one-hose air conditioners here:

unfortunately I have a meeting and don't remember the conclusion.

I used this type of air conditioner for years (got it for free and needed it only a few days a year, as I lived in a colder climate). It can lower the temperature in the room by several degrees C, but not more. If it is 30 C outside, it can get the room to 25 C, and that is enough.

I love this air conditioner example, not just for alignment but also as a metaphor for many other inference problems.

Data point: I cover my bedroom door with a curtain rather than closing the door, and it was clear that, with the air conditioner on, the bedroom was at lower pressure than the main room. The temperature effects are weird because my apartment can stay above outside temp for hours even with the obvious fixes done, but it was depressurizing.

Of course the commenters talking about cooling gradients vs net cooling can’t agree on an air conditioning utility function

It's a hard problem – a recapitulation of the difficult problem of describing or specifying human values generally.

Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are.

This is the important part and it seems wrong.

Firstly, there's going to be a community of people trying to find and fix the hard problems, and if they have longer to do that then they will be more likely to succeed.

Secondly, 'nonobvious' isn't an all-or-nothing term. There can easily be... (read more)

Toy model: we have some system with a bunch of problems. A group of people with some fixed skills/background will be able to find 80% of the problems given enough time; the remaining 20% are problems which they won't find at all, because it won't occur to them to ask the right questions. (The air conditioner pulling in hot air in the far corners of the house is meant to be an example of such a problem, relative to the skills/background of a median customer.) For the 80% of problems which the group can find, the amount of time required to find them has a wide tail: half the problems can be found in a week, another 25% in another two weeks, another 12.5% in another four weeks, etc. (The numbers in this setup aren't meant to be realistic; the basic idea I want to illustrate should occur for a fairly wide range of distributions.)

In this toy model:
  • the group is more likely to find any given problem if given more time
  • 'nonobvious' is not all-or-nothing; there are problems which won't be found in a week but will be found in a year.

So this toy model matches both of your conditions.

What happens in this toy model? Well, after a bit over two years, 79.5% of the problems have been found. Almost all of the remaining 20.5% are problems which the group will not find, given any amount of time, because they do not have the skills/background to ask the right questions. They will still keep improving things over time, but it's not going to make a large quantitative difference.

Point is: you are arguing that there exist problems which will be found given more time. That is not the relevant claim. In order to argue that alignment is importantly easier in slow takeoff worlds, you need to argue that there do not exist fatal problems which will not be found given more time.
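
The toy model above can be sketched numerically. This is only an illustration under one reading of the setup: the 80% findable share and the halving schedule come from the comment, while the function name and the stepwise rule (a batch of problems counts as found only once its whole interval has elapsed) are assumptions made here.

```python
FINDABLE = 0.80  # share of all problems the group can ever find (from the toy model)

def fraction_found(weeks):
    """Share of ALL problems found after `weeks` under the halving schedule.

    Half of the findable problems take 1 week, the next quarter take 2 more
    weeks, the next eighth take 4 more weeks, and so on. A batch is counted
    only once its full interval has elapsed (a simplifying assumption).
    """
    found_of_findable = 0.0
    batch_share = 0.5   # share of the findable problems in the current batch
    batch_weeks = 1     # how long the current batch takes
    elapsed = 0
    while elapsed + batch_weeks <= weeks:
        elapsed += batch_weeks
        found_of_findable += batch_share
        batch_share /= 2
        batch_weeks *= 2
    return FINDABLE * found_of_findable

for weeks in (1, 3, 7, 52, 127, 520, 5200):
    print(f"{weeks:>5} weeks: {fraction_found(weeks):.4f} of all problems found")
```

Under this stepwise reading, 127 weeks (about two and a half years) gives 0.79375, close to the figure quoted above, and no amount of extra time pushes the total past 0.80.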
4Tom Davidson1y
I need something weaker; just that we should put some probability on there not being fatal problems which will not be found given more time. (I.e., some probability that the extra time helps us find the last remaining fatal problems). And that seems reasonable. In your toy model there's 100% chance that we're doomed. Sure, in that case extra time doesn't help. But in models where our actions can prevent doom, extra time typically will help. And I think we should be uncertain enough about difficulty of the problem that we should put some probability on worlds where our actions can prevent doom. So we'll end up concluding that more time does help.
The toy model says there's 100% chance of doom if the only way we find problems is by iteratively trying things and seeing what visibly goes wrong. A core part of my view here is that there's lots of problems which will not be noticed by spending any amount of time iterating on a black box, but will be found if we can build the mathematical tools to open the black box. I do think it's possible to build sufficiently-good mathematical tools that literally all the problems are found (see the True Names thing []). More time does help with building those tools, but more time experimenting with weak AI systems doesn't matter so much. Experimenting with AI systems does provide some feedback for the theory-building, but we can get an about-as-good feedback signal from other agenty systems in the world already. So the slow/fast takeoff question isn't particularly relevant. Man, it would be one hell of a miracle if the number of fatal problems which would not be found by any amount of iterating just so happened to be exactly zero. Probabilities are never literally zero, but that does seem to me unlikely enough as to be strategically irrelevant.
1Tom Davidson1y
It sounds like the crux is whether having time with powerful (compared to today) but sub-AGI systems will make the time we have for alignment better spent. Does that sound right? I'm thinking it will because i) you can better demonstrate AI alignment problems empirically to convince top AI researchers to prioritise safety work, ii) you can try out different alignment proposals and do other empirical work with powerful AIs, iii) you can try to leverage powerful AIs to help you do alignment research itself. Whereas you think these things are so unlikely to help that getting more time with powerful AIs is strategically irrelevant
Yeah, that's right. Of your three channels for impact: ... (i) and (ii) both work ~only to the extent that the important problems are visible. Demonstrating alignment problems empirically ~only matters if they're visible and obvious. Trying out different alignment proposals also ~only matters if their failure modes are actually detectable. (iii) fails for a different reason, namely that by the time AIs are able to significantly accelerate the hard parts of alignment work, they'll already have foomed. Reasoning: there's generally a transition point between "AI is worse than human at task, so task is mostly done by human" and "AI is comparable to human or better, so task is mostly done by AI". Foom occurs roughly when AI crosses that transition point for AI research itself. And alignment is technically similar enough to AI research more broadly that I expect the transition to be roughly-simultaneous for capabilities and alignment research.
1Tom Davidson1y
Quick responses to your argument for (iii).
  • If AI automates 50% of both alignment work and capabilities research, it could help with alignment before foom (while also bringing foom forward in time)
  • A leading project might choose to use AIs for alignment rather than for fooming
  • AI might be more useful for alignment work than for capabilities work
  • Fooming may require more compute than certain types of alignment work
1Ege Erdil1y
For what it's worth, I've had a similar discussion with John in another comment thread [] where he said that he doesn't believe the probability of doom is 1, he just believes it's some p≫0 that doesn't depend too much on the time we have to work on problems past a time horizon of 1 week or so. This is consistent with your model and so I don't think John actually believes that the probability of doom is 1 and I don't think he would necessarily disagree with your model either. On the other hand in your model the probability of doom asymptotes to some p≫0 as extra time goes to infinity, so it's also not true that extra time would be very helpful in this situation past a certain point.
TBC, I believe that the value of more time rapidly asymptotes specifically for the purpose of finding problems by trying things and seeing what goes wrong. More time is still valuable for progress via other channels.

This post is difficult to understand for me because of the lack of quantitative forecasts. I agree that "the technical problems are similar either way", but iterating gives you the opportunity to solve some problems more easily, and the assumption that "the only problems that matter are the ones iteration can't solve" seems unjustified. There are a lot of problems you'll catch if you have 50 years to iterate compared to only 6 months, and both of those could count as "slow takeoff" depending on your definition of "slow".

To make this more explicit, suppose ... (read more)

I am saying that the probability is roughly constant and doesn't depend on T much. The vast majority of problems which will be noticed during iteration will be noticed in 6 months; another 49.5 years might catch a few more, but those few will be dwarfed by the problems which will not be noticed during iteration approximately-regardless of how much time is spent. Somebody who did not notice the air conditioner problem in 6 months is unlikely to notice it in the next 49.5 years. The examples of charities with mediocre impact and low replication rates in medicine have both been ongoing for at least a century. Time is just not the main relevant variable here.
5Ege Erdil1y
I understand. I guess that my problem is even in the air conditioner example I'd expect a substantially higher probability that the problem is noticed in 50 years compared to 6 months, if nothing else because you'd get visitors to your house, they'd happen to go to parts of it that are away from the air conditioner, notice it's unnaturally hot, etc. Eventually someone can guess that the air conditioner is the problem. If you start with a Jeffreys prior over the rate at which problems are noticed, then the fact that a problem hasn't been noticed in 6 months only really tells you that it probably takes O(1 year) or more to notice the problem in the typical situation. To get the conclusion "if problem isn't noticed in 6 months, it won't be noticed in 50 years" seems to require some assumption like "almost all problems are either very easy or very hard", or "the prior on the problems being noticed in a given day is like Beta(1/n,1/n) for some big n". I don't think you can justify this by pointing to problems that are very hard to notice, since those will exist no matter what the prior distribution is. The question is about how much of the probability mass they make up, and here you seem to have some inside view into the situation which you don't communicate in the blog post. Could you elaborate on that?
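
The contrast drawn above between a Jeffreys prior and a "strongly" bimodal prior can be made concrete with a small sketch. It assumes each day is an independent trial with an unknown probability p of the problem being noticed, uses Beta(1/2, 1/2) as the Jeffreys prior for that probability, and takes Beta(1/n, 1/n) with n = 50 as a stand-in for "some big n". The function names, the choice of n, and the 180-day and 50-year horizons are illustrative assumptions.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_unnoticed(days, a, b):
    """P(problem not noticed in `days` daily trials) when p ~ Beta(a, b).

    This is E[(1 - p)^days] = B(a, b + days) / B(a, b).
    """
    return exp(log_beta(a, b + days) - log_beta(a, b))

def prob_noticed_later(days_so_far, days_total, a, b):
    """P(noticed by `days_total` | not noticed in the first `days_so_far`)."""
    return 1.0 - prob_unnoticed(days_total, a, b) / prob_unnoticed(days_so_far, a, b)

SIX_MONTHS = 180
FIFTY_YEARS = 50 * 365

jeffreys = prob_noticed_later(SIX_MONTHS, FIFTY_YEARS, 0.5, 0.5)
strongly_bimodal = prob_noticed_later(SIX_MONTHS, FIFTY_YEARS, 1 / 50, 1 / 50)
print(f"Jeffreys prior Beta(1/2, 1/2):     {jeffreys:.3f}")
print(f"Strongly bimodal Beta(1/50, 1/50): {strongly_bimodal:.3f}")
```

With these placeholder settings, the Jeffreys prior puts roughly 0.90 on the problem being noticed at some point in the remaining 49.5 years, while the Beta(1/50, 1/50) prior puts that at roughly 0.09. The disagreement in this thread is largely about which of these shapes is closer to reality.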
I indeed expect that the vast majority of problems will either be noticed within minutes or not at all. One model: people either have the right mental toolkit to ask the relevant questions, or they don't. Either they know to look for balancing flows in the air conditioner case, or they don't. Either they know to ask about replication rates in the medical example, or they don't. Either they know to ask about impact measures for charities, or they don't. Young people might pick up these skills over time, but most people stop adding to their mental toolkit at a meaningful rate once they're out of school. Another model: once we accept that there are problems which will not be noticed over any relevant timescale, the distribution is guaranteed to be bimodal: there's problems which will be noticed in some reasonable time, and problems which won't. Then the only question is: what's the relevant timescale after which most problems which will be noticed at all, are noticed? Looking at the world, it sure seems like that timescale is "minutes", not "decades", but that's not really the key step here. The key step is realizing that there are plenty of problems which will not be noticed in any relevant amount of time. At that point, we're definitely in a world where "almost all problems are either very easy or very hard", it's just a question of exactly how much time corresponds to "very easy".
3Ege Erdil1y
Two points about this:
* People can notice that there's a problem and narrow it down to the air conditioner given enough time even if they have no gears-level understanding of what's happening. For example, the Romans knew nothing about the mechanics of how malaria spreads, but they figured out that it has something to do with "bad air", hence the name "malaria". It's entirely possible that such an understanding will not be here in 6 months but will be here in 50 years, and I suspect it's more or less what happened in the case of malaria.
* Thinking in terms of "most people" is reasonable in the case of the air conditioner, but it seems like a bad idea when it comes to AI alignment, since the people working on the problem will be quite far away from the center of the distribution when it comes to many different traits.
I don't think I like this framing because I don't think it gets us to the conclusion we want. The Jeffreys prior is also bimodal and it doesn't have this big discrepancy at any timescale. If the Jeffreys prior is applicable to the situation, then if a problem hasn't been solved in T years your mean forecast for how long it will take to solve is ≈2T years. You're assuming not only that the prior is bimodal, but that it's "strongly" bimodal, whatever that means. In the beta distribution case it corresponds to taking n to be very large. Your first argument could do this, but I'm skeptical about it for the two reasons I've mentioned in response to it above.
Stronger than that, even. I'm saying that my distribution over rate-of-problem-solving has a delta spike at zero, mixed with some other distribution at nonzero rates. Which is indeed how realistic priors should usually look! If I flip a coin 50 times and it comes up heads all 50 times, then I think it's much more likely that this coin simply has heads on both sides (or some other reason to come up basically-always-heads) than that it has a 1/100 or smaller (but importantly nonzero) chance of coming up tails. The prior which corresponds to that kind of reasoning is a delta spike on 0% heads, a delta spike on 0% tails, and then some weight on a continuous distribution between those two.
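As a rough illustration of the spike-plus-continuous prior described above, the following Python sketch computes the posterior weight on "the coin always lands heads" after seeing only heads. The specific choices are assumptions made for the example, not taken from the comment: the continuous part is taken to be uniform on [0, 1], and the spike weights are hypothetical parameters.

```python
def posterior_always_heads(n_heads: int, w_heads: float, w_tails: float) -> float:
    """Posterior probability that the coin is 'always heads' after observing
    n_heads heads and zero tails.

    Prior (an assumption for this sketch):
      - weight w_heads on P(heads) = 1  (delta spike on 0% tails)
      - weight w_tails on P(heads) = 0  (delta spike on 0% heads)
      - remaining weight spread uniformly over P(heads) in [0, 1]
    """
    w_cont = 1.0 - w_heads - w_tails
    if min(w_heads, w_tails, w_cont) < 0:
        raise ValueError("prior weights must be non-negative and sum to at most 1")

    like_heads_spike = 1.0                             # always-heads predicts the data exactly
    like_tails_spike = 1.0 if n_heads == 0 else 0.0    # ruled out by a single head
    like_cont = 1.0 / (n_heads + 1)                    # integral of p^n over [0, 1]

    evidence = (w_heads * like_heads_spike
                + w_tails * like_tails_spike
                + w_cont * like_cont)
    return w_heads * like_heads_spike / evidence


if __name__ == "__main__":
    for n in (0, 10, 50, 500):
        print(n, round(posterior_always_heads(n, w_heads=0.01, w_tails=0.01), 4))
```

The spike's posterior weight rises steadily with the run of heads, whereas a purely continuous prior can never put any mass on exactly 0% tails.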
1Ege Erdil1y
Right, but then it seems like you get back to what I said in my original comment: this gets you to $\lim_{T \to \infty} P(\text{major alignment failure} \mid \text{takeoff duration} = T) = p \gg 0$, which I think is quite reasonable, but it doesn't get you to "the probability is roughly constant as T varies", because you're only controlling the tail near zero and not near infinity. If you control both tails then you're back to where we started, and the difference between a delta spike and a smoothed out version of the delta isn't that important in this context.
Let $T_{eq}$ be the first time at which $P(\text{major alignment failure} \mid \text{takeoff duration} = T)$ is within $\epsilon$ of $p$. As long as $\epsilon$ is small, the probability will be roughly constant with time after $T_{eq}$. Thus, the probability is roughly constant as T varies, once we get past some initial period. (Side note: in order for this to be interesting, we want $\epsilon$ small relative to $p$.) For instance, we might expect that approximately-anyone who's going to notice a particular problem at all will notice it in the first week, so $T_{eq}$ is on the order of a week, and the probability of noticing a problem is approximately constant with respect to time for times much longer than a week.
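The definition above can be sketched numerically in Python under a deliberately simple model, which is an assumption of this example rather than anything stated in the thread: a fraction of problems is never noticed (the delta spike at rate zero), and the rest are noticed at a single constant rate. The parameter names and values are hypothetical.

```python
import math


def prob_unnoticed(t: float, never_fraction: float, rate: float) -> float:
    """P(problem still unnoticed at time t) under the toy mixture:
    `never_fraction` of problems have noticing rate 0, the rest are
    noticed as a Poisson process with the given rate (per unit time)."""
    return never_fraction + (1.0 - never_fraction) * math.exp(-rate * t)


def t_eq(never_fraction: float, rate: float, eps: float) -> float:
    """First time at which prob_unnoticed is within eps of its long-run
    limit (which equals never_fraction in this toy model)."""
    initial_gap = 1.0 - never_fraction
    if initial_gap <= eps:
        return 0.0  # already within eps of the limit at t = 0
    return math.log(initial_gap / eps) / rate


if __name__ == "__main__":
    # Hypothetical numbers: 30% of problems never noticed, rate of 1 per day.
    p, rate, eps = 0.3, 1.0, 0.003
    t = t_eq(p, rate, eps)
    print(f"T_eq = {t:.2f} days; P(unnoticed at T_eq) = {prob_unnoticed(t, p, rate):.4f}")
```

In this sketch the value of the equilibration time is driven entirely by the assumed rate for the noticeable problems; the existence of never-noticed problems fixes the limit but does not by itself say how fast the curve reaches it.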
3Ege Erdil1y
I agree with that, but I don't see where the justification for $T_{eq} \approx 1$ week comes from. You can't get there just from "there are problems that won't be noticed at any relevant timescale", and I think the only argument you've given so far for why the "intermediate time scales" should be sparsely populated by problems is your first model, which I didn't find persuasive for the reasons I gave.

I feel like an important lesson to learn from the analogy to air conditioners is that some technologies are bounded by physics and cannot improve quickly (or at all). I doubt anyone has the data, but I would be surprised if average air conditioning efficiency in BTUs per watt, plotted over the 20th century, is not a sigmoid.

Probably everything involving humans is inefficient, especially human values. An AI willing to erase its lifetime memories the moment they aren't needed anymore would be 1% more efficient, and hence take over the universe.
