I go to Amazon, search for “air conditioner”, and sort by average customer rating. There’s a couple pages of evaporative coolers (not what I’m looking for), one used window unit (?), and then this:

Average rating: 4.7 out of 5 stars.

However, this air conditioner has a major problem. Take a look at this picture:

Key thing to notice: there is one hose going to the window. Only one.

Why is that significant?

Here’s how this air conditioner works. It sucks in some air from the room. It splits that air into two streams, and pumps heat from one stream to the other - making some air hotter, and some air cooler. The cool air, it blows back into the room. The hot air, it blows out the window.

See the problem yet?

Air is blowing out the window. In order for the room to not end up a vacuum, air has to come back into the room from outside. In practice, houses are very not airtight (we don’t want to suffocate), so air from outside will be pulled in through lots of openings throughout the house. And presumably that air being pulled in from outside is hot; one typically does not use an air conditioner on cool days.

The actual effect of this air conditioner is to make the space right in front of the air conditioner nice and cool, but fill the rest of the house with hot outdoor air. Probably not what one wants from an air conditioner!

Ok, that’s amusing, but the point of this post is not physics-101 level case studies in how not to build an air conditioner. The real fact of interest is that this is apparently the top rated new air conditioner on Amazon. How does such a bad design end up so popular?

One aspect of the story, presumably, is fake reviews. That phenomenon is itself a rich source of insight, but not the point of this post, and definitely not enough to account for the popularity of this air conditioner. The reviews shown on the product page are all “verified purchase”, and mostly 5-stars. There are only 4 one-star reviews (out of 104). If most customers noticed how bad this air conditioner is, I do not think a 4.7 rating would be sustainable. Customers actually do like this air conditioner.

And hey, this air conditioner has a lot going for it! There’s wheels on the bottom, so it’s very portable. Setup is super easy - only one hose to the window, much less fiddly than those two-hose designs where you attach one hose and the other pops off.

Sure, the air conditioner has a major problem, but it’s not a major problem which most people will notice. They may notice that most of the house is still hot, but the space right in front of the air conditioner will be cool, so obviously the air conditioner is doing its job. Very few people will realize that the air conditioner is drawing hot air into the rest of the house. (Indeed, I saw zero reviews which mentioned that the air conditioner pulls hot air into the house - even the 1-star reviewers apparently did not realize why the air conditioner was so bad.)

[EDIT: several commenters seem to think that I'm claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. If the air inside is all perfectly mixed together, it will still end up cooler with the air conditioner than without. The point is not that it doesn't work at all. The point is that it's stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.]

Generalization

Major problems are only fixed when those problems are obvious. Problems which most people won’t notice (or won’t attribute correctly) tend to stick around. There’s no economic incentive to fix them.

And in practice, there are plenty of problems which most people won’t notice. A few more examples:

  • Most charities have pretty mediocre impact. But the actual impact is very-not-visible to the person making donations, so people keep donating. (Also people care about things besides impact, but nonetheless I doubt low-impact charities would survive if their ineffectiveness were generally obvious.)
  • Medical research has a replication rate below 50%. But when the effect sizes are expected to be small anyways, it’s hard to tell whether it’s working, so doctors (and patients) keep using crap treatments.
  • Based on my firsthand experience with the B2B software industry, success is mostly determined by how good the product looks to managers making the decision to purchase. Successful B2B software (think “enterprise software”) is usually crap, but has great salespeople and great dashboards for the managers.

… and presumably this extends to lots of other industries which I’m less familiar with.

Two points to highlight here:

  • Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians ultimately appointing people can distinguish real from fake expertise, which is hard in general.
  • Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few years. Problems do not automatically become obvious over time.

How Does This Relate To Takeoff Speeds?

There’s a common view that, as long as AI does not take off too quickly, we’ll have time to see what goes wrong and iterate on it. It's a view with a lot of intuitive outside-view appeal: AI will work just like other industries. We try stuff, see what goes wrong, fix it. It worked like that in all the other industries, presumably it will work like that in AI too.

The point of the air conditioner is that other industries do not, in fact, work like that. Other industries are absolutely packed with major problems which are not fixed because they’re not obvious. Even assuming that AI does not take off quickly (itself a dubious assumption at best), we should expect the same to be true of AI.

… But Won’t Big Problems Be Obvious?

Most industries have major problems which aren’t fixed because they’re not obvious. But these problems can only be so bad. If they were really disastrous, the disasters would be obvious. Why not expect the same from AI?

Because AI will eventually be far more capable than human industries. It will, by default, optimize way harder than human industries are capable of optimizing.

What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem? Well, it looks really good. But all the resources are spent on looking good, not on actually being good. It’s “Potemkin village world”: a world designed to look amazing, but with nothing behind the facade. Maybe not even any living humans behind the facade - after all, even generally-happy real humans will inevitably sometimes appear less-than-maximally “good”.

… But Isn’t Solving The Obvious Problems Still Valuable?

The nonobvious problems are the whole reason why AI alignment is hard in the first place.

Think about the “game tree” of alignment - the basic starting points, how they fail, what strategies address the failures, how those fail, etc. The most basic starting points are generally of the form “collect data from humans on which things are good/bad, then train something to do good stuff and avoid bad stuff”. Assuming such a strategy could be implemented efficiently, why would it fail? Well:

  • In cases where humans label bad things as “good”, the trained system will also be selected to label bad things as “good”. In other words, the trained AI will optimize for things which look “good” to humans, even when those things are not very good.
  • The trained system will likely end up implementing strategies which do “good”-labeled things in the training environment, but those strategies will not necessarily continue to do the things humans would consider “good” in other environments.

(Somewhat more detail on these failure modes here.) Optimizing for things which look “good” to humans obviously raises exactly the sort of failure which the air conditioner points to. Failure of systems to generalize in “good” ways is less centrally about obviousness, but note that if it were obvious that the system were going to generalize badly, this would also be a pretty easy issue to solve: just don’t deploy the system if it will generalize badly. Problem is, we can’t tell whether a system will do what we want in deployment just by looking at what it does in training; we can’t tell by looking at the system's behavior whether there’s problems in there.
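The first failure mode can be made concrete with a toy selection problem. This is only a sketch: the `Option` class, the option names, and every score below are made up for illustration, and `looks` stands in for whatever a human rater can actually observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    looks: int   # hypothetical score a human rater would assign on sight
    actual: int  # hypothetical true quality, which the rater cannot observe


# Made-up candidates: one with an obvious problem, one with a hidden problem,
# and one that is good but unimpressive to look at.
options = [
    Option("actually good, plain-looking", looks=6, actual=8),
    Option("looks great, hidden problem", looks=9, actual=3),
    Option("obviously bad", looks=2, actual=2),
]

# Selecting on the human label optimizes for appearance...
chosen_by_label = max(options, key=lambda o: o.looks)
# ...which need not coincide with what is actually best.
best_in_fact = max(options, key=lambda o: o.actual)

print("chosen by label:", chosen_by_label.name)
print("best in fact:   ", best_in_fact.name)
```

Note that the obviously bad option loses under either criterion. Only the hidden problem survives selection on labels.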

Point is: problems which are highly visible to humans are already easy, from an alignment perspective. They will probably be solved by default. There’s not much marginal value in dealing with them. The value is in dealing with the problems which are hard to recognize.

Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are. And the ability to iterate does not make that hard part easier. Iteration mainly helps on the parts of the problem which were already easy anyway.

So I don't really care about takeoff speeds. The technical problems are basically similar either way.

... though admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries. Fortunately, there was no shortage of examples to hammer the idea into my head.

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

I agree that people can easily fail to fix alignment problems, and can instead paper over them, even given a long time to iterate. But I'm not really convinced about your analogy with single-hose air conditioners.

Physics:

The air coming out of the exhaust is often quite a bit hotter than the outside air. I've never checked myself, but just googling has many people reporting 130+ degree temperatures coming out of exhaust from single-hose units. I'm not sure how hot this unit's exhaust is in particular, but I'd guess it's significantly hotter than outside air.

If exhaust is 130 and you are trying to cool from 100 to 70 you'd then only be losing 50% efficiency. Most people won't be cooling by 30 degrees so the efficiency losses would be smaller. In practice I think the actual efficiency loss relative to a 2-hose unit is more like 25-30% (see stats on top wirecutter picks below).
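The 50% figure appears to come from a simple ratio, sketched below. The model is an assumption, not something the comment spells out: every unit of air blown out the window is replaced by an equal amount of outdoor air, and the heat carried scales linearly with the temperature difference from indoor air. The function name is hypothetical.

```python
def single_hose_loss_fraction(indoor_f, outdoor_f, exhaust_f):
    """Fraction of the heat removed via the exhaust that comes back in
    as warm infiltration air, under the assumed linear model."""
    heat_out = exhaust_f - indoor_f   # heat carried out per unit of exhaust air
    heat_back = outdoor_f - indoor_f  # heat carried in per unit of replacement air
    if heat_out <= 0:
        raise ValueError("exhaust must be hotter than indoor air")
    return heat_back / heat_out


# Cooling from 100 to 70 with 130-degree exhaust: half the work is undone.
print(single_hose_loss_fraction(indoor_f=70, outdoor_f=100, exhaust_f=130))  # 0.5

# A smaller indoor/outdoor gap gives a smaller loss.
print(single_hose_loss_fraction(indoor_f=70, outdoor_f=85, exhaust_f=130))   # 0.25
```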

Discourse:

I actually think that this factor (sucking in hot air from the outside) is probably already included in the SACC (seasonally adjusted cooling capacity) and hence CEER reported for this air conditioner. I don't really know anything about air conditioners but it's discussed extensively in the definition of... (read more)

habryka

My overall take on this post and comment (after spending like 1.5 hours reading about AC design and statistics): 

Overall I feel like both the OP and this reply say some wrong things. The top Wirecutter recommendation is a dual-hose design. The testing procedure of Wirecutter does not seem to address infiltration in any way, and indeed the whole article does not discuss infiltration as it relates to cooling-efficiency. 

Overall efficiency loss from going from dual to single hose is something like 20-30%, which I do think is much lower than the OP implied, though it is also quite substantial. Indeed, most of the top-ranked Amazon listings do not use any of the updated measurements that Paul is talking about, so consumers likely do end up deceived about that. 

The top-rated AC Wentworth links to is really very weak if you take into account those losses, and I would be surprised if it adequately cooled people's homes. 

My current model: Wirecutter is doing OK but really not great here (with an actively confused testing procedure), Amazon ratings are indeed performing quite badly, and basically display most of the problems that Wentworth talks about. It's unclea... (read more)

Update: I too have now spent like 1.5 hours reading about AC design and statistics, and I can now give a reasonable guess at exactly where the I-claim-obviously-ridiculous 20-30% number came from. Summary: the SACC/CEER standards use a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air.
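To see how much the weighting matters, here is a small sketch of the weighted outdoor-minus-indoor temperature gap. The 80% weight and the 3°F gap are from the summary above. The 20% weight is just the remainder, and the 15°F gap for the second test condition is an assumption not stated in this comment.

```python
# (weight, outdoor-minus-indoor gap in degrees F) for each test condition.
# The second row's 15.0 is an assumed value, not taken from the text above.
conditions = [
    (0.8, 3.0),
    (0.2, 15.0),
]


def weighted_gap(conditions):
    """Weighted mean of the temperature gaps, normalized by total weight."""
    total_weight = sum(weight for weight, _ in conditions)
    return sum(weight * gap for weight, gap in conditions) / total_weight


print(f"{weighted_gap(conditions):.1f} F")  # 5.4 F
```

Under that assumption the weighted gap comes to 5.4°F, which is consistent with the "5.4 degrees" figure that comes up later in the thread.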

The whole backstory of the DOE's SACC/CEER rating rules is here. Single-hose air conditioners take center stage. The comments on the DOE's rule proposals can basically be summarized as:

  • Single-hose AC manufacturers very much did not want infiltration air to be accounted for, and looked for any excuse to ignore it
  • Electric companies very much did want infiltration air to be accounted for, and in particular wanted SACC to be measured at peak temperatures
  • The DOE did its best to maintain a straight face in front of all this obvious bullshitting, and respond to it with legibly-reasonable arguments and statistics.

This quote in particular stands out:

De’ Longhi [an AC manufacturer] expressed concern that modifying the AHAM PAC-1-2014 method to account for infiltration air would disproportionately impact single-duct portable AC

... (read more)

I still think the 25-30% estimate in my original post was basically correct. I think the typical SACC adjustment for single-hose air conditioners ends up being 15%, not 25-30%. I agree this adjustment is based on generous assumptions (5.4 degrees of cooling whereas 10 seems like a more reasonable estimate). If you correct for that, you seem to get to more like 25-30%. The Goodhart effect is much smaller than this 25-30%, I still think 10% is plausible.

I admit that in total I’ve spent significantly more than 1.5 hours researching air conditioners :) So I’m planning to check out now. If you want to post something else, you are welcome to have the last word.

SACC for 1-hose AC seems to be 15% lower than similar 2-hose models, not 25-30%:

  • This site argues for 2-hose ACs being better than 1-hose ACs and cites SACC being 15% lower.
  • The top 2-hose AC on amazon has 14,000 BTU that gets adjusted down to 9500 BTU = 68%.  This similarly-sized 1-hose AC is 13,000 BTU and gets adjusted down to 8000 BTU = 61.5%, about 10% lower.
  • This site does a comparison of some unspecified pair of ACs and gets 10/11.6 = 14% reduction.
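The arithmetic behind the last two bullets can be checked directly. This sketch uses only the BTU figures quoted above. The function and variable names are hypothetical, and "about 10% lower" is read here as a relative difference between the two retained fractions.

```python
def retained_fraction(rated_btu, sacc_btu):
    """Share of the rated capacity that survives the SACC adjustment."""
    return sacc_btu / rated_btu


two_hose = retained_fraction(14_000, 9_500)  # roughly 0.68
one_hose = retained_fraction(13_000, 8_000)  # roughly 0.615

# Relative shortfall of the 1-hose unit versus the 2-hose unit.
relative_gap = 1 - one_hose / two_hose

# The unspecified pair from the last bullet: 10 versus 11.6.
site_gap = 1 - 10 / 11.6

print(f"2-hose retains {two_hose:.1%}, 1-hose retains {one_hose:.1%}")
print(f"relative gap: {relative_gap:.1%}")
print(f"site comparison gap: {site_gap:.1%}")
```

The gap in percentage points (68% versus 61.5%) is about 6.4, while the relative gap is about 9%, so the "about 10%" figure only holds on the relative reading.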

I agree the DOE estimate is too generous to 1-hose AC, though I think it’s ... (read more)

4johnswentworth
If you wouldn't mind one last question before checking out: where did that formula you're using come from?
4DirectedEvolution
From PickHVAC.com, "What is a good CEER rating?": From the Pro Breeze single-hose AC product description on Amazon: I haven't looked into the % efficiency loss measurements, but I think it's interesting that you can still figure out that this is a crap AC if you're willing to trust this website.
7denkenberger
Portable units have to meet a much weaker standard. I actually pushed for a more stringent standard on these products when I was consulting for the Appliance Standards Awareness Project.
  • The top wirecutter recommendation is roughly 3x as expensive as the Amazon AC being reviewed. The top budget pick is a single-hose model.
  • People usually want to cool the room they are spending their time in. Those ACs are marketed to cool a 300 sq ft room, not a whole home. That's what reviewers are clearly doing with the unit. 
  • I'd guess that in extreme cases (where you care about the room with AC no more than other rooms in the house + rest of house is cool) consumers are overestimating efficiency by ~30%. On average in reality I'd guess they are overestimating value-added by the air conditioner by more like ~10% (since the AC'd room will be cooler and they care less about other rooms).
  • I think the OP is misleading if 10% is what's at stake and there are real considerations on the other side.
  • I think there is very little chance that the wirecutter reviewers don't understand that infiltration affects heating efficiency. However I agree that your preferences about AC, and the interpretation of their tests, depend on how hot the rest of the building is (and how much you care about keeping it cool). I'm 50-50 on whether someone from the wirecutter would be able to explain that issue
... (read more)
3denkenberger
The infiltration factor of a well-functioning woodstove is far less than a one hose air conditioner, because the air is heated to much higher temperatures. However, it can be significant for fireplaces.

Regulation does not fix the problem, just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians ultimately appointing people can distinguish real from fake expertise, which is hard in general.

It seems like the DOE decided to adopt energy-efficiency standards that take into account infiltration. They could easily have made a different decision (e.g. because of pressure from portable AC manufacturers, or because it's legitimately unclear how to define the standard, or because it makes measurement harder), but it wouldn't be because the issue wasn't obvious (I think it's not even anywhere close to the "failure because the issue wasn't obvious" regime).

Overall I agree with the bottom line that regulation is unlikely to help that much with alignment. But I don't think this seems like the right model of why that is or how you could fix it.

Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few

... (read more)

Obviously the point about air conditioners doesn't matter

I'd like to remark that, at least for me, the facts-of-the-matter about whether this particular air conditioner works by Goodharting consumer preferences actually affect my views on AI. The OP quite surprised my world model, which did not expect one of the most popular AC units on Amazon to work by deceiving consumers. If lots of the modern world works this way, then John's intuition that advanced ML systems are almost certain to work by Goodharting our preferences seems much more likely. Before seeing the above comment and jbash's comment, I was in the process of updating my views, not because I thought the OP was an enlightening allegory, but because it actually changed what I thought the world was like.

Conversely, the world model "sometimes the easiest way to achieve some objective is to actually do the intended thing instead of Goodharting" would predict that the air conditioner example was wrong somehow, a prediction which seems to have been right (if Paul's and jbash's comments are correct, that is). I was quite impressed by this, and am now more confident in the "Goodharting isn't omnipresent" world model.

In any case, my main point is that I actually do care about what's going on in this air conditioning example (and I encourage further discussion on whether the OP's characterization of it is accurate or not).

I can’t believe I’m about to write a comment about air conditioners on a thread about world-ending AI, but having bought one of these one-hose systems for my apartment during a particularly hot summer I can say I was pretty disappointed with its performance.

The main drawback to the one hose system is the cool air never makes it outside the room with the unit. I tried putting a bunch of fans to blow the air to the rest of the house, but as you can imagine that didn’t work very well.

I had no idea why until I zoned out one day while thinking about the air conditioner and realized it was sucking the cold air into the intake and blowing it out of the house. And I did indeed read a bunch of reviews from Costco customers before I bought the unit, none of which mentioned the problem.

Wow, the air conditioner systematically sucking the cold air it's generated back into the intake sort of seems like another problem with this design. (Possibly the same problem in another guise, thermodynamically, but in any case, different in terms of actual produced experience.)

I apologize if this is piling on, but I would like to note that this error strikes me as very similar to another one made by the same author in this comment, and which I believe is emblematic of a certain common failure mode within the rationalist community (of which I count myself a part). This common failure mode is to over-value our own intelligence and under-value institutional knowledge (whether from the scientific community or the Amazon marketplace), and thus not feel the need to tread carefully when the two come into conflict.

In the comment in question, johnswentworth asserts, confidently, that there is nothing but correlational evidence of the role of amyloid-β in Alzheimer's disease. However, there is extensive, strong causal evidence for its role: most notably, that certain mutations in the APP, PSEN1, and PSEN2 genes deterministically (as in, there are no known exceptions for anyone living to their 80's) cause Alzheimer's disease, and the corresponding proteins are well understood structurally and functionally to be key players in the production of amyloid-β. Furthermore, the specific mutations in question are shown through multiple lines of evidence (structural analysi... (read more)

9anonymousaisafety
I think one reason that this error occurs is that there's a mistaken assumption that the available literature captures all institutional knowledge on a topic, so if one simply spends enough time reading the literature, they'll have all requisite knowledge needed for policy recommendations. I realize that this statement could apply equally to your own claims here, but in my experience I see it happen most often when someone reads a handful of the most recently released research papers and from just that small sample of work tries to draw conclusions that are broadly applicable to the entire field. Engineering claims are particularly suspect because institutional knowledge (often in the form of proprietary or confidential information held by companies and their employees) is where the difference between what is theoretically efficient and what is practically more efficient is found. It doesn't even need to be protected information though -- it can also just be that due to manufacturing reasons, or marketing reasons, or some type of incredibly aggravating constraint like "two hoses require a larger box and the larger box pushes you into a shipping size with much higher per-volume / mass costs so the overall cost of the product needs to be non-linearly higher than what you'd expect would be needed for a single hose unit, and that final per-unit cost is outside of what people would like to pay for an AC unit, unless you then also make drastic improvements to the motor efficiency, thermal efficiency, and reduce the sound level, at which point the price is now even higher than before, but you have more competitive reasons to justify it which will be accepted by a large enough % of the market to make up for the increased costs elsewhere, except the remaining % of the market can't afford that higher per-unit cost at all, so we're back to still making and selling a one-hose unit for them".
5anonymousaisafety
Concrete example while we're on the AC unit debate -- there's a very simple way to increase efficiency of portable AC units, and it's to wrap the hot exhaust hose with insulating duct wrap so that less of the heat on that very hot hose radiates directly back into the room you're trying to cool. Why do companies not sell their units with that wrap? Probably for one of any of the following reasons -- A.) takes up a lot of space, B.) requires a time investment to apply to the unit which would dissuade buyers who think they can't handle that complexity, C.) would cost more money to sell and no longer be profitable at the market's price point, D.) has to be applied once the AC unit is in place, and generally is thick enough that the unit is no longer "portable" which during market testing was viewed as a negative by a large % of surveyed people, or E.) some other equally trivial sounding reason that nonetheless means it's more cost effective for companies to NOT sell insulating duct wrap in the same box as the portable AC unit.  Example of an AC company that does sell an insulating wrap as an optional add-on: https://www.amazon.com/DeLonghi-DLSA003-Conditioner-Insulated-Universal/dp/B07X85CTPX
3johnswentworth
A priori, before having clicked on your links, my guess would be that the studies in question generally diagnose Alzheimer's by the presence of amyloid-β deposits. (That's generally been the case in similar studies I've looked into in the past, although I haven't checked the exact studies you link.) If they're diagnosing based on the presence of amyloid-β, then obviously amyloid-β producing mutations will cause an Alzheimer's diagnosis. The problem is that this diagnosis doesn't reflect real Alzheimer's, i.e. it doesn't necessarily involve dementia. We would expect such things to find strong, extensive evidence of causality. The problem is that it's extensive evidence of the mutations causing amyloid-β plaques, not dementia. (Also, a warning: this is exactly the sort of detail which overview articles tend to overlook and misstate - e.g. an overview article will say something like "so-and-so found that blah causes dementia" when in fact so-and-so were diagnosing amyloid plaques, not dementia. One does need to check the original papers.)
5AβMale
A distinction is made in the literature between preclinical Alzheimer's (the presence of neuropathology such as amyloid-β, without clinically detectable cognitive symptoms) and clinical Alzheimer's (a particular cluster of cognitive symptoms along with the neuropathologies of Alzheimer's). It's currently believed that Alzheimer's has a 15-20 year preclinical phase, the duration of which, however, can vary based on genetic and other factors. In the case of the mutations I mentioned (which are early-onset causing), clinically-detectable cognitive decline typically starts around the age of 45, and nearly always by the age of 60. One of the only known examples in which symptoms didn't start until a person was in her 70's was so surprising that an entire, highly-cited paper was written about it: Arboleda-Velasquez et al (2019). Resistance to autosomal dominant Alzheimer’s disease in an APOE3 Christchurch homozygote: a case report. Note, however, that the typical cluster of symptoms did eventually occur. Honestly, these particular mutations are so pervasively discussed in the literature, precisely due to their significance to the causal question, that I can tell you have not really engaged with the literature by your unawareness of their existence and the effects that they have on people. I will readily acknowledge, by the way, that by themselves they don't close the book on the causal question: someone could argue that early-onset, autosomal dominant Alzheimer's due to these mutations is essentially a different disease than the much more prevalent late-onset, sporadic Alzheimer's. While I don't think this argument ultimately goes through, and I'd be happy to discuss why, my main point is not that there's no residual question about the etiology of the disease, but that the research community has intensely, intelligently, and carefully studied the distinction between correlative and causal evidence, as well as the distinction between neuropathology and cognitive sym
1johnswentworth
I'd be interested to read that. (Apologies for lack of citations in the below, I don't have them readily on hand and don't want to go digging right at the moment.) You're right that I never went that deep into the Alzheimer's literature; it's certainly plausible that I overlooked a cluster of actually-competently-executed studies tying Aβ-related genetic mutations to robust dementia outcomes. I did look deeply into at least one study which made that claim (specifically the study which I most often found at the root of citation chains) and it turned out to diagnose using the presence of plaques, not dementia. But that was a paper from the early 90's, so maybe better results have come along since then. However, the absence of evidence for Aβ causing Alzheimer's was not the only thing pinning down my beliefs here. I've also seen papers with positive evidence that Aβ doesn't cause Alzheimer's - i.e. removing plaques doesn't eliminate dementia. And of course there's been literally hundreds of clinical trials with drugs targeting Aβ, and they pretty consistently do not work. So if there is a cluster of genetic studies establishing that Aβ-related mutations are causal for dementia, then the immediate question is how that squares with all the evidence against causality of Aβ for dementia. If the early-onset autosomal dominant version of the disease is in fact a different disease, that would answer the question, but you apparently think otherwise, so I'm curious to hear your case.
6AβMale
In brief, the main reason I don't think the argument works that autosomal-dominant Alzheimer's has a different etiology than sporadic Alzheimer's is that they look, in so many respects, like essentially the same disease, with the same sequence of biomarkers and clinical symptoms:

  1. Amyloid pathology starts in the default mode network, and gradually spreads throughout the brain over 15-20 years.
  2. It eventually reaches the medial temporal region, where Primary Age-Related Tauopathy is lying in wait.
  3. At this point, tau pathology, a prion-like pathology which in Alzheimer's has a very specific conformation, starts spreading from there. The tau protein misfolds in the exact same way in both forms of the disease (Falcon et al (2018). Tau filaments from multiple cases of sporadic and inherited Alzheimer’s disease adopt a common fold), however it misfolds in a different way in the large majority of other known tau pathologies, of which there are a dozen or so (Shi et al (2021). Structure-based classification of tauopathies).
  4. Then, neurodegeneration follows in lockstep throughout the brain with the presence of tau pathology, with cognitive deficits matching those expected from the affected brain regions. In particular, since the hippocampal formation is located in the medial temporal region, anterograde amnesia is typically the first symptom in both types of Alzheimer's (unlike many other forms of neurodegeneration, in which other clinical symptoms dominate in the early stages).

It's as if two bank robberies occurred two hours apart in the same town, conducted in almost exactly the same manner, and in one we can positively ID the culprit on camera. It's a reasonable conclusion that the culprit in the other case is the same.

Some further evidence:

  • There has been extensive causal mediation modeling, e.g. Hanseeuw et al (2019). Association of Amyloid and Tau With Cognition in Preclinical Alzheimer Disease, which so far as I'm aware always fits the amyloid →
2Hyperion
I happened to be reading this post today, as Science has just published a story on a fabrication scandal regarding an influential paper on amyloid-β: https://www.science.org/content/article/potential-fabrication-research-images-threatens-key-theory-alzheimers-disease I was wondering if this scandal changes the picture you described at all?
3AβMale
Not a ton. I'd also recommend this article, including the discussion in the comments by researchers in the field. A crucial distinction I'd emphasize which is almost always lost in popular discussions is that between the toxic amyloid oligomer hypothesis, that aggregates of amyloid beta are the main direct cause of neurodegeneration; and the ATN hypothesis I described in this thread, that amyloid pathology causes tau pathology and tau pathology causes neurodegeneration. The former is mainly what this research concerns and has been largely discredited in my opinion since approximately 2012; the latter has a mountain of evidence in favor as I've described, and that hasn't really changed now that it's turned out that one line of evidence for an importantly different hypothesis was fabricated.
2johnswentworth
Thanks, that was helpful!
4AβMale
Update today: Biogen/Eisai have reported results from Lecanemab’s phase 3 trial: a slowing of cognitive decline by 27% with a p-value of 0.00005 on the primary endpoint. All other secondary endpoints, including cognitive ones, passed with p-values under 0.01.
3AβMale
Note I've edited the third-to-last paragraph in the above to remove an overly-strong claim about the four antibodies I didn't discuss in detail.
3Ben Pace
In general corrections are good contributions, thanks for your object-level points.

After this comment there was a long thread about AC efficiency.

Summarizing:

  • I said: "In practice I think the actual efficiency loss relative to a 2-hose unit is more like 25-30%" (For cooling from 85 to 70.)
  • John said that this was ridiculous.
  • After the dust settled, our best estimate on paper is 40% rather than 25-30%.

The reasons for the adjustments were roughly:

  • [x2] I estimated exhaust temperature at 130 degrees, but it's more like 100 degrees if the indoor air is 70.
  • [x1/2] I thought that all depressurization was compensated for by increased infiltration. But probably half of depressurization is offset by reduced exfiltration instead (see here)
  • [x3/2] I only considered sensible heat. But actually humidity is a huge deal, because the exhaust is heated but not humidified (see here)
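As a rough check on the arithmetic, the three bracketed factors can be applied to the original 25-30% estimate. This is only a sketch: it assumes the factors simply multiply (which is how the "[x2]", "[x1/2]", "[x3/2]" notation reads), and the variable and function names are made up for illustration.

```python
# Correction factors as listed in the summary above.
# Assumption: they combine multiplicatively.
initial_estimate = (0.25, 0.30)  # original claimed efficiency loss range
factors = {
    "exhaust is ~100 F rather than 130 F": 2.0,
    "half of depressurization offset by reduced exfiltration": 0.5,
    "latent heat (humidity) on top of sensible heat": 1.5,
}


def adjusted_loss(loss, factors):
    """Apply each correction factor in turn to a fractional loss."""
    for factor in factors.values():
        loss *= factor
    return loss


low, high = (adjusted_loss(x, factors) for x in initial_estimate)
print(f"adjusted loss: {low:.1%} to {high:.1%}")
```

Under that assumption the range comes out at roughly 37.5% to 45%, which is consistent with the "40%" best estimate on paper.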

John also attempted to measure the loss empirically, but I'd summarize as "too hard to measure":

  • With 1-hose the indoor temp was 68 vs 88 outside, while with 2-hose the indoor temp was 66 vs 88 outside (using the same amount of energy).
  • We both agree that 10% is an underestimate for the efficiency loss (e.g. due to room insulation, other cooling in the building, and the improvised 2-hose setup).
  • I don't think we
... (read more)
6johnswentworth
I endorse this summary.
6johnswentworth
On the physics: to be clear, I'm not saying the air conditioner does not work at all. It does make the room cooler than it started, at equilibrium. I also am not surprised (in this particular example) to hear that various expert sources already account for the inefficiency in their evaluations; it is a problem which should be very obvious to experts. Of course that doesn't apply so well to e.g. the example of medical research replication failures. The air conditioner example is not meant to be an example of something which is really hard to notice for humanity as a whole; it's meant to be an example of something which is too hard for a typical consumer to notice, and we should extrapolate from there to the existence of things which people with more expertise will also not notice (e.g. the medical research example). Also, it's a case-in-point that experts noticing a problem with some product is not enough to remove the economic incentive to produce the product. When the argument specifically includes reasons to expect people to not notice the problem, it seems obviously correct to discount reported experiences. Of course there are still ways to gain evidence from reported experience - e.g. if someone specifically said "this unit cooled even the far corners of the house", then that would partially falsify our theory for why people will overlook the one-hose problem. But we should not blindly trust reports when we have reasons to expect those reports to overlook problems. In this particular case, I indeed do not think the conflict is worth the cost of exploring - it seems glaringly obvious that people are buying a bad product because they are unable to recognize the ways in which it is bad. Positive reports do not contradict this; there is not a conflict here. The model already predicts that there will be positive reports - after all, the air conditioner is very convenient and pumps lots of cool air out the front in very obvious ways.

In this particular case, I indeed do not think the conflict is worth the cost of exploring - it seems glaringly obvious that people are buying a bad product because they are unable to recognize the ways in which it is bad.

The wirecutter recommendation for budget portable ACs is a single-hose model. Until very recently their overall recommendation was also a single-hose model.

The wirecutter recommendations (and other pages discussing these tradeoffs) are based on a combination of "how cold does it make the room empirically?" and quantitative estimates of cooling that take into account infiltration. This issue is discussed extensively, with quantitative detail, by people who quite often end up recommending 1-hose designs for small rooms (like the one this AC is advertised for).

One AC unit tested by the wirecutter is convertible between 2-hose and 1-hose. They write:

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a re

... (read more)
habrykaΩ6120

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a real difference

After having looked into this quite a bit, it does really seem like the Wirecutter testing process had no ability to notice infiltration issues, so it seems like the Wirecutter crew themselves are kind of confused here?

The... Wirecutter article also does not seem to discuss the issue of infiltration of hot air in any reasonable way. Instead it just says that:

This produces a slight vacuum effect, which pulls in “infiltration air” from anywhere it can in order to equalize the pressure. In the presence of a gas-powered device such as a furnace, that negative pressure creates a backdraft or downdraft, which can cause the machine to malfunction—or worse, fill the room with gas fumes and carbon monoxide. We don’t think that most people plan to use their portable AC in such a room, but if your home is set up in such a way that you’re concerned ab

... (read more)
4paulfchristiano
They measure the temperature in the room, which captures the effect of negative pressure pulling in hot air from the rest of the building. It underestimates the costs if the rest of the building is significantly cooler than the outside (I'd guess by the ballpark of 20-30% in the extreme case where you care equally about all spaces in the building, the rest of your building is kept at the same temp as the room you are cooling, and a negligible fraction of air exchange with the outside is via the room you are cooling). I think that paragraph is discussing a second reason that infiltration is bad.
4habryka
Yeah, sorry, I didn't mean to imply the section is saying something totally wrong. The section just makes it sound like that is the only concern with infiltration, which seems wrong, and my current model of the author of the post is that they weren't actually thinking through heat-related infiltration issues (though it's hard to say from just this one paragraph, of course). 
6johnswentworth
I roll to disbelieve. I think it is much more likely that something is wrong with their test setup than that the difference between one-hose and two-hose is negligible. Just on priors, the most obvious problem is that they're testing somewhere which isn't hot outside the room - either because they're inside a larger air-conditioned building, or because it's not hot outdoors. Can we check that? Well, they apparently tested it in April 2022, i.e. nowish, which is indeed not hot most places in the US, but can we narrow down the location more? The photo is by Michael Hession, who apparently operates near Boston. Daily high temps currently in the 50's to 60's (Fahrenheit). So yeah, definitely not hot there. Now, if they're measuring temperature delta compared to the outdoors, it could still be a valid test. On the other hand, if it's only in the 50's to 60's outside, I very much doubt that they're trying to really get a big temperature delta from that air conditioner - they'd have to get the room down below freezing in order to get the same temperature delta as a 70 degree room on a 100 degree day. If they're only trying to get a tiny temperature delta, then it really doesn't matter how efficient the unit is. For someone trying to keep a room at 70 on a 100 degree day, it's going to matter a lot more. So basically, I am not buying this test setup. It does not look like it is actually representative of real usage, and it looks nonrepresentative in basically the ways we'd expect from a test that found little difference between one and two hoses. Generalizable lesson/heuristic: the supposed "experts" are also not even remotely trustworthy. (Also, I expect it to seem like I am refusing to update in the face of any evidence, so I'd like to highlight that this model correctly predicted that the tests were run someplace where it was not hot outside. Had that evidence come out different, I'd be much more convinced right now that one hose vs two doesn't really matter.)

(Also, I expect it to seem like I am refusing to update in the face of any evidence, so I'd like to highlight that this model correctly predicted that the tests were run someplace where it was not hot outside. Had that evidence come out different, I'd be much more convinced right now that one hose vs two doesn't really matter.)

From how we tested:

Over the course of a sweltering summer week in Boston, we set up our five finalists in a roughly 250-square-foot space, taking notes and rating each model on the basic setup process, performance, portability, accessories, and overall user experience.

ETA: it's not clear that's the same testing setup used in the other tests they described. But they do talk about how the 1-vs-2 convertible unit "struggled to make the room any cooler than 70 degrees" which sounds like it was probably reasonably hot.

6johnswentworth
Alright, I am more convinced than I was about the temperature issue, but the test setup still sounds pretty bad. First, Boston does not usually get all that sweltering. I grew up in Connecticut (close to Boston and similar weather), summer days usually peaked in the low 80's. Even if they waited for a really hot week, it was probably in the 90's. A quick google search confirms this: typical July daily high temp is 82, and google says "Overall during July, you should expect about 4-6 days to reach or exceed 90 F (32C) while the all-time record high for Boston was 103 F (39.4C)". It's still a way better test than April (so I'm updating from that), but probably well short of keeping a room at 70 on a 100 degree day. I'm guessing they only had about half that temperature delta. Second, their actual test procedure (thank you for finding that, BTW): Three feet and six feet away? That sure does sound like they're measuring the temperature right near the unit, rather than the other side of the room where we'd expect infiltration to matter. I had previously assumed they were at least measuring the other side of the room (because they mention for the two-hose recommendation "In our tests, it was also remarkably effective at distributing the cool air, never leaving more than a 1-degree temperature difference across the room"), but apparently "across the room" actually meant "6 feet away" based on this later quote: ... which sure does sound more like what we'd expect. So I'm updating away from "it was just not hot outside" - probably a minor issue, but not a major one. That said, it sure does sound like they were not measuring temperature across the room, and even just between 3 and 6 feet away the two-hose model apparently had noticeably less drop-off in effectiveness.
5paulfchristiano
Boston summers are hotter than the average summers in the US, and I'd guess are well above the average use case for an AC in the US. I agree having two hoses is more important the larger the temperature difference, and by the time you are cooling from 100 to 70 the difference is fairly large (though there is basically nowhere in the US where that difference is close to typical). I'd be fine with a summary of "For users who care about temp in the whole house rather than just the room with the AC, one-hose units are maybe 20% less efficient than they feel. Because this factor is harder to measure than price or the convenience of setting up a one-hose unit, consumers don't give it the attention it deserves. As a result, manufacturers don't make as many cheap two-hose units as they should."
Ben PaceΩ9242

Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.

Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room with temperature sensors more like 15 feet away in a city or country that's very hot outside, and to compare this with (say) Wirecutter's top pick with two-hoses. Confirm?

... I actually already started a post titled "Preregistration: Air Conditioner Test (for AI Alignment!)". My plan was to use the one-hose AC I bought a few years ago during that heat wave, rig up a cardboard "second hose" for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day. Maybe we can have an air conditioner test party.

Predictions: the claim which I most do not believe right now is that going from one hose to two hose with the same air conditioner makes only a 20%-30% difference. The main metric I'm interested in is equilibrium difference between average room temp and outdoor temp (because that was the main metric relevant when I was using that AC during the heat wave). I'm at about 80% chance that the difference will be over 50%.

(Back-of-the-envelope math a few years ago said it should be roughly a factor-of-two difference, and my median expectation is close to that.)

I also expect (though less strongly) that, assuming the room's doors and windows are closed, corners of the room opposite the AC in single-hose mode will be closer to outdoor temp than to the temp 3 ft away from the AC, and that this will not be the case ... (read more)

6paulfchristiano
I would have thought that the efficiency lost is roughly (outside temp - inside temp) / (exhaust temp - inside temp). And my guess was that exhaust temp is ~130. I think the main way the effect could be as big as you are saying is if that model is wrong or if the exhaust is a lot cooler than I think. Those both seem plausible; I don't understand how AC works, so don't trust that calculation too much. I'm curious what your BOTEC was / if you think 130 is too high an estimate for the exhaust temp?  If that calculation is right, and exhaust is at 130, outside is 100, and house is 70, you'd have 50% loss. But you can't get 50% in your setup this way, since your 2-hose AC definitely isn't going to get the temp below 65 or so. Maybe most plausible 50% scenario would be something like 115 exhaust, 100 outside, 85 inside with single-hose, 70 inside with double-hose. I doubt you'll see effects that big. I also expect the improvised double hose will have big efficiency losses. I think that 20% is probably the right ballpark (e.g. 130/95/85/82). If it's >50% I think my story above is called into question. (Though note that the efficiency lost from one hose is significantly larger than the bottom line "how much does people's intuitive sense of single-hose AC quality overstate the real efficacy?") Your AC could also be unusual. My guess is that it just wasn't close to being able to cool your old apartment and that single vs double-hoses was a relatively small part of that, in which case we'd still see small efficiency wins in this experiment. But it's conceivable that it is unreasonably bad in part because it has an unreasonably low exhaust temp, in which case we might see an unreasonably large benefit from a second hose (though I'd discard that concern if it either had similarly good Amazon reviews or a reasonable quoted SACC).
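The formula above is easy to tabulate for the scenarios mentioned. The sketch below assumes the stated model (loss = (outside - inside) / (exhaust - inside)) holds exactly; the function name is made up for illustration, and the "130/95/85/82" shorthand is read here as exhaust / outside / one-hose indoor / two-hose indoor, which is an interpretation rather than something the comment spells out.

```python
def one_hose_loss(exhaust, outside, inside):
    """Fraction of cooling lost to infiltration under the simple model:
    (outside - inside) / (exhaust - inside). Temperatures in degrees F."""
    if exhaust <= inside:
        raise ValueError("exhaust must be hotter than the indoor air")
    return (outside - inside) / (exhaust - inside)


# 130 exhaust, 100 outside, 70 inside
print(one_hose_loss(130, 100, 70))  # 0.5
# 115 exhaust, 100 outside, 85 inside (the "most plausible 50% scenario")
print(one_hose_loss(115, 100, 85))  # 0.5
# 130 exhaust, 95 outside, 85 inside (the "20% ballpark" case)
print(round(one_hose_loss(130, 95, 85), 2))  # 0.22
```

Note how sensitive the result is to the exhaust temperature: the same 15-degree gap between outdoors and indoors gives a 50% loss at 115 exhaust but about a third at 130.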
4johnswentworth
I don't remember what calculation I did then, but here's one with the same result. Model the single-hose air conditioner as removing air from the room, and replacing with a mix of air at two temperatures: TC (the temperature of cold air coming from the air conditioner), and TH (the temperature outdoors). If we assume that TC is constant and that the cold and hot air are introduced in roughly 1:1 proportions (i.e. the flow rate from the exhaust is roughly equal to the flow rate from the cooling outlet), then we should end up with an equilibrium average temperature of (TC + TH)/2. If we model the switch to two-hose as just turning off the stream of hot air, then the equilibrium average temperature should drop to TC. Some notes on this:

  • It's talking about equilibrium temperature rather than power efficiency, because equilibrium temperature on a hot day was mostly what I cared about when using the air conditioner.
  • The assumption of roughly-equal flow rates seems to be at least the right order of magnitude based on seeing this air conditioner in operation, though I haven't measured carefully. If anything, it seemed like the exhaust had higher throughput.
  • The assumption of constant TC is probably the most suspect part.
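A minimal sketch of this mixing model, to make the factor of two explicit. It takes the comment's assumptions at face value (constant cold-outlet temperature, replacement air mixed in a fixed proportion); the function names, the `hot_fraction` parameter and the 55-degree outlet temperature are hypothetical, chosen only for illustration.

```python
def one_hose_equilibrium(t_cold, t_outdoor, hot_fraction=0.5):
    """Equilibrium room temperature when replacement air is a mix of
    AC output (t_cold) and infiltrating outdoor air (t_outdoor).
    hot_fraction=0.5 is the 1:1 flow assumption from the comment."""
    return (1 - hot_fraction) * t_cold + hot_fraction * t_outdoor


def two_hose_equilibrium(t_cold):
    """With the hot infiltration stream turned off, the room tends to t_cold."""
    return t_cold


t_cold, t_outdoor = 55, 100  # hypothetical example values, degrees F
one = one_hose_equilibrium(t_cold, t_outdoor)
two = two_hose_equilibrium(t_cold)
print(one, two)  # 77.5 55
print((t_outdoor - two) / (t_outdoor - one))  # 2.0
```

Under these assumptions the indoor/outdoor temperature difference doubles with the second hose regardless of the particular numbers, which is the "roughly a factor of two" result.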
7paulfchristiano
Ok, I think that ~50% estimate is probably wrong. Happy to bet about outcome (though I think someone with working knowledge of air conditioners will also be able to confirm). I'd bet that efficiency and Delta t will be linearly related and will both be reduced by a factor of about (exhaust - outdoor) / (exhaust - indoor) which will be much more than 50%.
8johnswentworth
I assume you mean much less than 50%, i.e. (T_outside - T_inside) averaged over the room will be less than 50% greater with two hoses than with one? I'm open to such a bet in principle, pending operational details. $1k at even odds? Operationally, I'm picturing the general plan I sketched four comments upthread. (In particular note the three bulleted conditions starting with "The day being hot enough and the room large enough that the AC runs continuously..."; I'd consider it a null result if one of those conditions fails.) LMK if other conditions should be included. Also, you're welcome to come to the Air Conditioner Testing Party (on some hot day TBD). There's a pool at the apartment complex, could swim a bit while the room equilibrates.

I studied the impact of infiltration because of clothes dryers when I was doing energy efficiency consulting. The nonobvious thing that is missing from this discussion is that the infiltration flow rate does not equal the flow rate of the hot air out the window. Basically absent the exhaust flow, there is an equilibrium of infiltration through the cracks in the building equaling the exfiltration through the cracks in the building. When you have a depressurization, this increases the infiltration but also decreases the exfiltration. If the exhaust flow is a small fraction of the initial infiltration, the net impact on infiltration is approximately half as much as the exhaust flow. The rule of thumb for infiltration is it produces about 0.3 air changes per hour, but it depends on the temperature difference to the outside and the wind (and the leakiness of the building). I would guess that if you did this in a house, the exhaust flow would be relatively small compared to the natural infiltration. So roughly the impact due to the infiltration is about half as much as the calculations indicate. But if you were in a tiny tight house, then the exhaust flow would overwhelm the natural infi... (read more)
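One way to see where the "about half" comes from is a toy flow balance. This is only a sketch under assumptions that are not in the comment: leakage is treated as symmetric and linear, so a small depressurization adds half the exhaust flow to infiltration and removes half from exfiltration, until exfiltration hits zero. The function name and units are made up for illustration.

```python
def flows_with_exhaust(natural_infiltration, exhaust):
    """Return (infiltration, exfiltration) once an exhaust flow is added.

    Assumption (hypothetical, for illustration): symmetric linear leakage,
    so exfiltration drops by exhaust/2 until it reaches zero.
    Mass balance always holds: infiltration = exfiltration + exhaust.
    """
    exfiltration = max(natural_infiltration - exhaust / 2, 0.0)
    infiltration = exfiltration + exhaust
    return infiltration, exfiltration


# Leaky house: exhaust is small relative to natural infiltration.
print(flows_with_exhaust(100, 20))  # (110.0, 90.0): extra infiltration is half the exhaust

# Tiny tight house: exhaust overwhelms natural infiltration.
print(flows_with_exhaust(10, 100))  # (100.0, 0.0): nearly all exhaust is made up by infiltration
```

In the first case infiltration rises by only 10 for an exhaust of 20; in the second, exfiltration is already zero, so essentially the full exhaust flow shows up as extra infiltration.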

5paulfchristiano
Thanks! It's amusing that we had this whole discussion and the one commenter who knew what they were talking about got just one upvote :) It sounds very plausible that exhaust is small relative to natural infiltration and I believe you that (extra infiltration) = 50% (exhaust). In the other direction, it looks like I was wrong about 130 degrees and we're looking at more like 100 (alas, googling random forum comments is an imperfect methodology, though I do feel it's plausible that John's AC has unusually cold exhaust). If the building is ending up around 70, that means I'm underestimating the exhaust quantity by about 2x. But then apparently the extra infiltration is only about half of the exhaust. So sounds like the errors cancel out and my initial estimate happens to be roughly right?
3ADifferentAnonymous
Tc does seem like a bad assumption. I tried instead assuming a constant difference between the intake and the cold output, and the result surprised me. (The rest of this comment assumes this model holds exactly, which it definitely doesn't). Let Tr be the temperature of the room (also intake temperature for a one-hose model). Then at equilibrium,

Tr = (Tc + Th)/2
Tr = ((Tr − Δ) + Th)/2
2Tr = Tr + Th − Δ
Tr = Th − Δ

i.e. no loss in cooling power at all! (Energy efficiency and time to reach equilibrium would probably be much worse, though) In the case of an underpowered (Δ=15) one-hose unit handling a heat wave (Th=100), you'd get Tr=85 and Tc=70—nice and cool in front of the unit but uncomfortably hot in the rest of the room, just as you observed. Adding a second hose would resolve this disparity in the wrong direction, making Tr=Tc=85. So if you disproportionately care about the area directly in front of the AC, adding the second hose could be actively harmful.
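The algebra can be checked numerically by iterating the mixing step until it settles. This is a sketch of the constant-Δ, 1:1-mixing model exactly as stated in the comment; the function name and the number of iterations are arbitrary choices for illustration.

```python
def one_hose_room_temp(t_outdoor, delta, steps=60):
    """Iterate Tr <- ((Tr - delta) + Th) / 2 starting from outdoor temperature.

    Models a one-hose unit whose cold output is always `delta` degrees below
    its intake, with cold air and infiltrating outdoor air mixed 1:1.
    """
    t_room = t_outdoor
    for _ in range(steps):
        t_cold = t_room - delta
        t_room = (t_cold + t_outdoor) / 2
    return t_room


t_room = one_hose_room_temp(t_outdoor=100, delta=15)
print(round(t_room, 6), round(t_room - 15, 6))  # 85.0 70.0
```

The iteration converges to Th − Δ, matching the closed form: the room sits at 85 while the air directly out of the unit is 70.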
4Raemon
Also, like, Berkeley heat waves may just be significantly different than, like, Reno heat waves. My current read is that part of the issue here is that a lot of places don't actually get that hot so having less robustly good air conditioners is fine.
2johnswentworth
I bought my single-hose AC for the 2019 heat wave in Mountain View (which was presumably basically similar to Berkeley). When I was in Vegas, summer was just three months of permanent extreme heat during the day; one does not stay somewhere without built-in AC in Vegas.
2paulfchristiano
I think labeling requirements are based on the expectation of cooling from 95 to 80 (and I expect typical use cases for portable AC are more like that). Actually hot places will usually have central air or window units.
2Ben Pace
Sweet! I could also perform a replication I guess.
2johnswentworth
Or you could get to it before I do and I could perform a replication.
5habryka
It is important to note that the current top wirecutter pick is a 2-hose unit, though one that combined the two hoses into one big hose. I guess maybe that is recent, but it does seem important to acknowledge here (and it wouldn't surprise me that much if Wirecutter went through reasoning pretty similar to the one in this post, and then updated towards the two-hose unit because of concerns about infiltration and looking at more comprehensive metrics like SACC). 

Here is the wirecutter discussion of the distinction for reference:

Starting in 2019, we began comparing dual- and single-hose models according to the same criteria, and we didn’t dismiss any models based on their hose count. Our research, however, ultimately steered us toward single-hose portable models—in part because so many newer models use this design. In fact, we found no compelling new double-hose models from major manufacturers in 2019 or 2020 (although a few new ones cropped up in 2021, including our new top pick). Owner reviews indicate that most people prefer single-hose models, too, since they’re easier to set up and don’t look quite as much like a giant octopus trash sculpture. Although our testing has shown that dual-hose models tend to outperform some single-hose units in extremely hot or muggy weather, the difference is usually minimal, and we don’t think it outweighs the convenience of a single hose.

The one major exception, however, is if you plan on setting up your portable AC in a room with a furnace or hot water heater or anything else that uses combustion. When a single-hose AC model forces air out through its exhaust hose, it can create negative pressure in the

... (read more)
3DirectedEvolution
A/Cs primarily work by using electricity to drive a pressure differential between the cool, low-pressure indoor refrigerant and the hot, high-pressure outdoor refrigerant. It's not just moving air around. PV = nRT! Here's a video explainer. Read carefully, the post doesn't ignore the effect of the evaporator and condenser... ... But it is written in such a way that the reader might come away with the impression that the single-hose A/C has zero net effect on the household temperature. Even the edited-in caveat makes it sound like it might be cooling off the room in which it's located, at the expense of heating up the rest of the house. This reading is reinforced by using the A/C as an analogy for a truly zero-value or destructive AI: We'd need to imagine an A/C that does nothing to net temperature, or that actively heats up the house on net for this analogy to work. Given that I expect more readers here will know about this hypothesis than about the practical details of how an A/C works, I worry they're more likely to see AI as a metaphor for this A/C than this A/C as a metaphor for AI! Note also that regulation could totally fix this particular problem. We could ban single-hose A/Cs; there's a whole nation of HVAC experts who could convey this information, and they're licensed in the USA, so there's already a legal framework for identifying the relevant experts. Waiting also might fix the problem, especially if these people have metered electricity. It's easily possible that they'll notice their high summer electric bill, consider efficiency improvements, look into the A/C, do 10 seconds of research, and invest in the two-hose unit the next time around. When discussing AI, it seems valuable to distinguish more clearly between three scenarios: * Individual AI products truly analogous to an A/C. They are specific services, which can indeed be more or less efficient, and can be chosen badly by ill-informed consumers. We might handle these in a similar way to h
Shmi530

To me this is a metaphor for Alignment research, and LW-style rationality in general, but with an opposite message.

To start, I have this exact AC in my window, and it made a huge difference during last year's heat dome. (I will use metric units in the following, because eff imperial units.) It was around 39-40C last summer, some 15C above average, for a few days, and the A/C cooled the place down by about 10C, which made a difference between livable and unlivable. It was cooler all through the place, not just in the immediate vicinity of the unit. 

How could this happen, in an apparent contradiction to the laws of physics?

Well, three things: 

  • I live in an apartment, so the air coming in is not quite as hot in the hallway as outside, though still pretty warm.
  • The air coming out of the AC exhaust is pretty hot, hotter than the outside most of the time, so there is a definite cooling that happens despite the air influx from outside.
  • The differential air pressure in the hallway is positive regardless of the AC (partly because of the exhaust vents that are always on), so adding AC does not significantly change the air flow.

So, physics is safe! What isn't safe is the theoretical re... (read more)

Ok, I want to say thank you for this comment because it contains a lot of points I strongly agree with. I think the alignment community needs experimental data now more than it needs more theory. However, I don't think this lowers my opinion of MIRI. MIRI, and Eliezer before MIRI even existed yet, was predicting this problem accurately and convincingly enough that people like myself updated. 15 years ago I began studying neuroscience, neuromorphic computing, and machine learning because I believed this was going to become a much bigger deal than it was then. Now the general gist of the message has absolutely been proven out. Machine learning is now a big impressive thing in the world, and scary outcomes are right around the corner. Forecasting that now doesn't win you nearly as many points as forecasting that 15 or 20 years ago. Now we are finally close enough that it makes sense to move from theorizing to experimentation. That doesn't mean the theorizing was useless. It laid an incredible amount of valuable groundwork. It gave the experimental researchers a server of what they are up against. Laid out the scope of the problem, and made helpful pointers towards important characteri... (read more)

3Shmi
Hmm, I agree that Eliezer, MIRI and its precursors did a lot of good work raising the profile of this particular x-risk. However, I am less certain of their theoretical contributions, which you describe as  I guess they did highlight a lot of dead ends, gotta agree with that. I am not sure how much the larger AI/ML community values their theoretical work. Maybe the practitioners haven't caught up yet. Well, whatever the fraction, it certainly seems like it's time to rebalance it, I agree. I don't know if MIRI has the know-how to do experimental work at the level of the rapidly advancing field.
3[anonymous]
I mostly agree that relying on real world data is necessary for better understanding our messy world and that in most cases this approach is favorable. There's a part of me that thinks AI is a different case though, since getting it even slightly wrong will be catastrophic. Experimental alignment research might get us most of the way to aligned AI, but there will probably still be issues that aren't noticeable because the AIs we are experimenting on won't be powerful enough to reveal them. Our solution to the alignment problem can't be something imperfect that does the job well enough. Instead it has to be something that can withstand immense optimization pressure. My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
2Shmi
I agree that, given MIRI's model of AGI emergence, getting it slightly wrong would be catastrophic. But that's my whole point: experimenting early is strictly better than not, because it reduces the odds of getting something big wrong, as opposed to something small along the way. I had mentioned in another post that https://www.lesswrong.com/posts/mc2vroppqHsFLDEjh/aligned-ai-needs-slack so that there are no "immense optimization pressures". I think that's what Eliezer says, as well, hence his pessimism and focus on "dying with dignity". But we won't know if this intuition is correct without actually testing it experimentally and repeatedly. It might not help because "there is no fire alarm for superintelligence", but the alternative is strictly worse, because the problem is so complex.
2leogao
This is fine for other fields, but the problem with superintelligent alignment is that the "things" in "move fast and break things" are, like, us. We only have one chance to get superhuman alignment right, which is why we have to design it carefully once. Misaligned systems will try to turn you off to make sure you can't turn them off. I think Eliezer has even said that if we could simply revert the universe after each time we mess up alignment, he would be way less pessimistic. Further, experiments with systems that are not capable enough to kill us yet can provide us with valuable information, but the problem is that much of the difficulty comes up around superintelligence levels. Things are going to break in weird ways that we couldn't have anticipated just by extrapolating out the trends from current systems. So if we just wait for evidence that less robust solutions are not enough, then we will see that less robust solutions seem to work really well on weak current models, pat ourselves on the back for figuring out that alignment actually wouldn't be that hard in X area, and then, as we approach superintelligence, start noticing X break down (if we're lucky! if X is something like deception, it will try to hide from us and actively avoid being caught by our interpretability tools or whatever). At that point it will be too late to try to fix the problem, because none of the technical or political solutions are viable on a very short time horizon. Again, to be very clear, I'm not arguing that there is no use at all for empirical experiments today; it's just that there are specific failure cases that are easy to fall into whenever you try to conclude something of the form "and therefore this is some amount of evidence that superintelligence will be more/less likely to be Y and Z".
2Shmi
I agree that we cannot be cavalier about it, but not experimenting is strictly worse than experimenting (not at the expense of theoretical work), because humans are bad at pure theory.
1Alexander Gietelink Oldenziel
The statement 'humans are bad at pure theory' seems to be clearly falsified by the extraordinary theoretical advances of the past, e.g. Einstein. Whether theoretical or experimental approaches will prove most successful for AI alignment is an open question.
2Shmi
It is actually confirmed by this particular case. Special Relativity took some 50 years to form after Maxwell's equations were written. General relativity took 500 years to be written down after Galileo's experiment with equal acceleration of falling bodies. AND it took a once-in-a-millennium genius to do that. (Twice, actually; Newton was the other one in physics.)
1Alexander Gietelink Oldenziel
This doesn't look like a serious reply. I fail to see how the achievements of Newton, Maxwell, Einstein do not illustrate the power of theory.
3Shmi
I have nothing to add to my previous message, other than 500 years to come up with a theory is a long time.
2johnswentworth
Just added a clarification to the post: I do believe the analysis in the post is in fact correct, and the success of this air conditioner is primarily due to consumers not recognizing the problem. Would you have spent an extra, say, $30 on a two-hose air conditioner if you had noticed the issue in advance? (BTW, I also bought a one-hose air conditioner off amazon a few years back, which is where this example came from. When I realized there was only one hose, I was absolutely flabbergasted that anyone would even bother to build such an obviously stupid thing. And it indeed did a pretty shitty job cooling my apartment!)
5Shmi
Looks like you did the usual iterative approach: bought an AC, saw that it doesn't work as expected, did the analysis, figured out what is wrong, and corrected your model of what works in your situation, then bought a better AC.
3philh
I read John as saying steps two and three here were reversed. He bought an AC, realized before trying it that it wouldn't work, then tested and saw that (as expected) it didn't work.
2johnswentworth
That's true! When I opened the box, I first dug around looking for the second hose. Then I thought they must have made a mistake and not sent the second hose. Then eventually I noticed that the AC only had one hose-slot, and the pictures only had one hose, and I was just very confused as to why on earth someone would build a portable air conditioner with only one hose.

Um, the single-hose air conditioners do in fact work passably, probably because they're designed to minimize the volume of air exhausted compared to the amount circulated. The air you're blowing out is way hotter than the air you're drawing in. This makes the heat pump work harder, but it reduces the air exchange problem.
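
To put rough numbers on the exhaust-volume point, here is a minimal sketch. It assumes dry air with constant density and specific heat, and a fixed amount of heat to reject; the 4000 W figure and the temperatures are illustrative assumptions, not measurements from any real unit. Under those assumptions, the hotter the exhaust, the less room air has to be blown out the window to carry the same heat.

```python
# Rough sketch: how much air must be exhausted to reject a given heat load.
# All constants are approximate dry-air values; all inputs are hypothetical.

AIR_DENSITY_KG_M3 = 1.2    # approximate density of room-temperature air
AIR_CP_J_PER_KG_K = 1005.0  # approximate specific heat of dry air


def exhaust_flow_m3_s(heat_rejected_w, t_exhaust_c, t_room_c):
    """Volume flow of air needed to carry heat_rejected_w out of the room."""
    delta_t = t_exhaust_c - t_room_c
    if delta_t <= 0:
        raise ValueError("exhaust must be hotter than room air")
    return heat_rejected_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t)


if __name__ == "__main__":
    # Same heat load, two different exhaust temperatures (illustrative values).
    for t_exhaust in (45.0, 65.0):
        flow = exhaust_flow_m3_s(4000.0, t_exhaust, 25.0)
        print(f"exhaust at {t_exhaust:.0f} C -> {flow:.3f} m^3/s blown outside")
```

Doubling the temperature rise of the exhaust halves the volume of air that has to be replaced from outside, which is the trade-off described above. The sketch ignores humidity entirely.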

And a lot of structures already have huge amounts of air exchange going on anyhow. And, by the way, a lot of uncooled structures actually do run hotter on the inside than the temperature of the environment, so the air you're drawing in may not be all that hot depending on where it's coming from and when you run the machine.

And the market has noticed that the single hose design is inefficient, which is why there are two-hose ones available. In fact, if I were writing a review, I probably wouldn't bother to mention the matter because I'd assume everybody already knew about the issue. That's even though I do in fact buy two-hose models for exactly the reasons you describe.

Perhaps people are dumb, but they are not as dumb as you are making them out to be. And I think I have to add that an awful lot of "rationalists" are very fond of talking about how everything is stupid, without in fact having studied the matters in question closely enough to really be allowed opinions...

The fact that you chose to use your superior knowledge to buy the much better air conditioner, while also choosing to not leave a review explaining this, is an illustration of OP's point, and not a refutation.

8jbash
If I needed a more compact unit, I might buy a one-hoser. If I had a more limited budget (and didn't expect to run the thing all the time) I might buy a one-hoser. If I had very limited space in which to run the hose, I might buy a one-hoser. Absolute maximum efficiency may just not be at the top of most people's lists. ... and you don't really have much of a basis to say that I do have "superior knowledge" compared to most other potential buyers. Looking at the reviews doesn't really get you there, since you don't know that clue is independent of tendency to write reviews, and many of the reviews are probably fake.
5gwern
You didn't say any of that before. And you didn't show that any of that justified the Amazon reviews either, as is necessary to refute OP, and you will have a hard time doing so given that none of the reviews explain the disadvantage those advantages may or may not offset. OK, so let's say you don't have any superior knowledge when you state things about air conditioners. (I am willing to agree that no one should believe your claims if you want to claim that.) Then why do you believe all the things you said about single-hose air conditioners working, or about the air exchange, or about what reviewers (and non-reviewers) do or do not know, especially when, as you agree with OP, none of the reviews mention this? This again goes to illustrate OP and not refute it.
9jbash
Yes, that's true, I didn't go into detail on irrelevant side issues, or take up space saying things that a reasonable reader would have assumed anyhow. That's for much the same reason that I wouldn't waste people's time with Amazon reviews saying nothing but "Well, ACKCHUALLY, two-hose air conditioners are MUCH more efficient (you ignorant plebs).". The most reasonable hypothesis is that people know their own priorities better than I do, so I shouldn't make an ass of myself. Also, of course, even if they're wrong, I'm unlikely to persuade them, but that's a separate matter. You might want to think about why you did not just naturally assume those things. I have adequate knowledge. I am willing to assume that most other buyers of air conditioners also have, or will independently seek out, adequate knowledge. Adequate knowledge includes everything I said. I was claiming that OP was pontificating based on inadequate knowledge. Less than my own and very possibly less than that of the average Amazon air conditioner buyer. Basically getting a nasty case of engineer's disease and treating a basic first-order understanding as if it were special expertise, in a context where most other people might very well have understanding superior to OP's own. And if buyers did not have "adequate" understanding in the area of efficiency, one very strong candidate explanation for that would be that they didn't care so much about efficiency compared to other things. Especially because, even if they didn't understand the whole airflow pattern, they could act on the published efficiency ratings, which take airflow into account. If they're buying something with a lower numerical headline efficiency rating, then you have to assume that they're real idiots to arrive at the idea that efficiency is their main concern. ... all of which I would have let pass if smug, arrogant, supercilious dismissiveness were not a common problem that alienates a lot of people from "rationalists" and their
2johnswentworth
Just added this clarification to the post:
jessicata

Regarding the back-and-forth on air conditioners, I tried Google searching to find a precedent for this sort of analysis; the first Google result for "air conditioner single vs. dual hose" was this blog post, which acknowledges the inefficiency johnswentworth points out and overall recommends dual-hose air conditioners, but still recommends single-hose air conditioners under some conditions and claims the efficiency difference is only about 12%.

Highlights:

In general, a single-hose portable air conditioner is best suited for smaller rooms. The reason being is because if the area you want to cool is on the larger side, the unit will have to work much harder to cool the space.

So how does it work? The single-hose air conditioner yanks warm air and moisture from the room and expels it outside through the exhaust. A negative pressure is created when the air is pushed out of the room, the air needs to be replaced. In turn, any opening in the house like doors, windows, and cracks will draw outside hot air into the room to replace the missing air. The air is cooled by the unit and ejected into the room.

...

Additionally, the single-hose versions are usually less expensive than their dual-hose

...
3habryka
EER does not account for heat infiltration issues, so this seems confused. CEER does, and that does suggest something in the 20% range, but I am pretty sure you can't use EER to compare a single-hose and a dual-hose system.
3jessicata
I assumed EER did account for that based on:
5habryka
This article explains the difference: https://www.consumeranalysis.com/guides/portable-ac/best-portable-air-conditioner/ EER measures performance in BTUs, which are simply measuring how much work the AC performs, without taking into account any backflow of cold air back into the AC, or infiltration issues.
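
For reference, a minimal sketch of what an EER figure is arithmetically: cooling output in BTU per hour divided by electrical input in watts. The function names and the example numbers are hypothetical; the point is only that nothing in this ratio knows about air being pulled in from outside, which is why it cannot by itself separate a single-hose unit from a dual-hose one.

```python
# Sketch of the EER ratio and its conversion to a dimensionless COP.
# Example inputs are made up for illustration.

BTU_PER_HOUR_IN_WATTS = 0.29307107  # 1 BTU/h expressed in watts


def eer(cooling_btu_per_hour, power_watts):
    """Energy efficiency ratio: cooling output (BTU/h) per watt of input."""
    return cooling_btu_per_hour / power_watts


def cop_from_eer(eer_value):
    """Convert EER to a coefficient of performance (watts of heat per watt)."""
    return eer_value * BTU_PER_HOUR_IN_WATTS


if __name__ == "__main__":
    rating = eer(10000.0, 1000.0)  # hypothetical unit: 10,000 BTU/h at 1,000 W
    print(f"EER = {rating:.1f}, COP = {cop_from_eer(rating):.2f}")
```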

I think I'm missing the most important part of this debate. How does the second hose help? The air outside is hot; with one hose, hot air enters the house because of the vacuum effect; with two hoses, the second hose explicitly sucks in air from the outside... which is still hot. Where is the difference?

With two hoses, the air sucked in never mixes with the cool air in the room; it's kept completely separate. Only heat is exchanged by the AC, not air.

2philh
From the Wirecutter test conditions above it sounds like these are also meant to dehumidify: With one hose, you presumably get that for free if the air inside is more humid than the air that replaces it. With two hoses, since you're not mixing air, do you still dehumidify? (If anything I'd expect the opposite, since the same amount of water vapor is apparently higher humidity in a cool room than a hot room.)
2jessicata
Wouldn't the AC unit have to intake cool air from the room (since it's expelling cold air into the room), and mix the cool air with the warm outside air? (Maybe the numbers work out differently in this condition but I'm not convinced yet, would have to see a calculation)

A two hose AC does take in both indoor and outdoor air, but they never mix. (The two hoses both carry outdoor air; indoor air is pumped through two vents in the AC.) The AC just pumps heat from the indoor air to the outdoor air. Similar to a fridge.
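
A back-of-the-envelope version of the difference, as a sketch under loud assumptions: dry air only (no humidity), steady state, every cubic meter exhausted by the single-hose unit is replaced by outdoor air, and the dual-hose unit exchanges no indoor air at all. The 3000 W gross cooling, the 0.05 m^3/s exhaust flow, and the temperatures are hypothetical numbers chosen for illustration.

```python
# Sketch: net cooling of a single-hose vs. dual-hose unit in a toy model.
# Constants are approximate dry-air values; all inputs are hypothetical.

AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0


def infiltration_heat_gain_w(exhaust_flow_m3_s, t_out_c, t_in_c):
    """Heat carried in by outdoor air that replaces exhausted indoor air."""
    return (AIR_DENSITY_KG_M3 * exhaust_flow_m3_s
            * AIR_CP_J_PER_KG_K * (t_out_c - t_in_c))


def net_cooling_w(gross_cooling_w, exhaust_flow_m3_s, t_out_c, t_in_c,
                  single_hose):
    """Gross cooling minus infiltration gain (single hose only)."""
    if not single_hose:
        # Toy assumption: outdoor air stays inside the hoses, so no penalty.
        return gross_cooling_w
    return gross_cooling_w - infiltration_heat_gain_w(
        exhaust_flow_m3_s, t_out_c, t_in_c)


if __name__ == "__main__":
    for single in (True, False):
        net = net_cooling_w(3000.0, 0.05, 35.0, 25.0, single_hose=single)
        label = "single-hose" if single else "dual-hose"
        print(f"{label}: net cooling of about {net:.0f} W")
```

With these made-up inputs the single-hose penalty is about 603 W out of 3000 W. The real size of the penalty depends on the actual exhaust flow, the indoor/outdoor temperature gap, and humidity, none of which this sketch pins down.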

The nonobvious problems are the whole reason why AI alignment is hard in the first place.

I disagree with the implication that there’s nothing to worry about on the “obvious problems” side.

An out-of-control AGI self-reproducing around the internet, causing chaos and blackouts etc., is an “obvious problem”. I still worry about it.

After all, consider this: an out-of-control virus self-reproducing around the human population, causing death and disability etc., is also an “obvious problem”.  We already have this problem; we’ve had this problem for millennia! And yet, we haven’t solved it!

(It’s even worse than that—it’s an obvious problem with obvious mitigations, e.g. end gain-of-function research, and we’re not even doing that.)

6johnswentworth
There is an important difference here between "obvious in advance" and "obvious in hindsight", but your basic point is fair, and the virus example is a good one. Humanity's current state is indeed so spectacularly incompetent that even the obvious problems might not be solved, depending on how things go.
8Steven Byrnes
I would say “Humanity's current state is so spectacularly incompetent that even the obvious problems with obvious solutions might not be solved”. If humanity were not spectacularly incompetent, then maybe we wouldn't have to worry about the obvious problems with obvious solutions. But we would still need to worry about the obvious problems with extremely difficult and non-obvious solutions.

I find it funny that there's more discussion in the comments section of the details of how single-hose air conditioners work compared to the object-level claims made in the post about the difficulty distribution of problems that are likely to come up in AI alignment.

I interpreted the air conditioning story as a fable meant to illustrate a point, not as Bayesian evidence for us to use in order to update towards a particular view. Are people here reading the post through a different lens?

gjm

No, they're trying to avoid generalizing from fictional evidence. John is offering the Fable of the Air Conditioners as an example of a particular phenomenon that he says also applies to the AI alignment problem. If his chosen example of this phenomenon is not in fact a good example of the phenomenon, then one might reasonably be less inclined to believe that the phenomenon is as common and as important as he suggests it is, and/or less inclined to believe what he says about the phenomenon.