Reality is a dangerous place. From the dawn of humanity we have faced the hazards of nature: fire, flood, disease, famine. Better technology and infrastructure have made us safer from many of these risks—but have also created new risks, from boiler explosions to carcinogens to ozone depletion, and exacerbated old ones.
Safety, security, and resilience against these hazards are not the default state of humanity. They are achievements, and in each case they came about deliberately.
A striking theme from the history of such achievements is that there is rarely if ever a silver bullet for risk. Safety is achieved through defense in depth, and through the orchestration of a wide variety of solutions, all working in concert.
Recently, in a private talk, I gave a historical example: the history of fire safety. It resonated so strongly with the audience that I’m writing it up here for wider distribution.
Up until and through the 1800s, city fires were a great hazard. Neighborhoods were full of densely packed wooden structures without flame-retardant chemicals, fire alarms, or sprinkler systems; open flames were used everywhere for lighting, heating, and cooking; there were no best practices in place for storing or handling combustible materials; fire departments lacked training and discipline, and they worked with inadequate equipment and insufficient water supply. All this meant that large swaths of cities regularly burned to the ground. Rome in AD 64; Constantinople in 406; London in 1135, 1212, and 1666; Hangzhou in 1137; Amsterdam in 1421 and 1452; Stockholm in 1625 and 1759; Nagasaki in 1663; Boston in 1711, 1760, 1787, and 1872; New York in 1776, 1835, and 1845; New Orleans in 1788 and 1794; Pittsburgh in 1845; Chicago in 1871; Seattle in 1889; Shanghai in 1894; Baltimore in 1904; Atlanta in 1917; and Tokyo in 1923 are just a few of the most well-known.
Fire is not unknown today, but it is far less lethal, and great city fires consuming multiple blocks are largely a thing of the past. Today, if you see a fire truck on the street with its sirens blaring, it is more likely to be responding to an emergency medical call than to a fire. Even if the truck is responding to a fire call, it is more likely to be a false alarm than an actual fire.
How was this achieved?
Better fire-fighting. Pumps to douse fires with water have existed since antiquity, but for most of history they were man-powered. With the Industrial Revolution, we got steam-powered and later diesel-powered pumps that can deliver much greater throughput of water, and at greater muzzle velocities to reach higher floors of buildings. In the 20th century, horse-drawn fire engines were replaced with fire trucks that could get around the city faster and more reliably.
A high-throughput engine, however, needs a high-volume source of water. In ancient and medieval times, water was provided by the bucket brigade: two lines of people stretching from the fire to the nearest lake or river, passing buckets by hand in both directions. A much better solution was the fire hose, invented in the late 1600s (and improved in strength and reliability over the centuries through better materials, manufacturing, and quality control). The fire hose not only allowed a fire engine to be connected to a water source, but also allowed the fire-fighters to get in closer to the base of the fire and dump water directly on it, which is far more effective than just spraying the building from the outside.
A fire hose can be inserted into a natural water source like a pond or cistern, but one of these might not be handy nearby, and they aren’t pressurized, so all the pumping force has to be supplied by the fire engine. They also contain debris that can clog the intake and block the flow. Eventually, cities were outfitted with regularly spaced fire hydrants connected to the municipal water supply. A water system designed to supply city residents with daily needs, however, often proved inadequate in an emergency; these systems had to be upgraded to supply the large bursts that big fires demanded. This is a matter of serious engineering: 19th-century fire-fighting journals are full of technical details and mathematical calculations attempting to precisely nail down questions of optimal hydrant distribution or nozzle size, or the pressure required to force a certain volume of water to a given height at a particular angle.
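To give a sense of the arithmetic involved, here is a rough back-of-the-envelope sketch in Python (my own illustration, not the formulas those journals actually used): ignoring hose friction and the extra pressure needed for a strong jet at the nozzle, hydrostatics alone sets a minimum pump pressure that climbs quickly with building height.

```python
# Back-of-the-envelope estimate of the pump pressure needed to deliver
# water to the upper floors of a building. This ignores friction losses
# in the hose and the extra pressure needed for a useful jet at the
# nozzle, so real requirements are higher; it is an illustration only.

RHO = 1000.0        # density of water, kg/m^3
G = 9.81            # gravitational acceleration, m/s^2
PSI_PER_PA = 1 / 6894.76

def min_pressure_for_height(height_m: float) -> float:
    """Hydrostatic pressure (Pa) required just to raise water by height_m."""
    return RHO * G * height_m

for floors in (2, 6, 10):
    height = floors * 3.5   # assume roughly 3.5 m per storey
    pa = min_pressure_for_height(height)
    print(f"{floors:>2} floors (~{height:.0f} m): "
          f"{pa / 1000:.0f} kPa, about {pa * PSI_PER_PA:.0f} psi minimum")
```

Once friction losses and nozzle velocity are added, the real requirements are considerably higher, which is part of why the journals devoted so much space to the problem.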
Finally, fire-fighting teams needed improved organization. Traditionally, fire-fighters were volunteers, often rowdy young men with no training or discipline (there is at least one story of a fist fight breaking out between two rival teams who arrived at a fire at the same time). In the 19th century, fire departments were professionalized and were organized more formally, along almost military lines, as befits responders to a life-threatening emergency.
Faster alarming. Fire, like many of our most dangerous hazards, is a chain reaction. Chain reactions grow exponentially, which means early detection and response time are crucial. Traditionally, fires were spotted by watchmen, either on patrol or from a watch tower, who then had to run, shout, or ring bells or other alarms to alert the fire fighters.
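To see why those minutes matter, here is a toy illustration in Python; the one-minute doubling time is an assumption chosen for the example, not a measured figure, and real fires vary enormously.

```python
# Toy model of exponential fire growth, to illustrate why detection and
# response time are crucial. The doubling time is an invented figure for
# illustration; real fires vary enormously.

DOUBLING_TIME_MIN = 1.0   # assumed: the fire doubles in size every minute

def relative_fire_size(delay_minutes: float) -> float:
    """How many times larger the fire is after a given delay."""
    return 2 ** (delay_minutes / DOUBLING_TIME_MIN)

for delay in (1, 5, 10, 20):
    print(f"{delay:>2} min delay -> fire roughly {relative_fire_size(delay):,.0f}x larger")
```

Under that assumption, a ten-minute delay means facing a fire roughly a thousand times larger than the one first spotted.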
Electronic communications, first via telegraph and later telephone, provided a much faster way to get the alarm to the fire department. The telephone lines could be busy, however, so in the 20th century the 911 emergency response system was created to provide a priority channel.
Far better than having a human sound the alarm, however, is doing it automatically. Smoke detectors and other automatic fire alarms caused the fire to “tell on itself,” saving valuable minutes or even hours. Even more effective was the automatic sprinkler, which combined detection and response into one near-instant system.
Reducing open flames. Better than fighting fires, of course, is preventing them. Before the 20th century, flames from candles and oil or gas lamps provided lighting, and fires in wood- or coal-burning stoves provided heat for buildings, cooking, and industrial processes. The Great London Fire of 1666 is said to have started in a baker’s shop; Copenhagen 1728 was blamed on an upset candle; Pittsburgh 1845 came from an unattended fire in a shed. Even worse, people often kept these fires going unattended overnight, because merely starting a fire was difficult before the invention of matches. Medieval regulations required city- and town-dwellers to cover their fires after a certain hour (the word “curfew” derives from the French couvre-feu, “cover the fire”).
Electric lighting and heating greatly reduced this risk. Electric sparks, however, were also a fire hazard—and initially, electrical installations increased rather than decreased fire risk, owing to shoddy electrical products, fixtures, and wiring. The solution here was improved standards, testing, and certification: the fire insurance companies created an organization, Underwriters Laboratories, specifically for this purpose, and its label became a highly valued marker of quality. (I told the story of UL in The Techno-Humanist Manifesto.) Today, our electronics and appliances are so safe that arson is the cause of more fires than either of them.
Safer construction. Preventing fires by eliminating the sparks or flames that ignite them is like lining up dominoes and then trying hard to make sure the first one never gets tipped over: a fragile proposition. Far more robust is to remove their fuel. Wood construction was widespread through the late 19th century, even in dense city neighborhoods: Daniel Defoe wrote that before the Great London Fire of 1666, “the Buildings looked as if they had been formed to make one general Bonfire.”
Today our cities are built of incombustible brick, stone, and concrete. Building codes enforce safety practices to slow the spread of fire both within a building and between buildings. They specify the quality of materials such as brick, mortar, cement, timber, and iron, including the specific tests they must pass; the materials for walls, and their minimum thickness; and the height of non-fireproof structures, among many other details.
Saving lives. By the early 1900s, in advanced societies, the problem of large city fires that spread over many blocks had mostly been solved; fires were often contained to a single building. That was small comfort, however, for those trapped inside the building. Tragedies such as the Iroquois Theatre Fire of 1903 and the Triangle Shirtwaist Fire of 1911 taught us valuable lessons. Exit paths must be adequate to evacuate entire buildings. Doors must remain unlocked, and they should open outwards in case a stampede presses up against them. Fire-resistant material must be used not only for the construction of the building, but for the interior: sofas, beds, curtains, carpets, wallpaper, paneling. Again, building and safety codes specify and enforce these practices.
So fire safety was achieved through the combination of better fire-fighting equipment, water supply, and organization; faster detection and alarms; fewer open flames, thanks to electric light and heat and the standards that made them safe; incombustible construction governed by building codes; and codes that protect the lives of the people inside buildings.
This is a general pattern. Safety requires better ways to respond to disasters, faster ways to detect them, the removal of hazards at their source, designs that resist and contain failure, and codes, standards, and norms that enforce best practices, all layered together.
We see the same thing in other domains. Road safety, for instance, was achieved through seat belts, anti-lock brakes, crumple zones, air bags, turn signals, windshield wipers, traffic lights, divided highways, driver’s education, driver’s licensing, and moral campaigns against drunk driving. No silver bullet.
When we think about creating safety and resilience from emerging technologies, such as AI or biotech, we should expect the same pattern. Safety will be created gradually, incrementally, through multiple layers of defense, and by orchestrating a wide combination of products, systems, techniques, and norms.
In particular, there is a line of thinking within the AI safety community that tends to dismiss or reject any proposal that isn’t ultimate—fully robust against the most powerful imaginable AI. There’s a good rationale for this: it’s easy to fall victim to hope and cope, and to lull ourselves into a false sense of security based on half-measures that were “the best we could do”; vulnerabilities are often invisible and are revealed dramatically in disasters; such disasters may be sufficiently catastrophic that we can’t afford to learn from mistakes. But I find the all-or-nothing thinking about AI safety counterproductive. We should embrace every idea that can provide any increment of security. History suggests that the accumulation and combination of such incremental solutions is the path to resilience.
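One way to see why such increments accumulate: if several safeguards each catch only some failures, and the failures they miss are independent, their combined miss rate is the product of the individual miss rates. The sketch below is a deliberately simplified model with invented per-layer numbers; real safeguards are never fully independent, which is one reason the variety of the layers matters as much as their number.

```python
# Toy model of defense in depth: several imperfect safeguards, assumed
# independent, combine multiplicatively. The catch rates are invented for
# illustration; they are not estimates of any real safety measure.

layers = {
    "detection":   0.80,   # probability this layer catches a given failure
    "containment": 0.70,
    "response":    0.60,
    "audit":       0.50,
}

p_miss = 1.0
for name, catch_rate in layers.items():
    p_miss *= 1 - catch_rate
    print(f"after {name:<12} cumulative miss probability = {p_miss:.3f}")

# No single layer here is better than 80% effective, yet the stack as a
# whole misses only about 1.2% of failures -- under the independence assumption.
```

Correlated failures erode that multiplication, so the model overstates the benefit; the qualitative point is simply that layers which are individually unimpressive can be collectively strong.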
Selected sources and further reading:
Historical and primary sources: