Security Mindset and the Logistic Success Curve

Eliezer Yudkowsky

Follow-up to: Security Mindset and Ordinary Paranoia

(Two days later, Amber returns with another question.)

amber: Uh, say, Coral. How important is security mindset when you're building a whole new kind of system—say, one subject to potentially adverse optimization pressures, where you want it to have some sort of robustness property?

coral: How novel is the system?

amber: Very novel.

coral: Novel enough that you'd have to invent your own new best practices instead of looking them up?

amber: Right.

coral: That's serious business. If you're building a very simple Internet-connected system, maybe a smart ordinary paranoid could look up how we usually guard against adversaries, use as much off-the-shelf software as possible that was checked over by real security professionals, and not do too horribly. But if you're doing something qualitatively new and complicated that has to be robust against adverse optimization, well... mostly I'd think you were operating in almost impossibly dangerous territory, and I'd advise you to figure out what to do after your first try failed. But if you wanted to actually succeed, ordinary paranoia absolutely would not do it.

amber: In other words, projects to build novel mission-critical systems ought to have advisors with the full security mindset, so that the advisor can say what the system builders really need to do to ensure security.

coral: (laughs sadly) No.

amber: No?

coral: Let's say for the sake of concreteness that you want to build a new kind of secure operating system. That is not the sort of thing you can do by attaching one advisor with security mindset, who has limited political capital to use to try to argue people into doing things. “Building a house when you're only allowed to touch the bricks using tweezers” comes to mind as a metaphor. You're going to need experienced security professionals working full-time with high authority. Three of them, one of whom is a cofounder. Although even then, we might still be operating in the territory of Paul Graham's Design Paradox.

amber: Design Paradox? What's that?

coral: Paul Graham's Design Paradox is that people who have good taste in UIs can tell when other people are designing good UIs, but most CEOs of big companies lack the good taste to tell who else has good taste. And that's why big companies can't just hire other people as talented as Steve Jobs to build nice things for them, even though Steve Jobs certainly wasn't the best possible designer on the planet. Apple existed because of a lucky history where Steve Jobs ended up in charge. There's no way for Samsung to hire somebody else with equal talents, because Samsung would just end up with some guy in a suit who was good at pretending to be Steve Jobs in front of a CEO who couldn't tell the difference.

Similarly, people with security mindset can notice when other people lack it, but I'd worry that an ordinary paranoid would have a hard time telling the difference, which would make it hard for them to hire a truly competent advisor. And of course lots of the people in the larger social system behind technology projects lack even the ordinary paranoia that many good programmers possess, and they just end up with empty suits talking a lot about “risk” and “safety”. In other words, if we're talking about something as hard as building a secure operating system, and your project hasn't started up already headed up by someone with the full security mindset, you are in trouble. Where by “in trouble” I mean “totally, irretrievably doomed”.

amber: Look, uh, there's a certain project I'm invested in which has raised a hundred million dollars to create merchant drones.

coral: Merchant drones?

amber: So there are a lot of countries that have poor market infrastructure, and the idea is, we're going to make drones that fly around buying and selling things, and they'll use machine learning to figure out what prices to pay and so on. We're not just in it for the money; we think it could be a huge economic boost to those countries, really help them move forwards.

coral: Dear God. Okay. There are exactly two things your company is about: system security, and regulatory compliance. Well, and also marketing, but that doesn't count because every company is about marketing. It would be a severe error to imagine that your company is about anything else, such as drone hardware or machine learning.

amber: Well, the sentiment inside the company is that the time to begin thinking about legalities and security will be after we've proven we can build a prototype and have at least a small pilot market in progress. I mean, until we know how people are using the system and how the software ends up working, it's hard to see how we could do any productive thinking about security or compliance that wouldn't just be pure speculation.

coral: Ha! Ha, hahaha… oh my god you're not joking.

amber: What?

coral: Please tell me that what you actually mean is that you have a security and regulatory roadmap which calls for you to do some of your work later, but clearly lays out what work needs to be done, when you are to start doing it, and when each milestone needs to be complete. Surely you don't literally mean that you intend to start thinking about it later?

amber: A lot of times at lunch we talk about how annoying it is that we'll have to deal with regulations and how much better it would be if governments were more libertarian. That counts as thinking about it, right?

coral: Oh my god.

amber: I don't see how we could have a security plan when we don't know exactly what we'll be securing. Wouldn't the plan just turn out to be wrong?

coral: All business plans for startups turn out to be wrong, but you still need them—and not just as works of fiction. They represent the written form of your current beliefs about your key assumptions. Writing down your business plan checks whether your current beliefs can possibly be coherent, and suggests which critical beliefs to test first, and which results should set off alarms, and when you are falling behind key survival thresholds. The idea isn't that you stick to the business plan; it's that having a business plan (a) checks that it seems possible to succeed in any way whatsoever, and (b) tells you when one of your beliefs is being falsified so you can explicitly change the plan and adapt. Having a written plan that you intend to rapidly revise in the face of new information is one thing. NOT HAVING A PLAN is another.

amber: The thing is, I am a little worried that the head of the project, Mr. Topaz, isn't concerned enough about the possibility of somebody fooling the drones into giving out money when they shouldn't. I mean, I've tried to raise that concern, but he says that of course we're not going to program the drones to give out money to just anyone. Can you maybe give him a few tips? For when it comes time to start thinking about security, I mean.

coral: Oh. Oh, my dear, sweet summer child, I'm sorry. There's nothing I can do for you.

amber: Huh? But you haven't even looked at our beautiful business model!

coral: I thought maybe your company merely had a hopeless case of underestimated difficulties and misplaced priorities. But now it sounds like your leader is not even using ordinary paranoia, and reacts with skepticism to it. Calling a case like that “hopeless” would be an understatement.

amber: But a security failure would be very bad for the countries we're trying to help! They need secure merchant drones!

coral: Then they will need drones built by some project that is not led by Mr. Topaz.

amber: But that seems very hard to arrange!

coral: ...I don't understand what you are saying that is supposed to contradict anything I am saying.

amber: Look, aren't you judging Mr. Topaz a little too quickly? Seriously.

coral: I haven't met him, so it's possible you misrepresented him to me. But if you've accurately represented his attitude? Then, yes, I did judge quickly, but it's a hell of a good guess. Security mindset is already rare on priors. “I don't plan to make my drones give away money to random people” means he's imagining how his system could work as he intends, instead of imagining how it might not work as he intends. If somebody doesn't even exhibit ordinary paranoia, spontaneously on their own cognizance without external prompting, then they cannot do security, period. Reacting indignantly to the suggestion that something might go wrong is even beyond that level of hopelessness, but the base level was hopeless enough already.

amber: Look... can you just go to Mr. Topaz and try to tell him what he needs to do to add some security onto his drones? Just try? Because it's super important.

coral: I could try, yes. I can't succeed, but I could try.

amber: Oh, but please be careful to not be harsh with him. Don’t put the focus on what he’s doing wrong—and try to make it clear that these problems aren’t too serious. He's been put off by the media alarmism surrounding apocalyptic scenarios with armies of evil drones filling the sky, and it took me some trouble to convince him that I wasn't just another alarmist full of fanciful catastrophe scenarios of drones defying their own programming.

coral: ...

amber: And maybe try to keep your opening conversation away from what might sound like crazy edge cases, like somebody forgetting to check the end of a buffer and an adversary throwing in a huge string of characters that overwrite the end of the stack with a return address that jumps to a section of code somewhere else in the system that does something the adversary wants. I mean, you've convinced me that these far-fetched scenarios are worth worrying about, if only because they might be canaries in the coal mine for more realistic failure modes. But Mr. Topaz thinks that's all a bit silly, and I don't think you should open by trying to explain to him on a meta level why it isn't. He'd probably think you were being condescending, telling him how to think. Especially when you're just an operating-systems guy and you have no experience building drones and seeing what actually makes them crash. I mean, that's what I think he'd say to you.

coral: ...

amber: Also, start with the cheaper interventions when you're giving advice. I don't think Mr. Topaz is going to react well if you tell him that he needs to start all over in another programming language, or establish a review board for all code changes, or whatever. He's worried about competitors reaching the market first, so he doesn't want to do anything that will slow him down.

coral: ...

amber: Uh, Coral?

coral: ... on his novel project, entering new territory, doing things not exactly like what has been done before, carrying out novel mission-critical subtasks for which there are no standardized best security practices, nor any known understanding of what makes the system robust or not-robust.

amber: Right!

coral: And Mr. Topaz himself does not seem much terrified of this terrifying task before him.

amber: Well, he's worried about somebody else making merchant drones first and misusing this key economic infrastructure for bad purposes. That's the same basic thing, right? Like, it demonstrates that he can worry about things?

coral: It is utterly different. Monkeys who can be afraid of other monkeys getting to the bananas first are far, far more common than monkeys who worry about whether the bananas will exhibit weird system behaviors in the face of adverse optimization.

amber: Oh.

coral: I'm afraid it is only slightly more probable that Mr. Topaz will oversee the creation of robust software than that the Moon will spontaneously transform into organically farmed goat cheese.

amber: I think you're being too harsh on him. I've met Mr. Topaz, and he seemed pretty bright to me.

coral: Again, assuming you're representing him accurately, Mr. Topaz seems to lack what I called ordinary paranoia. If he does have that ability as a cognitive capacity, which many bright programmers do, then he obviously doesn't feel passionate about applying that paranoia to his drone project along key dimensions. It also sounds like Mr. Topaz doesn't realize there's a skill that he is missing, and would be insulted by the suggestion. I am put in mind of the story of the farmer who was asked by a passing driver for directions to get to Point B, to which the farmer replied, “If I was trying to get to Point B, I sure wouldn't start from here.”

amber: Mr. Topaz has made some significant advances in drone technology, so he can't be stupid, right?

coral: "Security mindset" seems to be a distinct cognitive talent from g factor or even programming ability. In fact, there doesn’t seem to be a level of human genius that even guarantees you’ll be skilled at ordinary paranoia. Which does make some security professionals feel a bit weird, myself included—the same way a lot of programmers have trouble understanding why not everyone can learn to program. But it seems to be an observational fact that both ordinary paranoia and security mindset are things that can decouple from g factor and programming ability—and if this were not the case, the Internet would be far more secure than it is.

amber: Do you think it would help if we talked to the other VCs funding this project and got them to ask Mr. Topaz to appoint a Special Advisor on Robustness reporting directly to the CTO? That sounds politically difficult to me, but it's possible we could swing it. Once the press started speculating about drones going rogue and maybe aggregating into larger Voltron-like robots that could acquire laser eyes, Mr. Topaz did tell the VCs that he was very concerned about the ethics of drone safety and that he'd had many long conversations about it over lunch hours.

coral: I'm venturing slightly outside my own expertise here, which isn't corporate politics per se. But on a project like this one that's trying to enter novel territory, I'd guess the person with security mindset needs at least cofounder status, and must be personally trusted by any cofounders who don't have the skill. It can't be an outsider who was brought in by VCs, who is operating on limited political capital and needs to win an argument every time she wants to not have all the services conveniently turned on by default. I suspect you just have the wrong person in charge of this startup, and that this problem is not repairable.

amber: Please don't just give up! Even if things are as bad as you say, just increasing our project's probability of being secure from 0% to 10% would be very valuable in expectation to all those people in other countries who need merchant drones.

coral: ...look, at some point in life we have to try to triage our efforts and give up on what can't be salvaged. There's often a logistic curve for success probabilities, you know? The distances are measured in multiplicative odds, not additive percentage points. You can't take a project like this and assume that by putting in some more hard work, you can increase the absolute chance of success by 10%. More like, the odds of this project's failure versus success start out as 1,000,000:1, and if we're very polite and navigate around Mr. Topaz's sense that he is higher-status than us and manage to explain a few tips to him without ever sounding like we think we know something he doesn't, we can quintuple his chances of success and send the odds to 200,000:1. Which is to say that in the world of percentage points, the probability of success goes from 0.0% to 0.0%. That's one way to look at the “law of continued failure”.

If you had the kind of project where the fundamentals implied, say, a 15% chance of success, you’d then be on the right part of the logistic curve, and in that case it could make a lot of sense to hunt for ways to bump that up to a 30% or 80% chance.
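
(Coral's arithmetic can be checked with a minimal Python sketch. The function names are illustrative and not part of the original dialogue, and applying the same fivefold improvement to the 15% project is an assumption made here for comparison; Coral does not tie her 30% or 80% figures to any particular multiplier.)

```python
def odds_against_to_probability(odds_against: float) -> float:
    """Convert failure:success odds of (odds_against : 1) into a success probability."""
    return 1.0 / (1.0 + odds_against)


def multiply_success_odds(p: float, factor: float) -> float:
    """Multiply the success:failure odds implied by p by `factor`; return the new probability."""
    odds_for = p / (1.0 - p)
    new_odds_for = odds_for * factor
    return new_odds_for / (1.0 + new_odds_for)


# The hopeless case: 1,000,000:1 against, improved fivefold to 200,000:1 against.
before = odds_against_to_probability(1_000_000)
after = multiply_success_odds(before, 5)
print(f"{before:.1%} -> {after:.1%}")  # 0.0% -> 0.0%

# Assumption for comparison: the same fivefold improvement starting from 15%.
better = multiply_success_odds(0.15, 5)
print(f"{0.15:.0%} -> {better:.0%}")  # 15% -> 47%
```

(The same multiplicative step that is invisible at the bottom of the curve moves a 15% project by roughly thirty percentage points, which is why it matters where on the curve a project starts.)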

amber: Look, I'm worried that it will really be very bad if Mr. Topaz reaches the market first with insecure drones. Like, I think that merchant drones could be very beneficial to countries without much existing market backbone, and if there's a grand failure—especially if some of the would-be customers have their money or items stolen—then it could poison the potential market for years. It will be terrible! Really, genuinely terrible!

coral: Wow. That sure does sound like an unpleasant scenario to have wedged yourself into.

amber: But what do we do now?

coral: Damned if I know. I do suspect you're screwed so long as you can only win if somebody like Mr. Topaz creates a robust system. I guess you could try to have some other drone project come into existence, headed up by somebody that, say, Bruce Schneier assures everyone is unusually good at security-mindset thinking and hence can hire people like me and listen to all the harsh things we have to say. Though I have to admit, the part where you think it's drastically important that you beat an insecure system to market with a secure system—well, that sounds positively nightmarish. You're going to need a lot more resources than Mr. Topaz has, or some other kind of very major advantage. Security takes time.

amber: Is it really that hard to add security to the drone system?

coral: You keep talking about “adding” security. System robustness isn't the kind of property you can bolt onto software as an afterthought.

amber: I guess I'm having trouble seeing why it's so much more expensive. Like, if somebody foolishly builds an OS that gives access to just anyone, you could instead put a password lock on it, using your clever system where the OS keeps the hashes of the passwords instead of the passwords. You just spend a couple of days rewriting all the services exposed to the Internet to ask for passwords before granting access. And then the OS has security on it! Right?

coral: NO. Everything inside your system that is potentially subject to adverse selection in its probability of weird behavior is a liability! Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly! If you want to build a secure OS you need a whole special project that is “building a secure operating system instead of an insecure operating system”. And you also need to restrict the scope of your ambitions, and not do everything you want to do, and obey other commandments that will feel like big unpleasant sacrifices to somebody who doesn't have the full security mindset. OpenBSD can’t do a tenth of what Ubuntu does. They can't afford to! It would be too large of an attack surface! They can't review that much code using the special process that they use to develop secure software! They can't hold that many assumptions in their minds!

amber: Does that effort have to take a significant amount of extra time? Are you sure it can't just be done in a couple more weeks if we hurry?

coral: YES. Given that this is a novel project entering new territory, expect it to take at least two years more time, or 50% more development time—whichever is less—compared to a security-incautious project that otherwise has identical tools, insights, people, and resources. And that is a very, very optimistic lower bound.

amber: This story seems to be heading in a worrying direction.

coral: Well, I'm sorry, but creating robust systems takes longer than creating non-robust systems even in cases where it would be really, extraordinarily bad if creating robust systems took longer than creating non-robust systems.

amber: Couldn't it be the case that, like, projects which are implementing good security practices do everything so much cleaner and better that they can come to market faster than any insecure competitors could?

coral: … I honestly have trouble seeing why you’re privileging that hypothesis for consideration. Robustness involves assurance processes that take additional time. OpenBSD does not go through lines of code faster than Ubuntu.

But more importantly, if everyone has access to the same tools and insights and resources, then an unusually fast method of doing something cautiously can always be degenerated into an even faster method of doing the thing incautiously. There is not now, nor will there ever be, a programming language in which it is the least bit difficult to write bad programs. There is not now, nor will there ever be, a methodology that makes writing insecure software inherently slower than writing secure software. Any security professional who heard about your bright hopes would just laugh. Ask them too if you don't believe me.

amber: But shouldn't engineers who aren't cautious just be unable to make software at all, because of ordinary bugs?

coral: I am afraid that it is both possible, and extremely common in practice, for people to fix all the bugs that are crashing their systems in ordinary testing today, using methodologies that are indeed adequate to fixing ordinary bugs that show up often enough to afflict a significant fraction of users, and then ship the product. They get everything working today, and they don't feel like they have the slack to delay any longer than that before shipping because the product is already behind schedule. They don't hire exceptional people to do ten times as much work in order to prevent the product from having holes that only show up under adverse optimization pressure, that somebody else finds first and that they learn about after it's too late.

It's not even the wrong decision, for products that aren't connected to the Internet, don't have enough users for one to go rogue, don't handle money, don't contain any valuable data, and don't do anything that could injure people if something goes wrong. If your software doesn't destroy anything important when it explodes, it's probably a better use of limited resources to plan on fixing bugs as they show up.

… Of course, you need some amount of security mindset to realize which software can in fact destroy the company if it silently corrupts data and nobody notices this until a month later. I don't suppose it's the case that your drones only carry a limited amount of the full corporate budget in cash over the course of a day, and you always have more than enough money to reimburse all the customers if all items in transit over a day were lost, taking into account that the drones might make many more purchases or sales than usual? And that the systems are generating internal paper receipts that are clearly shown to the customer and non-electronically reconciled once per day, thereby enabling you to notice a problem before it's too late?

amber: Nope!

coral: Then as you say, it would be better for the world if your company didn't exist and wasn't about to charge into this new territory and poison it with a spectacular screwup.

amber: If I believed that… well, Mr. Topaz certainly isn't going to stop his project or let somebody else take over. It seems the logical implication of what you say you believe is that I should try to persuade the venture capitalists I know to launch a safer drone project with even more funding.

coral: Uh, I'm sorry to be blunt about this, but I'm not sure you have a high enough level of security mindset to identify an executive who's sufficiently better than you at it. Trying to get enough of a resource advantage to beat the insecure product to market is only half of your problem in launching a competing project. The other half of your problem is surpassing the prior rarity of people with truly deep security mindset, and getting somebody like that in charge and fully committed. Or at least get them in as a highly trusted, fully committed cofounder who isn't on a short budget of political capital. I'll say it again: an advisor appointed by VCs isn't nearly enough for a project like yours. Even if the advisor is a genuinely good security professional—

amber: This all seems like an unreasonably difficult requirement! Can't you back down on it a little?

coral: —the person in charge will probably try to bargain down reality, as represented by the unwelcome voice of the security professional, who won't have enough social capital to badger them into “unreasonable” measures. Which means you fail on full automatic.

amber: … Then what am I to do?

coral: I don't know, actually. But there's no point in launching another drone project with even more funding, if it just ends up with another Mr. Topaz put in charge. Which, by default, is exactly what your venture capitalist friends are going to do. Then you've just set an even higher competitive bar for anyone actually trying to be first to market with a secure solution, may God have mercy on their souls.

Besides, if Mr. Topaz thinks he has a competitor breathing down his neck and rushes his product to market, his chance of creating a secure system could drop by a factor of ten and go all the way from 0.0% to 0.0%.

amber: Surely my VC friends have faced this kind of problem before and know how to identify and hire executives who can do security well?

coral: … If one of your VC friends is Paul Graham, then maybe yes. But in the average case, NO.

If average VCs always made sure that projects which needed security had a founder or cofounder with strong security mindset—if they had the ability to do that even in cases where they decided they wanted to—the Internet would again look like a very different place. By default, your VC friends will be fooled by somebody who looks very sober and talks a lot about how terribly concerned he is with cybersecurity and how the system is going to be ultra-secure and reject over nine thousand common passwords, including the thirty-six passwords listed on this slide here, and the VCs will ooh and ah over it, especially as one of them realizes that their own password is on the slide. That project leader is absolutely not going to want to hear from me—even less so than Mr. Topaz. To him, I'm a political threat who might damage his line of patter to the VCs.

amber: I have trouble believing all these smart people are really that stupid.

coral: You're compressing your innate sense of social status and your estimated level of how good particular groups are at this particular ability into a single dimension. That is not a good idea.

amber: I'm not saying that I think everyone with high status already knows the deep security skill. I'm just having trouble believing that they can't learn it quickly once told, or could be stuck not being able to identify good advisors who have it. That would mean they couldn't know something you know, something that seems important, and that just… feels off to me, somehow. Like, there are all these successful and important people out there, and you’re saying you’re better than them, even with all their influence, their skills, their resources—

coral: Look, you don't have to take my word for it. Think of all the websites you've been on, with snazzy-looking design, maybe with millions of dollars in sales passing through them, that want your password to be a mixture of uppercase and lowercase letters and numbers. In other words, they want you to enter “Password1!” instead of “correct horse battery staple”. Every one of those websites is doing a thing that looks humorously silly to someone with a full security mindset or even just somebody who regularly reads XKCD. It says that the security system was set up by somebody who didn't know what they were doing and was blindly imitating impressive-looking mistakes they saw elsewhere.

Do you think that makes a good impression on their customers? That's right, it does! Because the customers don't know any better. Do you think that login system makes a good impression on the company's investors, including professional VCs and probably some angels with their own startup experience? That's right, it does! Because the VCs don't know any better, and even the angel doesn't know any better, and they don't realize they're missing a vital skill, and they aren't consulting anyone who knows more. An innocent is impressed if a website requires a mix of uppercase and lowercase letters and numbers and punctuation. They think the people running the website must really care to impose a security measure that unusual and inconvenient. The people running the website think that's what they're doing too.

People with deep security mindset are both rare and rarely appreciated. You can see just from the login system that none of the VCs and none of the C-level executives at that startup thought they needed to consult a real professional, or managed to find a real professional rather than an empty suit if they went consulting. There was, visibly, nobody in the neighboring system with the combined knowledge and status to walk over to the CEO and say, “Your login system is embarrassing and you need to hire a real security professional.” Or if anybody did say that to the CEO, the CEO was offended and shot the messenger for not phrasing it ever-so-politely enough, or the CTO saw the outsider as a political threat and bad-mouthed them out of the game.

Your wishful should-universe hypothesis that people who can touch the full security mindset are more common than that within the venture capital and angel investing ecosystem is just flat wrong. Ordinary paranoia directed at widely-known adversarial cases is dense enough within the larger ecosystem to exert widespread social influence, albeit still comically absent in many individuals and regions. People with the full security mindset are too rare to have the same level of presence. That's the easily visible truth. You can see the login systems that want a punctuation mark in your password. You are not hallucinating them.
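
(The login rule Coral describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any real website's code: the function names are hypothetical, and the 2048-word list is borrowed from the XKCD comic she mentions rather than stated in the dialogue.)

```python
import math
import string


def passes_composition_rule(password: str) -> bool:
    """Naive rule: require uppercase, lowercase, a digit, and punctuation."""
    return (
        any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


def passphrase_entropy_bits(num_words: int, wordlist_size: int) -> float:
    """Entropy of num_words chosen uniformly and independently from a wordlist."""
    return num_words * math.log2(wordlist_size)


print(passes_composition_rule("Password1!"))                    # True
print(passes_composition_rule("correct horse battery staple"))  # False

# Assumption: four words drawn at random from a 2048-word list.
print(passphrase_entropy_bits(4, 2048))                         # 44.0
```

(The rule accepts one of the most guessable passwords that satisfies it and rejects a randomly generated passphrase worth about 44 bits, which is the impressive-looking mistake Coral is pointing at.)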

amber: If that's all true, then I just don't see how I can win. Maybe I should just condition on everything you say being false, since, if it's true, my winning seems unlikely—in which case all victories on my part would come in worlds with other background assumptions.

coral: … is that something you say often?

amber: Well, I say it whenever my victory starts to seem sufficiently unlikely.

coral: Goodness. I could maybe, maybe see somebody saying that once over the course of their entire lifetime, for a single unlikely conditional, but doing it more than once is sheer madness. I'd expect the unlikely conditionals to build up very fast and drop the probability of your mental world to effectively zero. It's tempting, but it's usually a bad idea to slip sideways into your own private hallucinatory universe when you feel you're under emotional pressure. I tend to believe that no matter what the difficulties, we are most likely to come up with good plans when we are mentally living in reality as opposed to somewhere else. If things seem difficult, we must face the difficulty squarely to succeed, to come up with some solution that faces down how bad the situation really is, rather than deciding to condition on things not being difficult because then it's too hard.
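
(A rough sketch of how fast the unlikely conditionals build up, in Python. The 10% figure and the independence of the assumptions are both hypothetical simplifications chosen for illustration; the dialogue gives no numbers here.)

```python
import math


def joint_probability(probabilities):
    """Probability that every assumption holds, treating them as independent."""
    return math.prod(probabilities)


# Assumption: each convenient background assumption is only 10% likely to be true.
for count in (1, 2, 5, 10):
    print(count, joint_probability([0.1] * count))
```

(Under these assumptions, conditioning on ten such premises leaves about one chance in ten billion that the imagined world is the real one.)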

amber: Can you at least try talking to Mr. Topaz and advise him how to make things be secure?

coral: Sure. Trying things is easy, and I’m a character in a dialogue, so my opportunity costs are low. I'm sure Mr. Topaz is trying to build secure merchant drones, too. It's succeeding at things that is the hard part.

amber: Great, I'll see if I can get Mr. Topaz to talk to you. But do please be polite! If you think he's doing something wrong, try to point it out more gently than the way you've talked to me. I think I have enough political capital to get you in the door, but that won't last if you're rude.

coral: You know, back in mainstream computer security, when you propose a new way of securing a system, it's considered traditional and wise for everyone to gather around and try to come up with reasons why your idea might not work. It's understood that no matter how smart you are, most seemingly bright ideas turn out to be flawed, and that you shouldn't be touchy about people trying to shoot them down. Does Mr. Topaz have no acquaintance at all with the practices in computer security? A lot of programmers do.

amber: I think he'd say he respects computer security as its own field, but he doesn't believe that building secure operating systems is the same problem as building merchant drones.

coral: And if I suggested that this case might be similar to the problem of building a secure operating system, and that this case creates a similar need for more effortful and cautious development, requiring both (a) additional development time and (b) a special need for caution supplied by people with unusual mindsets above and beyond ordinary paranoia, who have an unusual skill that identifies shaky assumptions in a safety story before an ordinary paranoid would judge a fire as being urgent enough to need putting out, who can remedy the problem using deeper solutions than an ordinary paranoid would generate as parries against imagined attacks?

If I suggested, indeed, that this scenario might hold generally wherever we demand robustness of a complex system that is being subjected to strong external or internal optimization pressures? Pressures that strongly promote the probabilities of particular states of affairs via optimization that searches across a large and complex state space? Pressures which therefore in turn subject other subparts of the system to selection for weird states and previously unenvisioned execution paths? Especially if some of these pressures may be in some sense creative and find states of the system or environment that surprise us or violate our surface generalizations?

amber: I think he'd probably think you were trying to look smart by using overly abstract language at him. Or he'd reply that he didn't see why this took any more caution than he was already using just by testing the drones to make sure they didn't crash or give out too much money.

coral: I see.

amber: So, shall we be off?

coral: Of course! No problem! I'll just go meet with Mr. Topaz and use verbal persuasion to turn him into Bruce Schneier.

amber: That's the spirit!

coral: God, how I wish I lived in the territory that corresponds to your map.

amber: Hey, come on. Is it seriously that hard to bestow exceptionally rare mental skills on people by talking at them? I agree it's a bad sign that Mr. Topaz shows no sign of wanting to acquire those skills, and doesn't think we have enough relative status to continue listening if we say something he doesn't want to hear. But that just means we have to phrase our advice cleverly so that he will want to hear it!

coral: I suppose you could modify your message into something Mr. Topaz doesn't find so unpleasant to hear. Something that sounds related to the topic of drone security, but which doesn't cost him much, and of course does not actually cause his drones to end up secure because that would be all unpleasant and expensive. You could slip a little sideways in reality, and convince yourself that you've gotten Mr. Topaz to ally with you, because he sounds agreeable now. Your instinctive desire for the high-status monkey to be on your political side will feel like its problem has been solved. You can substitute the feeling of having solved that problem for the unpleasant sense of not having secured the actual drones; you can tell yourself that the bigger monkey will take care of everything now that he seems to be on your pleasantly-modified political side. And so you will be happy. Until the merchant drones hit the market, of course, but that unpleasant experience should be brief.

amber: Come on, we can do this! You've just got to think positively!

coral: … Well, if nothing else, this should be an interesting experience. I've never tried to do anything quite this doomed before.

For actual computer security, not AGI, why do you believe that the laxness of most software is a problem rather than a reasonable response?

My understanding is that most people's credit card information is known to at least some identity thieves. This does not drive most individuals into a panic because we've basically never heard of a case in our social circle of someone losing more than $1000 to identity theft -- and anyway, banks can reimburse the money. It does worry banks and other companies that process identifying information -- and they hire consultants to help them detect and prevent identity theft -- but they don't devote nearly as many resources to this as Bruce Schneier would like. (I worked at one of those consulting companies, which is how I know firsthand that fraud prevention can be pretty dysfunctional even at household-name, big, successful companies.)

But are they actually making a strategic mistake? Is the number & power of fraudsters going to suddenly increase? Or isn't the situation more like "We lose X dollars to fraud every year, we'd like it to be lower but to be honest it's a tiny fraction of our yearly costs. We'd go out of business if our fraud losses were 100x more, but that's pretty unlikely. So we take these fraud precautions -- but not the much more elaborate ones that our computer security experts would like."

If security experts are right, then the world financial system is going to collapse any day now. If not, then the sort of security mindset that is appropriate to designing one piece of software, where any vulnerability should be treated as dangerous, is not actually appropriate to allocating resources to security concerns in a big organization or industry.

I'm going to naively take Schneier's side here on the grounds that if I notice a bunch of small holes in my security I have model uncertainty about those holes unexpectedly growing larger. I would try to establish some bounds on estimates of this new parameter by looking for high profile instances. I wouldn't be that reassured by the fact that most of my peers didn't seem concerned if, after thinking about it for a few minutes, I couldn't see how they were directly incentivized to be concerned. Is this flawed?

I translated both parts into Russian: 1, 2

Speaking of minimizing attack surface, I note that this page attempts to load content from 8 different third-party domains, 2 of which are controlled by Google, a major known tracker.

In examples like this drone scenario (though obviously not AGI) my immediate gut reaction is "make a sandboxed version, open it up, and offer monetary bribes for breaking it". It's not too difficult to sell politically, and you throw an actual optimizer against the system.

Since this is of the form "why not just...", I'm sure there's some good reason why it's not sufficient by itself. But I don't know what that reason is. Could someone please enlighten me?

Coral isn't trying very hard to be helpful. Why doesn't she suggest that the company offer $10,000,000 for each security hole that people can demonstrate? Oh, right, she wants to use this as an analogy for AGIs that go foom.

As pretty clearly stated (in this fictional story), Topaz wants cheap interventions, not pedantic nerds coming up with insanely unlikely bugs like a buffer overflow.

Bug bounties already fail like this in the real world all the time when companies refuse to pay out for bugs that don't seem like a big deal to the person in charge.

The bigger problem here is that, as noted in the post, (0) it is always faster to do things in a less secure manner. If you assume (1) multiple competitors trying to build AI (and if this is not your assumption, I would like to hear a basis for it), (2) at least some who believe that the first AI created will be in a position of unassailable dominance (this appears to be the belief of at least some, including but not necessarily limited to those who believe in a high likelihood of a hard takeoff), (3) some overlap between the groups described in 1 and 2 (again, if you don't think this is going to be the case, I would like to hear a basis for it) and (4) varying levels of concern about the potential damage caused by an unfriendly AI (even if you believe that as we get closer to developing AI, the average and minimum level of concern will rise, variance is likely), the first AI to be produced is likely to be highly insecure (i.e. with non-robust friendliness).

These posts finally made me get Something Important:

Akrasia is a security problem, and not just against external adversaries like advertising.

Is there anything good written yet about solving this domain of problems from a security mindset perspective?

What is terrifying is that even if Mr. Topaz had wanted to be secure, the state of the AI safety community, sorry, the drone economy safety community, means he will probably be pushed into creating a demo.

Consider Mr Beryl's plight. All the drone economy safety communities are working on one of 2 things.

1) solid fuel rocket drones (they can go very quickly but are not very steerable), played today by Ms Onyx

2) Frictionless massless drone safety researchers played today by Mr Amber.

Mr Beryl has the idea that maybe they could use rotors instead of solid fuel rockets, but rotors tend to disintegrate. He thinks he has a good idea for a rotor that might not disintegrate.

Beryl to Onyx: "Hey, anyone working on rotor-based drones? I have an idea for rotors that might not disintegrate."

Onyx: "Solid fuel rocketry might be able to form an economy."

Beryl: "Sure, okay. It might. Rotor drones have very different flight characteristics compared to solid fuel rockets, so I'm not sure I can re-use your work. But anyone working on rotor drones?"

Onyx: Silence

Beryl to Amber: "Maybe it would be worth considering drones to have mass and friction. It changes things a lot when you do?"

Amber: "We haven't solved the problem of drones going faster than light and forming black holes without considering mass and friction. Once we do that we'll consider mass and friction."

Beryl: "Okay let me know when you do. In the meantime know anyone working on rotor drone safety?"

Amber: Silence

The only way for Beryl to get people to talk and think about rotor drone economy safety seems to be to create a demo of a rotor not falling apart when attached to a motor. He might have ideas about how to make a rotor drone and an economy, and he could talk about them, but most of his effort needs to go into convincing people that this is a realistic option, which seems to require a demo. And so Beryl has to follow the same path as Topaz.

Paul Graham's Design Paradox is that people who have good taste in UIs can tell when other people are designing good UIs, but most CEOs of big companies lack the good taste to tell who else has good taste. And that's why big companies can't just hire other people as talented as Steve Jobs to build nice things for them, even though Steve Jobs certainly wasn't the best possible designer on the planet. Apple existed because of a lucky history where Steve Jobs ended up in charge. There's no way for Samsung to hire somebody else with equal talents, because Samsung would just end up with some guy in a suit who was good at pretending to be Steve Jobs in front of a CEO who couldn't tell the difference.

I think this idea originated in Graham's startup mistakes essay:

...when I think about what killed most of the startups in the e-commerce business back in the 90s, it was bad programmers. A lot of those companies were started by business guys who thought the way startups worked was that you had some clever idea and then hired programmers to implement it. That's actually much harder than it sounds—almost impossibly hard in fact—because business guys can't tell which are the good programmers. They don't even get a shot at the best ones, because no one really good wants a job implementing the vision of a business guy.

In practice what happens is that the business guys choose people they think are good programmers (it says here on his resume that he's a Microsoft Certified Developer) but who aren't. Then they're mystified to find that their startup lumbers along like a World War II bomber while their competitors scream past like jet fighters. This kind of startup is in the same position as a big company, but without the advantages.

So how do you pick good programmers if you're not a programmer? I don't think there's an answer. I was about to say you'd have to find a good programmer to help you hire people. But if you can't recognize good programmers, how would you even do that?

http://paulgraham.com/startupmistakes.html

If it's true that this is the bottleneck on friendliness, one way to address this might be to try to make the people who are actually good at security higher status--by running security competitions, for example. (I assume the competitions would have to be organized by high status people for this to work, and you'd have to identify people who actually had security mindset in order to design & judge them.)

I suspect that different fields have different deltas between the people who look impressive to an outsider on paper vs the people who are actually competent. Programming might have been one of the fields with the highest deltas, though I think this delta has been arbitraged away some as the meme that Github profiles are more important than degrees has spread through the public consciousness. Nowadays, I think Triplebyte has accumulated enough status that a business guy has a decent shot at correctly choosing them to choose their first technical employee.

(Didn't Eliezer claim in a previous essay that he has the ability to differentiate experts from non-experts in fields he's not personally familiar with? I don't remember the details of how he says he does this.)

Real-world anecdata on how one big company (medical equipment) got OK at security:

At some point they decided that security was more important now. Their in-house guy (dev -> dev management -> "congrats, you are now our chief security guy") got to hire more consultants for their projects, went to trainings and, crucially, went to cons (e.g. defcon). He was a pretty nice guy, and after some years he became fluent in hacker culture. In short, he became capable of judging consultants' work and hiring real security people. And he made some friends on the way. I think this is the best path to acquire institutional knowledge: Take a smart person loyal to the company, immerse them in the knowledgeable subculture (spending money on failures on the way), use the acquired knowledge to hire real professionals (really hire, or hire as consultants for projects, or hire to give trainings).

Different big company (not software related), same thing. After some years their security guys became fed up with their lack of internal political capital, quit and switched careers to "real" security.

Note that this approach gets hacked if everyone uses it at once, which means you should never attempt to immerse your experts after hearing another company is doing it, because all the newbies will end up talking to each other (see how things like LinkedIn work for 99% of people as some kind of weird networking simulator).

Why do you think a competition would measure anything meaningful? The best way to hire programmers doesn't seem to be about hiring those people who do best at programming competitions.

The goal is to discover a way to measure security mindset as accurately as possible, and then make it high status to do well according to the measurement. There's no reason why your contest would need to look like our current idea of a programming contest. Software companies already have incentives to measure programming ability as accurately as possible--see companies like Triplebyte which are attempting to do this in a data-driven, scientifically validated way. But no one has an incentive to make your score on Triplebyte's quiz public and give you status based on it.

Another idea is to push for software liabilities, as Bruce Schneier describes in this blog post, in order to create a financial incentive for more developers to have security mindset.

Perhaps this post is about useful arrogance? Do not be so arrogant that you think your field is unique and independent, and requires no improvements from other fields, but be arrogant enough to point out that there are problems to someone who thinks he is at the top of his field?

[Mod note: This comment was previously deleted by mods, and on reflection returned.]

From my perspective, Eliezer comes across as the AI safety equivalent of a militant vegan or smug atheist in this post. I'm not aware of any social science on the topic of whether people like this tend to be useful to their cause or not, but my personal impression has always been that they aren't. Even though I agree with his core thesis, I think posts like this plausibly make it harder for someone like me to have conversations with AI people about safety.

The post leans hard on the idea that security mindset is something innate you either have or don't have, which, as I complained previously, is not well-supported. Plenty of sneering towards people who are assumed to lack it is unhelpful.
The post also leans hard on the password requirements example which I critiqued previously here. This feels like an example of the Copenhagen Interpretation of Ethics. Some companies take a basic step to help old people choose better passwords, and they get sneered at... meanwhile the many companies that do nothing to help users choose better passwords get off scot-free. Here, Coral reminds me of a militant vegan who specifically targets companies that engage in humane slaughter practices.
The analogy itself is weak. Paypal is an example of a company that became successful largely due to its ability to combat fraud, despite having no idea that fraud is something it would have to deal with in the beginning. (You can read the book Founders at Work to learn more about this--Paypal's chapter is the very first one. Question for Eliezer: After reading the chapter, can you tell me whether you think Max Levchin is someone who has security mindset? Again, I would argue that the evidence for Eliezer's view of security mindset is very shaky. If I had to bet on either (a) a developer with "ordinary paranoia", using tools they are very familiar with, who is given security as a design goal, vs (b) a developer with "security mindset", using tools they aren't familiar with, who isn't given security as a design goal, I'd bet on (a). More broadly, Eliezer's use of "security mindset" looks kinda like an attempt to sneak in the connotation that anyone who doesn't realize security was a design goal is incapable of writing secure software. Note: It's often not rational for software companies to have security as a design goal.) And Paul Graham writes (granted, this was 2005, so it's possible his opinion has changed in the intervening 12 years):

...we advise groups to ignore issues like scalability, internationalization, and heavy-duty security at first. [1] I can imagine an advocate of "best practices" saying these ought to be considered from the start. And he'd be right, except that they interfere with the primary function of software in a startup: to be a vehicle for experimenting with its own design. Having to retrofit internationalization or scalability is a pain, certainly. The only bigger pain is not needing to, because your initial version was too big and rigid to evolve into something users wanted.

To translate Graham's statement back to the FAI problem: In Eliezer's alignment talk, he discusses the value of solving a relaxed constraint version of the FAI problem by granting oneself unlimited computing power. Well, in the same way, the AGI problem can be seen as a relaxed constraint version of the FAI problem. One could argue that it's a waste of time to try to make a secure version of AGI Approach X if we don't even know if it's possible to build an AGI using AGI Approach X. (I don't agree with this view, but I don't think it's entirely unreasonable.)
As far as I can tell, Coral's OpenBSD analogy is flat out wrong. Coral doesn't appear to be familiar with the concept of a "trusted computing base" (see these lecture notes that I linked from the comments of his previous post). The most exciting project I know of in the OS security space is Qubes, which, in a certain sense, is doing exactly what Coral says can't be done. This is a decent blog post on the philosophy behind Qubes.
Again, translating this info back to the FAI problem: Andrew Wiles once said:

Perhaps I could best describe my experience of doing mathematics in terms of entering a dark mansion. One goes into the first room, and it's dark, completely dark. One stumbles around bumping into the furniture, and gradually, you learn where each piece of furniture is, and finally, after six months or so, you find the light switch. You turn it on, and suddenly, it's all illuminated. You can see exactly where you were.

My interpretation of this statement is that the key to solving difficult problems is to find a way to frame them that makes them seem easy. As someone who works for an AI safety nonprofit, Eliezer has a financial interest in making AI safety seem as difficult as possible. Unfortunately, while that is probably a great strategy for gathering donations, it might not be a good strategy for actually solving the problem. The Qubes project is interesting because someone thought of a way to reframe the OS security problem to make it much more tractable. (Instead of needing a 100% correct OS, now to a first approximation you need a 100% correct hypervisor.) I'm not necessarily saying MIRI is overfunded. But I think MIRI researchers, when they are trying to actually make progress on FAI, need to recognize and push back against institutional incentives to frame FAI in a way that makes it seem as hard as possible to solve. Eliezer's admonition to not "try to solve the whole problem at once" seems like the wrong thing to say, from my perspective.

Frankly, insofar as security mindset is a thing, this post comes across to me as being written by someone who doesn't have it. I don't get the sense that the author has even "ordinary paranoia" about the possibility that a post like this could be harmful, despite the fact Nick Bostrom's non-confrontational advocacy approach seems to have done a lot more to expand the AI safety Overton window, and despite the possibility that a post like this might increase already-existing politicization of AI safety. (I'm not even sure Eliezer has a solid story for how this post could end up being helpful!)

Similarly, when I think of the people I would trust the most to write secure software, Coral's 1,000,000:1 odds estimate does not seem like the kind of thing they would say--I'd sooner trust someone who is much more self-skeptical and spends a lot more time accounting for model uncertainty. (Model uncertainty and self-skepticism are both big parts of security mindset!) Does Coral think she can make a million predictions like this and be incorrect on only one of them? How much time did Coral spend looking in the dark for stories like Max Levchin's which don't fit their models?

(An even more straightforward argument for why Eliezer lacks security mindset: Eliezer says security mindset is an innate characteristic, which means it's present from birth. SIAI was founded as an organization to develop seed AI without regard for safety. The fact that Eliezer didn't instantly realize the importance of friendliness when presented with the notion of seed AI means he lacks security mindset, and since security mindset is present from birth, he doesn't have it now either.)

The fact that I consider myself a non-expert on both software security and AI, yet I'm able to come up with these obvious-seeming counterarguments, does not bode well. (Note: I keep harping on my non-expert status because I'm afraid someone will take my comments as literal truth, but I'm acutely aware that I'm just sharing facts I remember reading and ideas I've randomly had. I want people to feel comfortable giving me pushback if they have better information, the way people don't seem to do with Eliezer. In fact, this is the main reason why I write so many comments and so few toplevel posts--people seem more willing to accept toplevel posts as settled fact, even if the research behind them is actually pretty poor. How's that for security mindset? :P)

As a side note, I read & agreed with Eliezer's Inadequate Equilibria book. In fact, I'm planning to buy it as a Christmas present for a friend. So if there's a deeper disagreement here, that's not it. A quick shot at identifying the deeper disagreement: From my perspective, Eliezer frequently seems to fall prey to the "What You See Is All There Is" phenomenon that Daniel Kahneman describes in Thinking Fast and Slow. For example, in this dialogue, the protagonist says that for a secure operating system, "Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!" But this doesn't appear to actually be true (see Qubes). Just because Eliezer is unable to see a way to do it, doesn't mean it can't be done.

The post leans hard on the idea that security mindset is something innate you either have or don't have, which, as I complained previously, is not well-supported.

Agreed Eliezer hasn't given much evidence for the claim that security mindset is often untrainable; it's good that you're flagging this explicitly. I think his goal was to promote the "not just anyone can be trained to think like Bruce Schneier" hypothesis to readers' attention, and to say what his own current model is, not to defend the model in any detail.

I found the change in focus useful myself because Inadequate Equilibria talks so little about innate competence and individual skill gaps, even though it's clearly one of the central puzzle pieces.

This feels like an example of the Copenhagen Interpretation of Ethics. Some companies take a basic step to help old people choose better passwords, and they get sneered at... meanwhile the many companies that do nothing to help users choose better passwords get off scot-free.

It might not be fair, but I think that's fine here. The important question is whether the world is adequate in certain respects (e.g., at converting resources into some level of security and privacy for the average user), and what that implies in domains like AI that we care about. I don't expect companies with inadequate password systems to suffer any great harm from a blog post criticizing them without spending an equal number of paragraphs criticizing organizations with even worse security practices. The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it's not clear to me whether we disagree about that, since it might be that you're just focusing on a lower adequacy threshold.

The Qubes project is interesting because someone thought of a way to reframe the OS security problem to make it much more tractable. (Instead of needing a 100% correct OS, now to a first approximation you need a 100% correct hypervisor.)

I don't know much about Qubes, but the idea of modularizing the problem, distinguishing trusted and untrusted system components, minimizing reliance on less trusted components, and looking for work-arounds to make things as easy as possible (without assuming they're easy), sounds like ordinary MIRI research practice. Eliezer's idea of corrigibility is an example of this approach, and Eliezer's said that if alignment turns out to be surprisingly easy, one of the likeliest paths is if there turns out to be a good-enough concept of corrigibility that's easy to train into systems.

Eliezer's admonition to not "try to solve the whole problem at once" seems like the wrong thing to say, from my perspective.

Ambitious efforts to take a huge chunk out of the problem, or to find some hacky or elegant way to route around a central difficulty, seem good to me. I haven't seen people make much progress if they "try to solve the whole problem at once" with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like "well we'll shut it off if it starts doing scary stuff" or "well we'll just raise it like a human child".

I don't get the sense that the author has even "ordinary paranoia" about the possibility that a post like this could be harmful

It sounds like you're making a prediction here that Eliezer and others didn't put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.

The fact that I consider myself a non-expert on both software security and AI, yet I'm able to come up with these obvious-seeming counterarguments, does not bode well.

I think you're overestimating how much overlap there is between what different people tend to think are the most obvious counterarguments to this or that AGI alignment argument. This is actually a hard problem. If Alice thinks counter-arguments A and B are obvious, Bob thinks counter-arguments B and C are obvious, and Carol thinks counter-arguments C and D are obvious, and you only have time to cover two counter-arguments before the post gets overly long, then no matter which arguments you choose you'll end up with most readers thinking that you've neglected one or more "obvious" counter-arguments.
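To make the Alice/Bob/Carol point concrete, here is a minimal sketch that enumerates every way of covering exactly two counter-arguments and checks how many readers end up with all of their "obvious" objections addressed. The names `obvious` and `satisfied_readers` are made up for illustration, and the three readers stand in for a much messier real audience.

```python
from itertools import combinations

# Hypothetical readers and the counter-arguments each one finds obvious,
# taken from the Alice/Bob/Carol example above.
obvious = {
    "Alice": {"A", "B"},
    "Bob": {"B", "C"},
    "Carol": {"C", "D"},
}
all_arguments = sorted(set().union(*obvious.values()))


def satisfied_readers(covered):
    """Readers whose 'obvious' counter-arguments are all addressed."""
    return [name for name, needs in obvious.items() if needs <= set(covered)]


# Try every way of covering exactly two counter-arguments.
results = {pair: satisfied_readers(pair) for pair in combinations(all_arguments, 2)}
best = max(len(readers) for readers in results.values())

for pair, readers in results.items():
    print(pair, readers)
print("most readers satisfied by any pair:", best, "of", len(obvious))
```

Under these assumptions no pair fully satisfies more than one of the three readers, so a majority always sees a neglected "obvious" objection.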

At the same time, if you try to address as many counter-arguments as you can given length constraints, you'll inevitably end up with most readers feeling baffled at why you're wasting time on trivial or straw counter-arguments that they don't care about.

This is also made more difficult if you have to reply to all the counter-arguments Alice disagrees with but thinks someone else might agree with: Alice might be wrong about who is (or should be) in the target audience, she might be wrong about the beliefs of this or that potential target audience, or she might just have an impractically long list of counter-arguments to cover (due to not restricting herself to what she thinks is true or even all that probable). I think that group discussions often end up going in unproductive directions when hypothetical disagreements take the place of actual disagreements.

Thanks for the response!

The most material question is whether password standards are in fact trivial to improve on in a way that makes users and companies much better off; it's not clear to me whether we disagree about that, since it might be that you're just focusing on a lower adequacy threshold.

If there's a trivial way to measure password strength, the method has not occurred to me. Suppose my password generation algorithm randomly samples my password from the set of all alphanumeric strings that are between 6 and 20 characters long. That's 715971350555965203672729120482208448 possible passwords I'm choosing from. Sounds pretty secure right? Well, two of those alphanumeric strings between 6 and 20 characters are "aaaaaaaaaa" and "password123". A server that just sees "aaaaaaaaaa" as my password has no way a priori to know what algorithm I used to generate it.
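As a rough check on that arithmetic, here is a minimal sketch that counts the strings directly. It assumes "alphanumeric" means the 62 characters a-z, A-Z and 0-9, and the function name `count_passwords` is made up for illustration.

```python
import math
import string

# Assumption: "alphanumeric" means a-z, A-Z, 0-9 (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def count_passwords(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of strings over the alphabet with length in [min_len, max_len]."""
    return sum(alphabet_size ** k for k in range(min_len, max_len + 1))


total = count_passwords(len(ALPHABET), 6, 20)
bits = math.log2(total)  # entropy of a uniform draw from the whole set

print(total)
print(f"about {bits:.1f} bits if sampled uniformly")

# Both weak-looking strings are members of the same set, so membership
# alone tells the server nothing about how a password was generated.
for candidate in ("aaaaaaaaaa", "password123"):
    in_set = 6 <= len(candidate) <= 20 and all(c in ALPHABET for c in candidate)
    print(candidate, in_set)
```

The entropy figure only describes a password that really was drawn uniformly from this set. The server sees a single string, not the sampling process, which is the whole difficulty.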

I don't expect it is worth the time of your average company to write a specialized module that attempts to reverse-engineer a user's password in order to determine the algorithm that was likely used to generate it. I expect most companies who attempt to measure password strength this way do so using 3rd party libraries, not algorithms that have been developed in-house. The difficulty of doing this depends on whether there's a good 3rd party library available for the platform the company is using, and how quickly an engineer can verify that the library isn't doing anything suspicious with the passwords it analyzes. This article has more info about the difficulty of measuring password strength--it looks like most 3rd party libraries aren't very good at it.

But anyway, as I said, it typically isn't rational for software companies to invest a lot in software security. If we are trying to approximate a function that takes a company's financial interest in security as an input, and outputs the degree to which a company's systems are secure, then the password example gives us a data point where the company's financial interest is low and the security of their system is also low. Coral argues (correctly IMO) that Merchant Drones Inc. has a strong financial incentive to prevent people from swindling their drones. Extrapolating from the password guessing example the way she does makes the assumption that function mapping financial interest to security is a constant function. I don't think that's a reasonable assumption.

I haven't seen people make much progress if they "try to solve the whole problem at once" with a few minutes/hours of experience thinking through the problem rather than a few months/years; usually that looks less like corrigibility or ALBA and more like "well we'll shut it off if it starts doing scary stuff" or "well we'll just raise it like a human child".

The reason I'm complaining about this is that I sometimes try to have conversations with people in the community about ideas I have related to AI alignment, typically ideas that I can get across the gist of in <5 minutes but that aren't nearly as naive as "we'll shut it off if it starts doing scary stuff" or "we'll raise it like a human child", but I have a hard time getting people to engage seriously. My diagnosis is that people in the community have some kind of learned helplessness around AI safety, believing it to be a big serious problem that only big serious people are allowed to think about. Trying to make progress on AI alignment with any idea that can be explained in <5 minutes marks one as uncool and naive. Even worse, in some cases I think people get defensive about the idea that it might be possible to make progress on AI alignment in a 10-minute conversation--the community has a lot invested in the idea that AI alignment is a super difficult problem, and we'd all look like fools if it was possible to make meaningful progress in 10 minutes. I'm reminded of this quote from Richard Hamming:

...if you do some good work you will find yourself on all kinds of committees and unable to do any more work. You may find yourself as I saw Brattain when he got a Nobel Prize. The day the prize was announced we all assembled in Arnold Auditorium; all three winners got up and made speeches. The third one, Brattain, practically with tears in his eyes, said, "I know about this Nobel-Prize effect and I am not going to let it affect me; I am going to remain good old Walter Brattain." Well I said to myself, "That is nice." But in a few weeks I saw it was affecting him. Now he could only work on great problems.

When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.

See also--if Richard Feynman was working on AI alignment, it sounds to me as though he'd see the naive suggestions of noobs as a source of occasional ideas rather than something to be ridiculed.

It sounds like you're making a prediction here that Eliezer and others didn't put much thought in advance into the potential risks or costs of this post. Having talked with Eliezer and others about the post beforehand, I can confirm that this prediction is false.

That's good to hear.

The best way to deal with possible counterarguments is to rethink your arguments so they're no longer vulnerable to them. (Example: Eliezer could have had Coral say something like "Why don't companies just use this free and widely praised password strength measurement library on Github?" or "Why is there no good open source library to measure password strength?" instead of "Why don't companies just measure entropy?" Random note: Insofar as the numbers and symbols thing is not just security theater, I'd guess it mainly makes it harder for friends/relatives to correctly guess that you used your dog's name as your password, in order to decrease the volume of password-related support requests.) I'll confess, when someone writes an article that seems to me like it's written in an insufferably smug tone, yet I don't get the sense that they've considered counterarguments that seem strong and obvious to me, that really rubs me the wrong way.

(Upvoted but disagree with the conclusions and about a quarter of the assumptions. I should really get around to designing functionality that allows users to more easily make it clear that they both want to reward someone for writing something, and that they still disagree with it.)

I can't work out where you're going with the Qubes thing. Obviously a secure hypervisor wouldn't imply a secure system, any more than a secure kernel implies a secure system in a non-hypervisor based system.

More deeply, you seem to imply that someone who has made a security error obviously lacks the security mindset. If only the mindset protected us from all errors; sadly it's not so. But I've often been in the situation of trying to explain something security-related to a smart person, and sensing the gap that seemed wider than a mere lack of knowledge.

The point I'm trying to make is that this statement

Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!

seems false to me, if you have good isolation--which is what a project like Qubes tries to accomplish. Kernel vs hypervisor is discussed in this blog post. It's possible I'm describing Qubes incorrectly; I'm not a systems expert. But I feel pretty confident in the broader point about trusted computing bases.

More deeply, you seem to imply that someone who has made a security error obviously lacks the security mindset. If only the mindset protected us from all errors; sadly it's not so. But I've often been in the situation of trying to explain something security-related to a smart person, and sensing the gap that seemed wider than a mere lack of knowledge.

This was the implication I was getting from Eliezer. I attempted a reductio ad absurdum.

Everything exposed to an attacker, and everything those subsystems interact with, and everything those parts interact with! You have to build all of it robustly!

seems false to me, if you have good isolation--which is what a project like Qubes tries to accomplish.

I agree with you here that Qubes is cool; but the fact that it is (performantly) possible was not obvious before it was cooked up. I certainly failed to come up with the idea of Qubes before hearing it (even after bluepill), and I am not ashamed of this: Qubes is brilliant (and IOMMU is cheating).

Also, in some sense Qubes is doing exactly what Coral says. Qubes only has a chance of working because the fundamental design for hardware-assisted security-by-isolation trumps all other considerations in their trade-offs. The UI is fundamentally constrained (to prevent window-redressing), as is performance (3D acceleration) and ease-of-use. All these constraints were known and documented before even a single line of code was written (afaik). Qubes can only work because it has security as one of its main goals, and has brilliant security people as project leads with infinite internal political capital.

That said, going on a tangent about Qubes:

I really want to see painless live-migration of Qubes (migrate an application between different hosts, without interrupting -- say, from a lousy netbook to a fat workstation and back); this would be a killer feature for non-security-nerds. Unfortunately Xen cannot do x86 <-> ARM (qemu?); live-migration smartphone<->workstation would be awesome (just bring my smartphone, plug it in as a boot-drive and continue your work on a fat machine -- secure as long as there is no hardware implant).

Re Qubes security: You still have the bad problem of timing side channels which cross VM borders; you should view Qubes as an awesome mitigation, not a solution (not to speak of the not-so-rare Xen breakouts), and you still need to secure your software. That is, Qubes attempts to prevent privilege escalation, not code exec; if the vulnerability is in the application which handles your valuable data, then Qubes cannot help you.

Also, in some sense Qubes is doing exactly what Coral says. Qubes only has a chance of working because the fundamental design for hardware-assisted security-by-isolation trumps all other considerations in their trade-offs. The UI is fundamentally constrained (to prevent window-redressing), as is performance (3D acceleration) and ease-of-use. All these constraints were known and documented before even a single line of code was written (afaik). Qubes can only work because it has security as one of its main goals, and has brilliant security people as project leads with infinite internal political capital.

It sounds like you're saying Qubes is a good illustration of Coral's claim that really secure software needs security as a design goal from the beginning and security DNA in the project leadership. I agree with that claim.

Yep. The counter-example would be Apple iOS.

I never expected it to become as secure as it did. And Apple security are clowns (institutionally, no offense intended for the good people working there), and UI tends to beat security in tradeoffs.

To translate Graham's statement back to the FAI problem: In Eliezer's alignment talk, he discusses the value of solving a relaxed constraint version of the FAI problem by granting oneself unlimited computing power. Well, in the same way, the AGI problem can be seen as a relaxed constraint version of the FAI problem. One could argue that it's a waste of time to try to make a secure version of AGI Approach X if we don't even know if it's possible to build an AGI using AGI Approach X. (I don't agree with this view, but I don't think it's entirely unreasonable.)

Isn't the point exactly that if you can't solve the whole problem of (AGI + Alignment) then it would be better not even to try solving the relaxed problem (AGI)?

Maybe not, if you can keep your solution to AGI secret and suppress it if it turns out that there's no way to solve the alignment problem in your framework.

Not sure of the message of this post re: AI alignment, but I'm interested in seeing the next installment, hope it brings everything full circle.

As far as I know there is not a next installment planned, though I might be wrong.

I am surprised that you feel any uncertainty around the message of this piece regarding AI alignment. It seems clear that everything said is directly applicable if you replace “Drone Economy Company” with “AGI research lab”.

Brain fart. On second thought, this is not the first time I'm failing to get EY's message, so it could be a lack of cognitive ability on my part.

Coral should try to work as a white-hat hacker for Mr. Topaz's company. Mr. Topaz would agree, because Coral would say that if she doesn't succeed she takes no money, so he loses nothing. After a few rounds of Coral hacking all the drone software within an hour of each new version being presented, Mr. Topaz would understand that security is important.

Mr. Topaz is right (even if for the wrong reasons). If he is optimizing for money, being the first to market may indeed get him a higher probability of becoming rich. Wanting to be rich is not wrong. Sure, the service will be insecure, but if he sells before any significant issues pop up he may sell for billions. “Après moi, le déluge.”

Unfortunately, the actions of Mr. Topaz are also the very optimization process that leads to misaligned AGI, already happening at this very moment. And indeed, trying to fix Mr. Topaz is ordinary paranoia. Security mindset would be closer to getting rid of The Market that so perversely incentivized him; I don’t like the idea, but if it’s between that and a paperclip maximizer, I know what I would choose.

I'm happy to see a demonstration that Eliezer has a good understanding of the top-level issues involving computer security.

One thing I wonder though, is why making Internet security better across the board isn't a more important goal in the rationality community? Although very difficult (for reasons illustrated here), it seems immediately useful and also a good prerequisite for any sort of AI security. If we can't secure the Internet against nation-state level attacks, what hope is there against an AI that falls into the wrong hands?

In particular, building "friendly AI" and assuming it will remain friendly seems naive at best, since it will be copied and then the friendly part will be modified by hostile actors.

It seems like someone with a security mindset will want to avoid making any assumption of friendliness and instead work on making critical systems that are simple enough to be mathematically proven secure. I wonder why this quote (from the previous post) isn't treated as a serious plan: "If your system literally has no misbehavior modes at all, it doesn't matter if you have IQ 140 and the enemy has IQ 160—it's not an arm-wrestling contest."

We are far from being able to build these systems but it still seems like a more plausible research project than ensuring that nobody in the world makes unfriendly AI.

In particular, building "friendly AI" and assuming it will remain friendly seems naive at best, since it will be copied and then the friendly part will be modified by hostile actors.

This is correct. Any reasonable AGI development strategy must have strong closure and security measures in place to minimize the risk of leaks, and deployment has to meet the conditions in Dewey (2014).

It seems like someone with a security mindset will want to avoid making any assumption of friendliness and instead work on making critical systems that are simple enough to be mathematically proven secure.

This is also correct, if the idea is to ensure that developers understand their system (and safety-critical subsystems in particular) well enough for the end product to be "friendly"/"aligned." If you're instead saying that alignment isn't a good target to shoot for, then I'm not sure I understand what you're saying. Why not? How do we achieve good long-term outcomes without routing through alignment?

I think odds are good that, assuming general AI happens at all, someone will build a hostile AI and connect it to the Internet. I think a proper understanding of the security mindset is that the assumption "nobody will connect a hostile AI to the Internet" is something we should stop relying on. (In particular, maintaining secrecy and international cooperation seems unlikely. We shouldn't assume they will work.)

We should be looking for defenses that aren't dependent on the IQ level of the attacker, similar to how mathematical proofs are independent of IQ. AI alignment is an important research problem, but doesn't seem directly relevant for this.

In particular, I don't see why you think "routing through alignment" is important for making sound mathematical proofs. Narrow AI should be sufficient for making advances in mathematics.

I think odds are good that, assuming general AI happens at all, someone will build a hostile AI and connect it to the Internet. I think a proper understanding of the security mindset is that the assumption "nobody will connect a hostile AI to the Internet" is something we should stop relying on. (In particular, maintaining secrecy and international cooperation seems unlikely. We shouldn't assume they will work.)

Yup, all of this seems like the standard MIRI/Eliezer view.

In particular, I don't see why you think "routing through alignment" is important for making sound mathematical proofs. Narrow AI should be sufficient for making advances in mathematics.

I don't know what the relevance of "mathematical proofs" is. Are you talking about applying formal methods of some kind to the problem of ensuring that AGI technology doesn't leak, and saying that AGI is unnecessary for this task? I'm guessing that part of the story you're missing is that proliferation of AGI technology is at least as much about independent discovery as it is about leaks, splintering, or espionage. You have to address those issues, but the overall task of achieving nonproliferation is much larger than that, and it doesn't do a lot of good to solve part of the problem without solving the whole problem. AGI is potentially a route to solving the whole problem, not to solving the (relatively easy, though still very important) leaks/espionage problem.

I mean things like using mathematical proofs to ensure that Internet-exposed services have no bugs that a hostile agent might exploit. We don't need to be able to build an AI to improve defences.

There's no "friendly part" in an AGI in the same way there's no "secure part" in an OS. The kind of friendliness and security we want is deep in the architecture.

Most hostile actors also don't want an AGI that kills them. They might still do nasty things with the AGI by giving it bad goals, but that's not the core of what the AGI argument is about.

As far as removing misbehavior modes, that's done in security circles. Critical computers get airgapped to prevent them from getting hacked.

In the quest for more security, Mozilla built a new programming language that prevents a class of errors that C allows.

See also Paul Christiano's "Security and AI Alignment," which Eliezer agrees with if I recall correctly.

Even if there's no "friendly part," it seems unlikely that someone who learns the basic principles behind building a friendly AI will be unable to build an unfriendly AI by accident. I'm happy that we're making progress with safe languages, but there is no practical programming language in which it's the least bit difficult to write a bad program.

It would make more sense to assume that at some point, a hostile AI will get an Internet connection, and figure out what needs to be done about that.