All of owencb's Comments + Replies

I kind of want you to get quantitative here? Like pretty much every action we take has some effect on AI timelines, but I think effect-on-AI-timelines is often swamped by other considerations (like effects on attitudes around those who will be developing AI).

Of course it's prima facie more plausible that the most important effect of AI research is the effect on timelines, but I'm actually still kind of sceptical. On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it... (read more)

Jakub Kraus (3mo)
Don't we already understand the basic shape of things that will get to AGI? Seems plausible that AGI will consist of massive transformer models within some RLHF approach. But maybe you're looking for something more specific than that.
Amusingly, I expect that each doubling of that time is negative EV. Because that time is very likely negative.
I don't understand why you think the sort of capabilities research done by alignment-conscious people contributes to lengthening this time. In particular, what reason do you have to think they're not advancing the second time point as much as the first? Could you spell that out more explicitly?

I'd be very interested to read more about the assumptions of your model, if there's a write-up somewhere.

Fair question. I just did the lazy move of looking up world GDP figures. In fact I don't think that my observers would measure GDP the same way we do. But it would be a measurement of some kind of fundamental sense of "capacity for output (of various important types)". And I'm not sure whether that has been growing faster or slower than real GDP, so the GDP figures seem a not-terrible proxy.

M. Y. Zuo (7mo)
"Important" seems to be doing most of the work here, since even within any given society there is no broad agreement as to what falls into this category. It's better to use a more widely agreed-upon measure, such as total energy production/consumption, as the Kardashev scale does.

I'd be interested to dig into this claim more. What exactly is the claim, and what is the justification for it? If the claim is something like "For most tasks, the thinking machines seem to need 0 to 3 orders of magnitude more experience on the task before they equal human performance" then I tentatively agree. But if it's instead 6 to 9 OOMs, or even just a solid 3 OOMs, I'd say "citation needed!"

No precise claim, I'm afraid! The whole post was written from a place of "OK but what are my independent impressions on this stuff?", and then setting down the t... (read more)

It's a lightly fictionalized account of my independent impressions of AI trajectories.

Interesting, I think there's some kind of analogy (or maybe generalization) here, but I don't fully see it.

I at least don't think it's a direct reinvention, because slack (as I understand it) is a thing that agents have, rather than something which determines what's good or bad about a particular decision.

(I do think I'm open to legit accusations of reinvention, but it's more like reinventing alignment issues.)

I'm relatively a fan of their approach (although I haven't spent an enormous amount of time thinking about it). I like starting with problems which are concrete enough to really go at but which are microcosms for things we might eventually want.

I actually kind of think of truthfulness as sitting somewhere on the spectrum between the problem Redwood are working on right now and alignment. Many of the reasons I like truthfulness as a medium-term problem to work on are similar to the reasons I like Redwood's current work.

I think it would be an easier challenge to align 100 small ones (since solutions would quite possibly transfer across).

I think it would be a bigger victory to align the one big one.

I'm not sure from the wording of your question whether I'm supposed to assume success.

To add to what Owain said:

  • I think you're pointing to a real and harmful possible dynamic
  • However I'm generally a bit sceptical of arguments of the form "we shouldn't try to fix problem X because then people will get complacent"
    • I think that the burden of proof lies squarely with the "don't fix problem X" side, and that usually it's good to fix the problem and then also give attention to the secondary problem that's come up
  • I note that I don't think of politicians and CEOs as the primary audience of our paper
    • Rather I think in the next several years such peo
... (read more)
Daniel Kokotajlo (1y)
This is very helpful, thanks! I now have a better understanding of what you are doing and basically endorse it. (FWIW, this is what I thought/hoped you were doing.)

I don't think I'm yet at "here's regulation that I'd just like to see", but I think it's really valuable to try to have discussions about what kind of regulation would be good or bad. At some point there will likely be regulation in this space, and it would be great if that was based on as deep an understanding as possible about possible regulatory levers, and their direct and indirect effects, and ultimate desirability.

I do think it's pretty plausible that regulation about AI and truthfulness could end up being quite positive. But I don't know enough to i... (read more)

Adding to this: AI is already being regulated. In the EU, you could argue that previous regulations (like GDPR) already had some impacts on AI, but regardless, the EU is now working on an AI Act that will unambiguously regulate AI broadly. The current proposal [] (also see some discussion on the EA forum []) contains some lines that are related to, and could set some precedents for, truthfulness-related topics.

There's not yet any concrete regulation that I know I'd be excited about pushing (truthfulness-related or otherwise). But I would expect further work to yield decent guesses about what kind of regulation is likely to be better/worse; and I'd be surprised if the answer was to ignore the space or oppose all regulation.

(Although I should note: even if there will doubtlessly be some regulation of AI in general, that doesn't mean that there'll be regulation of all potentially-important subareas of AI. And insofar as there's currently little attention on regulation of particular sub-areas (including e.g. regulation that mentions alignment, or regulation of narrowly construed AI truthfulness), the situation with regard to pushing for regulation in those areas might be more similar to the general AI/regulation situation from 5 years ago.)
I think there's also a capability component, distinct from "understanding/modeling the world", about self-alignment or self-control - the ability to speak or act in accordance with good judgement, even when that conflicts with short-term drives.

In my ontology I guess this is about the heuristics which are actually invoked to decide what to do given a clash between abstract understanding of what would be good and short-term drives (i.e. it's part of meta-level judgement). But I agree that there's something helpful about having term... (read more)

Do you consider "good decision-making" and "good judgement" to be identical? I think there's a value alignment component to good judgement that's not as strongly implied by good decision-making.

I agree that there's a useful distinction to be made here. I don't think of it as fitting into "judgement" vs "decision-making" (and would regard those as pretty much the same), but rather about how "good" is interpreted/assessed. I was mostly using good to mean something like "globally goo... (read more)

I think the double decrease effect kicks in with uncertainty, but not with confident expectation of a smaller network.

I think it does do the double decrease for the known smaller network. Take three agents A1, A2, and A3, with utilities u1, u2, and u3. Assume the indices i, j, and k are always distinct. For each Ai, they can boost uj at the cost described above in terms of ui. What I haven't really specified is the three-way synergy - can Ai boost uj+uk more efficiently than simply boosting uj and uk independently? In general yes (the two utilities uj and uk are synergistic with each other, after all), but let's first assume there is zero three-way synergy.

Then each agent Ai will sacrifice 1/2+1/2=1 in ui to boost uj and uk each by 1. Overall, each utility function goes up by 1+1−1=1. This scales linearly with the size of the trade network each agent sees (excluding themselves): if there were two agents total, each utility would go up by 1/2, as in the top post example. And if there were n+1 agents, each utility would go up by n/2.

However, if there are any three-way, four-way, ..., or n-way synergies, then the trade network is more efficient than that. So there is a double decrease (or double increase, from the other perspective), as long as there are higher-order synergies between the utilities.
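The zero-synergy arithmetic above can be sketched numerically. This is only a toy illustration of the accounting in the comment (the 1/2 exchange rate and the pairwise-boost setup are taken from it; the function name is made up):

```python
def net_gain_per_agent(num_agents: int, cost_per_boost: float = 0.5) -> float:
    """Net utility change for one agent in a trade network with zero
    higher-order synergies.

    Each agent boosts every other agent's utility by 1, paying
    cost_per_boost of its own utility per boost; it also receives a
    boost of 1 from each of the other agents.
    """
    others = num_agents - 1
    gains = others * 1.0             # boosts received from everyone else
    costs = others * cost_per_boost  # sacrifices made to boost everyone else
    return gains - costs

# Two agents total: each utility goes up by 1/2 (the top post example).
print(net_gain_per_agent(2))   # 0.5
# Three agents: each utility goes up by 1+1-1 = 1.
print(net_gain_per_agent(3))   # 1.0
# n+1 agents: each utility goes up by n/2.
print(net_gain_per_agent(11))  # 5.0
```

Any higher-order synergies would only raise these numbers, which is the sense in which the double decrease/increase reappears.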

I'm not sure I've fully followed, but I'm suspicious that you seem to be getting something for nothing in your shift from a type of uncertainty that we don't know how to handle to a type we do.

It seems to me like you must be making an implicit assumption somewhere. My guess is that this is where you used to pair with . If you'd instead chosen as the matching then you'd have uncertainty between whether should be or . My guess is that generically this gives different recommendations from your approach.

Nope! That gives the same recommendation (as does the same thing if you pre-compose with any other permutation of S). I thought about putting that fact in, but it took up space. The recommendation given in both cases is just to normalise each utility function individually, using any of the methods that we know (which will always produce equivalent utility classes in this situation).

Seems to me like there are a bunch of challenges. For example you need extra structure on your space to add things or tell what's small; and you really want to keep track of long-term impact not just at the next time-step. Particularly the long-term one seems thorny (for low-impact in general, not just for this).

Nevertheless I think this idea looks promising enough to explore further, would also like to hear David's reasons.

For #5, OK, there's something to this. But:

  • It's somewhat plausible that stabilising pivotal acts will be available before world-destroying ones;
  • Actually there's been a supposition smuggled in already with "the first AI systems capable of performing pivotal acts". Perhaps there will at no point be a system capable of a pivotal act. I'm not quite sure whether it's appropriate to talk about the collection of systems that exist being together capable of pivotal acts if they will not act in concert. Perhaps we'll have a collection of systems which if aligned
... (read more)
I agree that things get messier when there is a collection of AI systems rather than a single one. "Pivotal acts" mostly make sense in the context of local takeoff. In nonlocal takeoff, one of the main concerns is that goal-directed agents not aligned with human values are going to find a way to cooperate with each other.

Thanks for the write-up, this is helpful for me (Owen).

My initial takes on the five steps of the argument as presented, in approximately decreasing order of how much I am on board:

  • Number 3 is a logical entailment, no quarrel here
  • Number 5 is framed as "therefore", but adds the assumption that this will lead to catastrophe. I think this is quite likely if the systems in question are extremely powerful, but less likely if they are of modest power.
  • Number 4 splits my intuitions. I begin with some intuition that selection pressure would significantly constra
... (read more)
  • For #5, it seems like "capable of pivotal acts" is doing the work of implying that the systems are extremely powerful.
  • For #4, I think that selection pressure does not constrain the goal much, since different terminal goals produce similar convergent instrumental goals. I'm still uncertain about this, though; it seems at least plausible (though not likely) that an agent's goals are going to be aligned with a given task if e.g. their reproductive success is directly tied to performance on the task.
  • Agree on #2; I can kind of see it both ways too.
  • I'm also somewhat skeptical of #1. I usually think of it in terms of "how much of a competitive edge does general consequentialist reasoning give an AI project" and "how much of a competitive edge will safe AI projects have over unsafe ones, e.g. due to having more resources".

This conclusion is way too strong. To just give one way: there's a big space of possibilities where discovering the planning fallacy in fact makes you less susceptible to the planning fallacy, but not immune.

Actually, if CFAR could reliably reduce susceptibility to the planning fallacy, they are wasting their time with AI safety--they could be making a fortune teaching their methods to the software industry, or engineers in general.

I don't know who the intended audience for this is, but I think it's worth flagging that it seemed extremely jargon-heavy to me. I expect this to be off-putting to at least some people you actually want to attract (if it were one of my first interactions with CFAR I would be less inclined to engage again). In several cases you link to explanations of the jargon. This helps, but doesn't really solve the problem that you're asking the reader to do a large amount of work.

Some examples from the first few paragraphs:

  • clear and unhidden
  • original seeing
  • original
... (read more)
I got the same feeling, and I would add "inside view" to the list.

I found this document kind of interesting, but it felt less like what I normally understand as a mission statement, and more like "Anna's thoughts on CFAR's identity". I think there's a place for the latter, but I'd be really interested in seeing (a concise version of) the former, too.

If I had to guess right now I'd expect it to say something like:

We want to develop a community with high epistemic standards and good rationality tools, at least part of which is devoted to reducing existential risk from AI.

... but I kind of expect you to think I have the emphasis there wrong in some way.

I like your (A)-(C), particularly (A). This seems important, and something that isn't always found by default in the world at large.

Because it's somewhat unusual, I think it's helpful to give strong signals that this is important to you. For example I'd feel happy about it being a core part of the CFAR identity, appearing in even short statements of organisational mission. (I also think this can help organisation insiders to take it even more seriously.)

On (i), it seems clearly a bad idea for staff to pretend they have no viewpoints. And if the organisatio... (read more)

This was helpful to me, thanks.

I think I'd still endorse a bit more of a push towards thinking in credences (where you're at a threshold of that being a reasonable thing to do), but I'll consider further.

Thanks. I'll dwell more on these. Quick thoughts from a first read:

  • I generally liked the "further discussion" doc.
  • I do think it's important to strongly signal the aspects of cause neutrality that you do intend to pursue (as well as pursuing them). These are unusual and important.
  • I found the mission statement generally opaque and extremely jargony. I think I could follow what you were saying, but in some cases this required a bit of work and in some cases I felt like it was perhaps only because I'd had conversations with you. (The FAQ at the
... (read more)

Thanks for engaging. Further thoughts:

I agree with you that framing is important; I just deleted the old ETA.

For what it's worth, I think even without saying that your aim is explicitly AI safety, a lot of people reading this post will take that away unless you do more to cancel the implicature. Even the title does this! It's a slightly odd grammatical construction which looks an awful lot like "CFAR's new focus: AI Safety"; I think without being more up-front about the alternative interpretation it will sometimes be read that way.

I'm curious where our two

... (read more)
To get a better idea of your model of what you expect the new focus to do, here's a hypothetical. Say we have a rationality-qua-rationality CFAR (CFAR-1) and an AI-Safety CFAR (CFAR-2). Each starts with the same team, works independently of the other, and they can't share work. Two years later, we ask them to write a curriculum for the other organization, to the best of their abilities. This is along the lines of having them do an Ideological Turing Test on each other.

How well do they match? In addition, is the newly written version better in any case? Is CFAR-1's CFAR-2 curriculum better than CFAR-2's CFAR-2 curriculum?

I'm treating curriculum quality as a proxy for research progress, and somewhat ignoring things like funding and operations quality. The question is only meant to address worries of research slowdowns.
Oh, sorry, the two new docs are posted and were in the new ETA: [] and []

Even the title does this! It's a slightly odd grammatical construction which looks an awful lot like "CFAR's new focus: AI Safety"; I think without being more up-front about the alternative interpretation it will sometimes be read that way.

Datapoint: it wasn't until reading your comment that I realized that the title actually doesn't read "CFAR's new focus: AI safety".

I'm not sure exactly what you meant, so not ultimately sure whether I disagree, but I at least felt uncomfortable with this claim.

I think it's because:

  • Your framing pushes towards holding beliefs rather than credences in the sense used here.
  • I think it's generally inappropriate to hold beliefs about the type of things that are important and you're likely to turn out to be wrong on. (Of course for boundedly rational agents it's acceptable to hold beliefs about some things as a time/attention-saving matter.)
  • It's normally right to update credences graduall
... (read more)
I think this clarifies an important area of disagreement: I claim that there are lots of areas where people have implicit strong beliefs, and it's important to make those explicit to double-check. Credences are important for any remaining ambiguity, but for cognitive efficiency, you should partition off as much as you can as binary beliefs first, so you can do inference on them - and change your mind when your assumptions turn out to be obviously wrong. This might not be particularly salient to you because you're already very good at this in many domains.

This is what I was trying to do with my series of blog posts on GiveWell [], for instance - partition off some parts of my beliefs as a disjunction I could be confident enough in to think about it as a set of beliefs I could reason logically about. (For instance, Good Ventures either has increasing returns to scale, or diminishing, or constant, at its given endowment.) What remains is substantial uncertainty about which branch of the disjunction we're in, and that should be parsed as a credence - but scenario analysis requires crisp scenarios, or at least crisp axes to simulate variation along.

Another way of saying this is that from many epistemic starting points it's not even worth figuring out where you are in credence-space on the uncertain parts, because examining your comparatively certain premises will lead to corrections that fundamentally alter your credence-space.
I'm all about epistemology. (my blog is at But in order to engage in or start a conversation, it's important to take one of the things you place credence in and advocate for it. If you're wishy-washy, in many circumstances, people won't actually engage with your hypothesis, so you won't learn anything about it. Take a stand, even if you're on slippery ground.

I had mixed feelings towards this post, and I've been trying to process them.

On the positive side:

  • I think AI safety is important, and that collective epistemology is important for this, so I'm happy to know that there will be some attention going to this.
  • There may be synergies to doing some of this alongside more traditional rationality work in the same org.

On the negative side:

  • I think there is an important role for pursuing rationality qua rationality, and that this will be harder to do consistently under an umbrella with AI safety as an explicit a
... (read more)
Thanks for the thoughts; I appreciate it. I agree with you that framing is important; I just deleted the old ETA. (For anyone interested, it used to read: I'm curious where our two new docs leave you; I think they make clearer that we will still be doing some rationality qua rationality.)

Will comment later re: separate organizations; I agree this is an interesting idea; my guess is that there isn't enough money and staff firepower to run a good standalone rationality organization in CFAR's stead, and also that CFAR retains quite an interest in a standalone rationality community and should therefore support it... but I'm definitely interested in thoughts on this.

Julia will be launching a small spinoff organization called Convergence, facilitating double crux conversations between EAs and EA-adjacent people in, e.g., tech and academia. It'll be under the auspices of CFAR for now but will not have opinions on AI. I'm not sure if that hits any of what you're after.

Your (a) / (b) division basically makes sense to me.[*] I think we're already at the point where we need this fracturing.

However, I don't think that the LW format makes sense for (a). I'd probably prefer curated aggregation of good content for (a), with fairly clear lines about what's in or out. It's very unclear what the threshold for keeping up on LW should be.

Also, I quite like the idea of the topical centres being hosted in the same place as the core, so that they're easy to find.

[*] A possible caveat is dealing with new community members nicely; I haven't thought about this enough so I'm just dropping a flag here.

Ben Pace (6y)
Also it makes it easy for mods to enforce the distinction. Instead of "I think this post and discussion is not suited for this place, could you delete it and take it elsewhere?" it can just be "This should actually be over in sub-forum X, so I've moved it there."

In general if we don't explicitly design institutions that will work well with a much larger community, we shouldn't be surprised if things break down when the community grows.

I think I disagree with your conclusion here, although I'd agree with something in its vicinity.

One of the strengths of a larger community is the potential to explore multiple areas in moderate amounts of depth. We want to be able to have detailed conversations on each of: e.g. good epistemic habits; implications of AI; distributions of cost-effectiveness; personal productivity; technical AI safety; ...

It asks too much for everyone to keep up with each of these conversations, particularly when each of them can spawn many detailed sub-conversations. But if ... (read more)

It seems to me that for larger communities, there should be both: (a) a central core that everyone keeps up on, regardless of subtopical interest; and (b) topical centers that build in themselves, and that those contributing to that topical center are expected to be up on, but that members of other topical centers are not necessarily up on. (So that folks contributing to a given subtopical center should be expected to be keeping up with both that subtopic, and the central canon.)

It seems to me that (a) probably should be located on LW or similar, and that, if/as the community grows, the number of posts within (a) can remain capped by some "keep up withable" number, with quality standards rising as needed.
In general if we don't explicitly design institutions that will work well with a much larger community, we shouldn't be surprised if things break down when the community grows.

Update: I now believe I was over-simplifying things. For two delegates I think it is correct, but in the parliamentary model that corresponds to giving the theories equal credence. As credences vary, so does the number of delegates. Maximising the Nash product over all delegates is equivalent to maximising a product where they have different exponents (exponents in proportion to the number of delegates).
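As a sketch of that equivalence, here's a toy bargaining problem with assumed numbers (two theories splitting a unit of resources, disagreement points taken as zero; the function name is made up): maximising a product with exponents proportional to credences allocates the resource in proportion to those credences.

```python
import math

def weighted_nash_argmax(w1: float, w2: float, steps: int = 10000) -> float:
    """Grid-search the split x in (0, 1) that maximises the weighted Nash
    product u1^w1 * u2^w2, with u1 = x and u2 = 1 - x.

    The weights play the role of delegate counts in the parliamentary model.
    """
    best_x, best_val = 0.0, float("-inf")
    for i in range(1, steps):
        x = i / steps
        val = w1 * math.log(x) + w2 * math.log(1 - x)  # log of the product
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# Equal credences (one delegate each): the ordinary Nash product, 50/50 split.
print(weighted_nash_argmax(1, 1))  # 0.5
# Three delegates vs one (credences 3/4 vs 1/4): resources split 3:1.
print(weighted_nash_argmax(3, 1))  # 0.75
```

Note that scaling both weights by the same constant leaves the argmax unchanged, so only the proportions of delegate counts matter.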

Maybe, if it had good enough UI and enough features?

I feel like it's quite a narrow target/high bar to compete with back-of-the-envelope/whiteboard at one end (for ease of use), and a software package that does Monte Carlo simulations properly at the other end.

Thanks, this is an important result showing that the dominating property really isn't enough to pick out a prior for a good agent. I like your example as a to-the-point explanation of the issue.

I think the post title is somewhat misleading, though: it sounds as though differences in instantiations of AIXI don't really matter, and they can all be arbitrarily stupid. Any chance of changing that? Perhaps to something like "Versions of AIXI can be arbitrarily stupid"?

Changed the title.

I disagree that "you really didn't gain all that much" in your example. There are possible numbers such that it's better to avoid producing AI, but (a) that may not be a lever which is available to us, and (b) AI done right would probably represent an existential eucatastrophe, greatly improving our ability to avoid or deal with future threats.

I have an intellectual issue with using "probably" before an event that has never happened before, in the history of the universe (so far as I can tell). And - if I am given the choice between slow, steady improvement in the lot of humanity (which seems to be the status quo), and a dice throw that results in either paradise, or extinction - I'll stick with slow steady, thanks, unless the odds were overwhelmingly positive. And - I suspect they are, but in the opposite direction, because there are far more ways to screw up than to succeed, and once the AI is out - you no longer have a chance to change it much. I'd prefer to wait it out, slowly refining things, until paradise is assured.

Hmm. That actually brings a thought to mind. If an unfriendly AI was far more likely than a friendly one (as I have just been suggesting) - why aren't we made of computronium? I can think of a few reasons, with no real way to decide. The scary one is "maybe we are, and this evolution thing is the unfriendly part..."

I'm not sure quite what point you're trying to make:

  • If you're arguing that with the best attempt in the world it might be we still get it wrong, I agree.
  • If you're arguing that greater diligence and better techniques won't increase our chances, I disagree.
  • If you're arguing something else, I've missed the point.
Fair question. My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis to a 25% chance - you really didn't gain all that much, and the wiser course of action is still not to make the AI. The actual percentages are wildly debatable, of course, but I would say that if you think there is any chance - no matter how small - of triggering ye olde existential crisis, you don't do it - and I do not believe that technique alone could get us anywhere close to that.

The ideas you propose in the OP seem wise, and good for society - and wholly ineffective in actually stopping us from creating an unfriendly AI. The reason is simply that the complexity defies analysis, at least by human beings. The fear is that the unfriendliness arises from unintended design consequences, from unanticipated system effects rather than bugs in code or faulty intent. It's a consequence of entropy - there are simply far, far more ways for something to get screwed up than for it to be right. So unexpected effects arising from complexity are far, far more likely to cause issues than be beneficial unless you can somehow correct for them - planning ahead will only get you so far.

Your OP suggests that we might be more successful if we got more of it right "the first time". But - things this complex are not created, finished, de novo - they are an iterative, evolutionary task. The training could well be helpful, but I suspect not for the reasons you suggested. The real trick is to design things so that when they go wrong - it still works correctly. You have to plan for and expect failure, or that inevitable failure is the end of the line.

I'm not suggesting that the problems would come from what we normally think of as software bugs (though see the suggestion in this comment). I'm suggesting that they would come from a failure to specify the right things in a complex scenario -- and that this problem bears enough similarities to software bugs that they could be a good test bed for working out how to approach such problems.

The flaws leading to an unexpectedly unfriendly AI certainly might lead back to a flaw in the design - but I think it is overly optimistic to think that the human mind (or a group of minds, or perhaps any mind) is capable of reliably creating specs that are sufficient to avoid this. We can and do spend tremendous time on this sort of thing already, and bad things still happen. You hold the shuttle up as an example of reliability done right (which it is) - but it still blew up, because not all of shuttle design is software. In the same way, the issue could arise from some environmental issue that alters the AI in such a way that it is unpredictable - power fluctuations, bit flips, who knows. The world is a horribly non-deterministic place, from a human POV.

By way of analogy - consider weather prediction. We have worked on it for all of history, we have satellites and supercomputers - and we are still only capable of accurate predictions for a few days or a week, getting less and less accurate as we go. This isn't a case of making a mistake - it is a case of a very complex end state arising from simple beginnings, and lacking the ability to make perfectly accurate predictions about some things. To put it another way - it may simply be that the problem is not computable, now or with any foreseeable technology.

I'm not sure how much we are disagreeing here. I'm not proposing anything like formal verification. I think development in simulation is likely to be an important tool in getting it right the first time you go "live", but I also think there may be other useful general techniques/tools, and that it could be worth investigating them well in advance of need.

Agreed. In particular I think IRL (Inverse Reinforcement Learning) is likely to turn out to be very important. Also, it is likely that the brain has some clever mechanisms for things like value acquisition or IRL, as well as empathy/altruism, and figuring out those mechanisms could be useful.

Thanks, this is a great collection of relevant information.

I agree with your framing of this as differential tech development. Do you have any thoughts on the best routes to push on this?

I will want to think more about framing AGI failures as (subtle) bugs. My initial impression is positive, but I have some worry that it would introduce a new set of misconceptions.

Sorry for the slow reply. I'm flattered that my thoughts as someone who has no computer science degree and just a couple years of professional programming experience are considered valuable. So here's more info-dumping (to be taken with a grain of salt, like the previous info dump, because I don't know what I don't know):

  • My comment [] on different sorts of programming, and programming cultures, and how tolerant they are of human error. Quick overview [] of widely used bug reduction techs (code review and type systems should have been included).
  • Ben Kuhn says [] that [] writing machine learning code is unusually unforgiving, which accords well with my view that data science programming is unusually unforgiving (although the reasons aren't completely the same).
  • Improving the way I managed my working memory [] seemed important to the way I reduced bugs in my code. I think by default things fall out of your working memory without you noticing, but if you allocate some of your working memory to watching your working memory, you can prevent this and solve problems in a slower but less error-prone way. The subjective sense was something like having a "meditative bulldozer" thinking style where I was absolutely certain of what I had done with each subtask before going on to the next. It's almost exactly equivalent to doing a complicated sequence of algebraic operations correctly on the first try. It seems slower at first, but it's generally faster in the long run, because fixing errors after the fact is quite slow. This sort of perfectionistic attention to detail was actually counterproductive for activities I worked on after quitting my job, like reading marketing boo

Good point that this hasn't always been the case. However, we also know that people made a lot of mistakes in some of these cases. It would be great to work out how we can best approach such challenges in the future.

To me these look like (pretty good) strategies for getting something right the first time, not in opposition to the idea that this would be needed.

They do suggest that an environment which is richer than just "submit perfect code without testing" might be a better training ground.

To clarify, I was not critiquing the idea that we need to get "superintelligence unleashed on the world" correct the first try - that of course I do agree with. I was critiquing the more specific idea that we need to get AGI morality/safety correct the first try. One could compare to ICBM missile defense systems. The US (and other nations) have developed that tech, and it's a case where you have to get the deployed product "right the first try". You can't test it in the real world, but you absolutely can do iterative development in simulation, and this really is the only sensible way to develop such tech. Formal verification is about as useful for AGI safety as it is for testing ICBM defense - not much use at all.

Meta: I'd love to know whether the downvotes are because people don't like the presentation of undeveloped ideas like this, or because they don't think the actual idea is a good one.

(The first would put me off posting similar things in the future, the second would encourage me as a feedback mechanism.)

Software may not be the best domain, but it has a key advantage over the other suggestions you are making: it's easy to produce novel challenges that are quite different from the previous challenges.

In a domain such as peeling an egg, it's true that peeling an individual egg has to be done correctly first time, but one egg is much like another, so the skill transfers easily. On the other hand, one complex programming challenge may be quite different from another, so the knowledge from having solved one doesn't transfer so much. This should, I think, help ensure that the skill which does transfer is closer to a general skill of knowing how to be careful enough to get it right first time.

There are lots of factors involved in peeling a perfect egg; most seem to matter before you hand it to a person to peel it. The areas where "get it right the first time" seems most applicable are those with a high cost of failure (this priceless gem will never be the same again; if I care enough about the presentation of this dish I will have to boil another egg). This also relates well to deliberate practice, where one technique for making practice harder is to be less tolerant of errors. A novel challenge is good, but most real-world novel situations with high-cost failures are not easy to replicate. Another area that comes to mind with a high cost of failure and a "get it right" model would be hostage negotiations.

Yes, gjm's summary is right.

I agree that there are some important disanalogies between the two problems. I thought software development was an unusually good domain to start trying to learn the general skill, mostly because it offers easy-to-generate complex challenges where it's simple to assess success.

I'm not hopeful that there's an easy solution (if there were, I think it would already be used in the industry), and I don't think you'd get up to total reliability.

Nonetheless it seems likely that there are things people can do that increase their bug rate, and there are probably things they can do that would decrease it. These might be costly things -- perhaps it involves writing detailed architectural plans for the software and getting these critiqued and double-checked by a team who also double-check that the separate parts do the right thing with respect to the architect...

Maybe. But would it change any of the conclusions?

It would change the regressions. I don't know whether you think that's an important part of the conclusion. It is certainly minor compared to the body of the work.

Again, if you think it does make a difference, I have provided all the code and data.

I think this is commendable; unfortunately I don't know the language and while it seemed like it would take a few minutes to explain the insight, it seems like it would be a few hours for me to mug up enough to explore the change to the data.

By using a slightly different offset you get a slightly different nonlinear transformation, and one that may work even better.

There isn't a way to make this transformation without a choice. You've made a choice by adding $1 -- it looks kind of canonical but really it's based on the size of a dollar, which is pretty arbitrary.

For example say instead of denominating everything in dollars you'd denominated in cents (and added 1 cent before logging). Then everyone would move up the graph by pretty much log(100), except the people who gave nothing, who would be...
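To make that concrete, here is a small sketch (with made-up donation amounts) of how the unit choice interacts with the offset -- large donors shift by roughly log(100) when you move from dollars to cents, but small and zero donors do not:

```python
import math

# Hypothetical donation amounts in dollars (zero included deliberately)
donations = [0, 1, 9, 199, 999]

def log_with_offset(amounts, offset):
    """log10 after adding a constant offset, in the same units as the data."""
    return [math.log10(a + offset) for a in amounts]

in_dollars = log_with_offset(donations, 1)                    # $1 offset
in_cents = log_with_offset([100 * a for a in donations], 1)   # 1-cent offset

shifts = [c - d for c, d in zip(in_cents, in_dollars)]
# Large donors shift up by almost exactly log10(100) = 2, small donors
# by noticeably less, and the zero donor not at all -- so the two unit
# choices really are different nonlinear transformations, not a relabeling.
```

So the relative position of small donors versus large donors depends on an essentially arbitrary choice of unit and offset, which is the sense in which the $1 offset is not canonical.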

That seems pretty unlikely. There's always some subjectivity to the details of coding and transformations, but what constant you add to make logs behave is not one I have ever seen materially change anyone's analysis; I don't think this bikeshedding makes a lick of difference. Again, if you think it does make a difference, I have provided all the code and data.

Maybe. But would it change any of the conclusions?

...why? One 'gives back to society' just by buying stuff in free markets and by not going out and axe-murdering people, does that mean we should credit everyone as secretly being generous?

Disagree here as well. As you already pointed out, a more interesting property is the apparent split between people who give nothing and people who give something; someone who gives $199 is already in the habit and practice of donations just like someone who is giving $999, while going from $1 to $9 might represent a real change in personal propensity. ($1 might be tossing a beggar a dollar bill and that person really is not a giver, while $9 might be an explicit donation through Paypal for a fundraiser.)

It shifts all datapoints equally in the dollar domain, but not in the log domain (hence letting you get rid of the -infinity). Of course it still preserves orderings, but it's a non-linear transformation of the y-axis.

I'd support this sensitivity check, or if just using one value would prefer a larger offset.

(Same caveat: I might have misunderstood log1p)

It's a nonlinear transformation to turn nonlinear totals back into something which is linear, and it does so very well, as you can see by comparing the log graph with an unlogged graph. Again, I'm not seeing what the problem here is. What do you think this changes? Ordering is preserved, zeros are preserved, and dollar amounts become linear which avoids a lot of potential problems with the usual statistical machinery.

No, it's supposed to be annual spend. However it's worth noting that this is a simplified model which assumes a particular relationship between annual spend and historical spend (namely it assumes that spending has grown and will grow on an exponential).
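For concreteness, here is a minimal sketch of that assumed relationship (the growth rate and units are hypothetical, not taken from the model): if spend has always grown exponentially at rate g, then cumulative historical spend is pinned down by current annual spend alone.

```python
import math

g = 0.10       # assumed annual growth rate (hypothetical)
annual = 1.0   # current annual spend, in arbitrary units

# If spend t years ago was annual * exp(-g * t), cumulative historical
# spend is the integral of that over all past time, which is annual / g.
closed_form = annual / g

# Numerical check via a fine Riemann sum over the past 200 years
dt = 0.001
numeric = sum(annual * math.exp(-g * k * dt) * dt for k in range(int(200 / dt)))

# numeric and closed_form agree closely, so under this exponential-growth
# assumption, annual spend and g together determine historical spend.
```

This is just the simplification described above: picking a different growth path would give a different implied relationship between annual and historical spend.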

Thanks. I wasn't entirely sure whether you were aiming at improving decision-making or at game design, but it was interesting either way!

By the way, your link is doubly(!) broken. This should work.
