Interesting, hadn't heard of this! I haven't fully grasped the "No evidence for nudging after adjusting for publication bias" study yet, but at first glance it looks to me like evidence for small effect sizes rather than for no effect at all? Generally, when people say "nudging doesn't work", this can mean a lot of things, from "there's no effect at all" to "there often is an effect, but it's not very large, and it's not worth it to focus on this in policy debates", to "it has a significant effect, but it will never solve a problem fully because it only affects the behavior of a minority of subjects".
There's also this article making some similar points, overall defending the effectiveness of nudging while also pushing for more nuance in the debate. They cite one very large study in particular that showed significant effects while avoiding publication bias (emphasis mine):
The study was unique because these organizations had provided access to the full universe of their trials—not just ones selected for publication. Across 165 trials testing 349 interventions, reaching more than 24 million people, the analysis shows a clear, positive effect from the interventions. On average, the projects produced an average improvement of 8.1 percent on a range of policy outcomes. The authors call this “sizable and highly statistically significant,” and point out that the studies had better statistical power than comparable academic studies. So real-world interventions do have an effect, independent of publication bias. (...) We can start to see the bigger problem here. We have a simplistic and binary “works” versus “does not work” debate. But this is based on lumping together a massive range of different things under the “nudge” label, and then attaching a single effect size to that label.
Personally I have a very strong prior that nudging must have an effect > 0 - it would just be extremely surprising to me if the effect of an intervention that clearly points in one direction were exactly 0. This may however still be compatible with the effects in many cases being too small to be worth putting the spotlight on, and I suspect it just strongly depends on the individual case and intervention.
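To make the "small effect, not zero effect" reading concrete: a quick simulation (purely hypothetical numbers - the effect size, standard error, and significance filter are all made up for illustration) shows how a literature filtered by statistical significance overstates a small true effect, so that a bias correction can pull the pooled estimate toward zero even when the true effect is positive.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: each "study" estimates a small true effect (0.1)
# with sampling noise (SE = 0.15, i.e. underpowered studies). If only
# estimates clearing a crude significance bar (> 1.96 * SE) get
# published, the published record overstates the true effect.
TRUE_EFFECT = 0.1
SE = 0.15
N_STUDIES = 10_000

estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_STUDIES)]
published = [e for e in estimates if e > 1.96 * SE]  # significance filter

all_mean = statistics.mean(estimates)   # close to the true 0.1
pub_mean = statistics.mean(published)   # substantially inflated

print(f"mean of all studies:    {all_mean:.3f}")
print(f"mean of published only: {pub_mean:.3f}")
```

The inflated published mean is what a publication-bias adjustment tries to undo - and after the adjustment, a genuinely small effect can look indistinguishable from zero, which is consistent with both readings of the study.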
Unless I misunderstand your comment, isn't it rather the opposite of odd that user stories are so popular, given that this is what the bias would predict? That being said, maybe I've argued a bit too strongly in one direction with this post - I wouldn't even say that user stories are detrimental or useless. Depending on your product, it may well be that a significant share of your users have strong intent. My main claim is that in most situations, the number of people who are closer to the middle of the spectrum is >0. But it's not necessary for that group to dominate the distribution.
So in my view, it can still make sense to focus on a subgroup of your users who know what they're doing, as long as you remain aware that this will not apply to all users. E.g. when A/B testing, you should expect by default that making any feature even mildly less convenient to use will have negative effects. So you should not be surprised to see that result - but it may still be the right choice to make such a change nonetheless, depending on what benefits you hope to get from it.
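To illustrate why the negative result shouldn't surprise you: with typical traffic volumes, even a mild drop in convenience shows up as statistically significant. A minimal sketch with a standard two-proportion z-test (all numbers here are hypothetical - 100k users per arm, conversion dropping from 10.0% to 9.5%):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test with pooled variance."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)          # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical A/B test: arm A keeps the convenient feature (10.0%
# conversion), arm B removes it (9.5%), 100k users in each arm.
z = two_proportion_z(10_000, 100_000, 9_500, 100_000)
print(f"z = {z:.2f}")  # comfortably above 1.96, i.e. significant at p < .05
```

So a "significant negative effect" is the expected baseline result of such a change; the real question is whether the other benefits of the change outweigh that cost.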
During winter, opening windows will raise your heating bills like mad.
Opening several windows/doors widely for a few minutes every couple of hours, rather than keeping one of them open for longer periods, is supposed to mostly prevent this, as it exchanges the air in your room without significantly cooling down the floor/walls/furniture. But of course you're still right that it's a trade-off, and for some people it's much easier to achieve consistently good CO2 levels than for others. For many it may be worth at least getting a CO2 monitor to be able to make better-informed decisions.
One could certainly argue that improving an existing system while keeping its goals the same may be an easier (or at least different) problem to solve than creating a system from scratch and instilling some particular set of values into it (where part of the problem is to even find a way to formalize the values, or know what the values are to begin with - both of which would be fully solved for an already existing system that tries to improve itself).
I would be very surprised if an AGI would find no way at all to improve its capabilities without affecting its future goals.
Side point: this whole idea is arguably somewhat opposed to what Cal Newport in Deep Work describes as the "any benefit mindset", i.e. people's tendency to use tools when they can see any benefit in them (Facebook being one example, as it certainly does come with the benefit of keeping you in touch with people you would otherwise have no connection to), while ignoring the hidden costs of these tools (such as the time/attention they require). I think both ideas are worth keeping in mind when evaluating the usefulness of a tool. Ask yourself both whether the usefulness of the tool can be deliberately increased, and whether the tool's benefits are ultimately worth its costs.
I think it does relate to examples 2 and 3, although I would still differentiate between perfectionism in the sense that you actually keep working on something for a long time to reach perfection on the one hand, and doing nothing because a hypothetical alternative deters you from some immediate action on the other hand. The latter is more what I was going for here.
Good point, agreed. If "pay for a gym membership" turns out to be "do nothing and pay $50 a month for it", then it's certainly worse than "do nothing at home".
I would think that code generation has a much greater appeal to people / is more likely to go viral than code review tools. The latter surely is useful and I'm certain it will be added relatively soon to github/gitlab/bitbucket etc., but if OpenAI wanted to start out building more hype about their product in the world, then generating code makes more sense (similar to how art generating AIs are everywhere now, but very few people would care about art critique AIs).
Can you elaborate? Were there any new findings about the validity of the contents of Predictably Irrational?
This is definitely an interesting topic, and I too would like to see a continued discussion as well as more research in the area. I also think that Jeff Nobbs' articles are not a great source, as he seems to twist the facts quite a bit in order to support his theory. This is particularly the case for part 2 of his series - looking into practically any of the linked studies, I found issues with how he summarized them. Some examples:
(note I wrote this up from memory, so it's possible I've mixed something up in the examples above - might be worth writing a post about it with properly linked sources)
I still think he's probably right about many things, and it's most certainly correct that oils high in Omega-6 in particular aren't healthy (which might indeed include Canola oil, which I was not aware of before reading his articles). Still, he seems to be pursuing an agenda to an extent that prevents him from summarizing studies accurately, which is not great. It doesn't mean he's wrong, but it does mean I won't trust anything he says without checking the sources.