Mo Putera

I've been lurking on LW since 2013 but only started posting recently. My day job was "analytics, broadly construed", and I'm currently exploring applied-prioritization-type roles; my degree is in physics. I used to write on Quora and Substack but stopped, though I'm still on the EA Forum. I'm based in Kuala Lumpur, Malaysia.

Comments

In Bostrom's recent interview with Liv Boeree, he said (I'm paraphrasing; you're probably better off listening to what he actually said):

  • p(doom)-related
    • it's actually gone up for him, not down (contra your guess, unless I misinterpreted you), at least when broadening the scope beyond AI (cf. vulnerable world hypothesis, 34:50 in video)
    • re: AI, his prob. dist. has 'narrowed towards the shorter end of the timeline - not a huge surprise, but a bit faster I think' (30:24 in video)
    • also re: AI, 'slow and medium-speed takeoffs have gained credibility compared to fast takeoffs'
    • he wouldn't overstate any of this
  • contrary to people's impression of him, he's always been writing about 'both sides' (doom and utopia) 
  • in the past it just seemed more pressing to him to call attention to 'various things that could go wrong so we could avoid these pitfalls and then we'd have plenty of time to think about what to do with this big future'
    • this reminded me of an illustration from his old paper introducing the idea of x-risk prevention as a global priority

If I take this claimed strategy as a hypothesis (that radical introspective speedup is possible and trainable), how might I falsify it? I ask because I can already feel myself wanting to believe it's true and personally useful, which is an epistemic red flag. Bonus points if the falsification test isn't high cost (e.g. I don't have to try it for years).

I was wondering about this too. I thought of Eugene Wei writing about Edward Tufte's classic book The Visual Display of Quantitative Information, which he considers "[one of] the most important books I've read". He illustrates with an example, just like dynomight did above, starting with this chart auto-created in Excel: 

chart-1.png

and systematically applies Tufte's principles to eventually end up with this:

chart-4.png

Wei adds further commentary:

No issues for color blind users, but we're stretching the limits of line styles past where I'm comfortable. To me, it's somewhat easier with the colored lines above to trace different countries across time versus each other, though this monochrome version isn't terrible. Still, this chart reminds me, in many ways, of the monochromatic look of my old Amazon Analytics Package, though it is missing data labels (wouldn't fit here) and has horizontal gridlines (mine never did).

We're running into some of these tradeoffs because of the sheer number of data series in play. Eight is not just enough, it is probably too many. Past some number of data series, it's often easier and cleaner to display these as a series of small multiples. It all depends on the goal and what you're trying to communicate.

At some point, no set of principles is one size fits all, and as the communicator you have to make some subjective judgments. For example, at Amazon, I knew that Joy wanted to see the data values marked on the graph, whenever they could be displayed. She was that detail-oriented. Once I included data values, gridlines were repetitive, and y-axis labels could be reduced in number as well.

Tufte advocates reducing non-data-ink, within reason, and gridlines are often just that. In some cases, if data values aren't possible to fit onto a line graph, I sometimes include gridlines to allow for easy calculation of the relative ratio of one value to another (simply count gridlines between the values), but that's an edge case.

For sharp changes, like an anomalous reversal in the slope of a line graph, I often inserted a note directly on the graph, to anticipate and head off any viewer questions. For example, in the graph above, if fewer data series were included, but Greece remained, one might wish to explain the decline in health expenditures starting in 2008 by adding a note in the plot area near that data point, noting the beginning of the Greek financial crisis (I don't know if that's the actual cause, but whatever the reason or theory, I'd place it there).

If we had company targets for a specific metric, I'd note those on the chart(s) in question as a labeled asymptote. You can never remind people of goals often enough.

And I thought, okay, sounds persuasive and all, but also this feels like Wei/Tufte is pushing their personal aesthetic on me, and I can't really tell the difference (or whether it matters).
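For what it's worth, the mechanical parts of the advice are cheap to try. Here's a minimal matplotlib sketch of a few of Wei's moves (made-up data; the direct line labels, dropped gridlines, and hidden spines are my rendering of the principles, not his actual code):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the per-country health-expenditure series above.
years = np.arange(2000, 2013)
rng = np.random.default_rng(0)
data = {name: base + 0.2 * rng.normal(0, 1, len(years)).cumsum()
        for name, base in {"Greece": 9, "Portugal": 8, "Spain": 7}.items()}

fig, ax = plt.subplots(figsize=(7, 4))
for name, vals in data.items():
    ax.plot(years, vals, color="0.3", linewidth=1.5)
    # Direct labels at the line ends replace a legend (less eye travel).
    ax.annotate(name, (years[-1], vals[-1]),
                xytext=(4, 0), textcoords="offset points", va="center")

# Reduce non-data-ink: no gridlines, no top/right spines.
ax.grid(False)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.set_ylabel("Health expenditure (% of GDP)")
fig.tight_layout()
plt.show()
```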

I'm curious about you not doing these, since I'd unquestioningly accepted them, and would love for you to elaborate:

- save lots of money in a retirement account and buy index funds
- shower daily
- use shampoo
- wear shoes
- walk

Regarding 'diet stuff', I mostly agree and like how Jay Daigle put it:

I’ve decided lately that people regularly get confused, on a number of subjects, by the difference between science and engineering. ... Tl;dr: Science is sensitive and finds facts; engineering is robust and gives praxes. Many problems happen when we confuse science for engineering and completely modify our praxis based on the result of a couple of studies in an unsettled area. ...

This means two things. First is that we need to understand things much better for engineering than for science. In science it’s fine to say “The true effect is between +3 and -7 with 95% probability”. If that’s what we know, then that’s what we know. And an experiment that shrinks the bell curve by half a unit is useful. For engineering, we generally need to have a much better idea of what the true effect is. (Imagine trying to build a device based on the information that acceleration due to gravity is probably between 9 and 13 m/s^2).

Second is that science in general cares about much smaller effects than engineering does. It was a very long time before engineering needed relativistic corrections due to gravity, say. A fact can be true but not (yet) useful or relevant, and then it’s in the domain of science but not engineering. 

Why does this matter?

The distinction is, I think, fairly clear when we talk about physics. ... But people get much more confused when we move over to, say, psychology, or sociology, or nutrition. Researchers are doing a lot of science on these subjects, and doing good work. So there's a ton of papers out there saying that eggs are good, or eggs are bad, or eggs are good for you but only until next Monday or whatever.

And people have, often, one of two reactions to this situation. The first is to read one study and say “See, here’s the scientific study. It says eggs are bad for you. Why are you still eating eggs? Are you denying the science?” And the second reaction is to say that obviously the scientists can’t agree, and so we don’t know anything and maybe the whole scientific approach is flawed.

But the real situation is that we’re struggling to develop a science of nutrition. And that shit is hard. We’ve worked hard, and we know some things. But we don’t really have enough information to do engineering, to say “Okay, to optimize cardiovascular health you need to cut your simple carbs by 7%, eat an extra 10g of monounsaturated fats every day, and eat 200g of protein every Wednesday” or whatever. We just don’t know enough.

And this is where folk traditions come in. Folk traditions are attempts to answer questions that we need decent answers to, that have been developed over time, and that are presumably non-horrible because they haven’t failed obviously and spectacularly yet. A person who ate “Like my grandma” is probably on average at least as healthy as a person who tried to follow every trendy bit of scientistic nutrition advice from the past thirty years.

Nitpick that doesn't bear upon the main thrust of the article: 

2021: Here’s a random weightlifter I found coming in at over 400kg, I don’t have his DEXA but let’s say somewhere between 300 and 350kgs of muscle.

More plausibly Josh Silvas weighs 220-ish kg, not 400 kg, and there's no way he has anywhere near 300+ kg of muscle. To contextualize, the heaviest WSM winners ever weighed around 200-210 kg (Hafthor, Brian); Brian in particular had a lean body mass of 156 kg back when he weighed 200 kg peaking for competition ('peaking' implies unsustainability), which is the highest DEXA figure I've ever found in years of following strength-related statistics. 
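A quick upper-bound sanity check (my arithmetic, treating Brian's peak lean fraction of 156/200 = 0.78 as a generous ceiling for any human):

$$0.78 \times 220\,\mathrm{kg} \approx 172\,\mathrm{kg} \ll 300\,\mathrm{kg}$$

and lean mass includes bone, organs, and water, so actual muscle would be well below even that.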

The two paired procedures with the highest mean validity for predicting job performance are general mental ability (GMA) plus an integrity test, and GMA plus a structured interview (Schmidt et al's 2016 meta-analysis of "100 years of research in personnel selection", reviewing 31 procedures, via 80,000 Hours – check out Table 2 on page 71). GMA alone beats all other single procedures; integrity tests not only beat all other non-GMA procedures but also correlate nearly zero with GMA, hence the efficacy of the combination.
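To sketch why the near-zero intercorrelation matters (illustrative numbers, not Schmidt et al's): for two predictors that are uncorrelated with each other, the validities combine roughly Pythagorean-style,

$$R = \sqrt{r_1^2 + r_2^2}, \qquad \text{e.g.}\ \sqrt{0.6^2 + 0.4^2} \approx 0.72 > 0.6,$$

whereas a second predictor that correlates highly with GMA mostly duplicates its information and adds little incremental validity.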

A bit more on integrity tests, if you (like me) weren't clear on them:

These tests are used in business and industry to hire employees with reduced probability of counterproductive work behaviors on the job, such as fighting, drinking or taking drugs, stealing from the employer, equipment sabotage, or excessive absenteeism. Integrity tests do predict these behaviors, but surprisingly they also predict overall job performance (Ones, Viswesvaran, & Schmidt, 1993).

Behavioral interviews – which Schmidt et al call situational judgment tests – are either middle of the rankings (for knowledge-based tests) or near the bottom (for behavioral tendencies). Given this, I'd be curious what value Ben gets out of investing nontrivial effort into running them, cf. Luke's comment.

I think 'curse of dimensionality' is apt, since the prerequisite reading directly references it:

One problem with this whole GEM-vs-Pareto concept: if chasing a Pareto frontier makes it easier to circumvent GEM and gain a big windfall, then why doesn’t everyone chase a Pareto frontier? Apply GEM to the entire system: why haven’t people already picked up the opportunities lying on all these Pareto frontiers?

Answer: dimensionality. If there’s 100 different specialties, then there’s only 100 people who are the best within their specialty. But there’s 10k pairs of specialties (e.g. statistics/gerontology), 1M triples (e.g. statistics/gerontology/macroeconomics), and something like 10^30 combinations of specialties. And each of those pareto frontiers has room for more than one person, even allowing for elbow room. Even if only a small fraction of those combinations are useful, there’s still a lot of space to stake out a territory.

That said, given the way John talks about it there, I think 'boon of dimensionality' might be more apt still; in Screwtape's context, though, 'curse' is right. (A quick check of John's counts follows below.)
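To make the combinatorics concrete, here is my reading of John's figures ('pairs' and 'triples' as ordered tuples, 'combinations' as arbitrary subsets):

```python
from math import comb

n = 100  # number of specialties

print(n**2)   # 10_000 ordered pairs ("10k pairs")
print(n**3)   # 1_000_000 ordered triples ("1M triples")
print(2**n)   # ~1.27e30 subsets ("something like 10^30 combinations")
print(comb(n, 2), comb(n, 3))  # 4_950 and 161_700 if order is ignored
```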

Great comment. I also like Nate Soares' Dive in:

In my experience, the way you end up doing good in the world has very little to do with how good your initial plan was. Most of your outcome will depend on luck, timing, and your ability to actually get out of your own way and start somewhere. The way to end up with a good plan is not to start with a good plan, it's to start with some plan, and then slam that plan against reality until reality hands you a better plan.

It's important to possess a minimal level of ability to update in the face of evidence, and to actually change your mind. But by far the most important thing is to just dive in.

Would the recent Anthropic sleeper agents paper count as an example of bullet #2 or #3? 
