This framing isn't meaningful, nor useful. All 3 of those are ambiguous.
The point of any of this is to better predict human behavior, and better describe variation in behavior between people. That's the value pitch that society might plausibly get, from taxonomizing personality tendencies. These should be updated based on actual data, and whatever makes them more predictive. Not just speculation.
So for example, when HEXACO began distinguishing Honesty-Humility from Agreeableness, that wasn't done because someone speculated that they thought a 6th trait made sense to them. Including more languages in the lexical studies resulted in a 6th factor emerging from the factor analysis. So it's a more representative depiction of the clusterings, than Big Five.
Also, e.g. H-H is more predictive of workplace deviance than the old Big Five Agreeableness trait was. That's an example of why anyone might plausibly care about adding that 6th category. Differentiating Disagreeableness from Dark Triad might plausibly be useful, and anyone who thinks that's useful can now use HEXACO. Progress.
Your suggestion that we can use MBTI to "improve" Big Five is funny to people familiar with the literature. Sticking to MBTI is going WAY back to something much more crude, and much less supported by data. It's like saying you're going to improve 21st century agriculture with an ox and a plow.
Similarly, your proposed change to Big Five is highly unlikely to improve it. E.g.:
So, for example, for our question about whether people naturally think in terms of what other people think about something or think in terms of how they think about things, we would have that be the extroverted thinking vs the introverted thinking cognitive function (or Te/Ti for short).
You have little reason to think this is even a good description of personality clustering. But the behaviors are probably captured by some parts of Extroversion and Agreeableness.
I think you should just go learn about the modern personality psychology field, it's not helpful to spend time pitching improvements if you're using a framework that's 80 years behind. We talked about this on Manifold and I think you're kind of spinning in circles, you don't need to do this -- just go learn the superior stuff and don't look back.
I confess I don't know what it means to talk about a person's value as a soul. I am very much in that third group I mentioned.
On an end to relative ability: is this outcome something you give any significant probability to? And if there existed some convenient way to make long-term bets on such things, what sorts of bets would you be willing to make?
There is intense censorship of some facts of human traits, and biology. Of the variance in intelligence and economic productivity, the percent attributable to genetic factors is >0%. But almost nobody prestigious, semi-prestigious -- nor anything close -- can ever speak of those facts, without social shaming. You'd probably be shamed before you even got to the question of phenotypic causation -- speaking as if the g factor exists would often suffice. (Even though the g factor is an unusually solid empirical finding. In fact, I can hardly think of a more reliable one from the social sciences.)
But with all the high-functioning and prestigious people filtered out, the topic is then heavily influenced by people who have something wrong with them. Such as having an axe to grind with a racial group. Or people who like acting juvenile. Or a third group that's a bit too autistic, to easily relate with the socially-accepted narratives. I'll give you a hint: the first 2 groups rarely know enough to format the question in a meaningful way, such as "variance attributable to genes", and instead often ask "if it's genetic", which is a meaningless format.
The situation is like an epistemic drug prohibition, where the empirical insights aren't going anywhere, but nobody high-functioning or good can be the vendor. The remaining vendors have a disproportionate number of really awful people.
I should've first learned about the Wilson effect on IQ from a liberal professor. Instead I first heard it mentioned from some guy with an axe to grind with other groups. I should've been conditioned with prosocial memes that don't pretend humans are exempt from the same forces that shape dogs and guppies. Instead it's memes predicting any gaps would trend toward 0 given better controls for environment (which hasn't been the trend for many years: the recent magnitude is similar despite improving sophistication, and many interventions didn't replicate). The epistemics of this whole situation are egregiously dysfunctional.
I haven't read her book, but I know Kathryn Paige Harden is making an attempt. So hats off to her.
Sorry if I'm just misreading -- in Compute Trends Across Three Eras of Machine Learning it was shown that the doubling time (at that time) had slowed to every ~10 months for the large-scale projects. In this projection you go with a 6-month doubling time for x number of years, then slowing to every 20 months. My questions are:
This was a contorted and biased portrayal of the topic. If you're a reader in a hurry, skip to my last paragraph.
First, this needs clarification on who you mean by a "fox", and who you don't. There's a very high risk of confusion, or talking about unrelated things without noticing. It may help if you name 5 people you consider to be foxes, and 5 you consider to be hedgehogs.
For the rest of this comment, I'm going to restrict "fox" to "good-scoring generalist forecaster", because they would tend to be quite fox-like, in the Tetlockian sense, and you did mention placing probabilities. If there are non-forecasters you would include in your taxonomy for fox, you are welcome to mention them. As an occasional reminder of potential confusion about this, I'll often put "fox" in quotation marks.
Paying more attention to easily-evaluated claims that don't matter much, at the expense of hard-to-evaluate claims that matter a lot.
E.g., maybe there's an RCT that isn't very relevant, but is pretty easily interpreted and is conclusive evidence for some claim. At the same time, maybe there's an informal argument that matters a lot more, but it takes some work to know how much to update on it, and it probably won't be iron-clad evidence regardless.
This point has some truth to it, but it misses a lot.
When forecasters pitch ideas for questions, they tend to be interested in whether the question "really captures the spirit of the question". Forecasters are well aware of, e.g., Goodhart's Law and measurement issues; they're on our minds all the time and often discussed. We do find it much more meaningful to forecast things that we think matter. The format makes it possible to make progress on that. It happens to take effort.
If a single stream of data (or criterion) doesn't adequately capture it, but if the claim actually corresponds to some future observations in any way, then you can add more questions from other angles. By creating a "basket" from different measures, a progressively-clearer picture can be drawn. That is, if the topic is worth the effort.
An example of this is the accumulated variety of AI-related questions on Metaculus. Earlier attempts were famously unsatisfying, but the topic was important. There is now a FAR better basket of measures from many angles. And I'm sure it will continue to improve, such as by finding new ways to measure "alignment" and its precursors.
It's possible for "foxes" to actually practice this, and make the claim more evaluable. It's a lot of work, which is why most topics don't get this. Also this is still a very niche hobby with limited participation. Prediction markets are literally banned. If they weren't, they'd probably grow like an invasive weed, with questions about all sorts of things.
Although you don't explicitly say hedgehogs do a better job of including and evaluating the hard-to-evaluate claims, this seems intimately related. The people who are better at forecasting than me tend to also be very discerning at other things we can't forecast. In all likelihood these two things are correlated.
I'm most sympathetic to the idea that many topics have inadequate "coverage", in the sense that it's laborious to make things amenable to forecasting. I agree lots of forecasting questions are irrelevant, or in your example, may focus on an RCT too much.
But I don't think you really make a case for why foxes would be worse off in this way. As far as I can tell, hedgehogs get easily fixated on lots of irrelevant details all the time. The way you describe this seems actively biased, and I'm disappointed that such a prolific poster on the site would have such a bias.
1. A desire for cognitive closure, confidence, and a feeling of "knowing things" — of having authoritative Facts on hand rather than mere Opinions.
But real-world humans (even if they think of themselves as aspiring Bayesians) are often uncomfortable with uncertainty. We prefer sharp thresholds, capital-k Knowledge, and a feeling of having solid ground to rest on.
I found this surprising. Hedgehogs are famously more prone to this than foxes. Their discomfort with uncertainty (and desire for authoritative facts) tends to make them bad forecasters.
Granted, forecasters are human too, and we feel more comfortable when certain. And it is true that we use explicit probabilities -- we do that so our beliefs are more transparent, even though it's inconvenient to us. I can see how this relates to fixating on specific information. We even get pretty irate when a question "resolves ambiguous", dashing our efforts like a failed replication.
But hedgehogs tend to be utterly convinced, epistemically slippery, and incredibly opinionated. If you like having authoritative facts and feeling certainty, just be a hedgehog with One Big Idea. And definitely stay the hell away from forecasting.
As above, this point would've been far more informative if you tried making a clear comparison against hedgehogs, and what this tends to look like in them. Surely "foxes" can fixate on a criterion for closure, but how does this actually compare with hedgehogs? Do you actually want to make a genuine comparison?
2. Hyperbolic discounting of intellectual progress.
With unambiguous data, you get a fast sense of progress. With fuzzy arguments, you might end up confident after thinking about it a while, or after reading another nine arguments; but it's a long process, with uncertain rewards.
I don't believe you here. Hedgehogs are free to self-reinforce in whatever direction they want, with certainty, as fast as they want. You know what's a really slow, tedious way to feel intellectual progress? Placing a bunch of forecasts and periodically checking on them. And being forced to tediously check potential arguments to update in various ways, which we're punished for not doing (unlike a hedgehog). It seems far more tedious than sticking to my favorite One Big Idea.
The only way this might be true is that forecasting often focuses on short-term questions, so we can get that feedback, and also because it's much more attainable. Though we do have lots of long-term questions too, we know they're far more difficult and we'll often be dart-throwing chimps. But nothing about your posts seems to really deal with this.
Also a deep point that I might have already told you somewhere else, and seems like a persistent confusion, so I'm going to loudly bold it here:
Forecasters think about and aggregate lots of fuzzy things.
Let me repeat that: forecasters think about and aggregate lots of fuzzy things.
We do this all the time! The substantial difference is we get later scored on whether we evaluated the fuzzy things (and also non-fuzzy-things) properly.
It's compression. If any of those fuzzy things actually make a difference to the observable outcomes, then we actually get scored on whether we did a good job of considering those fuzzy things. "Foxes" do this all the time, probably better than hedgehogs, on average.
I'll elaborate with a concrete example. Suppose I vaguely overhear a nebulous rumor that Ukraine may use a dirty bomb against Russia. I can update my forecast on that, even if I can't directly verify the rumor. Generally you shouldn't update very much on fuzzy things though because they are very prone to being unfounded or incorrect. In that particular example I made a small update, correctly reflecting that it's fuzzy and poorly-substantiated. People actively get better at incorporating fuzzy things as they build a forecasting practice, we're literally scored on how well we do this. Which Rob Bensinger would understand better if he did forecasting.
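To make the "small update, then get scored" idea concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not my actual forecast: the 2% prior, the likelihood ratios of 1.5 (weak rumor) and 20 (treating the rumor as near-proof), and the helper names `update` and `brier` are all assumptions. It uses the odds form of Bayes' rule and the Brier score, which is one common scoring rule among several.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes update: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def brier(forecast: float, outcome: int) -> float:
    """Brier score for a binary question (lower is better)."""
    return (forecast - outcome) ** 2


# Hypothetical inputs, chosen only to illustrate the mechanics.
prior = 0.02              # assumed probability before hearing the rumor
weak_rumor_lr = 1.5       # assumed: a fuzzy rumor is only weak evidence
overreaction_lr = 20.0    # assumed: treating the same rumor as near-proof

small_update = update(prior, weak_rumor_lr)
big_update = update(prior, overreaction_lr)

# Suppose the event does not happen (outcome = 0): both forecasts get scored.
score_small = brier(small_update, 0)
score_big = brier(big_update, 0)

print(f"small update: {small_update:.3f}, Brier if it doesn't happen: {score_small:.4f}")
print(f"big update:   {big_update:.3f}, Brier if it doesn't happen: {score_big:.4f}")
```

Under these assumed numbers the weak rumor moves the forecast from 2% to roughly 3%, while the overreaction moves it to roughly 29% and is penalized far more if the event doesn't happen. The fuzzy input still gets used either way, and the score later reflects how well it was weighed.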
Hedgehogs are free to use fuzzy things to rationalize whatever they want, with little to slow them down, beyond the (weak and indirect) social checks they'll have on whether they considered those fuzzy things well enough.
3. Social modesty and a desire to look un-arrogant.
It can feel socially low-risk and pleasantly virtuous to be able to say "Oh, I'm not claiming to have good judgment or to be great at reasoning or anything; I'm just deferring to the obvious clear-cut data, and outside of that, I'm totally uncertain."
...To the extent I see "foxes" do this, it was usually a good thing. Also, your wording of "totally uncertain" sounds mildly strawmanny. They don't usually say that. When "outside the data", people are often literally talking about unrelated things without even noticing, but a seasoned forecaster is more likely to notice this. In such cases, they might sometimes say "I'm not sure". Partly out of not knowing what else is being asked exactly, and partly out of genuine uncertainty.
This point would be a lot more impactful if you gave examples, so we know you're not exaggerating and this is a real problem.
Collecting isolated facts increases the pool of authoritative claims you can make, while protecting you from having to stick your neck out and have an Opinion on something that will be harder to convince others of, or one that rests on an implicit claim about your judgment.
But in fact it often is better to make small or uncertain updates about extremely important questions, than to collect lots of high-confidence trivia. It keeps your eye on the ball, where you can keep building up confidence over time; and it helps build reasoning skill.
Seriously? Foxes actually make smaller updates more often than hedgehogs do.
Hedgehogs collect facts and increase the pool of authoritative claims they can make, while protecting themselves from having to stick their necks out and risk being wrong. Not looking wrong socially, but being actually-wrong about what happens.
This point seems just wrong-headed, as if you were actively trying to misportray the topic.
High-confidence trivia also often poses a risk: either consciously or unconsciously, you can end up updating about the More Important Questions you really care about, because you're spending all your time thinking about trivia.
Even if you verbally acknowledge that updating from the superficially-related RCT to the question-that-actually-matters would be a non sequitur, there's still a temptation to substitute the one question for the other. Because it's still the Important Question that you actually care about.
Again, I appreciate it's very laborious to capture what matters in a verifiable question. If there is a particular topic that you think is missing something, please offer suggestions for new ways to capture what you believe is missing, provided that thing actually corresponds to reality in some provable way.
Overall I found this post misleading and confused. At several points, I had no idea what you were talking about. I suspect you're doing this because you like (some) hedgehogs, have a vested interest in their continued prestige, and want to rationalize ways that foxes are more misguided. I think this has been a persistent feature of what you've said about this topic, and I don't think it will change.
If anyone wants to learn about this failure mode, from someone who knows what they are talking about, I highly recommend the work of David Manheim. He's an excellent track-recorded forecaster who has done good work on Goodhart's Law, and has thought about how this relates to forecasting.
Edited to slightly change the wording emphasis, de-italicize some things that didn't really need italics, etc.
I think many people on this site over-update on recent progress. But I also doubt the opposite extreme you're at.
I think it's very unlikely (<10% chance) that we'll see AGI within the next 50 years, and entirely possible (>25% chance) that it will take over 500 years.
Even just evolutionary meta-algorithms would probably have runaway progress by 500 years. That is, without humans getting super specific, deep math insights. This is easy to imagine with the enormously higher yearly ASIC hardware fabrication we'd be seeing long before then. I don't think a 500 year timeframe would take an unexpected math obstacle, it would take a global catastrophe.
I'd give this formulation of AGI a 93% chance of happening by 2522, and 40% by 2072. If I could manage to submit a post before December, I'd be arguing for the Future Fund prize to update to a later timeline. But not this much later.
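As a rough sanity check on how those two numbers fit together, here is a small Python sketch. It assumes a constant yearly hazard rate within each horizon, which is a simplification and not a claim about how progress actually works. It also assumes a base year of 2022, so that 2072 and 2522 are 50 and 500 years out. The helper name `constant_hazard` is made up for this example.

```python
import math


def constant_hazard(prob_by_horizon: float, years: float) -> float:
    """Yearly hazard rate h such that 1 - exp(-h * years) == prob_by_horizon.

    Assumes the hazard is constant over the whole horizon (a simplification).
    """
    return -math.log(1.0 - prob_by_horizon) / years


p_2072, p_2522 = 0.40, 0.93    # the two probabilities stated above
years_to_2072 = 50             # assumes a 2022 base year
years_to_2522 = 500

h50 = constant_hazard(p_2072, years_to_2072)
h500 = constant_hazard(p_2522, years_to_2522)

# Probability of arrival between 2072 and 2522, given no arrival by 2072.
conditional_late = (p_2522 - p_2072) / (1.0 - p_2072)

print(f"implied yearly hazard over 50 years:  {h50:.4f}")
print(f"implied yearly hazard over 500 years: {h500:.4f}")
print(f"P(by 2522 | not by 2072):             {conditional_late:.3f}")
```

Under that assumption, 40% by 2072 works out to about 1.0% per year, while 93% by 2522 averages about 0.5% per year. So the two numbers together imply more probability mass in the earlier decades than a flat rate would give, with the remaining 7% covering things like a global catastrophe.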
I would love to see proper data on this. In particular, including the facets and not just broad buckets. Or if possible, even including findings for specific items.
The ones I've met at a meetup seemed (compared to the broader population):
-Very high in Interest in ideas, which was by far the most noticeable trend.
Agreeableness was mixed. Some were unfailingly agreeable, and some were starkly low in agreeableness. Maybe data would show a clear trend on facets or items. For the more strongly utilitarian ones, as a group, I'd speculate they are lower in Honesty-Humility from HEXACO. Yet none ever seemed to make me "worry" in that way, as if they couldn't even manage to have Dark Triad traits without being helpful.
I mean that Google themselves wouldn't want something that could get them lawsuits, and if they generate stuff, yes they'll have a selection for accuracy. If someone is interested in AI-Dr-Oz's cures and searched for those, I'm sure Google will be happy to provide. The market for that will be huge, and I'm not predicting that crap will go away.
Yes Google does select, now. The ocean of garbage is that bad. For people making genuine inquiries, often the best search providers can do right now is defer to authority websites. If we're talking specifically about interpreting medical papers, why don't you think they'll have a selection for accuracy?
In the first example it sounds like the engine is fabricating a false testimony. Was that an intentional attribute in the example? I guess fictionalizing will happen lots, but I don't expect Google to use that particular method and jeopardize credibility.
For the second example, I assume there will be heavy selection against fabricating incorrect medical advice, at least for Google.
For genuine best-guess attempts to answer the question? I will be concerned if that doesn't happen in a few years. What's the matter?
"AGI" here is undefined, and so is "significant probability". When I see declarations in this format, I downgrade my view of the epistemics involved. Reading stuff like this makes me fantasize about not-yet-invented trading instruments, without the counterparty risk of social betting, where you actually get your money.