It may depend on what we mean by “best”.
Epistemic status: I understand very little of anything.
Speculation about potential applications: regulating a logical prediction market, e.g. logical induction; constructing judges or competitors in e.g. alignment by debate; designing communication technology, e.g. to mitigate harms and risks of information warfare.
The slogan “the best ideas float to the top” is often used in social contexts. The saying goes, “in a free market of ideas, the best ideas float to the top”. Of course, it is not intended as a statement of fact, as in “we have observed that this is the case”; it is instead a statement of values, as in “we would prefer for this to be the case”.
In this essay, however, we will force an empirical interpretation, just to see what happens. I will provide three ways to consider the density of an idea, or the number assigned to how float-to-the-top an idea is.
In brief, an idea is a sentence, and you can vary the amount of its antecedent graph (like in Bayesian nets, NARS-like architectures) or function out of which it is printed (like in compression) that you want to consider at a given moment, up to resource allocation. This isn’t an entirely mathematical paper, so don’t worry about WFFs, parsers, etc., which is why I’ll stick with “ideas” instead of “sentences”. I will also be handwaving between "description of some world states" and "belief about how world states relate to each other".
Suppose you observe wearers of teal hats advocate for policy A, but you don’t know what A is. You’re minding your business in an applebees parking lot when a wearer of magenta hats gets your attention to tell you “A is harmful”. There are two cases:
- Suppose A is “kicking puppies” (and I don’t mean the wearer of magenta hats is misleadingly compressing A to you; I mean the policy is literally kicking puppies). The inferential gap between you and the magentas can be closed very cheaply, so you’re quickly convinced that A is harmful (unless you believe that kicking puppies is good).
- Suppose A is “fleegan at a rate of flargen”, where fleeganomics is a niche technical subject which nevertheless can be learned by anyone of median education in N units[^1] or less. Suppose also that you know the value of N, but you’re not inclined to invest that much compute in a dumb election, so you either a. take them at their word that A is harmful; b. search the applebees for an authority figure who believes that A is harmful, but believes it more credibly; or c. leave the parking lot without updating in any direction.
“That’s easy, c,” you respond, blindingly fast. You peel out of there, and the whole affair makes not a dent in your epistemic hygiene. But you left behind many others. Will they be as strong, as wise as you?
“In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
– Herbert Simon
Let’s call case 1 “constant” and case 2 “linear”. We assume that constant refers to negligible cost, and that linear means cost that grows linearly with pedagogical length (where pedagogical cost is some measure of the resources needed to acquire some sort of understanding).
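To make the constant/linear distinction a little more concrete, here is a minimal sketch of pedagogical cost along the lines of the footnote: the cost of a capacity is summed over its prerequisite capacities. The graph `PREREQS`, the numbers in `OWN_COST`, and the function `pedagogical_cost` are all made up for illustration. Adding the capacity's own cost, counting a shared prerequisite only once, and treating a known capacity as having known prerequisites are my assumptions, not something the footnote settles.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite graph: capacity -> capacities it depends on.
PREREQS: Dict[str, List[str]] = {
    "long_division": ["subtraction", "multiplication"],
    "multiplication": ["addition"],
    "subtraction": ["addition"],
    "addition": [],
}

# Hypothetical cost, in attentional units, of digesting each capacity by itself.
OWN_COST: Dict[str, float] = {
    "long_division": 5.0,
    "multiplication": 3.0,
    "subtraction": 2.0,
    "addition": 1.0,
}


def pedagogical_cost(capacity: str, known: Set[str]) -> float:
    """Sum the own costs of `capacity` and of every prerequisite not yet known."""
    seen: Set[str] = set()
    stack = [capacity]
    total = 0.0
    while stack:
        current = stack.pop()
        # Skip capacities the learner already has, and ones already counted.
        if current in seen or current in known:
            continue
        seen.add(current)
        total += OWN_COST[current]
        stack.extend(PREREQS[current])
    return total


# A learner starting from nothing pays for the whole graph.
print(pedagogical_cost("long_division", known=set()))
# A learner who already has the prerequisites pays only for the last step.
print(pedagogical_cost("long_division", known={"subtraction", "multiplication"}))
```

In this picture the "constant" case is the one where nearly everything is already in `known`, and the "linear" case is the one where the learner has to walk most of the graph.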
A regulator, unlike you, isn’t willing to leave anyone behind for evangelists and pundits to prey on. This is the role I’m assuming for this essay. I will ultimately propose a negative attentional tax, in which the constant cases would be penalized to give the linear cases a boost. (It’s like negative income tax, replacing money with attention).
If you could understand fleeganomics in N/100000 units, would it be worth it to you then?
Let’s force an empirical interpretation of “the best ideas float to the top”
Three possible measures of density:
- the simplest ideas float to the top.
- the truest ideas float to the top.
- the ideas which advance the best values float to the top, where by “advance the best values” we mean either a. maximize my utility function, not yours; or b. maximize the aggregate/average utility function of all moral patients, without emphasis on zero-sum faceoffs between opponent utility functions.
Each in turn implies a sort of world in which it is the sole interpretation, and thus the sole factor over beliefs of truth-seekers.
The intuition given above leans heavily on density_3; however, we must start much lower, at the fundamentals of simplicity and truth. From now on, for brevity’s sake, please ignore density_3 and focus on the first two.
density_1: The Simplest Ideas Float to the Top.
If you form a heuristic by philosophizing the conjunction rule in probability theory, you get Occam’s razor. In machine learning, we have model selection methods that directly penalize complexity. Occam’s razor doesn’t say anything about reception of ideas in a social system, beyond implying that in gambling the wise bet on shorter sentences (insofar as the wise are gamblers).
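A small sketch of the conjunction rule behind that heuristic: every extra claim you conjoin can only hold the probability steady or lower it. The function name and the probabilities are hypothetical, and the independence of the claims is an assumption made so that the product is valid (the inequality P(A and B) <= P(A) holds without it).

```python
from math import prod
from typing import Sequence


def conjunction_probability(claim_probs: Sequence[float]) -> float:
    """Probability that every claim holds, ASSUMING the claims are independent."""
    return prod(claim_probs)


short_idea = [0.9]            # one claim
long_idea = [0.9, 0.8, 0.7]   # the same claim plus two more

# The longer sentence is never the safer bet.
assert conjunction_probability(long_idea) <= conjunction_probability(short_idea)

# A sentence with no claims at all cannot be wrong: the empty product is 1.
print(conjunction_probability([]))
```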
If we assume that the wearer of magenta hats is maximizing something like petition signatures, and by proxy maximizing the number of applebees patrons converted to magenta hat wearing applebees patrons, then in the world of density_1 they ought to persuade only via statements with constant or negligible cost. (Remember, in the world of density_1, statements needn’t have any particular content to be successful. In an idealized setting, this would mean the empty string gets 100% of the vote in every election, or 100% of traders purchase nothing but the empty string, etc.; in a human setting, think of the “smallest recognizable belief”.)
density_2: The Truest Ideas Float to the Top.
If the truest ideas floated to the top, then statements with more substantial truth values (i.e. with more evidence, more compelling evidence, stronger inferential steps) win out against those with less substantial truth values. In a world governed only by density_2, all cost is negligible.
In this world, the wearer of magenta hats is incentivized to teach fleeganomics – to bother themselves (and others) with linear cost ideas – if that’s what leads people to more substantially held beliefs or commitments. This is a sort of oracle world; in a word, logical omniscience.
In a market view, truth only prevails in the long run (i.e. in the same way that price only converges to value but you can’t pinpoint when they’re equal, supply with demand, etc.), which is why the density_2 interpretation is suitable for oracles, or at least the infinite resources of AIXI-like agents. If you tried to populate the world of density_2 with logically uncertain/AIKR-abiding agents, the entire appeal of markets evaporates. “Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up.” (Hanson), but without the “eventually”.
Negative attention tax
Now suppose we live in some world where density_1 and density_2 are operating at the same time, with some foggier and handwavier things like density_3 on the margins. In such a world, we say false-complicated ideas are robustly uncompetitive and true-simple ideas are robustly competitive, where “robust” means “resilient to foul play”, and "foul play" means "any misleading compression, fallacious reasoning, etc.". Without such resilience, we risk that false-simple ideas will succeed and true-complicated ideas will fail.
A regulator isn’t willing to leave anyone behind for evangelists and pundits to prey on.
Perhaps we want free attention distributed to true-but-complicated things, and penalties applied to false-but-simple things. In economics, a negative income tax (NIT) is a welfare system within an income tax where people earning below a certain amount receive supplemental pay from the government instead of paying taxes to the government.
For us, a negative attentional tax is a welfare system, where ideas demanding above a certain amount of compute receive supplemental attention, and ideas below that amount pay up.
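Here is one way the analogy could be written down, as a sketch rather than a worked-out mechanism. The function `attention_transfer`, the flat `rate`, and the linear shape are my assumptions, borrowed from the simplest textbook form of a negative income tax; nothing in this essay pins them down.

```python
def attention_transfer(cost: float, threshold: float, rate: float = 0.5) -> float:
    """Attention credited to an idea (positive) or taken from it (negative).

    cost:      compute the idea demands, in attentional units
    threshold: the break-even cost (hypothetical policy parameter)
    rate:      fraction of the gap that is transferred (hypothetical)
    """
    return rate * (cost - threshold)


# A linear-cost idea sitting above the threshold receives supplemental attention.
print(attention_transfer(cost=10.0, threshold=4.0))
# A constant-cost idea sitting below the threshold pays up.
print(attention_transfer(cost=1.0, threshold=4.0))
```

Note that this sketch looks only at cost, not at truth. It leans on the claim that false-complicated ideas are uncompetitive anyway, so subsidizing complexity mostly helps the true-complicated ones.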
| density_2 \ density_1 | Simple | Complicated |
|---|---|---|
| False | I'm saying this is a failure mode, danger zone, etc. | Robustly uncompetitive (won’t bother us) |
| True | Robustly competitive (these will be fine) | I’m saying the solution is to give these sentences a boost. |
An example implementation: suppose I’m working at nosebook in the year of our lord. When I notice certain posts get liked/shared blindingly fast, and others take more time, I suppose that the simple ones are some form of epistemic foul play, and the complicated ones are more likely to align with epistemic norms we prefer. I make an algorithm to suppress posts that get liked/shared too quickly, and replace their spots in the feed with posts that seem to be digested before getting liked/shared. (Disclaimer: this is not a resilient proposal, I spent all of 10 seconds thinking about it, please defer to your nearest misinformation expert.)
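In the same ten-seconds-of-thought spirit, the nosebook algorithm might look something like the sketch below. Everything here is hypothetical: the `Post` type, the `reaction_delays` signal (seconds between a user seeing a post and liking/sharing it), the median as the summary statistic, and the `min_median_delay` cutoff are all assumptions, not a description of any real feed.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Post:
    post_id: str
    # Hypothetical signal: seconds between seeing the post and liking/sharing it.
    reaction_delays: List[float]


def rerank_feed(feed: List[Post], min_median_delay: float = 5.0) -> List[Post]:
    """Move posts whose reactions arrive too quickly below the rest of the feed."""

    def is_digested(post: Post) -> bool:
        # Posts with no reactions yet get the benefit of the doubt (an assumption).
        if not post.reaction_delays:
            return True
        return median(post.reaction_delays) >= min_median_delay

    digested = [p for p in feed if is_digested(p)]
    suppressed = [p for p in feed if not is_digested(p)]
    # Original order is kept within each group; fast-reaction posts sink.
    return digested + suppressed


feed = [
    Post("hot_take", [0.5, 1.0, 0.8]),
    Post("explainer", [20.0, 45.0, 30.0]),
    Post("brand_new", []),
]
print([p.post_id for p in rerank_feed(feed)])
```

This version demotes rather than deletes, which is the gentlest reading of "suppress". It is also trivially gameable by anyone who waits a few seconds before clicking.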
Individuals apply NAT credits to interesting-looking complicated ideas; complicated ideas aren't directly supplied with these supplements in the way that simple ideas are automatically handicapped.
Though the above may be a valid interpretation, especially in the nosebook example, NAT is more properly understood as credits allocated to individuals for them to spend freely.
You can imagine the stump speech.
extremely campaigning voice: I’m going to make sure every member of every applebees parking lot has a lengthened/handicapped mental speed when they’re faced with simple ideas, and this will come back to them as tax credits they can spend on complicated ideas. Every applebees patron deserves complexity, even if they can’t afford the full compute/price for it.
[^1]: "Pedagogical cost" is loosely inspired by "algorithmic decomposition" in Between Saying and Doing. TL;DR: to reason about a student acquiring long division, we reason about their acquisition of subtraction and multiplication. For us, the pedagogical cost or length of some capacity is the sum of the lengths of its prerequisite capacities. We'll consider our pedagogical units as some function on attentional units. Herbert Simon dismisses adopting Shannon's bit as the attentional unit, because he wants something invariant under different encoding choices. He goes on to suggest time in the form of "how long it takes for the median human cognition to digest". This can be our base unit of parsing things you already know how to parse, even though extending it to pedagogical cost wouldn't be as stable because we don't understand teaching or learning very well.