[Epistemic status: Medium-Uncertainty. I've only spent a few days thinking about this, but so far it seems to fit some specific problems well.]
The following describes one possible framework for understanding the usefulness of different sources of information. It's particularly meant to help value source types such as books, academic articles, blog posts, online comments, and mathematical models. I think it could be a useful starting point but would guess that there are better alternatives upon further deliberation.
The framework factors are robustness, importance, novelty, and accessibility.
Simple Use Cases
There are multiple things this could be useful for, most of which I'm sure I haven't yet considered. To start, I would hope that it could be used when discussing options for either writing information for others or deciding which materials to encourage.
Some possible discussion quotes relating to this framework
"These blog posts are quite novel, but I think they aren't very robust."
"This video may not be very dense, but it is highly accessible."
"This paper has a lot of equations but they don't seem useful to the point. It's both inaccessible and not robust."
"I think that you can change your paper to make it more accessible without sacrificing any robustness."
People use informational resources (books, videos, etc), in part, to learn information. There are many important attributes of such resources that will impact the quality and magnitude of such learning. Here I wrap these in the total term "information effectiveness."
Information effectiveness, when most narrowly estimated, is context-specific to an agent or group of readers or writers. It is also specific to a set of topics; for instance, a particular article by George about politics may be considered ineffective on the topic of politics, but highly effective in its revealed information about George's beliefs.
Information effectiveness could be judged for any quantity of information: an entire book, a "per-page average", a "per-bit average", or similar.
In this document, we focus on "reader information effectiveness", which seeks to understand the effectiveness of information to readers. Similar frameworks could be made for writers; for instance, they may have goals such as persuading readers of specific claims or generating status.
To give a simple example, if you were to read a document that was enjoyable, seemed trustworthy, and proved significantly life-changing in a positive way, that document would be considered to have high information effectiveness. If you were to read a boring archaic tome by a highly unreliable author about a topic not at all important to you, then that would be considered to have low information effectiveness. To be clear, this says more about the relationship between yourself and the text than about the text itself; in each case, it's possible other readers could have had very different reactions.
While reader information effectiveness varies per reader, there are expected to be strong correlations between readers on many dimensions. For example, one article may be highly biased. This may not be a big deal for a reader incredibly well read on the particular author's biases, but would likely be a significant deterrent for most readers. Therefore such an article could be rated as having "low expected information effectiveness" for a collection of possible readers.
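One way to make an "expected" rating over a collection of readers concrete is a weighted average. The sketch below is only illustrative: the `Reader` class, the 0-to-1 effectiveness scale, the audience weights, and all of the numbers are assumptions made for this example, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reader:
    """A hypothetical reader type within a possible audience."""
    description: str
    weight: float         # assumed share of the audience
    effectiveness: float  # assumed effectiveness of the article for this reader, 0..1


def expected_effectiveness(readers):
    """Weighted average of per-reader effectiveness across the audience."""
    total_weight = sum(r.weight for r in readers)
    if total_weight <= 0:
        raise ValueError("audience weights must sum to a positive number")
    return sum(r.weight * r.effectiveness for r in readers) / total_weight


# A highly biased article: fine for the few readers who know the author's
# biases well, a significant deterrent for most others (made-up numbers).
audience = [
    Reader("well read on the author's biases", weight=0.1, effectiveness=0.8),
    Reader("typical reader", weight=0.9, effectiveness=0.2),
]

print(round(expected_effectiveness(audience), 2))  # 0.26
```

With these made-up numbers the article rates low in expectation even though it is quite effective for a small group, which is the situation the paragraph above describes.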
The RAIN framework lists four factors that I think may be relatively intuitive, mutually-exclusive, relatively exhaustive, and relatively low in internal correlations. These factors are robustness, importance, novelty, and accessibility.
Hypothetically one could directly calculate the expected value of all information sources on all agents on all tasks, but this would be challenging and may not break down into the most intuitive substructures. This framework may provide a more pragmatic approach.
Robustness describes how valid the information understood by a reader would be expected to be. This could mean a few different things in different contexts. If a reader is reading an article expressing several claims, robustness would refer to the expected validity of those claims. If the reader is reading a table of data, robustness would refer to the expected validity of that data. If the reader is reading an article by an obviously biased source, but is reading it for information unaffected by that bias, then that information can be robust.
Robustness can itself be broken down into further factors.
If claims or data are described, can those be easily verified? One way to make this possible is to present the information so that it can be explicitly falsified.
Based on the author's background, the medium, and the intended audience, can this information be expected to be false, misleading, or selectively chosen to create bias that would be disliked by the reader?
If the information came from a human author, did it go through rounds of scrutiny by unbiased and qualified parties? Will other parties pay attention to it and be able to disprove questionable claims? Even in cases where actual scrutiny is not evident, the threat of it could promote accuracy.
Accessibility covers the relative cost or benefit of obtaining information. Typically learning bears costs, but not always. There are some educational information sources that are highly enjoyable and preferable even if not for the information value; these would be considered to have negative learning cost.
Unlike with the other three primary attributes, accessibility determines both costs and benefits. An unnecessarily difficult-to-read book would probably have readers struggle more per unit learned (a cost), but also have them give up before learning all the available content (a lack of benefit).
As with robustness, accessibility can be broken down further.
Information may not be easily available to many possible learners. This could be because it is behind a paywall, only shared within an exclusive group, or difficult to discover. In cases like video, it may not be available on websites that offer variable playback speeds. There could also be substantial parts missing.
Even when information is technically available, it may be difficult for some readers to understand. This could be reader-specific; a technical article may have high understandability, and thus information effectiveness, for some readers, but not others.
Most documents take a lot of time to understand, and then have some expected limit of understanding for a given reader. Both of these cost considerations can be significant and would go under the title of understandability.
If information is strongly unenjoyable, that would count as a cost for the learner. Enjoyment could come from many traits such as simplicity, elegance, low required mental effort, and humor. There could also be personally beneficial factors such as reinforcing the learner's identity or making them feel intelligent.
Compactness describes the density of relevant information.
Importance here is very similar to the Importance attribute of the ITN framework. Information content is important to a reader if it describes information that is highly decision relevant to the reader. This is very similar to it having high "value of information", though it is not constrained to any one decision the reader may be facing. Note that it is possible that information content could be high in importance but still useless; for instance, if the reader already knows all of that information.
Information is novel to a learner if that learner does not yet know that information. If the reader does know that information, it would have zero educational value. I believe this is pretty self-evident.
I think there are some common correlations between the four factors, and that these come about for different reasons.
Robustness vs. Accessibility
Some common ways of making information sources more robust include things that would make them less generally accessible.
High-Robustness, Low-Accessibility Example
Technical papers with lots of proofs, citations, and carefully described terminology.
Low-Robustness, High-Accessibility Example
Short emotionally-charged opinion pieces.
Importance vs. Accessibility
People generally seem to like it when information is useful to them, but on the other hand, the most accessible information for them is generally not the most important.
High-Importance, Low-Accessibility Example
Facts involving difficult truths. For a group at war, this could be, "You are very likely to lose, and you really should surrender immediately."
Low-Importance, High-Accessibility Example
Writings about the lives of cultural celebrities.
Robustness vs. Novelty
When information is not novel, the learner would have a greater ability to validate it against their existing knowledge. Also, if one believes there is generally a much wider variety of false information than true information, then on average the false information would be more novel.
High-Robustness, Low-Novelty Example
Scientific statements that readers can reasonably verify, because they already know almost all of them well.
Low-Robustness, High-Novelty Example
Sophisticated conspiracy theories completely new to the readers but very unlikely to be true.
Accessibility vs. Novelty
If information is not at all novel it may be boring, which would reduce accessibility. On the other hand, if it is too novel, it may be mentally challenging to process, also reducing accessibility.
High-Accessibility, Low-Novelty Example
A movie that the viewers have seen before but still enjoy. They don't have to struggle to follow it because they already know it well.
Low-Accessibility, High-Novelty Example
A 120-minute, highly dense academic seminar on a topic very new to the audience.
Density vs. Accessibility
This is similar to the accessibility/novelty tradeoff. Very dense and very sparse information is typically low in accessibility.
High-Density, Low-Accessibility Example
A dense mathematical logic textbook with derivations but very few explanations.
Low-Density, High-Accessibility Example
An extensive video series on a relatively simple subject.
Using RAIN for Content Evaluation
If one wanted to use this framework in order to evaluate all blog posts of LessWrong, for instance, I would recommend using it as a starting point, but modifying it for the use case. A few things to consider:
- Does the audience fall into clusters? Content may be important or novel to some clusters but not others.
- It's a much higher bar to be novel to experts than to be novel to most readers. Work that is novel to experts can be considered "innovative," while work that is novel to most readers can be considered "informative" or similar.
- Total length likely matters, even though it is not technically part of this framework.
The current framework is not tied to any specific mathematical model. I think that one is possible, though it may not map one-to-one onto the accessibility term specifically.
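As a rough illustration of what such a model might look like, here is one toy form in Python. Everything in it is an assumption made for the sake of the example, not a model this framework commits to: the function `net_value`, the 0-to-1 scales, the multiplicative benefit term, and the split of accessibility into a `fraction_absorbed` benefit factor and a `learning_cost` term. The split reflects the earlier point that accessibility affects both costs and benefits, which is one reason it may not fit into a single term.

```python
def net_value(robustness, importance, novelty, fraction_absorbed, learning_cost):
    """Toy score for one reader and one source (all scales are assumptions).

    robustness, importance, novelty, fraction_absorbed: values in [0, 1].
    learning_cost: cost of consuming the source, in the same units as the
        benefit; it may be negative for sources that are enjoyable in
        their own right.
    """
    factors = {
        "robustness": robustness,
        "importance": importance,
        "novelty": novelty,
        "fraction_absorbed": fraction_absorbed,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Multiplicative benefit: if any factor is zero, the benefit is zero.
    benefit = robustness * importance * novelty * fraction_absorbed
    return benefit - learning_cost


# Important and robust, but the reader already knows all of it.
print(net_value(0.9, 0.8, 0.0, 1.0, 0.1))             # -0.1

# Partly novel, but hard enough that the reader absorbs only half.
print(round(net_value(0.9, 0.8, 0.5, 0.5, 0.05), 2))  # 0.13
```

The multiplicative form is chosen only because it reproduces two claims made above: fully known information has no educational value however important it is, and an inaccessible source both costs more and delivers less.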
It would be interesting to attempt to provide rubrics or quantifications for each factor. I'd also be interested, of course, in applying this framework in different ways to various available information sources.
For any specific in-progress informational work, there would be an effective "Pareto frontier" of RAIN factors. Understanding how to weight these factors for future works could be quite useful.
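To illustrate the Pareto frontier idea, the sketch below compares a few hypothetical drafts of the same work. The draft names, the scores, and the 0-to-1 scale are invented for the example. A draft is on the frontier if no other draft is at least as good on every factor and strictly better on at least one.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every factor
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates):
    """Return the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    }


# Hypothetical drafts, scored as (robustness, accessibility, importance, novelty).
drafts = {
    "technical draft": (0.9, 0.3, 0.7, 0.6),
    "popular draft": (0.5, 0.9, 0.7, 0.6),
    "rushed draft": (0.4, 0.3, 0.7, 0.5),
}

print(sorted(pareto_frontier(drafts)))  # ['popular draft', 'technical draft']
```

Here the rushed draft is dominated by the technical one, while the technical and popular drafts trade robustness against accessibility. Choosing between the two frontier drafts is where the weighting of factors would come in.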
Many thanks to Ondřej Bajgar, Jan Kulveit, and Carina Prunkl for feedback and discussion on this post.