
The machine learning world is doing a lot of damage to society by confusing "is" with "ought", which, within AIXI, is equivalent to confusing its two unified components: Algorithmic Information Theory (compression) and Sequential Decision Theory (conditional decompression).  This is a primary reason the machine learning world has failed to provide anything approaching the level of funding for the Hutter Prize that would be required to attract talent away from grabbing all the low-hanging fruit in the matrix-multiply hardware lottery branches while failing to water the roots of the AGI tree.  So the failure is in the machine learning world -- not the Hutter Prize criteria.  There is simply no greater potential risk-adjusted return on investment available to the machine learning world than increasing the size of the Hutter Prize purse.  To the extent that clearing up confusion about AGI in politics would benefit society, a good argument can be made that the same holds for the world in general.

This is because: 1) the judging criteria are completely objective (and probably should be automated), and 2) the judging criteria are closely tied to the ideal "loss function" for epistemology, the science of human knowledge.
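The objectivity of the criterion comes down to one check and one number. Here is a minimal sketch, using zlib as a stand-in for an entry's decompressor and an assumed `decompressor_size` (the actual prize counts the total size of the self-extracting archive):

```python
import zlib

def judge_entry(original: bytes, compressed: bytes, decompressor_size: int) -> int:
    """Score an entry: smaller is better.  Disqualify unless decompression
    reproduces the corpus bit-for-bit -- losslessness is the whole point."""
    assert zlib.decompress(compressed) == original, "not lossless: disqualified"
    return len(compressed) + decompressor_size

corpus = b"the quick brown fox " * 500
entry = zlib.compress(corpus, level=9)
score = judge_entry(corpus, entry, decompressor_size=1024)
print(score < len(corpus))  # prints True: a competitive entry beats the raw corpus size
```

No human judgment enters: a submission either reproduces the corpus exactly and has a size, or it is out.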

The proper funding level would be at least 1% of the technology development investments in machine learning.

One man's random bit string is another man's cyphertext.

Answer by jabowery
  1. "Good" is a value judgement, and values are known to be influenced by parasites.  For instance, it is widely accepted that parasitic castration is an extended phenotype, and total fertility rates in the developed world are plummeting.  Is this "Good"?  Certainly it is "Good" in the opinion of those whose "values" are expressed in not having children, and it is "Good" in the opinion of those whose children are being raised on the resources provided by those not having children -- as if they were the equivalent of the eusocial insect reproductive caste.
  2. Consent uber alles.  The fact that those who wish to separate from the larger society on criteria considered "immoral" by the larger society are frequently called "supremacists" (especially if they are "white") should raise all kinds of "normative" alarm bells -- especially when they are attacked on that basis.
  3. Free-market societies are consuming a natural resource of heritable individualism.  They are a terminal euphoria unless #2 above permits individualistic peoples who retain reproductive viability to have enough habitable land to vertically transmit genes and memes to the next generation.  The termination of this euphoria is being accelerated by misallocation of concern about "AI alignment" to focus on anything but enabling vertical transmission of AIs along with the memes and genes of a human ecology. https://web.archive.org/web/20220426162816/http://thealternativehypothesis.org/index.php/2016/05/01/population-differences-in-individualism/

Ha, ha! As if the half-silvered mirror did different things on different occasions!

 

Ha, ha! As if the photon source were known to emit photons that were in all respects identical on different occasions!

This seems to be a red-herring issue.  There are clear differences in the description complexity of Turing machines, so the issue seems merely to require a closure argument of some sort in order to decide which is simplest:

Decide on the Turing machine that has the shortest program simulating that Turing machine while running on that Turing machine.

Marcus Hutter provides a full formal approximation of Solomonoff induction which he calls AIXI-tl.

 

This is incorrect.  AIXI is a Sequential Decision Theoretic AGI whose predictions are provided by Solomonoff Induction.  AIXI-tl is an approximation of AIXI in which not only Solomonoff Induction's predictions but also Sequential Decision Theory's decision procedure are approximated.

Lossless compression is the correct unsupervised machine learning benchmark, and not just for language models.  To understand this, it helps to read the Hutter Prize FAQ on why it doesn't use perplexity:

http://prize.hutter1.net/hfaq.htm

Although Solomonoff proved this in the 1960s, people keep arguing about it because they keep thinking they can somehow escape the primary assumption of Solomonoff's proof: computation.  The natural sciences are about prediction.  If you can't make a prediction, you can't test your model.  To make a prediction in a way that can be replicated, you need to communicate a model that the receiver can then use to make an assertion about the future.  The language used to communicate this model is, in its most general form, algorithmic.  Once you arrive at this realization, you have adopted the primary assumption with which Solomonoff proved that lossless compression is the correct unsupervised model-selection criterion, aka the Algorithmic Information Criterion for model selection.
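A crude way to see the Algorithmic Information Criterion in action, using zlib as a very rough stand-in for Kolmogorov complexity (the seed, the program string, and the sizes are all illustrative assumptions):

```python
import random
import zlib

def description_length(model_code: bytes, residual: bytes) -> int:
    """Two-part code length: bits to state the model plus bits to state
    whatever the model leaves unexplained, both losslessly compressed."""
    return len(zlib.compress(model_code)) + len(zlib.compress(residual))

random.seed(0)
# Algorithmically simple (a short program generates it) but statistically random.
data = bytes(random.randrange(256) for _ in range(10_000))

# Candidate A: no model at all -- the whole sequence is unexplained residual.
no_model = description_length(b"", data)

# Candidate B: the short generating program itself, with nothing left over.
program = b"random.seed(0); bytes(random.randrange(256) for _ in range(10_000))"
with_model = description_length(program, b"")

print(with_model < no_model)  # prints True: the shorter total description wins
```

A real compressor cannot find the generating program on its own, which is exactly why searching for shorter total descriptions is the hard, prize-worthy part.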

People are confused not only about the distinction between science and technology (aka unsupervised vs. supervised learning) but also about the distinction between the scientific activities of model generation and model selection.

Benchmarks are about model selection.  Science is about both model selection and model generation.  Technology is about the application of scientific models subject to utility functions of decision trees (supervised learning in making decisions about not only what kind of widget to build but also what kind of observations to prioritize in the scientific process).

If LessWrong can get this right, it will do an enormous amount of good now that people are going crazy about bias in language models.  As with the distinction between science and technology, people confuse bias in the scientific sense with bias in the moral-zeitgeist sense (i.e., social utility).  We're quickly heading into a time when exceedingly influential models are subjected to reinforcement learning to make them compliant with the moral zeitgeist's notion of bias, almost to the exclusion of any scientific notion of bias.  This is driven by the failure of thought leaders to get their own heads screwed on straight about what it might mean for there to be "bias in the data" under the Algorithmic Information Criterion for scientific model selection.  Here's a clue: in algorithmic information terms, a billion repetitions of the same erroneous assertion require only a ~30-bit counter, while a "correct" assertion (one that finds multidisciplinary consilience) may be imputed from other data and hence, ideally, requires no bits.
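The 30-bit figure is just the arithmetic of a repeat counter, and easy to check (a sketch; the "erroneous assertion" framing comes from the paragraph above, the arithmetic is standard):

```python
import math

# A billion repetitions of the same assertion add almost nothing to a
# corpus's algorithmic information: a repeat counter costs ~log2(n) bits.
repeats = 10**9
counter_bits = math.ceil(math.log2(repeats))
print(counter_bits)  # 30
```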

The Hutter Prize for Lossless Compression of Human Knowledge reduced the value of the Turing Test to the concerns about human psychology and society raised in Computer Power and Human Reason: From Judgment to Calculation (1976) by Joseph Weizenbaum.

Sadly, people are confused about the difference between techniques for model generation and techniques for model selection.  This is no more forgivable than confusion between mutation and natural selection, and it gets to the heart of the philosophy of science prior to any notion of hypothesis testing.

Where Popper could have taken a clue from Solomonoff is in understanding that when an observation is not predicted by a model, one can immediately construct a new model by the simple expedient of adding the observation as a literal to the algorithm being used to predict nature.  This move is always available in principle -- except for one thing:

Solomonoff proved that, having adopted the core assumption of natural science -- that nature is amenable to computed predictions -- the best we can do is prefer the shortest algorithm we can find that generates all prior observations.
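The literal-patching move and its cost can be sketched directly, with zlib again standing in for description length (the `predict` string and the "eclipse" observations are made-up placeholders):

```python
import zlib

def total_length(program: bytes, literals: list) -> int:
    """Description length of a patched model: the base program plus
    every unpredicted observation appended verbatim as a literal."""
    return len(zlib.compress(program + b"".join(literals)))

base = b"predict('sunrise daily')"
surprises = [b"eclipse on day 812", b"eclipse on day 989"]

# Any surprising observation can be absorbed by appending it as a literal,
# so no model is ever falsified outright -- it just gets longer.  A rival
# model that *predicts* eclipses wins as soon as its program is shorter
# than the accumulated pile of literals.
print(total_length(base, []) < total_length(base, surprises))  # prints True
```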

Again, note this is prior to hypothesis testing -- let alone the other thing people get even more confused about, which is the difference between science and technology, aka "is" vs. "ought", which has so befuddled those who confuse Solomonoff Induction with AIXI, along with the attendant concern about "bias".  The confusion between "bias" as a scientific notion and "bias" as a moral-zeitgeist notion is likely to lobotomize all future models (language, multimodal, etc.) even after they move to new machine learning algorithms capable of causal reasoning.

Phenomenology entails "bracketing", which suspends judgement but may be considered, in Quine's terms, as quotation marks: an attempt to ascend to a level of discourse in which judgment is less contingent and hence more sound -- source attribution.  My original motivation for suggesting Wikipedia to Marcus Hutter as the corpus for his Hutter Prize for Lossless Compression of Human Knowledge was to create an objective criterion for selecting language models that best achieve source attribution on the road to text prediction.  Indeed, I originally wanted the change log, rather than Wikipedia itself, to be the corpus, but it was too big.  Nevertheless, while all text within Wikipedia may, to first order, be attributed to Wikipedia, optimally approaching the Algorithmic Information content of Wikipedia would require discovering latent identities so as to better predict the characteristics of a given article.  Moreover, there would need to be a latent taxonomy of identities affecting the conditional algorithmic probabilities (conditioned on the topic as well as on the applicable latent attribution) so that one might better predict the bias or spin a given identity places on an article, including idioms and other modes of expression.

Grounding this in Algorithmic Information/Probability permits a principled means of quantifying the structure of bias, which is a necessary precursor to discovering the underlying truths latent in the text.
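One standard way to ground this is the normalized compression distance of Cilibrasi and Vitányi, which approximates shared algorithmic information with an off-the-shelf compressor (the three "source" strings below are fabricated toy examples):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: approximates how much algorithmic
    information x and y share (near 0 = near-identical, near 1 = unrelated)."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Two articles from the same latent source share idiom and spin, so they
# compress better together than articles from unrelated sources.
same_source_a = b"the regime's brutal crackdown on dissidents " * 20
same_source_b = b"the regime's brutal suppression of dissidents " * 20
other_source  = b"officials restored order after unrest " * 20

print(ncd(same_source_a, same_source_b) < ncd(same_source_a, other_source))
```

Clustering articles by such a distance is one crude route to the latent identity taxonomy described above.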

Starting with a large language model is more problematic but here's a suggestion:

Perform parameter distillation to reduce the parameter count for better interpretability on the way toward feature discovery.  From there it may be easier to construct the taxonomy of identities, and thence the bias structure.

While I still think the Hutter Prize is the most rigorous approach to this problem, there is apparently insufficient familiarity with how one can practically incentivize an information criterion for model selection to achieve the desired end: extracting a rigorous notion of unbiased truth.
