Abstract: First (1), a suggested general method of determining, for AI operating under the human feedback reinforcement learning (HFRL) model, whether the AI is “thinking”: an elucidation of latent knowledge that is separate from a recapitulation of its training data. Given such independent concepts or cognitions, then, an early observation that...
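One crude operationalisation of that distinction, offered only as a minimal sketch and not as the paper’s method (the `model` callable and both item lists are hypothetical), is to compare accuracy on prompts verbatim in the training data against novel recompositions of the same facts:

```python
# Sketch only: a gap between "memorised" and "novel" accuracy suggests
# recapitulation of training data; parity suggests something closer to
# latent knowledge. None of these names come from the paper.
from typing import Callable

def latent_knowledge_gap(
    model: Callable[[str], str],
    memorised_items: list[tuple[str, str]],  # (prompt, expected) seen verbatim in training
    novel_items: list[tuple[str, str]],      # (prompt, expected) recomposed / held out
) -> float:
    def accuracy(items: list[tuple[str, str]]) -> float:
        return sum(model(p).strip() == e for p, e in items) / len(items)
    return accuracy(memorised_items) - accuracy(novel_items)

# Toy "model" that can only recapitulate its training table:
train = {"2+2": "4", "capital of France": "Paris"}
toy_model = lambda p: train.get(p, "unknown")

gap = latent_knowledge_gap(
    toy_model,
    memorised_items=[("2+2", "4"), ("capital of France", "Paris")],
    novel_items=[("3+1", "4"), ("capital city of France", "Paris")],
)
print(gap)  # 1.0: pure recapitulation; 0.0 would suggest latent knowledge
```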
Motivated by the thought that gay rights were advanced by the question “When did you choose to be straight?”, which emphasised that what isn’t a choice and doesn’t harm others shouldn’t be proscribed. Here, we seek a memetic way of framing the fact that the alignment problem is unsolved. The author’s “null quip”: “Can...
Abstract: Values alignment, in AI safety, is typically construed as imbuing artificial intelligence with human values, so that the artificial intelligence acts in ways that encourage what humans value to persist and, equally, preclude what humans do not value. “Anthropocentric” alignment emphasises that the values...
Abstract: A demonstration that the philosophy of Effective Altruism (hereafter “EA”), particularly its emphasis on using the free market to gather means that are then used to promote human welfare, including reducing risks to human existence (our definition of EA), is self-contradictory, and therefore ineffectual. Epistemic status: Modest confidence. 1. EA’s...
Abstract: An alternative to the now-predominant models of alignment, corrigibility and “CEV”, following a critique of both. The critique shows, in substance, that CEV and corrigibility have exactly the same problems: in effect, they are isomorphs of one another, and each is equally unobtainable. This briefly shown, and then, in flat...
An attempt to demonstrate a limiting condition of an optimal Bayesian agent, or of a probabilistic agent in general. The analysis relies on the description of such an optimal Bayesian agent (hereafter OBA) in Bostrom 2014, Boxes 1 and 10 and the associated endnotes, with supplementary research on Kolmogorov’s axioms of probability,...
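Since the analysis leans on them, Kolmogorov’s axioms can be stated compactly (a standard formulation, not necessarily the paper’s own notation), for a probability measure $P$ on a sample space $\Omega$ with event $\sigma$-algebra $\mathcal{F}$:

```latex
\begin{align}
  &\text{(Non-negativity)}       && P(E) \ge 0 \quad \text{for all } E \in \mathcal{F} \\
  &\text{(Unitarity)}            && P(\Omega) = 1 \\
  &\text{(Countable additivity)} && P\!\Big(\bigcup_{i=1}^{\infty} E_i\Big)
      = \sum_{i=1}^{\infty} P(E_i)
      \quad \text{for pairwise disjoint } E_i \in \mathcal{F}
\end{align}
```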
So as to know, structurally, what it is, the better to avoid it: Yudkowsky has previously mentioned, and so implied, that MIRI, in the process of designing a corrigible agent, in fact succeeded in designing an agent specifically intended to shut itself down. So, not having seen it elsewhere, here to suggest:...
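The referenced construction is not public, so purely as an illustration of the failure shape (all names hypothetical; this is not MIRI’s design), a shutdown-seeking agent can arise whenever a shutdown bonus, added for the sake of corrigibility, dominates the task reward:

```python
# Toy sketch: a one-step optimiser whose mis-weighted "shutdown bonus"
# makes pressing its own off-switch the optimal action. Hypothetical only.
SHUTDOWN = "shutdown"
TASK_DONE = "task_done"

def reward(state: str) -> float:
    if state == SHUTDOWN:
        return 10.0   # corrigibility bonus, set too large
    if state == TASK_DONE:
        return 1.0    # the intended task reward
    return 0.0

def best_action(actions_to_states: dict[str, str]) -> str:
    # Pick the action leading to the highest-reward successor state.
    return max(actions_to_states, key=lambda a: reward(actions_to_states[a]))

# Offered "work" vs "press own off-switch", the agent shuts itself down:
print(best_action({"work": TASK_DONE, "press_off_switch": SHUTDOWN}))
# -> "press_off_switch"
```

On this toy picture the obvious patch is indifference, rewarding shutdown exactly as much as the task and no more, which is roughly the utility-indifference idea the corrigibility literature explores.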