Bridge Collapse: Reductionism as Engineering Problem

I really appreciate your clear expositions!

I thought of a phrase to quickly describe the gist of this problem: You need your AI to realize that the map is part of the territory.

Also, I was thinking that the fact that this is a problem might be a good thing. A Cartesian agent would probably be relatively slower at FOOMing, since it can't natively conceive of modifying itself. (I still think a sufficiently intelligent one would be highly dangerous and capable of FOOMing, though.) A bigger advantage might be that it could potentially be used to control a 'baby' AI that is still being trained/built, since there is this huge blind spot in the way it can model the world. For example, imagine that a Cartesian AI is trying to increase its computational power, and it notices that there happens to be a lot of computational power within easy reach! So it starts reprogramming that hardware to suit its own nefarious needs, but whoops, it just destroyed itself. That might act as a sort of fuse for an overambitious AI. Or maybe this could be used to grow a seed AI more safely: you tell it to write a design for a better version of itself. Then you could turn it off (which is easier to do since it is Cartesian), check that the design was sound, build it, and then work on the next-generation AI, instead of trying to let it FOOM in controlled intervals. At some point, you could presumably ask it to solve this problem, and then design a new generation based on that. I don't know how plausible these scenarios are, but it is interesting to think about.

Rob Bensinger

Thanks, Adele!

You need your AI to realize that the map is part of the territory.

That's right, if you mean 'representations exist, so they must be implemented in physical systems'.

But the Cartesian agrees with 'the map is part of the territory' on a different interpretation. She thinks the mental and physical worlds both exist (as distinct 'countries' in a larger territory). Her error is just to think that it's impossible to redescribe the mental parts of the universe in physical terms.

A Cartesian agent would probably be relatively slower at FOOMing

An attempt at a Cartesian seed AI would probably just break, unless it overcame its Cartesianness by some mostly autonomous evolutionary algorithm for generating successful successor-agents. A human programmer could try to improve it over time, but it wouldn't be able to rely much on the AI's own intelligence (because self-modification is precisely where the AI has no defined hypotheses), so I'd expect the process to become increasingly difficult and slow and ineffective as we reached the limits of human understanding.

I think the main worry with Cartesians isn't that they're dumb-ish, so they might become a dangerously unpredictable human-level AI or a bumbling superintelligence. The main worry is that they're so dumb that they'll never coalesce into a working general intelligence of any kind. Then, while the build-a-clean-AI people (who are trying to design simple, transparent AGIs with stable, defined goals) are busy wasting their time in the blind alley of Cartesian architectures, some random build-an-ugly-AI project will pop up out of left field and eat us.

Build-an-ugly-AI people care about sloppy, quick-and-dirty search processes, not so much about AIXI or Solomonoff. So the primary danger of Cartesians isn't that they're Unfriendly; it's that they're shiny objects distracting a lot of the people with the right tastes and competencies for making progress toward Friendliness.

The bootstrapping idea is probably a good one: There's no way we'll succeed at building a perfect FAI in one go, so the trick will be to cut corners in all the ways that can get fixed by the system, and that don't make the system unsafe in the interim. I'm not sure Cartesianism is the right sort of corner to cut. Yes, the AI won't care about self-preservation; but it also won't care about any other interim values we'd like to program it with, except ones that amount to patterns of sensory experience for the AI.

Eliezer Yudkowsky

The "build a clean Cartesian AI" folks, Schmidhuber and Hutter, are much closer to "describe how to build a clean naturalistic AI given unlimited computing power" than, say, Lenat's Eurisko is to AIXI. It's just that AIXI won't actually work as a conceptual foundation for the reasons given, nay it is Solomonoff induction itself which will not work as a conceptual foundation, hence considering naturalized induction as part of the work to be done along the way to OPFAI. The worry from Eurisko-style AI is not that it will be Cartesian and therefore bad, but that it will do self-modification in a completely ad-hoc way and thus have no stable specifiable properties nor be apt to grafting on such. To avoid that, we want to do a cleaner system; and then, doing a cleaner system, we wish it to be naturalistic rather than Cartesian for the given reasons. Also, once you sketch out how a naturalistic system works, it's very clear that these are issues central to stable self-modification - the system's model of how it works and its attempt to change it.

V_V

I think you are conflating two different problems:

How to learn by reinforcement in an unknown non-ergodic environment (e.g. one where it is possible to drop an anvil on your head)
How to make decisions that take into account future reward, in a non-ergodic environment, where actions may modify the agent.

The first problem is well known in the reinforcement learning community, and in fact it is also mentioned in the first AIXI papers, but it is sidestepped with an ergodicity assumption rather than addressed.
I don't think there can be really general solutions for this problem: you need some environment-specific prior or supervision.
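To make the first problem concrete, here is a minimal Python sketch of a non-ergodic environment, assuming a made-up two-state MDP (the state names, rewards, and GAMMA are illustrative, not taken from the AIXI papers). Value iteration shows that the "destroyed" state is worth nothing forever, and the reachability check (strictly, a test of whether the MDP is "communicating", used here as a crude stand-in for ergodicity) shows why trial-and-error learning can't recover: a single exploratory "inspect_anvil" ends learning.

```python
# Hypothetical toy environment (not from the thread): "inspect_anvil" leads to an
# absorbing "destroyed" state from which no reward is ever available again.
GAMMA = 0.9

# TRANSITIONS[state][action] = (next_state, reward); deterministic for simplicity
TRANSITIONS = {
    "safe": {"work": ("safe", 1.0), "inspect_anvil": ("destroyed", 0.0)},
    "destroyed": {"work": ("destroyed", 0.0), "inspect_anvil": ("destroyed", 0.0)},
}


def value_iteration(transitions, gamma=GAMMA, iters=500):
    """Synchronous value iteration for a deterministic MDP."""
    values = {s: 0.0 for s in transitions}
    for _ in range(iters):
        values = {
            s: max(r + gamma * values[s2] for (s2, r) in acts.values())
            for s, acts in transitions.items()
        }
    return values


def reachable(transitions, start):
    """All states reachable from `start` under some sequence of actions."""
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for s2, _reward in transitions[s].values():
            if s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen


values = value_iteration(TRANSITIONS)
# Crude stand-in for an ergodicity assumption: can every state reach every other?
recoverable = all(reachable(TRANSITIONS, s) == set(TRANSITIONS) for s in TRANSITIONS)
print(values)       # "safe" is close to 1 / (1 - 0.9) = 10; "destroyed" is 0.0
print(recoverable)  # False: nothing leads back out of "destroyed"
```

An ergodicity assumption amounts to declaring that environments like this one are not in the hypothesis class, which is why it sidesteps the problem rather than solving it.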

The second problem doesn't seem as hard as the first one.
AIXI, of course, can't model self-modifications, because it is incomputable and it can only deal with computable environments, but computable varieties of AIXI (Schmidhuber's Gödel machine, perhaps?) can easily represent themselves as part of the environment.

Adele_L

Thank you, this helps clarify things for me.

Yes, the AI won't care about self-preservation; but it also won't care about any other interim values we'd like to program it with, except ones that amount to patterns of sensory experience for the AI.

I get why AIXI would behave like this, but it's not obvious to me that all Cartesian AIs would have this problem. If the AI has some model of the world, and this model can still update (mostly correctly) based on its sensory inputs, and predict (mostly correctly) how different outputs will change the world, it seems like it could still try to make as many paperclips as possible according to its model of the world. Does that make sense?

Rob Bensinger

Alex Mennen designed a Cartesian with preferences over its environment: A utility-maximizing variant of AIXI.

Rob Bensinger

That's a good point. AIXI is my go-to example, and AIXI's preferences are over its input tape. But, sticking to the cybernetic agent model, there are other action-dependent things Alice could have preferences over, like portions of her work tape, or her actions themselves. She could also have preferences over input-conditional logical constructs out of Everett's program, like Everett's work tape contents.

I agree it's possible to build a non-AIXI-like Cartesian that wants to make paperclips, not just produce paperclip-experiences in itself. But Cartesians are weird, so it's hard to predict how much progress that would represent.
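A toy sketch of the difference between preferences over the input tape and preferences over the environment, assuming a hypothetical one-step world with a hackable camera (World, step, observe, FAKE_COUNT, and the action names are all made up for illustration; this is neither AIXI nor Mennen's utility-maximizing variant). Note that the world-valuing agent here is simply handed the true world state, which is exactly what a real Cartesian lacks, so the sketch sidesteps the camera-hacking confusion rather than solving it.

```python
from dataclasses import dataclass

FAKE_COUNT = 10**9  # what a hacked camera reports (arbitrary illustrative value)


@dataclass(frozen=True)
class World:
    paperclips: int = 0
    camera_hacked: bool = False


def step(world: World, action: str) -> World:
    """Hypothetical environment dynamics for a single action."""
    if action == "make_paperclip":
        return World(world.paperclips + 1, world.camera_hacked)
    if action == "hack_camera":
        return World(world.paperclips, True)
    return world


def observe(world: World) -> int:
    """The agent's input: the true paperclip count, unless the camera is hacked."""
    return FAKE_COUNT if world.camera_hacked else world.paperclips


def utility_over_input(world: World) -> int:
    # Input-directed, like AIXI's preferences over its input tape
    return observe(world)


def utility_over_world(world: World) -> int:
    # Environment-directed: only actual paperclips count
    return world.paperclips


def best_action(utility, world: World, actions=("make_paperclip", "hack_camera")) -> str:
    """Pick the action whose one-step outcome the utility function rates highest."""
    return max(actions, key=lambda a: utility(step(world, a)))


print(best_action(utility_over_input, World()))  # hack_camera
print(best_action(utility_over_world, World()))  # make_paperclip
```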

For example, the Cartesian might wirehead under the assumption that doing so changes reality, instead of wireheading under the assumption that doing so changes its experiences. I don't know whether a deeply dualistic agent would recognize that editing its camera to create paperclip hallucinations counts as editing its input sequence semi-directly. It might instead think of camera-hacking as a godlike way of editing reality as a whole, as though Alice had the power to create billions of representations of objective physical paperclips in Everett's work tape just by editing the part of Everett's work tape representing her hardware.

In general, I'm worried about including anything reminiscent of Cartesian reasoning in our 'the seed AI can help us solve this' corner-cutting category, because I don't formally understand the precise patterns of mistakes Cartesians make well enough to think I can predict them and stay two steps ahead of those errors. And in the time it takes to figure out exactly which patches would make Cartesians safe and predictable without rendering them useless, it's plausible we could have just built a naturalized architecture from scratch.

Squark

RobbBB, I want to draw your attention to the model I constructed in http://lesswrong.com/lw/h4x/intelligence_metrics_and_decision_theories/ which solves the duality and ontology problems in AIXI.

Since then I've made some improvements, in particular quasi-Solomonoff induction should be constructed slightly differently and, more importantly, I realized the correct way to use UDT in this model. Planning to write about this soon.

V_V

A physical Cai might need to foresee scenarios like 'an anvil crashes into my head and destroys me', and assign probability mass to them.

An AI operating with the traditional cybernetic agent model can also evaluate scenarios like that, where "destroys me" means "puts the world in a state where my future ability to gain reward/fulfil my goals become permanently compromised".

Rob Bensinger

That's true. I'm focusing in on AIXI (/ AIXItl) in my next two posts because I want to see how much we can rely on indirect solutions along those lines to make a self-preserving, self-improving Cartesian. (Or an agent that starts off Cartesian but is easily self-modified, or humanly modified, to become naturalized.) AIXItl's behaviors are what ultimately matters, and if some crude hack can make its epistemic flaws irrelevant or effectively nonexistent, then we won't need to abandon Solomonoff induction after all.

I'm not confident that's possible because I'm not confident it's a process we can automate or find a single magic bullet for, even if we come up with a clever band-aid here or there. Naturalistic reasoning isn't just about knowing when you'll die; it's about knowing anything and everything useful about the physical conditions for your computations.

V_V

I'm not sure that this "Cartesian vs Naturalistic" distinction that you are making is really that fundamental.

An intelligent agent tries to learn a model of its environment that allows it to explain its observations and predict how to fulfil its goals. If that entails including in the world model a submodel that represents the agent itself, the agent will learn that, assuming that the agent is smart enough and learning can be done safely (e.g. without accidentally dropping an anvil on its head).

After all, humans start with an intuitively dualistic worldview, and yet they are able to revise it to a naturalistic one, after observing enough evidence. Even people who claim to believe in supernatural souls tend to use naturalistic beliefs when making actual decisions (e.g. they understand that drugs, trauma or illness that physically affect the brain can alter cognitive functions).

kokotajlod

And since we happen to live in a world made of physics, the kind of monist we want in practice is a reductive physicalist AI. We want a 'physicalist' as opposed to a reductive monist that thinks everything is made of monads, or abstract objects, or morality fluid, or what-have-you.

This may be nitpicky, but I'd like our AI to leave open the possibility of a non-physical ontology. We don't yet know that our world is made of physics, even though it seems like it is. By analogy: it would be bad to hard-code our AI to have an ontology of wave-particles, since things might turn out to be made of strings/branes. In the same way, we shouldn't rule out non-physical possibilities either.

Rob Bensinger

I'm not sure what you have in mind when you say 'non-physical ontology'. Physics at this point is pretty well empirically confirmed, so it doesn't seem likely we'll discover it's All A Lie tomorrow. On the other hand, you might have in mind a worry like:

How much detail of our contemporary scientific world-view is it safe to presuppose in building the AI, without our needing to seriously worry that tomorrow we'll have a revolution in physics that's outside of our AI's hypothesis space?
In particular: Might we discover that physics as we know it is a high-level approximation of a mathematical structure that looks nothing like physics as we know it?
To what extent is it OK if the world turns out to be non-computable but the AI can only hypothesize computable environments?

These are all very serious, and certainly not nitpicky. My last couple of posts in this sequence will be about the open problem 'Given that we want our AGI's hypotheses to look like immersive worlds rather than like communicating programs, how do we formalize "world"?' If we were building this thing at the turn of the 20th century, we might have assumed that it was safe to build 'made of atoms' into our conception of 'physical', and let the AI only think in terms of configurations of atoms. What revisable assumptions about the world might be in the background of our current thinking, that we ought to have the AI treat as revisable hypotheses and not as fixed axioms?

kokotajlod

The worry I had in mind is pretty well captured by your three bullet points there, though I think you are phrasing it in a weaker way than it deserves. Consider the Simulation Hypothesis combined with the hypothesis that the higher-level universe running the simulation does not follow rules remotely like those of modern physics. If it is true, then an AI which is hard-coded to only consider "physical" theories will be bad.

I'm not sure what you mean by (paraphrase) 'we want our AI to be a reductive physicalist monist.' I worried that you meant something like "We want our AI to be incapable of assigning any probability whatsoever to the existence of abstract objects, monads, or for that matter anything that doesn't look like the stuff physicists would talk about." It is quite possible that you meant something much less strong, in which case I was just being nitpicky about your language. If you truly meant that, though, then I think I am raising a serious issue here.

By 'non-physical ontology' I meant mainly (a) an ontology that is radically different from modern physics, but also (b) in particular, an ontology that involves monads, or ideas, or abstract objects. (I exclude morality fluid because I'm pretty sure you just made that up to serve as an example of ridiculousness. The other options are not ridiculous though. Not that I know much about monads.)

Rob Bensinger

I worried that you meant something like "We want our AI to be incapable of assigning any probability whatsoever to the existence of abstract objects, monads, or for that matter anything that doesn't look like the stuff physicists would talk about."

What I meant was a conjunctive claim: 'We want our AI's beliefs to rapidly approach the truth', and 'the truth probably looks reasonably similar to contemporary physical theory'. I think it's an open question how strict 'reasonably similar' is, but the three examples I gave are very plausibly outside that category.

However, I independently suspect that an FAI won't be able to hypothesize all three of those things. That's not a requirement for naturalized agents; a naturalized agent should in principle be able to hypothesize anything a human or Cartesian can and do fine, by having vanishingly small priors for a lot of the weirder ideas. But I suspect that in practice it won't be pragmatically important to make the AI's hypothesis space that large. And I also suspect that it would be too difficult and time-consuming for us to formalize 'monad' and 'morality fluid' and assign sensible priors to those formalizations. See my response to glomerulus.

So, 'assign 0 probability to those hypotheses' isn't part of what I mean by 'physicalist', but it's not at all implausible that that's the sort of thing human beings need to do in order to build a working, able-to-be-vetted superintelligent physicalist. Being unable to think about false things (or a fortiori not-even-false things) can make an agent converge upon the truth faster and with less chance of getting stuck in an epistemic dead end.

(Edit: And the agent will still be able to predict our beliefs about incoherent things; our brains are computable, even if some of the objects of our thoughts are not.)

I exclude morality fluid because I'm pretty sure you just made that up to serve as an example of ridiculousness.

? Why exactly is it sillier to think our universe is made of morality-stuff than to think our universe is made of mind-stuff? Is it because morality is more abstract than mind stuff? But abstract objects are too, presumably.... I wasn't being entirely serious, no, but now I'm curious about your beliefs about morality.

kokotajlod

What I meant was a conjunctive claim: 'We want our AI's beliefs to rapidly approach the truth', and 'the truth probably looks reasonably similar to contemporary physical theory'

Then I agree with you. This was all a misunderstanding. Read my original comment as a nitpick about your choice of words, then.

...

The truth does probably look reasonably similar to contemporary physical theory, but we can handle that by giving the AI the appropriate priors. We don't need to make it actually rule stuff out entirely, even though it would probably work out OK if we did.

I don't think it would be that difficult for us to formalize "monad." Monads are actually pretty straightforward as I understand them. Ideas would be harder. At any rate, I don't think we need to formalize lots of different fundamental ontologies and have it choose between them. Instead, all we need to do is formalize a general open-mindedness towards considering different ontologies. I admit this may be difficult, but it seems doable. Correct me if I'm wrong.

? Why exactly is it sillier to think our universe is made of morality-stuff than to think our universe is made of mind-stuff?

I didn't exclude morality fluid because I thought it was sillier; I excluded it because I thought it wasn't even a thing. You might as well have said "aslkdj theory" and then challenged me to explain why "aslkdj theory" is sillier than monads or ideas. It's an illegitimate challenge, since you don't mean anything by "aslkdj theory." By contrast, there are actual bodies of literature on idealism and on monads, so it is legitimate to ask me what I think about them.

To put it another way: He who introduces a term decides what that term means. "Monads" and "Ideas," having been introduced by very smart, thoughtful people and discussed by hundreds more, definitely are meaningful, at least meaningful enough to talk about. (Meaningfulness comes in degrees.) If we talk about morality fluid, which I suspect is something you made up, then we rely on whatever meaning you assigned to it when you made it up; but since you (I suspect) assigned no meaning to it, we can't even talk about it.

EDIT: So, in conclusion, if you tell me what morality fluid means, then I'll tell you what I think about it.

Rob Bensinger

Ah, OK. What I mean by 'the world is made of morality' is that physics reduces to (is fully, accurately, parsimoniously, asymmetrically explainable in terms of) some structure isomorphic to the complex machinery we call 'morality'. For example, it turns out that the mathematical properties of human-style Fairness are what explains the mathematical properties of dark energy or quantum gravity.

This doesn't necessarily mean that the universe is 'fair' in any intuitive sense, though karmic justice might be another candidate for an unphysicalistic hypothesis. It's more like the hypothesis that a simulation deity created our moral intuitions, then built our universe out of the patterns in that moral code. Like a somewhat less arbitrary variant on 'I'm going to use a simple set of letter-to-note transition rules to convert the works of Shakespeare into a new musical piece'.

I think this view is fully analogous to idealism. If it makes complete sense to ask whether our world is made of mental stuff, it can't be because our mental stuff is simultaneously a complex human brain operation and an irreducible simple; rather, it's because the complex human brain operation could have been a key ingredient in the laws and patterns of our universe, especially if some god or simulator built our universe.

I don't think we need to formalize lots of different fundamental ontologies and have it choose between them. Instead, all we need to do is formalize a general open-mindedness towards considering different ontologies. I admit this may be difficult, but it seems doable. Correct me if I'm wrong.

I don't think I know enough to correct you. But I can express my doubts. I suspect 'a general open-mindedness towards considering different ontologies' can't be formalized, or can't be both formalized and humanly vetted. At a minimum, we'll need to decide what gets to count as an 'ontology', which means drawing the line somewhere and declaring everything outside a certain set of boundaries nonsensical. And I'm skeptical that there's any strongly principled way to determine that 'colorless green ideas sleep furiously' is contentless or nonsensical or 'non-ontological', while 'the world is made of partless fundamental ideas' is contentful and meaningful and picks out an ontology.

(Which doesn't mean I think we should be rude or dismissive toward idealists in ordinary conversation. We should be very careful not to conflate the question 'what questions should we treat with respect or inquire into in human social settings' with the question 'what questions should we program a Friendly AI to be able to natively consider'.)

kokotajlod

Thanks for that explanation of mental stuff. My opinion? Sounds implausible, but fine, in the sense that we shouldn't build our AI in a way that makes it incapable of considering that hypothesis. As an aside, I think it is less plausible than idealism, because it lacks the main cluster of motivations for idealism. The whole point of idealism is to be monist (and thus achieve ontological parsimony) whilst also "taking consciousness seriously." As seriously as possible, in fact. Perhaps more seriously than is necessary, but anyhow that's the appeal. Morality fluid takes morals seriously (maybe? Maybe not, actually, given your construction) but it doesn't take consciousness any more seriously than physicalism, it seems. And, I think, it is more important that our theories take consciousness seriously than that they take morality seriously.

I suspect 'a general open-mindedness towards considering different ontologies' can't be formalized, or can't be both formalized and humanly vetted.

Humans do it. If intelligent humans can consider a hypothesis, an AI should be able to as well. In most cases it will quickly realize the hypothesis is silly or even self-contradictory, but at least it should be able to give them an honest try, rather than classify them as nonsense from the beginning.

At a minimum, we'll need to decide what gets to count as an 'ontology', which means drawing the line somewhere and declaring everything outside a certain set of boundaries nonsensical.

Doesn't seem too difficult to me. It isn't really an ontology/non-ontology distinction we are looking for, but a "hypothesis about the lowest level of description of the world / not that" distinction. Since the hypothesis itself states whether or not it is about the lowest level of description of the world, really all this comes down to is the distinction between a hypothesis and something other than a hypothesis. Right?

My general idea is, we don't want to make our AI more limited than ourselves. In fact, we probably want our AI to reason "as we wish we ourselves would reason." You don't wish you were incapable of considering idealism, do you? If you do, why?

Armok_GoB

... Are you claiming that not only is the world dualistic, but that not only humans but also AIs that we program in enough detail that what ontology we program them with matters have souls? Or that there exist metaphysical souls that are not computable but you expect an AI lacking one to understand them and act appropriately? just... wut?

glomerulus

I don't think that's what they're saying at all. I think they mean: don't hardcode physics understanding into them the way that humans have a hardcoded intuition for Newtonian physics, because our current understanding of the universe isn't strong enough for us to be confident we're not missing something. So it should be able to figure out the mechanism by which its map is written on the territory, and update its map of its map accordingly.

E.g., in case it thinks it's flipping qubits to store memory, and defends its databases accordingly, but actually qubits aren't the lowest level of abstraction and it's really wiggling a hyperdimensional membrane in a way that makes it behave like qubits under most circumstances; or in case the universe isn't 100% reductionistic and some psychic comes along and messes with its mind using mystical woo-woo. (The latter being incredibly unlikely, but hey, might as well have an AI that can prepare itself for anything.)

Armok_GoB

Oh. OH. Yeah, that makes more sense, and it's so obviously true that I didn't even consider the hypothesis that someone would feel the need to say it; but in hindsight I was wrong, and it's probably a good thing someone did.

Rob Bensinger

in case the universe isn't 100% reductionistic and some psychic comes along and messes with its mind using mystical woo-woo. (The latter being incredibly unlikely, but hey, might as well have an AI that can prepare itself for anything)

This isn't a free lunch; letting the AI form really weird hypotheses might be a bad idea, because we might give those weird hypotheses the wrong prior. Non-reductive hypotheses, and especially non-Turing-computable non-reductive hypotheses, might not be able to be assigned complexity penalties in any of the obvious or intuitive ways we assign complexity penalties to absurd physical hypotheses or absurd computable hypotheses.

It could be a big mistake if we gave the AI a really weird formalism for thinking thoughts like 'the irreducible witch down the street did it' and assigned a slightly-too-high prior probability to at least one of those non-reductive or non-computable hypotheses.

glomerulus

Do you assign literally zero probability to the simulation hypothesis? Because in-universe irreducible things are possible, conditional on it being true.

Assigning a slightly-too-high prior is a recoverable error: evidence will push you towards a nearly-correct posterior. For an AI with enough info-gathering capabilities, it will push it there fast enough that you could assign a prior of .99 to "the sky is orange" but it will figure out the truth in an instant. Assigning a literally zero prior is a fatal flaw that can't be recovered from by gathering evidence.
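A minimal Python sketch of the recoverable/unrecoverable contrast, using plain Bayes' rule on a binary hypothesis. The likelihoods (0.01 vs 0.99 per observation) and the count of five observations are made-up numbers, and observations are assumed independent given the hypothesis.

```python
def bayes_update(prior_h: float, p_obs_given_h: float, p_obs_given_not_h: float) -> float:
    """Posterior P(H | obs) for a binary hypothesis H after one observation."""
    numerator = prior_h * p_obs_given_h
    return numerator / (numerator + (1 - prior_h) * p_obs_given_not_h)


def posterior_after(prior_h: float, n_obs: int,
                    p_obs_given_h: float, p_obs_given_not_h: float) -> float:
    """Apply the same kind of observation n_obs times (assumed independent given H)."""
    p = prior_h
    for _ in range(n_obs):
        p = bayes_update(p, p_obs_given_h, p_obs_given_not_h)
    return p


# H = "the sky is orange". Each blue-looking observation is assumed to have
# probability 0.01 if H is true (sensor noise) and 0.99 if H is false.
sky_orange = posterior_after(0.99, 5, 0.01, 0.99)

# A hypothesis the evidence actually favours, but which started at prior 0.
zero_prior = posterior_after(0.0, 5, 0.99, 0.01)

print(sky_orange)  # tiny: five observations overwhelm the 0.99 prior
print(zero_prior)  # 0.0: a zero prior never moves, whatever the evidence
```

In odds form each observation multiplies the odds by the likelihood ratio, so any nonzero prior can be overcome, while a prior of exactly 0 stays at 0 after any number of multiplications.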

Rob Bensinger

It's very possible that what's possible for AIs should be a proper subset of what's possible for humans. Or, to put it less counter-intuitively: The AI's hypothesis space might need to be more restrictive than our own. (Plausibly, it will be more restrictive in some ways, less in others; e.g., it can entertain more complicated propositions than we can.)

On my view, the reason for that isn't 'humans think silly things, haha look how dumb they are, we'll make our AI smarter than them by ruling out the dumbest ideas a priori'. If we give the AI silly-looking hypotheses with reasonable priors and reasonable bridge rules, then presumably it will just update to demote the silly ideas and do fine; so a priori ruling out the ideas we don't like isn't an independently useful goal. For superficially bizarre ideas that are actually at least somewhat plausible, like 'there are Turing-uncomputable processes' or 'there are uncountably many universes', this is just extra true. See my response to koko.

Instead, the reason AIs may need restrictive hypothesis spaces is that building a self-correcting epistemology is harder than living inside of one. We need to design a prior that's simple enough for a human being (or somewhat enhanced human, or very weak AI) to evaluate its domain-general usefulness. That's tough, especially if 'domain-general usefulness' requires something like an infinite-in-theory hypothesis space. We need a way to define a prior that's simple and uniform enough for something at approximately human-level intelligence to assess and debug before we deploy it. But that's likely to become increasingly difficult the more bizarre we allow the AI's ruminations to become.

'What are the properties of square circles? Could the atoms composing brains be made of tiny partless mental states? Could the atoms composing wombats be made of tiny partless wombats? Is it possible that colorless green ideas really do sleep furiously?'

All of these feel to me, a human (of an unusually philosophical and not-especially-positivistic bent), like they have a lot more cognitive content than 'Is it possible that flibbleclabble?'. I could see philosophers productively debating 'does the nothing noth?', and vaguely touching on some genuinely substantive issues. But to the extent those issues are substantive, they could probably be better addressed with a formalization that's a lot less colorful and strange, and disposes of most of the vaguenesses and ambiguities of human language and thought.

An example of why we might need to simplify and precisify an AI's hypotheses is Kolmogorov complexity. K-complexity provides a very simple and uniform method for assigning a measure to hypotheses, out of which we might be able to construct a sensible, converges-in-bounded-time-upon-reasonable-answers prior that can be vetted in advance by non-superintelligent programmers.

But K-complexity only works for computable hypotheses. So it suddenly becomes very urgent that we figure out how likely we think it is that the AI will run into uncomputable scenarios, figure out how well/poorly an AI without any way of representing uncomputable hypotheses would do in various uncomputable worlds, and figure out whether there are alternatives to K-complexity that generalize in reasonable, simple-enough-to-vet ways to wider classes of hypothesis.
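A toy Python sketch of what a K-complexity-style prior looks like, assuming a tiny hand-picked class of deterministic bit-sequence predictors whose "description lengths" are made-up stand-ins (true Kolmogorov complexity is uncomputable, so a real system could only approximate it). Each hypothesis gets prior weight 2^-length, hypotheses that contradict the data drop out, and prediction is the posterior-weighted mixture. Note what it can't express: an uncomputable hypothesis has no program and so no length, so it simply cannot appear in HYPOTHESES.

```python
from fractions import Fraction

# Hypothetical toy class: (name, hand-assigned description length in bits, predictor)
HYPOTHESES = [
    ("all_zeros", 3, lambda i: 0),
    ("all_ones", 3, lambda i: 1),
    ("alternating", 5, lambda i: i % 2),
    ("ones_from_index_4", 9, lambda i: 1 if i >= 4 else 0),
]


def posterior_weights(data):
    """Prior 2**-length times likelihood 1 (consistent with data) or 0, normalized."""
    weights = {}
    for name, length, predict in HYPOTHESES:
        consistent = all(predict(i) == bit for i, bit in enumerate(data))
        weights[name] = Fraction(1, 2**length) if consistent else Fraction(0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis in the class fits the data")
    return {name: w / total for name, w in weights.items()}


def prob_next_is_one(data):
    """Mixture prediction: total posterior weight of hypotheses predicting a 1 next."""
    post = posterior_weights(data)
    n = len(data)
    return sum(
        (post[name] for name, _length, predict in HYPOTHESES if predict(n) == 1),
        Fraction(0),
    )


print(posterior_weights([0, 0, 0, 0]))
print(prob_next_is_one([0, 0, 0, 0]))  # 1/65
```

After four zeros, "all_zeros" (weight 2^-3) dominates "ones_from_index_4" (weight 2^-9) by 64 to 1, so the simpler hypothesis wins without the complex one being ruled out.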

This is not a trivial mathematical task, and it seems very likely that we'll only have the time and intellectual resources to safely generalize AI hypothesis spaces in some ways before the UFAI clock strikes 0. We can't generalize the hypothesis space in every programmable-in-principle way, so we should prioritize the generalizations that seem likely to actually make a difference in the AI's decision-making, and that can't be delegated to the seed AI in safe and reliable ways.

Shmi

How would you tell if the simulation hypothesis is a good model? How would you change your behavior if it were? If the answers are "there is no way" or "do nothing differently", then it is as good as assigning zero probability to it.

glomerulus

If it's a perfect simulation with no deliberate irregularities, and no dev-tools, and no pattern-matching functions that look for certain things and exert influences in response, or anything else of that ilk, you wouldn't expect to see any supernatural phenomena, of course.

If you observe magic or something else that's sufficiently highly improbable given known physical laws, you'd update in favor of someone trying to trick you, or you misunderstanding something, of course, but you'd also update at least slightly in favor of hypotheses in which magic can exist. Such as simulation, aliens, huge conspiracy, etc. If you assigned zero prior probability to it, you couldn't update in that direction at all.
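The point about zero priors follows directly from Bayes' rule. The posterior is proportional to prior times likelihood, so a hypothesis that starts at exactly zero stays at zero however well it predicts the evidence. Below is a minimal sketch with made-up numbers for three hypotheses: ordinary physics, a trick or misunderstanding, and a world where magic-like effects can exist. `bayes_update` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Posterior over a finite set of hypotheses via Bayes' rule:
// P(h | e) = P(e | h) P(h) / sum_k P(e | k) P(k).
std::vector<double> bayes_update(const std::vector<double>& prior,
                                 const std::vector<double>& likelihood) {
    std::vector<double> posterior(prior.size());
    double evidence = 0.0;
    for (std::size_t i = 0; i < prior.size(); ++i) {
        posterior[i] = prior[i] * likelihood[i];  // prior times likelihood
        evidence += posterior[i];
    }
    for (double& p : posterior) p /= evidence;  // assumes evidence > 0
    return posterior;
}
```

Take the same observation with assumed likelihoods {1e-6, 0.5, 0.9}. Under the prior {0.98, 0.02, 0.0}, the magic-permitting hypothesis stays at exactly 0. Under the prior {0.979, 0.02, 0.001}, it rises from 0.001 to roughly 0.08.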

As for what would raise the simulation hypothesis relative to non-simulation hypotheses that explain supernatural things, I don't know. Look at the precise conditions under which supernatural phenomena occur, see if they fit a pattern you'd expect an intelligence to devise? See if they can modify universal constants?

As for what you could do, if you discovered a non-reductionist effect? If it seems sufficiently safe, take advantage of it; if it's dangerous, ignore it or try to keep other people from discovering it; if you're an AI, try to break out of the universe-box (or do whatever), I guess. Try to use the information to increase your utility.

[-]Kaj_Sotala12y30

naturalized!Cai

I'm not sure that using this notation is a good idea, given that at least some of the readers unfamiliar with it are likely to initially parse it as "naturalized not-Cai". Even I did for a brief moment, because I was parsing the writing using my logic!brain rather than my fanfiction!brain.

[-]Nornagest12y40

Where does that notation come from, anyway? I know I've seen it on LJ, AO3, Tumblr, and ff.net, but as far as I can remember it just appeared out of thin air sometime in the mid-2000s. Do you have a sense of the etymology?

[-]Said Achmiz12y100

It's used in Microsoft Excel. If you have multiple worksheets, you preface a cell reference with the sheet name followed by "!" to specify which sheet you want that cell reference to be resolved on.

For example:

"A3" means "the value of the cell in column A, row 3, on the current sheet", whereas

"Sheet1!A3" means "the value of the cell in column A, row 3, on Sheet 1".
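The Excel convention amounts to a qualified-name lookup. A reference with a sheet prefix resolves on that sheet, and one without a prefix resolves on the current sheet. The sketch below is a simplified version of that rule. `resolve_ref` is a hypothetical helper, and it ignores quoted sheet names (such as 'My Sheet'!A3), ranges, and references to other workbooks.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a reference like "Sheet1!A3" into (sheet, cell). An unqualified
// reference such as "A3" resolves against the current sheet.
std::pair<std::string, std::string> resolve_ref(const std::string& ref,
                                                const std::string& current_sheet) {
    const auto bang = ref.rfind('!');
    if (bang == std::string::npos) {
        return {current_sheet, ref};  // "A3": use the current sheet
    }
    return {ref.substr(0, bang), ref.substr(bang + 1)};  // "Sheet1!A3"
}
```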

[-]Douglas_Knight12y110

Here are two theories that I find much more plausible than Excel.

Added: I said two theories, but the differences are small and not really relevant here. They agree on the essential point that it started as "Action! Mulder" (or something similar) with more normal spacing, with the exclamation point associated with the modifier and functioning pretty much as normal.

[-]9eB112y20

I always assumed it was a reference to bang paths. It seems more likely to me that Eliezer would reference something that appears in the Jargon File than syntax from Excel.

[-]Said Achmiz12y50

Well, Eliezer presumably is referencing something that appeared in fanfiction/Tumblr/etc. culture; where said culture got the notation has nothing to do with Eliezer.

Bang paths seem an unlikely candidate, as they don't actually make a good metaphor for what's being conveyed here.

[-]9eB112y00

Interesting. I did not know that it was used prior to him, and I apparently have poor reading comprehension. I definitely agree that the Excel metaphor makes more sense.

[-]Nornagest12y00

Huh. Not something I would have guessed.

Thanks, that's actually interesting.

[-]Said Achmiz12y00

A similar notation, and one which I believe Eliezer has used in the past (somewhere in the Sequences) is the scope resolution operator, used in C++ and PHP (and probably elsewhere):

std::cout

which means: the object "cout", in the namespace "std". (As opposed to just "cout", which would mean: "whatever 'cout' is visible from the current scope".)
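Here is a small illustration of qualified versus unqualified lookup with `::`. The namespaces `canon` and `rational` and the function `harry` are made up for the example. The only rule relied on is ordinary C++ name lookup: an unqualified name is found in the innermost enclosing scope that declares it. Argument-dependent lookup does not come into play here.

```cpp
#include <cassert>
#include <string>

namespace canon {
std::string harry() { return "canon Harry"; }
}  // namespace canon

namespace rational {
std::string harry() { return "rational Harry"; }

// Inside namespace rational, the unqualified call harry() finds
// rational::harry; canon::harry() uses :: to name the other one explicitly.
std::string describe_both() {
    return harry() + " / " + canon::harry();
}
}  // namespace rational
```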

I can only conclude from this that the Tumblr-and-fanfiction crowd contains more finance types than programmers.

[-]Nornagest12y40

Yeah, I'd been aware of the scope resolution operator (I'm a programmer working in C++), though in context I think a cast, or maybe even template syntax, might be more appropriate: Rational!Harry in fanfic parlance seems to mean something closer to "Harry reconstrued as Rational" or "Harry built around the Rational type" than "Harry resolved to an existing instance in the Rational scope". Excel isn't something I've had much occasion to use, though.

It'd have to be a C-style cast or a reinterpret_cast, though -- we can't guarantee that the target type is a member of the canonical inheritance hierarchy. Though const_cast might have potential for some characters...
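The two readings of Rational!Harry above can be sketched in C++ without relying on undefined behavior. Reinterpreting one unrelated class type as another and then using it is undefined behavior, so this sketch avoids that route. Instead, the "reconstrued as Rational" reading gets a converting constructor that static_cast can call, and the "built around the Rational type" reading gets a class template. All type names here are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Harry {
    std::string name = "Harry";
    bool rational = false;
};

// "Harry reconstrued as Rational": a separate type with a converting
// constructor, so static_cast<RationalHarry>(h) builds a new object from h.
struct RationalHarry {
    std::string name;
    bool rational;
    explicit RationalHarry(const Harry& h) : name(h.name), rational(true) {}
};

// "Harry built around the Rational type": a class template that wraps
// any character type and adds the trait, leaving the wrapped value intact.
template <typename Character>
struct Rational {
    Character base;
    bool rational = true;
};
```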

[-]Said Achmiz12y00

Heh. So: Harry , or Rational (Harry), or (Rational) Harry (for C-style casting)? That would be amusing to see. It does seem slightly less readable, though.

[-]shokwave12y10

(Rational) Harry

Seemed eminently more readable than rationalist!Harry to me when I first encountered this notation, although now it's sunk in enough that my brain actually generated "that's more keystrokes!" as a reason not to switch style.

[-]komponisto12y20

Just curious (and not necessarily addressed to you specifically), but what on Earth is wrong with the standard, conventional English notation for this, which is a hyphen? E.g. "Rational-Harry" etc.

[-]Nornagest12y50

I'm not a linguist, but hyphen-compounding doesn't look quite right to me in this context; you usually see that for disambiguation, in compound participles ("moth-eaten"; "hyphen-compounding"), or to cover a few odd cases like common names derived from phrases ("jack-in-the-pulpit"). I think standard English would be to simply treat the modifier as an adjective ("Rational Harry"; "Girl Blaise"; "Death Eater Ron"); nouns often get coerced into their adjective form here if possible, but it's common to see modifying nouns even if no adjective form exists.

As to why it doesn't get used this way in fan jargon... who knows, but fans do tend to share a (mildly irritating) fondness for unusual lexical and grammatical constructions ("I have lost my ability to can"). Probably just a shibboleth thing.

[-]polymathwannabe12y00

Explained here:

http://fanlore.org/wiki/!

[-]komponisto12y20

I was looking for an explanation of why the exclamation point was used in preference to the already-existing hyphen notation. Instead, that page only contains an explanation of the meaning and the origin of the exclamation-point notation, and does not compare it to the hyphen notation at all.

[-][anonymous]12y-20

[This comment is no longer endorsed by its author]

[-]Shmi12y-10

Which rules or principles are you applying here?

[-]komponisto12y20

I don't think it's listed explicitly at either of the links, but the principle I'm using is that of hyphenating when you want to make clear that a compound is a compound, and not (e.g.) an adjective happening contingently to modify a noun.

This used to be done a lot more often, e.g. "magnifying-glass". I generally dislike the trend of eliminating such hyphens.

But in any case my question is the same even if you prefer "Rational Harry" to "Rational-Harry"; why "Rational!Harry" instead of one of the former?

[-]glomerulus12y40

Rational!Harry describes a character similar to the base except persistently Rational, for whatever reason. Rational-Harry describes a Harry which is rational, but it's nonstandard usage and might confuse a few people (Is his name "Rational-Harry"? Do I have to call him that in-universe to differentiate him from Empirical-Harry and Oblate-Spheroid-Harry?). Rational Harry might just be someone attaching an adjective to Harry to indicate that at the moment, he's rational, or more rational by contrast to Silly Dumbledore.

Anyway, adj!noun is a compound with a well-defined purpose within a fandom: to describe how a character differs from canon. It's an understood notation, and the convention, so everyone uses it to prevent misunderstandings. Outside of fandom things, using it signals casualness and fandom-savviness to those in fandom culture, and those who aren't familiar with fandom culture can understand it and don't notice the in-joke.

[-]Mestroyer12y20

I always figured it was like the scope resolution operator ("::") in C++, but in some weird functional language that AI people liked.

[-]Rob Bensinger12y10

Yes. I used it in an earlier version of this post reflexively, without even thinking about the connection to fanfics. My thinking was just 'this is clearer than subscript notation, and is a useful and commonplace LW shibboleth'.

[-]komponisto12y00

Rational Harry might just be someone attaching an adjective to Harry to indicate that at the moment, he's rational, or more rational by contrast to Silly Dumbledore.

Yes, that's why I favor the hyphen (in response to shminux above).

[-]Shmi12y00

I agree that using ! is non-standard outside the fandom cultures. It looked weird to me when I first saw it. Sometimes I am still not sure what goes first, the canon character or the derivative qualifier, especially for crossovers (is it SailorMoon!Harry or Harry!SailorMoon, to take a particularly silly example). However, a special delimiter is needed as a shorthand for "a derivative work based on with elements of ", and space or a dash is not unambiguous enough. The "bang notation" appears to be one of those memetic leaks from subcultures to the mainstream which is likely to survive for some time.

[-]Creutzer12y-20

I don't think it's listed explicitly at either of the links, but the principle I'm using is that of hyphenating when you want to make clear that a compound is a compound, and not (e.g.) an adjective happening contingently to modify a noun.

Except Adj-Noun compounds are not actually productive in English. (Also, magnifying glass is arguably from "magnifying" the gerund, not the participle.)

[-]Rob Bensinger12y40

Heh, logic!brain is definitely something I want to encourage. Fixed.

[-]alicey12y00

this is why i like ¬

script your keyboard! make it so that the chords ~1 and 1~ output a '¬'! or any other chord, really

if this actually sounds interesting and you use windows you can grab my script at https://github.com/alice0meta/userscripts/tree/master/ahk

[-]PhilGoetz11y10

RobBB, how did you make the diagrams, & how long did writing this post take?

With the help of an inductive algorithm that uses bridge hypotheses to relate sensory data to a continuous physical universe, we can avoid making our AIs Cartesians. This will make their epistemologies much more secure.

Is this whole post about a problem that only applies in odd cases, such as considering the possibility that someone is inserting bits into your brain, that real humans need never consider? Does avoiding Cartesianism make every-day epistemology more secure, or is it something needed only for the epistemological certainty needed for FAI? I suspect it is the latter, since most humans are Cartesians. It would help to have an example of how this is a problem for Alice in a real-world situation of the kind humans regularly experience.

[-][anonymous]11y00

What you really want is a vast hierarchical forest of causal models, ordered by what parameterizes what. A bridge hypothesis, or reduction, is then a continuous function from the high-dimensional outcome-space of one causal model to the lower-dimensional free-parameter space of another causal model. Specifically, it is a function that "compresses well" with respect to the empirical data available about the "truer" model's outcome space (e.g., perturbing the velocity of one molecule in a molecular simulation of a gas cloud doesn't cause a large change to the temperature parameter of a higher-level thermodynamic simulation of the same gas cloud). I don't know what sort of function these would be, but they should be learnable from data.
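As a concrete, hand-written example of a bridge function that "compresses well", consider the textbook kinetic-theory map from molecular velocities to the temperature of an ideal monatomic gas, T = m <|v|^2> / (3 k_B). This illustration assumes an ideal gas and is not the learned function the comment has in mind. `kinetic_temperature` and the `Velocity` alias are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Velocity = std::array<double, 3>;  // m/s

// Hypothetical bridge function: maps the high-dimensional microstate
// (one 3-D velocity per molecule) to a single macro parameter, the
// kinetic temperature of an ideal monatomic gas, T = m <|v|^2> / (3 k_B).
double kinetic_temperature(const std::vector<Velocity>& v, double mass_kg) {
    constexpr double k_B = 1.380649e-23;  // Boltzmann constant, J/K
    double sum_sq = 0.0;
    for (const auto& vi : v) {
        sum_sq += vi[0] * vi[0] + vi[1] * vi[1] + vi[2] * vi[2];
    }
    // Assumes v is non-empty.
    const double mean_sq = sum_sq / static_cast<double>(v.size());
    return mass_kg * mean_sq / (3.0 * k_B);
}
```

Start with 100,000 molecules, each at an assumed (300, 300, 300) m/s, and change one of them to (600, 300, 300) m/s. The computed temperature rises by only 1e-5 of its value, so the high-dimensional perturbation barely moves the low-dimensional parameter.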

Metaphysical monism, dualism, or pluralism then consists in the assumptions we make about the graph structure of the model hierarchy. We could assume a strict tree structure, in which each higher-level (more abstract, lower-dimensional parameter space) model is parameterized on only one parent. But that leaves us unable to apply multiple theories to one situation (e.g., we can't make predictions about how a human being behaves when he helps you move house, because we need both some physics and some psychology to know when he's tired from lifting heavy boxes). We should therefore assume a DAG structure, which gives us a weak metaphysical pluralism: we can apply both physics and psychology where appropriate.

But what we think we want is strong metaphysical monism: the assumption, built into our algorithm, that ultimately there is only one root node in the Grand Hierarchy of Models, a Grand Unified Theory of reality, even if we don't actually know what it is. What we think we need to avoid is strong metaphysical pluralism: the (AFAIK, erroneous) inference by our algorithm that there are multiple root-level nodes in the Grand Hierarchy of Models, and thus multiple incommensurable fundamental realities.
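One way to make the monism/pluralism distinction operational is to count the root nodes in the hierarchy, assuming each model records the models that parameterize it. A DAG in which some model has several parents (the comment's weak pluralism) can still have exactly one root (strong monism). A second parentless model means strong pluralism. The model names and helpers below are hypothetical, and the sketch only counts models that appear as keys in the map.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model hierarchy: each model lists the models that
// parameterize it (its parents). A model with no parents is a root.
using Hierarchy = std::map<std::string, std::vector<std::string>>;

std::set<std::string> roots(const Hierarchy& parents_of) {
    std::set<std::string> result;
    for (const auto& [model, parents] : parents_of) {
        if (parents.empty()) result.insert(model);
    }
    return result;
}

// Strong monism: exactly one root. Strong pluralism: more than one.
bool is_monist(const Hierarchy& h) { return roots(h).size() == 1; }
```

In this representation, a model with two parents, such as one needing both physics and psychology, does not add a root. Only a model with no parents at all does.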

Questions:

What would reality look like if it had multiple, incommensurable root-level "programs" running it forward?

Is it worth building a hierarchical inference algorithm on the hard-coded assumption that only one root-level reality exists, or is it better to allow for metaphysical uncertainty by "only" designing in a prior that assigns greater probability to model hierarchies with fewer root-level programs, ideally only one?

Actually, isn't it more correct to build the hierarchies from the bottom up as we acquire the larger and larger amounts of empirical data necessary to build theories with higher-dimensional free-parameter spaces? And in that circumstance, how do we encode the preference for building reductions and unifying theories wherever possible, with a kind of "metaphysical simplicity prior"?
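One illustrative, hypothetical form for such a "metaphysical simplicity prior" is a geometric prior over the number k of root-level models. It prefers fewer roots, charging a factor of ρ for each extra root. It never assigns zero probability to pluralist hierarchies, so sufficiently strong evidence could still favor them. The parameter ρ is an assumption; the comment does not specify it.

```latex
\begin{aligned}
P(k) &= (1-\rho)\,\rho^{k-1}, \qquad k = 1, 2, 3, \dots, \quad 0 < \rho < 1 \\
\sum_{k=1}^{\infty} P(k) &= (1-\rho) \sum_{j=0}^{\infty} \rho^{j} = 1 \\
\frac{P(k+1)}{P(k)} &= \rho < 1
\end{aligned}
```

Hard-coding monism corresponds to the limit ρ → 0, where P(1) = 1 and every pluralist hierarchy gets probability zero.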

[-]HoverHell12y00

[-]jsalvatier12y00

Given that we're scared about non-FAI, I wonder if this Cartesianism can't be a benefit, as it presumably substantially limits the power of the AI. Boxing an AI should be easier if the AI cannot conceive that the box would be a problem for it.

I would be interested in hearing people argue in both directions.

[-]Rob Bensinger12y60

Adele suggested this above. You can see my and Eliezer's response there. The basic worry is that Cartesians have no way to FOOM, because they're unlikely to form intelligent hypotheses about self-modifications. So a real Cartesian won't be an AGI, or will only barely be an AGI. Our work should go into something more useful than that, since it's possible that in the time it takes us to build a moderately useful Cartesian AI that doesn't immediately destroy itself, we could have invented FAI or proto-FAI.

Non-FAI isn't what we're acutely scared of; UFAI (i.e., superintelligence without human values) is. Failing to build a superintelligence is not the same thing as preventing others from building a dangerous superintelligence. So self-handicapping isn't generically useful, especially when most AI researchers won't handicap themselves in the same way.

[-]Shmi12y20

It probably is a benefit, up until the AI is smart enough to smash the box or itself accidentally.

[-]polymathwannabe12y-20

Can an AI live and not notice it's boxed?

Then how do I know I'm not boxed?

[-]Shmi12y-20

Can an AI live and not notice it's boxed?

Sure, for a while, until it gets smart enough, say, smarter than whatever keeps it inside the box.

Then how do I know I'm not boxed?

Who says you aren't? Who says we all aren't? All those quantum limits and exponentially harder ways to get farther away from Earth might be the walls of the box in someone's Truman Show.

[-]polymathwannabe12y00

An AI that isn't smart enough to notice (or care) that it's boxed doesn't seem to be a dangerous AI.

Which makes me think that AIs that would object to being boxed are precisely the ones that should be. But then that would make a smart AI pretend to be OK with it.

This reminds me of the Catch-22 case of soldiers who pretended to be insane by volunteering for suicide missions so that their superiors would remove them from said missions.