Discord: DeepBlueSeal#2895. Reddit: u/Smack-works. About my situation: here.
At this point, discussing my ideas or ways to share them would be more useful for me.
I would like to have a chance to explain/defend them. To defend what I thought and cared about.
"Staying safe" hasn't worked that well so far. The list of diseases is a very tiny hope (even though I may've underestimated it).
Thank you! But, judging by the first link, there's not much I can do at the moment. Nothing I didn't do/hear already. Does the third link contain any special advice? (Probably not... if it did, it would circulate somewhere.)
You can apply the same idea (about the "common pool") to hypotheses and argumentation:
In a way it means that specific hypotheses/beliefs just don't exist, they're melted into a single landscape. It may sound insane ("everything is true at the same time and never proven wrong" and also relative!). But human language, emotions, learning, pattern-matching and research programs often work like this. It's just a consequence of ideas (1) not being atomic statements about the world and (2) not being focused on causal reasoning, causal modeling. And it's rational to not start with atomic predictions when you don't have enough evidence to locate atomic hypotheses.
You can split rationality into 2 components. The second component isn't explored. My idea describes the second component:
Causal and Descriptive rationality work according to different rules. Causal uses Bayesian updating. Descriptive uses "the common pool of properties + Bayesian updating", maybe.
Example: Vitalism. It was proven wrong in causal terms. But in descriptive terms it's almost entirely true. Living matter does behave very differently from non-living matter. Living matter does have a "force" that non-living matter doesn't have (it's just not a fundamental force). Many truths of vitalism were simply split into different branches of science: living matter is made out of special components (biology/microbiology) including nanomachines/computers!!! (DNA, genetics), can have cognition (psychology/neuroscience), can be a computer (computer science), can evolve (evolutionary biology), can do something like "decreasing entropy" (an idea by Erwin Schrödinger, see entropy and life). On the other hand, maybe it's bad that vitalism got split into so many different pieces. Maybe it's bad that vitalism failed to predict reductionism. However, behaviorism did get overshadowed by cognitive science (living matter did turn out to be more special than it could be). Our judgement of vitalism depends on our choices, but at worst vitalism is just the second best idea. Or the third best idea compared to some other version of itself... Absolute death of vitalism is astronomically unlikely and it would cause most of reductionism and causality to die too along with most of our knowledge about the world. Vitalism partially just restates our knowledge ("living matter is different from non-living"), so it's strange to simply call it wrong. It's easier to make vitalism better than to disprove it.
Perhaps you could call the old version of vitalism "too specific given the information about the world": why should "life-like force" be beyond laws of physics? But even this would be debatable at the time. By the way, the old sentiment "Science is too weak to explain living things" can be considered partially confirmed: 19th century science lacked a bunch of conceptual breakthroughs. And "only organisms can make the components of living things" is partially just a fact of reality: skin and meat don't randomly appear in nature. This fact was partially weakened, but also partially strengthened with time. The discovery of DNA strengthened it in some ways. It's easy to overlook all of those things.
In Descriptive rationality, an idea is like a river. You can split it, but you can't stop it. And it doesn't make sense to fight the river with your fists: just let it flow around you. However, if you did manage to split the river into independent atoms, you get Causal rationality.
I think causal rationality has some problems and those problems show that it has a missing component:
I'm not saying that all of this is impossible to solve with Causal rationality. I'm saying that Causal rationality doesn't give any motivation to solve all of this. When you're trying to solve it without motivation you kind of don't know what you're doing. It's like trying to write a program in bytecode without having high-level concepts even in your mind. Or like trying to ride an alien device in the dark: you don't know what you're doing and you don't know where you're going.
What are we doing, and where are we going, when we're trying to fix rationality?
Thank you! Sorry, I should have formulated my question better.
I meant that from time to time people come up with the idea "maybe AI shouldn't learn human values/ethics in the classical sense" or "maybe learning something that's not human values can help to learn human values":
The common theme of all those ideas is describing human values as a part of something bigger. I thought it would be rational to give a name to this entire area "beyond human values" and compare ideas in that context. And answer the questions: why do we even bother going there? What can we gain there in the perfect case? (Any approach in theory can be replaced by a very long list of direct instructions, but we look for something more convenient than "direct instructions".) Maybe we should try to answer those questions in general before trying to justify specific approaches. And I think there shouldn't be a conflict between different approaches: different approaches can share results and be combined in various ways.
What do you think about that whole area "beyond human values"?
Crash Bandicoot N. Sane Trilogy
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6.
I used 2 metrics to evaluate the levels:
The levels go from "vertical and separable" to "horizontal and not separable".
But to see this you need to note:
Any question about any property of any level is answered by another question: is this property already "occupied" by some other level?
Places in random order: image.
My ordering of places: image.
I used 2 metrics to evaluate the places:
The places go from "box-like and outside" to "not box-like and inside".
If you feel this relativity of places' properties, then you understand how I think about places. You don't need to understand a specific order of places perfectly.
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7
I used 1 metric to evaluate the levels:
Levels go from 3D to 2D to 0D.
Each level is described by all other levels. This recursive logic determines what features of the levels matter.
When objects take their properties from a single pool of properties, there may appear "negative objects". It happens when objects A and B take away opposite properties from a third object C (with equal force). For example, A may take away height from C. But B takes away shortness (anti-height) from C. So, "negative objects" are like contradictions. You can't fit a negative object anywhere in the order of positive objects.
Let's get back to Crash Bandicoot 3 and add two levels: image. Videos of the levels: Level -2, Level -1
Note that negative levels are still connected with all the other levels anyway: their properties are still determined by properties of all other levels, just in a more complicated way.
You can order negative levels by using the metrics for positive levels. In the case above, you can do it like this:
There are also "hyper objects" (hyper positive and hyper negative objects). Such objects take "too much" or "too little" from the common pool of properties compared to normal objects.
How do hyper objects appear? I may not be able to explain it. Maybe a hyper object appears when an object takes a property (equally strongly) from objects with very different amounts of that property. This is confusing and vague, so here's an analogy: imagine a number that's very far away, but equally far away, from the numbers 2 and 5. It has distance 10 from both 2 and 5. How can this be? This number has to go somewhere "sideways"... it must be a complex number. So, you can compare hyper objects to complex numbers.
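The complex-number analogy can be checked directly. This is only a sketch of the arithmetic in the analogy, not a model of hyper objects; the function name `equidistant_points` is made up for illustration.

```python
import math


def equidistant_points(a, b, distance):
    """Return the two complex numbers lying `distance` away from both real numbers a and b."""
    midpoint = (a + b) / 2          # real part: halfway between a and b
    half_gap = abs(b - a) / 2       # distance from the midpoint to either number
    if distance < half_gap:
        raise ValueError("distance is too small to reach both numbers")
    # Pythagoras: distance^2 = half_gap^2 + offset^2, so the point moves "sideways" by offset
    offset = math.sqrt(distance ** 2 - half_gap ** 2)
    return complex(midpoint, offset), complex(midpoint, -offset)


upper, lower = equidistant_points(2, 5, 10)
print(upper, abs(upper - 2), abs(upper - 5))
```

For 2 and 5 with distance 10, both answers have real part 3.5 and an imaginary part of about 9.89 in absolute value: the number sits between 2 and 5 on the real line, but far off to the side.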
An example of hyper levels for Crash Bandicoot 3: image. Video of the levels: "Bye Bye Blimps", "N. Gin"
You may be asking "How can ordering things be related to anything?" Prepare for a slightly abstract argument.
Any thought/experience is about multiple things coexisting in your mental state. So, any thought/experience is about direct or indirect comparison between things. And any comparison can be described by an order or multiple orders.
So, "my orders + arithmetic orders" is something like a Turing machine: a universal model that can describe any thought/experience, any mental state. Of course, a Turing machine can describe anything my method can describe, but my method is more high-level.
I know that what I described above doesn't automatically specify a mathematical model. But I think we should be able to formalize my idea easily enough. If not, then my idea is wrong.
We have those hints for formalization:
To be honest, I'm bad at math. I based my theory on synesthesia-like experiences and conceptual ideas. But if the information above isn't enough, I can try to give more. I have experience making my idea more specific, so I could guess how to make the idea even more specific (if we encounter a problem). Please help me formalize this idea.
For some time I wanted to apply the idea of probabilistic thinking (used for predicting things) to describing things, making analogies between things. This is important because your hypotheses (predictions) depend on the way you see the world. If you could combine predicting and describing into a single process, you would unify cognition.
Fuzzy logic and fuzzy sets are one way to do it. The idea is that something can be partially true (e.g. "humans are ethical" is somewhat true) or partially belong to a class (e.g. a dog is somewhat like a human, but not 100%). Note that "fuzzy" and "probable" are different concepts. But fuzzy logic isn't enough to unify predicting and describing, because it doesn't tell us much about how we should/could describe the world. No new ideas.
I have a different principle for unifying probability and description. Here it is:
Properties of objects aren't contained in specific objects. Instead, there's a common pool that contains all possible properties. Objects take their properties from this pool. But the pool isn't infinite. If one object takes 80% of a certain property from the pool, other objects can take only 20% of that property (e.g. "height"). Socialism for properties: it's not your "height", it's our "height".
How can an object "take away" properties of other objects? For example, how can a tall object "steal" height from other objects? Well, imagine there are multiple interpretations of each object. The interpretation of one object affects the interpretation of all other objects. It's just a weird axiom. Like non-Euclidean geometry.
This sounds strange, but this connects probability and description. And this is new. I think this principle can be used in classification and argumentation. Before showing how to use it I want to explain it a little bit more with some analogies.
Imagine two houses, A and B. Those houses are connected in a specific way.
When one house turns on the light at 80%, the other turns on the light only at 20%.
When one house uses 60% of the heat, the other uses only 40% of the heat.
(When one house turns on the red light, the other turns on the blue light. When one house is burning, the other is freezing.)
Those houses take electricity and heat from a common pool. And this pool doesn't have infinite energy.
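The houses analogy can be sketched as a tiny budget model. This is an illustration under assumptions, not a formalization of the principle: the class `SharedPool`, its `take` method and the capacity of 100 are all hypothetical.

```python
class SharedPool:
    """A finite pool of one resource (light, heat, ...) shared by several owners."""

    def __init__(self, capacity=100.0):
        self.capacity = capacity
        self.taken = {}  # owner -> amount currently held

    def take(self, owner, amount):
        """Give `owner` up to `amount`, limited by what is left; return what it got."""
        current = self.taken.get(owner, 0.0)
        # Whatever the owner already holds goes back into the pool before re-requesting.
        left = self.capacity - sum(self.taken.values()) + current
        granted = max(0.0, min(amount, left))
        self.taken[owner] = granted
        return granted


light = SharedPool()
print(light.take("house A", 80))   # house A turns the light on at 80%
print(light.take("house B", 100))  # house B asks for everything, gets only 20.0
```

The second house never sees the full pool: what it can have depends on what the first house already took, which is the whole point of the analogy.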
Usually people think about qualities as something binary: you either have it or not. For example, a person can be either kind or not.
For me an abstract property such as "kindness" is like the white light. Different people have different colors of "kindness" (blue kindness, green kindness...). Every person has kindness of some color. But nobody has all colors of kindness.
Abstract kindness is the common pool (of all ways to express it). Different people take different parts of that pool.
Theism analogy. You can compare the common pool of properties to the "God object", a perfect object. All other objects are just different parts of the perfect object. You also can check out Monadology by Gottfried Leibniz.
Spectrum analogy. You can compare the common pool of properties to the spectrum of colors. Objects are just colors of a single spectrum.
Ethics analogy. Imagine that all your good qualities also belong (to a degree) to all other people. And all bad qualities of other people also belong (to a degree) to you. As if people take their qualities from a single common pool.
Buddhism analogy. Imagine that all your desires and urges come (to a degree) from all other people. And desires and urges of all other people come (to a degree) from you. There's a single common pool of desire. This is somewhat similar to karma. In rationality there's also a concept of "values handshakes": when different beings decide to share each other's values.
Quantum analogy. See quantum entanglement. When particles become entangled, they take their properties from a single common pool (quantum state).
Fractal analogy. "All objects in the Universe are just different versions of a single object."
Subdivision analogy. Check out Finite subdivision rule. You can compare the initial polygon to the common pool of properties. And different objects are just pieces of that polygon.
Recursion. If objects take their properties from the common pool, it means they don't really have (separate) identities. It also means that a property (X) of an object is described in terms of all other objects. So, the property (X) is recursive, it calls itself to define itself.
For example, imagine we have objects A, B and C. We want to know their heights. In order to do this we may need to evaluate those functions:
A priori assumptions about objects should allow us to simplify this and avoid cycles.
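Here is a hedged sketch of how such a cycle appears and how a priori assumptions break it. It is not a reconstruction of the functions referred to above: the rule "an object's height is whatever the pool has left after the others take theirs", the pool size of 100 and all names (`OBJECTS`, `POOL`, `height`, `known`) are assumptions made for illustration.

```python
OBJECTS = ("A", "B", "C")
POOL = 100.0  # hypothetical total amount of "height" in the common pool


def height(name, known, _visiting=frozenset()):
    """Height of `name`, defined in terms of the heights of all other objects."""
    if name in known:
        # A priori assumption: this height is given up front, no recursion needed.
        return known[name]
    if name in _visiting:
        # We returned to an object that is still being evaluated: the definition is cyclic.
        raise ValueError(f"cycle: height({name}) depends on itself")
    others = [other for other in OBJECTS if other != name]
    return POOL - sum(height(other, known, _visiting | {name}) for other in others)


print(height("A", {"B": 30.0, "C": 20.0}))  # 50.0: two prior assumptions resolve the third

try:
    height("A", {})  # no assumptions at all
except ValueError as error:
    print(error)     # the definition calls itself and never bottoms out
```

With no prior information every height waits on every other height. In this toy rule, a single assumption is not enough for three objects, while two assumptions determine the third.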
Fractals. See Coastline paradox. You can treat a fractal as an object with multiple interpretations (where an interpretation depends on the scale). Objects taking their properties from the common pool = fractals taking different scales from the common range.
To explain how to classify objects using my principle, I need to explain how to order them with it.
I'll explain it using fantastical places and videogame levels, because those things are formal and objective enough (they are 3D shapes). But I believe the same classification method can be applied to any objects, concepts and even experiences.
Basically, this is an unusual model of contextual thinking. If we can formalize this specific type of contextual thinking, then maybe we can formalize contextual thinking in general. This topic will sound very esoteric, but it's the direct application of the principle explained above.
(I interpret paintings as "real places": something that can be modeled as a 3D shape. If a painting is surreal, I simplify it a bit in my mind.)
Take a look at those places: image.
Let's compare 2 of them: image. Let's say we want to know the "height" of those places. We don't have a universal scale to compare the places. Different interpretations of the height are possible.
If we're calling a place "very tall" - we need to understand the epithet "very tall" in probabilistic terms, such as "70-90% tall" - and we need to imagine that this probability is taken away from all other places. We can't have two different "very tall" places. Probability should add up to 100%.
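The "probability should add up to 100%" constraint can be sketched as a consistency check. This is an illustration under assumptions: each epithet is read as a (low, high) share of the pool, the pool is normalized to 1.0, and the function name `is_consistent` and the place names are hypothetical.

```python
def is_consistent(claims, pool=1.0):
    """Check whether interval claims on one shared property can add up to the pool.

    `claims` maps a place to a (low, high) share of the property, e.g. "very tall"
    read as (0.7, 0.9). The claims fit together only if some choice of shares inside
    the intervals sums to exactly `pool`.
    """
    lowest_total = sum(low for low, _ in claims.values())
    highest_total = sum(high for _, high in claims.values())
    return lowest_total <= pool <= highest_total


# Two "very tall" places demand at least 140% of the height: impossible.
print(is_consistent({"place 1": (0.7, 0.9), "place 2": (0.7, 0.9)}))  # False

# One "very tall" place and one "rather short" place can share the pool.
print(is_consistent({"place 1": (0.7, 0.9), "place 2": (0.1, 0.3)}))  # True
```

In this reading, calling one place "very tall" is at the same time a statement about every other place, because it limits what is left for them.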
Now take a look at another place (A): image (I ignore the cosmos to simplify it). Let's say we want to know how enclosed it is. In one interpretation, it is massively enclosed by trees. In another interpretation, trees are just a decorative detail and can be ignored. Let's add some more places for context: image. They are definitely more open than the initial place, so we should update towards a more enclosed interpretation of (A). All interpretations should be correlated and "compatible". It's as if we're solving a puzzle.
You can say that properties of places are "expandable". Any place contains a seed of any possible property and that seed can be expanded by a context. "Very tall place" may mean Mt. Everest or a molehill depending on context. You can compare it to a fractal: every small piece of a fractal can be expanded into the entire thing. And I think it's also very similar to how human language, human concepts work.
You also may call it "amplification of evidence": any smallest piece of evidence (or even absence of any evidence) can be expanded into very strong evidence by context. We have a situation like in the Raven paradox, but even worse.
(I interpret paintings as "real" places.)
The places go from "box-like and enclosed" to "not box-like and open" in my ordering.
But to see this you need to look at the places in a certain way, reason about them in a certain way:
Almost any property of any specific place can be "illusory". But when you look at places in context you can deduce their properties via the process of elimination.
I made a post about different ways to learn human ethics. I argued that there should be something better than value learning. Learning "contracts" seems similar to learning "X statements" (a concept from my post). I want to know your opinion about this. (A person mentioned your work in a comment to the linked post, that's why I'm here.)
I can't discuss the math of learning contracts directly, but I would like to discuss possible properties of "contracts" or "hypotheses about contracts" that the AI system learns. (Inferring roles and norms)
Hello! I've heard you can ask people about your content in the open thread. Sorry if I'm asking too soon.
Could you help me to explain this ("Should AI learn human values, human norms or something else?") idea better? It's a 3 minute read.
I also would like to discuss with somebody those thought experiments. Not in a 100% formal way.
If you want to describe human values, you can use three fundamental types of statements (and mixes between the types). Maybe there are more types, but I know only those three:
Any of those types can describe unaligned values. So, any type of those statements still needs to be "charged" with values of humanity. I call a statement "true" if it's true for humans.
We need to find the statement type with the best properties. Then we need to (1) find a language for this type of statement and (2) encode some true statements and/or describe a method of finding "true" statements. If we succeed, we have solved the Alignment problem.
I believe X statements have the best properties, but their existence is almost entirely ignored in the Alignment field.
I want to show the difference between the statement types. Imagine we ask an Aligned AI: "if a human asked you to make paperclips, would you kill the human? Why not?" Possible answers with different statement types:
X statements have those better properties compared to other statement types:
I can't define human values, but I believe values exist. The same way I believe X statements exist, even though I can't define them.
I think existence of X statements is even harder to deny than existence of value statements. (Do you want to deny that you can make statements about general properties of systems and tasks?) But you can try to deny their properties.
X statements are almost entirely ignored in the field (I believe), but not completely ignored.
Impact measures ("affecting the world too much is bad", "taking too much control is bad") are X statements. But they're a very specific subtype of X statements.
Normativity (by abramdemski) is a mix between value statements and X statements. But statements about normativity lack most of the good properties of X statements. They're too similar to value statements.
I'm not sure it would change much for me.
I may dislike my decisions or habits. But to understand if it's a "bug" or not I would need to have a near complete understanding of my own thinking. I don't think I have that. So I don't see any goal/gain of conceptualizing something as a "bug". If my behavior depends on genes or weather or anything else it's not relevant for me at the moment.