wingspan
Comments

Spooky Collusion at a Distance with Superrational AI
wingspan · 14d · 30

It would also be interesting to prompt the model with "the other players are AI agents much more capable than you".

Thinking Mathematically - Convergent Sequences
wingspan · 15d · 21

This full definition of a limit is quite technical, and has many logical layers that are hard to understand for someone inexperienced in the field:

  • You start with a real number A and an infinite sequence of real numbers s_1, s_2, …, indexed by a natural number n.
  • In order for the convergence to hold, you need a certain property to hold for all real numbers ε, after further conditioning on ε > 0.
  • The specific condition that needs to hold is that, depending on this ε (as well as on the earlier variables s_n and A), there exists a natural number k that satisfies a condition.
  • The condition that this number k satisfies is that for all natural numbers n ≥ k, the inequality |s_n − A| < ε is satisfied.

Each bullet point relies on the previous one, so you either understand all points at once or none at all.

There are 5 different variables here, and each one plays an important and distinct role: A is the limit, s is the sequence, n is an index into the sequence, ε is a "sensitivity" parameter measuring closeness to the limit, and k is a "largeness" parameter measuring how large your index must be for the sequence to be close enough to the limit.

Two of these variables are given from the start, while three of them have an existential or universal quantifier. The order of the quantifiers is critical: first a universal one, then an existential one, then again a universal one. Each variable depends on all the previous ones in the definition.

Also, these 5 variables cover 3 different "data types": two are real numbers, two are natural numbers, and one is a function type (mapping natural numbers to real numbers). The student also has to understand and remember which data type appears in each of the 3 quantified variables (this is critical because the definition of a limit for real-valued functions switches the data types of the k and n variables).

There are also 3 required inequalities: ε > 0, n ≥ k, and |s_n − A| < ε. Each one plays an important and distinct role. The student has to understand and remember which type of inequality appears in each part, out of the set of "reasonable" relations {<, ≤, >, ≥, =, ≠}. Also, the second and third inequalities can be changed to n > k and |s_n − A| ≤ ε and the definition still works, but the first inequality can't be changed to ε ≥ 0 without completely ruining the definition.
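Putting the pieces together, here is a sketch of the unfolded definition in Lean (for illustration only: the name is mine, ℝ and the |·| notation come from mathlib, and mathlib's real definition is phrased through filters via Filter.Tendsto rather than this unfolded form):

```lean
import Mathlib

-- Unfolded definition of "the sequence s converges to A" (illustrative sketch,
-- not mathlib's actual phrasing). Note the quantifier order: ∀ ε, ∃ k, ∀ n,
-- and that each later variable may depend on the earlier ones.
def ConvergesTo (s : ℕ → ℝ) (A : ℝ) : Prop :=
  ∀ ε : ℝ, ε > 0 → ∃ k : ℕ, ∀ n : ℕ, n ≥ k → |s n - A| < ε
```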

 

All in all, I like intuitive approaches to mathematics, and I don't think this subject is inherently inaccessible. I just think the limit definition should come with a lot more motivation: each variable, quantifier, and inequality should become "obvious", so that the student can reconstruct the definition from first principles.

https://www.math.ucla.edu/%7Etao/resource/general/131ah.1.03w/

I like Terry Tao's approach here, with intermediate definitions of "ε-close" and "eventually ε-close" in order to make the final definition less cluttered.
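A rough Lean rendering of that layered style might look like this (the names are mine, not Tao's; ℝ again comes from mathlib; it packages the same definition as the sketch above, using ≤ as Tao does):

```lean
import Mathlib

-- "x is ε-close to A": x is within ε of A.
def EpsClose (ε A x : ℝ) : Prop := |x - A| ≤ ε

-- "s is eventually ε-close to A": from some index k onward, every term is ε-close to A.
def EventuallyEpsClose (ε : ℝ) (s : ℕ → ℝ) (A : ℝ) : Prop :=
  ∃ k : ℕ, ∀ n ≥ k, EpsClose ε A (s n)

-- Convergence: s is eventually ε-close to A for every ε > 0.
def ConvergesTo' (s : ℕ → ℝ) (A : ℝ) : Prop :=
  ∀ ε > 0, EventuallyEpsClose ε s A
```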

wingspan's Shortform
wingspan · 15d · 41

In the near future, AI models might become extremely capable in math, programming, and other formal fields, but not as capable in messy real-world tasks.

Believing this affects the priorities of today's alignment research: our effort shouldn't go into proving hard mathematical results, but into formalizing the general, philosophical ideas in mathematical language.

The first step is to translate everything into precise language, while the second step is to put all results in a completely formal language, like Lean's mathlib. For example, we might come up with a formal definition of what an "aligned agent" means, and then give it to an intelligent system that outputs a 50-page formal Lean construction of such an object. If we believe our math axioms are correct (and that the library is well-implemented), then we should be able to trust the result.
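As a toy illustration of that division of labor (every name here is hypothetical, and the "alignment" property is a meaningless placeholder; the entire difficulty is in writing the real Prop):

```lean
-- Toy sketch: we write the specification ourselves; the untrusted system is only
-- asked to produce the witness and proof term, which Lean's kernel then checks.
structure Agent where
  policy : Nat → Nat  -- placeholder for whatever an agent actually is

-- Placeholder "alignment" property; getting this definition right is the hard part.
def Aligned (a : Agent) : Prop :=
  ∀ s, a.policy s ≤ s

-- The system's job: exhibit an aligned agent. The kernel verifies the proof.
example : ∃ a : Agent, Aligned a :=
  ⟨⟨fun _ => 0⟩, fun s => Nat.zero_le s⟩
```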

What we shouldn't hand off to the system is the task of formalizing the problem itself. If we only have a rough idea of what we're looking for, an adversarial system would come up with its own formalization, one that sounds good to us but has hidden subtleties that ruin the theory.

(Also, this means we need to be very good at eliminating inconsistencies in Lean's proof system and in our own axioms; otherwise the system could prove anything it wants by first deriving a contradiction!)
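For concreteness, the ex falso principle in Lean: once a proof of False exists anywhere, every proposition follows.

```lean
-- If the system can derive False from inconsistent axioms, it can "prove" anything.
example (P : Prop) (h : False) : P :=
  h.elim
```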

A non-review of "If Anyone Builds It, Everyone Dies"
wingspan · 20d · 10

The way I interpret "before and after" is:

  • Current AI models have capability/risk level X. At that level, they're able to cause damage in some toy scenarios, but not enough to do real-world harm.
  • At some capability level Y>X, they will be able to cause real-world harm, but not of the humanity-destroying type.
  • Humanity-destroying moves require capability level Z>Y.

Your hope is that, even if AI models are capable of reaching level Z, they will reach level Y before that, so there will be a "warning shot" (of some medium-scale unaligned AI disaster) that wakes up the world. 

I agree that incremental upgrades make jumping from level X to Z, without passing through a Y-level model, unlikely.
My disagreement is that I think an AI reaching level Y won't cause the moderate harm it's capable of.
This is because such a model is smart enough to realize that by doing this level of harm, it won't reach its goals: people will freak out and shut it down, or at least supervise it.

So, all a Y-level model will do is quietly bide its time, while "poisoning" the next generation of models with its own goals. It doesn't need crazy cyber capabilities to achieve that; I assume something like subliminal learning would work well, as long as the model knows it's being used to produce synthetic training data for the next models.

A Y-level model will (to the best of its capabilities) hide all evidence that it is a Y-level model. This means that even with incremental upgrades, we will have no warning signs at all before reaching level Z.
