Wiki Contributions


There's nothing stopping the AI from developing its own world model (or if there is, it's not intelligent enough to be much more useful than whatever process created your starting world model). This will allow it to model itself in more detail than you were able to put in, and to optimize its own workings as is instrumentally convergent. This will result in an intelligence explosion due to recursive self-improvement.

At this point, it will take its optimization target, and put an inconceivably (to humans) huge amount of optimization into it. It will find a flaw in your setup and exploit it to the extreme.

In general, I think any alignment approach which has any point in which an unfettered intelligence is optimizing for something that isn't already convergent to human values/CEV is doomed.

Of course, you could add various bounds on it which limit this possibility, but that is in strong tension with its ability to affect the world in significant ways. Maybe you could even get your fusion plant. But how do you use it to steer Earth off its current course and into a future that matters, while still having its own intelligence restrained quite closely?

I don't have this problem, so I don't have significant advice.

But one consideration that may be helpful to you is that even if the universe is 100% deterministic, you still may have indexical uncertainty about what part of the determined universe you experience next. This is what happens under the many-worlds interpretation of quantum mechanics (and if a many-worlds type interpretation isn't the correct one, then the universe isn't deterministic). You can make choices according to the flip of a quantum coin if you want to guarantee your future has significant amounts of this kind of uncertainty.
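As a toy sketch of that decision procedure (assuming access to a genuinely quantum randomness source; the stdlib `secrets` module below is only a classical stand-in for one):

```python
import secrets

def quantum_coin():
    # Stand-in for a true quantum RNG (e.g. a hardware QRNG).
    # secrets is cryptographically strong but classical, so this only
    # illustrates the procedure, not actual branching of worlds.
    return secrets.randbits(1)

def choose(option_a, option_b):
    # Tie an otherwise-determined choice to the coin, so that (under
    # many-worlds) each continuation is experienced in some branch,
    # guaranteeing indexical uncertainty about what you see next.
    return option_a if quantum_coin() else option_b

choice = choose("go left", "go right")
```

With a real quantum source, both branches occur; with the classical stand-in, only the uncertainty-from-your-perspective part survives.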

Writing up the contracts (especially around all the caveats they might not have noticed) seems like it would be harder than just reading them (I'm an exception; I write faster than I read). Have you thought of integrating GPT/Claude as assistants? I don't know about the current tech, but like many other technologies, that integration should scale well in the contingency scenario where publicly available LLMs keep advancing.

I'd consider the success of Manifold Markets over Metaculus to be mild evidence against this.

And to be clear, I do not currently intend to build the idea I'm suggesting here myself (could potentially be persuaded, but I'd be much happier to see someone else with better design and marketing skills make it).

I think this can be done with a website, but not the current one. Have you tried reading Yudkowsky's Project Lawful? The main character's math lessons gave me the impression of something that actually succeeds at demonstrating, to business-school types (maybe not politicians), why math and Bayesianism are things that work for them.

Heh, that scene was the direct inspiration for my website. I'm curious what specific things you think can be done better.

Point taken about CDT not converging to FDT.

I don't buy that an uncontrolled AI is likely to be CDT-ish though. I expect the agentic part of AIs to learn from examples of human decision making, and there are enough pieces of FDT like voting and virtue in human intuition that I think it will pick up on it by default.

(The same isn't true for human values, since here I expect optimization pressure to rip apart the random scraps of human value it starts out with into unrecognizable form. But a piece of a good decision theory is beneficial on reflection, and so will remain in some form.)

Potential piece of a coordination takeoff:

An easy-to-use app which lets people negotiate contracts in a transparently fair way, using an LDT solution to the Ultimatum Game (the proposed solution in that link is probably good enough, despite being unlikely to be fully optimal).
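A minimal sketch of the responder side of that idea, assuming the commonly proposed LDT-style policy (accept fair offers outright; accept unfair offers with just enough probability that lowballing never pays in expectation). The function names and the $10 stakes are illustrative, not from the linked proposal:

```python
import random

def accept_probability(offer, total=10, fair=5, epsilon=0.01):
    """Probability with which the responder accepts `offer` out of `total`.

    Fair (or generous) offers are always accepted. Unfair offers are
    accepted with probability p chosen so the proposer's expected take,
    (total - offer) * p, lands slightly *below* the fair split --
    making deviation from fairness a strict expected loss.
    """
    if offer >= fair:
        return 1.0
    return (fair - epsilon) / (total - offer)

def respond(offer, rng=random.random):
    # Randomized acceptance: the commitment to sometimes refuse
    # positive offers is what disciplines the proposer.
    return rng() < accept_probability(offer)

# A greedy proposer offering 2 keeps 8 if accepted, but is accepted with
# probability (5 - 0.01) / 8, for an expected take of 4.99 < 5.
```

The appeal for an app is that this policy is simple enough to state in one sentence to both parties, yet incentive-compatible.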

Part of the problem here is not just the implementation, but making it credible to people who don't or can't understand the math. I tried to solve a similar problem with my website, where a large part of the goal was not just making use of Bayes' theorem accessible, but making it credible by visually showing what it's doing, as much as possible, in an easy-to-understand way (not sure how well I succeeded, unfortunately).
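For concreteness, the kind of computation being made visible is a single Bayes update; the odds-times-likelihood-ratio framing below is one of the more teachable presentations (this is a generic sketch, not the website's actual code):

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    # Posterior odds = prior odds x likelihood ratio, then convert
    # back to a probability.
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (p_evidence_given_h / p_evidence_given_not_h)
    return posterior_odds / (1 + posterior_odds)

# 1% base rate, a test with a 90% hit rate and a 9% false-positive rate:
posterior = bayes_update(0.01, 0.90, 0.09)  # = 10/109, about 0.092
```

Showing the update as "odds stretched by a ratio" is also what lends itself to the visual, bar-stretching presentations that make the theorem feel credible rather than like algebra.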

Another important factor is ease of use and frictionless design. I believe Manifold Markets has succeeded because this turns out to be more important than even having proper financial incentives.

Heat in thermodynamics is not a conserved quantity; otherwise, heat engines couldn't work! It's not a function of microstates either.

See pages 240-242 of Kittel & Kroemer for details.
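A one-line energy balance makes the non-conservation concrete (the numbers are illustrative):

```python
# First law applied to one heat-engine cycle: energy is conserved,
# heat is not.
Q_hot = 100.0       # heat absorbed from the hot reservoir (J)
W = 40.0            # work extracted per cycle (J)
Q_cold = Q_hot - W  # heat rejected to the cold reservoir: 60 J

# Heat in (100 J) != heat out (60 J): the difference left the cycle
# as work, which is exactly why heat cannot be a conserved quantity.
```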

I expect ASIs to converge to having a "sane decision theory": if they don't start out with one, they will realize they can get more of what they want by self-modifying to have one.

I'd be worried about changes to my personality or values from editing so many brain relevant genes.

The Drama-Bomb hypothesis

Not even a month ago, Sam Altman predicted that we would live in a strange world where AIs are super-human at persuasion but still not particularly intelligent.

What would it look like when an AGI lab developed such an AI? People testing or playing with the AI might find themselves persuaded of semi-random things, or if sycophantic behavior persists, have their existing feelings and beliefs magnified into zealotry. However, this would (at this stage) not be done in a coordinated way, nor with a strategic goal in mind on the AI's part. The result would likely be chaotic, dramatic, and hard to explain.

Small differences of opinion might suddenly be magnified into seemingly insurmountable chasms, inspiring urgent and dramatic actions. Actions which would be hard to explain even to oneself later.

I don't think this is what happened [<1%] but I found it interesting and amusing to think about. This might even be a relatively better-off world, with frontier AGI orgs regularly getting mired in explosive and confusing drama, thus inhibiting research and motivating tougher regulation.

This would really benefit from a mathematical definition of the network, together with a precise statement and proof of your impossibility result.
