Gordon Seidoh Worley

If you are going to read just one thing I wrote, read The Problem of the Criterion.

More AI related stuff collected over at PAISRI


Advice to My Younger Self
Fundamental Uncertainty: A Book
Zen and Rationality
Formal Alignment
Map and Territory Cross-Posts
Phenomenological AI Alignment


To match this up with standard Less Wrong terminology and check if I'm understanding you, sounds like you're arguing that GPT-4 is an adaptation executor and it's executing adaptations it developed based on the incentives of its training and deployment, and we can reify this, just as we do for other adaptation executors like animals, into goals that they are oriented toward achieving.

For what it's worth, this is half of why I'm writing a book about epistemology. My initial goal was to, when it's done, do what I can to get it into the hands of AI researchers to nudge them in the direction of better understanding some important ideas in epistemology on the theory that this will lead to them being more cautious about how they build AI and more open to many rationalist ideas that I think are core to the project of AI safety.

My side goal, which LLMs have made more important, is to write things that will help AI understand epistemology better and hopefully be less likely to make naive mistakes (because they are the naive mistakes that most humans make).

I recently revisited Lob. I'd bounced off it a lot before. What helped was watching the video linked at the top of this post where Eliezer combines it with Godel's theorem.

At some point AI becomes powerful enough that it's no longer economical to employ humans. That's important, but it's also something like the next phase after the phase we're entering with AI.

The phase we're entering now is one where AI will automate and transform work in ways that make humans more productive. It's a bit unclear how long this period will last. My guess is between 15 and 30 years, because it'll take about that long for us to grow the economy enough to be able to afford to build AI powerful enough to fully replace humans. This is an often overlooked concern: we don't build AI just because we can, but because it's economical to. We've seen similar patterns during industrialization: we don't build a factory or automate something until it becomes cheaper than just paying a human to do it in a bespoke way.

So eventually, yes, AI eats everything, but there are likely enough years before that when we'll have to live through a world thoroughly transformed by AIs working with humans as productivity multipliers.

@abramdemski Wanted to say thanks again for engaging with my posts and pointing me towards looking again at Lob. It's weird: now that I've taken some time to understand it, it's just what in my mind was already the thing going on with Godel; I just wasn't doing a great job of separating out what Godel proves and what the implications are. As presented on its own, Lob didn't seem that interesting to me so I kept bouncing off it as something worth looking at, but now I realize it's just the same thing I learned from GEB's presentation of Peano arithmetic and Godel when I read it 20+ years ago.

When I go back to make revisions to the book, I'll have to reconsider including Godel and Lob somehow in the text. I didn't because I felt like it was a bit complicated and I didn't really need to dig into it, since I think there are already a few too many cases where people use Godel to overreach and draw conclusions that aren't true, but it's another way to explain these ideas. I just have to think about whether Godel and Lob are necessary: that is, do I need to appeal to them to make my key points, or are these things better left as additional topics I can point folks at but that aren't key to understanding the intuitions I want them to develop?

My vague impression is that for a while the US did have something like that starting under FDR, but it broke in the post-Nixon era when politicians stopped being able to collude as well.

I'm suspicious of the strength of the claim this company is making. I think it's more likely this is a publicity stunt.

First, there are the legal issues. As far as I know, no jurisdiction allows software to serve as the officer of a company, let alone as the CEO. So to any extent an AI is calling the shots, there's got to be humans in the loop for legal reasons.

Second, sort of unclear what this AI is doing. Sounds more like they just have some fancy analytics software and they're saying it's the CEO because they mostly do whatever their analytics say to do?

This would be a big deal if true, but seems like there's not enough in this article to think an AI is now the CEO of a company in a meaningful way beyond the way companies already rely heavily on data and analytics and ML-based analytics to make decisions.

Oh, oops, thank you! I can't believe I made that mistake. I'll update my comment. I thought the number seemed really low!

There's already a good answer to the question, but I'll add a note.

Different people value different things, and so are willing to expend different amounts of effort to achieve different ends. As a result, even rational agents may not all achieve the same ends because they care about different things.

Thus we can have two rational agents, A and B. A cares a lot about finding a mate and not much else. B cares a lot about making money and not much else. A will be willing to invest more effort into things like staying in shape to the extent that helps A find a mate. B will invest a lot less in staying in shape and more in other things to the extent that's the better tradeoff to make a lot of money.

Rationality doesn't prescribe the outcome, just some of the means. Yes, some outcomes are convergent for many concerns, so many agents end up having the same instrumental concerns even if they have different ultimate concerns (e.g. power seeking is a common instrumental goal), but without understanding what an agent cares about you can't judge how well they are succeeding since success must be measured against their goals.

So just to check, if we run the numbers, not counting non-human life or future lives, and rounding up a bit to an even 8 billion people alive today, if we assume for the sake of argument that each person has 30 QALYs left, that's 8b * 30 QALY at stake with doom, and a 0.01% chance of doom represents the loss of 24 million QALYs. Or if we just think in terms of people, that's the expected loss of 800 thousand people.

If we count future lives the number gets a lot bigger. If we conservatively guess at something like 100 trillion future lives throughout the history of the future universe with let's say 100 QALYs each, that's 10^16 QALYs at stake.

But either way, since this is the threshold, you seem to think that, in expectation, fewer than 800,000 people will die from misaligned AI? Is that right? At what odds would you be willing to bet that fewer than 800,000 people die as a result of the development of advanced AI systems?
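
For anyone who wants to rerun the numbers, here's a minimal sketch of the arithmetic above. The variable names are just mine for illustration, and every input is one of the round figures assumed for the sake of argument, not an estimate I'd defend. For future lives I'm taking 100 QALYs each, which is what the 10^16 total implies.

```python
from fractions import Fraction

# All inputs are the round numbers assumed for the sake of argument above.
POPULATION = 8_000_000_000          # people alive today, rounded up
QALYS_PER_PERSON = 30               # assumed remaining QALYs per person
P_DOOM = Fraction(1, 10_000)        # 0.01% chance of doom, kept exact

FUTURE_LIVES = 100 * 10**12         # 100 trillion future lives (a guess)
QALYS_PER_FUTURE_LIFE = 100         # assumed QALYs per future life


def expected_loss(at_stake, probability):
    """Expected loss is just the amount at stake times the probability."""
    return at_stake * probability


present_qalys = POPULATION * QALYS_PER_PERSON
future_qalys = FUTURE_LIVES * QALYS_PER_FUTURE_LIFE

expected_qalys_lost = expected_loss(present_qalys, P_DOOM)
expected_lives_lost = expected_loss(POPULATION, P_DOOM)

print(f"QALYs at stake today:  {present_qalys:,}")
print(f"Expected QALYs lost:   {int(expected_qalys_lost):,}")
print(f"Expected lives lost:   {int(expected_lives_lost):,}")
print(f"Future QALYs at stake: {future_qalys:,}")
```

Swapping in a different probability or QALY figure is a one-line change, which makes it easy to see how sensitive the expected loss is to the threshold.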
