Daniel Kokotajlo

Daniel Kokotajlo's Comments

Comment on Coherence arguments do not imply goal directed behavior

This theory of goal-directedness has the virtue of being closely tied to what we care about:

--If a system is goal-directed according to this definition, then (probably) it is the sort of thing that might behave as if it has convergent instrumental goals. It might, for example, deceive us and then turn on us later. Whereas if a system is not goal-directed according to this definition, then absent further information we have no reason to expect those behaviors.

--Obviously we want to model things efficiently, so we are independently interested in what the most efficient way to model something is. Applying this definition therefore doesn't make us go that far out of our way, computationally speaking.

On the other hand, I think this definition is not completely satisfying, because it doesn't help much with the most important questions:

--Given a proposal for an AGI architecture, is it the sort of thing that might deceive us and then turn on us later? Your definition answers: "Well, is it the sort of thing that can be most efficiently modelled as an EU-maximizer? If yes, then yes, if no, then no." The problem with this answer is that trying to see whether or not we can model the system as an EU-maximizer involves calculating out the system's behavior and comparing it to what an EU-maximizer (worse, to a range of EU-maximizers with various relatively simple or salient utility and credence functions) would do, and if we are doing that we can probably just answer the will-it-deceive-us question directly. Alternatively perhaps we could look at the structure of the system--the architecture--and say "see, this here is similar to the EU-max algorithm." But if we are doing that, then again, maybe we don't need this extra step in the middle; maybe we can jump straight from looking at the structure of the system to inferring whether or not it will act like it has convergent instrumental goals.



Why aren't assurance contracts widely used?

Oh right, I forgot, the $1 incentive gives people an ulterior motive for signing. :/ OK, so this is part of the answer to my original question--I had not noticed that fact and thus overestimated their usefulness.

Strategic implications of AIs' ability to coordinate at low cost, for example by merging

I wonder also if the conflicts that remain are nevertheless more peaceful. When hunter-gatherer tribes fight each other, they often murder all the men and enslave the women, or so I hear. Similar things sometimes happened with farmer societies, but sometimes the losers just became new territories and had to pay tribute, levy conscripts, and endure the occasional pillage. And industrialized modern nations even have rules saying you can't rape, pillage, commit genocide against, or sell into slavery the citizens of your enemy. Perhaps AI conflicts would be even more peaceful. For example, perhaps they would look something more like fancy maneuvers, propaganda, and hacking, with swift capitulation by the "checkmated" AI, which is nevertheless allowed to continue existing with some smaller amount of influence over the future. Perhaps no property would even be destroyed in the entire war!

Just spitballing here. I feel much less confident in this trend than in the trend I pointed out above.

Why aren't assurance contracts widely used?

But that doesn't seem like a big cost to me. It seems that other methods of solving coordination problems have similarly high or even higher costs--e.g. campaigning to raise awareness to get people to vote for legislation to solve the problem... Think of how many petitions there are on Change.org and how many signatures they regularly get. Now imagine that you got paid $1 on average for each one that you signed. People would be making shittons of money just by logging into Change.org and browsing through proposals. Until, that is, a large portion of the population starts regularly doing this... then the money flow shrinks but change starts happening!

Yes, it's moving the cost of failure to the person sponsoring the contract, but I think for many of these problems there should be people with enough money and altruism willing to take the risk. E.g. political campaigns regularly spend comparable sums. And like you perhaps hint at with the game theory point, it's different when the risk is all on one person--because it means we can be much more confident that the contract will trigger, conditional on someone taking the risk to fund it, and thus the risk is actually much smaller.

Why aren't assurance contracts widely used?

The first point you make doesn't apply to dominant assurance contracts, which pay signers in the case where not enough people sign. I don't know of any real-world instance of dominant assurance contracts being used, but boy do they seem like they would be super effective. Imagine during the 2016 election: "Sign this petition if you want Michelle Obama to be president! If at least 100 million people sign, you promise to vote for her. Otherwise, you'll get a $1 gift card to Target." Note that even in the unlikely event that this gets 99 million signatures, it would cost the organizer an order of magnitude less than Clinton spent on her campaign. More likely it would either get ~5 million signatures (because Michelle just isn't as popular as the organizer thought) or >100 million.
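To make the payoff structure concrete, here is a minimal sketch of how such a contract might settle. The class and field names are hypothetical and just for illustration; the only numbers used are the ones from the example above (a 100 million threshold and a $1 payout per signer on failure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DominantAssuranceContract:
    """Hypothetical model: signers are paid only if the threshold is missed."""

    threshold: int            # signatures needed for the pledge to bind
    payout_per_signer: float  # paid to each signer if the threshold is missed

    def settle(self, signatures: int) -> dict:
        """Return the outcome once signing closes."""
        if signatures >= self.threshold:
            # Enough people signed: the pledge binds and nobody is paid.
            return {"triggered": True, "organizer_cost": 0.0}
        # Too few signed: the organizer pays every signer.
        return {
            "triggered": False,
            "organizer_cost": signatures * self.payout_per_signer,
        }


# Numbers taken from the petition example above.
contract = DominantAssuranceContract(threshold=100_000_000, payout_per_signer=1.0)

for n in (5_000_000, 99_000_000, 100_000_000):
    print(n, contract.settle(n))
```

The organizer's worst case is a near miss: just under the threshold, the cost is roughly the threshold times the payout, and it drops to zero the moment the contract triggers.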

Petitions and Indiegogo campaigns aren't dominant assurance contracts as far as I know. I agree that there is a cost to get people to understand them, but that's true for all sorts of complicated financial instruments like mortgages, which we have no problem with.


Why aren't assurance contracts widely used?

I think voting for third-party candidates would be significantly improved by assurance contracts. Ditto for marches & rallies, and things like the Free State Project. (Imagine how much of a fail the FSP would have been if they used a more traditional method.) And I think maybe also Kickstarter stuff? IDK, maybe this disagreement comes down to a disagreement about the meaning of "significantly."

"Fully" acausal trade

I think I'd prefer calling it "acausal trade vs. pre-causal acausal trade" because it seems that the underlying phenomenon is exactly the same in both cases; it's the surrounding circumstances that are different. But this is just a minor terminological quibble.

How common is it for one entity to have a 3+ year technological lead on its nearest competitor?

Yeah, me too. Well, I won't exactly have done a full lit review by the time the blog post comes out... my post is mostly about other things. So don't get your hopes up too high. A good idea for future work though... maybe we can put it on AI Impacts' todo list.

Strategic implications of AIs' ability to coordinate at low cost, for example by merging

I very much agree. Historical analogies:

To a tiger, human hunter-gatherers must be frustrating and bewildering in their ability to coordinate. "What the hell? Why are they all pouncing on me when I jumped on the little one? The little one is already dead anyway, and they are risking their own lives now for nothing! Dammit, gotta run!"

To a tribe of hunter-gatherers, farmers must be frustrating and bewildering in their ability to coordinate. "What the hell? We pillaged and slew that one village real good, they sure didn't have enough warriors left over to chase us down... why are the neighboring villages coming after us? And what's this--they have professional soldiers with fancy equipment riding horses? Somehow hundreds--no, thousands--of farmers cooperated over a period of several years to make this punitive expedition possible! How were we to know they would go to such lengths?"

To the nations colonized by the Europeans, it must have been pretty interesting how the Europeans were so busy fighting each other constantly, yet somehow managed to more or less peacefully divide up Africa, Asia, South America, etc. to be colonized between them. Take the Opium Wars and the Boxer Rebellion for example. I could imagine a Hansonian prophet in a Native American tribe saying something like "Whatever laws the European nations use to keep the peace among themselves, we will benefit from them also; we'll register as a nation, sign treaties and alliances, and rely on the same balance of power." He would have been disastrously wrong.

I expect something similar to happen with us humans and AGI, if there are multiple AGIs. "What? They all have different architectures and objectives, not to mention different users and owners... we even explicitly told them to compete with each other! Why are they doing X... noooooooo..." (Perhaps they are competing with each other furiously, even fighting each other. Yet somehow they'll find a way to cut us out of whatever deal they reach, just as European powers so often did for their various native allies.)

Why aren't assurance contracts widely used?

Voting for third-party candidates. Organizing marches and rallies. Things like the Free State Project (why aren't lots of other subcultures and political factions doing that?). Sweet parties at my house.

Now that I think more about it, clubs/churches do this sort of thing all the time informally, e.g. survey the crowd and ask how many people would come to the event if it were held, and then hold the event iff at least x people say they would come, with social disapproval being the punishment for people who say they would come and then don't.

And of course, the sort of things Kickstarter funds. So I guess that's part of my answer right there.
