# All of cousin_it's Comments + Replies

AI Training Should Allow Opt-Out

Then people should be asked before the fact: "if you upload code to our website, we can use it to train ML models and use them for commercial purposes, are you ok with that?" If people get opted into this kind of thing silently by default, that's nasty and might even be sue-worthy.

AI Training Should Allow Opt-Out

Mechanically, an opt-out would be very easy to implement in software. One could essentially just put a line saying

I'm not sure it's so easy. Copilot is a neural network trained on a large dataset. Making it act as if a certain piece of data wasn't in the training set requires retraining it, and it needs to happen every time someone opts out.

3 · Aleksey Bykhun · 8d
I think opt-out should only be possible on first publish, same as how e.g. GPLv3 works: once you publish, you cannot reclaim your rights.
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"

At some point I hoped that CFAR would come up with "rationality trials", toy challenges that are difficult to game and transfer well to some subset of real world situations. Something like boxing, or solving math problems, but a new entry in that category.

IMO standardized tests of this form are hard; I was going to say "mainstream academia hasn't done much better" but Stanovich published something in 2016 that I'm guessing no one at CFAR has read (except maybe Dan?). I am not aware of any sustained research attempts on CFAR's part to do this. [My sense is lots of people looked at it for a little bit, thought "this is hard", and then dug in ground that seemed more promising.]

I think there are more labor-intensive / less clean alternatives that could have worked. We could have, say, just made the equivalent o... (read more)

AGI Safety FAQ / all-dumb-questions-allowed thread

Without nanotech or anything like that, maybe the easiest way is to manipulate humans into building lots of powerful and hackable weapons (or just wait since we're doing it anyway). Then one day, strike.

Edit: and of course the AI's first action will be to covertly take over the internet, because the biggest danger to the AI is another AI already existing or being about to appear. It's worth taking a small risk of being detected by humans to prevent the bigger risk of being outraced by a competitor.

Grabby Animals: Observation-selection effects favor the hypothesis that UAP are animals which consist of the “field-matter”

How can photonics work without matter? I thought the problem was that you couldn't make a switch, because light waves just pass through each other (the equations are linear, so the sum of two valid waves is also a valid wave).

2 · avturchin · 1mo
I have an intuition that complex knots in a magnetic field may be relatively stable, and I linked an article that explores the topic of such knots. There are over 100 ideas about the physics of ball lightning, and some of them explore quite exotic types of matter. But the nature of such matter was not my central argument: I am more interested in the population dynamics, assuming that such matter is possible.
What Is a Major Chord?

Sethares' theory is very nice: we don't hear "these two frequencies have a simple ratio", we hear "their overtones align". But I'm not sure it is the whole story.

If you play a bunch of sine waves in ratios 1:2:3:4:5, it will sound to you like a single note. That perceptual fusion cannot be based on aligning overtones, because sine waves don't have overtones. Moreover, if you play 2:3:4:5, your mind will sometimes supply the missing 1, that's known as "missing fundamental". And if you play some sine waves slightly shifted from 1:2:3:4:5, you'll notice the i... (read more)
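One way to see why the mind can supply the missing 1: any sum of sines at 2:3:4:5 times a base frequency f repeats with period exactly 1/f, so the periodicity of the absent fundamental is physically present in the waveform. A minimal numpy sketch (the 100 Hz base and the sample rate are arbitrary choices of mine):

```python
import numpy as np

sr = 10_000                      # sample rate (Hz)
t = np.arange(2 * sr) / sr       # two seconds of samples
# Partials at 2:3:4:5 times a (missing) 100 Hz fundamental.
x = sum(np.sin(2 * np.pi * f * t) for f in (200, 300, 400, 500))

period = sr // 100               # 100 Hz fundamental -> 100-sample period
# The waveform repeats at the period of the absent fundamental.
print(np.allclose(x[:sr], x[period:period + sr]))  # True
```

So even though no energy is present at 100 Hz, the signal's period matches a 100 Hz tone, which is one candidate cue for the missing-fundamental percept.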

2 · gjm · 2mo
Just making something explicit that I think I missed for a minute when reading your comment: the point isn't "Sethares doesn't explain how our ears/brains determine what's one note and what's more, so his theory is incomplete" (his theory isn't trying to be a theory of that) but "our ears/brains seem to determine what's one note and what's more by doing something like looking for simple integer frequency multiples, and if there's a mechanism for that it seems likely that it's also involved in determining what combinations of tones sound good to us". I think there's something to that. Here are two things that seem like they push the other way:

* On the face of it, this indicates machinery for identifying integer multiples, not necessarily rational ratios more generally. (Though maybe the missing-fundamental phenomenon suggests otherwise.)
* Suppose you hear a violin and a flute playing the same note. You probably will not hear them as a single instrument. I think that whatever magic our ears/brains do to figure out what's one instrument and what's several also involves things like the exact times when spectral components appear and disappear, which spectral components appear to be fluctuating together (in frequency or amplitude or both), and maybe even fitting spectral patterns to those of instruments we're used to hearing. (I suspect there's a pile of research on this. I haven't looked.) The more other things we use for that, the less confident we can be that integer-frequency-ratio identification is part of it.
* Interesting experiment which I am too lazy to try: pick two frequencies with a highly irrational ratio, construct their harmonic series, and split each harmonic series into two groups. So we have A1 and A2 (splitting up the spectrum of note A) and B1 and B2 (splitting up the spectrum of note B). Now construct a sound built out of all those components -- but make A1 and B1 match closely in details of timing,
2 · jefftk · 2mo
The way I would explain this is that when hearing real sounds, it is very common that you hear a frequency and its harmonics. Almost all the time, if you hear 1:2:3:4:5 etc., that is because a single note just sounded. So, if you hear a bunch of sine waves in that ratio (e.g. a determined group of people whistling), it sounds like one note.
Whence the determinant?

Hmm, this seems wrong but fixable. Namely, exp(A) is close to (I+A/n)^n, so raising both sides of det(exp(A))=exp(tr(A)) to the power of 1/n gives something like what we want. Still a bit too algebraic though, I wonder if we can do better.
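For readers who want to verify det(exp(A)) = exp(tr(A)) numerically, here is a minimal check using a truncated Taylor series for the matrix exponential (the 30-term cutoff is an arbitrary choice of mine, ample for small matrices):

```python
import numpy as np

def expm(a, terms=30):
    """Matrix exponential via the truncated Taylor series sum of A^k / k!."""
    result = np.eye(len(a))
    power = np.eye(len(a))
    for k in range(1, terms):
        power = power @ a / k      # power is now A^k / k!
        result = result + power
    return result

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
lhs = np.linalg.det(expm(a))
rhs = np.exp(np.trace(a))
print(np.isclose(lhs, rhs))  # True
```

The identity holds for any square matrix, which is also why det(exp(A)) is always positive: it equals exp of a real number.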

2 · Oscar_Cunningham · 4mo
Another thing to say is: if A(0) = I then
Whence the determinant?

Interesting, can you give a simple geometric explanation?

2 · Oscar_Cunningham · 4mo
My intuition for exp is that it tells you how an infinitesimal change accumulates over finite time (think compound interest). So the above expression is equivalent to det(I + εA) = 1 + ε·tr(A) + O(ε²). Thus we should think: 'If I perturb the identity matrix, then the amount by which the unit cube grows is proportional to the extent to which each vector is being stretched in the direction it was already pointing'.
Whence the determinant?

Yup, determinant is how much the volume stretches. And trace is how much the vectors stay pointing in the same direction (average dot product of v and Av). This explains why trace of 90 degree rotation in 2D space is zero, why trace of projection onto a subspace is the dimension of that subspace, and so on.
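These claims are easy to check numerically: trace is the sum of eᵢ·Aeᵢ over a basis, so a 90° rotation contributes nothing and a projection contributes 1 per preserved dimension, and the perturbation formula det(I + εA) ≈ 1 + ε·tr(A) holds for small ε. A quick sketch (the example matrices are arbitrary choices of mine):

```python
import numpy as np

# 90-degree rotation in 2D: every vector ends up orthogonal to where
# it started, so nothing "keeps pointing the same way".
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
print(np.trace(rot))  # 0.0

# Projection onto the xy-plane inside 3D: trace equals the subspace dim.
proj = np.diag([1.0, 1.0, 0.0])
print(np.trace(proj))  # 2.0

# det(I + eps*A) is close to 1 + eps*tr(A) for small eps.
a = np.array([[2.0, 1.0], [0.5, -1.0]])
eps = 1e-6
lhs = np.linalg.det(np.eye(2) + eps * a)
rhs = 1 + eps * np.trace(a)
print(np.isclose(lhs, rhs))  # True
```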

2 · Oscar_Cunningham · 4mo
Thank you for that intuition into the trace! That also helps make sense of det(exp(A)) = exp(tr(A)).
Bear Surprise Freedom Network

Very cool that you're thinking about this. I've been in a bit of a funk since the news about Cogent, Lumen and LINX. It's good to hear that not everyone in the West subscribes to "bolt the door from outside".

Ukraine Post #2: Options

Right now there's indeed an exodus of young qualified people from Russia. The easiest path goes to countries that are visa-free for Russians, like Armenia or Argentina.

Why is the war in Ukraine particularly dangerous for the world?

Ukrainians have wanted to join the EU for years, it was one of the main points of the Euromaidan. Most in the EU were lukewarm to it, but now because of the war there are huge pro-Ukraine demonstrations in every European capital.

Why is the war in Ukraine particularly dangerous for the world?

If everyone acts rationally, the result will be Ukraine growing closer to the EU, Russia becoming more isolated, and no WW3. But Russia isn't acting rationally, I'm losing count of distinct stupid things it has done since Feb 21. Extrapolating that stupidity into the future makes me think that WW3 is quite possible.

2 · rhollerith_dot_com · 4mo
I am curious why you think that. To avoid provoking the Soviet Union, Finland and Austria refrained from joining the EU till 5 years after East Germany joined, and they still are not (and never were) members of NATO. Nothing bad happened to Finland or Austria that is a quarter as bad as what is happening to Ukraine now.
Why I'm co-founding Aligned AI

Can you describe what changed / what made you start feeling that the problem is solvable / what your new attack is, in short?

Firstly, because the problem feels central to AI alignment, in the way that other approaches didn't. So making progress in this is making general AI alignment progress; there won't be such a "one error detected and all the work is useless" problem. Secondly, we've had success generating some key concepts, implying the problem is ripe for further progress.

This feels like a key detail that's lacking from this post. I actually downvoted this post because I have no idea whether I should be excited about this development or not. I'm pretty familiar with Stuart's work over the years, so I'd be fairly surprised if there's something big here.

Might help if I put this another way. I'd be purely +1 on this project if it was just "hey, I think I've got some good ideas AND I have an idea about why it's valuable to operationalize them as a business, so I'm going to do that". Sounds great. However, the bit about "AND I think I k... (read more)

Acoustic vs Electric Mandolin

I think the acoustic has a better sound, but the electric one has more groove.

Defending One-Dimensional Ethics

“You’re scratching your own moral-seeming itches. You’re making yourself feel good. You’re paying down imagined debts that you think you owe, you’re being partial toward people around you. Ultimately, that is, your philanthropy is about you and how you feel and what you owe and what you symbolize. My philanthropy is about giving other people more of the lives they’d choose."

“My giving is unintuitive, and it’s not always ‘feel-good,’ but it’s truly other-centered. Ultimately, I’ll take that trade.”

I think the Stirnerian counterargument would be that g... (read more)

2 · PeterMcCluskey · 5mo
If you follow other-centered ethics, then the counterargument seems irrelevant. The post is excellent at explaining the implications of other-centered ethics, but it doesn't seem intended to explain why I should adopt those ethics.
The innocent self

I think this view is the opposite of true. My view is something more like "all men are created evil". Animals are callous about how they kill or eat, and we start out as animals too. An animal doesn't have to be hurt to hurt other animals. Neither does a human, there are tons of reports of rich kids who have everything and are callous anyway. It's nature.

So where do we place the good? I think the good in us is the outer layer, the culture. Game-theoretic conventions like "don't kill", first coming from circumstantial necessity, and then we learn and intern... (read more)

7 · Vanessa Kosoy · 5mo
IMO the truth is in the middle. Empathy is within human nature, but it's a very [partial](https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=Nn824LSK7nze4mqne) emotion (i.e. we have different amounts of empathy for different people), and different people have different capacities for empathy. Culture comes in to impose norms that are at least somewhat impartial and universal. And these norms are still shaped by game-theoretic incentives (plus historical accident).
Better impossibility result for unbounded utilities

Can we have unbounded utilities, and lotteries with infinite support, but probabilities always go down so fast that the sum (absolutely) converges, no matter what evidence we've seen?

2 · Vanessa Kosoy · 5mo
Yes, for example you can penalize the (initially Solomonoff-ish) prior probability of every hypothesis by a factor of e^(−β(U_max − U_min)), where β > 0 is some constant, U_max is the maximal expected utility of this hypothesis over all policies, and U_min is the minimal (and you'd have to discard hypotheses for which one of those is already divergent, except maybe in cases where the difference is renormalizable somehow). This kind of thing was referred to as a "leverage penalty" in a [previous discussion](https://www.lesswrong.com/posts/hbmsW2k9DxED5Z4eJ?commentId=RXEfMJJzCfTGeGDnp). Personally I'm quite skeptical it's useful, but maaaybe?
Inferring utility functions from locally non-transitive preferences

There's a bit of math directly relevant to this problem: Hodge decomposition of graph flows, for the discrete case, and vector fields, for the continuous case. Basically if you have a bunch of arrows, possibly loopy, you can always decompose it into a sum of two components: a "pure cyclic" one (no sources or sinks, stuff flowing in cycles) and a "gradient" one (arising from a utility function). No neural network needed, the decomposition is unique and can be computed explicitly. See this post, and also the comments by FactorialCode and me.
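As a toy sketch of the discrete case (my own construction, not taken from the linked post): build the gradient operator G of a directed graph, least-squares-fit a node potential, and the leftover is automatically divergence-free, i.e. purely cyclic:

```python
import numpy as np

# Directed edges of a small graph on 4 nodes (a cycle plus a chord).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4

# Gradient operator: (G @ p)[e] = p[j] - p[i] for edge e = (i, j).
G = np.zeros((len(edges), n_nodes))
for e, (i, j) in enumerate(edges):
    G[e, i], G[e, j] = -1.0, 1.0

flow = np.array([3.0, 1.0, 2.0, 2.0, 1.0])   # arbitrary flow on the edges

# Least squares finds the potential whose gradient best matches the flow.
p, *_ = np.linalg.lstsq(G, flow, rcond=None)
gradient_part = G @ p
cyclic_part = flow - gradient_part

# By the normal equations, the cyclic part has zero divergence everywhere.
print(np.allclose(G.T @ cyclic_part, 0))                # True
print(np.allclose(gradient_part + cyclic_part, flow))   # True
```

The decomposition is unique (the potential p is only determined up to an additive constant, but its gradient is unique), matching the claim in the comment.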

1 · Jan · 5mo
Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.
Trying to Keep the Garden Well

I think the right procedure works something like this:
1) Tenants notice that one of them has trashed the garden, and tell the landlord who.
2) The landlord tells the offending tenant to clean up or they'll be billed.
3) If the offending tenant doesn't clean up, the cleaning fee gets added to their next rent bill.

In your case it seems like the offending tenant wasn't pointed out. Maybe because other tenants didn't care, or maybe some tenants had a mafia mentality and made "snitching" unsafe. Either way, you were right to move away.

1 · Kenny · 5mo
I don't think [1] or [2] are even (reasonably) 'possible' in most similar situations. I think the only plausible possibilities are:
1. The relevant people persuade the litterers to remove the items they left in the garden. (Assuming the story in the post is accurate, this didn't work or wasn't tried.)
2. Some people, i.e. not the litterers, and maybe 'the city', remove the items.
[1] requires fairly 'expensive social technology', e.g. trust, common values, or effective persuasion being feasible at all, and it is not-uncommonly either absent or prohibitively costly to develop.

The whole thing was much more banal than what you're imagining. It was an interim-use building with mainly student residents. There was no coordination between residents that I knew of.

The garden wasn't trashed before the letter. It was just a table and a couple of chairs, that didn't fit the house rules. If the city had just said "please, take the table out of the garden", I'd have given a 70% chance of it working. If the city had not said a thing, there would not have been (a lot of) additional furniture in the garden.

By issuing the threat, the city intr... (read more)

Understanding the tensor product formulation in Transformer Circuits

Can't say much about transformers, but the tensor product definition seems off. There can be many elements in V⊗W that aren't expressible as v⊗w, only as a linear combination of multiple such. That can be seen from dimensionality: if v and w have dimensions n and m, then the pure tensors form at most an (n+m)-parameter family (like the Cartesian product), but the full tensor product has nm dimensions.

Here's an explanation of tensor products that I came up with sometime ago in an attempt to make it "click". Imagine you have a linear function that takes in two vectors and sp... (read more)
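A concrete way to see the dimensionality point: identifying V⊗W with n×m matrices, pure tensors v⊗w are exactly the rank-≤1 matrices, so e.g. the identity matrix is a sum of two pure tensors but is not itself one. A quick numpy check:

```python
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# A pure tensor v (x) w corresponds to the rank-1 matrix np.outer(v, w).
pure = np.outer(e1, e2)
print(np.linalg.matrix_rank(pure))  # 1

# e1 (x) e1 + e2 (x) e2 is the identity matrix: rank 2, so it cannot
# be written as any single outer product (those all have rank <= 1).
mixed = np.outer(e1, e1) + np.outer(e2, e2)
print(np.linalg.matrix_rank(mixed))  # 2
```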

1 · Tom Lieberum · 6mo
Ah yes, that makes sense to me. I'll modify the post accordingly and probably write it in the basis formulation. ETA: Fixed now; the computation takes a tiny bit longer but is hopefully still readable to everyone.
The Debtor's Revolt

Consider my friend with the business plan to buy up laundromats. Let's say an illiquid, privately held laundromat makes a 25% return on invested capital. Suppose the stock market demands a 10% return for a small-cap company. So $100 million of privately held laundromats would generate $25 million in annual income, worth $250 million on the stock market, 2.5 times the initial investment. But if the laundromat company can finance 75% of the deal at 10% interest, then the cash cost of acquisition is $25 million. The cash flow profits of $25 million are reduc
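To make the quoted arithmetic concrete, here is my own working of the numbers given (the post's continuation is truncated above, so the leveraged figures below are my illustration, not the author's):

```python
# Worked version of the quoted numbers (my own arithmetic, not the
# post's truncated continuation).
invested = 100e6          # privately held laundromats
roic = 0.25               # 25% return on invested capital
market_return = 0.10      # return the stock market demands

income = invested * roic               # $25M per year
market_value = income / market_return  # $250M, i.e. 2.5x the investment

debt_fraction, interest_rate = 0.75, 0.10
cash_cost = invested * (1 - debt_fraction)           # $25M up front
interest = invested * debt_fraction * interest_rate  # $7.5M per year
leveraged_income = income - interest                 # $17.5M per year

print(income, market_value, cash_cost, leveraged_income)
```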

4 · Benquo · 5mo
How many people do you think have both of these traits?
1. Access to enough capital to execute on that plan and expect it to be positive-EV, taking into account not only opportunity cost but risk.
2. Regularly calculates the ROI on different business categories they interact with, to look for business opportunities.
Seems to me like this number is very small; most people doing this are pretty busy making loads of money, and then their kids don't execute the same strategy, so it doesn't snowball intergenerationally. And the rest of the post explains why, structurally, we should expect this class to have shrunk quite a bit in relative terms over the last several decades. I agree that under naive microeconomic assumptions what you predict would happen, and I wouldn't be seeing what I'm seeing.
Reply to Eliezer on Biological Anchors

With these two points in mind, it seems off to me to confidently expect a new paradigm to be dominant by 2040 (even conditional on AGI being developed), as the second quote above implies. As for the first quote, I think the implication there is less clear, but I read it as expecting AGI to involve software well over 100x as efficient as the human brain, and I wouldn’t bet on that either (in real life, if AGI is developed in the coming decades—not based on what’s possible in principle.)

I think this misses the point a bit. The thing to be afraid of is not... (read more)

9 · Matthew Barnett · 6mo
Unless I’m mistaken, the Bio Anchors framework explicitly assumes that we will continue to get algorithmic improvements, and even tries to estimate and extrapolate the trend in algorithmic efficiency. It could of course be that progress in reality will turn out a lot faster than the median trendline in the model, but I think that’s reflected by the explicit uncertainty over the parameters in the model. In other words, Holden’s point about this framework being a testbed for thinking about timelines remains unscathed if there is merely more ordinary algorithmic progress than expected.
Considerations on interaction between AI and expected value of the future

To me it feels like alignment is a tiny target to hit, and around it there's a neighborhood of almost-alignment, where enough is achieved to keep people alive but locked out of some important aspect of human value. There are many aspects such that missing even one or two of them is enough to make life bad (complexity and fragility of value). You seem to be saying that if we achieve enough alignment to keep people alive, we have >50% chance of achieving all/most other aspects of human value as well, but I don't see why that's true.

Considerations on interaction between AI and expected value of the future

These involve extinction, so they don't answer the question what's the most likely outcome conditional on non-extinction. I think the answer there is a specific kind of near-miss at alignment which is quite scary.

8 · Vanessa Kosoy · 7mo
My point is that Pr[non-extinction | misalignment] << 1, Pr[non-extinction | alignment] = 1, Pr[alignment] is not that low and therefore Pr[misalignment | non-extinction] is low, by Bayes.
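The shape of that Bayes update can be sketched with illustrative numbers (the probabilities below are placeholders of mine, not Vanessa's):

```python
# Illustrative numbers only: the structure of the update, not anyone's
# actual credences.
p_align = 0.2
p_survive_given_align = 1.0
p_survive_given_misalign = 0.01   # "<< 1"

p_survive = (p_align * p_survive_given_align
             + (1 - p_align) * p_survive_given_misalign)
p_misalign_given_survive = ((1 - p_align) * p_survive_given_misalign
                            / p_survive)
print(round(p_misalign_given_survive, 3))  # 0.038
```

Even with a modest prior on alignment, conditioning on survival concentrates most of the probability on the aligned branch, because the misaligned branch rarely produces survivors.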
Interpreting Yudkowsky on Deep vs Shallow Knowledge

I had the same view as you, and was persuaded out of it in this thread. Maybe to shift focus a little, one interesting question here is about training. How do you train a plan-generating AI? If you reward plans that sound like they'd succeed, regardless of how icky they seem, then the AI will become useless to you by outputting effective-sounding but icky plans. But if you reward only plans that look nice enough to execute, that tempts the AI to make plans that manipulate whoever is reading them, and we're back at square one.

Maybe that's a good way to look... (read more)

4 · John_Maxwell · 7mo
I agree these are legitimate concerns... these are the kind of "deep" arguments I find more persuasive. In that thread, johnswentworth wrote:

I'd solve this by maintaining uncertainty about the "reward signal", so the AI tries to find a plan which looks good under both alignment and the actual-process-which-generates-the-reward-signal. (It doesn't know which is which, but it tries to learn a sufficiently diverse set of reward signals such that alignment is in there somewhere. I don't think we can do any better than this, because the entire point is that there is no way to disambiguate between alignment and the actual-process-which-generates-the-reward-signal by gathering more data. Well, I guess maybe you could do it with interpretability or the right set of priors, but I would hesitate to make those load-bearing.)

(BTW, potentially interesting point I just thought of. I'm gonna refer to actual-process-which-generates-the-reward-signal as "approval". Supposing for a second that it's possible to disambiguate between alignment and approval somehow, and we successfully aim at alignment and ignore approval. Then we've got an AI which might deliberately do aligned things we disapprove of. I think this is not ideal, because from the outside this behavior is also consistent with an AI which has learned approval incorrectly. So we'd want to flip the off switch for the sake of caution. Therefore, as a practical matter, I'd say that you should aim to satisfy both alignment and approval anyways. I suppose you could argue that on the basis of the argument I just gave, satisfying approval is therefore part of alignment and thus this is an unneeded measure, but overall the point is that aiming to satisfy both alignment and approval seems to have pretty low costs.)

(I suppose technically you can disambiguate between alignment and approval if there are unaligned things that humans would approve of -- I figure you solve this problem by making your learning algorithm robust again
Considerations on interaction between AI and expected value of the future

I think alignment is finicky, and there's a "deep pit around the peak" as discussed here.

I am skeptical. AFAICT the typical attempted-but-failed alignment looks like one of the two:

• Goodharting some proxy, such as making the reward signal go on instead of satisfying the human's request in order for the human to press the reward button. This usually produces a universe without people, since specifying a "person" is fairly complicated and the proxy will not be robustly tied to this concept.
• Allowing a daemon to take over. Daemonic utility functions are probably completely alien and also produce a universe without people. One caveat is: maybe t
General alignment plus human values, or alignment via human values?

There are very “large” impacts to which we are completely indifferent (chaotic weather changes, the above-mentioned change in planetary orbits, the different people being born as a consequence of different people meeting and dating across the world, etc.) and other, smaller, impacts that we care intensely about (the survival of humanity, of people’s personal wealth, of certain values and concepts going forward, key technological innovations being made or prevented, etc.)

I don't think we are indifferent to these outcomes. We leave them to luck, but that'... (read more)

2 · Stuart_Armstrong · 7mo
Yes, but we would be mostly indifferent to shifts in the distribution that preserve most of the features - eg if the weather was the same but delayed or advanced by six days.
Considerations on interaction between AI and expected value of the future

I think the default non-extinction outcome is a singleton with near miss at alignment creating large amounts of suffering.

I'm surprised. Unaligned AI is more likely than aligned AI even conditional on non-extinction? Why do you think that?

Soares, Tallinn, and Yudkowsky discuss AGI cognition

Yeah, I had a similar thought when reading that part. In agent-foundations discussions, the idea often came up that the right decision theory should quantify not over outputs or input-output maps, but over successor programs to run and delegate I/O to. Wei called it "UDT2".

Soares, Tallinn, and Yudkowsky discuss AGI cognition

“Though many predicted disaster, subsequent events were actually so slow and messy, they offered many chances for well-intentioned people to steer the outcome and everything turned out great!” does not sound like any particular segment of history book I can recall offhand.

I think the ozone hole and the Y2K problem fit the bill. Though of course that doesn't mean the AI problem will go the same way.

7 · Sammy Martin · 7mo
Also, climate change itself doesn't completely not look like [this scenario](https://forum.effectivealtruism.org/posts/ckPSrWeghc4gNsShK/#1__Good_news_on_emissions), same with [nuclear deterrence](https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic?commentId=kxaGSyvxYreBL5sMv).
Frame Control

years ago I was at a large group dinner with acquaintances and a woman I didn’t like. She was talking about something I wasn’t interested in, mostly to a few other people at the table, and I drifted to looking at my phone. The woman then said loudly, “Oh, looks like I’m boring Aella”. This put me into a position

From that description I sympathize with the woman more.

I've been playing music for many years and have thought of many songs as "perfect" by various musical criteria, melody, beat and so on. But deep down I think musical criteria aren't the answer. It all comes down to which mood the song puts you in, so the perfect song = the one that hits the right mood at your current stage in life. So it's gonna be unavoidably different between people, and for the same person across time. For me as a teenager it was "Losing My Religion", somehow. Now at almost 40, this recording of Aguas de Março makes me smile.

A Bayesian Aggregation Paradox

I think your first example could be even simpler. Imagine you have a coin that's either fair, all-heads, or all-tails. If your prior is "fair or all-heads with probability 1/2 each", then seeing heads is evidence against "fair". But if your prior is "fair or all-tails with probability 1/2 each", then seeing heads is evidence for "fair". Even though "fair" started as 1/2 in both cases. So the moral of the story is that there's no such thing as evidence for or against a hypothesis, only evidence that favors one hypothesis over another.
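The coin example is small enough to compute outright; a quick sketch (the helper function is mine):

```python
def posterior_fair(p_heads_given_other):
    """P(fair | heads) with prior 1/2 on fair, 1/2 on the other coin."""
    num = 0.5 * 0.5                    # P(fair) * P(heads | fair)
    return num / (num + 0.5 * p_heads_given_other)

# Prior: fair vs all-heads. Seeing heads lowers P(fair) from 1/2 to 1/3.
print(posterior_fair(1.0))  # 0.333...
# Prior: fair vs all-tails. Seeing heads raises P(fair) from 1/2 to 1.
print(posterior_fair(0.0))  # 1.0
```

Same prior on "fair", same observation, opposite direction of update: the direction depends entirely on what the rival hypothesis is.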

2 · Pattern · 7mo
That's a great explanation. Evidence may also be compatible or incompatible with a hypothesis. For instance, suppose I get a die (without the dots on the sides that indicate 1–6) and instead label* it: Red, 4, Life, X-Wing, Int, path through a tree. Then finding out I rolled a 4, without knowing what die I used, is compatible with the regular-die hypothesis, but any of the other rolls is not. *(likely using symbols, for space reasons)
Ngo and Yudkowsky on alignment difficulty

Thinking about it more, it seems that messy reward signals will lead to some approximation of alignment that works while the agent has low power compared to its "teachers", but at high power it will do something strange and maybe harm the "teachers" values. That holds true for humans gaining a lot of power and going against evolutionary values ("superstimuli"), and for individual humans gaining a lot of power and going against societal values ("power corrupts"), so it's probably true for AI as well. The worrying thing is that high power by itself seems suf... (read more)

Split and Commit

A few years ago Abram and I were discussing something like this, and converged on "T. C. Chamberlin's essay about the method of multiple working hypotheses is the key to rationality". Or in other words: never have just one hypothesis, always have a next best.

5 · ryan_b · 3mo
Much delayed hot take: science is slowing down due to a misapplication of specialization of labor, which drives focus on a single hypothesis.
Ngo and Yudkowsky on alignment difficulty

This is tricky. Let's say we have a powerful black box that initially has no knowledge or morals, but a lot of malleable computational power. We train it to give answers to scary real-world questions, like how to succeed at business or how to manipulate people. If we reward it for competent answers while we can still understand the answers, at some point we'll stop understanding answers, but they'll continue being super-competent. That's certainly a danger and I agree with it. But by the same token, if we reward the box for aligned answers while we still u... (read more)

I do think alignment has a relatively-simple core. Not as simple as intelligence/competence, since there's a decent number of human-value-specific bits which need to be hardcoded (as they are in humans), but not enough to drive the bulk of the asymmetry.

(BTW, I do think you've correctly identified an important point which I think a lot of people miss: humans internally "learn" values from a relatively-small chunk of hardcoded information. It should be possible in-principle to specify values with a relatively small set of hardcoded info, similar to the way ... (read more)

Ngo and Yudkowsky on alignment difficulty

I think it makes complete sense to say something like "once we have enough capability to run AIs making good real-world plans, some moron will run such an AI unsafely". And that itself implies a startling level of danger. But Eliezer seems to be making a stronger point, that there's no easy way to run such an AI safely, and all tricks like "ask the AI for plans that succeed conditional on them being executed" fail. And maybe I'm being thick, but the argument for that point still isn't reaching me somehow. Can someone rephrase for me?

I think it makes complete sense to say something like "once we have enough capability to run AIs making good real-world plans, some moron will run such an AI unsafely". And that itself implies a startling level of danger. But Eliezer seems to be making a stronger point, that there's no easy way to run such an AI safely, and all tricks like "ask the AI for plans that succeed conditional on them being executed" fail.

Yes, I am reading here too that Eliezer seems to be making a stronger point, specifically one related to corrigibility.

Looks like Eliezer bel... (read more)

Speaking for myself here…

OK, let's say we want an AI to make a "nanobot plan". I'll leave aside the possibility of other humans getting access to a similar AI as mine. Then there are two types of accident risk that I need to worry about.

First, I need to worry that the AI may run for a while, then hand me a plan, and it looks like a nanobot plan, but it's not, it's a booby trap. To avoid (or at least minimize) that problem, we need to be confident that the AI is actually trying to make a nanobot plan—i.e., we need to solve the whole alignment problem.

The main issue with this sort of thing (on my understanding of Eliezer's models) is Hidden Complexity of Wishes. You can make an AI safe by making it only able to fulfill certain narrow, well-defined kinds of wishes where we understand all the details of what we want, but then it probably won't suffice for a pivotal act. Alternatively, you can make it powerful enough for a pivotal act, but unfortunately a (good) pivotal act probably has to be very big, very irreversible, and very entangled with all the complicated details of human values. So alignment is l... (read more)

+1 to the question. My current best guess at an answer: There are easy safe ways, but not easy safe useful-enough ways. E.g. you could make your AI output DNA strings for a nanosystem and absolutely do not synthesize them, just have human scientists study them, and that would be a perfectly safe way to develop nanosystems in, say, 20 years instead of 50, except that you won't make it 2 years without some fool synthesizing the strings and ending the world. And more generally, any pathway that relies on humans achieving deep understanding of the pivotal act will take more than 2 years, unless you make 'human understanding' one of the AI's goals, in which case the AI is optimizing human brains and you've lost safety.
Ngo and Yudkowsky on alignment difficulty

That seems wrong, living creatures have lots of specific behaviors that are genetically programmed.

In fact I think both you and John are misunderstanding the bottleneck. The point isn't that the genome is small, nor that it affects the mind indirectly. The point is that the mind doesn't affect the genome. Living creatures don't have the tech to encode their life experience into genes for the next generation.

I've appreciated this comment thread! My take is that you're all talking about different relevant things. It may well be the case that there are multiple reasons why more skills and knowledge aren't encoded in our genomes: a) it's hard to get that information in (from parents' brains), b) it's hard to get that information out (to children's brains), and c) having large genomes is costly. What I'm calling the genomic bottleneck is a combination of all of them (although I think John is probably right that c) is not the main reason).

What would falsify my clai... (read more)

1TekhneMakre8mo
Do you think you can encode good flint-knapping technique genetically? I doubt that. I think I agree with your point, and think it's a more general and correct statement of the bottleneck; but, still, I think that the genome does mainly affect the mind indirectly, and this is one of the constraints making it be the case that humans have lots of learning / generalizing capability.

(This doesn't just apply to humans. What are some stark examples of animals with hardwired complex behaviors? With a fairly high bar for "complex", and a clear explanation of what is hardwired and how we know. Insects have some fairly complex behaviors, e.g. web building, ant-hill building, the tree-leaf nests of weaver ants, etc.; but IDK enough to rule out a combination of a little hardwiring, some emergence, and some learning. Lots of animals hunt after learning from their parents how to hunt. I think a lot of animals can walk right after being born? I think beavers in captivity will fruitlessly chew on wood, indicating that the wild phenotype is encoded by something simple like "enjoys chewing" (plus, learned desire for shelter), rather than "use wood for dam".)

An operationalization of "the genome directly programs the mind" would be that things like [the motions employed in flint-knapping] can be hardwired by small numbers of mutations (and hence can be evolved given a few million relevant years). I think this isn't true, but counterevidence would be interesting.

Since the genome can't feasibly directly encode behaviors, or at least can't learn those quickly enough to keep up with a changing niche, the species instead evolves to learn behaviors on the fly via algorithms that generalize. If there were *either* mind-mind transfer, *or* direct programming of behavior by the genome, then higher frequency changes would be easier and there'd be less need for fluid intelligence. (In fact it's sort of plausible to me (given my ignorance) that humans are imitation specialists and are less clever
Worst Commonsense Concepts?

To me some of the worst commonsense ideas come from the amateur psychology school: "gaslighting", "blaming the victim", "raised by narcissists", "sealioning" and so on. They just teach you to stop thinking and take sides.

Logical fallacies, like "false equivalence" or "slippery slope", are in practice mostly used to dismiss arguments prematurely.

The idea of "necessary vs contingent" (or "essential vs accidental", "innate vs constructed" etc) is mostly used as an attack tool, and I think even professional usage is more often confusing than not.

I think it would be useful if you edited the answer to add a line or two explaining each of those or at least giving links (for example, Schelling fences on slippery slopes), cause these seem non-obvious to me.

I think a lot of human "alignment" isn't encoded in our brains, it's encoded only interpersonally, in the fact that we need to negotiate with other humans of similar power. Once a human gets a lot of power, often the brakes come off. To the extent that's true, alignment inspired by typical human architecture won't work well for a stronger-than-human AI, and some other approach is needed.

5M. Y. Zuo8mo
I didn’t mean to suggest that any future approach has to rely on ‘typical human architecture’. I also believe the least possibly aligned humans are less aligned than the least possibly aligned dolphins, elephants, whales, etc…, are with each other. Treating AGI as a new species, at least as distant to us as dolphins for example, would be a good starting point.

Arguments by definition don't work. If by "human values" you mean "whatever humans end up maximizing", then sure, but we are unstable and can be manipulated, which isn't what we want in an AI. And if you mean "what humans deeply want or need", then human actions don't seem very aligned with that, so we're back at square one.

Education on My Homeworld

In The Case against Education: Why the Education System Is a Waste of Time and Money, Bryan Caplan uses Earth data to make the case that compulsory education does not significantly increase literacy.

Compulsory education increases literacy, see the Likbez in the USSR.

Managing your own boredom requires freedom, which is the opposite of compulsion.

One can make the opposite assertion, that it's fastest learned through discipline, and point to Chinese or South Korean schools.

I don’t doubt that it’s useful to have the whole population learn reading and

Education on My Homeworld

There is no standard set of skills everyone is supposed to learn because if everyone learns something then its economic value becomes zero.

This seems wrong. Skills like literacy, numeracy, prosociality and ability to manage your own boredom bring a lot of economic value, even (especially) if everyone has them. And looking at our world, most people don't acquire these skills freely and automatically, they have to be forced somewhat.

4lsusr8mo
In The Case against Education: Why the Education System Is a Waste of Time and Money, Bryan Caplan uses Earth data to make the case that compulsory education does not significantly increase literacy.

I'm skeptical that prosociality and the ability to manage your own boredom are taught at school in a way that would not be learned otherwise. Managing your own boredom requires freedom, which is the opposite of compulsion. Sociability requires permission to speak, which is forbidden by default in classroom-style schooling. Algebra and calculus seem the most IQ loaded of anything taught in school.

I don't doubt that it's useful to have the whole population learn reading and arithmetic, but this seems to me like it's the kind of thing that can be taught in a few months. (Or a single month to a smart child.) If kids don't learn reading automatically then that would imply that they wouldn't text each other in the absence of school which, to me, is reductio ad absurdum [https://xkcd.com/1414/].
Depositions and Rationality

Boxing in, by bracketing. People who claim to have no idea about a quantity will often give surprisingly tight ranges when explicitly interrogated.

And most of the time their original "no idea" will be more accurate than the stuff you made them make up.

I do think there's a rationality skill implicit in the text: the "coaching" that witnesses undergo to avoid giving answers they don't want to give. That'd be worth learning, as it's literally defense against the dark arts. And the test for it could be an interrogation of the kind that you describe.

The Opt-Out Clause

I didn't want to leave, but also didn't think reciting the phrase would do anything, so I recited it just as an exercise in overcoming superstition, and nothing happened. Reminds me of how Ross Scott bought a bunch of people's souls for candy; one guy just said "sure I'm hungry" and signed the contract. That's the way.

[Book Review] "The Bell Curve" by Charles Murray

(3) tax businesses for hogging up all the smart people, if they try to brain drain into their own firm?

Due to tax incidence, that's the same as taxing smart people for getting together. I don't like that for two reasons. First, people should be free to get together. Second, the freedom of smart people to get together could be responsible for large economic gains, so we should be careful about messing with it.

On the Universal Distribution

It's interesting that the "up to a finite fudge factor" problem isn't specific to universal priors. The same problem exists in ordinary probability theory, where you can have different priors about, for example, the propensity of a coin. Then after many observations, all reasonable priors get closer and closer to the true propensity of the coin.

Then it's natural to ask, what kind of such long-run truths do all universal priors converge on? It's not only truths of the form "this bit string comes from this program", because a universal prior can also look at... (read more)
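The coin example can be sketched numerically. Here's an illustrative toy (my own code, with hypothetical numbers, using standard Beta-Bernoulli updating), showing two very different priors converging on the coin's true propensity:

```python
import random

# Two different Beta priors over a coin's propensity. After many shared
# observations, their posterior means converge: the prior becomes a
# finite "fudge factor" that washes out in the long run.
random.seed(0)
true_p = 0.7
flips = [random.random() < true_p for _ in range(10_000)]
heads = sum(flips)

# Prior 1: Beta(1, 1), i.e. uniform. Prior 2: Beta(50, 5), strongly
# biased toward heads. Posterior mean of Beta(a, b) after observing
# h heads and t tails is (a + h) / (a + b + h + t).
post_mean_1 = (1 + heads) / (1 + 1 + len(flips))
post_mean_2 = (50 + heads) / (50 + 5 + len(flips))

print(round(post_mean_1, 3), round(post_mean_2, 3))
# Both are close to true_p, and closer still to each other.
```

Despite starting far apart, the two posterior means end up within a fraction of a percent of each other, which is the ordinary-probability analogue of the long-run agreement between universal priors described above.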

3TekhneMakre8mo
To be clear, SI can't learn which program, just a program with the same functional behavior (depending on the setup, same functional behavior on the prefixes of the string in question).

Hm. Say we have universal priors P and Q. For computable infinite strings x, we have that P(x|n) converges to Q(x|n), because they both converge to the right answer. For any x, we have that P(x|n) and Q(x|n) are always within some fixed constant factor of each other, by definition.

I conjecture, very unconfidently, that for any P there's a universal Q and a string x such that | P(x|n) - Q(x|n) | > c for some fixed c > 0 for infinitely many n.

I don't even have an idea to show this and don't know if it's true, but the intuition is like, an adversary choosing x should be able to find a machine M (or rather, equivalence classes of machines that make the same predictions; this trips me up in constructing Q) that Q and P have different priors on, and are consistent so far with the x|n chosen; then the adversary confirms M heavily by choosing more bits of x, so that the probability on M dominates, and the difference between Q(M) and P(M) is ballooned up maximally (given the constraint that P(Q) and Q(P) are positive); then the adversary picks a different M' and repeats, and so on.
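For reference, the "fixed constant factor" step here is the mutual-dominance property of universal priors: each multiplicatively dominates the other on every finite prefix. A sketch in symbols (my own notation; $x{\restriction}n$ denotes the length-$n$ prefix of $x$):

```latex
\exists\, c_1, c_2 > 0 \;\; \forall x \, \forall n:\quad
c_1 \, Q(x{\restriction}n) \;\le\; P(x{\restriction}n) \;\le\; c_2 \, Q(x{\restriction}n)
```

Note that this bounds the *ratio* of the two priors, not their *difference*, which is why the conjectured gap $|P(x{\restriction}n) - Q(x{\restriction}n)| > c$ is not immediately ruled out.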
They don't make 'em like they used to

Input latency and unpredictability of it. One famous example is that for many years there were usable finger-drumming apps on iOS but not on Android, because on Android you couldn't make the touchscreen + app + OS + sound system let people actually drum in time. Something would always introduce a hundred ms of latency (give or take) at random moments, which is enough to mess up the feeling. Everyone knew it and no one could fix it.
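As a rough back-of-the-envelope sketch (my own numbers, purely illustrative) of why ~100 ms of jitter is fatal for drumming:

```python
# At a typical tempo, the subdivisions a drummer plays are about as
# short as the random latency described above, so a delayed hit lands
# nearly a full subdivision late. (Tempo and jitter are illustrative
# assumptions, not measurements.)
bpm = 120
quarter_note_ms = 60_000 / bpm           # 500 ms per beat at 120 BPM
sixteenth_note_ms = quarter_note_ms / 4  # 125 ms per sixteenth note

jitter_ms = 100
print(jitter_ms / sixteenth_note_ms)  # 0.8: jitter is 80% of a sixteenth
```

By comparison, trained musicians can reliably hear timing errors well under 20 ms, so latency that is *unpredictable* at the 100 ms scale makes playing in time essentially impossible.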

They don't make 'em like they used to

Or just keep a piezoelectric lighter.