The trouble here is that deep disagreements aren't often symmetrically held with the same intensity. Consider the following situation:
Say we have Protag and Villain. Villain goes around torturing people and happens upon Protag's brother, whom he subsequently tortures and kills. Protag is unable to forgive Villain, but Villain has nothing personal against Protag. Which of the following is the outcome?
The first case is sad but understandable here -- but it also allows extremist purple-tribe members to veto non-extremist green-tribe members (where the purple and green ideologies pertain to something silly like "how to play pool correctly"). The second case is perverse. The third case is just "violate people's preferences for retribution, but with extra steps."
So, a silly question that doesn't really address the point of this post (it may well just be a matter of clarity, but an answer would be useful to me for earning-to-give reasons that are off-topic here) --
Here you claim that CDT is a generalization of decision-theories that includes TDT (fair enough!):
Here, "CDT" refers -- very broadly -- to using counterfactuals to evaluate expected value of actions. It need not mean physical-causal counterfactuals. In particular, TDT counts as "a CDT" in this sense.
But here you describe CDT as two-boxing in Newcomb, which conflicts with my understanding that TDT one-boxes coupled with your claim that TDT counts as a CDT:
For example, in Newcomb, CDT two-boxes, and agrees with EDT about the consequences of two-boxing. The disagreement is only about the value of the other action.
So is this conflict a matter of using the colloquial definition of CDT in the second quote but the broader one in the first, of your having a more general notion of two-boxing than mine, or of your knowing something about TDT that I don't?
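To make the disagreement in the second quote concrete, here is a minimal sketch of the expected-value calculations in Newcomb's problem. The payoff amounts and the 0.99 predictor accuracy are illustrative assumptions, not figures from the post; the point is just that EDT-style conditioning favors one-boxing while a physical-causal counterfactual favors two-boxing regardless of the box's (fixed) contents.

```python
# Illustrative sketch: Newcomb's problem under EDT-style conditioning vs.
# physical-causal CDT counterfactuals. All numbers are assumptions.

PRIZE = 1_000_000   # opaque box: filled iff the predictor expected one-boxing
BONUS = 1_000       # transparent box: always present
ACCURACY = 0.99     # assumed predictor accuracy

# EDT conditions on the action: choosing one-box is strong evidence
# that the opaque box is full.
edt_one_box = ACCURACY * PRIZE
edt_two_box = (1 - ACCURACY) * PRIZE + BONUS

# A physical-causal CDT holds the box contents fixed: whatever the
# probability q that the box is already full, two-boxing adds BONUS.
def cdt_values(q):
    return q * PRIZE, q * PRIZE + BONUS  # (one-box EV, two-box EV)

assert edt_one_box > edt_two_box        # EDT recommends one-boxing
one, two = cdt_values(0.5)
assert two - one == BONUS               # CDT two-boxes for any q
```

Under this framing, both theories agree about the consequences of two-boxing given fixed contents; they disagree about how the counterfactual for the other action should be evaluated, which is where a broader "CDT" that admits non-physical counterfactuals (like TDT's) can come apart from the two-boxing version.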
Thanks! This is great.
A year ago, Joaquin Phoenix made headlines when he appeared on the red carpet at the Golden Globes wearing a tuxedo with a paper bag over his head that read, "I am a shape-shifter. I can't change the world. I can only change myself."
-- the GPT-3-generated news article humans found easiest to distinguish from the real deal.
... I haven't read the paper in detail, but we may have done it; we may be on the verge of superhuman skill at absurdist comedy! That's not even completely a joke. Look at the sentence "I am a shape-shifter. I can't change the world. I can only change myself." It's successful wordplay (whether intended or not). "I can't change the world. I can only change myself" is often used as a sort of moral truism (e.g. "Man in the Mirror," Michael Jackson). In contrast, "I am a shape-shifter" is a literal claim about one's ability to change oneself.
The upshot is that GPT-3 can equivocate between the colloquial meaning of a phrase and the literal meaning of a phrase in a way that I think is clever. I haven't looked into whether the other GPTs did this (it makes sense that a statistical learner would pick up this kind of behavior) but dayum.
I propose that we ought to have less faith in our ability to control AI or its worldview and place more effort into making sure that potential AIs exist in a sociopolitical environment where it is to their benefit not to destroy us.
This is probably the crux of our disagreement. If an AI is indeed powerful enough to wrest power from humanity, the catastrophic convergence conjecture implies that it by default will. And if the AI is indeed powerful enough to wrest power from humanity, I have difficulty envisioning things we could offer it in trade that it couldn't just unilaterally satisfy for itself in a cheaper and more efficient manner.
As an intuition pump for this, I think that the AI-human power differential will be more similar to the human-animal differential than to the company-human differential. In the latter case, the company actually relies on humans for continued support (something an AI that can roll out human-level AI will, at some point, no longer need to do) and thus has to maintain a level of trust. In the former case, well... people don't really negotiate with animals at all.
Yeah, I don't do it, for mainly selfish reasons, but I agree that there are a lot of benefits to separating arguments into multiple comments in terms of improving readability and structure. Frankly, I commend you for doing it (and I'm particularly amenable to it because I like bullet points). With that said, here are some reasons, which you shouldn't take too seriously, for why I don't:
Nice post! The moof scenario reminds me somewhat of Paul Christiano's slow take-off scenario which you might enjoy reading about. This is basically my stance as well.
AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn't have enough compute power to invent some novel form of transmission that isn't known to all of science. Since there is a considerable computational gap between useful AI and "all of science", you can do quite a bit with an AI in a box without worrying too much about it going rogue.
My major concern with AI boxing is the possibility that the AI might just convince people to let it out (i.e. remove the firewall, provide unbounded internet access, connect it to a cloud). Maybe you can get around this by combining a limited AI output data stream with a very arduous gated process for letting the AI out, but I'm not very confident.
If the biggest threat from AI doesn't come from AI Foom, but rather from Chinese-owned AI with a hostile world-view.
The biggest threat from AI comes from AI-owned AI with a hostile worldview -- no matter how the AI gets created. If we can't answer the question "how do we make sure AIs do the things we want them to do when we can't tell them all the things they shouldn't do?", we might wind up with Something Very Smart scheming to take over the world while lacking at least one Important Human Value. Think Age of Em, except the Ems aren't even human.
Advancing AI research is actually one of the best things you can do to ensure a "peaceful rise" of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario still is that some time in the mid 2030's a single research team comes up with a revolutionary new software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.
Since I'm still concerned with making sure AI actually does the things we want it to do, I worry that faster AI advances will imperil that goal. Beyond that, I'm not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities, even before AGI emerges. In a world of abundant labor and so on, the need for mutually beneficial economic games with other human players, let alone countries, will be much less.
I'm a little worried about military dominance, though -- since the country with the best military AI may leverage it to gain a radical geopolitical upper hand. Still, we were able to handle nuclear weapons, so we should probably be able to handle this too.
Admittedly, the first time I read this I was confused, because you wrote "When a bad thing happens to you, that has direct, obvious bad effects on you. But it also has secondary effects on your model of the world." This gave the impression that the issue was with the model of the world and not the world itself. That isn't what you meant, but I made a list of reasons talking is a thing people do anyway:
Applying these systems to the kind of choices that I make in everyday life I can see all of them basically saying something like:...
The tricky thing with these kinds of ethical examples is that a bunch of selfish (read: amoral) people would totally take care of their bodies, be nice to people they're in iterated games with, try to improve themselves in their professional lives, and seek long-term relationship value. The only unambiguously selfless thing on that list, in my opinion, is donating -- and that tends to kick the question of ethics down the road to the matter of whom you are donating to, which differs across ethical philosophies.
In any case, the takeaway from this is that people's definitions of what they ought to do are deeply entangled with the things that they would want to do anyway. I think this is why many of the ethical systems you're describing make similar suggestions. But once you start to consider actions you might not actually be comfortable doing, many ethical systems make nontrivial claims.
Not every ethical system says you may lie if it makes people feel better. Not every ethical system says you shouldn't eat meat. Not every ethical system says you should invest in science. Not every ethical system says you should pray. Not every ethical system says you should seek out lucrative employment purely to donate the money.
These non-trivial claims matter. Because in some cases, they correspond to the highest leverage ethical actions a person could possibly take -- eclipsing the relevance of ordinary day-to-day actions entirely.
There are easy ways to be a better moral agent, but to do that, you should probably maximize the time you spend taking care of yourself, taking care of others, volunteering, or working towards important issues… rather than reading Kant.
I agree with this, though. If you want to do ethical things... just go ahead and do them. If it wasn't something you cared about before you read about moral imperatives, it's unlikely to start being something you care about after.
Nah. Based on my interaction with humans who work from home, most aren't really that invested in the whole "support the paperclip factories" thing -- as evidenced by their willingness to chill out now that they're away from offices and can do it without being yelled at (sorry, humans! forgive me for revealing your secrets!). Nearly half of Americans live paycheck to paycheck, so (on the margin) Covid-19 is absolutely catastrophic for the financial well-being (read: self-agency) of many people, and that propagates into the long term via wage scarring. It's completely understandable that they're freaking out.
Also note that many of the people objecting to being forced to stay home are older. They might not be as at-risk as the very old or infirm, but they're still at serious risk. I'd frankly do quite a bit to avoid getting coronavirus if I could, and I'm young. If you're in dire enough straits to risk getting coronavirus for employment, you're probably doing it because you need to -- certainly not because of any abstract concerns about paperclip factories.
That being said, there are totally a bunch of people who act like our paperclip-making capabilities outweigh the importance of old and infirm humans. They aren't most humans, but they exist. They're called Moloch's Army, and a bunch of the other humans really are working on figuring out how to talk about them in public. Beware, though: the protestors you are thinking of might not be the droids you're looking for.