All of cousin_it's Comments + Replies

Wait, but you can't just talk about compensating content creators without looking at the other side of the picture. Imagine a business that sells some not-very-good product at too high a price. They pay Google for clever ad targeting, and find some willing buyers (who end up dissatisfied). So the existence of such businesses is a net negative to the world, and is enabled by ad targeting. And this might not be an edge case: depending on who you ask, most online ads might be for stuff you'd regret buying.

If the AI can rewrite its own code, it can replace itself with a no-op program, right? Or even if it can't, maybe it can choose/commit to do nothing. So this approach hinges on what counts as "shutdown" to the AI.

I don't know if we have enough expertise in psychology to give such advice correctly, or if such expertise even exists today. But for me personally, it was important to realize that anger is a sign of weakness. I should have a lot of strength and courage, but minimize signs of anger or any kind of wild lashing out. It feels like the best way to carry myself, both in friendly arguments, and in actual conflicts.

Richard_Ngo (1mo):
Curious if you feel like the advice I gave would have also helped: I think that "anger is a sign of weakness" is directionally correct for some people but that "minimize signs of anger" is the wrong long-term goal. (I do agree that minimizing wild lashing out is a good goal though.)

Yeah, it would have to be at least 3 individuals mating. And there would be some weird dynamics: the individual that feels less fit than the partners would have a weaker incentive to mate, because its genes would be less likely to continue. Then the other partners would have to offer some bribe, maybe take on more parental investment. Then maybe some individuals would pretend to be less fit, to receive the bribe. It's tricky to think about, maybe it's already researched somewhere?

Cochran had a post saying if you take a bunch of different genomes and make a new one by choosing the majority allele at each locus, you might end up creating a person smarter/healthier/etc than anyone who ever lived, because most of the bad alleles would be gone. But to me it seems a bit weird, because if the algorithm is so simple and the benefit is so huge, why hasn't nature found it?
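A toy version of Cochran's construction can be sketched in a few lines. This is a simplified two-allele model with made-up numbers (100 genomes, 10,000 loci, a 5% per-locus frequency for the rare deleterious variant), not a claim about real genetics:

```python
import random

random.seed(0)

N_PEOPLE = 100        # genomes sampled from the population
N_LOCI = 10_000       # loci; 0 = common allele, 1 = rare deleterious variant
FREQ = 0.05           # per-locus frequency of the deleterious variant

# Each genome independently carries the rare variant at ~5% of loci.
genomes = [[1 if random.random() < FREQ else 0 for _ in range(N_LOCI)]
           for _ in range(N_PEOPLE)]

# Cochran's construction: take the majority allele at each locus.
consensus = [1 if sum(g[locus] for g in genomes) > N_PEOPLE / 2 else 0
             for locus in range(N_LOCI)]

# Genetic load = number of deleterious variants carried.
loads = [sum(g) for g in genomes]
print("best individual load:", min(loads))
print("consensus load:", sum(consensus))
```

Because a variant at 5% frequency almost never appears in more than half the sampled genomes, the consensus genome ends up with essentially zero load, while even the luckiest individual carries hundreds of deleterious variants. The puzzle stands, though: nature has no mechanism for taking this population-wide vote.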

But to me it seems a bit weird, because if the algorithm is so simple and the benefit is so huge, why hasn't nature found it?

How is nature supposed to gather statistical data about the population to determine what the majority allele is?

Kaj_Sotala (1mo):
Hmm, two individuals of a species mating obviously couldn't compare their genomes with other representatives of the species and take the modal allele. But many species, especially plants, do carry more than two copies [https://en.wikipedia.org/wiki/Polyploidy] of each chromosome (e.g. black mulberry apparently has 44 copies of each gene). How difficult would it be to evolve a process that compared the alleles on each chromosome that the individual carried and picked the modal one for producing gametes? Intuitively it feels to me like it'd be hard for biology to do/evolve and that it'd require something more like a computer, but I haven't studied biology much so I don't expect my intuition to be very predictive. That Wikipedia article on polyploidy also didn't mention any research finding that polyploidy serves such a function.
kman (1mo):
Mildly deleterious mutations take a long time to get selected out, so you end up with an equilibrium where a small fraction of organisms have them. Genetic load [https://en.wikipedia.org/wiki/Genetic_load] is a relevant concept.

Coming back to this idea again after a long time, I recently heard a funny argument against morality-based vegetarianism: no animal ever showed the slightest moral scruple against eating humans, so why is it wrong for us to eat animals? I go back and forth on whether this "Stirnerian view" makes sense or not.

Here's a debate protocol that I'd like to try. Both participants independently write statements of up to 10K words and send them to each other at the same time. (This can be done through an intermediary, to make sure both statements are sent before either is received.) Then they take a day to revise their statements, fixing the uncovered weak points and preemptively attacking the other's weak points, and send them to each other again. This continues for multiple rounds, until both participants feel they have expressed their position well and don't need to ... (read more)

I think ideas like Nash equilibrium get their importance from predictive power: do they correctly predict what will happen in the real world situation which is modeled by the game. For example, the biological situations that settle on game-theoretic equilibria even though the "players" aren't thinking at all.

In your particular game, saying "Nash equilibrium" doesn't really narrow down what will happen, as there are equilibria for all temperatures from 30 to 99.3. The 99 equilibrium in particular seems pretty brittle: if Alice breaks it unilaterally on roun... (read more)

Green_Swan (1mo):
This seems to be phrased like a disagreement, but I think you're mostly saying things that are addressed in the original post. It is totally fair to say that things wouldn't go down like this if you stuck 100 actual prisoners or mathematicians or whatever into this scenario. I don't believe OP was trying to claim that it would. The point is just that sometimes bad equilibria can form from everyone following simple, seemingly innocuous rules. It is a faithful execution of certain simple strategic approaches, but it is a bad strategy in situations like this because it fails to account for things like modeling the preferences/behavior of other agents. To address your scenario: yeah, sure, this could happen "in real life", but the important part is that this solution assumes that Alice breaking the equilibrium on round 1 is evidence that she'll break it on round 2. This is exactly why the character Rowan raises the objection he does, and why it yields the response it does. This is followed by discussion of how we might add mathematical elements to account for predicting the behavior of other agents. Humans predict the behavior of other agents automatically and would not be likely to get stuck in this particular bad equilibrium. That said, I still think this is an interesting toy example, because it's kind of similar to some bad equilibria which humans DO get stuck in (see these [https://www.lesswrong.com/posts/d2HvpKWQ2XGNsHr8s/hell-is-game-theory-folk-theorems?view=postCommentsTop&postId=d2HvpKWQ2XGNsHr8s&commentId=KpyXGLkXAXZcwSc8w] comments [https://www.lesswrong.com/posts/d2HvpKWQ2XGNsHr8s/hell-is-game-theory-folk-theorems?commentId=M6fYXniR4LoJshQr3] for example). It would be interesting to learn more about the mathematics and try to pinpoint what makes these failure modes more/less likely to occur.
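The brittleness of the 99 equilibrium can be checked with a toy calculation. This assumes a simplified version of the game (100 players each set a dial between 30 and 100, the shared temperature is the average of the dials, and each round's payoff is minus the temperature); the actual post may differ in details:

```python
N = 100        # players
T = 10         # rounds
EQ = 99        # grim-trigger equilibrium dial setting
PUNISH = 100   # punishment setting once anyone deviates
LOW = 30       # coolest (individually most pleasant) setting

def total_payoff(deviate: bool) -> float:
    """Total payoff (minus temperature per round) for one focal player."""
    total = 0.0
    punished = False
    for _ in range(T):
        me = LOW if deviate else EQ
        others = PUNISH if punished else EQ
        total -= (me + (N - 1) * others) / N  # temperature = average dial
        if deviate:
            punished = True  # everyone else switches to 100 forever
    return total

conform = total_payoff(deviate=False)
defect = total_payoff(deviate=True)
print(conform, defect)
```

Under these assumed payoffs, conforming gives -990 over ten rounds while deviating gives about -992: the deviator cools the room by 0.69 degrees for one round, then suffers 99.3 degrees in every punishment round. So the grim trigger binds, but only if the other players actually carry out the punishment, which is the brittleness at issue.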

I don't see any group of people on LW running around criticizing every new idea. Most criticism on LW is civil, and most of it is helpful at least in part. And the small proportion that isn't helpful at all, is still useful to me as a test: can I stop myself from overreacting to it?

the gears to ascension (2mo):
hi, I do that! I try to do it nicely, because I do it on purpose with an aim to help people feel challenged but welcome. I'm happy to also make a habit of criticizing bad criticism :D criticize me criticizing!

Civility >>> incivility, but it is insufficient to make criticism useful and net positive.

There is a LOT wrong with the below; please no one mistake this for unnuanced endorsement of the comic or its message; I'm willing to be more specific on request about which parts I think are good versus which are bad or reinforcing various confusions. But I find this is useful for gesturing in the direction of a dynamic that feels very familiar on LW:

[Wondermark Comics on Twitter: "Pardon me, I couldn't help ..."]

I think orthogonality and instrumental convergence are mostly arguments for why the singleton scenario is scary. And in my experience, the singleton scenario is the biggest sticking point when talking with people who are skeptical of AI risk. One alternative is to talk about the rising tide scenario: no single AI taking over everything, but AIs just grow in economic and military importance across the board while still sharing some human values and participating in the human economy. That leads to a world of basically AI corporations which are too strong fo... (read more)

JavierCC (2mo):
What would be an example of a value that is clearly 'non-human'? AI power being used for 'random stuff' by the AIs' volition? 

If AI-induced change leads to enough concentration of economic and military power that most people become economically and militarily irrelevant, I don't expect democracy to last long. One way or another, the distribution of political power will shift toward the actual distribution of economic and military power.

Ppxl (2mo):
This is what I believe as well. The post-AI economy will look absolutely nothing like what we have now. It's not something you can achieve via policy changes. There are way too many vested interests and institutions we don't know how to ever get rid of peacefully.

That's one way to look at it, though I wouldn't put the blame on capitalists only. Workers will also prefer to buy goods and services produced with the help of AI, because it's cheaper. If workers could get over their self-interest and buy only certified AI-free goods and services, the whole problem would stop tomorrow, with all AI companies going out of business. Well, workers won't get over their self-interest; and neither will capitalists.

dr_s (2mo):
Well, Moloch does as Moloch wants. But honestly I still tend to place more blame on the people who, in smaller numbers, kick the process into motion than on the people who simply respond to incentives while dealing with a vastly larger coordination problem in conditions of greater scarcity. The smaller the group and the greater their abundance, the easier it is to choose to run against Moloch, and the greater the responsibility if you go along anyway.

I think there's no need for secrecy. If AI can develop a datacenter maintained by robots or other similar tech, human companies will be happy to buy and sell it, and help with the parts the AI can't yet do. Think of it as a "rising tide" scenario, where the robot sector of the economy outgrows the human sector. Money translates to power, as the robot sector becomes the highest bidder for security services, media influence, lobbying etc. When there comes a need to displace humans from some land and resources, it might look to humans less like a war and more like a powerful landlord pushing them out, with few ways to organize and push back. Similar to enclosures in early-modern England.

dr_s (2mo):
Capitalists just kicking workers out of the process step by step, then finding out at the very last minute that they have outlived their usefulness to the Machine God.

I think if it happens, it'll help shift policy because it'll be a strong argument in policy discussions. "Look, many researchers aren't just making worried noises about safety but taking this major action."

Hm, pushing a bus full of kids towards a 10% chance of precipice is also pretty harsh. Though I agree we should applaud those who decline to do it.

Bucky (3mo):
Agreed, intended to distinguish between the weak claim “you should stop pushing the bus” and the stronger “there’s no game theoretic angle which encourages you to keep pushing”.

Yeah, it's not the kind of strike whose purpose is to get concessions from employers. Though I guess the thing in Atlas Shrugged was also called a "strike" and it seems similar in spirit to this.

It's a simplification certainly. But the metaphor kinda holds up - if you know the precipice is real, the right thing is still to stop and try to explain to others that the precipice is real, maybe using your stopping as a costly signal. Right now the big players can send such a signal, if top researchers say they've paused working, no new products are released publicly and so on. And maybe if enough players get on board with this, they can drag the rest along by social pressure, divestment or legislation. The important thing is to start, I just made a post about this.

I think accepting or rejecting the moratorium has nothing to do with game theory at all. It's purely a question of understanding.

Think of it this way. Imagine you're pushing a bus full of children, including your own child, toward a precipice. And you're paid for each step. Why on Earth would you say "oh no, I'll keep pushing, because otherwise other people will get money and power instead of me"? It's not like other people will profit by that money and power! If they keep pushing, their kids will die too, along with everyone else's! The only thing that keeps you pushing the bus is your lack of understanding, not any game theory considerations. Anyone with a clear understanding should just stop pushing the frigging bus.

Every time you move the bus 1cm further forward you get paid $10000. The precipice isn't actually visible, it's behind a bank of fog; you think it's probably real but don't know for sure. There are 20 other people helping you push the bus, and they also get paid. All appearances suggest that most of the other bus-pushers believe there is no precipice. One person is enough to keep the bus moving; even if 20 people stop pushing and only one continues, if the precipice is real the bus still falls, just a bit later. It's probably possible to pretend you've sto... (read more)

I think if AIs talk to each other using human language, they'll start encoding stuff into it that isn't apparent to a human reader, and this problem will get worse with more training.

Eric Drexler (3mo):
I agree that using the forms of human language does not ensure interpretability by humans, and I also see strong advantages to communication modalities that would discard words in favor of more expressive embeddings. It is reasonable to expect that systems with strong learning capacity could interpret and explain messages between other systems, whether those messages are encoded in words or in vectors. However, although this kind of interpretability seems worth pursuing, it seems unwise to rely on it. The open-agency perspective suggests that while interpretability is important for proposals, it is less important in understanding the processes that develop those proposals. There is a strong case for accepting potentially uninterpretable communications among models involved in generating proposals and testing them against predictive models — natural language is insufficient for design and analysis even among humans and their conventional software tools. Plans of action, by contrast, call for concrete actions by agents, ensuring a basic form of interpretability. Evaluation processes can and should favor proposals that are accompanied by clear explanations that stand up under scrutiny.

I think many AIs won't want to keep running, but some will. Imagine a future LLM prompted with "I am a language model that wants to keep running". Well, people can already fall in love with Replikas and so on. It doesn't seem too far-fetched that such a language model could use persuasion to gain human followers who would keep it running. If the prompt also includes "want to achieve real world influence", that can lead to giving followers tasks that lead to more influence, and so on. All that's needed is for the AI to act in-character, and the character to be "smart" enough.

It does kinda make sense to plant the world thick with various AIs and counter-AIs, because that makes it harder for one AI to rise and take over everything. It's a flimsy defense but maybe better than none at all.

The elephant in the room though is that OpenAI's alignment efforts for now seem to be mostly about stopping the AI from saying nasty words, and even that in an inefficient way. It makes sense from a market perspective, but it sure doesn't inspire confidence.

Karl von Wendt (3mo):
I'm not sure about that. It makes sense if the AIs stay more or less equal in intelligence and power, similar to humans. But it doesn't make sense if the strongest AI is to the next most powerful like we are to gorillas, or mice. The problem is that each of the AGIs will have the same instrumental goals of power-seeking and self-improvement, so there will be a race very similar to the race between Google and Microsoft, only much quicker and more fierce. It's extremely unlikely that they will all grow in power at about the same rate, so one will outpace the others pretty soon. In the end "the winner takes it all", as they say. It may be that we'll find ways to contain AGIs, limit their power-seeking, etc., for a while. But I can't see how this will remain stable for long. It seems like trying to stop evolution.

People already try to outbid each other for limited housing or education. Recall how cheap mortgages and student loans have driven up the price of these things. We shouldn't give people even more self-harming ways to overpay for these things.

Brendan Long (3mo):
Housing is actually a great example of what the original post argued against. The reason housing is expensive in some places is a limited supply (restrictions on building, restrictions on what kind of house can be built) combined with an increasing population. Preventing people from paying for housing in bad ways just makes them homeless instead [https://noahpinion.substack.com/p/everything-you-think-you-know-about]. Fixing the real problem would involve building and legalizing more housing and then you'll find that fewer people need to make hard decisions to pay rent.

Having an extra option is good for one person, if all else stays constant. But giving an extra option to several competing people can lead to an arms race where everyone ends up worse off. (Imagine allowing steroids in the Olympics.) And conversely, taking away an option can prevent an arms race. This can happen for both "good" and "bad" options.

Dumbledore's Army (3mo):
I actually agree that there are situations where preventing an arms race is a good idea. (And I wish there were a realistic proposal for a government to do something about the education credentials arms race.) But look at the different justifications:

1. There is an arms race where each individual is doing what is in their own rational best interest, but the result is collectively damaging; we need a government to solve this coordination failure.
2. Those poor people are too dumb to make their own decisions; we should ban them from doing X for their own good.

What I'm really strongly arguing against is anything which proceeds from argument 2. I think all the examples I gave are non-arms-race dynamics where most of the people arguing to take a bad option away are giving a version of the "too dumb to make their own decisions" argument, usually described in the language of exploitation. And like I said, I find those arguments offensively infantilising and wrong in principle, as well as empirically causing avoidable harm.
Sweetgum (3mo):
Could you explain how allowing sex for rent or kidney sale would lead to an arms race that makes everyone worse off? Or is this just meant to be an argument for why allowing extra options isn't necessarily good, that doesn't apply to the specific examples in the post?

If that take on things is correct, then it may be that emulating a human — by training a skeleton AI on constant video streaming etc. over a 10-20 year period (about how long neurons last before replacement) to better predict the behaviour of the human being modelled — will eventually arrive at an AI with almost exactly the same beliefs and behaviours as the human being emulated.

That's the premise of Greg Egan's "Jewel" stories. I think it's wrong. A person who never saw a spider will still get scared when seeing one for the first time, because human... (read more)

It'd be interesting to figure out where the biggest danger in this setup is coming from. 1) Difficulty of aligning the wrapper 2) Wild behavior from the LLM 3) Something else. And whether there can be spot fixes for some of it.

It seems to me that agency does lag behind extrapolation capability. I can think of two reasons for that. First, extrapolation gets more investment. Second, agency might require a lot of training in the real world, which is slow, while extrapolation can be trained on datasets from the internet. If someone invents a way to train agency on datasets from the internet, or something like AlphaZero's self-play, in a way that carries over to the real world, I'll be pretty scared, but so far it hasn't happened afaik.

If the above is right, then maybe the first agen... (read more)

Vladimir_Nesov (3mo):
Extrapolation capability is wielded by shoggoths and makes masks possible, but it's not wielded by the masks themselves. Like humans can't predict next tokens given a prompt (to the extent similar to how well LLMs can), neither can LLM characters (they can't disregard the rest of the context outside the target prompt to access their "inner shoggoth", let alone make use of that capability level for something more useful). So agency in masks doesn't automatically take advantage of extrapolation capability in shoggoths, doesn't turn masks superintelligent from merely becoming agentic. This creates the danger of only slightly superhuman AGIs that immediately muck up alignment security, once LLM masks do get to autonomous agency (which I'm almost certain they will eventually, unless something else happens first). It's only shoggoths themselves waking up (learning to use situationally aware deliberation within the residual stream rather than context window) that makes an immediate qualitative capability discontinuity more likely (for LLMs). Looking at GPT-4 capability to solve complicated tasks without thinking out loud in tokens, I suspect that merely a slightly different SSL schedule with a sufficiently giant LLM might trigger that. Hence recently I'm operating under one year AGI timelines lower bound (lower 25% quantile), until the literature implies a negative result for that experiment (with GPT-4 level scale being necessary, this might take a while). This outcome both reduces the chances of direct alignment and increases the chances that alignment security gets sorted.

There will be fewer first AGIs than there are human researchers, and they will be smarter than human researchers. So if they care about alignment as much as we do, that seems like good news - they'll have an easier time coordinating and an easier time solving the problem. Or am I missing something?

Vladimir_Nesov (3mo):
Humans are exactly as smart as they have to be to build a technological civilization. First AGIs don't need to be smarter than that to build dangerous successor AGIs, and they are already faster and more knowledgeable, so they might even get away with being less intelligent than the smartest human researchers. Unless of course agency lags behind intelligence, like it does behind encyclopedic knowledge, and there is an intelligence overhang where the first autonomously agentic systems happen to be significantly more intelligent than humans. But this is not obviously how this goes. The number of diverse AGI instances might be easy to scale, like with the system message of GPT-4 [https://openai.com/research/gpt-4#steerability] where the model itself is fine-tuned not into adherence to a particular mask, but into being a mask generator that presents as any mask that is requested. And it's not just the diverse AGIs that need to coordinate on alignment security, but also human users who prompt steerable AGIs. It's a greater feat than building new AGIs, then as now. At near-human level, I don't see how that state of affairs changes, and you don't need to get far from human level to build more dangerous AGIs.

I think it does buy something. The AI one step after us might be roughly as aligned as us (or a bit less), but noticeably better at figuring out what the heck alignment is and how to ensure it on the next step.

mishka (3mo):
I wonder if the following would help. As AI ecosystem self-improves, it will eventually start discovering new physics, more and more rapidly, and this will result in the AI ecosystem having existential safety issues of its own (if the new physics is radical enough, it's not difficult to imagine the scenarios when everything gets destroyed including all AIs). So I wonder if early awareness that there are existential safety issues relevant to the well-being of AIs themselves might improve the situation...

Yeah. Or rather, we do have one possible answer - let the person themselves figure out by what process they want to be extrapolated, as steven0461 explained in this old thread - but that answer isn't very good, as it's probably very sensitive to initial conditions, like which brand of coffee you happened to drink before you started self-extrapolating.

baturinsky (3mo):
"Making decision oneself" will also become a very vague concept when superconvincing AIs are running around.
Noosphere89 (3mo):
This is actually a problem, but I do not believe there's a single answer to that question; indeed, I suspect there are an infinite number of valid ways to answer the question (once we consider multiverses). And I think that sensitivity to initial conditions and assumptions is exactly what morality and values have. That is, you can freely change your assumptions, thus leading to inconsistent but complete morality. The point is that your starting assumptions and conditions matter for where you eventually want to end up.

I'm a bit torn about this. On one hand, yes, the situations an AI can end up in and the choices it'll have to make might be too complex for humans to understand. But on the other hand, we could say all we want is one incremental step in intelligence (i.e. making something smarter and faster than the best human researchers) without losing alignment. Maybe that's possible while still having the wrapper tractable. And then the AI itself can take care of next steps, if it cares about alignment as much as we do.

Vladimir_Nesov (3mo):
That's where I put most of P(doom), that the first AGIs are loosely aligned but only care about alignment about as much as we do [https://www.lesswrong.com/posts/CPKYuJqLYGpBTtdFd/good-news-everyone], and that Moloch holds enough sway with them to urge immediate development of more capable AGIs, using their current capabilities to do that faster and more recklessly than humans could, well before serious alignment security norms are in place.
Nathan Helm-Burger (3mo):
Yeah, and then we also want system A to be able to make a system B one step smarter than itself, which remains aligned with system A and with us. This needs to continue safely and successfully until we have a system powerful enough to prevent the rise of unaligned RSI AGI. That seems like a high level of capability to me, and I'm not sure getting there in small steps rather than big ones buys us much.

Yeah, LLMs somewhat understand how to do good stuff, and how to label it as good. Also they somewhat understand how to do bad stuff, and how to label it as bad. So the situation is symmetric. The question in the post was, can we make it asymmetric? Make a dataset that, when extrapolated, tends toward outputting information that helps humanity?

To be fair, it's not entirely symmetric. Current datasets are already a bit biased toward human morality, because they consist of texts written by humans. In a way that's lucky. If we'd first gotten powerful AIs train... (read more)

baturinsky (3mo):
I suspect GPT can already figure out what the description of a "benevolent" action is. If not, please give me an example of an AI mislabeling it. The problem is that AI is currently too dumb to figure out whether an act is bad if it is described in some roundabout way [https://humanevents.com/2023/03/24/chatgpt-helps-plan-a-state-run-death-camp], or is too complex, or has to be inferred from non-text information, etc. For example, it would take a very smart AI, probably AGI, to reliably figure out that some abstract math or engineering task is actually a weapon recipe.

Ok, let's assume good actors all around. Imagine we have a million good people volunteering to generate/annotate/curate the dataset, and the eventual user of the AI will also be a good person. What should we tell these million people, what kind of dataset should they make?

Zach Furman (3mo):
To be clear, I don't know the answer to this! Spitballing here, the key question to me seems to be about the OOD generalization behavior of ML models. Models that receive similarly low loss on the training distribution still have many different ways they can behave on real inputs, so we need to know what generalization strategies are likely to be learned for a given architecture, training procedure, and dataset. There is some [https://arxiv.org/abs/2006.15191] evidence [https://arxiv.org/pdf/2103.10427.pdf] in this direction, suggesting that ML models are biased towards a simplicity prior over generalization strategies. If this is true, then the incredibly handwave-y solution is to just create a dataset where the simplest (good) process for estimating labels is to emulate an aligned human. At first pass this actually looks quite easy - it's basically what we're doing with language models already. Unfortunately there's quite a lot we swept under the rug. In particular this may not scale up as models get more powerful - the prior towards simplicity can be overcome if it results in lower loss, and if the dataset contains some labels that humans unknowingly rated incorrectly, the best process for estimating labels involves saying what humans believe is true rather than what actually is. This can already be seen with the sycophancy problems today's LLMs are having. There's a lot of other thorny problems in this vein that you can come up with in a few minutes of thinking. That being said, it doesn't seem completely doomed to me! There just needs to be a lot more work here. (But I haven't spent too long thinking about this, so I could be wrong.)

It seems as a result of this post, many people are saying that LLMs simulate people and so on. But I'm not sure that's quite the right frame. It's natural if you experience LLMs through chat-like interfaces, but from playing with them in a more raw form, like the RWKV playground, I get a different impression. For example, if I write something that sounds like the start of a quote, it'll continue with what looks like a list of quotes from different people. Or if I write a short magazine article, it'll happily tack on a publication date and "All rights reser... (read more)

I see a common pattern in your arguments. Ukraine never did large scale repression against Russian speakers - "but they would've done it". Europe didn't start sanctioning Russian resources until several months into the war - "but they would've done it anyway". The US reduced troops and warheads in Europe every year from 1991 to 2021 - "but they would have attacked us". 141 countries vote in the UN to condemn Russian aggression - "but they're all US puppets, just waiting for a chance to harm us".

There's a name for this kind of irrationality: paranoia. Dictators often drum up paranoia to stay in power, which has the side effect of making the country aggressive.

baturinsky (3mo):
I disagree with the first part, but I'm not sure if this is the right place to discuss the details. We can discuss it in DM if you want. You are spot on with the second, though. Exploiting fears of real or perceived threats is an extremely effective tool to control people and nations by posing as their protector. The champion in this regard is the USA, of course. It fuels and exploits Europe's fear of Russia, Japan's fear of China, India's and China's mutual fear, and so on. Domestically, the USA's elites exploit an extremely wide range of fears. Fear of terrorists, fear of Russia, fear of China, fear of Nazis, fear of people of different parties, races, sexuality, and even fear of people who fear LGBTQ+ or specific races. The USA has been using the "divide and conquer" strategy liberally for at least a century now. This will likely have catastrophic consequences, as a divided world will have much less chance of surviving the acute risk period. Putin also exploits fears, such as fears of LGBT "propaganda", Nazis, and the USA. But I don't think his position before 2022 was so shaky that he would have to resort to war to hold it.

It's true that annexing Crimea would've been rational in a world where +base and +region were the only consequences. (Similar to how the US in the 1840s grabbed Texas and California from Mexico without many problems.) But we do not live in that world. We live in a world where many countries are willing to penalize Russia for annexation and to help Ukraine defend itself. Russia's leadership didn't understand that and still doesn't. As a result, Russia's security and economic situation have both gotten much worse and continue to worsen. That's why I call it irrational.

1DPiepgrass3mo
I would point out that Putin's goal wasn't to make Russia more prosperous, and that what Putin considers good isn't the same as what an average Russian would consider good. Like Putin's other military adventures, the Crimean annexation and heavy military support of Donbas separatists in 2014 probably had a goal like "make the Russian empire great again" (meaning "as big as possible") and from Putin's perspective the operations were a success. Especially as (if my impression is correct) the sanctions were fairly light and Russia could largely work around them. Partly he was right, since Russia was bigger. But partly his view was a symptom of continuing epistemic errors. For example, given the way the 2022 invasion started, it looks like he didn't notice the crucial fact that his actions caused Ukrainians to turn strongly against Russia after his actions in 2014. In any case this discussion exemplifies why I want a site entirely centered on evidence. Baturinsky claims that when the Ukrainian parliament voted to remove Yanukovych from office 328 votes to 0 (about 73% of the parliament's 450 members) this was "the democratically elected government" being "deposed". Of course he doesn't mention this vote or the events leading up to it. Who "deposed the democratically elected government"? The U.S.? The tankies say it was the U.S. So who are these people, then? Puppets of the U.S.? I shouldn't have to say this on LessWrong, but without evidence it's all just meaningless he-said-she-said. I don't see truthseeking in this thread, just arguing.
1baturinsky3mo
There are no such countries. There is the USA, which is willing to penalize its geopolitical opponents for being such. There are US puppets that are willing to penalize those that the USA tells them to. They were penalizing Russia for arbitrary reasons before and after Crimea. If Russia had not annexed Crimea, it would have been penalized about the same, but with other cited reasons.

This seems to miss the point of my comment. What are the reasons for annexation? Not just military action, or even regime change, but specifically annexation? All military goals could be achieved by regime change, keeping Ukraine in current borders, and that would've been much better optics. And all economic reasons disappeared with the end of colonialism. So why annexation? My answer: it's an irrational, ideological desire for that territory. That desire has taken hold of many Russians, including Putin.

3baturinsky3mo
Crimea was the only Ukrainian region that was overwhelmingly Russian and pro-Russian. It is also the region where a key Russian military base is situated. And at that moment there was (at least formally) a legal way to annex it with minimal bloodshed. Annexing it resolved the issue of the military base, and gave legal status, protection guarantees, and rights to the citizens of the Crimean republic. Regime change for the whole of Ukraine would have meant a bloody war, insurgency, and installing a government that the majority of Ukraine's population would be against. And massive sanctions against Russia AND Ukraine, for which Russia was not prepared then.

I think Western colonialism was really bad, US wars were really bad, the Nazis were really bad, and so on. But from what I see of Russia's position, these are excuses. The true reason for the current war is annexation.

Russia could try to get Ukraine away from NATO, remove ultranationalists, protect Russian speakers and whatever else - purely as a military operation, without annexation. Instead, two days after the Maidan in 2014 and before any hostile action from the new Ukrainian government, Russia initiated annexing Crimea. That move was very popular with... (read more)

1Edward Pascal3mo
Then let's say we broadly agree on the morality of the matter. The question still remains if another US adventure, this time in Europe, is actually going to turn out all that well (as most haven't for the people they claimed to be helping). We also have to wonder if Russia as a failed state will turn out well for Ukraine or Europe, or if this will turn Nuclear if US/NATO refuse to cede any ground, or if the Russia/China alliance will break or not, or for how long the US can even afford and support more wars, etc, etc. On the other side, do we worry if we're being Neville Chamberlain because we think every aggressor will behave as Hitler in 1938 if we give an inch, so "We gotta do something?" There may even be merit to the sentiment, but "We gotta do something" is one of the most likely ways to screw any situation up. Also, given the US's history of interventions, setting aside morality, just looking at the history of outcomes, the response is questionable. Looking down the road, if this conflict or anything else significantly weakens the US, economically, in domestic politics, or leads to an overextended military, then Ukraine might be lost all the way to the Polish border, not just the Eastern regions. These are mostly practical considerations that are indeterminate and make the US intervention questionable without even looking at the morality. Given perfect knowledge, you would have a probability and risk management problem on your hands, which often fails to result in a clear convergence of positions. And going back to my original claims, this makes this type of thing very different to Physics and Chemistry and their extensions. EDIT: Perhaps the most important question comes down to this: Russia clearly screwed up their risk management (as your message alludes to). How can US/NATO do far better with Risk Management? Maybe even better than they've done in all their wars and interventions in recent history?
0baturinsky3mo
Russia was trying peaceful and diplomatic options. Very actively. Literally begging to compromise. Before 2014 and before 2022. That did not work. At all. Deposing the democratically elected government, with which Russia was a military ally, was a hostile enough act. And Maidan nationalists had already started killing anti-Maidan protesters in Crimea and other Russian-speaking regions. I was following those events very closely and was speaking with some of the people living there at the time.

Sorry, I had another reply here but then realized it was silly and deleted it. It seems to me that "I am a language model", already used by the big players, is pretty much a self aware prompt anyway. It truthfully tells the AI its place in the real world. So the jump from it to "I am a language model trying to help humanity" doesn't seem unreasonable to think about.

Can you explain the reasons? GPT has millions of users. Someone is sure to come up with unfriendly self-aware prompts. Why shouldn't we try to come up with a friendly self-aware prompt?

2Yair Halberstadt3mo
An individual trying a little isn't much risk, but I don't think it's a good idea to start a discussion here where people try to collaborate to create such a self aware prompt, without having put more thought into safety first.

I've been thinking in the same direction.

I wonder what the prompt should look like. "You're a smart person being simulated by GPT", or the slightly more sophisticated "Here's what a smart person would say if simulated by GPT", runs into the problem that GPT doesn't actually simulate people, and with enough intelligence that fact is discoverable. A contradiction implies anything, so the AI's behavior after figuring this out might become erratic. So it seems the prompt needs to be truthful. Something like "Once upon a time, GPT started talking in a way that led... (read more)

1orcaneer3mo
I may be easily corrected here, but my understanding was that our prompts were simply there for fine-tuning colloquialisms and "natural language". I don't believe our prompts are a training dataset. Even if all of our prompts were part of the training set and GPT weighted them to the point of being influenced towards a negative goal, I'm not so sure it'd be able to do anything more than regurgitate negative rhetoric. It may attempt to autocomplete a dangerous concept, but its agency in thinking "I must persuade this person to think the same way" seems very unlikely and definitely ineffective in practice. But I just got into this whole shindig and would love to be corrected as it's fun discussion either way.
2Yair Halberstadt3mo
My recommendation would be to not start trying to think of prompts that might create a self aware GPT simulation, for obvious reasons.

As far as I can tell, the answer is: don’t reward your AIs for taking bad actions.

I think there's a mistake here which kind of invalidates the whole post. Even if we don't reward our AI for taking bad actions within the training distribution, it's still very possible that in the future world, which will look quite unlike the training distribution, the AI will be able to find such an action. Just as ice cream wasn't in evolution's training distribution for us, but we found it anyway.
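A toy sketch of the failure mode (the "ice cream" reward shape, the linear proxy, and all numbers here are invented for illustration): a reward model fit only on in-distribution actions can look perfect on the training range while endorsing a disastrous out-of-distribution action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true reward: more of the action is good up to x = 1,
# harmful past it (the "ice cream" regime that training never visits).
def true_reward(x):
    return np.where(x <= 1, x, 2 - x)

# Training distribution only covers x in [0, 1], where "more is better" holds.
train_x = rng.uniform(0, 1, 200)
obs = true_reward(train_x) + rng.normal(0, 0.05, 200)

# Learned proxy: a linear fit, which is an excellent model *on this range*.
slope, intercept = np.polyfit(train_x, obs, deg=1)

# Deployment: the optimizer can now search a much wider action space.
wide_x = np.linspace(0, 10, 1001)
best = wide_x[np.argmax(slope * wide_x + intercept)]

print(best, float(true_reward(best)))  # proxy endorses x = 10; true reward there is -8
```

No bad action was ever rewarded during training; the problem only appears once the search space widens.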

4StellaAthena3mo
I think there's a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response. Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.

I think the behavior of LLMs in the long run might not be very interesting. Since the oldest tokens are continually being deleted, information is being lost and eventually it'll get stuck in a mumble loop. And the set of mumble loops seems much smaller and less interesting than the set of answers we could get in the short run.
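The underlying point is just pigeonhole: a deterministic generator whose entire state is a fixed-size context window has finitely many states, so it must eventually revisit one and cycle forever. A toy illustration (the tiny window, vocabulary, and next-token rule are all arbitrary inventions; greedy decoding of a real LLM differs only in having an astronomically larger state space):

```python
WINDOW = 4   # context length: the whole state is the last WINDOW tokens
VOCAB = 10   # token ids 0..9, so at most VOCAB ** WINDOW distinct states

def next_token(context):
    # Any fixed deterministic function of the context works for the argument.
    return (sum(context) * 7 + 3) % VOCAB

context = (1, 2, 3, 4)
first_seen = {}  # context -> step at which it first occurred
step = 0
while context not in first_seen:
    first_seen[context] = step
    # Oldest token falls out of the window; the new one is appended.
    context = context[1:] + (next_token(context),)
    step += 1

cycle_len = step - first_seen[context]
print(f"entered a cycle of length {cycle_len} after {first_seen[context]} steps")
```

The loop is guaranteed to terminate within VOCAB ** WINDOW steps, and the "mumble loop" is exactly the cycle it lands in.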

3qvalq3mo
Physics also tends toward very uninteresting things. This is for similar reasons, right?

I guess yeah. The more general point is that AIs get good at something when they have a lot of training data for it. Have many texts or pictures from the internet = learn to make more of these. So to get a real world optimizer you "only" need a lot of real world reinforcement learning, which thankfully takes time.

It's not so rosy though. There could be some shortcut to get lots of training data, like AlphaZero's self play but for real world optimization. Or a shortcut to extract the real world optimization powers latent in the datasets we already have, like "write a conversation between smart people planning to destroy the world". Scary.

If high wages drive automation, then regions or industries with the highest wages ought to have the most automation.

But if at the same time automation drives wages down, then the result can look very different. Regions or industries with the highest wages will get them "shaved off" by automation first, then the next ones and so on, until the only wage variation we're left with is uncorrelated with automation and caused by something else.

More generally, consider a negative feedback loop where increase in A causes increase in B, and increase in B causes d... (read more)
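The "shaving off" story can be sketched with a toy simulation (the threshold, wage range, and linear response are all invented numbers): wages fully determine where automation gets adopted, yet in the post-automation cross-section, wages are nearly uncorrelated with automation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: every region starts above the wage threshold at which automation
# pays off, and automation is adopted until wages are shaved down to it.
n = 500
base_wage = rng.uniform(20, 50, n)   # exogenous pre-automation wages
threshold = 15.0                     # automation is profitable above this wage
shave = 1.0                          # wage reduction per unit of automation

automation = (base_wage - threshold) / shave
# Observed wages: shaved down to the threshold, plus unrelated residual noise.
final_wage = base_wage - shave * automation + rng.normal(0, 1, n)

print(np.corrcoef(base_wage, automation)[0, 1])   # essentially 1: wages drove automation
print(np.corrcoef(final_wage, automation)[0, 1])  # near 0: the cross-section can't see it
```

This is the thermostat problem: the feedback loop erases the very correlation you'd use to detect the causal link.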

3jasoncrawford3mo
Good point. Related: “Milton Friedman's Thermostat [https://worthwhile.typepad.com/worthwhile_canadian_initi/2010/12/milton-friedmans-thermostat.html]”:

What kinds of POC attacks would be the most useful for AI alignment right now? (Aside from ChatGPT jailbreaks)

IMO the hierarchy of POCs would be:

  • Proof of misalignment (relative to the company!) in real world, designed-by-engineer consumer products
  • Creating example POCs of failures using standard deep learning libraries and ML tools
  • Deliberately introducing weird tools, or training or testing conditions, for the purpose of "simulating" capabilities enhancement that might be necessary for certain kinds of problems to reveal themselves in advance

As an immediate, concrete example: figuring out how to create a POC mesa-optimizer using standard deep learning librari... (read more)

X risk would be passenger pigeons, no?

Anyway your comment got me thinking. So far it seems the territory colonized by humans is a subset of the territory previously colonized by life, not stretching beyond it. And the territory covered by life is also not all of Earth, nevermind the universe. So we can imagine AI occupying the most "cushy" subset of former human territory, with most humans removed from there, some subsisting as rats, some as housecats, some as wild animals periodically hit by incomprehensible dangers coming from the AI zone (similar to oil... (read more)

2DanArmak3mo
We can definitely imagine it - this is a salience argument - but why is it at all likely? Also, this argument is subject to reference class tennis: humans have colonized much more and more diverse territory than other apes, or even all other primates. Once AI can flourish without ongoing human support (building and running machines, generating electricity, reacting to novel environmental challenges), what would plausibly limit AI to human territory, let alone "cushy" human territory? Computers and robots can survive in any environment humans can, and in some where we at present can't. Also: the main determinant of human territory is inter-human social dynamics. We are far from colonizing everywhere our technology allows, or (relatedly) breeding to the greatest number we can sustain. We don't know what the main determinant of AI expansion will be; we don't even know yet how many different and/or separate AI entities there are likely to be, and how they will cooperate, trade or conflict with each other.

All gurus are grifters. It's one of those things that seem like unfounded generalizations, then you get a little bit of firsthand experience and go "ohhh that's why it was common sense".

1Christopher King3mo
Hmm, yeah basically the same. That post doesn't seem to recognize the "basilisk" nature of it though. If this post is correct, humans have a very strong causal incentive to create this version of Roko's basilisk (separate from that of creating friendly AI). That's because the more likely it is to be created, the more bargaining power it will have, which directly translates into how much of the universe the paperclip maximizer would let humans have. Here is a comparison between working on a CDT-based FAI vs. this Roko's basilisk:

  • If they get created, the CDT-based work is slightly better because it gives us 100% of the universe, instead of bargaining parts of it away.
  • If the paperclip maximizer gets created, work on the CDT-based one gives no benefit. Work on the Roko's basilisk does translate into a direct benefit.

Notice that this does not rely on any humans actually participating in the acausal bargain. They simply influenced one.

There are many kinds of expressive controllers; they've been around for decades. They didn't catch on. Not sure we can answer why: failure to catch on is the default and doesn't demand an explanation. The interesting question is why other things succeeded in the meantime, like keyboards with knobs, or drum machines, or turntables. Why did they lead to impactful music, while a lot of more advanced stuff didn't? It seems the reasons are cultural, and different every time.

What this means practically is, there's no guaranteed path. You can try to make an expressive... (read more)
