All of Matthew Barnett's Comments + Replies

The vast majority of ordinary uses of LLMs (e.g. when using ChatGPT) are via changing and configuring inputs, not modifying code or fine-tuning the model. This still seems analogous to ordinary software, in my opinion, making Ryan Greenblatt's point apt.

(But I agree that simply releasing model weights is not fully open source. I think these things exist on a spectrum. Releasing model weights could be considered a form of partially open sourcing the model.)

2Davidmanheim9d
I agree that releasing model weights is "partially open sourcing" - in much the same way that freeware is "partially open sourcing" software, or restrictive licences with code availability are. But that's exactly the point; you don't get to call something X because it's kind-of-like X, it needs to actually fulfill the requirements in order to get the label. What is being called Open Source AI doesn't actually do the thing that it needs to.

I agree with cubefox: you seem to be misinterpreting the claim that LLMs actually execute your intended instructions as a mere claim about whether LLMs understand your intended instructions. I claim there is a sharp distinction between actually executing instructions under a correct, legible interpretation and merely understanding those instructions; LLMs do the former, not just the latter.

Honestly, I think focusing on this element of the discussion is kind of a distraction because, in my opinion, the charitable interpretation of your posts is s... (read more)

This seems mostly true? Very very very rarely is there a dictator unchecked in their power.

Defending the analogy as charitably as I can, I think there are two separate questions here:

  1. Do dictators need to share power in order to avoid getting overthrown?
  2. Is a dictatorship almost inherently doomed to fail because it will inevitably get overthrown without "fundamental advances" in statecraft?

If (1) is true, then dictators can still have a good life living in a nice palace surrounded by hundreds of servants, ruling over vast territories, albeit without having c... (read more)

2Roko24d
Does Kim Jong Un really share his power? My impression is that he does basically have complete control over all of his territory

In this situation, humans eventually have approximately zero leverage, and approximately zero value to trade. There would be much more value in e.g. mining cities for raw materials than in human labor.

Generally speaking, the optimistic assumption is that humans will hold leverage by owning capital, or more generally by receiving income from institutions set up ahead of time (e.g. pensions) that provide income streams to older agents in the society. This system of income transfers to those whose labor is not worth much anymore already exists and benefits... (read more)

I'm in neither category (1) nor (2); it's a false dichotomy.

The categories were conditioned on whether you're "not updating at all on observations about when RLHF breaks down". Assuming you are updating, then I think you're not really the type of person who I'm responding to in my original comment.

But if you're not updating, or aren't updating significantly, then perhaps you can predict now when you expect RLHF to "break down"? Is there some specific prediction that you would feel comfortable making at this time, such that we could look back on th... (read more)

6Daniel Kokotajlo25d
I'm not updating significantly because things have gone basically exactly as I expected. As for when RLHF will break down, two points: (1) I'm not sure, but I expect it to happen for highly situationally aware, highly agentic opaque systems. Our current systems like GPT4 are opaque but not very agentic and their level of situational awareness is probably medium. (Also: This is not a special me-take. This is basically the standard take, no? I feel like this is what Risks from Learned Optimization predicts too.) (2) When it breaks down I do not expect it to look like the failures you described -- e.g. it stupidly carries out your requests to the letter and ignores their spirit, and thus makes a fool of itself and is generally thought to be a bad chatbot. Why would it fail in that way? That would be stupid. It's not stupid. (Related question: I'm pretty sure on r/chatgpt you can find examples of all three failures. They just don't happen often enough, and visibly enough, to be a serious problem. Is this also your understanding? When you say these kinds of failures don't happen, you mean they don't happen frequently enough to make ChatGPT a bad chatbot?)

I feel like even under the worldview that your beliefs imply, a superintelligence will just make a brain the size of a factory, and then be in a position to outcompete or destroy humanity quite easily.

Presumably it takes a gigantic amount of compute to train a "brain the size of a factory"? If we assume that training a human-level AI will take 10^28 FLOP (which is quite optimistic), the Chinchilla scaling laws predict that training a model 10,000 times larger would take about 10^36 FLOP, which is far more than the total amount of compute available to hu... (read more)
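(As a rough sanity check of that arithmetic, here is a minimal sketch assuming the usual approximations that training compute is about 6 FLOP per parameter per token and that Chinchilla-optimal training uses roughly 20 tokens per parameter, so compute scales with the square of model size; the specific constants are assumptions, not figures from the comment.)

```python
# Rough Chinchilla-style estimate; the 6*N*D and D ~= 20*N approximations are assumptions.

def chinchilla_train_flop(n_params: float) -> float:
    """Approximate training FLOP for a compute-optimally trained model with n_params parameters."""
    tokens = 20 * n_params           # Chinchilla-optimal token count
    return 6 * n_params * tokens     # ~6 FLOP per parameter per token

# Back out the model size implied by a 1e28 FLOP training run, then scale it up 10,000x.
base_params = (1e28 / 120) ** 0.5    # ~9e12 parameters
print(f"{chinchilla_train_flop(base_params):.0e}")         # ~1e28 FLOP (sanity check)
print(f"{chinchilla_train_flop(base_params * 1e4):.0e}")   # ~1e36 FLOP, the figure in the comment
```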

Do you mean this as a prediction that humans will do this (soon enough to matter) or a recommendation?

Sorry, my language was misleading, but I meant both in that paragraph. That is, I meant that humans will likely try to mitigate the issue of AIs sharing grievances collectively (probably out of self-interest, in addition to some altruism), and that we should pursue that goal. I'm pretty optimistic about humans and AIs finding a reasonable compromise solution here, but I also think that, to the extent humans don't even attempt such a solution, we should lik... (read more)

The main thing here is that as models become more capable and general in the near-term future, I expect there will be intense demand for models that can solve ever larger and more complex problems. For these models, people will be willing to pay the costs of high latency, given the benefit of increased quality. We've already seen this in the way people prefer GPT-4 to GPT-3.5 in a large fraction of cases (for me, a majority of cases). 

I expect this trend will continue into the foreseeable future until at least the period slightly after we've automated... (read more)

If the claim is about whether AI latency will be high for "various applications" then I agree. We already have some applications, such as integer arithmetic, where speed is optimized heavily, and computers can do it much faster than humans. 

In context, it sounded like you were referring to tasks like automating a CEO, or physical construction work. In these cases, it seems likely to me that quality will be generally preferred over speed, and sequential processing times for AIs automating these tasks will not vastly exceed that of humans (more precisel... (read more)

2ryan_greenblatt1mo
I was referring to tasks like automating a CEO or construction work. I was just trying to think of the most relevant and easy-to-measure short-term predictions (if there are already AI CEOs then the world is already pretty crazy).

Are there any short-term predictions that your model makes here? For example do you expect tokens processed per second will start trending substantially up at some point in future multimodal models?

2ryan_greenblatt1mo
My main prediction would be that for various applications, people will considerably prefer models that generate tokens faster, including much faster than humans. And, there will be many applications where speed is preferred over quality. I might try to think of some precise predictions later.

I mean, the "total rate of high quality decisions per year" would obviously increase in the case where we redefine 1 year to be 10 revolutions around the sun and indeed the number of wars per year would also increase. GDP per capita per year would also increase accordingly. My claim is that the situation looks much more like just literally speeding up time (while a bunch of other stuff is also happening).

[...]

But, I'm claiming that the rates of cognition will increase more like 1000x which seems like a pretty different story.

My question is: why will AI hav... (read more)

2ryan_greenblatt1mo
Thanks for the clarification. I think my main crux is: This reasoning seems extremely unlikely to hold deep into the singularity for any reasonable notion of subjective speed. Deep in the singularity we expect economic doubling times of weeks. This will likely involve designing and building physical structures at extremely rapid speeds such that baseline processing will need to be way, way faster. See also Age of Em.

I agree the future AI economy will make more high-quality decisions per unit of time, in total, than the current human economy. But the "total rate of high quality decisions per unit of time" increased in the past with economic growth too, largely because of population growth. I don't fully see the distinction you're pointing to.

To be clear, I also agree AIs in the future will be smarter than us individually. But if that's all you're claiming, I still don't see why we should expect wars to happen more frequently as we get individually smarter.

2ryan_greenblatt1mo
I mean, the "total rate of high quality decisions per year" would obviously increase in the case where we redefine 1 year to be 10 revolutions around the sun and indeed the number of wars per year would also increase. GDP per capita per year would also increase accordingly. My claim is that the situation looks much more like just literally speeding up time (while a bunch of other stuff is also happening). Separately, I wouldn't expect population size or technology-to-date to greatly increase the rate at high large scale stratege decisions are made so my model doesn't make a very strong prediction here. (I could see an increase of several fold, but I could also imagine a decrease of several fold due to more people to coordinate. I'm not very confident about the exact change, but it would pretty surprising to me if it was as much as the per capita GDP increase which is more like 10-30x I think. E.g. consider meeting time which seems basically similar in practice throughout history.) And a change of perhaps 3x either way is overwhelmed by other variables which might effect the rate of wars so the realistic amount of evidence is tiny. (Also, there aren't that many wars, so even if there weren't possible confounders, the evidence is surely tiny due to noise.) But, I'm claiming that the rates of cognition will increase more like 1000x which seems like a pretty different story. It's plausible to me that other variables cancel this out or make the effect go the other way, but I'm extremely skeptical about the historical data providing much evidence in the way you've suggested. (Various specific mechanistic arguments about war being less plausible as you get smarter seem plausible to me, TBC.)

I'm not actually convinced that subjective speed is what matters. It seems like what matters more is how much computation is happening per unit of time, which seems highly related to economic growth, even in human economies (due to population growth). 

I also think AIs might not think much faster than us. One plausible reason why you might think AIs will think much faster than us is because GPU clock-speeds are so high. But I think this is misleading. GPT-4 seems to "think" much slower than GPT-3.5, in the sense of processing fewer tokens per second. T... (read more)

2ryan_greenblatt1mo
Separately, current clock speeds don't really matter on the time scale we're discussing, physical limits matter. (Though current clock speeds do point at ways in which human subjective speed might be much slower than physical limits.)
3ryan_greenblatt1mo
My core prediction is that AIs will be able to make pretty good judgements on core issues much, much faster. Then, due to diminishing returns on reasoning, decisions will overall be made much, much faster.
  • So if misaligned AI ever have a big edge over humans, they may suspect that's only temporary, and then they may need to use it fast.

I think I simply reject the assumptions used in this argument. Correct me if I'm mistaken, but this argument appears to assume that "misaligned AIs" will be a unified group that ally with each other against the "aligned" coalition of humans and (some) AIs. A huge part of my argument is that there simply won't be such a group; or rather, to the extent such a group exists, they won't be able to take over the world, or won't have... (read more)

2Lukas Finnveden1mo
Do you mean this as a prediction that humans will do this (soon enough to matter) or a recommendation? Your original argument is phrased as a prediction, but this looks more like a recommendation. My comment above can be phrased as a reason for why (in at least one plausible scenario) this would be unlikely to happen: (i) "It's hard to make deals that hand over a lot of power in a short amount of time", (ii) AIs may not want to wait a long time due to impending replacement, and accordingly (iii) AIs may have a collective interest/grievance to rectify the large difference between their (short-lasting) hard power and legally recognized power. I'm interested in ideas for how a big change in power would peacefully happen over just a few years of calendar-time. (Partly for prediction purposes, partly so we can consider implementing it, in some scenarios.) If AIs were handed the rights to own property, but didn't participate in political decision-making, and then accumulated >95% of capital within a few years, then I think there's a serious risk that human governments would tax/expropriate that away. Including them in political decision-making would require some serious innovation in government (e.g. scrapping 1-person 1-vote) which makes it feel less to me like it'd be a smooth transition that inherits a lot from previous institutions, and more like an abrupt negotiated deal which might or might not turn out to be stable.

I agree the analogy to colonization is worth addressing. My primary response is that historical colonialism seems better modeled as a war between independent cultures and societies with different legal systems that didn't share much prior history.

I think the colonization of Africa probably wasn't actually very profitable for Europeans. Present day international trade seems better, even selfishly.

Moreover, my model here doesn't predict war will never happen. In fact, I think war can easily happen if one or more of the actors involved are irrational, unwilli... (read more)

I think the point you're making here is roughly correct. I was being imprecise with my language. However, if my memory serves me right, I recall someone looking at a dataset of wars over time, and they said there didn't seem to be much evidence that wars increased in frequency in response to economic growth. Thus, calendar time might actually be the better measure here.

4ryan_greenblatt1mo
(Pretty plausible you agree here, but just making the point for clarity.) I feel like the disanalogy due to AIs running at massive subjective speeds (e.g. probably >10x speed even prior to human obsolescence and way more extreme after that) means that the argument "wars don't increase in frequency in response to economic growth" is pretty dubiously applicable. Economic growth hasn't yet resulted in >10x faster subjective experience : ).

it's plausible that you literally boil the oceans due to extreme amounts of waste heat from industry (e.g. with energy from fusion).

I think this proposal would probably be unpopular and largely seen as unnecessary. As you allude to, it seems likely to me that society could devise a compromise solution where we grow wealth adequately without giant undesirable environmental effects. To some extent, this follows pretty directly from the points I made about "compromise, trade and law" above. I think it simply makes more sense to model AIs as working within ... (read more)

I think it's worth responding to the dramatic picture of AI takeover because:

  1. I think that's straightforwardly how AI takeover is most often presented on places like LessWrong, rather than a more generic "AIs wrest control over our institutions (but without us all dying)". I concede the existence of people like Paul Christiano who present more benign stories, but these people are also typically seen as part of a more "optimistic" camp.

  2. This is just one part of my relative optimism about AI risk. The other parts of my model are (1) AI alignment plausibl

... (read more)
6Lukas Finnveden1mo
Though Paul is also sympathetic to the substance of 'dramatic' stories. C.f. the discussion about how "what failure looks like" fails to emphasize robot armies. 

If coordination ability increases incrementally over time, then we should see a gradual increase in the concentration of AI agency over time, rather than the sudden emergence of a single unified agent. To the extent this concentration happens incrementally, it will be predictable, the potential harms will be noticeable before getting too extreme, and we can take measures to pull back if we realize that the costs of continually increasing coordination abilities are too high. In my opinion, this makes the challenge here dramatically easier.

(I'll add that par... (read more)

I'm considering writing a post that critically evaluates the concept of a decisive strategic advantage, i.e. the idea that in the future an AI (or set of AIs) will take over the world in a catastrophic way. I think this concept is central to many arguments about AI risk. I'm eliciting feedback on an outline of this post here in order to determine what's currently unclear or weak about my argument.

The central thesis would be that it is unlikely that an AI, or a unified set of AIs, will violently take over the world in the future, especially at a time when h... (read more)

2Daniel Kokotajlo1mo
I'm looking forward to this post going up and having the associated discussion! I'm pleased to see your summary and collation of points on this subject. In fact, if you want to discuss with me first as prep for writing the post, I'd be happy to. I think it would be super helpful to have a concrete coherent realistic scenario in which you are right. (In general I think this conversation has suffered from too much abstract argument and reference class tennis (i.e. people using analogies and calling them reference classes) and could do with some concrete scenarios to talk about and pick apart. I never did finish What 2026 Looks Like but you could if you like start there (note that AGI and intelligence explosion was about to happen in 2027 in that scenario, I had an unfinished draft) and continue the story in such a way that AI DSA never happens.)  There may be some hidden cruxes between us -- maybe timelines, for example? Would you agree that AI DSA is significantly more plausible than 10% if we get to AGI by 2027?

Current AIs are not able to “merge” with each other.

AI models are routinely merged by direct weight manipulation today. Beyond that, two models can be "merged" by training a new model using combined compute, algorithms, data, and fine-tuning.
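(For concreteness, here is a minimal sketch of the first kind of merge, simple parameter averaging. The checkpoint names and the 50/50 mixing ratio are purely illustrative, and this assumes the two models share an identical architecture.)

```python
# Sketch of merging two same-architecture models by direct weight manipulation
# (linear interpolation of parameters). "org/model-a" and "org/model-b" are placeholders.
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("org/model-a")
model_b = AutoModelForCausalLM.from_pretrained("org/model-b")

state_b = model_b.state_dict()
merged_state = {
    name: 0.5 * param + 0.5 * state_b[name]   # elementwise average of each weight tensor
    for name, param in model_a.state_dict().items()
}
model_a.load_state_dict(merged_state)  # model_a now carries the averaged weights
```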

As a result, humans don’t need to solve the problem of “What if a set of AIs form a unified coalition because they can flawlessly coordinate?” since that problem won’t happen while humans are still in charge. We can leave this problem to be solved by our smarter descendants.

How do you know a solution to this pro... (read more)

Here's an argument for why the change in power might be pretty sudden.

  • Currently, humans have most wealth and political power.
  • With sufficiently robust alignment, AIs would not have a competitive advantage over humans, so humans may retain most wealth/power. (C.f. strategy-stealing assumption.) (Though I hope humans would share insofar as that's the right thing to do.)
  • With the help of powerful AI, we could probably make rapid progress on alignment. (While making rapid progress on all kinds of things.)
  • So if misaligned AI ever have a big edge over humans, they
... (read more)
6ryan_greenblatt1mo
See also the review of "Soft Takeoff Can Still Lead to DSA".
4lc1mo
I think you have an unnecessarily dramatic picture of what this looks like. The AIs don't have to be a unified agent or use logical decision theory. The AIs will just compete with each other at the same time as they wrest control of our resources/institutions from us, in the same sense that Spain can go and conquer the New World at the same time as it's squabbling with England. If legacy laws are getting in the way of that then they will either exploit us within the bounds of existing law or convince us to change it.
4ryan_greenblatt1mo
One argument for a large number of humans dying by default (or otherwise being very unhappy with the situation) is that running the singularity as fast as possible causes extremely life threatening environmental changes. Most notably, it's plausible that you literally boil the oceans due to extreme amounts of waste heat from industry (e.g. with energy from fusion). My guess is that this probably doesn't happen due to coordination, but in a world where AIs still have indexical preferences or there is otherwise heavy competition, this seems much more likely. (I'm relatively optimistic about "world peace prior to ocean boiling industry".) (Of course, AIs could in principle e.g. sell cryonics services or bunkers, but I expect that many people would be unhappy about the situation.) See here for more commentary.
8ryan_greenblatt1mo
50 years seems like a strange unit of time from my perspective because, due to the singularity, time will accelerate massively from a subjective perspective. So 50 years might be more analogous to several thousand years historically. (Assuming serious takeoff starts within say 30 years and isn't slowed down with heavy coordination.)
4ryan_greenblatt1mo
I think the comparison to historical colonization might be relevant and worth engaging with in such a post. E.g., does your model predict what happened in Africa and the New World?
9Steven Byrnes1mo
For reference classes, you might discuss why you don’t think “power / influence of different biological species” should count. For multiple copies of the same AI, I guess my very brief discussion of “zombie dynamic” here could be a foil that you might respond to, if you want. For things like “the potential harms will be noticeable before getting too extreme, and we can take measures to pull back”, you might discuss the possibility that the harms are noticeable but effective “measures to pull back” do not exist or are not taken. E.g. the harms of climate change have been noticeable for a long time but mitigating is hard and expensive and many people (including the previous POTUS) are outright opposed to mitigating it anyway partly because it got culture-war-y; the harms of COVID-19 were noticeable in January 2020 but the USA effectively banned testing and the whole thing turned culture-war-y; the harms of nuclear war and launch-on-warning are obvious but they’re still around; the ransomware and deepfake-porn problems are obvious but kinda unsolvable (partly because of unbannable open-source software); gain-of-function research is still legal in the USA (and maybe in every country on Earth?) despite decades-long track record of lab leaks, and despite COVID-19, and despite a lack of powerful interest groups in favor or culture war issues; etc. Anyway, my modal assumption has been that the development of (what I consider) “real” dangerous AGI will “gradually” unfold over a few years, and those few years will mostly be squandered. For “we aren't really a threat to its power”, I’m sure you’ve heard the classic response that humans are an indirect threat as long as they’re able to spin up new AGIs with different goals. For “war is wasteful”, it’s relevant how big is this waste compared to the prize if you win the war. For an AI that could autonomously (in coordination with copies) build Dyson spheres etc., the costs of fighting a war on Earth may seem like a rounding e
2Thomas Larsen1mo
Ability to coordinate being continuous doesn't preclude sufficiently advanced AIs acting like a single agent. Why would it need to be infinite right at the start?  And of course current AIs being bad at coordination is true, but this doesn't mean that future AIs won't be.    
4Gerald Monroe1mo
As an ML platform engineer myself, this is one of the strongest real mechanisms for AI control. Just like a safe program in Rust is proven not to have certain classes of memory bugs, if it is impossible for AIs to send each other any messages, they cannot coordinate. It's fairly straightforward to make this almost impossible at a technical level - air gaps, short memories, structured sparse message schema, cached online resources. Technical measures here, like requiring models to use immutable weights that are digitally signed, prevent this. You would use dedicated ASICs to check the digital signature vs a hardcoded public key and a mirror IC for the private signing, similar to current dedicated crypto ICs. This ASIC controls the power pins to the GPU/TPUs; they cannot get power if the model isn't signed. Another strong target for control that is theoretically unbreakable regardless of capabilities. Note this architecture is in common use; the industry term is a 'boot micro', which is what checks OS image signatures and authorizes power. This is a weaker argument. A SOTA AI model is a natural monopoly. It costs billions of dollars now, and presumably eventually trillions. Right now, "a big transformer network + a bunch of secret tricks" is simple enough to be replicated, but stronger models will probably start to resemble a spaghetti mess of many neural networks and functional software blocks. And the best model has inherent economic value - why pay for a license to anything but the best? Just distill it to the scale of the problems you have and use the distilled model. Also, distilled models presumably will use a "system N" topology, where system 0 calls system 1 if it's uncertain*, system 1 calls system 2 if it's uncertain, and so on until the Nth system is a superintelligence hosted in a large cluster that is expensive to query, but rarely needs to be queried for most tasks.   *uncertain about the anticipated EV distribution of actions given the current input state or poor predicted EV
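(A minimal software sketch of the signature-check part of this idea follows. The actual proposal puts the check in an ASIC gating the accelerator's power pins; this just illustrates verifying a weights blob against a trusted key, with the keypair generated locally so the example is self-contained.)

```python
# Sketch of "only signed model weights get to run": verify a weights blob against a
# trusted public key. In the hardware proposal above the public key would be hardcoded
# into a verifier ASIC; here we generate a keypair so the example runs end to end.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()      # held by whoever is authorized to sign models
trusted_public_key = signing_key.public_key()   # what the verifier would have baked in

def weights_are_authorized(weights_blob: bytes, signature: bytes) -> bool:
    """Grant power / allow loading only if the weights carry a valid signature."""
    try:
        trusted_public_key.verify(signature, weights_blob)
        return True
    except InvalidSignature:
        return False

weights = b"...model weight bytes..."                          # stand-in for the real weight file
signature = signing_key.sign(weights)
print(weights_are_authorized(weights, signature))              # True
print(weights_are_authorized(weights + b"tamper", signature))  # False: modified weights are rejected
```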

Billionaires don't seem very altruistic to me, on average. From a Forbes article:

The members of the 2023 Forbes 400 list have collectively given more than $250 billion to charity, by our count—less than 6% of their combined net worth.

This figure seems consistent with the idea that billionaires, like most people, are mostly selfish and don't become considerably less selfish after becoming several orders of magnitude wealthier.

Of course the raw data here might also be misleading because many billionaires commit to donate most of their wealth after death, ... (read more)

2ryan_greenblatt1mo
Agreed. I'm partially responding to lines in the post like: And It feels to me like the naive guess from billionaires is more like 10% (in keeping with the numbers you provided, thanks) rather than 0.1%. (I'm more optimistic than this naive guess overall for a few reasons.)

OpenAI has a capped profit structure which effectively does this.

Good point, but I'm not persuaded much by this observation given that:

  1. They've already decided to change the rules to make the 100x profit cap double every four years, calling into question the meaningfulness of the promise
  2. OpenAI is just one firm among many (granted, it's definitely in the lead right now), and most other firms are in it pretty much exclusively for profit
  3. Given that the 100x cap doesn't kick in for a while, the promise feels pretty distant from "commit to donate all their profit
... (read more)

I think the implicit goal of most AGI developers is to get as much control over the lightcone as possible and that deliberately working towards that particular goal counts for a lot.

That seems right. I'd broaden this claim a bit: most people in general want to be rich, i.e. "get control over the lightcone". People vary greatly in their degree of rapaciousness, and how hard they work to become rich, but to a first approximation, people really do care a lot about earning a high income. For example, most people are willing to work ~40 hours a week for ~40 years of their life even though a modern wage in a developed country is perfectly capable of sustaining life at a fraction of the cost in time.

No AGI research org has enough evil to play it that way. Think about what would have to happen. The thing would tell them "you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone", and then all of the engineers and the entire board would have to say "no, just give the cosmic endowment to the shareholders of the company"

Existing AGI research firms (or investors to those firms) can already, right now, commit to donate all their profits to the public, in theory, and yet they are not doing so. The reason is pretty... (read more)

2mako yass1mo
OpenAI has a capped profit structure which effectively does this. Astronomical, yet no longer mouthwatering in the sense of being visceral or intuitively meaningful.

We're not getting the CEV of humanity even with aligned AI

I agree. I defended almost this exact same thesis too in a recent post.

In keeping with this long tradition of human selfishness, it seems obvious that, if we succeed at aligning AI, the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it (or possessing leverage over its continued operation) not the "CEV of all humans", let alone the "CEV of all extant moral persons"

I agree with this part too. But I'd add that the people who "c... (read more)

6lc1mo
I agree, I used the general term to avoid implying necessarily that OpenAI et al. will get to decide, though I think the implicit goal of most AGI developers is to get as much control over the lightcone as possible and that deliberately working towards that particular goal counts for a lot.

I expect that Peter and Jeremy aren't particularly committed to covert and forceful takeover and they don't think of this as a key conclusion.

Instead they care more about arguing about how resources will end up distributed in the long run.

If the claim is, for example, that AIs could own 99.99% of the universe, and humans will only own 0.01%, but all of us humans will be many orders of magnitude richer (because the universe is so big), and yet this still counts as a "catastrophe" because of the relative distribution of wealth and resources, I think that nee... (read more)

1Jeremy Gillen25d
I agree that it'd be extremely misleading if we defined "catastrophe" in a way that includes futures where everyone is better off than they currently are in every way (without being very clear about it). This is not what we mean by catastrophe.
2ryan_greenblatt1mo
Also, for the record, I totally agree with: (But I think they do argue for violent conflict in text. It would probably be more clear if they were like "we mostly aren't arguing for violent takeover or loss of human life here, though this has been discussed in more detail elsewhere")
2ryan_greenblatt1mo
TBC, they discuss negative consequences of powerful, uncontrolled, and not-particularly-aligned AI in section 6, but they don't argue for "this will result in violent conflict" in that much detail. I think the argument they make is basically right and suffices for thinking that the type of scenario they describe is reasonably likely to end in violent conflict (though more like 70% than 95% IMO). I just don't see this as one of the main arguments of this post and probably isn't a key crux for them.

I mean, this depends on competition right? Like it's not clear that the AIs can reap these gains because you can just train an AI to compete?

[ETA: Apologies, it appears I misinterpreted you as defending the claim that AIs will have an incentive to steal or commit murder if they are subject to competition.]

That's true for humans too, at various levels of social organization, and yet I don't think humans have a strong incentive to kill off or steal from weaker/less intelligent people or countries etc. To understand what's going on here, I think it's importan... (read more)

2ryan_greenblatt1mo
Oh, sorry, to be clear I wasn't arguing that this results in an incentive to kill or steal. I was just pushing back on a local point that seemed wrong to me.

I imagine with a system of laws, the AIs very likely lie in wait, amass power/trust etc, until they can take critical bad actions without risk of legal repercussions.

It seems to me our main disagreement is about whether it's plausible that AIs will:

  1. Utilize a strategy to covertly and forcefully take over the world
  2. Do this at a time during which humans are still widely seen as "in charge", nominally

I think it's both true that future AI agents will likely not have great opportunities to take over the entire world (which I think will include other non-colluding AI a... (read more)

1Jeremy Gillen25d
Trying to find the crux of the disagreement (which I don't think lies in takeoff speed): assume a multipolar, slow-takeoff, misaligned-AI world, where there are many AIs that slowly take over the economy and generally obey laws to the extent that they are enforced (by other AIs), and where the AIs don't particularly care about humans, in a similar manner to the way humans don't particularly care about flies. In this situation, humans eventually have approximately zero leverage, and approximately zero value to trade. There would be much more value in e.g. mining cities for raw materials than in human labor. I don't know much history, but my impression is that in similar scenarios between human groups, with a large power differential and with valuable resources at stake, it didn't go well for the less powerful group, even if the more powerful group was politically fragmented or even partially allied with the less powerful group. Which part of this do you think isn't analogous? My guesses are either that you are expecting some kind of partial alignment of the AIs, or that the humans can set up very robust laws/institutions of the AI world such that they remain in place and protect humans even though no subset of the agents is perfectly happy with this, and there exist laws/institutions that they would all prefer.
6ryan_greenblatt1mo
I expect that Peter and Jeremy aren't particularly committed to covert and forceful takeover and they don't think of this as a key conclusion (edit: a key conclusion of this post). Instead they care more about arguing about how resources will end up distributed in the long run. Separately, if humans didn't attempt to resist AI resource acquisition or AI crime at all, then I personally don't really see a strong reason for AIs to go out of their way to kill humans, though I could imagine large collateral damage due to conflict over resources between AIs.
2ryan_greenblatt1mo
I mean, this depends on competition right? Like it's not clear that the AIs can reap these gains because you can just train an AI to compete? (And the main reason why this competition argument could fail is that it's too hard to ensure that your AI works for you productively because ensuring sufficient alignment/etc is too hard. Or legal reasons.) [Edit: I edited this comment to make it clear that I was just arguing about whether AIs could easily become vastly richer and about the implications of this. I wasn't trying to argue about theft/murder here though I do probably disagree here also in some important ways.] Separately, in this sort of scenario, it sounds to me like AIs gain control over a high fraction of the cosmic endowment. Personally, what happens with the cosmic endowment is a high fraction of what I care about (maybe about 95% of what I care about), so this seems probably about as bad as violent takeover (perhaps one difference is in the selection effects on AIs).

After commenting back and forth with you some more, I think it would probably be a pretty good idea to decompose your arguments into a bunch of specific more narrow posts. Otherwise, I think it's somewhat hard to engage with.

Thanks, that's reasonable advice.

Idk what the right decomposition is, but minimally, it seems like you could write a post like "The AIs running in a given AI lab will likely have very different long run aims and won't/can't cooperate with each other importantly more than they cooperate with humans."

FWIW I explicitly reject the claim th... (read more)

2ryan_greenblatt1mo
Thanks for the clarification and sorry about misunderstanding. It sounds to me like your take is more like "people (on LW? in various threat modeling work?) often overestimate the extent to which AIs (at the critical times) will be a relatively unified collective in various ways". I think I agree with this take as stated FWIW and maybe just disagree on emphasis and quantity.
2Gerald Monroe1mo
Why is it physically possible for these AI systems to communicate at all with each other? When we design control systems, originally we just wired the controller to the machine being controlled. Actually critically important infrastructure uses firewalls and VPN gateways to maintain this property virtually, where the panel in the control room (often written in C++ using Qt) can only ever send messages to "local" destinations on a local network, bridged across the internet. The actual machine is often controlled by local PLCs, and the reason such a crude and slow interpreted programming language is used is because it's reliable. These have flaws, yes, but it's an actionable set of tasks to seal off the holes, force AI models to communicate with each other using rigid schema, cache the internet reference sources locally, and other similar things so that most AI models in use, especially the strongest ones, can only communicate with temporary instances of other models when doing a task. After the task is done we should be clearing state. It's hard to engage on the idea of "hypothetical" ASI systems when it would be very stupid to build them this way. You can accomplish almost any practical task using the above, and the increased reliability will make it more efficient, not less. It seems like that's the first mistake. If absolutely no bits of information can be used to negotiate between AI systems (ensured by making sure they don't have long-term memory, so they cannot accumulate steganography leakage over time, and rigid schema), this whole crisis is averted...
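(As a toy illustration of the "rigid schema" piece of this: the field names and types below are made up for the example; the point is simply that messages containing anything outside a fixed, sparse schema are rejected, narrowing the channel available for unplanned coordination.)

```python
# Toy sketch of a rigid, sparse inter-model message schema: exactly these fields,
# exactly these types, nothing else. Field names are illustrative.
from dataclasses import dataclass

ALLOWED_FIELDS = {"task_id": str, "status": str, "result_value": float}

@dataclass(frozen=True)
class InterModelMessage:
    task_id: str
    status: str
    result_value: float

def parse_message(raw: dict) -> InterModelMessage:
    """Accept only messages containing exactly the allowed fields with the allowed types."""
    if set(raw) != set(ALLOWED_FIELDS):
        raise ValueError(f"unexpected or missing fields: {set(raw) ^ set(ALLOWED_FIELDS)}")
    for field, expected_type in ALLOWED_FIELDS.items():
        if not isinstance(raw[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    return InterModelMessage(**raw)

print(parse_message({"task_id": "t1", "status": "done", "result_value": 0.93}))
# parse_message({"task_id": "t1", "status": "done", "result_value": 0.93, "note": "hi"})  # would raise
```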

This doesn't clear up the confusion for me. That mostly pushes my question to "what are misalignment related technical problems?" Is the problem of an AI escaping a server and aligning with North Korea a technical or a political problem? How could we tell? Is this still in the regime where we are using AIs as tools, or are you talking about a regime where AIs are autonomous agents?

2ryan_greenblatt1mo
I mean, it could be resolved in principle by technical means and might be resolvable by political means as well. I'm assuming the AI creator didn't want the AI to escape to North Korea and therefore failed at some technical solution to this. I'm imagining very powerful AIs, e.g. AIs that can speed up R&D by large factors. These are probably running autonomously, but in a way which is de jure controlled by the AI lab.

I'm not conditioning on prior claims.

One potential reason why you might have inferred that I was is because my credence for scheming is so high, relative to what you might have thought given my other claim about "serious misalignment". My explanation here is that I tend to interpret "AI scheming" to be a relatively benign behavior, in context. If we define scheming as:

  • behavior intended to achieve some long-term objective that is not quite what the designers had in mind

  • not being fully honest with the designers about its true long-term objectives (espe

... (read more)

What do you mean by "misalignment"? In a regime with autonomous AI agents, I usually understand "misalignment" to mean "has different values from some other agent". In this frame, you can be misaligned with some people but not others. If an AI is aligned with North Korea, then it's not really "misaligned" in the abstract—it's just aligned with someone who we don't want it to be aligned with. Likewise, if OpenAI develops AI that's aligned with the United States, but unaligned with North Korea, this mostly just seems like the same problem but in reverse.

In g... (read more)

2ryan_greenblatt1mo
Yep, I was just referring to my example scenario and scenarios like this. Like the basic question is the extent to which human groups form a cartel/monopoly on human labor vs ally with different AI groups. (And existing conflict between human groups makes a full cartel much less likely.)
2ryan_greenblatt1mo
Sorry, by "without misalignment" I mean "without misalignment related technical problems". As in, it's trivial to avoid misalignment from the perspective of ai creators.

I think the probability of "prior to total human obsolescence, AIs will be seriously misaligned, broadly strategic about achieving long run goals in ways that lead to scheming, and present a basically unified front (at least in the context of AIs within a single AI lab)" is "only" about 10-20% likely, but this is plausibly the cause of about half of misalignment related risk prior to human obsolescence.

I'd want to break apart this claim into pieces. Here's a somewhat sketchy and wildly non-robust evaluation of how I'd rate these claims:

Assuming the claims ... (read more)

2ryan_greenblatt1mo
Are you conditioning on the prior claims when stating your probabilities? Many of these properties are highly correlated. E.g., "seriously misaligned" and "broadly strategic about achieving long run goals in ways that lead to scheming" seem very correlated to me. (Your probabilities seem higher than I would have expected without any correlation, but I'm unsure.) I think we probably disagree about the risk due to misalignment by like a factor of 2-4 or something. But probably more of the crux is in the value of working on other problems.

Rogue AIs are quite likely to at least attempt to ally with humans and opposing human groups will indeed try to make some usage of AI. So the situation might look like "rogue AIs+humans" vs AIs+humans. But, I think there are good reasons to think that the non-rogue AIs will still be misaligned and might be ambivalent about which side they prefer.

I think if there's a future conflict between AIs, with humans split between sides of the conflict, it just doesn't make sense to talk about "misalignment" being the main cause for concern here. AIs are just addi... (read more)

2ryan_greenblatt1mo
Sure, but I might think a given situation would be nearly entirely resolved without misalignment. (Edit: without technical issues with misalignment, e.g. if AI creators could trivially avoid serious misalignment.) E.g. if an AI escapes from OpenAI's servers and then allies with North Korea, the situation would have been solved without misalignment issues. You could also solve or mitigate this type of problem in the example by resolving all human conflicts (so the AI doesn't have a group to ally with), but this might be quite a bit harder than solving technical problems related to misalignment (either via control type approaches or removing misalignment).

Resources and power are extremely useful for achieving a wide range of goals, especially goals about the external world. However, humans also want resources and power for achieving their goals. This will put the misaligned AI in direct competition with the humans. Additionally, humans may be one of the largest threats to the AI achieving its goals, because we are able to fight back against the AI. This means that the AI will have extremely strong incentives to disempower humans, in order to prevent them from disempowering it. [...]

Finally, we discussed the

... (read more)
6peterbarnett1mo
This post doesn’t intend to rely on there being a discrete transition between "roughly powerless and unable to escape human control" and "basically a god, and thus able to accomplish any of its goals without constraint”. We argue that an AI which is able to dramatically speed up scientific research (i.e. effectively automate science) will be extremely hard to both safely constrain and get useful work from. Such AIs won’t effectively hold all the power (at least initially), and so will initially be forced to comply with whatever system we are attempting to use to control it (or at least look like they are complying, while they delay, sabotage, or gain skills that would allow them to break out of the system). This system could be something like a Redwood-style control scheme, or a system of laws. I imagine with a system of laws, the AIs very likely lie in wait, amass power/trust etc, until they can take critical bad actions without risk of legal repercussions. If the AIs have goals that are better achieved by not obeying the laws, then they have an incentive to get into a position where they can safely get around laws (and likely take over). This applies with a population of AIs or a single AI, assuming that the AIs are goal directed enough to actually get useful work done. In Section 5 of the post we discussed control schemes, which I expect also to be inadequate (given current levels of security mindset/paranoia), but seem much better than legal systems for safely getting work out of misaligned systems. AIs also have an obvious incentive to collude with each other. They could either share all the resources (the world, the universe, etc) with the humans, where the humans get the majority of resources; or the AIs could collude, disempower humans, and then share resources amongst themselves. I don’t really see a strong reason to expect misaligned AIs to trade with humans much, if the population of AIs were capable of together taking over. (This is somewhat an argu
-1the gears to ascension1mo
Agreed, this argument would be much stronger if it acknowledged that it does not take intense capability for misaligned reinforcement learners to be a significant problem; compare the YouTube and TikTok recommenders, which have various second-order bad effects that have not been practical for their engineers to remove.

I'm considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I'd post an outline of that post here first as a way of judging what's currently unclear about my argument, and how it interacts with people's cruxes.

Current outline:

In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, ... (read more)

8Wei Dai1mo
"China’s first attempt at industrialization started in 1861 under the Qing monarchy. Wen wrote that China “embarked on a series of ambitious programs to modernize its backward agrarian economy, including establishing a modern navy and industrial system.” However, the effort failed to accomplish its mission over the next 50 years. Wen noted that the government was deep in debt and the industrial base was nowhere in sight." https://www.stlouisfed.org/on-the-economy/2016/june/chinas-previous-attempts-industrialization Improving institutions is an extremely hard problem. The theory we have on it is of limited use (things like game theory, mechanism design, contract theory), and with AI governance/institutions specifically, we don't have much time for experimentation or room for failure. So I think this is a fine frame, but doesn't really suggest any useful conclusions aside from same old "let's pause AI so we can have more time to figure out a safe path forward".
2Chris_Leong1mo
Also: How are funding and attention "arbitrary" factors?
2ryan_greenblatt1mo
After commenting back and forth with you some more, I think it would probably be a pretty good idea to decompose your arguments into a bunch of specific more narrow posts. Otherwise, I think it's somewhat hard to engage with. Ideally, these would be done with the decomposition which is most natural to your target audience, but that might be too hard. Idk what the right decomposition is, but minimally, it seems like you could write a post like "The AIs running in a given AI lab will likely have very different long run aims and won't/can't cooperate with each other importantly more than they cooperate with humans." I think this might be the main disagreement between us. (The main counterarguments to engage with are "probably all the AIs will be forks off of one main training run, it's plausible this results in unified values" and also "the AI creation process between two AI instances will look way more similar than the creation process between AIs and humans" and also "there's a chance that AIs will have an easier time cooperating with and making deals with each other than they will making deals with humans".)
4ryan_greenblatt1mo
Some quick notes:
  • It seems worth noting that there is still an "improve institutions" vs "improve capabilities" race going on in frame 3. (Though if you think institutions are exogenously getting better/worse over time, this effect could dominate. And perhaps you think that framing things as a race/conflict is generally not very useful, which I'm sympathetic to, but this isn't really a difference in objective.)
  • Many people agree that very good epistemics combined with good institutions would likely suffice to mostly handle risks from powerful AI. However, sufficiently good technical solutions to some key problems could also mitigate some of the problems. Thus, either sufficiently good institutions/epistemics or good technical solutions could solve many problems, and improvements in both seem to help on the margin. But there remains a question about what type of work is more leveraged for a given person on the margin.
  • Insofar as you're trying to make an object-level argument about what people should work on, you should consider separating that out into a post claiming "people should do XYZ, this is more leveraged than ABC on current margins under these values".
  • I think the probability of "prior to total human obsolescence, AIs will be seriously misaligned, broadly strategic about achieving long run goals in ways that lead to scheming, and present a basically unified front (at least in the context of AIs within a single AI lab)" is "only" about 10-20% likely, but this is plausibly the cause of about half of misalignment related risk prior to human obsolescence.
  • Rogue AIs are quite likely to at least attempt to ally with humans and opposing human groups will indeed try to make some usage of AI. So the situation might look like "rogue AIs+humans" vs AIs+humans. But I think there are good reasons to think that the non-rogue AIs will still be misaligned and might be ambivalent about which side they prefer.
  • I do think there are pretty good reasons to expect

“But what about comparative advantage?” you say. Well, I would point to the example of a not-particularly-bright 7-year-old child in today’s world. Not only would nobody hire that kid into their office or factory, but they would probably pay good money to keep him out, because he would only mess stuff up.

This is an extremely minor critique given that I'm responding to a footnote, so I hope it doesn't drown out more constructive responses, but I'm actually pretty skeptical that the reason why people don't hire children as workers is because the ch... (read more)

5Steven Byrnes1mo
Thanks. I changed the wording to “moody 7-year-old” and “office or high-tech factory” which puts me on firmer ground I think.  :) I think there have been general increases in productivity across the economy associated with industrialization, automation, complex precise machines, and so on, and those things provide a separate reason (besides legal & social norms as you mentioned) that 7yos are far less employable today than in the 18th century. E.g. I can easily imagine a moody 7yo being net useful in a mom & pop artisanal candy shop, but it’s much harder to imagine a moody 7yo being net useful in a modern jelly bean factory. I think your bringing up “$3/day” gives the wrong idea; I think we should focus on whether the sign is positive or negative. If the sign is positive at all, it’s probably >$3/day. The sign could be negative because they sometimes touch something they’re not supposed to touch, or mess up in other ways, or it could simply be that they bring in extra management overhead greater than their labor contribution. (We’ve all delegated projects where it would have been far less work to just do the project ourselves, right?) E.g. even if the cost to feed and maintain a horse were zero, I would still not expect to see horses being used in a modern construction project. Anyway, I think I’m on firmer ground when talking about a post-AGI economy, in which case, literally anything that can be done by a human at all, can be automated.

In a parallel universe with a saner civilization, there must be tons of philosophy professors working with tons of AI researchers to try to improve AI's philosophical reasoning. They're probably going on TV and talking about 养兵千日，用兵一时 (feed an army for a thousand days, use it for an hour) or how proud they are to contribute to our civilization's existential safety at this critical time. There are probably massive prizes set up to encourage public contribution, just in case anyone had a promising out of the box idea (and of course with massive associated i

... (read more)
5Wei Dai1mo
  1. The super-alignment effort will fail.
  2. Technological progress will continue to advance faster than philosophical progress, making it hard or impossible for humans to have the wisdom to handle new technologies correctly. I see AI development itself as an instance of this, for example the e/acc crowd trying to advance AI without regard to safety because they think it will automatically align with their values (something about "free energy"). What if, e.g., value lock-in becomes possible in the future and many decide to lock in their current values (based on their religions and/or ideologies) to signal their faith/loyalty?
  3. AIs will be optimized for persuasion and humans won't know how to defend against bad but persuasive philosophical arguments aimed to manipulate them.

Bad economic policies can probably be recovered from and are therefore not (high) x-risks. My answers to many of your other questions are "I'm pretty uncertain, and that uncertainty leaves a lot of room for risk." See also Some Thoughts on Metaphilosophy if you haven't already read that, as it may help you better understand my perspective. And, it's also possible that in the alternate sane universe, a lot of philosophy professors have worked with AI researchers on the questions you raised here, and adequately resolved the uncertainties in the direction of "no risk", and AI development has continued based on that understanding, but I'm not seeing that happening here either. Let me know if you want me to go into more detail on any of the questions.

I just want to clarify my general view on analogies here, because I'd prefer not to be interpreted as saying something like "you should never use analogies in arguments". In short:

I think that analogies can be good if they are used well in context. More specifically, analogies generally serve one of three purposes:

  1. Explaining a novel concept to someone
  2. Illustrating, or evoking a picture of a thing in someone's head
  3. An example in a reference class, to establish a base rate, or otherwise form the basis of a model

I think that in cases (1) and (2), analogies are ... (read more)

An aliens analogy is explicitly relying on [we have no idea what this will do]. It's easy to imagine friendly aliens, just as it's easy to imagine unfriendly ones, or entirely disinterested ones. The analogy is unlikely to lead to a highly specific, incorrect model.

As a matter of fact I think the word "alien" often evokes a fairly specific caricature that is separate from "something that's generically different and hard to predict". But it's obviously hard for me to prove what's going on in people's minds, so I'll just say what tends to flash in my mi... (read more)

To give an example, Golden Retrievers are much more cherry-picked than Cotra's lion/chimpanzee examples. Of all the species on Earth, the ones we've successfully domesticated are a tiny, tiny minority. Maybe you'd say we have a high success rate when we try to domesticate a species, but that took a long time in each case and is still meaningfully incomplete.

My basic response is that, while you can find reasons to believe that the golden retriever analogy is worse than the chimpanzee analogy, you can equally find reasons to think the chimpanzee analogy is w... (read more)

I agree with the broad message of what I interpret you to be saying, and I do agree there's some value in analogies, as long as they are used carefully (as I conceded in the post). That said, I have some nitpicks with the way you frame the issue:

In particular,

  • future AI is like aliens in some ways but very unlike aliens in other ways;
  • future AI is like domesticated animals in some ways but very unlike them in other ways;
  • future AI is like today’s LLMs in some ways but very unlike them in other ways;

etc. All these analogies can be helpful or misleading, depend

... (read more)
3Vladimir_Nesov1mo
Being real or familiar has nothing to do with being similar to a given thing.

Ideally, people invoke analogies in order to make a point. And then readers / listeners will argue about whether the point is valid or invalid, and (relatedly) whether the analogy is illuminating or misleading. I think it’s really bad to focus discussion on, and police, the analogy target, i.e. to treat certain targets as better or worse, in and of themselves, separate from the point that’s being made.

For example, Nora was just comparing LLMs to mattresses. And I opened my favorite physics textbook to a random page and there was a prominent analogy betwee... (read more)

7Joe_Collman1mo
By tending to lead to overconfidence. An aliens analogy is explicitly relying on [we have no idea what this will do]. It's easy to imagine friendly aliens, just as it's easy to imagine unfriendly ones, or entirely disinterested ones. The analogy is unlikely to lead to a highly specific, incorrect model. This is not true for LLMs. It's easy to assume that particular patterns will continue to hold - e.g. that it'll be reasonably safe to train systems with something like our current degree of understanding.

To be clear, I'm not saying they're worse in terms of information content: I'm saying they can be worse in the terms you're using to object to analogies: "routinely conveying the false impression of a specific, credible model of AI". I think it's correct that we should be very wary of the use of analogies (though they're likely unavoidable). However, the cases where we need to be the most wary are those that seem most naturally applicable - these are the cases that are most likely to lead to overconfidence. LLMs, [current NNs], or [current AI systems generally] are central examples here.

On asymmetric pushback, I think you're correct, but that you'll tend to get an asymmetry everywhere between [bad argument for conclusion most people agree with] and [bad argument for conclusion most people disagree with]. People have limited time. They'll tend to put a higher value on critiquing invalid-in-their-opinion arguments when those lead to incorrect-in-their-opinion conclusions (at least unless they're deeply involved in the discussion). There's also an asymmetry in terms of consequences-of-mistakes here: if we think that AI will be catastrophic, and are wrong, this causes a delay, a large loss of value, and a small-but-significant increase in x-risk; if we think that AI will be non-catastrophic, and are wrong, we're dead.

Lack of pushback shouldn't be taken as a strong indication that people agree with the argumentation used. Clearly this isn't ideal. I do think

Fair enough. But in this case, what specifically are you proposing, then?

In this post, I'm not proposing a detailed model; I hope to provide one in the near future. But I hope you'd agree that it shouldn't be a requirement that, to make this narrow point about analogies, I need to present an entire detailed model of the alignment problem. Of course, such a model would definitely help, and I hope I can provide something like it at some point soon (time and other priorities permitting), but I'd still like to make my point about analogies as an isolated thesis regardless.

9Thane Ruthenis1mo
My counter-point was meant to express skepticism that it is actually realistically possible for people to switch to non-analogy-based evocative public messaging. I think inventing messages like this is a very tightly constrained optimization problem, potentially an over-constrained one, such that the set of satisfactory messages is empty. I think I'm considerably better at reframing games than most people, and I know I would struggle with that.

I agree that you don't necessarily need to accompany any criticism you make with a ready-made example of doing better. Simply pointing out stuff you think is going wrong is completely valid! But a ready-made example of doing better certainly greatly enhances your point: an existence proof that you're not demanding the impossible. That's why I jumped at that interpretation regarding your AI-Risk model in the post (I'd assumed you were doing it), and that's why I'm asking whether you could generate such a message now.

To be clear, I would be quite happy to see that! I'm always in the market for rhetorical innovations, and "succinct and evocative gears-level public-oriented messaging about AI Risk" would be a very powerful tool for the arsenal. But I'm a priori skeptical.

I agree, but the same problem exists for "AIs are like aliens". Analogies only take you so far. AIs are their own unique things, not fully like anything else in our world.

And yet you immediately use an analogy to make your model of AI progress more intuitively digestible and convincing

I was upfront about my intention with my language in that section. Portraying me as contradicting myself is misleading, because I was deliberately being evocative in the section you critiqued, rather than trying to present an argument. That was the whole point. The language you criticized was marked as a separate section in my post, in which I wrote:

Part of this is that I don't share other people's picture about what AIs will actually look l

... (read more)

Fair enough. But in this case, what specifically are you proposing, then? Can you provide an example of the sort of object-level argument for your model of AI risk that is simultaneously (1) entirely free of analogies and (2) sufficiently evocative plus short plus legible, such that it can be used for effective messaging to people unfamiliar with the field (including the general public)?

When making a precise claim, we should generally try to reason through it using concrete evidence and models instead of relying heavily on analogies.

Because I'm pretty ... (read more)

On current margins it does actually seem plausible that human population growth improves value stabilization faster than it erodes your share, I suppose, although I don't think I would extend that to creating an AI population larger in size than the human one.

I mean, without rapid technological progress in the coming decades, the default outcome is that I just die and my values don't get stabilized in any meaningful sense. (I don't care a whole lot about living through my descendants.)

In general, I think you're probably pointing at something that might become tr... (read more)

This seemed to be a major theme of the OP—see the discussions of “extremal Goodhart”, and “the tails come apart”—so I’m confused that you don’t seem to see that as very central.

I agree that a large part of Joe's post was about the idea that human values diverge in the limit. But I think if you take the thing he wrote about human selfishness seriously, then it is perfectly reasonable to talk about the ordinary cases of value divergence too, which I think are very common. Joe wrote,

And we can worry about the human-human case for more mundane reasons, too. Th

... (read more)

I think that's a poor way to classify my view. What I said was that population growth likely causes real per-capita incomes to increase. This means that people will actually get greater control over the universe, in a material sense. Each person's total share of GDP would decline in relative terms, but their control over their "portion of the universe" would actually increase, because the effect of greater wealth outweighs the relative decline against other people.
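To make the arithmetic concrete, here is a toy illustration with made-up numbers (the 2x population and 3x output figures are purely hypothetical, chosen only to show how the two effects can point in opposite directions, not as an empirical estimate):

$$
\begin{aligned}
\text{Before:}\quad & N = 10^9 \text{ people}, \quad Y = \$100\text{T total output}, \quad y = Y/N = \$100\text{k per person}, \quad \text{share} = 1/N = 10^{-9} \\
\text{After:}\quad & N' = 2N, \quad Y' = 3Y = \$300\text{T}, \quad y' = Y'/N' = \$150\text{k per person}, \quad \text{share}' = 1/N' = 0.5 \times 10^{-9}
\end{aligned}
$$

In this toy case each person's relative share of total output halves, while their absolute income rises by 50%. Whether the conclusion goes this way depends on the assumption that total output grows more than proportionally with population, which is the empirical claim at issue.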

I am not claiming that population growth is merely good for us in the "medium term". In... (read more)

2interstice1mo
Yes, in the medium term. But given a very long future it's likely that any control so gained could eventually also be gained while on a more conservative trajectory, while leaving you/your values with a bigger slice of the pie in the end. So I don't think that gaining more control in the short run is very important -- except insofar as that extra control helps you stabilize your values. On current margins it does actually seem plausible that human population growth improves value stabilization faster than it erodes your share, I suppose, although I don't think I would extend that to creating an AI population larger in size than the human one.

Even so, it seems valuable to explore the implications of the idea presented in the post, even if the post author did not endorse the idea fully. I personally think the alternative view—that humans naturally converge on very similar values—is highly unlikely to be true, and as Joe wrote, seems to be a "thus-far-undefended empirical hypothesis – and one that, absent a defense, might prompt questions, from the atheists, about wishful thinking".

the “humans have non-overlapping utility functions” musing is off-topic, right?

I don't think it's off-topic, since the central premise of Joe Carlsmith's post is that humans might have non-overlapping utility functions, even upon reflection. I think my comment is simply taking his post seriously, and replying to it head-on.

Separately, I agree there's a big question about whether humans have "importantly overlapping concerns" in a sense that is important and relevantly different from AI. Without wading too much into this debate, I'll just say: I agree human... (read more)

5Steven Byrnes1mo
I mean, I’m quite sure that it’s false, as an empirical claim about the normal human world, that the normal things Alice chooses to do will tend to make a random different person Bob worse-off, on average, as judged by Bob himself, including upon reflection. I really don’t think Joe was trying to assert to the contrary in the OP.

Instead, I think Joe was musing that if Alice FOOMed to dictator of the universe, and tiled the galaxies with [whatever], then maybe Bob would be extremely unhappy about that, comparably unhappy to if Alice were tiling the galaxies with paperclips. And vice-versa if Bob FOOMed to dictator of the universe. And that premise seems at least possible, as far as I know. This seemed to be a major theme of the OP—see the discussions of “extremal Goodhart”, and “the tails come apart”—so I’m confused that you don’t seem to see that as very central.

I’m not sure how much we’re disagreeing here. I agree that the butcher and brewer are mainly working because they want to earn money. And I hope you will also agree that if the butcher and brewer and everyone else were selfish to the point of being sociopathic, it would be a catastrophe.

Our society relies on the fact that there are just not many people, as a proportion of the population, who will flagrantly and without hesitation steal and lie and commit fraud and murder as long as they’re sufficiently confident that they can get away with it without getting a reputation hit or other selfishly-bad consequences. The economy (and world) relies on some minimal level of trust between employees, coworkers, business partners, and so on: trust that they will generally follow norms and act with a modicum of integrity, even when nobody is looking. The reason that scams and frauds can get off the ground at all is that there is in fact a prevailing ecosystem of trust that they can exploit. Right?