All of CarlShulman's Comments + Replies

Biology-Inspired AGI Timelines: The Trick That Never Works

A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the rel... (read more)

2Charlie Steiner3hThe chess link maybe should go to hippke's work [https://www.lesswrong.com/posts/J6gktpSgYoyq5q3Au/benchmarking-an-old-chess-engine-on-new-hardware] . What you can see there is that a fixed chess algorithm takes an exponentially growing amount of compute and transforms it into logarithmically-growing Elo. Similar behavior features in recent pessimistic predictions [https://spectrum.ieee.org/deep-learning-computational-cost] of deep learning's future trajectory. If general navigation of the real world suffers from this same logarithmic-or-worse penalty when translating hardware into performance metrics, then (perhaps surprisingly) we can't conclude that hardware is the dominant driver of progress by noticing that the cost of compute is dropping rapidly.
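To make the shape of that relationship concrete, here is a minimal Python sketch. The Elo-per-doubling constant is an assumed, illustrative figure (not hippke's exact fit); only the logarithmic form matters for the point above.

```python
import math

# Assumed, illustrative figure: a fixed classical chess engine gains very roughly
# on the order of ~50-70 Elo per doubling of compute; the exact number varies by
# engine and era, so treat ELO_PER_DOUBLING as a free parameter.
ELO_PER_DOUBLING = 60

def elo_gain_from_compute(compute_multiplier: float) -> float:
    """Elo gained by giving the same engine `compute_multiplier` times more compute."""
    return ELO_PER_DOUBLING * math.log2(compute_multiplier)

if __name__ == "__main__":
    for mult in [10, 100, 1_000, 1_000_000]:
        print(f"{mult:>9,}x compute -> ~{elo_gain_from_compute(mult):5.0f} Elo")
    # Each extra order of magnitude of compute buys roughly the same ~200 Elo:
    # exponential input growth translates into only linear performance growth.
```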

I will have to look at these studies in detail in order to understand, but I'm confused about how this can pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with a not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT-3 however we scale them up?

Biology-Inspired AGI Timelines: The Trick That Never Works

Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive).

Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of pro... (read more)

Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth.

So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it's best to use all of compute, labor, and time, but it makes sense

... (read more)
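As a purely illustrative sketch of what such a production function over compute, labor, and time might look like: the Cobb-Douglas form and the exponents below are assumptions for exposition, not estimates from the comment.

```python
import math

def ai_progress(compute: float, labor: float, years: float,
                alpha: float = 0.6, beta: float = 0.3, gamma: float = 0.1) -> float:
    """Toy Cobb-Douglas-style production function for AI capability.

    compute, labor, years are input levels in arbitrary units; alpha, beta, gamma
    are illustrative (assumed, not estimated) output elasticities for hardware,
    research labor, and serial time respectively.
    """
    return (compute ** alpha) * (labor ** beta) * (years ** gamma)

# With alpha > beta, the many orders of magnitude of historical compute growth
# contribute more to measured progress than the few orders of magnitude of labor
# growth, which is the decomposition claim in the comment above.
baseline = ai_progress(compute=1.0, labor=1.0, years=1.0)
more_compute = ai_progress(compute=1e6, labor=1.0, years=1.0)   # +6 OOM compute
more_labor = ai_progress(compute=1.0, labor=1e2, years=1.0)     # +2 OOM labor
print(math.log10(more_compute / baseline), math.log10(more_labor / baseline))
```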

I commend this comment and concur with the importance of hardware, the straw-manning of Moravec, etc.

However I do think that EY had a few valid criticisms of Ajeya's model in particular - it ends up smearing probability mass over many anchors or sub-models, most of which are arguably poorly grounded in deep engineering knowledge. And yes you can use it to create your own model, but most people won't do that and are just looking at the default median conclusion.

Moore's Law is petering out as we run up against the constraints of physics for practical irrever... (read more)

The evaluation function of an AI is not its aim

You may be interested in some recent empirical experiments, demonstrating objective robustness failures/inner misalignment, including ones predicted in the risks from learned optimization paper.

What will 2040 probably look like assuming no singularity?

There is at least one firm doing drone delivery in China and they just approved a standard for it.

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Mainly that such complete (and irreversible!) delegation to such incompetent systems would be necessary or carried out. If AI is so powerful that nuclear weapons are launched on a hair trigger without direction from human leadership, I expect it not to be awful at forecasting that risk.

You could tell a story where bargaining problems lead to mutual destruction, but the outcome shouldn't be very surprising on average, i.e. the AI should be telling you about it happening with calibrated forecasts.

3JesseClifton7moOk, thanks for that. I’d guess then that I’m more uncertain than you about whether human leadership would delegate to systems who would fail to accurately forecast catastrophe.

It’s possible that human leadership just reasons poorly about whether their systems are competent in this domain. For instance, they may observe that their systems perform well in lots of other domains, and incorrectly reason that “well, these systems are better than us in many domains, so they must be better in this one, too”. Eagerness to deploy before a more thorough investigation of the systems’ domain-specific abilities may be exacerbated by competitive pressures. And of course there is historical precedent for delegation to overconfident military bureaucracies.

On the other hand, to the extent that human leadership is able to correctly assess their systems’ competence in this domain, it may be only because there has been a sufficiently successful AI cooperation research program. For instance, maybe this research program has furnished appropriate simulation environments to probe the relevant aspects of the systems’ behavior, transparency tools for investigating cognition about other AI systems, norms for the resolution of conflicting interests and methods for robustly instilling those norms, etc, along with enough researcher-hours applying these tools to have an accurate sense of how well the systems will navigate conflict.

As for irreversible delegation — there is the question of whether delegation is in principle reversible, and the question of whether human leaders would want to override their AI delegates once war is underway. Even if delegation is reversible, human leaders may think that their delegates are better suited to wage war on their behalf once it has started. Perhaps because things are simply happening too fast for them to have confidence that they could intervene without placing themselves at a decisive disadvantage.
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

The US and China might well wreck the world by knowingly taking gargantuan risks even if both had aligned AI advisors, although I think they likely wouldn't.

But what I'm saying is really hard to do is to make the scenarios in the OP (with competition among individual corporate boards and the like) occur without extreme failure of 1-to-1 alignment (for both companies and governments). Competitive pressures are the main reason why AI systems with inadequate 1-to-1 alignment would be given long enough leashes to bring catastrophe. I would cosign Vanessa... (read more)

The US and China might well wreck the world by knowingly taking gargantuan risks even if both had aligned AI advisors, although I think they likely wouldn't.

But what I'm saying is really hard to do is to make the scenarios in the OP (with competition among individual corporate boards and the like) occur without extreme failure of 1-to-1 alignment

I'm not sure I understand yet. For example, here’s a version of Flash War that happens seemingly without either the principals knowingly taking gargantuan risks or extreme intent-alignment failure.

  1. The pri

... (read more)
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

I think I disagree with you on the tininess of the advantage conferred by ignoring human values early on during a multi-polar take-off.  I agree the long-run cost of supporting humans is tiny, but I'm trying to highlight a dynamic where fairly myopic/nihilistic power-maximizing entities end up quickly out-competing entities with other values, due to, as you say, bargaining failure on the part of the creators of the power-maximizing entities.



Right now the United States has a GDP of >$20T, US plus its NATO allies and Japan >$40T, the PRC >$14T,... (read more)

2Sammy Martin1moIn my recent writeup of an investigation into AI Takeover scenarios I made an identical comparison - i.e. that the optimistic analogy looks like avoiding nuclear MAD for a while and the pessimistic analogy looks like optimal climate mitigation [https://www.lesswrong.com/posts/zkF9PNSyDKusoyLkP/investigating-ai-takeover-scenarios#Unprecedentedly_Dangerous] :
3Andrew_Critch8moCarl, thanks for this clear statement of your beliefs. It sounds like you're saying (among other things) that American and Chinese cultures will not engage in a "race-to-the-bottom" in terms of how much they displace human control over the AI technologies their companies develop. Is that right? If so, could you give me a % confidence on that position somehow? And if not, could you clarify? To reciprocate: I currently assign a ≥10% chance of a race-to-the-bottom on AI control/security/safety between two or more cultures this century, i.e., I'd bid 10% to buy in a prediction market on this claim if it were settlable. In more detail, I assign a ≥10% chance to a scenario where two or more cultures each progressively diminish the degree of control they exercise over their tech, and the safety of the economic activities of that tech to human existence, until an involuntary human extinction event. (By comparison, I assign at most around a ~3% chance of a unipolar "world takeover" event, i.e., I'd sell at 3%.) I should add that my numbers for both of those outcomes are down significantly from ~3 years ago due to cultural progress in CS/AI (see this ACM blog post [https://acm-fca.org/2018/03/29/negativeimpacts/]) allowing more discussion of (and hence preparation for) negative outcomes, and government pressures to regulate the tech industry.
Another (outer) alignment failure story

I think they are fighting each other all the time, though mostly in very prosaic ways (e.g. McDonald's and Burger King's marketing AIs are directly competing for customers). Are there some particular conflicts you imagine that are suppressed in the story?

 

I think the one that stands out the most is 'why isn't it possible for some security/inspector AIs to get a ton of marginal reward by whistleblowing against the efforts required for a flawless global camera grab?' I understand the scenario says it isn't because the demonstrations are incomprehensible, but why/how?

7paulfchristiano8moYes, if demonstrations are comprehensible then I don't think you need much explicit AI conflict to whistleblow since we will train some systems to explain risks to us.

The global camera grab must involve plans that aren't clearly bad to humans even when all the potential gotchas are pointed out. For example they may involve dynamics that humans just don't understand, or where a brute force simulation or experiment would be prohibitively expensive without leaps of intuition that machines can make but humans cannot. Maybe that's about tiny machines behaving in complicated ways or being created covertly, or crazy complicated dynamics of interacting computer systems that humans can't figure out. It might involve the construction of new AI-designed AI systems which operate in different ways whose function we can't really constrain except by seeing predictions of their behavior from an even-greater distance (machines which are predicted to lead to good-looking outcomes, which have been able to exhibit failures to us if so-incentivized, but which are even harder to control). (There is obviously a lot you could say about all the tools at the human's disposal to circumvent this kind of problem.)

This is one of the big ways in which the story is more pessimistic than my default, and perhaps the highlighted assumptions rule out the most plausible failures, especially (i) multi-year takeoff, (ii) reasonable competence on behalf of the civilization, (iii) "correct" generalization. Even under those assumptions I do expect events to eventually become incomprehensible in the necessary ways, but it feels more likely that there will be enough intervening time for ML systems to e.g. solve alignment or help us shift to a new world order or whatever. (And as I mention, in the worlds where the ML systems can't solve alignment well enough in the intervening time, I do agree that it's unlikely we can solve it in advance.)
"New EA cause area: voting"; or, "what's wrong with this calculation?"

That is the opposite error, where one cuts off the close election cases. The joint probability density function over vote totals is smooth because of uncertainty (which you can see from polling errors), so your chance of being decisive scales inversely with the size of the electorate and the margin of error in polling estimation.

"New EA cause area: voting"; or, "what's wrong with this calculation?"

The error is a result of assuming the coin is exactly 50%, in fact polling uncertainties mean your probability distribution over its 'weighting' is smeared over at least several percentage points. E.g. if your credence from polls/538/prediction markets is smeared uniformly from 49% to 54%, then the chance of the election being decided by a single vote is one divided by 5% of the # of voters.

You can see your assumption is wrong because it predicts that tied elections should be many orders of magnitude more common than they are. There is a symmetric error wh... (read more)
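A minimal numeric check of the calculation above, using the uniform 49%-54% smear for illustration:

```python
# Rough check of the claim above: if your credence over the candidate's final vote
# share is smeared uniformly over [0.49, 0.54], the chance that the realized
# outcome lands on an exact tie is about (1/N) / 0.05, i.e. one divided by
# 5% of the number of voters.
def p_decisive(n_voters: int, low: float = 0.49, high: float = 0.54) -> float:
    width = high - low
    if not (low <= 0.5 <= high):
        return 0.0  # a tie is outside the credible range
    density_at_tie = 1.0 / width          # uniform density over vote share
    return density_at_tie / n_voters      # probability mass of the single tie outcome

print(p_decisive(1_000_000))   # ~2e-5, i.e. about 1 in 50,000 (not 1 in 10^300)
```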

The Upper Limit of Value

As I said, the story was in combination with one-boxing decision theories and our duplicate counterparts.

The Upper Limit of Value

I suppose by 'the universe' I meant what you would call the inflationary multiverse, that is including distant regions we are now out of contact with. I personally tend not to call regions separated by mere distance separate universes.

"and the only impact of our actions with infinite values is the number of black holes we create."

Yes, that would be the infinite impact I had in mind: doubling the number of black holes would double the number of infinite branching trees of descendant universes.

Re simulations, yes, there is indeed a possibility of influencing other levels, although we would be more clueless, and it is a way for us to be in a causally connected patch with infinite future.

2Davidmanheim10moWe tried to be clear that we were discussing influenceable value, i.e. value relevant for decisions. Unreachable parts of our universe, which are uninfluenceable, may not be finite, but not in a way that changes any decision we would make. I agree that they are part of the universe, but I think that if we assume standard theories of physics, i.e. without child universes and without assuming simulation, the questions in infinite ethics don't make them relevant. But we should probably qualify these points more clearly in the paper.
The Upper Limit of Value

Thanks David, this looks like a handy paper! 

Given all of this, we'd love feedback and discussion, either as comments here, or as emails, etc.

I don't agree with the argument that infinite impacts of our choices are of Pascalian improbability, in fact I think we probably face them as a consequence of one-boxing decision theory, and some of the more plausible routes to local infinite impact are missing from the paper:

  • The decision theory section misses the simplest argument for infinite value: in an infinite inflationary universe with infinite copies of
... (read more)
2Davidmanheim10moThanks for this!

If I understand your initial point, I agree that the route to infinite value wouldn't be through infinitesimal probabilities, as we say in the paper. I'm less sure what you mean by "one-boxing decision theory" - we do discuss alternative decision theories briefly, but find only a limited impact of even non-causal decision theories without also accepting multiverses, and not renormalizing value.

Regarding "in an infinite inflationary universe with infinite copies of me," we point out in the paper that the universe cannot support infinite copies of anything, since it's bounded in mass, space, and time - see A.2.1 and A.4. You suggest that there may be ways around this in your next two claims.

Regarding baby universes, perhaps we should have addressed it - as we noted in the introduction, we limited the discussion to a fairly prosaic setting. However, assuming Smolin's model, we still have no influence on the contents of the baby universe. If we determined that those universes were of positive value, despite having no in-principle way of determining their content or accessing them, then I could imagine tiling the universe with black holes to maximize the number of such universes is a possible optimal strategy - and the only impact of our actions with infinite values is the number of black holes we create.

Finally, if we accept the simulation hypothesis, we again have no necessary access to the simulators' universe. Only if we both accept the hypothesis and believe we can influence the parent universe in determinable ways can we make decisions that have an infinite impact. In that case, infinite value is again only accessible via this route.
What trade should we make if we're all getting the new COVID strain?

Little reaction to the new strain news, or little reaction to new strains outpacing vaccines and getting a large chunk of the population over the next several months?

2PeterMcCluskey1yLittle reaction to the likely spread of the new strains.
The Colliding Exponentials of AI

These projections in figure 4 seem to falsely assume that optimal training compute scales linearly with model size. It doesn't: you also need to show more data points to larger models, so training compute grows superlinearly with model size, as discussed in the OAI scaling papers. That changes the results by orders of magnitude (there is uncertainty about which of two inconsistent scaling trends to extrapolate further out, as discussed in the papers).
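A rough sketch of the superlinearity point, using the standard C ≈ 6·N·D estimate for transformer training FLOPs and an assumed, illustrative data-scaling exponent (the exact exponent is one of the things the inconsistent trends disagree about):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard rough estimate for transformer training compute: C ~ 6*N*D FLOPs."""
    return 6.0 * n_params * n_tokens

def optimal_compute(n_params: float, tokens_per_sqrt_param: float = 20.0) -> float:
    """Compute if data also grows with model size (assumed D ~ N**0.5 for illustration)."""
    n_tokens = tokens_per_sqrt_param * (n_params ** 0.5)  # assumed data scaling
    return training_flops(n_params, n_tokens)

for n in [1e9, 1e10, 1e11]:
    print(f"N={n:.0e}  C={optimal_compute(n):.2e} FLOPs")
# 10x more parameters -> ~32x more compute here, rather than the 10x a
# linear-in-model-size assumption would give.
```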

5Daniel Kokotajlo1yThis is not what I took from those papers. The scaling laws paper has a figure showing that if you hold data fixed and increase model size, performance improves, whereas "you need to show more data points to the larger models" would predict that performance would degrade, because if the model gets larger then its needs aren't being met. Rather, what's going on is that at the optimal compute allocation larger models get shown more data points. The way to maximize performance with a given increase in compute is to allocate a bit more than half of the increased compute to increased model size, and the remainder to increased data.

That said, figure 4 still overestimates the gains we should expect from increased compute, I think. But for a different reason: the small models in that figure were given "too much data" (they were all given 300B tokens IIRC) and thus represent inefficient uses of compute -- the same amount of compute would have led to more performance if they had increased model size a bit and decreased data. So the "true slope" of the line--the slope the line would have if compute had been used optimally, which is what we want to extrapolate--would be slightly smaller.
Are we in an AI overhang?
Maybe the real problem is just that it would add too much to the price of the car?

Yes. GPU/ASICs in a car will have to sit idle almost all the time, so the costs of running a big model on it will be much higher than in the cloud.

Rafael Harth's Shortform

I'm not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).

And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.

3ChristianKl1yHow does it feel to be considered important enough by GPT-3 to be mentioned?
Tips/tricks/notes on optimizing investments

This can prevent you from being able to deduct the interest as investment interest expense on your taxes due to interest tracing rules (you have to show the loan was not commingled with non-investment funds in an audit), and create a recordkeeping nightmare at tax time.

Open & Welcome Thread - June 2020

Re hedging, a common technique is having multiple fairly different citizenships and foreign-held assets, i.e. such that if your country becomes dangerously oppressive you or your assets wouldn't be handed back to it. E.g. many Chinese elites pick up a Western citizenship for them or their children, and wealthy people fearing change in the US sometimes pick up New Zealand or Singapore homes and citizenship.

There are many countries with schemes to sell citizenship, although often you need to live in them for some years after you make your investment. Th... (read more)

6Wei_Dai1yI was initially pretty excited about the idea of getting another passport, but on second thought I'm not sure it's worth the substantial costs involved. Today people aren't losing their passports or having their movements restricted for (them or their family members) having expressed "wrong" ideas, but just(!) losing their jobs, being publicly humiliated, etc. This is more the kind of risk I want to hedge against (with regard to AI), especially for my family. If the political situation deteriorates even further to where the US government puts official sanctions on people like me, humanity is probably just totally screwed as a whole and having another passport isn't going to help me that much.
7hg001yPermanent residency (as opposed to citizenship) is a budget option. For example, for Panama, I believe if you're a citizen of one of 50 nations on their "Friendly Nations" list, you can obtain permanent residency by depositing $10K in a Panamanian bank account. If I recall correctly, Paraguay's permanent residency has similar prerequisites ($5K deposit required) and is the easiest to maintain--you just need to be visiting the country every 3 years.
The EMH Aten't Dead
April was the stock market's best month in 30 years, which is not really what you expect during a global pandemic.

Historically the biggest short-term gains have been disproportionately amidst or immediately following bear markets, when volatility is highest.

Right, April's rally wasn't due to "actually, everything is great now", it was due to "whew, it looks like the most apocalyptic scenarios we were seeing in March aren't likely, and there's a limit to how bad it's going to get".

The EMH Aten't Dead

Sure, it's part of how they earn money, but competition between them limits what's left, since they're bidding against each other to take the other side from the retail investor, who buys from or sells to the hedge fund offering the best deal at the time (made somewhat worse by deadweight losses from investing in speed).

The EMH Aten't Dead
It doesn't suggest that. Factually, we know that a majority of investors underperform indexes.

Absolutely, I mean that when you break out the causes of the underperformance, you can see how much is from spending time out of the market, from paying high fees, from excessive trading to pay spreads and capital gains taxes repeatedly, from retail investors not starting with all their future earnings invested (e.g. often a huge factor in the Dalbar studies commonly cited to sell high fee mutual funds to retail investors), and how much from unwittingly i... (read more)

3ChristianKl2yActive investors need to spend money to hire analysts, build computer models and high-frequency trading computers. Let's say it costs $10 per trade to do the analysis to be able to do a trade with a retail investor that nets the hedge fund $10.10. Even when there's no strong competition with other hedge funds over that $0.10 of profit, the retail investor is still screwed by a significant $10.10.
The EMH Aten't Dead

Thank you, I enjoyed this post.

One thing I would add is that the EMH also suggests one can make deviations that don't have very high EMH-predicted costs. Small investors do underperform indexes a lot by paying extra fees, churning with losses to spreads and capital gains taxes, spending time out of the market, and taking too much or too little risk (and especially too much uncompensated risk from under-diversification). But given the EMH they also can't actively pick equities with large expected underperformance. Otherwise, a hedge fund coul... (read more)

I’d also flag that going all-in on EMH and modern financial theory still leads to fairly unusual investing behavior for a retail investor, moreso than I had thought before delving into it.

Seconding this. It turns out that investing under the current academic version of EMH (with time-varying risk premia and multifactor models) is a lot more complicated than putting one's money into an index fund. I'm still learning, but one thing even Carl didn't mention is that modern EMH is compatible with (even demands) certain forms of market timing, if your financi

... (read more)
5ChristianKl2yIt doesn't suggest that. Factually, we know that a majority of investors underperform indexes. When there's an event that will cause retail investors to predictably make bad investments, some hedge fund will do high-frequency trades as soon as the event becomes known, to be able to take the opposite side of the trade. All events that cause more retail investors to buy a stock than to sell it need some hedge fund or bank to take the opposite side of the trade, and likely that hedge fund or bank is in the trade because it has models that suggest it's a good trade for them.

A hedge fund that provides liquidity to trades is going to make as much money under the EMH as it costs to do the market making when it competes with other hedge funds. It's worth noting that a target-date index fund does make predictable trades where someone needs to do the market making and will likely make a small profit for doing the market making.
3Thomas Kwa2yDo we know that this isn't currently happening, i.e. that observing what retail investors buy and betting against them isn't a major profit stream for hedge funds?
Fast Takeoff in Biological Intelligence

I agree human maturation time is enough on its own to rule out a human reproductive biotech 'fast takeoff,' but also:

  • In any given year the number of new births is very small relative to the existing workforce, of billions of humans, including many people with extraordinary abilities
  • Most of those births are unplanned or to parents without access to technologies like IVF
  • New reproductive technologies are adopted gradually by risk-averse parents
  • Any radical enhancement would carry serious risks of negative surprise side effects, further reducing the u
... (read more)
2019 AI Alignment Literature Review and Charity Comparison
MIRI researchers contributed to the following research led by other organisations
MacAskill & Demski's A Critique of Functional Decision Theory

This seems like a pretty weird description of Demski replying to MacAskill's draft.

2Ben Pace2yI also thought so. I wondered maybe if Larks is describing that MacAskill incorporated Demski's comments-on-a-draft into the post.
Does GPT-2 Understand Anything?

The interesting content kept me reading, but it would help the reader to have lines between paragraphs in the post.

5Douglas Summers-Stay2yfixed
Honoring Petrov Day on LessWrong, in 2019

I have launch codes and don't think this is good. Specifically, I think it's bad.

Did you consider the unilateralist curse before making this comment?

Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?

5jefftk2yIs the objection over the amount (there's a higher number where it would be a good trade), being skeptical of the counterfactuality of the donation (would the money really be spent fully selfishly?), or something else?
Why so much variance in human intelligence?

A mouse brain has ~75 million neurons, a human brain ~85 billion neurons. The standard deviation of human brain size is ~10%. If we think of that as a proportional increase rather than an absolute increase in the # of neurons, that's ~74 standard deviations of difference. The correlation between # of neurons and IQ in humans is ~0.3, but that's still a massive difference. Total neurons/computational capacity does show a pattern somewhat like that in the figure. Chimps' brains are a factor of ~3x smaller than humans, ~12 standard deviations.

S... (read more)
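The arithmetic behind the "~74 standard deviations" and "~12 standard deviations" figures, treating the ~10% standard deviation as a multiplicative step (a sketch reproducing the comment's numbers):

```python
import math

MOUSE_NEURONS = 75e6
HUMAN_NEURONS = 85e9
SD_FRACTION = 0.10          # ~10% standard deviation of human brain size
CHIMP_RATIO = 3.0           # chimp brains ~3x smaller than human brains

def sds_of_difference(ratio: float, sd_fraction: float = SD_FRACTION) -> float:
    """Number of multiplicative 'standard deviations' spanning a given size ratio."""
    return math.log(ratio) / math.log(1.0 + sd_fraction)

print(sds_of_difference(HUMAN_NEURONS / MOUSE_NEURONS))  # ~74 SDs, mouse -> human
print(sds_of_difference(CHIMP_RATIO))                    # ~12 SDs, chimp -> human
```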

Tal Yarkoni: No, it's not The Incentives—it's you

Survey and other data indicate that in these fields most people were doing p-hacking/QRPs (running tests selected ex post, optional stopping, reporting and publication bias, etc), but a substantial minority weren't, with individual, subfield, and field variation. Some people produced ~100% bogus work while others were ~0%. So it was possible to have a career without the bad practices Yarkoni criticizes, aggregating across many practices to look at overall reproducibility of research.

And he is now talking about people who have been informed about the... (read more)

4rohinmshah2yI'm curious how many were able to hit 0%? Based on my 10x estimate [https://www.lesswrong.com/posts/5nH5Qtax9ae8CQjZ9/no-it-s-not-the-incentives-it-s-you#jNWHKjBq33cpozHqh] below I'd estimate 9%, but that was definitely a number I pulled out of nowhere. I personally feel the most pressure to publish because the undergrads I work with need a paper to get into grad school. I wonder if it's similar for tenured professors with their grad students. Also, the article seems to be condemning academics who are not tenured, e.g. Thought experiment (that I acknowledge is not reality): Suppose that it were actually the case that in order to stay in academia you had to engage in QRPs. Do you still think it is right to call out / punish such people? It seems like this ends up with you always punishing everyone in academia, with no gains to actually published research, or you abolish academia outright.
Unconscious Economics

There is a literature on firm productivity showing large firm variation in productivity and average productivity growth by expansion of productive firms relative to less productive firms. E.g. this, this, this, and this.

9ESRogs3yI'm not totally sure I'm parsing this sentence correctly. Just to clarify, "large firm variation in productivity" means "large variation in the productivity of firms" rather than "variation in the productivity of large firms", right? Also, the second part is saying that on average there is productivity growth across firms, because the productive firms expand more than the less productive firms, yes?
What failure looks like

OK, thanks for the clarification!

My own sense is that the intermediate scenarios are unstable: if we have fairly aligned AI we immediately use it to make more aligned AI and collectively largely reverse things like Facebook click-maximization manipulation. If we have lost the power to reverse things then they go all the way to near-total loss of control over the future. So I would tend to think we wind up in the extremes.

I could imagine a scenario where there is a close balance among multiple centers of AI+human power, and some but not all of those centers... (read more)

5SoerenMind3yIt'd be nice to hear a response from Paul to paragraph 1. My 2 cents: I tend to agree that we end up with extremes eventually. You seem to say that we would immediately go to alignment given somewhat aligned systems so Paul's 1st story barely plays out. Of course, the somewhat aligned systems may aim at the wrong thing if we try to make them solve alignment. So the most plausible way it could work is if they produce solutions that we can check. But if this were the case, human supervision would be relatively easy. That's plausible but it's a scenario I care less about. Additionally, if we could use somewhat aligned systems to make more aligned ones, iterated amplification probably works for alignment (narrowly defined by "trying to do what we want"). The only remaining challenge would be to create one system that's somewhat smarter than us and somewhat aligned (in our case that's true by assumption). The rest follows, informally speaking, by induction as long as the AI+humans system can keep improving intelligence as alignment is improved. Which seems likely. That's also plausible but it's a big assumption and may not be the most important scenario / isn't a 'tale of doom'.
What failure looks like
Failure would presumably occur before we get to the stage of "robot army can defeat unified humanity"---failure should happen soon after it becomes possible, and there are easier ways to fail than to win a clean war. Emphasizing this may give people the wrong idea, since it makes unity and stability seem like a solution rather than a stopgap. But emphasizing the robot army seems to have a similar problem---it doesn't really matter whether there is a literal robot army, you are in trouble anyway.

I agree other powerful tools can achieve the s... (read more)

I do agree there was a miscommunication about the end state, and that language like "lots of obvious destruction" is an understatement.

I do still endorse "military leaders might issue an order and find it is ignored" (or total collapse of society) as basically accurate and not an understatement.

What failure looks like
I think we can probably build systems that really do avoid killing people, e.g. by using straightforward versions of "do things that are predicted to lead to videos that people rate as acceptable," and that at the point when things have gone off the rails those videos still look fine (and to understand that there is a deep problem at that point you need to engage with complicated facts about the situation that are beyond human comprehension, not things like "are the robots killing people?"). I'm not visualizing the case where no
... (read more)

My median outcome is that people solve intent alignment well enough to avoid catastrophe. Amongst the cases where we fail, my median outcome is that people solve enough of alignment that they can avoid the most overt failures, like literally compromising sensors and killing people (at least for a long subjective time), and can build AIs that help defend them from other AIs. That problem seems radically easier---most plausible paths to corrupting sensors involve intermediate stages with hints of corruption that could be recognized by a weaker AI (and hence ... (read more)

What failure looks like

I think the kind of phrasing you use in this post and others like it systematically misleads readers into thinking that in your scenarios there are no robot armies seizing control of the world (or rather, that all armies worth anything at that point are robotic, and so AIs in conflict with humanity means military force that humanity cannot overcome). I.e. AI systems pursuing badly aligned proxy goals or influence-seeking tendencies wind up controlling or creating that military power and expropriating humanity (which eventually couldn't fight back ther... (read more)

I agree that robot armies are an important aspect of part II.

In part I, where our only problem is specifying goals, I don't actually think robot armies are a short-term concern. I think we can probably build systems that really do avoid killing people, e.g. by using straightforward versions of "do things that are predicted to lead to videos that people rate as acceptable," and that at the point when things have gone off the rails those videos still look fine (and to understand that there is a deep problem at that point you need to engage wit... (read more)

The Vox article also mistakes the source of influence-seeking patterns, taking it to be about social influence rather than 'systems that try to increase in power and numbers tend to do so, so are selected for if we accidentally or intentionally produce them and don't effectively weed them out; this is why living things are adapted to survive and expand; such desires motivate conflict with humans when power and reproduction can be obtained by conflict with humans, which can look like robot armies taking control.'

Yes, I agree the Vox article made this mistake.... (read more)

Act of Charity

There's an enormous difference between having millions of dollars of operating expenditures in an LLC (so that an org is legally allowed to do things like investigate non-deductible activities like investment or politics), and giving up the ability to make billions of dollars of tax-deductible donations. Open Philanthropy being an LLC (so that its own expenses aren't tax-deductible, but it has LLC freedom) doesn't stop Good Ventures from making all relevant donations tax-deductible, and indeed the overwhelming majority of grants on its grants page are deductible.

6habryka3yYep, sorry. I didn't mean to imply that all of Open Phil's funding is non-deductible, just that they decided that it was likely enough that they would find non-deductible opportunities that they went through the effort of restructuring their org to do so (and also gave up a bunch of other benefits like the ability to sponsor visas efficiently). My comment wasn't very clear on that.
Two Neglected Problems in Human-AI Safety

I think this is under-discussed, but also that I have seen many discussions in this area. E.g. I have seen it come up and brought it up in the context of Paul's research agenda, where success relies on humans being able to play their part safely in the amplification system. Many people say they are more worried about misuse than accident on the basis of the corruption issues (and much discussion about CEV and idealization, superstimuli, etc addresses the kind of path-dependence and adversarial search you mention).

However, those varied problems mostly ... (read more)

4Wei_Dai3yI agree with all of this but I don't think it addresses my central point/question. (I'm not sure if you were trying to, or just making a more tangential comment.) To rephrase, it seems to me that ‘ML safety problems in humans’ is a natural/obvious framing that makes clear that alignment to human users/operators is likely far from sufficient to ensure the safety of human-AI systems, that in some ways corrigibility is actually opposed to safety, and that there are likely technical angles of attack on these problems. It seems surprising that someone like me had to point out this framing to people who are intimately familiar with ML safety problems, and also surprising that they largely respond with silence.
"Artificial Intelligence" (new entry at Stanford Encyclopedia of Philosophy)

Another Bringsjord classic:

> However, we give herein a novel, formal modal argument showing that since it's mathematically possible that human minds are hypercomputers, such minds are in fact hypercomputers.

4CarlShulman3yNo superintelligent AI computers [http://kryten.mm.rpi.edu/SB_singularity_math_final.pdf], because they lack hypercomputation.
S-risks: Why they are the worst existential risks, and how to prevent them

That's what the congenital deafness discussion was about.

You have preferences over pain and pleasure intensities that you haven't experienced, or new durations of experiences you know. Otherwise you wouldn't have anything to worry about re torture, since you haven't experienced it.

Consider people with pain asymbolia:

Pain asymbolia is a condition in which pain is perceived, but with an absence of the suffering that is normally associated with the pain experience. Individuals with pain asymbolia still identify the stimulus as painful but do not display the

... (read more)
0cousin_it4yMusic and chocolate are known to be mostly safe. I guess I'm more cautious about new self-modifications that can change my decisions massively, including decisions about more self-modifications. It seems like if I'm not careful, you can devise a sequence that will turn me into a paperclipper. That's why I discount such agents for now, until I understand better what CEV means.
S-risks: Why they are the worst existential risks, and how to prevent them

"My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making?"

I think with current tech it's cheaper and easier to wirehead to increase pain (i.e. torture) than to increase pleasure or reduce pain. This makes sense biologically: since organisms won't go looking for ways to wirehead to maximize their own pain, evolution doesn't need to 'hide the keys' as much as with pleasure or pain relief (where the organism would actively seek out easy means of subv... (read more)

0cousin_it4yWe could certainly make agents for whom pleasure and pain would use equal resources per util. The question is if human preferences today (or extrapolated) would sympathize with such agents to the point of giving them the universe. Their decision-making could look very inhuman to us. If we value such agents with a discount factor, we're back at square one.
S-risks: Why they are the worst existential risks, and how to prevent them

"one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us"

Comparing pains and pleasures of similar magnitude? People have a tendency not to do this, see the linked thread.

"Another sign is that pain is an internal experience, while our values might refer to the external world (though it's very murky"

You accept pain and risk of pain all the time to pursue various pleasures, desires and goals. Mice will cross electrified surfaces for tastier treats.

If you're going to care about hedonic st... (read more)

0cousin_it4yMy point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making? For example, I'd strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?
Increasing GDP is not growth

I meant GWP without introducing the term. Edited for clarity.

Increasing GDP is not growth

If you have a constant population, and GDP increases, productivity per person has increased. But if you have a border on a map enclosing some people, and you move it so it encloses more people, productivity hasn't increased.

Can you give examples of people confirmed to be actually making the mistake this post discusses? I don't recall seeing any.

The standard economist claim (and the only version I've seen promulgated in LW and EA circles) is that it increases gross world product (total and per capita) because migrants are much more productive when they ... (read more)

0Douglas_Knight5yDid you mean world to modify GDP? If you did, that's really confusing, because GDP ("domestic") is specifically local. If you concatenate "world GDP," it is pretty clear what you mean, but if you separate the words like this, it is natural to parse it as "world and national," which is probably not what you mean, since that is pretty much the error Phil is talking about. Your links are careful to always concatenate, though.
Claim explainer: donor lotteries and returns to scale

I came up with the idea and basic method, then asked Paul if he would provide a donor lottery facility. He did so, and has been taking in entrants and solving logistical issues as they come up.

I agree that thinking/researching/discussing more dominates the gains in the $1-100k range.

0The_Jaded_One5yIs giving small amounts of money away really something that individuals should spend a lot of time thinking about - like many days of research? Is picking 9 other people whose competence you trust and delegating the decision to a randomly chosen one of them much easier than just doing whatever research you wanted to do, and then sharing your results? Are GiveWell doing such a bad job at making recommendations that you have to improvise this and do their job for them?
0Benquo5yI think we have an underassignment of credit problem here. You can't both be the junior partner in this. :P Thanks for describing the details a bit more.
0Benquo5yI too agree that the gains mainly come from more/better evaluation.
Optimizing the news feed

A different possibility is identifying vectors in Facebook-behavior space, and letting users alter their feeds accordingly, e.g. I might want to see my feed shifted in the direction of more intelligent users, people outside the US, other political views, etc. At the individual level, I might be able to request a shift in my feed in the direction of individual Facebook friends I respect (where they give general or specific permission).
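A hedged sketch of how that could work mechanically, assuming users and posts are represented by behavior embeddings; all function names and parameters here are hypothetical illustrations, not anything Facebook exposes:

```python
import numpy as np

def shifted_preference(user_vec: np.ndarray,
                       target_vecs: list,
                       shift: float = 0.3) -> np.ndarray:
    """Move a user's behavior embedding part of the way toward the average of the
    users (or cohorts) they have chosen to emulate. `shift` is in [0, 1]."""
    target = np.mean(target_vecs, axis=0)
    return (1.0 - shift) * user_vec + shift * target

def rank_feed(items: list, pref: np.ndarray) -> list:
    """Rank candidate (name, embedding) feed items by similarity to the preference vector."""
    scored = [(float(vec @ pref), name) for name, vec in items]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical usage: shift my feed 30% of the way toward a respected friend.
rng = np.random.default_rng(0)
me, friend = rng.normal(size=8), rng.normal(size=8)
items = [(f"post_{i}", rng.normal(size=8)) for i in range(5)]
print(rank_feed(items, shifted_preference(me, [friend])))
```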

Synthetic supermicrobe will be resistant to all known viruses

That advantage only goes so far:

  • Plenty of nonviral bacteria-eating entities exist, and would become more numerous
  • Plant and animal antibacterial defenses aren't virus-based
  • For the bacteria to compete in the same niche as unmodified versions it has to fulfill a similar ecological role: photosynthetic cyanobacteria with altered DNA would still produce oxygen and provide food
  • It couldn't benefit from exchanging genetic material with other kinds of bacteria
Astrobiology III: Why Earth?

Primates and eukaryotes would be good.

5CellBioGuy5yThe short version before I get a chance to write more posts: Primates appear to be an interestingly potentiated lineage, prone to getting smart when they get large, due to differences in brain development established 50+ megayears ago that make their brains much more impressive per unit volume than most larger mammals. The great apes other than humans actually seem to run into energetic limits to feeding their brains and have smaller brains than you'd expect for a primate of their size, while humans are right on the generic primate trendline. Birds are another potentiated lineage - their brains are about 6x as compact as a comparable primate brain. Eukaryotes are really weird. The one thing that is incontrovertible these days is that the classic 3-domains-of-life idea, with eukaryotes and archaea as sister clades, is turning out to be wrong. Eukaryotes are turning out to have come from a fusion/symbiosis of a bacterium and something that fits entirely within the archaeal domain. Various people who are studying their origin and evolution have their pet models and hold to them too tightly and fight each other bitterly, though some things are finally coming out for sure. A lot of their weird features may come from particular population genetic selective pressures that come from competition between copies of the mitochondrial genome, and a lot of others may come from the fact that they invented sex and have low population sizes both of which allow types of evolution and genetic drift that you are much less likely to see in the eubacteria or archaebacteria, the two 'primary' domains (whose separation represent the deepest branch in the tree of life). But the fact that ALL eukaryotes have a huge constellation of weird traits with no intermediate forms means their origin was a weird event, and opinions vary on if that means it was a singular extremely unlikely event or if all those weird properties come logically from how they formed, and on if there was strong first-mov
Quick puzzle about utility functions under affine transformations

Your example has 3 states: vanilla, chocolate, and neither.

But you only explicitly assigned utilities to 2 of them, although you implicitly assigned the state of 'neither' a utility of 0 initially. Then when you applied the transformation to vanilla and chocolate you didn't apply it to the 'neither' state, which altered preferences for gambles over both transformed and untransformed states.

E.g. if we initially assigned u(neither)=0 then after the transformation we have u(neither)=4, u(vanilla)=7, u(chocolate)=12. Then an action with a 50% chance of neither and 50% chance of chocolate has expected utility 8, while the 100% chance of vanilla has expected utility 7.
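A quick check of this point, assuming (as the numbers suggest) that the original utilities were 0, 3, and 8 and that the positive affine transformation u -> u + 4 is applied to every state:

```python
# Assumed original utilities, consistent with the numbers in the comment above.
states = {"neither": 0.0, "vanilla": 3.0, "chocolate": 8.0}
transformed = {s: u + 4.0 for s, u in states.items()}  # apply u -> u + 4 to ALL states

def eu(lottery: dict, utils: dict) -> float:
    """Expected utility of a {state: probability} lottery under a utility assignment."""
    return sum(p * utils[s] for s, p in lottery.items())

gamble = {"neither": 0.5, "chocolate": 0.5}
sure_vanilla = {"vanilla": 1.0}

for utils in (states, transformed):
    print(eu(gamble, utils), eu(sure_vanilla, utils))
# 4.0 > 3.0 before and 8.0 > 7.0 after: applying the affine map to every state,
# including 'neither', leaves the preference over gambles unchanged.
```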

A toy model of the control problem

Maybe explain how it works when being configured, and then stops working when B gets a better model of the situation/runs more trial-and-error trials?

0Stuart_Armstrong6yOk.