All of elifland's Comments + Replies

3Jakub Kraus6mo
Without specific countermeasures... [https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to] seems similar to Carlsmith (they present similar arguments in a similar manner and utilize the philosophy approach [https://bounded-regret.ghost.io/more-is-different-for-ai/]), so I wouldn't expect it to do much better.
4Vael Gates6mo
Yeah, we were focusing on shorter essays for this pilot survey (and I think Richard's revised essay came out a little late in the development of this survey? Can't recall) but I'm especially interested in "The alignment problem from a deep learning perspective", since it was created for an ML audience.

Mostly agree. For some more starting points, see posts with the AI-assisted alignment tag. I recently did a rough categorization of strategies for AI-assisted alignment here.

If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing.

Not totally sure about this, my impression (see chart here) is that much of the community already considers some form of AI-assisted alignment to be our best shot. But I'd still be excited for more in-depth categorization and prioritization of strateg... (read more)

Agree directionally. I made a similar point in my review of "Is power-seeking AI an existential risk?":

In one sentence, my concern is that the framing of the report and decomposition is more like “avoid existential catastrophe” than “achieve a state where existential catastrophe is extremely unlikely and we are fulfilling humanity’s potential”, and this will bias readers toward lower estimates.

Meanwhile Rationality A-Z is just super long. I think anyone who's a long-term member of LessWrong or the alignment community should read the whole thing sooner or later – it covers a lot of different subtle errors and philosophical confusions that are likely to come up (both in AI alignment and in other difficult challenges).

My current guess is that the meme "every alignment person needs to read the Sequences / Rationality A-Z" is net harmful.  They seem to have been valuable for some people but I think many people can contribute to reducing AI x-risk ... (read more)

2Ruby8mo
I think the tough thing here is it's very hard to evaluate who, if anyone, is making any useful contributions. After all, no one has successfully aligned a superintelligence to date. Maybe it's all way-off track. All else equal, I trust people who've read the Sequences to be better judges of whether we're making progress in the absence of proper end-to-end feedback than those who haven't. Caveat: I am not someone who could plausibly claim to have made any potential contribution myself. :P
3Joseph Bloom8mo
I think it's plausible that it is either harmful to perpetuate "every alignment person needs to read the Sequences / Rationality A-Z" or maybe even inefficient. For example, to the extent that alignment needs more really good machine learning engineers, it's possible they might benefit less from the sequences than a conceptual alignment researcher. However, relying on anecdotal evidence seems potentially unnecessary. We might be able to use polls, or otherwise systematically investigate the relationship between interest/engagement with the sequences and various paths to contribution with AI. A prediction market might also work for information aggregation. I'd bet that all else equal, engagement with the sequences is beneficial but that this might be less pronounced among those growing up in academically inclined cultures.

Written and forecasted quickly, numbers are very rough. Thomas requested I make a forecast before anchoring on his comment (and I also haven't read others).

I’ll make a forecast for the question:  What’s the chance a set of >=1 warning shots counterfactually tips the scales between doom and a flourishing future, conditional on a default of doom without warning shots?

We can roughly break this down into:

  1. Chance >=1 warning shot happens
  2. Chance alignment community / EA have a plan to react to warning shot well
  3. Chance alignment community / EA have enoug
... (read more)
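The three-step decomposition above multiplies out to an overall probability. A quick sketch with illustrative placeholder numbers (my assumptions, not the author's actual estimates):

```python
# Warning-shot decomposition sketch. The three probabilities are
# illustrative placeholders, not the author's actual estimates.
p_warning_shot = 0.8      # 1. chance >=1 warning shot happens
p_good_plan = 0.3         # 2. chance the community has a plan to react well
p_enough_leverage = 0.25  # 3. chance that reaction counterfactually tips the scales

p_tips_scales = p_warning_shot * p_good_plan * p_enough_leverage
print(f"{p_tips_scales:.2f}")  # product of the three stages
```

The point of the decomposition is that the overall estimate is capped by the weakest link in the chain.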

Just made a bet with Jeremy Gillen that may be of interest to some LWers, would be curious for opinions:

Sure, I wasn't clear enough about this in the post (there was also some confusion on Twitter about whether I was only referring to Christiano and Garfinkel rather than any "followers").

I was thinking about roughly hundreds of people in each cluster, with the bar being something like "has made at least a few comments on LW or EAF related to alignment and/or works or is upskilling to work on alignment".

FYI: You can view community median forecasts for each question at this link. Currently it looks like:

Epistemic status: Exploratory

My overall chance of existential catastrophe from AI is ~50%.

My split of worlds we succeed is something like:

  1. 10%: Technical alignment ends up not being that hard, i.e. if you do common-sense safety efforts you end up fine.
  2. 20%: We solve alignment mostly through hard technical work, without that much aid from governance/coordination/etc. and likely with a lot of aid from weaker AIs to align stronger AIs.
  3. 20%: We solve alignment through lots of hard technical work but very strongly aided by governance/coordination/etc. to slow
... (read more)
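As a quick consistency check, the quoted success worlds can be tallied against the overall ~50% estimate (the original list is truncated, so this covers only the worlds shown above; exact fractions avoid floating-point noise):

```python
# Tally the quoted success-world probabilities against the overall
# ~50% chance of avoiding catastrophe. Exact fractions avoid
# floating-point noise; the original list is truncated, so this
# covers only the three worlds quoted above.
from fractions import Fraction

success_worlds = {
    "alignment not that hard": Fraction(10, 100),
    "hard technical work, little governance aid": Fraction(20, 100),
    "hard technical work, strong governance aid": Fraction(20, 100),
}
p_success = 1 - Fraction(50, 100)  # 1 - P(existential catastrophe)
listed = sum(success_worlds.values())
print(listed, p_success)  # the three quoted worlds already sum to 1/2
```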

Good point, and you definitely have more expertise on the subject than I do. I think my updated view is ~5% on this step.

I might be underconfident about my pessimism on the first step (competitiveness of process-based systems) though. Overall I've updated to be slightly more optimistic about this route to impact.

6Lost Futures9mo
I'm skeptical. Guzey seems to be conflating two separate points in the section you've linked:

  1. TFP is not a reliable indicator for measuring growth from the utilization of technological advancement
  2. Bloom et al's "Are Ideas Getting Harder to Find?" is wrong to use TFP as a measure of research output

The second point is probably true, but not the question we're seeking to answer. Research output does not automatically translate to growth from technological advancement.

Is it absurd? I'm not so sure. Between '73 and '82 the oil shock led to skyrocketing energy prices. Guzey acknowledges this economic crisis but goes on to claim that the indicator must be bad since semiconductors got better, crop yields improved, and life expectancy improved. And he's right: for Bloom's paper, this is a major discrepancy. TFP is not a good measure of research output. However, TFP roughly measures an economy's technological capacity given current constraints. America in '73 was more productive than America in '82 because a key technological input (energy) was significantly cheaper in '73 than it would be for most of the following decade, while the technological advancements made during the same period were not enough to offset the balance.

Let's look at the other examples provided. According to the data provided, France's TFP peaked prior to the Great Recession and has largely stagnated since. This doesn't seem surprising given France's sluggish economic growth since then. French GDP peaked in 2008. Its labor productivity has also barely grown [https://www.oecd.org/global-forum-productivity/researchandimpact/stagnation-of-productivity-in-France-a-legacy-of-the-crisis-or-a-structural-slowdown.pdf]. If one examines the data without holding the bias that tech advancements since 2001 MUST have vastly improved productivity, the results are hardly surprising.

This is harder to explain. According to the data, Italy's TFP effectively peaked in 1979, remained near this peak un

Most problems that people work on in research are roughly the right difficulty, because the ambition level is adjusted to be somewhat challenging but not unachievable. If it's too hard then the researcher just moves on to another project. This is the problem selection process we're used to, and might bias our intuitions here.

On the other hand, we want to align AGI because it's a really important problem, and have no control over the difficulty of the problem. And if you think about the distribution of difficulties of all possible problems, it would be a hu... (read more)

1Jack R9mo
That argument makes sense, thanks

Thanks for clarifying your views; makes sense that there isn't a clean distinction between accelerating alignment and theoretical thinking.

I do think there is a distinction between doing theoretical thinking that might be a prerequisite to safely accelerate alignment research substantially, and directly accelerating theoretical alignment. I thought you had updated between these two, toward the second; do you disagree with that?

My understanding is that they have very short (by my lights) timelines which recently updated them toward pushing much more toward just trying to automate alignment research rather than thinking about the theory.

janus9moΩ4116

Our plan to accelerate alignment does not preclude theoretical thinking, but rather requires it. The mainline agenda atm is not full automation (which I expect to be both more dangerous and less useful in the short term), but what I've been calling "cyborgism": I want to maximize the bandwidth between human alignment researchers and AI tools/oracles/assistants/simulations. It is essential that these tools are developed by (or in a tight feedback loop with) actual alignment researchers doing theory work, because we want to simulate and play with thought pro... (read more)

Haven’t yet had a chance to read the article, but from verbal conversations I’d guess they’d endorse something similar (though probably not every word) to Thomas Larsen’s opinion on this in Footnote 5 in this post:

Answer: I see a categorical distinction between trying to align agentic and oracle AIs. Conjecture is trying only for oracle LLMs, trained without any RL pressure giving them goals, which seems way safer. OpenAI doing recursive reward modeling / IDA type schemes involves creating agentic AGIs and therefore also faces a lot more alignment issues.

... (read more)
1Alex Lawsen 9mo
Yeah this is the impression I have of their views too, but I think there are good reasons to discuss what this kind of theoretical framework says about RL anyway, even if you're very against pushing the RL SoTA.

It’s probably a bit frustrating to not have their work summarized, and then be asked to explain their own work, when all of their work is published already.


Fair, I see why this would be frustrating and apologize for any frustration caused. In an ideal world we would have read many of these papers and summarized them ourselves, but that would have taken a lot of time and I think the post was valuable to get out ASAP.

ETA: Probably it would have been better to include more of a disclaimer on the "everyone" point from the get-go, I think not doing this was a mistake.

5aogara10mo
(Also, this is an incredibly helpful writeup and it’s only to be expected that some stuff would be missing. Thank you for sharing it!)

"strongly influences the organization that builds AGI" applies to all alignment research initiatives right? Alignment researchers at e.g. DeepMind have less of an uphill battle but they still have to convince the rest of DeepMind to adopt their work. 


Yes, I didn't mean to imply this was necessarily an Ought-specific problem and I guess it may have been a bit unfair for me to only do a BOTEC on Ought. I included it because I had the most fleshed-out thoughts on it but it could give the wrong impression about relative promise when others don't hav... (read more)

9Vika9mo
I would expect that the way Ought (or any other alignment team) influences the AGI-building org is by influencing the alignment team within that org, which would in turn try to influence the leadership of the org. I think the latter step in this chain is the bottleneck - across-organization influence between alignment teams is easier than within-organization influence. So if we estimate that Ought can influence other alignment teams with 50% probability, and the DM / OpenAI / etc alignment team can influence the corresponding org with 20% probability, then the overall probability of Ought influencing the org that builds AGI is 10%. Your estimate of 1% seems too low to me unless you are a lot more pessimistic about alignment researchers influencing their organization from the inside. 
1jungofthewon10mo
All good, thanks for clarifying.

(speaking for just myself, not Thomas but I think it’s likely he’d endorse most of this)

I agree it would be great to include many of these academic groups; the exclusion wasn’t out of any sort of malice. Personally I don’t know very much about what most of these groups are doing or their motivations; if any of them want to submit brief write-ups I'd be happy to add them! :)

edit: lol, Thomas responded with a similar tone while I was typing

Good point. For myself:

  1. Background (see also https://www.elilifland.com/): I did some research on adversarial robustness of NLP models while in undergrad. I then worked at Ought as a software/research engineer for 1.5 years, was briefly a longtermist forecasting entrepreneur then have been thinking independently about alignment strategy among other things for the past 2 months.
  2. Research tastes: I'm not great at understanding and working on super mathy stuff, so I mostly avoided giving opinions on these. I enjoy toy programming puzzles/competitions but got bo
... (read more)

Given that all the forecasts seem to be wrong in the "things happened faster than we expected" direction, we should probably expect HLAI to happen faster than expected as well.


I don't think we should update too strongly on these few data points; e.g. a previous analysis of Metaculus' AI predictions found "weak evidence to suggest the community expected more AI progress than actually occurred, but this was not conclusive". MATH and MMLU feel more relevant than the average Metaculus AI prediction but not enough to strongly outweigh the previous finding... (read more)

Steelmanning might be particularly useful in cases where we have reason to believe those who have engaged most with the arguments are biased toward one side of the debate.

As described in But Have They Engaged with the Arguments?, perhaps a reason many who dismiss AI risk haven't engaged much with the arguments is the selection effect of engaging more if the first arguments one hears seem true. Therefore it might be useful to steelman arguments by generally reasonable people against AI risk that might seem off due to lack of engagement with existing count... (read more)

Overall agree that progress was very surprising and I'll be thinking about how it affects my big picture views on AI risk and timelines; a few relatively minor nitpicks/clarifications below.

For instance, superforecaster Eli Lifland posted predictions for these forecasts on his blog.

I'm not a superforecaster (TM) though I think some now use the phrase to describe any forecasters with good ~generalist track records?

While he notes that the Hypermind interface limited his ability to provide wide intervals on some questions, he doesn’t make that complaint for t

... (read more)

Yeah I've been sporadically making progress on a personal forecasting retrospective, will include reflections and updated forecasts if/when I get around to finishing that.

4jasoncrawford1y
I basically agree. Growth will hit some limit set by the laws of physics in at most single-digit thousands of years. But there are orders of magnitude of headroom between here and there.

Given the success of this experiment, we should propose a modified version of futarchy where laws are similarly written letter by letter!

Thanks, I agree with this and it's probably not good branding anyway. 

I was thinking the "challenge" was just doing the intervention (e.g. being vegan), but agree that the framing is confusing since it refers to something different in the clinical context. I will edit my shortforms to reflect this updated view.

[crossposted from EA Forum]

Reflecting a little on my shortform from a few years ago, I think I wasn't ambitious enough in trying to actually move this forward.

I want there to be an org that does "human challenge"-style RCTs across lots of important questions that are extremely hard to get at otherwise, including (top 2 are repeated from previous shortform):

  1. Health effects of veganism
  2. Health effects of restricting sleep
  3. Productivity of remote vs. in-person work
  4. Productivity effects of blocking out focused/deep work

Edited to add: I no longer think "human challen... (read more)

5rossry1y
I'm confused why these would be described as "challenge" RCTs, and worry that the term will create broader confusion in the movement to support challenge trials for disease. In the usual clinical context, the word "challenge" in "human challenge trial" refers to the step of introducing the "challenge" of a bad thing (e.g., an infectious agent) to the subject, to see if the treatment protects them from it. I don't know what a "challenge" trial testing the effects of veganism looks like? (I'm generally positive on the idea of trialing more things; my confusion+comment is just restricted to the naming being proposed here.)

(epistemic status: exploratory)

I think more people who are into LessWrong, from high school through college, should consider trying Battlecode. It's somewhat similar to The Darwin Game, which was pretty popular on here, and I think generally the type of people who like LessWrong will both enjoy and be good at Battlecode. (edited to add: A short description of Battlecode is that you write a bot to beat other bots at a turn-based strategy game. Each unit executes its own code, so communication/coordination is often one of the most interesting parts.)

I did it with friends f... (read more)

2JBlack1y
On the same line but more commercial is the game Screeps [https://screeps.com/], which has both ongoing and seasonal servers run by the developers as well as private servers (you can run your own).

Your prior is for discontinuities throughout the entire development of a technology, so shouldn't your prior be for discontinuity at any point during the development of AI, rather than discontinuity at or around the specific point when AI becomes AGI? It seems this would be much lower, though we could then adjust upward based on the particulars of why we think a discontinuity is more likely at AGI.

2NunoSempere2y
Yep.

Holden Karnofsky wrote on Cold Takes:

I estimate that there is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100).

I copied these bins to create Holden's approximate forecasted distribution (note that Holden's forecast is for Transformative AI rather than human-level AGI):

Compared to the upvote-weighted mixture in the OP, it puts more probability on longer timelines, with a median of 2060 vs. 2047 and 1/3 vs. 1/5 on after 210... (read more)
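Holden's stated bins can be turned into a rough cumulative distribution. A sketch, assuming linear interpolation between the stated points (the interpolation scheme and the 2021 zero-probability anchor are my assumptions, not Holden's):

```python
# Rough CDF over TAI arrival from Holden's stated bins:
# >10% by 2036, ~50% by 2060, ~2/3 by 2100. Linear interpolation
# between points and a 0% anchor at 2021 are my assumptions.
BINS = ((2021, 0.0), (2036, 0.10), (2060, 0.50), (2100, 2 / 3))

def p_tai_by(year):
    """P(transformative AI by `year`) under the piecewise-linear CDF."""
    for (y0, p0), (y1, p1) in zip(BINS, BINS[1:]):
        if y0 <= year <= y1:
            return p0 + (p1 - p0) * (year - y0) / (y1 - y0)
    return BINS[-1][1] if year > BINS[-1][0] else 0.0

print(round(p_tai_by(2047), 3))  # probability by the OP mixture's median year
```

Evaluating the interpolated CDF at 2047 (the OP mixture's median) makes the gap between the two distributions concrete.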

It's very likely that when the US intelligence community reports on August 25 on their data about the origins of COVID-19, they will conclude that it was a lab leak.

Are you open to betting on this? GJOpen community is at 9% that the report will conclude that lab leak is more likely than not, I’m at 12%.

In particular, my actual credence in lab leak is higher (~45%) but I’m guessing the most likely outcome of the report is that it’s inconclusive, and that political pressures will play a large role in the outcome.

4Matthew Barnett2y
This Metaculus question [https://www.metaculus.com/questions/7211/us-conclude-covid-lab-leak-by-june-2022/] with a slightly different operationalization, and a longer timeframe, says 39% right now.

Someone who is near the top of the leaderboard is both accurate and highly experienced

I think this unfortunately isn't true right now, and just copying the community prediction would place very highly (I'm guessing if made as soon as the community prediction appeared and updated every day, easily top 3 (edit: top 10)). See my comment below for more details.

You can look at someone's track record in detail, but we're also planning to roll out more ways to compare people with each other.

I'm very glad to hear this. I really enjoy Metaculus but my main gripe ... (read more)

I actually think it's worth tracking: ConsensusBot should be a user, it should always update continuously to the public consensus prediction in its absence, and it shouldn't be counted as a prediction, so we can see what it looks like and how it scores. 

And there should be a contest to see if anyone can use a rule that looks only at predictions, and does better than ConsensusBot (e.g. by deciding whose predictions to care about more vs. less, or accounting for systematic bias, etc). 

If the user is interested in getting into the top ranks, this strategy won't be anything like enough.

I think this isn't true empirically for a reasonable interpretation of top ranks. For example, I'm ranked 5th on questions that have resolved in the past 3 months due to predicting on almost every question.

Looking at my track record, for questions resolved in the last 3 months, evaluated at all times, here's how my log score looks compared to the community:

  • Binary questions (N=19): me: -.072 vs. community: -.045
  • Continuous questions (N=20): me: 2.35 vs. commu
... (read more)
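For reference, the comparison above can be reproduced with the textbook log score for binary forecasts (Metaculus applies its own normalization on top of this idea, so this is only a sketch of the underlying quantity):

```python
import math

def binary_log_score(p, outcome):
    """Textbook log score for a binary forecast: ln(p) if the event
    occurred, ln(1 - p) otherwise. Closer to 0 is better. Metaculus's
    displayed scores use their own normalization on top of this idea."""
    return math.log(p if outcome else 1.0 - p)

# A sharper correct forecast scores better than a hedged one:
print(binary_log_score(0.9, True))  # ~ -0.105
print(binary_log_score(0.6, True))  # ~ -0.511
```

Averaging these per-question scores over resolved questions gives the kind of "me vs. community" comparison quoted above.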

There's also a Metaculus question about this:

It looks like people can change their predictions after they initially submit them. Is this history recorded somewhere, or just the current distribution?

We do store the history. You can view it by going to https://elicit.org/binary, then searching for the question, e.g. https://elicit.org/binary?binaryQuestions.search=Will%20there%20be%20more%20than%2050. Although as noted by Oli, we currently only display predictions that haven't been withdrawn.

Is there an option to have people "lock in" their answer? (Maybe they can still edit/delete for a shor

... (read more)

Epistemic status: extremely uncertain

I created my Elicit forecast by:

  • Slightly adjusting down the 1/6 estimate of existential risk during the next century made in The Precipice
  • Making the shape of the distribution roughly give a little more weight to time periods when AGI is currently forecasted to be more likely to come

[I work for Ought.]

I must admit I haven't followed the discussions you're referring to but if I were to spend more time forecasting this question I would look into them.

I didn't include effects of COVID in my forecast as it looks like the Zillow Home Value Index for Seattle has remained relatively steady since March (2% drop). I'm skeptical that there are likely to be large effects from COVID in the future when there hasn't been a large effect from COVID thus far.

A few reasons I could be wrong:

  • Zillow data is inaccurate or incomplete, or I'm int
... (read more)
2GuySrinivasan3y
Cool deal. The reason I asked about black swans was specifically because of the potential permanent shift in companies' allowance or even desire for remote work. Chance seems low but significant, impact seems modest with a chance of large.

My forecast is based on:

I don't have a background in quantum computing, so there's a chance I'm misinterpreting the question in some way, but I learned a lot doing the research for the forecast (like that there's a lot of controversy regarding whether quantum supremacy has been achieved yet).

Amusingly, during my research I stumbled upon this Metaculus... (read more)

My forecast is based on historical data from Zillow. I explained my reasoning in the notes. The summary is that housing prices haven't changed very much in Seattle since April 2019 (on the whole it's risen 1%). On the other hand, prices in more expensive areas have stayed the same or declined slightly. I settled on a boring median of the price staying the same. Due to how stable the prices have been recently, I think most of the variation will come from the individual house and which neighborhood it's in, with an outside chance of large Seat... (read more)

3GuySrinivasan3y
Most of the variance I've seen discussed comes from a severe change in the tech labor market producing a severe change in the housing market in the Bay Area, Seattle, etc. Did you intentionally or unintentionally leave out the effects of COVID? Or are they wrapped up in black swans?

I think it's >1% likely that one of the first few surveys Rohin conducts would result in a fraction of >0.5.

Evidence from When Will AI Exceed Human Performance?, in the form of median survey responses of researchers who published at ICML and NIPS in 2015:

  • 5% chance given to Human Level Machine Intelligence (HLMI) having an extremely bad long run impact (e.g. human extinction)
  • Does Stuart Russell's argument for why highly advanced AI might pose a risk point at an important problem? 39% say at least important, 70% at least moderately impor
... (read more)
2Rohin Shah3y
Agree that Q2 is more likely to be the bottleneck. See also my response to Amanda above [https://www.lesswrong.com/posts/Azqmzp5JoXJihMcr4/competition-amplify-rohin-s-prediction-on-agi-researchers?commentId=7FgwfAzsqSZ9KfiyB].