RyanCarey

Comments

Open & Welcome Thread - September 2020

I guess it's because high-conviction ideologies outperform low-conviction ones, including nationalistic and political ideologies, and religions. Dennett's Gold Army/Silver Army analogy explains how conviction can build loyalty and strength, and a similar thing is probably true for movement-builders. Also, conviction might make adherents feel better, and therefore simply be more attractive.

Comparing reward learning/reward tampering formalisms

It would be nice to draw out this distinction in more detail. One guess:

  • Uninfluenceability seems similar to requiring zero individual treatment effect of D on R.
  • Unriggability (from the paper) would then correspond to zero average treatment effect of D on R.
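
To make the two treatment-effect conditions concrete, here is a rough sketch in potential-outcomes notation. The notation is my assumption, not something from the paper: R_i(d) stands for the reward that would be learned in world (or unit) i if D were set to d.

```latex
% Hypothetical notation: R_i(d) is the reward learned in world i under D = d.
\begin{align*}
\text{Zero individual treatment effect:} \quad
  & R_i(d) - R_i(d') = 0 \quad \text{for all } i, d, d' \\
\text{Zero average treatment effect:} \quad
  & \mathbb{E}_i\left[ R_i(d) - R_i(d') \right] = 0 \quad \text{for all } d, d'
\end{align*}
```

The first condition implies the second, but not the reverse. For example, with two equally likely worlds whose individual effects are +1 and -1, the average effect is zero even though D changes R in every world.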
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal

This is a cool idea. However, are you actually using the subscript in two confusingly different ways? In I_2010, it seems you're talking about yourself, indexed to the year 2010, whereas in {Abdul Bey}_2000, it seems you're citing a book. It would be pretty bad for people to see a bunch of the first kind of case, and then expect citations, but only get them half of the time.

Defining AI wireheading

Seems like the idea is that wireheading denotes specification gaming that is egregious in its focus on the measurement channel. I'm inclined to agree.

What technical prereqs would I need in order to understand Stuart Armstrong's research agenda?

You could check out Best Textbooks on Every Subject. But people usually recommend Linear Algebra Done Right for LinAlg. Understanding ML seems good for ML Theory. Sutton and Barto is an easy read for RL.

What technical prereqs would I need in order to understand Stuart Armstrong's research agenda?

It may be that technical prereqs are missing. It could also be that you're missing a broader sense of "mathematical maturity", or that you're struggling because Stuart's work is simply hard to understand. That said, useful prereq areas (in which you could also gain overall mathematical maturity) would include:

  • Probability theory
  • Linear Algebra
  • Machine learning theory
  • Reinforcement Learning

It's probably overkill to go deep into these topics. Usually, what you need is in the first chapter.

Where are people thinking and talking about global coordination for AI safety?

I would guess three main disagreements are:

i) are the kinds of transformative AI that we're reasonably likely to get in the next 25 years unalignable?

ii) how plausible are the extreme levels of cooperation Wei Dai wants?

iii) how important is career capital/credibility?

I'm perhaps midway between Wei Dai's view and the median governance view so may be an interesting example. I think we're ~10% likely to get transformative general AI in the next 20 years, and ~6% likely to get an incorrigible one, and ~5.4% likely to get incorrigible general AI that's insufficiently philosophically competent. Extreme cooperation seems ~5% likely, and is correlated with having general AI. It would be nice if more people worked on that, or on whatever more-realistic solutions would work for the transformative unsafe AGI scenario, but I'm happy for some double-digit percentage of governance researchers to keep working on less extreme (and more likely) solutions to build credibility.
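
For what it's worth, the conditional probabilities implied by those numbers can be worked out as follows. This is only a sketch, and it assumes the three estimates are nested (each scenario is a subset of the previous one). The event names T, I and U are labels introduced here for illustration.

```latex
% Hypothetical labels: T = transformative general AI within 20 years,
% I = that AI is incorrigible, U = it is also insufficiently philosophically competent.
\begin{align*}
P(I \mid T) &= \frac{P(T \cap I)}{P(T)} = \frac{0.06}{0.10} = 0.6 \\
P(U \mid T \cap I) &= \frac{P(T \cap I \cap U)}{P(T \cap I)} = \frac{0.054}{0.06} = 0.9
\end{align*}
```

Under that reading, the estimates amount to a 60% chance of incorrigibility given transformative general AI, and a 90% chance of insufficient philosophical competence given an incorrigible one.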

Forum participation as a research strategy

I agree that some people can benefit from doing both, although getting everyone online is a hard collective action problem. I just claim that many researchers will be satisfied with OP. At MIRI/FHI/OpenAI there are ~30-150 researchers, who think about a wide range of areas, which seems broadly comparable to the researchers among LessWrong/AF's active users (depending on your definition of 'researcher', or 'active'). Idea-exchange is extended by workshops and people moving jobs. Many in such a work environment will find that FP has an unacceptably low signal-to-noise ratio and will inevitably avoid FP...

Forum participation as a research strategy

I would note that many of these factors apply as benefits of office-chat participation (OP) as well. The main benefit of FP absent from OP, I suppose, is preparing you for efficient written communication, but the rest seem to feature in both. The fact that their benefits overlap explains why remote researchers benefit so much more than others from FP.
