alyssavance

Note: I have deleted a long comment that I didn't feel like arguing with. I reserve the right to do this for future comments. Thank you.

CFAR’s new focus, and AI Safety

This is just a guess, but I think CFAR and the CFAR-sphere would be more effective if they focused more on hypothesis generation (or "imagination", although that term is very broad). Eg., a year or so ago, a friend of mine in the Thiel-sphere proposed starting a new country by hauling nuclear power plants to Antarctica, and then just putting heaters on the ground to melt all the ice. As it happens, I think this is a stupid idea (hot air rises, so the newly heated air would just blow away, pulling in more cold air from the surroundings). But it is an idea, and the same person came up with (and implemented) a profitable business plan six months or so later. I can imagine HPJEV coming up with that idea, or Elon Musk, or von Neumann, or Google X. I don't think most people in the CFAR-sphere would; it's just not the kind of thing I think they've focused on practicing.

On the importance of Less Wrong, or another single conversational locus

I was including tech support under "admin/moderation"; obviously, the ability to eg. IP-ban people is important (along with access to the code and the database generally). Sorry for any confusion.

On the importance of Less Wrong, or another single conversational locus

If the money is there, why not just pay a freelancer via Gigster or Toptal?

On the importance of Less Wrong, or another single conversational locus

I appreciate the effort, and I agree with most of the points made, but I think resurrect-LW projects are probably doomed unless we can get a proactive, responsive admin/moderation team. Nick Tarleton talked about this a bit last year:

"A tangential note on third-party technical contributions to LW (if that's a thing you care about): the uncertainty about whether changes will be accepted, uncertainty about and lack of visibility into how that decision is made or even who makes it, and lack of a known process for making pull requests or getting feedback on ideas are incredibly anti-motivating." (http://lesswrong.com/lw/n0l/lesswrong_20/cy8e)

That's obviously problematic, but I think it goes way beyond just contributing code. As far as I know, right now, there's no one person with both the technical and moral authority to:

  • set the rules that all participants have to abide by, and enforce them
  • decide principles for what's on-topic and what's off-topic
  • receive reports of trolls, and warn or ban them
  • respond to complaints about the site not working well
  • decide what the site features should be, and implement the high-priority ones

Pretty much any successful subreddit, even a smallish one, will have a team of admins who handle this stuff, and who can be trusted to look at things that pop up within a day or so (at least collectively). The highest intellectual-quality subreddit I know of, /r/AskHistorians, has extremely active and rigorous moderation, to the extent that a majority of comments are often deleted. Since we aren't on Reddit itself, I don't think we need to go quite that far, but there has to be something in place.

Why CFAR's Mission?

I mostly agree with the post, but I think it'd be very helpful to add specific examples of epistemic problems that CFAR students have solved, both "practice" problems and "real" problems. Eg., we know that math skills are trainable. If Bob learns to do math, along the way he'll solve lots of specific math problems, like "x^2 + 3x - 2 = 0, solve for x". When he's built up some skill, he'll start helping professors solve real math problems, ones where the answers aren't known yet. Eventually, if he's dedicated enough, Bob might solve really important problems and become a math professor himself.
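For concreteness, Bob's practice problem works out via the quadratic formula to

$$x = \frac{-3 \pm \sqrt{3^2 + 4 \cdot 2}}{2} = \frac{-3 \pm \sqrt{17}}{2} \approx 0.56 \text{ or } -3.56,$$

which is exactly the kind of answer a practice problem has: checkable, and already known to whoever assigned it.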

Training epistemic skills (or "world-modeling skills", "reaching true beliefs skills", "sanity skills", etc.) should go the same way. At the beginning, a student solves practice epistemic problems, like the ones Tetlock uses in the Good Judgement Project. When they get skilled enough, they can start trying to solve real epistemic problems. Eventually, after enough practice, they might have big new insights about the global economy, and make billions at a global macro fund (or some such, lots of possibilities of course).
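To make "practice epistemic problems" a bit more concrete: the Good Judgement Project grades its forecasting questions with (roughly) the Brier score, so a single practice problem plus its grading fits in a few lines. This is just an illustrative sketch; the function name and numbers are mine, not anything CFAR or Tetlock uses verbatim:

```python
def brier_score(forecast_prob, outcome):
    """Squared error between a probability forecast and the 0/1 outcome.
    Lower is better; always answering 50% scores 0.25 no matter what happens."""
    return (forecast_prob - outcome) ** 2

# Practice problem: "Will X happen by June?" You forecast 70%, and X happens.
print(brier_score(0.7, 1))  # 0.09
print(brier_score(0.7, 0))  # 0.49 (same forecast, but X failed to happen)
```

The appeal of this kind of practice problem is the same as "solve for x": the score is objective, and you find out quickly whether you're improving.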

To use another analogy, suppose Carol teaches people how to build bridges. Carol knows a lot about why bridges are important, what the parts of a bridge are, why iron bridges are stronger than wood bridges, and so on. But we'd also expect that Carol's students have built models of bridges with sticks and stuff, and (ideally) that some students became civil engineers and built real bridges. Similarly, if one teaches how to model the world and find truth, it's very good to have examples of specific models built and truths found - both "practice" ones (that are already known, or not that important) and ideally "real" ones (that are important and haven't been discovered before).

Why CFAR? The view from 2015

Hey! Thanks for writing all of this up. A few questions, in no particular order:

  • The CFAR fundraiser page says that CFAR "search[es] through hundreds of hours of potential curricula, and test[s] them on smart, caring, motivated individuals to find the techniques that people actually end up finding useful in the weeks, months and years after our workshops." Could you give a few examples of curricula that worked well, and curricula that worked less well? What kind of testing methodology was used to evaluate the results, and in what ways is that methodology better (or worse) than methods used by academic psychologists?

  • One can imagine a scale for the effectiveness of training programs. Say, 0 points is a program where you play Minesweeper all day, and 100 points is a program that could take randomly chosen people and make them as skilled as Einstein, Bismarck, or von Neumann. Where would CFAR rank its workshops on this scale, and how much improvement does CFAR feel like there has been from year to year? Where on this scale would CFAR place other training programs, such as MIT grad school, Landmark Forum, or popular self-help/productivity books like Getting Things Done or How to Win Friends and Influence People? (One could also choose different scale endpoints, if mine are suboptimal.)

  • While discussing goals for 2015, you note that "We created a metric for strategic usefulness, solidly hitting the first goal; we started tracking that metric, solidly hitting the second goal." What does the metric for strategic usefulness look like, and how has CFAR's score on the metric changed from 2012 through now? What would a failure scenario (ie. where CFAR did not achieve this goal) have looked like, and how likely do you think that failure scenario was?

  • CFAR places a lot of emphasis on "epistemic rationality", or the process of discovering truth. What important truths have been discovered by CFAR staff or alumni, which would probably not have been discovered without CFAR, and which were not previously known by any of the staff/alumni (or by popular media outlets)? (If the truths discovered are sensitive, I can post a GPG public key, although I think it would be better to openly publish them if that's practical.)

  • You say that "As our understanding of the art grew, it became clear to us that “figure out true things”, “be effective”, and “do-gooding” weren’t separate things per se, but aspects of a core thing." Could you be more specific about what this cashes out to in concrete terms; ie. what the world would look like if this were true, and what the world would look like if this were false? How strong is the empirical evidence that we live in the first world, and not the second? Historically, adjusted for things we probably can't change (like eg. IQ and genetics), how strong have the correlations been between truth-seeking people like Einstein, effective people like Deng Xiaoping, and do-gooding people like Norman Borlaug?

  • How many CFAR alumni have been accepted into Y Combinator, either as part of a for-profit or a non-profit team, after attending a CFAR workshop?

A Proposed Adjustment to the Astronomical Waste Argument

"Religions partially involve values and I think values are a plausible area for path-dependence."

Please explain the influence that, eg., the theological writings of Peter Abelard, described as "the keenest thinker and boldest theologian of the 12th Century", had on modern-day values that might reasonably have been predictable in advance during his time. And that was only eight hundred years ago, only ten human lifetimes. We're talking about timescales of thousands or millions or billions of current human lifetimes.

"Conceivably, the genetic code, base ten math, ASCII, English language and units, Java, or the Windows operating system might last for trillions of years."

This claim is prima facie preposterous, and Robin presents no arguments for it. Indeed, it is so farcically absurd that it substantially lowers my prior on the accuracy of all his statements, and the fact that you would present it with no evidence except a blunt appeal to authority lowers my prior on yours. To see why, consider, eg., this set of claims about standards lasting two thousand years (a tiny fraction of a comparative eyeblink), and why even that is highly questionable. Or this essay about programming languages a mere hundred years from now, assuming no x-risk and no strong AI and no nanotech.

"For specific examples of changes that I believe could have very broad impact and lead to small, unpredictable positive trajectory changes, I would offer political advocacy of various kinds (immigration liberalization seems promising to me right now), spreading effective altruism, and supporting meta-research."

Do you have any numbers on those? Bostrom's calculations obviously aren't exact, but we can usually get key numbers (eg. # of lives that can be saved with X amount of human/social capital, dedicated to Y x-risk reduction strategy) pinned down to within an order of magnitude or two. You haven't specified any numbers at all for the size of "small, unpredictable positive trajectory changes" in comparison to x-risk, or the cost-effectiveness of different strategies for pursuing them. Indeed, it is unclear how one could come up with such numbers even in theory, since the mechanisms by which such changes cause long-run improved outcomes remain unspecified. Making today's society a nicer place to live is likely worthwhile for all kinds of reasons, but expecting it to have a direct influence on the world a billion years from now seems absurd. The ancient Minoans of merely 3,500 years ago apparently lived very nicely, by the standards of their day. What predictable impacts did that have on us?
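For contrast, here is the sort of arithmetic the x-risk side of the comparison does permit. Every number below is a purely illustrative placeholder (in the spirit of Bostrom's conservative estimates), but each factor can at least be stated, argued about, and bounded:

```python
# All numbers are illustrative placeholders, not claims about actual values.
future_lives_at_stake = 1e16  # conservative Bostrom-style count of potential future lives
risk_reduction = 1e-9         # absolute cut in extinction probability bought by an intervention
cost_usd = 1e8                # cost of that intervention

expected_lives = future_lives_at_stake * risk_reduction
print(f"expected lives saved: {expected_lives:.0e}")                  # 1e+07
print(f"cost per expected life: ${cost_usd / expected_lives:,.0f}")   # $10
```

The point is not these particular values; it's that no analogous multiplication has been offered for "small, unpredictable positive trajectory changes".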

Furthermore, pointing to "political advocacy" as the first thing on the to-do list seems highly suspicious as a signal of bad reasoning somewhere, sorta like learning that your new business partner has offices only in Nigeria. Humans are biased to make everything seem like it's about modern-day politics, even when it's obviously irrelevant, and Cthulhu knows it would be difficult to find any predictable effects of eg. Old Kingdom Egypt dynastic struggles on life now. Political advocacy is also very unlikely to be a low-hanging-fruit area, as huge amounts of human and social capital already go into it, and so the effect of a marginal contribution by any of us is tiny.

A Proposed Adjustment to the Astronomical Waste Argument

The main reason to focus on existential risk generally, and human extinction in particular, is that anything else about posthuman society can be modified by the posthumans (who will be far smarter and more knowledgeable than us) if desired, while extinction can obviously never be undone. For example, any modification to the English language, the American political system, the New York Subway or the Islamic religion will almost certainly be moot in five thousand years, just as changes to Old Kingdom Egypt are moot to us now.

The only exception would be if the changes to post-human society are self-reinforcing, like a tyrannical constitution which is enforced by unbeatable strong nanotech for eternity. However, by Bostrom's definition, such a self-reinforcing black hole would be an existential risk.

Are there any examples of changes to post-human society which a) cannot ever be altered by that society, even when alteration is a good idea, b) represent a significant utility loss, even compared to total extinction, c) are not themselves total or near-total extinction (and are thus not existential risks), and d) we have an ability to predictably effect at least on par with our ability to predictably prevent x-risk? I can't think of any, and this post doesn't provide any examples.
