# ricraz's Shortform

26th Apr 2020, 128 comments
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here. So far my best effort to make that argument has been in the comment thread starting here. Looking back at that thread, I just noticed that a couple of those comments have been downvoted to negative karma. I don't think any of my comments have ever hit negative karma before; I find it particularly sad that the one time it happens is when I'm trying to explain why I think this community is failing at its key goal of cultivating better epistemics.

There are all sorts of arguments to be made here, which I don't have time to lay out in detail. But just step back for a moment. Tens or hundreds of thousands of academics are trying to figure out how the world works, spending their careers putting immense effort into reading, producing, and reviewing papers. Even then, there's a massive replication crisis. And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

It seems to me that maybe this is what a certain stage in the desperate effort to find the truth looks like?

Like, the early stages of intellectual progress look a lot like thinking about different ideas and seeing which ones stand up robustly to scrutiny. Then the best ones can be tested more rigorously and their edges refined through experimentation.

It seems to me like there needs to be some point in the desperate search for truth at which you allow half-formed thoughts and unrefined hypotheses, or else you simply never get to a place where the hypotheses you're creating even brush up against the truth.

Richard_Ngo: In the half-formed thoughts stage, I'd expect to see a lot of literature reviews, agendas laying out problems, and attempts to identify and question fundamental assumptions. I expect that (not blog-post-sized speculation) to be the hard part of the early stages of intellectual progress, and I don't see it right now. Perhaps we can split this into technical AI safety and everything else. Above I'm mostly speaking about "everything else" that Less Wrong wants to solve, since AI safety is now a substantial enough field that its problems need to be solved in more systemic ways.
Matt Goldenberg: I would expect that later in the process. Agendas laying out problems and fundamental assumptions don't spring from nowhere (at least for me); they come from conversations where I'm trying to articulate some intuition and I recognize some underlying pattern. The pattern and structure don't emerge spontaneously; they come from trying to pick around the edges of a thing, get thoughts across, explain my intuitions and see where they break. I think it's fair to say that crystallizing these patterns into a formal theory is a "hard part", but the foundation for making it easy is laid in the floundering and flailing that came before.

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity, do you see Less Wrong as significantly useful, or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

Viliam: Ironically, some people already feel threatened by the high standards here. Setting them higher probably wouldn't result in more good content. It would result in less mediocre content, but probably also less good content, as the authors who sometimes write a mediocre article and sometimes a good one would get discouraged and give up. Ben Pace gives a few examples of great content in the next comment. It would be better to make it easier to separate the good content from the rest, but that's what the reviews are for. Well, only one review so far, if I remember correctly. I would love to see reviews of pre-2018 content (maybe multiple years in one review, if they were less productive). Then I would love to see the winning content get the same treatment as the Sequences -- edit them and arrange them into a book, and make it "required reading" for the community (available as a free PDF).
Zachary Robertson: I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem better adapted to continuing conversations, and they have a low bar for being made. I could clarify a bit. My main problem with low-quality content isn't exactly that it's 'wrong' or something like that. Mostly, the issues I find most common are: 1. too many niche prerequisites; 2. no comments; 3. a nagging feeling that the post is reinventing the wheel. I think one is a ridiculously bad problem. I'm literally getting a PhD in machine learning, write about AI Safety, and still find a large number of those posts (yes, AN posts) glazed in internal jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI-related things. Two is just a tragedy of the rich getting richer. While I'm guilty of this also, I think that requiring authors to also post seed questions/discussion topics in the comments could go a long way to alleviate this problem. I often read a post and want to leave a comment, but then don't because I'm not even sure the author thought about the discussion their post might start. Three is probably a bit mean. Yet, more than once I've discovered a Less Wrong concept already had a large research literature devoted to it. I think this ties in with one, because niche prerequisites often go hand-in-hand with insufficient literature review.
Ben Pace: The top posts in the 2018 Review [https://www.lesswrong.com/posts/3yqf6zJSwBF34Zbys/2018-review-voting-results] are filled with fascinating and well-explained ideas. Many of the new ideas are not settled science, but they're quite original and substantive, or excellent distillations of settled science, and are often the best piece of writing on the internet about their topics. You're wrong about LW epistemic standards not being high enough to make solid intellectual progress; we already have. On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong, and I think a lot of them are brilliant. I'm not saying we can't do far better, or that we're sufficiently good. Many of the examples of success so far are "Things that were in people's heads but didn't have a natural audience to share them with". There's not a lot of collaboration at present, which is why I'm very keen to build the new LessWrong Docs that allows for better draft sharing and inline comments and more. We're working on the tools for editing tags, things like edit histories and so on, that will allow us to build a functioning wiki system to have canonical writeups and explanations that people add to and refine. I want future iterations of the LW Review to have more allowance for incorporating feedback from reviewers. There's lots of work to do, and we're just getting started. But I disagree that the direction isn't "a desperate effort to find the truth". That's what I'm here for.
Even in the last month or two, how do you look at things like this [https://www.lesswrong.com/posts/AHhCrJ2KpTjsCSwbt/inner-alignment-explain-like-i-m-12-edition] and this [https://www.lesswrong.com/posts/xJyY5QkQvNJpZLJRo/radical-probabilism-1] and this [https://www.lesswrong.com/posts/r3NHPD3dLFNk9QE2Y/search-versus-design-1] and this [htt

As mentioned in my reply to Ruby, this is not a critique of the LW team, but of the LW mentality. And I should have phrased my point more carefully - "epistemic standards are too low to make any progress" is clearly too strong a claim, it's more like "epistemic standards are low enough that they're an important bottleneck to progress". But I do think there's a substantive disagreement here. Perhaps the best way to spell it out is to look at the posts you linked and see why I'm less excited about them than you are.

Of the top posts in the 2018 review, and the ones you linked (excluding AI), I'd categorise them as follows:

Interesting speculation about psychology and society, where I have no way of knowing if it's true:

• Local Validity as a Key to Sanity and Civilization
• The Loudest Alarm Is Probably False
• Anti-social punishment (which is, unlike the others, at least based on one (1) study).
• Babble
• Intelligent social web
• Unrolling social metacognition
• Simulacra levels
• Can you keep this secret?

Same as above but it's by Scott so it's a bit more rigorous and much more compelling:

• Is Science Slowing Down?
• The tails coming apart as a metaph
...

Quoting your reply to Ruby below, I agree I'd like LessWrong to be much better at "being able to reliably produce and build on good ideas".

The reliability and focus feel most lacking to me on the building side, rather than the production, which I think we're doing quite well at. I think we've successfully formed a publishing platform that provides an audience who are intensely interested in good ideas around rationality, AI, and related subjects, and a lot of very generative and thoughtful people are writing down their ideas here.

We're low on the ability to connect people up to do more extensive work on these ideas – most good hypotheses and arguments don't get a great deal of follow up or further discussion.

Here are some subjects where I think there's been various people sharing substantive perspectives, but I think there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered:

...

"I see a lot of (very high quality) raw energy here that wants shaping and directing, with the use of lots of tools for coordination (e.g. better collaboration tools)."

Yepp, I agree with this. I guess our main disagreement is whether the "low epistemic standards" framing is a useful way to shape that energy. I think it is because it'll push people towards realising how little evidence they actually have for many plausible-seeming hypotheses on this website. One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

When you say "there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered", I find myself expecting that this will involve people who believe the hypothesis continuing to build their castle in the sky, not analysis of why it might or might not be wrong.

That being said, LW is very good at producing "fake frameworks". So I don't want to discourage this too much. I'm just arguing that this is a different thing from building robust knowledge about the world.

Ben Pace: I will continue to be contrary and say I'm not sure I agree with this. For one, I think in many domains new ideas are really hard to come by, as opposed to making minor progress in the existing paradigms: fundamental theories in physics, a bunch of general insights about intelligence (in neuroscience and AI), etc. Secondly, I am reminded of what Lukeprog wrote in his moral consciousness report, that he wished the various different philosophies of consciousness would stop debating each other, go away for a few decades, then come back with falsifiable predictions. I sometimes take this stance regarding many disagreements of import, such as the basic science vs engineering approaches to AI alignment. It's not obvious to me that the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours; it might instead be for them to go away and work on their ideas for a decade, then come back with lots of fleshed-out details and results that can be more meaningfully debated. I feel similarly about simulacra levels, Embedded Agency, and a bunch of IFS stuff. I would like to see more experimentation and literature reviews where they make sense, but I also feel like these are implicitly making substantive and interesting claims about the world, and I'd just be interested in getting a better sense of what claims they're making, and having them fleshed out and operationalized more. That would be a lot of progress to me, and I think each of them is seeing that sort of work (with Zvi, Abram, and Kaj respectively leading the charges on LW, alongside many others).

I think I'm concretely worried that some of those models / paradigms (and some other ones on LW) don't seem pointed in a direction that leads obviously to "make falsifiable predictions."

And I can imagine worlds where "make falsifiable predictions" isn't the right next step, you need to play around with it more and get it fleshed out in your head before you can do that. But there is at least some writing on LW that feels to me like it leaps from "come up with an interesting idea" to "try to persuade people it's correct" without enough checking.

(In the case of IFS, I think Kaj's sequence is doing a great job of laying it out in a concrete way where it can then be meaningfully disagreed with. But the other people who've been playing around with IFS didn't really seem interested in that, and I feel like we got lucky that Kaj had the time and interest to do so.)

Richard_Ngo: I feel like this comment isn't critiquing a position I actually hold. For example, I don't believe that "the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours". I am happy for people to work towards building evidence for their hypotheses in many ways, including fleshing out details, engaging with existing literature, experimentation, and operationalisation. Perhaps this makes "proven claim" a misleading phrase to use. Perhaps it's more accurate to say: "one fully fleshed out theory is more valuable than a dozen intuitively compelling ideas". But having said that, I doubt that it's possible to fully flesh out a theory like simulacra levels without engaging with a bunch of academic literature and then making predictions. I also agree with Raemon's response below.
John_Maxwell: Depends on the claim, right? If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.
Ben Pace: A housemate of mine said to me they think LW has a lot of breadth, but could benefit from more depth. I think in general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science"), but that our level of coordination and depth is often low. "LessWrongers should collaborate more and go into more depth in fleshing out their ideas" sounds more true to me than "LessWrongers have very low epistemic standards".
In general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science")

"Being more openminded about what evidence to listen to" seems like a way in which we have lower epistemic standards than scientists, and also that's beneficial. It doesn't rebut my claim that there are some ways in which we have lower epistemic standards than many academic communities, and that's harmful.

In particular, the relevant question for me is: why doesn't LW have more depth? Sure, more depth requires more work, but on the timeframe of several years, and hundreds or thousands of contributors, it seems viable. And I'm proposing, as a hypothesis, that LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

John_Maxwell: Your solution to the "willingness to accept ideas even before they've been explored in depth" problem is to explore ideas in more depth. But another solution is to accept fewer ideas, or hold them much more provisionally. I'm a proponent of the second approach because: (1) I suspect even academia doesn't hold ideas as provisionally as it should. See Hamming on expertise: https://forum.effectivealtruism.org/posts/mG6mckPHAisEbtKv5/should-you-familiarize-yourself-with-the-literature-before?commentId=SaXXQXLfQBwJc9ZaK (2) I suspect trying to browbeat people into exploring ideas in more depth works against the grain of an online forum as an institution. Browbeating works in academia because your career is at stake, but in an online forum, it just hurts intrinsic motivation and cuts down on forum use (the forum runs on what Clay Shirky called "cognitive surplus", essentially a term for people's spare time and motivation). I'd say one big problem with LW 1.0 that LW 2.0 had to solve before flourishing was that people felt too browbeaten to post much of anything. If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive. Maybe part of the issue is that on LW, peer review generally happens in the comments after you publish, not before. So there's no publication carrot to offer in exchange for overcoming the objections of peer reviewers.
Richard_Ngo: "If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive." Hmm, it sounds like we agree on the solution but are emphasising different parts of it. For me, the question is: who's this "we" that should accept fewer ideas? It's the set of people who agree with my argument that you shouldn't believe things which haven't been fleshed out very much. But the easiest way to add people to that set is just to make the argument, which is what I've done. Specifically, note that I'm not criticising anyone for producing posts that are short and speculative: I'm criticising the people who update too much on those posts.
John_Maxwell: Fair enough. I'm reminded of a time someone summarized one of my posts as being a definitive argument against some idea X, and me thinking to myself "even I don't think my post definitively settles this issue" haha.
Raemon: Yeah, this is roughly how I think about it. I do think right now LessWrong should lean more in the direction Richard is suggesting – I think it was essential to establish better Babble procedures, but now we're doing well enough on that front that I think setting clearer expectations of how the eventual pruning works is reasonable.
Richard_Ngo: I wanted to register that I don't like "babble and prune" as a model of intellectual development. I think intellectual development actually looks more like: 1. Babble; 2. Prune; 3. Extensive scholarship; 4. More pruning; 5. Distilling scholarship to form common knowledge. My main criticism is the lack of 3 and 5, not the lack of 2 or 4. I also note that: a) these steps get monotonically harder, so that focusing on the first two misses *almost all* the work; b) maybe I'm being too harsh on the babble and prune framework because it's so thematically appropriate for me to dunk on it here; I'm not sure if your use of the terminology actually reveals a substantive disagreement.
Raemon: I basically agree with your 5-step model (I at least agree it's a more accurate description than Babble and Prune, which I just meant as rough shorthand). I'd add things like "original research/empiricism" or "more rigorous theorizing" to the "Extensive Scholarship" step. I see the LW Review as basically the first of (what I agree should essentially be at least) a 5-step process. It's adding a stronger Step 2, and a bit of Step 5 (at least some people chose to rewrite their posts to be clearer and respond to criticism) ... Currently, we do get non-zero Extensive Scholarship and Original Empiricism. (Kaj's Multi-Agent Models of Mind [https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip] seems like it includes real scholarship. Scott Alexander / Eli Tyre and Bucky's exploration into Birth Order Effects seemed like real empiricism.) Not nearly as much as I'd like. But John's comment elsethread [https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip] seems significant: This reminded me of a couple of posts in the 2018 Review, Local Validity as Key to Sanity and Civilization [https://www.lesswrong.com/posts/WQFioaudEH8R7fyhm/local-validity-as-a-key-to-sanity-and-civilization], and Is Clickbait Destroying Our General Intelligence? [https://www.lesswrong.com/posts/YicoiQurNBxSp7a65/is-clickbait-destroying-our-general-intelligence]. Both of those seemed like "sure, interesting hypothesis. Is it real tho?" During the Review I created a followup "How would we check if Mathematicians are Generally More Law Abiding? [https://www.lesswrong.com/posts/9MztEdRLeYcTDvuiZ/how-would-we-check-if-mathematicians-are-generally-more-law]" question, trying to move the question from Stage 2 to 3. I didn't get much serious response, probably because, well, it was a much harder question. But, honestly... I'm not sure it's actually a question that was worth asking.
I'd like to know if Eliezer's hypothesis about mathematicians is true, but I'm not sure it ranks near the top of questions I'd want people to p
John_Maxwell: 1. All else equal, the harder something is, the less we should do it. 2. My quick take is that writing lit reviews/textbooks is a comparative disadvantage of LW relative to the mainstream academic establishment. In terms of producing reliable knowledge... if people actually care about whether something is true, they can always offer a cash prize for the best counterargument (which could of course constitute citation of academic research). The fact that people aren't doing this suggests to me that for most claims on LW, there isn't any (reasonably rich) person who cares deeply about whether the claim is true. I'm a little wary of putting a lot of effort into supply if there is an absence of demand. (I guess the counterargument is that accurate knowledge is a public good, so an individual's willingness to pay doesn't give you the complete picture of the value accurate knowledge brings. Maybe what we need is a way to crowdfund bounties for the best argument related to something.) (I agree that LW authors would ideally engage more with each other and academic literature on the margin.)
AllAmericanBreakfast: I’ve been thinking about the idea of “social rationality” lately, and this is related. We do so much here in the way of training individual rationality: the inputs, functions, and outputs of a single human mind. But if truth is a product, then getting human minds well-coordinated to produce it might be much more important than training them to be individually stronger, just as assembly-line production is much more effective at producing almost anything than teaching each worker to assemble a complete product faster by themselves. My guess is that this could be effective not only in producing useful products, but also in overcoming biases. Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies. Of course, one of the reasons we don’t do that so much is that coordination is an up-front investment and is unfamiliar. Figuring out social technology to make it easier to participate in might be a great project for LW.

There's been a fair amount of discussion of that sort of thing here: https://www.lesswrong.com/tag/group-rationality There are also groups outside LW thinking about social technology such as RadicalxChange.

Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

I'm not sure. If you put those 5 LWers together, I think there's a good chance that the highest status person speaks first and then the others anchor on what they say and then it effectively ends up being like a group project for school with the highest status person in charge. Some related links.

AllAmericanBreakfast: That’s definitely a concern too! I imagine such groups forming among people who already share a basic common view and collaborate to investigate more deeply. That way, any status-anchoring effects are mitigated. Alternatively, it could be an adversarial collaboration. For me personally, some of the SSC essays in this format have led me to change my mind in a lasting way.
curi: People also reject ideas before they've been explored in depth. I've tried to discuss [https://curi.us/2064-less-wrong-lacks-representatives-and-paths-forward] similar issues with LW [https://www.lesswrong.com/posts/oLScLrrfsGps8SduN/less-wrong-lacks-representatives-and-paths-forward] before, but the basic response was roughly "we like chaos where no one pays attention to whether an argument has ever been answered by anyone; we all just do our own thing with no attempt at comprehensiveness or organizing who does what; having organized leadership of any sort, or anyone who is responsible for anything, would be irrational" (plus some suggestions that I'm low social status and that therefore I personally deserve to be ignored. There were also suggestions – phrased rather differently but amounting to this – that LW will listen more if published ideas are rewritten, not to improve on any flaws, but so that the new versions can be published at LW before anywhere else, because the LW community's attention allocation is highly biased towards that).
Ben Pace: I feel somewhat inclined to wrap up this thread at some point, even while there's more to say. We can continue if you like and have something specific or strong you'd like to ask, but otherwise I will pause here.
TAG: You have to realise that what you are doing isn't adequate in order to gain the motivation to do it better, and that is unlikely to happen if you are mostly communicating with other people who think everything is OK.
TAG: LessWrong is competing against philosophy as well as science, and philosophy has broader criteria of evidence still. In fact, LessWrongians are often frustrated that mainstream philosophy takes such topics as dualism or theism seriously, even though there's an abundance of Bayesian evidence for them.
Ruby: (Thanks for laying out your position in this level of depth. Sorry for how long this comment turned out. I guess I wanted to back up a bunch of my agreement with words. It's a comment for the sake of everyone else, not just you.) I think there's something to what you're saying, that the mentality itself could be better. The Sequences have been criticized because Eliezer didn't cite previous thinkers all that much, but at least as far as the science goes, as you said, he was drawing on academic knowledge. I also think we've lost something precious with the absence of epic topic reviews by the likes of Luke. Kaj Sotala still draws heavily on outside knowledge, John Wentworth did a great review on Biological Circuits, and we get SSC crossposts that have that, but otherwise posts aren't heavily referencing or building upon outside stuff. I concede that I would like to see a lot more of that. I think Kaj was rightly disappointed that he didn't get more engagement with his post whose gist was "this is what the science really says about S1 & S2, one of your most cherished concepts, LW community". I wouldn't say the typical approach is strictly bad: there's value in thinking freshly for oneself, and failure to reference previous material shouldn't be a crime or make a text unworthy. But yeah, it'd be pretty cool if after Alkjash laid out Babble & Prune (which intuitively feels so correct), someone had dug through what empirical science we have to see whether the picture lines up. Or heck, actually gone and done some kind of experiment. I bet it would turn up something interesting. And I think what you're saying is that the issue isn't just that people aren't following up with scholarship and empiricism on new ideas and models, but that they're actually forgetting that these are the next steps.
Instead, they're overconfident in our homegrown models, as thou

This is only tangentially relevant, but adding it here as some of you might find it interesting:

Venkatesh Rao has an excellent Twitter thread on why most independent research only reaches this kind of initial exploratory level (he tried it for a bit before moving to consulting). It's pretty pessimistic, but there is a somewhat more optimistic follow-up thread on potential new funding models. Key point is that the later stages are just really effortful and time-consuming, in a way that keeps out a lot of people trying to do this as a side project alongside a separate main job (which I think is the case for a lot of LW contributors?)

Quote from that thread:

Research =

a) long time between having an idea and having something to show for it that even the most sympathetic fellow crackpot would appreciate (not even pay for, just get)

b) a >10:1 ratio of background invisible thinking in notes, dead-ends, eliminating options etc

With a blogpost, it’s like a week of effort at most from idea to mvp, and at most a 3:1 ratio of invisible to visible. That’s sustainable as a hobby/side thing.

To do research-grade thinking you basically have to be independently wealthy and accept 90% d

... (read more)
6Richard_Ngo5moAlso, I liked your blog post! More generally, I strongly encourage bloggers to have a "best of" page, or something that directs people to good posts. I'd be keen to read more of your posts but have no idea where to start.
6drossbucket5moThanks! I have been meaning to add a 'start here' page for a while, so that's good to have the extra push :) Seems particularly worthwhile in my case because a) there's no one clear theme and b) I've been trying a lot of low-quality experimental posts this year bc pandemic trashed motivation, so recent posts are not really reflective of my normal output. For now some of my better posts in the last couple of years might be Cognitive decoupling and banana phones [https://drossbucket.com/2019/10/23/cognitive-decoupling-and-banana-phones/] (tracing back the original precursor of Stanovich's idea), The middle distance [https://drossbucket.com/2019/10/24/the-middle-distance/] (a writeup of a useful and somewhat obscure idea from Brian Cantwell Smith's On the Origin of Objects), and the negative probability post [https://drossbucket.com/2019/08/01/negative-probability/] and its followup.
3Richard_Ngo5moThanks, these links seem great! I think this is a good (if slightly harsh) way of making a similar point to mine: "I find that autodidacts who haven’t experienced institutional R&D environments have a self-congratulatory low threshold for what they count as research. It’s a bit like vanity publishing or fan fiction. This mismatch doesn’t exist as much in indie art, consulting, game dev etc"
2DanielFilan5moAs mentioned in this comment [https://www.lesswrong.com/posts/K4eDzqS2rbcBDsCLZ/unrolling-social-metacognition-three-levels-of-meta-are-not?commentId=vEzubk5Fj8L99mKJq] , the Unrolling social metacognition paper is closely related to at least one research paper.
5Richard_Ngo5moRight, but this isn't mentioned in the post? Which seems odd. Maybe that's actually another example of the "LW mentality": why is the existence of solid empirical research showing that 3 layers aren't enough not considered worth mentioning in a post arguing that 3 layers isn't enough? (Maybe because the post was time-boxed? If so, that seems reasonable, but then I would hope that people comment saying "Here's a very relevant paper, why didn't you cite it?")
7Zachary Robertson5moI think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI-related concepts (literally distill.pub). I think most people would agree that sort of work is important in order to make intellectual progress, but I also think significantly fewer people would agree that distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps even less so once you consider that Less Wrong is often used more as a platform to share these distillations than to create them. I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality, it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be. On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to a second-order approximation you can block LW (noise) and only look at AIAF (signal). I say second-order because anything from LW that is signal ends up being posted on AIAF anyway, which means the method is somewhat error-tolerant. This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

Much of the same is true of scientific journals. Creating a place to share and publish research is a pretty key piece of intellectual infrastructure, especially for researchers to create artifacts of their thinking along the way.

The point about being 'cross-posted' is where I disagree the most.

This is largely original content that counterfactually wouldn't have been published, or occasionally would have been published but to a much smaller audience. What Failure Looks Like wasn't crossposted, and Anna's piece on reality-revealing puzzles wasn't crossposted. I think that Zvi would still have written some posts on mazes and simulacra, but I imagine he writes substantially more content given the cross-posting available for the LW audience. One could perhaps check his blogging frequency over the last few years to see if that tracks. I recall Zhu telling me he wrote his FAQ because LW offered an audience for it, and likely wouldn't have done so otherwise. I love everything Abram writes, and while he did have the Intelligent Agent Foundations Forum, it had a much more concise, technical style, tiny audience, and didn't have the conversational explanations and stories and cartoons that have...

5rohinmshah5moYeah, that's true, though it might have happened at some later point in the future as I got increasingly frustrated by people continuing to cite VNM at me (though probably it would have been a blog post and not a full sequence). Reading through this comment tree, I feel like there's a distinction to be made between "LW / AIAF as a platform that aggregates readership and provides better incentives for blogging", and "the intellectual progress caused by posts on LW / AIAF". The former seems like a clear and large positive of LW / AIAF, which I think Richard would agree with. For the latter, I tend to agree with Richard, though perhaps not as strongly as he does. Maybe I'd put it as, I only really expect intellectual progress from a few people who work on problems full time who probably would have done similar-ish work if not for LW / AIAF (but likely would not have made it public). I'd say this mostly for the AI posts. I do read the rationality posts and don't get a different impression from them, but I also don't think enough about them to be confident in my opinions there.
3Ben Pace5moBy "AN" do you mean the AI Alignment Forum, or "AIAF"?
1Zachary Robertson5moYa, totally messed up that. I meant the AI Alignment Forum or AIAF. I think out of habit I used AN (Alignment Newsletter)
2Ben Pace5moI did suspect you'd confused it with the Alignment Newsletter :)
5Ruby5moThanks for chiming in with this. People criticizing the epistemics is hopefully how we get better epistemics. When the Californian smoke isn't interfering with my cognition as much, I'll try to give your feedback (and Rohin's [https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning?commentId=pYpPnAKrz64ptyRid]) proper attention. I would generally be interested to hear your arguments/models in detail, if you get the chance to lay them out. My default position is that LW has done well enough historically (e.g. Ben Pace's examples) for me to currently be investing in getting it even better. Epistemics and progress could definitely be a lot better, but getting there is hard. If I didn't see much progress on the rate of progress in the next year or two, I'd probably go focus on other things, though I think it'd be tragic if we ever lost what we have now. And another thought: Yes and no. Journal articles have their advantages, and so do blog posts. A lot of the LessWrong team's recent work has been about filling in the missing pieces for the system to work, e.g. Open Questions (hasn't yet worked for coordinating research), Annual Review, Tagging, Wiki. We often talk about conferences and "campus". My work on Open Questions involved thinking about i) a better template for articles than "Abstract, Intro, Methods, etc.", but Open Questions didn't work for unrelated reasons we haven't overcome yet, ii) getting lit reviews done systematically by people, iii) coordinating groups around research agendas. I've thought about re-attempting the goals of Open Questions with a "Research Agenda" feature instead, one that lets people communally maintain research agendas and work on them. It's a question of priorities whether I work on that anytime soon. I do really think many of the deficiencies of LessWrong's current work compared to academia are "infrastructure problems" at least as much as the epistemic standards of the community. 
Which me
7Richard_Ngo5moFor the record, I think the LW team is doing a great job. There's definitely a sense in which better infrastructure can reduce the need for high epistemic standards, but it feels like the thing I'm pointing at is more like "Many LW contributors not even realising how far away we are from being able to reliably produce and build on good ideas" (which feels like my criticism of Ben's position in his comment, so I'll respond more directly there).
5Pongo5moIt seems really valuable to have you sharing how you think we’re falling epistemically short and probably important for the site to integrate the insights behind that view. There are a bunch of ways I disagree with your claims about epistemic best practices, but it seems like it would be cool if I could pass your ITT more. I wish your attempt to communicate the problems you saw had worked out better. I hope there’s a way for you to help improve LW epistemics, but also get that it might be costly in time and energy.
4Viliam5moNow they're positive again. Confusing to me, their Ω-karma (karma on another website) is also positive. Does it mean they previously had negative LW-karma but positive Ω-karma? Or that their Ω-karma also improved as a result of you complaining on LW a few hours ago? Why would it? (Feature request: graph of evolution of comment karma as a function of time.)
2Richard_Ngo5moI'm confused, what is Ω-karma?
3MikkW5moAI Alignment Forum karma (which is also displayed here on posts that are crossposted)
1NaiveTortoise5moI'd be curious what, if any, communities you think set good examples in this regard. In particular, are there specific academic subfields or non-academic scenes that exemplify the virtues you'd like to see more of?
5Richard_Ngo5moMaybe historians of the industrial revolution? Who grapple with really complex phenomena and large-scale patterns, like us, but unlike us use a lot of data, write a lot of thorough papers and books, and then have a lot of ongoing debate on those ideas. And then the "progress studies" crowd is an example of an online community inspired by that tradition (but still very nascent, so we'll see how it goes). More generally I'd say we could learn to be more rigorous by looking at any scientific discipline or econ or analytic philosophy. I don't think most LW posters are in a position to put in as much effort as full-time researchers, but certainly we can push a bit in that direction.
3NaiveTortoise5moThanks for your reply! I largely agree with drossbucket [https://www.lesswrong.com/posts/FuGfR3jL3sw6r8kB4/ricraz-s-shortform?commentId=YkN5oaCnFuwZmJnJ5]'s reply. I also wonder how much this is an incentives problem. As you mentioned, and in my experience, the fields you listed strongly incentivize an almost fanatical level of thoroughness that I suspect is very hard for individuals to maintain without outside incentives pushing them that way. At least personally, I definitely struggle and, frankly, mostly fail to live up to the sorts of standards you mention when writing blog posts, in part because the incentive gradient feels like it pushes towards hitting the publish button. Given this, I wonder if there's a way to shift the incentives on the margin. One minor thing I've been thinking of trying for my personal writing is having a Knuth or Nintil [https://nintil.com/prove-wrong-get-money] style "pay for mistakes" policy. Do you have thoughts on other incentive structures for rewarding rigor or punishing the lack thereof?
6Richard_Ngo5moIt feels partly like an incentives problem, but also I think a lot of people around here are altruistic and truth-seeking and just don't realise that there are much more effective ways to contribute to community epistemics than standard blog posts. I think that most LW discussion is at the level where "paying for mistakes" wouldn't be that helpful, since a lot of it is fuzzy. Probably the thing we need first are more reference posts that distill a range of discussion into key concepts, and place that in the wider intellectual context. Then we can get more empirical. (Although I feel pretty biased on this point, because my own style of learning about things is very top-down). I guess to encourage this, we could add a "reference" section for posts that aim to distill ongoing debates on LW. In some cases you can get a lot of "cheap" credit by taking other people's ideas and writing a definitive version of them aimed at more mainstream audiences. For ideas that are really worth spreading, that seems useful.

The crucial heuristic I apply when evaluating AI safety research directions is: could we have used this research to make humans safe, if we were supervising the human evolutionary process? And if not, do we have a compelling story for why it'll be easier to apply to AIs than to humans?

Sometimes this might be too strict a criterion, but I think in general it's very valuable in catching vague or unfounded assumptions about AI development.

2adamShimi1moBy making humans safe, do you mean with regard to evolution's objective?
2Richard_Ngo1moNo. I meant: suppose we were rerunning a simulation of evolution, but can modify some parts of it (e.g. evolution's objective). How do we ensure that whatever intelligent species comes out of it is safe in the same ways we want AGIs to be safe? (You could also think of this as: how could some aliens overseeing human evolution have made humans safe by those aliens' standards of safety? But this is a bit trickier to think about because we don't know what their standards are. Although presumably current humans, being quite aggressive and having unbounded goals, wouldn't meet them).
4adamShimi1moOkay, thanks. Could you give me an example of a research direction that passes this test? The thing I have in mind right now is pretty much everything that backchains to local search [https://www.lesswrong.com/posts/qEjh8rpxjG4qGtfuK/the-backchaining-to-local-search-technique-in-ai-alignment], but maybe that's not the way you think about it.
2Richard_Ngo1moSo I think Debate is probably the best example of something that makes a lot of sense when applied to humans, to the point where they're doing human experiments on it already. But this heuristic is actually a reason why I'm pretty pessimistic about most safety research directions.
4adamShimi1moSo I've been thinking about this for a while, and I think I disagree with what I understand of your perspective. Which might obviously mean I misunderstand your perspective. What I think I understand is that you judge safety research directions based on how well they could work on an evolutionary process like the one that created humans. But for me, the most promising approach to AGI is based on local search, which differs a bit from an evolutionary process. I don't really see a reason to consider evolutionary processes instead of local search, and even then, the specific approach of evolution for humans is probably far too specific as a test bench. This matters because problems for one are not problems for the other. For example, one way to mess with an evolutionary process is to find a way for everything to survive and reproduce/disseminate. Technology in general did that for humans, which means the evolutionary pressure decreased as technology evolved. But that's not a problem for local search, since at each step there will be only one next program. On the other hand, local search might be dangerous because of things like gradient hacking [https://www.alignmentforum.org/posts/uXH4r6MmKPedk8rMA/gradient-hacking], which don't make sense for evolutionary processes. In conclusion, I feel for the moment that backchaining to local search [https://www.lesswrong.com/posts/qEjh8rpxjG4qGtfuK/the-backchaining-to-local-search-technique-in-ai-alignment] is a better heuristic for judging safety research directions. But I'm curious about where our disagreement lies on this issue.
8Richard_Ngo1moOne source of our disagreement: I would describe evolution as a type of local search. The difference is that it's local with respect to the parameters of a whole population, rather than an individual agent. So this does introduce some disanalogies, but not particularly significant ones (to my mind). I don't think it would make much difference to my heuristic if we imagined that humans had evolved via gradient descent over our genes instead. In other words, I like the heuristic of backchaining to local search, and I think of it as a subset of my heuristic. The thing it's missing, though, is that it doesn't tell you which approaches will actually scale up to training regimes which are incredibly complicated, applied to fairly intelligent agents. For example, impact penalties make sense in a local search context for simple problems. But to evaluate whether they'll work for AGIs, you need to apply them to massively complex environments. So my intuition is that, because I don't know how to apply them to the human ancestral environment, we also won't know how to apply them to our AGIs' training environments. Similarly, when I think about MIRI's work on decision theory, I really have very little idea how to evaluate it in the context of modern machine learning. Are decision theories the type of thing which AIs can learn via local search? Seems hard to tell, since our AIs are so far from general intelligence. But I can reason much more easily about the types of decision theories that humans have, and the selective pressures that gave rise to them. As a third example, my heuristic endorses Debate due to a high-level intuition about how human reasoning works, in addition to a low-level intuition about how it can arise via local search.
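A minimal sketch of the idea that evolution is local search over a population's parameters rather than a single agent's. Everything here is an illustrative assumption, not anyone's actual model: the toy `fitness` function, the Gaussian mutations, and the use of a simple evolution strategy (move the population mean to the average of its fittest offspring) as a stand-in for biological evolution. Both procedures only ever take small steps from where they currently are; they differ in whether the thing being nudged is one parameter vector or the centre of a population.

```python
import random

def fitness(x):
    # Toy objective, assumed for illustration: maximised at x = [3, 3].
    return -sum((xi - 3.0) ** 2 for xi in x)

def hill_climb(x, steps, sigma, rng):
    """Local search over one agent's parameters: keep a mutation only if it helps."""
    for _ in range(steps):
        candidate = [xi + rng.gauss(0, sigma) for xi in x]
        if fitness(candidate) > fitness(x):
            x = candidate
    return x

def evolve(mean, steps, sigma, pop_size, n_parents, rng):
    """Local search over a population: sample offspring around the current
    population mean, keep the fittest few, and move the mean to their average
    (a simple evolution strategy, standing in for selection)."""
    for _ in range(steps):
        offspring = [[m + rng.gauss(0, sigma) for m in mean] for _ in range(pop_size)]
        offspring.sort(key=fitness, reverse=True)
        parents = offspring[:n_parents]
        mean = [sum(p[i] for p in parents) / n_parents for i in range(len(mean))]
    return mean

rng = random.Random(0)
start = [0.0, 0.0]
print(hill_climb(start, steps=500, sigma=0.3, rng=rng))
print(evolve(start, steps=200, sigma=0.3, pop_size=20, n_parents=5, rng=rng))
```

Both runs should end up near [3, 3]; the point of the comparison is only that swapping "one agent" for "population mean" keeps the search local.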
4adamShimi1moSo if I try to summarize your position, it's something like: backchain to local search for simple and single-AI cases, and then think about aligning humans for the scaled and multi-agent version? That makes much more sense, thanks! I also definitely see why your full heuristic doesn't feel immediately useful to me: because I mostly focus on the simple and single-AI case. But I've been thinking more and more (in part thanks to your writing) that I should allocate more thinking time to the more general case. I hope your heuristic will help me there.
4Richard_Ngo1moCool, glad to hear it. I'd clarify the summary slightly: I think all safety techniques should include at least a rough intuition for why they'll work in the scaled-up version, even when current work on them only applies them to simple AIs. (Perhaps this was implicit in your summary already, I'm not sure.)

A well-known analogy from Yann LeCun: if machine learning is a cake, then unsupervised learning is the cake itself, supervised learning is the icing, and reinforcement learning is the cherry on top.

I think this is useful for framing my core concerns about current safety research:

• If we think that unsupervised learning will produce safe agents, then why will the comparatively small contributions of SL and RL make them unsafe?
• If we think that unsupervised learning will produce dangerous agents, then why will safety techniques which focus on SL and RL (i.e. basically all of them) work, when they're making comparatively small updates to agents which are already misaligned?

I do think it's more complicated than I've portrayed here, but I haven't yet seen a persuasive response to the core intuition.

2steve21521moI wrote a few posts on self-supervised learning last year: * https://www.lesswrong.com/posts/SaLc9Dv5ZqD73L3nE/the-self-unaware-ai-oracle * https://www.lesswrong.com/posts/EMZeJ7vpfeF4GrWwm/self-supervised-learning-and-agi-safety * https://www.lesswrong.com/posts/L3Ryxszc3X2J7WRwt/self-supervised-learning-and-manipulative-predictions I'm not aware of any airtight argument that "pure" self-supervised learning systems, either generically or with any particular architecture, are safe to use, to arbitrary levels of intelligence, though it seems very much worth someone trying to prove or disprove that. For my part, I got distracted by other things and haven't thought about it much since then. The other issue is whether "pure" self-supervised learning systems would be capable enough to satisfy our AGI needs, or to safely bootstrap to systems that are. I go back and forth on this. One side of the argument I wrote up here [https://www.lesswrong.com/posts/AKtn6reGFm5NBCgnd/in-defense-of-oracle-tool-ai-research]. The other side is, I'm now (vaguely) thinking that people need a reward system to decide what thoughts to think, and the fact that GPT-3 doesn't need reward is not evidence of reward being unimportant but rather evidence that GPT-3 is nothing like an AGI [https://www.lesswrong.com/posts/SkcM4hwgH3AP6iqjs/can-you-get-agi-from-a-transformer]. Well, maybe. For humans, self-supervised learning forms the latent representations, but the reward system controls action selection. It's not altogether unreasonable to think that action selection, and hence reward, is a more important thing to focus on for safety research. AGIs are dangerous when they take dangerous actions, to a first appr

In a bayesian rationalist view of the world, we assign probabilities to statements based on how likely we think they are to be true. But truth is a matter of degree, as Asimov points out. In other words, all models are wrong, but some are less wrong than others.

Consider, for example, the claim that evolution selects for reproductive fitness. Well, this is mostly true, but there's also sometimes group selection, and the claim doesn't distinguish between a gene-level view and an individual-level view, and so on...

So just assigning it a single probability seems inadequate. Instead, we could assign a probability distribution over its degree of correctness. But because degree of correctness is such a fuzzy concept, it'd be pretty hard to connect this distribution back to observations.

Or perhaps the distinction between truth and falsehood is sufficiently clear-cut in most everyday situations for this not to be a problem. But questions about complex systems (including, say, human thoughts and emotions) are messy enough that I expect the difference between "mostly true" and "entirely true" to often be significant.

Has this been discussed before? Given Less Wrong's name, I'd be surprised if not, but I don't think I've stumbled across it.

6habryka5dThis feels generally related to the problems covered in Scott and Abram's research over the past few years. One of the sentences that stuck out to me the most was (roughly paraphrased since I don't want to look it up): I.e. our current formulations of bayesianism, like Solomonoff induction, only formulate the idea of a hypothesis at such a low level that even trying to think about a single hypothesis rigorously is basically impossible with bounded computational time. So in order to actually think about anything you have to somehow move beyond naive bayesianism.
2Richard_Ngo5dThis seems reasonable, thanks. But I note that "in order to actually think about anything you have to somehow move beyond naive bayesianism" is a very strong criticism. Does this invalidate everything that has been said about using naive bayesianism in the real world? E.g. every instance where Eliezer says "be bayesian". One possible answer is "no, because logical induction fixes the problem". My uninformed guess is that this doesn't work because there are comparable problems with applying it to the real world. But if this is your answer, a follow-up question: before we knew about logical induction, were the injunctions to "be bayesian" justified? (Also, for historical reasons, I'd be interested in knowing when you started believing this.)
2habryka4dI think it definitely changed a bunch of stuff for me, and does at least a bit invalidate some of the things that Eliezer said, though not actually very much. In most of his writing Eliezer used bayesianism as an ideal that was obviously unachievable, but that still gives you a rough sense of what the actual limits of cognition are, and rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal. I did definitely get confused for a while and tried to apply Bayes to everything directly, and then felt bad when I couldn't actually apply Bayes' theorem in some situations, which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot. My shift on this happened over the last 2-3 years or so. I think it started with Embedded Agency, but maybe a bit before that.
2Richard_Ngo4dWhich ones? In Against Strong Bayesianism [https://www.lesswrong.com/posts/5aAatvkHdPH6HT3P9/against-strong-bayesianism] I give a long list of methods of cognition that are clearly in conflict with the theoretical ideal, but in practice are obviously fine. So I'm not sure how we distinguish what's ruled out from what isn't. Can you give an example of a real-world problem where logical uncertainty doesn't matter a lot, given that without logical uncertainty, we'd have solved all of mathematics and considered all the best possible theories in every other domain?
2habryka4dI think in practice there are lots of situations where you can confidently create a kind of pocket-universe where you can actually consider hypotheses in a bayesian way. Concrete example: trying to figure out who voted a specific way on a LW post. You can condition pretty cleanly on vote-strength, and treat people's votes as roughly independent, so if you have guesses on how different people are likely to vote, it's pretty easy to create the odds ratios for basically all final karma + vote numbers and then make a final guess based on that. It's clear that there is some simplification going on here, by assigning static probabilities for people's vote behavior, treating them as independent (though modeling some subset of independence wouldn't be too hard), etc. But overall I expect it to perform pretty well and to give you good answers. (Note, I haven't actually done this explicitly, but my guess is my brain is doing something pretty close to this when I do see vote numbers + karma numbers on a thread.) Well, it's obvious that anything that claims to be better than the ideal bayesian update is clearly ruled out. E.g. arguments that by writing really good explanations of a phenomenon you can get to a perfect understanding. Or arguments that you can derive the rules of physics from first principles. There are also lots of hypotheticals where you do get to just use Bayes properly and then it provides very strong bounds on the ideal approach. There are a good number of implicit models behind lots of standard statistics models that when put into a bayesian framework give rise to a more general formulation. See the Wikipedia article for "Bayesian interpretations of regression" for a number of examples. 
Of course, in reality it is always unclear whether the assumptions that give rise to various regression methods actually hold, but I think you can totally say things like "given these assumption, the bayesian solution is the ideal one, and you can't perform better than
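The vote-guessing example above can be sketched as exact Bayesian inference in a tiny "pocket universe". Everything below is hypothetical: the voter names, vote strengths and prior vote probabilities are made up, and (as in the comment) votes are treated as independent with fixed per-person probabilities. The sketch enumerates every joint assignment of votes, keeps the ones consistent with the observed karma and vote count, and normalises. Enumeration is exponential in the number of voters, so this only works for small threads.

```python
from itertools import product

# Hypothetical voters: name -> (vote strength, {vote: prior probability}).
# Votes are +1 (upvote), -1 (downvote) or 0 (no vote). All numbers are made up.
VOTERS = {
    "alice": (2, {+1: 0.6, -1: 0.1, 0: 0.3}),
    "bob":   (1, {+1: 0.2, -1: 0.5, 0: 0.3}),
    "carol": (1, {+1: 0.4, -1: 0.2, 0: 0.4}),
}

def posterior_votes(observed_karma, observed_count, voters=VOTERS):
    """Posterior over each voter's vote given final karma and number of votes,
    assuming independent votes with fixed per-person priors."""
    names = list(voters)
    joint = {}  # vote assignment -> unnormalised probability
    for assignment in product([+1, -1, 0], repeat=len(names)):
        karma = sum(voters[n][0] * v for n, v in zip(names, assignment))
        count = sum(1 for v in assignment if v != 0)
        if karma != observed_karma or count != observed_count:
            continue  # likelihood 0: inconsistent with what we observed
        p = 1.0
        for n, v in zip(names, assignment):
            p *= voters[n][1][v]
        joint[assignment] = p
    total = sum(joint.values())
    if total == 0:
        raise ValueError("observation is impossible under this model")
    marginals = {n: {+1: 0.0, -1: 0.0, 0: 0.0} for n in names}
    for assignment, p in joint.items():
        for n, v in zip(names, assignment):
            marginals[n][v] += p / total
    return marginals

# Karma 1 from 2 votes: alice must have upvoted, and either bob or carol downvoted.
print(posterior_votes(1, 2))
```

With these made-up priors, bob is the likelier downvoter (about 0.77 versus 0.23 for carol), because he was more inclined to downvote and carol was more likely to abstain.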
2Raemon4dAre you able to give examples of the times you tried to be Bayesian and it failed because of embeddedness?
1EpicNamer270985dScott and Abram? Who? Do they have any books I can read to familiarize myself with this discourse?
2habryka4dScott: https://lesswrong.com/users/scott-garrabrant Abram: https://lesswrong.com/users/abramdemski
2Richard_Ngo5dScott Garrabrant and Abram Demski, two MIRI researchers. For introductions to their work, see the Embedded Agency sequence [https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh], the Consequences of Logical Induction sequence [https://www.alignmentforum.org/s/HmANELvkhAZ9eDxFS], and the Cartesian Frames sequence [https://www.alignmentforum.org/s/2A7rrZ4ySx6R8mfoT].
5DanielFilan5dRelated but not identical: this shortform post [https://www.lesswrong.com/posts/WgMhovN7Gs6Jpn3PH/shortform?commentId=KMFtzECWfB5TJxkXP] .
2Zack_M_Davis5dSee the section about scoring rules in the Technical Explanation [https://www.yudkowsky.net/rational/technical].
2Richard_Ngo5dHmmm, but what does this give us? He talks about the difference between vague theories and technical theories, but then says that we can use a scoring rule to change the probabilities we assign to each type of theory. But my question is still: when you increase your credence in a vague theory, what are you increasing your credence about? That the theory is true? Nor can we say that it's about picking the "best theory" out of the ones we have, since different theories may overlap partially.
2Zack_M_Davis4dIf we can quantify how good a theory is at making accurate predictions (or rather, quantify a combination of accuracy and simplicity [https://www.lesswrong.com/posts/mB95aqTSJLNR9YyjH/message-length]), that gives us a sense in which some theories are "better" (less wrong) than others, without needing theories to be "true".
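A small sketch of this point, with made-up numbers. It scores two hypothetical theories of the same weather observations with the logarithmic scoring rule (log base 2, so scores are in bits), then adds an assumed cost in bits for stating each theory, in the spirit of a two-part message length. Neither theory needs to be "true" for one to come out better than the other; the model costs (1 bit and 5 bits) are arbitrary illustrative assumptions.

```python
import math

def log_score(predictions, outcomes):
    """Sum of log2 probabilities a theory assigned to what actually happened.
    predictions: list of dicts mapping outcome -> probability.
    Closer to 0 is better."""
    return sum(math.log2(p[o]) for p, o in zip(predictions, outcomes))

def message_length(model_bits, predictions, outcomes):
    """Two-part code length: bits to state the theory plus bits to encode the data with it."""
    return model_bits - log_score(predictions, outcomes)

obs = ["rain", "rain", "rain", "dry"]
vague = [{"rain": 0.5, "dry": 0.5}] * len(obs)   # hypothetical vague theory
sharp = [{"rain": 0.9, "dry": 0.1}] * len(obs)   # hypothetical sharper theory

print(log_score(vague, obs), log_score(sharp, obs))
print(message_length(1, vague, obs), message_length(5, sharp, obs))
```

With these numbers the sharper theory predicts the data slightly better (about -3.78 bits versus -4 bits) but loses once its assumed extra complexity is charged for; different assumed costs would change the verdict.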

Oracle-genie-sovereign is a really useful distinction that I think I (and probably many others) have avoided using mainly because "genie" sounds unprofessional/unacademic. This is a real shame, and a good lesson for future terminology.

4adamShimi2moAfter rereading the chapter in Superintelligence, it seems to me that "genie" captures something akin to act-based agents [https://ai-alignment.com/act-based-agents-8ec926c79e9c]. Do you think that's the main way to use this concept in the current state of the field, or do you have other applications in mind?
2Richard_Ngo2moAh, yeah, that's a great point. Although I think act-based agents is a pretty bad name, since those agents may often carry out a whole bunch of acts in a row; in fact, I think that's what made me overlook the fact that it's pointing at the right concept. So I'm not sure if I'm comfortable using it going forward, but thanks for pointing that out.
2DanielFilan2moPerhaps the lesson is that terminology that is acceptable in one field (in this case philosophy) might not be suitable in another (in this case machine learning).
4Richard_Ngo1moI don't think that even philosophers take the "genie" terminology very seriously. I think the more general lesson is something like: it's particularly important to spend your weirdness points wisely when you want others to copy you, because they may be less willing to spend weirdness points.
1adamShimi2moIs that from Superintelligence? I googled it, and that was the most convincing result.
2Richard_Ngo1moYepp.

I suspect that AIXI is misleading to think about in large part because it lacks reusable parameters - instead it just memorises all inputs it's seen so far. Which means the setup doesn't have episodes, or a training/deployment distinction; nor is any behaviour actually "reinforced".

4DanielFilan2moI kind of think the lack of episodes makes it more realistic for many problems, but admittedly not for simulated games. Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism. [EDIT: I retract the second sentence]
2DanielFilan2moActually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.
2Richard_Ngo2moWait, really? I thought it made sense (although I'd contend that most people don't think about AIXI in terms of those TMs reinforcing hypotheses, which is the point I'm making). What's incorrect about it?
2DanielFilan2moWell now I'm less sure that it's incorrect. I was originally imagining that like in Solomonoff induction, the TMs basically directly controlled AIXI's actions, but that's not right: there's an expectimax. And if the TMs reinforce actions by shaping the rewards, in the AIXI formalism you learn that immediately and throw out those TMs.
2Richard_Ngo2moOh, actually, you're right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.
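To make the expectimax point concrete, here is a drastically simplified caricature of AIXI, not AIXI itself: a finite set of hypothetical deterministic environments (each mapping an action to a reward) instead of all computable environments, a one-step horizon instead of full expectimax over the future, and made-up description lengths turned into 2^-length prior weights. It shows the two pieces discussed above: the agent picks the action with the highest posterior-expected reward, and any deterministic hypothesis whose predicted reward disagrees with what was observed immediately gets weight zero.

```python
# Hypothetical deterministic environments: action -> reward.
HYPOTHESES = {
    "left_pays": lambda a: 1 if a == "left" else 0,
    "right_pays": lambda a: 1 if a == "right" else 0,
    "nothing_pays": lambda a: 0,
}
# Made-up description lengths in bits; prior weight is 2 ** -length.
PRIOR_LENGTHS = {"left_pays": 3, "right_pays": 4, "nothing_pays": 2}

def prior_weights(lengths):
    return {h: 2.0 ** -n for h, n in lengths.items()}

def choose_action(weights, hypotheses, actions):
    """One-step expectimax: maximise posterior-expected reward."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("every hypothesis has been ruled out")

    def expected_reward(a):
        return sum(w * hypotheses[h](a) for h, w in weights.items()) / total

    return max(actions, key=expected_reward)

def update(weights, hypotheses, action, observed_reward):
    """Bayesian update for deterministic hypotheses: mispredictions get weight 0."""
    return {h: (w if hypotheses[h](action) == observed_reward else 0.0)
            for h, w in weights.items()}

w = prior_weights(PRIOR_LENGTHS)
print(choose_action(w, HYPOTHESES, ["left", "right"]))  # prints left
w = update(w, HYPOTHESES, "left", 0)  # suppose the true environment is right_pays
print(choose_action(w, HYPOTHESES, ["left", "right"]))  # prints right
```

A hypothesis that tried to "reinforce" going left by promising reward there is discarded as soon as that reward fails to show up, which is the sense in which the formalism throws such hypotheses out immediately.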
4steve21522moHumans don't have a training / deployment distinction either... Do humans have "reusable parameters"? Not quite sure what you mean by that.
6Richard_Ngo2moYes, we do: training is our evolutionary history, deployment is an individual lifetime. And our genomes are our reusable parameters. Unfortunately I haven't yet written any papers/posts really laying out this analogy, but it's pretty central to the way I think about AI, and I'm working on a bunch of related stuff as part of my PhD, so hopefully I'll have a more complete explanation soon.
2steve21522moOh, OK, I see what you mean. Possibly related: my comment here [https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines?commentId=TM84D4Jofq4fWdBuK] .

I've recently discovered waitwho.is, which collects all the online writing and talks of various tech-related public intellectuals. It seems like an important and previously-missing piece of infrastructure for intellectual progress online.

I believe that humans have already crossed a threshold that, in a certain sense, puts us on an equal footing with any other being who has mastered abstract reasoning. There’s a notion in computing science of “Turing completeness”, which says that once a computer can perform a set of quite basic operations, it can be programmed to do absolutely any calculation that any other computer can do. Other computers might be faster, or have more memory, or have multiple processors running at the same time, but my 1988 A
...

Equivocation. "Who's 'we', flesh man?" Even granting the necessary millions or billions of years for a human to sit down and emulate a superintelligence step by step, it is still not the human who understands, but the Chinese room.

1NaiveTortoise4moI've seen this quote before and always find it funny because when I read Greg Egan, I constantly find myself thinking there's no way I could've come up with the ideas he has even if you gave me months or years of thinking time.
3gwern4moYes, there's something to that, but you have to be careful if you want to use that as an objection. Maybe you wouldn't easily think of it, but that doesn't exclude the possibility of you doing it: you can come up with algorithms you can execute which would spit out Egan-like ideas, like 'emulate Egan's brain neuron by neuron'. (If nothing else, there's always the ol' dovetail-every-possible-Turing-machine hammer.) Most of these run into computational complexity problems, but that's the escape hatch Egan (and Scott Aaronson has made a similar argument) leaves himself by caveats like 'given enough patience, and a very large notebook'. Said patience might require billions of years, and the notebook might be the size of the Milky Way galaxy, but those are all finite numbers, so technically Egan is correct as far as that goes.
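The "dovetail-every-possible-Turing-machine hammer" is a standard trick: interleave the execution of infinitely many programs so that each one gets unboundedly many steps, without waiting for any single program to finish first. A minimal Python sketch, assuming programs are modelled as generators produced by a hypothetical `program(i)` (a stand-in for "the i-th Turing machine") that yields once per step and returns its output when it halts:

```python
def program(i):
    """Hypothetical stand-in for the i-th Turing machine: it runs for
    i * i steps, then halts with output i. Program 0 halts immediately."""
    for _ in range(i * i):
        yield  # one computation step
    return i


def dovetail(make_program, max_rounds):
    """In round n, start program n, then advance every started,
    still-running program by one step. A program that halts after k
    steps is therefore finished by round n + k, i.e. in finite time."""
    running = {}
    halted = []  # (program index, output), in the order they halt
    for n in range(max_rounds):
        running[n] = make_program(n)
        for i in list(running):
            try:
                next(running[i])
            except StopIteration as stop:
                halted.append((i, stop.value))
                del running[i]
    return halted
```

Here program i halts at round i + i², so 20 rounds collect the outputs of programs 0 through 3. The schedule never gets stuck on a non-halting program, but the number of rounds needed grows without bound, which is exactly where caveats like "given enough patience, and a very large notebook" do their work.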
1NaiveTortoise4moYeah, good point: given a generous enough interpretation of the notebook, my rejection doesn't hold. It's still hard for me to imagine that response feeling meaningful in the context, but maybe I'm just failing to model others well here.

There's some possible world in which the following approach to interpretability works:

• Put an AGI in a bunch of situations where it sometimes is incentivised to lie and sometimes is incentivised to tell the truth.
• Train a lie detector which is given all its neural weights as input.
• Then ask the AGI lots of questions about its plans.

One problem that this approach would face if we were using it to interpret a human is that the human might not consciously be aware of what their motivations are. For example, they may believe they are doing something for altr...

I've heard people argue that "most" utility functions lead to agents with strong convergent instrumental goals. This obviously depends a lot on how you quantify over utility functions. Here's one intuition in the other direction. I don't expect this to be persuasive to most people who make the argument above (but I'd still be interested in hearing why not).

If a non-negligible percentage of an agent's actions are random, then to describe it as a utility-maximiser would require an incredibly complex utility function (becaus...

4TurnTrout9moI'm not sure if you consider me to be making that argument [https://www.lesswrong.com/posts/6DuJxY8X45Sco4bS2/seeking-power-is-instrumentally-convergent-in-mdps], but here are my thoughts: I claim that most reward functions lead to agents with strong convergent instrumental goals. However, I share your intuition that (somehow) uniformly sampling utility functions over universe-histories might not lead to instrumental convergence.
To understand instrumental convergence and power-seeking, consider how many of the reward functions we might specify automatically imply a causal mechanism for increasing reward. The structure of the reward function implies that more is better, and that there are mechanisms for repeatedly earning points (for example, by showing itself a high-scoring input). Since the reward function is "simple" (there's usually not a way to grade exact universe histories), these mechanisms work in many different situations and points in time. It's naturally incentivized to assure its own safety in order to best leverage these mechanisms for gaining reward. Therefore, we shouldn't be surprised to see a lot of these simple goals leading to the same kind of power-seeking behavior.
What structure is implied by a reward function?
* Additive/Markovian: while a utility function might be over an entire universe-history, reward is often additive over time steps. This is a strong constraint which I don't always expect to be true, but I think that among the goals with this structure, a greater proportion of them have power-seeking incentives.
* Observation-based: while a utility function might be over an entire universe-history, the atom of the reward function is the observation. Perhaps the observation is an input to update a world model, over which we have tried to define a reward function. I think that most ways of doing this lead to power-seeking incentives.
* Agent-centric: reward functions are defined with respect to what the agent ca
5Richard_Ngo9moI've just put up a post [https://www.lesswrong.com/posts/5aAatvkHdPH6HT3P9/against-bayesianism] which serves as a broader response to the ideas underpinning this type of argument.
2Richard_Ngo9moI think this depends a lot on how you model the agent developing. If you start off with a highly intelligent agent which has the ability to make long-term plans, but doesn't yet have any goals, and then you train it on a random reward function - then yes, it probably will develop strong convergent instrumental goals. On the other hand, if you start off with a randomly initialised neural network, and then train it on a random reward function, then probably it will get stuck in a local optimum pretty quickly, and never learn to even conceptualise these things called "goals". I claim that when people think about reward functions, they think too much about the former case, and not enough about the latter. Because while it's true that we're eventually going to get highly intelligent agents which can make long-term plans, it's also important that we get to control what reward functions they're trained on up to that point. And so plausibly we can develop intelligent agents that, in some respects, are still stuck in "local optima" in the way they think about convergent instrumental goals - i.e. they're missing whatever cognitive functionality is required for being ambitious on a large scale.
2TurnTrout9moAgreed – I should have clarified. I've been mostly discussing instrumental convergence with respect to optimal policies. The path through policy space is also important.
4Richard_Ngo9moMakes sense. For what it's worth, I'd also argue that thinking about optimal policies at all is misguided (e.g. what's the optimal policy for humans - the literal best arrangement of neurons we could possibly have for our reproductive fitness? Probably we'd be born knowing arbitrarily large amounts of information. But this is just not relevant to predicting or modifying our actual behaviour at all).
2TurnTrout9moI disagree.
1. We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1] Even if we don't expect the trained agents to reach the optimal policy in the real world, we should still understand what behavior is like at optimum. If you think your proposal is not aligned at optimum but is aligned for realistic training paths, you should have a strong story for why.
2. Formal theorizing about instrumental convergence with respect to optimal behavior is strictly easier than theorizing about ϵ-optimal behavior, which I think is what you want for a more realistic treatment of instrumental convergence for real agents. Even if you want to think about sub-optimal policies, if you don't understand optimal policies... good luck! Therefore, we also have an instrumental (...) interest in studying the behavior at optimum.
[1] At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point.
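To make "the tabular algorithms are proven" concrete, here is a minimal sketch of tabular Q-learning on a toy three-state chain. The environment, rewards, step size and number of updates are illustrative assumptions, not anything from the thread. Because the chain is deterministic and every state-action pair keeps being sampled, the constant-step-size updates converge to the optimal Q-values, and the greedy policy matches the one found by value iteration.

```python
import random

GAMMA = 0.9
STATES, ACTIONS = (0, 1, 2), (0, 1)  # action 0 = left, action 1 = right


def step(s, a):
    """Toy deterministic chain (assumed for illustration):
    reward 1 only for pressing 'right' at the right-hand end."""
    if a == 1:
        return min(s + 1, 2), (1.0 if s == 2 else 0.0)
    return max(s - 1, 0), 0.0


def q_learning(n_updates=20000, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_updates):
        # Sample a state-action pair uniformly so that every pair is
        # updated infinitely often in the limit (a convergence condition).
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        s2, r = step(s, a)
        target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def value_iteration(n_iters=500):
    """Synchronous Bellman backups on the known model, for comparison."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(n_iters):
        q = {(s, a): r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
             for s in STATES for a in ACTIONS
             for s2, r in [step(s, a)]}
    return q


def greedy(q):
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

The guarantee is specific to the tabular setting: one table entry per state-action pair, which is exactly what the footnote says nobody uses for real systems.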
2Richard_Ngo9mo1. I think it's more accurate to say that, because approximately none of the non-trivial theoretical results hold for function approximation, approximately none of our non-trivial agents are proven to eventually converge to the optimal policy. (Also, given the choice between an algorithm without convergence proofs that works in practice, and an algorithm with convergence proofs that doesn't work in practice, everyone will use the former). But we shouldn't pay any attention to optimal policies anyway, because the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute.
2. I think theorizing about ϵ-optimal behavior is more useful than theorizing about optimal behaviour by roughly ϵ, for roughly the same reasons. But in general, clearly I can understand things about suboptimal policies without understanding optimal policies. I know almost nothing about the optimal policy in StarCraft, but I can still make useful claims about AlphaStar (for example: it's not going to take over the world).
Again, let's try to cash this out. I give you a human - or, say, the emulation of a human, running in a simulation of the ancestral environment. Is this safe? How do you make it safer? What happens if you keep selecting for intelligence? I think that the theorising you talk about will be actively harmful for your ability to answer these questions.
2TurnTrout9moI'm confused, because I don't disagree with any specific point you make - just the conclusion. Here's my attempt at a disagreement which feels analogous to me: My response in this "debate" is: if you start with a spherical cow and then consider which real world differences are important enough to model, you're better off than just saying "no one should think about spherical cows". I don't understand why you think that. If you can have a good understanding of instrumental convergence and power-seeking for optimal agents, then you can consider whether any of those same reasons apply for suboptimal humans. Considering power-seeking for optimal agents is a relaxed problem [https://www.lesswrong.com/posts/JcpwEKbmNHdwhpq5n/problem-relaxation-as-a-tactic] . Yes, ideally, we would instantly jump to the theory that formally describes power-seeking for suboptimal agents with realistic goals in all kinds of environments. But before you do that, a first step is understanding power-seeking in MDPs [https://www.lesswrong.com/posts/6DuJxY8X45Sco4bS2/seeking-power-is-provably-instrumentally-convergent-in-mdps] . Then, you can take formal insights from this first step and use them to update your pre-theoretic intuitions where appropriate.
5Richard_Ngo9moThanks for engaging despite the opacity of the disagreement. I'll try to make my position here much more explicit (and apologies if that makes it sound brusque). The fact that your model is a simplified abstract model is not sufficient to make it useful. Some abstract models are useful. Some are misleading and will cause people who spend time studying them to understand the underlying phenomenon less well than they did before. From my perspective, I haven't seen you give arguments that your models are in the former category not the latter. Presumably you think they are in fact useful abstractions - why? (A few examples of the latter: behaviourism, statistical learning theory, recapitulation theory [https://en.wikipedia.org/wiki/Recapitulation_theory], Gettier-style analysis of knowledge). My argument for why they're overall misleading: when I say that "the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute", or that safety researchers shouldn't think about AIXI, I'm not just saying that these are inaccurate models. I'm saying that they are modelling fundamentally different phenomena than the ones you're trying to apply them to. AIXI is not "intelligence", it is brute force search, which is a totally different thing that happens to look the same in the infinite limit. Optimal tabular policies are not skill at a task, they are a cheat sheet, but they happen to look similar in very simple cases. Probably the best example of what I'm complaining about is Ned Block trying to use Blockhead [https://en.wikipedia.org/wiki/Blockhead_(thought_experiment)] to draw conclusions about intelligence. I think almost everyone around here would roll their eyes hard at that. But then people turn around and use abstractions that are just as unmoored from reality as Blockhead, often in a very analogous way. 
(This is less a specific criticism of you, TurnTrout, and more a general criticism of the field). Forgive
4TurnTrout9moThanks for elaborating this interesting critique. I agree we generally need to be more critical of our abstractions. Falsifying claims and "breaking" proposals is a classic element of AI alignment discourse and debate. Since we're talking about superintelligent agents, we can't predict exactly what a proposal would do. However, if I make a claim ("a superintelligent paperclip maximizer would keep us around because of gains from trade"), you can falsify this by showing that my claimed policy is dominated by another class of policies ("we would likely be comically resource-inefficient in comparison; GFT arguments don't model dynamics which allow killing other agents and appropriating their resources"). Even we can come up with this dominant policy class, so the posited superintelligence wouldn't miss it either. We don't know what the superintelligent policy will be, but we know what it won't be (see also Formalizing convergent instrumental goals [https://intelligence.org/2015/11/26/new-paper-formalizing-convergent-instrumental-goals/] ). Even though I don't know how Garry Kasparov will open the game, I confidently predict that he won't let me checkmate him in two moves.
NON-OPTIMAL POWER AND INSTRUMENTAL CONVERGENCE
Instead of thinking about optimal policies, let's consider the performance of a given algorithm $A$. $A(M,R)$ takes a rewardless MDP $M$ and a reward function $R$ as input, and outputs a policy.
Definition. Let $\mathcal{R}$ be a continuous distribution over reward functions with CDF $F$. The average return achieved by algorithm $A$ at state $s$ and discount rate $\gamma$ is $\int_{\mathcal{R}} V^{A(M,R)}_{R}(s,\gamma)\,dF(R)$.
Instrumental convergence with respect to $A$'s policies can be defined similarly ("what is the $\mathcal{R}$-measure of a given trajectory under $A$?"). The theory I've laid out allows precise claims, which is a modest benefit to our understanding. Before, we just had intuitions about some vague concept called "instrumental convergence".
Here's bad reasoning, which implies that the cow tears a hole in spac
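The average-return definition can be estimated by Monte Carlo: sample reward functions $R$ from $\mathcal{R}$, run $A$ on each, and average $V^{A(M,R)}_{R}(s,\gamma)$. A minimal sketch under stated assumptions: a toy deterministic rewardless MDP (the `SUCCESSORS` graph is invented for illustration), state-based rewards drawn i.i.d. uniform on [0, 1] as $\mathcal{R}$, and value iteration standing in for $A$ (so this particular $A$ happens to output an optimal policy; any other policy-producing function could be swapped in).

```python
import random

GAMMA = 0.9
# Toy rewardless MDP M (an assumption): deterministic successor lists.
# State 1 is a dead end that can only loop on itself; the other states
# keep several options open.
SUCCESSORS = {0: [1, 2], 1: [1], 2: [2, 3], 3: [0, 3]}


def algorithm_a(successors, reward, iters=100):
    """Stand-in for A(M, R): value iteration, then the greedy policy."""
    v = {s: 0.0 for s in successors}
    for _ in range(iters):
        v = {s: reward[s] + GAMMA * max(v[t] for t in successors[s])
             for s in successors}
    return {s: max(successors[s], key=lambda t: v[t]) for s in successors}


def policy_value(successors, reward, policy, s, iters=100):
    """V^policy_R(s, gamma) for a deterministic policy, by iteration."""
    v = {x: 0.0 for x in successors}
    for _ in range(iters):
        v = {x: reward[x] + GAMMA * v[policy[x]] for x in successors}
    return v[s]


def average_return(s, n_samples=2000, seed=0):
    """Monte Carlo estimate of the integral over R ~ iid Uniform[0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        reward = {x: rng.random() for x in SUCCESSORS}
        policy = algorithm_a(SUCCESSORS, reward)
        total += policy_value(SUCCESSORS, reward, policy, s)
    return total / n_samples
```

In this toy graph the dead-end state 1 should average about 0.5 / (1 - 0.9) = 5, while state 0, which can still choose where to go, averages more. That is the intuition behind average return as a measure of power, though a four-state example illustrates the definition rather than proving anything general.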
4Richard_Ngo9moI'm afraid I'm mostly going to disengage here, since it seems more useful to spend the time writing up more general + constructive versions of my arguments, rather than critiquing a specific framework. If I were to sketch out the reasons I expect to be skeptical about this framework if I looked into it in more detail, it'd be something like:
1. Instrumental convergence isn't training-time behaviour, it's test-time behaviour. It isn't about increasing reward, it's about achieving goals (that the agent learned by being trained to increase reward).
2. The space of goals that agents might learn is very different from the space of reward functions. As a hypothetical, maybe it's the case that neural networks are just really good at producing deontological agents, and really bad at producing consequentialists. (E.g., if it's just really really difficult for gradient descent to get a proper planning module working). Then agents trained on almost all reward functions will learn to do well on them without developing convergent instrumental goals. (I expect you to respond that being deontological won't get you to optimality. But I would say that talking about "optimality" here ruins the abstraction, for reasons outlined in my previous comment).
2TurnTrout9moI was actually going to respond, "that's a good point, but (IMO) a different concern than the one you initially raised". I see you making two main critiques.
1. (paraphrased) "A won't produce optimal policies for the specified reward function [even assuming alignment generalization off of the training distribution], so your model isn't useful" – I replied to this critique above.
2. "The space of goals that agents might learn is very different from the space of reward functions." I agree this is an important part of the story. I think the reasonable takeaway is "current theorems on instrumental convergence [https://arxiv.org/abs/1912.01683] help us understand what superintelligent A won't do, assuming no reward-result gap. Since we can't assume alignment generalization, we should keep in mind how the inductive biases of gradient descent affect the eventual policy produced."
I remain highly skeptical of the claim that applying this idealized theory of instrumental convergence worsens our ability to actually reason about it.
ETA: I read some information you privately messaged me, and I see why you might see the above two points as a single concern.
2Pattern9moIs the point that people try to use algorithms which they think will eventually converge to the optimal policy? (Assuming there is one.)
2TurnTrout9moSomething like that, yeah.
2DanielFilan5moI object to the claim that agents that act randomly can be made "arbitrarily simple". Randomness is basically definitionally complicated!
2Richard_Ngo5moEh, this seems a bit nitpicky. It's arbitrarily simple given a call to a randomness oracle, which in practice we can approximate pretty easily. And it's "definitionally" easy to specify as well: "the function which, at each call, returns true with 50% likelihood and false otherwise."
2DanielFilan5moIf you get an 'external' randomness oracle, then you could define the utility function pretty simply in terms of the outputs of the oracle. If the agent has a pseudo-random number generator (PRNG) inside it, then I suppose I agree that you aren't going to be able to give it a utility function that has the standard set of convergent instrumental goals, and PRNGs can be pretty short. (Well, some search algorithms are probably shorter, but I bet they have higher Kt complexity, which is probably a better measure for agents.)
2Vaniver9moI'd take a different tack here, actually; I think this depends on what the input to the utility function is. If we're only allowed to look at 'atomic reality', or the raw actions the agent takes, then I think your analysis goes through, that we have a simple causal process generating the behavior but need a very complicated utility function to make a utility-maximizer that matches the behavior. But if we're allowed to decorate the atomic reality with notes like "this action was generated randomly", then we can have a utility function that's as simple as the generator, because it just counts up the presence of those notes. (It doesn't seem to me like this decorator is meaningfully more complicated than the thing that gave us "agents taking actions" as a data source, so I don't think I'm paying too much here.) This can lead to a massive explosion in the number of possible utility functions (because there's a tremendous number of possible decorators), but I think this matches the explosion that we got by considering agents that were the outputs of causal processes in the first place. That is, consider reasoning about python code that outputs actions in a simple game, where there are many more possible python programs than there are possible policies in the game.
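The decorator move can be sketched directly: if each action carries a provenance note, a utility function that just counts "generated randomly" notes is about as short as the random generator itself, and the random agent maximises it. A minimal Python sketch; the `Action` record, the `"random"`/`"planned"` tags and both agents are illustrative assumptions, not an established formalism.

```python
import random
from dataclasses import dataclass

MOVES = ("left", "right", "stay")


@dataclass(frozen=True)
class Action:
    move: str
    provenance: str  # the decorator note, e.g. "random" or "planned"


def random_agent(rng):
    """The simple generator: pick a move uniformly at random."""
    return Action(rng.choice(MOVES), provenance="random")


def planner_agent():
    """A competing agent that always plays the same move deliberately."""
    return Action(MOVES[0], provenance="planned")


def decorated_utility(history):
    """Utility over the decorated history: count the 'random' notes.
    Its description does not grow with the number of observed actions."""
    return sum(1 for a in history if a.provenance == "random")
```

The cost shows up elsewhere: every possible generator needs its own kind of note, which is the explosion in the space of utility functions discussed in the reply.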
2Richard_Ngo9moSo in general you can't have utility functions that are as simple as the generator, right? E.g. the generator could be deontological. In which case your utility function would be complicated. Or it could be random, or it could choose actions by alphabetical order, or... And so maybe you can have a little note for each of these. But now what it sounds like is: "I need my notes to be able to describe every possible cognitive algorithm that the agent could be running". Which seems very very complicated. I guess this is what you meant by the "tremendous number" of possible decorators. But if that's what you need to do to keep talking about "utility functions", then it just seems better to acknowledge that they're broken as an abstraction. E.g. in the case of python code, you wouldn't do anything analogous to this. You would just try to reason about all the possible python programs directly. Similarly, I want to reason about all the cognitive algorithms directly.
2Vaniver9moThat's right. I realized my grandparent comment is unclear here: This should have been "consequence-desirability-maximizer" or something, since the whole question is "does my utility function have to be defined in terms of consequences, or can it be defined in terms of arbitrary propositions?". If I want to make the deontologist-approximating Innocent-Bot, I have a terrible time if I have to specify the consequences that correspond to the bot being innocent and the consequences that don't, but if you let me say "Utility = 0 - badness of sins committed" then I've constructed a 'simple' deontologist. (At least, about as simple as the bot that says "take random actions that aren't sins", since both of them need to import the sins library.) In general, I think it makes sense to not allow this sort of elaboration of what we mean by utility functions, since the behavior we want to point to is the backwards assignment of desirability to actions based on the desirability of their expected consequences, rather than the expectation of any arbitrary property. --- Actually, I also realized something about your original comment which I don't think I had the first time around; if by "some reasonable percentage of an agent's actions are random" you mean something like "the agent does epsilon-exploration" or "the agent plays an optimal mixed strategy", then I think it doesn't at all require a complicated utility function to generate identical behavior. Like, in the rock-paper-scissors world, and with the simple function 'utility = number of wins', the expected utility maximizing move (against tough competition) is to throw randomly, and we won't falsify the simple 'utility = number of wins' hypothesis by observing random actions. 
Instead I read it as something like "some unreasonable percentage of an agent's actions are random", where the agent is performing some simple-to-calculate mixed strategy that is either suboptimal or only optimal by luck (when the optimal mixed strat
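The rock-paper-scissors point can be checked directly. With utility = number of wins and an opponent who best-responds to your mixed strategy, your expected wins per round equal the smallest of your three action probabilities (you beat rock only by playing paper, and so on), so the uniform mixture is the unique maximiser of the worst case. A minimal sketch using exact fractions over a grid of mixed strategies; the grid resolution is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = tuple(BEATS)


def expected_wins(mix, opponent_move):
    """Probability that our mixed strategy beats a fixed opponent move."""
    return sum(p for move, p in mix.items() if BEATS[move] == opponent_move)


def worst_case_wins(mix):
    """A tough opponent picks the move that minimises our win rate."""
    return min(expected_wins(mix, o) for o in MOVES)


def best_mixes(resolution=12):
    """Search all mixes whose probabilities are multiples of 1/resolution."""
    grid = []
    for i, j in product(range(resolution + 1), repeat=2):
        if i + j <= resolution:
            k = resolution - i - j
            probs = (Fraction(i, resolution), Fraction(j, resolution),
                     Fraction(k, resolution))
            grid.append(dict(zip(MOVES, probs)))
    best = max(worst_case_wins(m) for m in grid)
    return best, [m for m in grid if worst_case_wins(m) == best]
```

Any pure strategy scores 0 in the worst case, so an observer who sees random-looking throws has no reason to reject the simple "utility = number of wins" hypothesis.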
4Richard_Ngo9moThis is in fact the intended reading, sorry for ambiguity. Will edit. But note that there are probably very few situations where exploring via actual randomness is best; there will almost always be some type of exploration which is more favourable. So I don't think this helps. To be pedantic: we care about "consequence-desirability-maximisers" (or in Rohin's terminology, goal-directed agents) because they do backwards assignment. But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect. What do you mean by optimal here? The robot's observed behaviour will be optimal for some utility function, no matter how long you run it.
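The claim that observed behaviour is optimal for some utility function, no matter how long you run it, has a one-line construction: give utility 1 to exactly the observed history and 0 to everything else. A minimal Python sketch (the action alphabet and the seeded "random" trajectory are arbitrary examples) makes the cost visible: the utility function has to store the whole history, so its description grows with the behaviour it rationalises.

```python
import random
from itertools import product


def rationalising_utility(observed):
    """Build U over action histories with U(observed) = 1, else 0.
    The closure stores the entire observed history."""
    target = tuple(observed)
    return lambda history: 1 if tuple(history) == target else 0


def is_optimal(history, utility, actions):
    """True if no history of the same length scores higher."""
    best = max(utility(h) for h in product(actions, repeat=len(history)))
    return utility(history) == best


rng = random.Random(0)
actions = ("a", "b", "c")
observed = [rng.choice(actions) for _ in range(6)]  # a "random" agent's trajectory
u = rationalising_utility(observed)
print(is_optimal(observed, u, actions))  # True
```

This is the trivial sense in which everything is a utility maximiser: the construction always succeeds, which is exactly why it does no predictive work.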
2Vaniver9moValid point. This also seems right. Like, my understanding of what's going on here is we have:
* 'central' consequence-desirability-maximizers, where there's a simple utility function that they're trying to maximize according to the VNM axioms
* 'general' consequence-desirability-maximizers, where there's a complicated utility function that they're trying to maximize, which is selected because it imitates some other behavior
The first is a narrow class, and depending on how strict you are with 'maximize', quite possibly no physically real agents will fall into it. The second is a universal class, which instantiates the 'trivial claim' that everything is utility maximization. Put another way, the first is what happens if you hold utility fixed / keep utility simple, and then examine what behavior follows; the second is what happens if you hold behavior fixed / keep behavior simple, and then examine what utility follows. Distance from the first is what I mean by "the further a robot's behavior is from optimal"; I want to say that I should have said something like "VNM-optimal" but actually I think it needs to be closer to "simple utility VNM-optimal." I think you're basically right in calling out a bait-and-switch that sometimes happens, where anyone who wants to talk about the universality of expected utility maximization in the trivial 'general' sense can't get it to do any work, because it should all add up to normality, and in normality there's a meaningful distinction between people who sort of pursue fuzzy goals and ruthless utility maximizers.