All Posts

Sorted by Magic (New & Upvoted)

Monday, December 9th 2019

No posts for December 9th 2019

Sunday, December 8th 2019

No posts for December 8th 2019
Shortform [Beta]
28 · BrienneYudkowsky · 9h
Suppose you wanted to improve your social relationships on the community level. (I think of this as “my ability to take refuge in the sangha”.) What questions might you answer now, and then again in one year, to track your progress? Here’s what’s come to mind for me so far. I’m probably missing a lot and would really like your help mapping things out. I think it’s a part of the territory I can only just barely perceive at my current level of development.
* If something tragic happened to you, such as a car crash that partially paralyzed you or the death of a loved one, how many people can you name whom you'd find it easy and natural to ask for help with figuring out your life afterward?
* For how many people is it the case that if they were hospitalized for at least a week you would visit them in the hospital?
* Over the past month, how lonely have you felt?
* In the past two weeks, how often have you collaborated with someone outside of work?
* To what degree do you feel like your friends have your back?
* Describe the role of community in your life.
* How do you feel as you try to describe the role of community in your life?
* When's the last time you got angry with someone and confronted them one on one as a result?
* When's the last time you apologized to someone?
* How strong is your sense that you're building something of personal value with the people around you?
* When's the last time you spent more than ten minutes on something that felt motivated by gratitude?
* When a big change happens in your life, such as losing your job or having a baby, how motivated do you feel to share the experience with others?
* When you feel motivated to share an experience with others, how satisfied do you tend to be with your attempts to do that?
* Do you know the love languages of your five closest friends? To what extent does that influence how you behave toward them?
* Does it seem to you that your friends know your love
4 · AlexMennen · 1d
Theorem: Fuzzy beliefs (as in https://www.alignmentforum.org/posts/Ajcq9xWi2fmgn8RBJ/the-credit-assignment-problem#X6fFvAHkxCPmQYB6v ) form a continuous DCPO. (At least I'm pretty sure this is true. I've only given proof sketches so far.)
The relevant definitions: A fuzzy belief over a set X is a concave function ϕ:ΔX→[0,1] such that sup(ϕ)=1 (where ΔX is the space of probability distributions on X). Fuzzy beliefs are partially ordered by ϕ≤ψ⟺∀μ∈ΔX:ϕ(μ)≥ψ(μ). The inequalities reverse because we want to think of "more specific"/"less fuzzy" beliefs as "greater", and these are the functions with lower values; the most specific/least fuzzy beliefs are ordinary probability distributions, which are represented as the concave hull of the function assigning 1 to that probability distribution and 0 to all others; these should be the maximal fuzzy beliefs. Note that, because of the order-reversal, the supremum of a set of functions refers to their pointwise infimum. A DCPO (directed-complete partial order) is a partial order in which every directed subset has a supremum. In a DCPO, define x<<y to mean that for every directed set D with supD≥y, ∃d∈D such that d≥x. A DCPO is continuous if for every y, y=sup{x∣x<<y}.
Lemma: Fuzzy beliefs are a DCPO. Proof sketch: Given a directed set D, (supD)(μ)=min{d(μ)∣d∈D} is concave, and {μ∣(supD)(μ)=1}=⋂d∈D{μ∣d(μ)=1}. Each of the sets in that intersection is non-empty, hence so are finite intersections of them since D is directed, and hence so is the whole intersection since ΔX is compact.
Lemma: ϕ<<ψ iff {μ∣ψ(μ)=1} is contained in the interior of {μ∣ϕ(μ)=1} and for every μ such that ψ(μ)≠1, ϕ(μ)>ψ(μ). Proof sketch: If supD≥ψ, then ⋂d∈D{μ∣d(μ)=1}⊆{μ∣ψ(μ)=1}, so by compactness of ΔX and directedness of D, there should be d∈D such that {μ∣d(μ)=1}⊆int({μ∣ϕ(μ)=1}). Similarly, for each μ such that ψ(μ)≠1, there should be dμ∈D s
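The key relations above, restated in display math (purely a transcription of the definitions already given, nothing new):

```latex
% Transcription of the definitions in the shortform above:
\[
  \phi \le \psi \;\Longleftrightarrow\; \forall \mu \in \Delta X:\ \phi(\mu) \ge \psi(\mu)
\]
\[
  x \ll y \;\Longleftrightarrow\; \text{for every directed } D \text{ with } \sup D \ge y,\ \exists\, d \in D \text{ such that } d \ge x
\]
\[
  \text{(continuity)}\qquad \forall y:\quad y = \sup\{\, x \mid x \ll y \,\}
\]
```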
3 · [comment deleted] · 15h

Saturday, December 7th 2019

No posts for December 7th 2019
Shortform [Beta]
11 · Adam Scholl · 2d
So apparently Ötzi the Iceman [https://en.wikipedia.org/wiki/%C3%96tzi] still has a significant amount [https://sci-hub.tw/https://link.springer.com/article/10.1007/s00018-013-1360-y] of brain tissue. Conceivably some memories are preserved?
10 · Raemon · 2d
In response to lifelonglearner's comment I did some experimenting with making the page a bit bolder. Curious what people think of this screenshot where "unread" posts are bold, and "read" posts are "regular" (as opposed to the current world, where "unread" posts are "regular", and read posts are light-gray).

Friday, December 6th 2019

No posts for December 6th 2019
Shortform [Beta]
42 · BrienneYudkowsky · 2d
Some advice to my past self about autism: Learn about what life is like for people with a level 2 or 3 autism diagnosis. Use that reference class to predict the nature of your problems and the strategies that are likely to help. Only after making those predictions, adjust for your own capabilities and circumstances. Try this regardless of how you feel about calling yourself autistic or seeking a diagnosis. Just see what happens.
Many stereotypically autistic behaviors are less like symptoms of an illness, and more like excellent strategies for getting shit done and having a good life. It’s just hard to get them all working together. Try leaning into those behaviors and see what’s good about them. For example, you know how when you accidentally do something three times in a row, you then feel compelled to keep doing it the same way at the same time forever? Studying this phenomenon in yourself will lead you to build solid and carefully designed routines that allow you to be a lot more reliably vibrant.
You know how some autistic people have one-on-one aides, caretakers, and therapists who assist in their development and day-to-day wellbeing? Read a bit about what those aides do. You’ll notice right away that the state of the art in this area is crap, but try to imagine what professional autism aides might do if they really had things figured out and were spectacular at their jobs. Then devote as many resources as you can spare for a whole year to figuring out how to perform those services for yourself.
It seems to me that most of what’s written about autism by neurotypicals severely overemphasizes social stuff. You’ll find almost none of it compelling. Try to understand what’s really going on with autism, and your understanding will immediately start paying off in non-social quality of life improvements. Keep at it, and it’ll eventually start paying off in deep and practical social insights as well (which I know you don’t care about right now, but it’s true). I
21 · Raemon · 3d
Over in this thread, Said asked [https://www.lesswrong.com/posts/5zSbwSDgefTvmWzHZ/affordance-widths#iM4Jfa3ThJcFii2Pm] the reasonable question "who exactly is the target audience with this Best of 2018 book?" I'm working on a post that goes into a bit more detail about the Review Phase, and, to be quite honest, the whole process is a bit in flux – I expect us (the LW team as well as site participants) to learn, over the course of the review process, what aspects of it are most valuable. But, a quick "best guess" answer for now. I see the overall review process as having two "major phases":
* Phase 1: Nomination/Review/Voting/Post-that-summarizes-the-voting
* Phase 2: Compilation and Publication
I think the first phase should be oriented entirely around "internal consumption" – figuring out what epistemic standard to hold ourselves to, and how, so that we can do better in the future. (As well as figuring out what ideas we've developed that should be further built upon). Any other benefits are incidental. The final book/sequence is at least somewhat externally facing. I do expect it to be some people's first introduction to LessWrong, and other people's "one thing they read from LW this year". And at least some consideration should be given to those people's reading experience (which will be lacking a lot of context). But my guess is that should come more in the form of context-setting editor commentary than in decisions about what to include. I think “here are the fruits of our labors; take them and make use of them” is more of what I was aiming for. (Although "what standards are we internally holding ourselves to, and what work should we build towards?" is still an important function of the finished product). It'd be nice if people were impressed, but a better frame for that goal is "Outsiders looking in can get an accurate picture of how productive our community is, and what sort of things we do", and maybe they are impressed by that or maybe not. (I re

Thursday, December 5th 2019

No posts for December 5th 2019
Shortform [Beta]
8 · BrienneYudkowsky · 3d
Thread on The Abolition of Man by C. S. Lewis
8 · Raemon · 4d
After this week's stereotypically sad experience with the DMV.... (spent 3 hours waiting in lines, filling out forms, finding out I didn't bring the right documentation, going to get the right documentation, taking a test, finding out somewhere earlier in the process a computer glitched and I needed to go back and start over, waiting more, finally getting to the end only to learn I was also missing another piece of identification which rendered the whole process moot) ...and having just looked over a lot of 2018 posts [https://www.lesswrong.com/nominations] investigating coordination failure... I find myself wondering if it's achievable to solve one particular way in which bureaucracy is terrible: the part where each node/person in the system only knows a small number of things, so you have to spend a lot of time rehashing things, and meanwhile can't figure out if your goal is actually achievable. (While attempting to solve this problem, it's important to remember that at least some of the inconvenience of bureaucracy may be an active ingredient [https://slatestarcodex.com/2018/08/30/bureaucracy-as-active-ingredient/] rather than inefficiency. But at least in this case it didn't seem so: driver's licenses aren't a conserved resource that the DMV wants to avoid handing out. If I had learned early on that I couldn't get my license last Monday it would have not only saved me time, but saved DMV employee hassle.) I think most of the time there's just no incentive to really fix this sort of thing (while you might have saved DMV employee hassle, you probably wouldn't save them time, since they still just work the same 8 hour shift regardless. And if you're the manager of a DMV you probably don't care too much about your employees having slightly nicer days.) But, I dunno man, really!? Does it seem like at least Hot New Startups could be sold on software that, I dunno, tracks all the requirements of a bureaucratic process and tries to compile "will this work?" at sta
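A minimal sketch of what "compiling" a bureaucratic process up front could look like, assuming the process is modeled as a dependency graph (every name and requirement below is hypothetical, not the DMV's actual rules):

```python
# Hypothetical sketch: model a bureaucratic process as a dependency graph and
# check it up front, so an applicant learns immediately whether their goal is
# achievable with the documents they actually have.
requirements = {
    "drivers_license": {"identity_proof", "residency_proof", "written_test"},
    "written_test": {"appointment"},
}

def missing_prerequisites(goal, have, requirements):
    """Return the set of missing leaf prerequisites for `goal` (empty set = achievable)."""
    missing = set()
    for req in requirements.get(goal, set()):
        if req not in have:
            if req in requirements:
                # A requirement may itself be obtainable via further sub-requirements.
                missing |= missing_prerequisites(req, have, requirements)
            else:
                missing.add(req)
    return missing

print(missing_prerequisites("drivers_license", {"appointment", "residency_proof"}, requirements))
# -> {'identity_proof'}: the applicant finds out before spending three hours in line.
```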
4 · AABoyles · 3d
Attention Conservation Warning: I envision a model which would demonstrate something obvious, and decide the world probably wouldn't benefit from its existence. The standard publication bias is that we must be 95% certain a described phenomenon exists before a result is publishable (at which time it becomes sufficiently "confirmed" to treat the phenomenon as a factual claim). But the statistical confidence of a phenomenon conveys interesting and useful information regardless of what that confidence is. Consider the space of all possible relationships: most of these are going to be absurd (e.g. the relationship between number of minted pennies and number of atoms in moons of Saturn), and exhibit no correlation. Some will exhibit weak correlations (in the range of p = 0.5). Those are still useful evidence that a pathway to a common cause exists! The universal prior on random relationships should be roughly zero, because most relationships will be absurd. What would science look like if it could make efficient use of the information disclosed by presently unpublishable results? I think I can generate a sort of agent-based model to imagine this. Here's the broad outline:
1. Create a random DAG representing some complex related phenomena.
2. Create an agent which holds beliefs about the relationship between nodes in the graph, and updates its beliefs when it discovers a correlation with p > 0.95.
3. Create a second agent with the same belief structure, but which updates on every experiment regardless of the correlation.
4. On each iteration have each agent select two nodes in the graph, measure their correlation, and update their beliefs. Then have them compute the DAG corresponding to their current belief matrix. Measure the difference between the DAG they output and the original DAG created in step 1.
I believe that both agents will converge on the correct DAG, but the un-publication-biased agent will converge much more rapidly. There a
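A rough sketch of how steps 1-4 could be wired up, reading the "95% certain" threshold as p < 0.05 and using linear Gaussian data as a stand-in for the "complex related phenomena" (all modeling choices here are illustrative, not from the outline above):

```python
# Hypothetical sketch of the two-agent model: one agent only updates on
# "publishable" correlations (p < 0.05), the other updates on every result.
# Both try to recover the true graph's adjacency structure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_nodes, n_samples = 8, 200

# 1. Random DAG: upper-triangular edge matrix, then simulate linear Gaussian data.
true_adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.3, k=1)
weights = true_adj * rng.normal(0, 1, (n_nodes, n_nodes))
data = np.zeros((n_samples, n_nodes))
for j in range(n_nodes):
    data[:, j] = data @ weights[:, j] + rng.normal(0, 1, n_samples)

# 2./3. Two belief matrices: "publication-biased" vs. "updates on everything".
belief_biased = np.zeros((n_nodes, n_nodes))
belief_full = np.zeros((n_nodes, n_nodes))

# 4. Each iteration: pick a pair, measure correlation, update, compare to truth.
true_related = true_adj | true_adj.T
for _ in range(2000):
    i, j = rng.choice(n_nodes, size=2, replace=False)
    r, p = stats.pearsonr(data[:, i], data[:, j])
    belief_full[i, j] = belief_full[j, i] = abs(r)          # uses all evidence
    if p < 0.05:
        belief_biased[i, j] = belief_biased[j, i] = abs(r)  # only "significant" results

def error(belief, threshold=0.2):
    # Crude comparison: threshold beliefs into an adjacency guess, count mismatches.
    return np.abs((belief > threshold).astype(float) - true_related).sum()

print("biased agent error:", error(belief_biased), "| full agent error:", error(belief_full))
```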
2 · Chris_Leong · 4d
EDT agents handle Newcomb's problem as follows: they observe that agents who encounter the problem and one-box do better on average than those who encounter the problem and two-box, so they one-box. That's the high-level description, but let's break it down further. Unlike CDT, EDT doesn't worry about the fact that there may be a correlation between your decision and hidden state. It assumes that if the visible state before you made your decision is the same, then the counterfactuals generated by considering your possible decisions are comparable. In other words, any differences in hidden state, such as you being a different agent or money being placed in the box, are attributed to your decision (see my previous discussion here [https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/shortform#yKRZgXjt3qvzpWQEr])
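A toy numerical version of that high-level description, with an illustrative 99%-accurate predictor and the usual $1,000,000 / $1,000 payoffs (numbers chosen for illustration only):

```python
# Toy EDT calculation for Newcomb's problem: condition on the action and
# compare expected payoffs among agents who take that action.
accuracy = 0.99          # illustrative predictor accuracy
BOX_B, BOX_A = 1_000_000, 1_000

# EDT conditions on "agents who choose this action"; the predictor's hidden
# state is correlated with the choice, so it shows up in the conditional.
ev_one_box = accuracy * BOX_B + (1 - accuracy) * 0
ev_two_box = accuracy * BOX_A + (1 - accuracy) * (BOX_B + BOX_A)

print(f"one-box: {ev_one_box:,.0f}  two-box: {ev_two_box:,.0f}")
# one-box: 990,000  two-box: 11,000 -> the EDT agent one-boxes.
```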

Wednesday, December 4th 2019

No posts for December 4th 2019
Shortform [Beta]
34 · BrienneYudkowsky · 4d
Here’s what Wikipedia has to say about monographs [https://en.wikipedia.org/wiki/Monograph]. “A monograph is a specialist work of writing… or exhibition on a single subject or an aspect of a subject, often by a single author or artist, and usually on a scholarly subject… Unlike a textbook, which surveys the state of knowledge in a field, the main purpose of a monograph is to present primary research and original scholarship ascertaining reliable credibility to the required recipient. This research is presented at length, distinguishing a monograph from an article.” I think it’s a bit of an antiquated term. Either that or it’s chiefly British, because as an American I’ve seldom encountered it. I know the word because Sherlock Holmes is always writing monographs. In *A Study In Scarlet*, he says, “I gathered up some scattered ash from the floor. It was dark in colour and flakey—such an ash as is only made by a Trichinopoly. I have made a special study of cigar ashes—in fact, I have written a monograph upon the subject. I flatter myself that I can distinguish at a glance the ash of any known brand, either of cigar or of tobacco.” He also has a monograph on the use of disguise in crime detection, and another on the utilities of dogs in detective work. When I tried thinking of myself as writing “monographs” on things, I broke through some sort of barrier. The things I wrote turned out less inhibited and more… me. I benefited from them myself more as well. What I mean by “monograph” is probably a little different from what either Sherlock or academia means, but it’s in the same spirit. I think of it as a photo study or a character sketch, but in non-fiction writing form. Here are my guidelines for writing a monograph.
1. Pick a topic you can personally investigate. It doesn’t matter whether it’s “scholarly”. It’s fine if other people have already written dozens of books on the subject, regardless of whether you’ve read them, just as long as you can stick your own
15 · TurnTrout · 5d
Listening to Eneasz Brodski's excellent reading of Crystal Society [http://www.hpmorpodcast.com/?page_id=1958], I noticed how curious I am about how AGI will end up working. How are we actually going to do it? What are those insights? I want to understand quite badly, which I didn't realize until experiencing this (so far) intelligently written story. Similarly, how do we actually "align" agents, and what are good frames for thinking about that? Here's to hoping we don't sate the former curiosity too early.
11 · mr-hire · 4d
As part of the Athena Rationality Project, we've recently launched two new prototype apps that may be of interest to LWers.
Virtual Akrasia Coach
The first is a Virtual Akrasia Coach [http://athenarationality.mattgoldenberg.net/project/akrasia-procrastination-coach/], which comes out of a few months of studying various interventions for Akrasia, then testing the resulting ~25 habits/skills through internet based lessons to refine them. We then took the resulting flowchart for dealing with Akrasia, and created a "Virtual Coach" that can walk you through a work session, ensuring your work is focused, productive and enjoyable. Right now about 10% of people find it useful to use in every session, 10% of people find it useful to use when they're procrastinating, and 10% of people find it useful to use when they're practicing the anti-akrasia habits. The rest don't find it useful, or think it would be useful but don't tend to use it. I know many of you may be wondering how the idea of 25 skills fits in with the Internal Conflict model of akrasia. One way to frame the skills is that for people with chronic akrasia, we've found that they tend to have certain patterns that lead to internal conflict - for instance, one side thinks it would be good to work on something, but another side doesn't like uncertainty. You can solve this by internal double crux, or you can have a habit to always know your next action so there's no uncertainty. By using this and the other 24 tools you can prevent a good portion of internal conflict from showing up in the first place.
Habit Installer/Uninstaller App
The habit installer/uninstaller app is an attempt to create a better process for creating TAPs, and using a modified Murphyjitsu process to create setbacks for those TAPs. Here's how it works.
1. When you think of a new TAP to install, add it to your habit queue.
2. When the TAP reaches the top of the habit queue, it gives you a "Conditioning Session" - these are a set of au
6 · hamnox · 4d
I could discuss everything within a few very concrete examples. A concrete example tends to create a working understanding in a way mathematical abstraction fails to. I want to give my readers real knowledge, so I do often insist on describing concepts in the world without numbers or equations or proofs. However, math exists for a reason. Some patterns generalize so strongly that you simply cannot communicate the breadth of their applications in concrete examples. You have to describe the shape of such a pattern by constraint. To do otherwise would render it a handful of independent parlor tricks instead of one sharp and heavy blade.
3 · cousin_it · 4d
Edit: no point asking this question here.

Tuesday, December 3rd 2019

No posts for December 3rd 2019

Monday, December 2nd 2019

No posts for December 2nd 2019
Shortform [Beta]
53 · Buck · 7d
[I'm not sure how good this is, it was interesting to me to think about, idk if it's useful, I wrote it quickly.] Over the last year, I internalized Bayes' Theorem much more than I previously had; this led me to noticing that when I applied it in my life it tended to have counterintuitive results; after thinking about it for a while, I concluded that my intuitions were right and I was using Bayes wrong. (I'm going to call Bayes' Theorem "Bayes" from now on.) Before I can tell you about that, I need to make sure you're thinking about Bayes in terms of ratios rather than fractions. Bayes is enormously easier to understand and use when described in terms of ratios. For example: Suppose that 1% of women have a particular type of breast cancer, and a mammogram is 20 times more likely to return a positive result if you do have breast cancer, and you want to know the probability that you have breast cancer if you got that positive result. The prior probability ratio is 1:99, and the likelihood ratio is 20:1, so the posterior probability is 1∗20:99∗1 = 20:99, so you have probability of 20/(20+99) of having breast cancer. I think that this is absurdly easier than using the fraction formulation. I think that teaching the fraction formulation is the single biggest didactic mistake that I am aware of in any field.
Anyway, a year or so ago I got into the habit of calculating things using Bayes whenever they came up in my life, and I quickly noticed that Bayes seemed surprisingly aggressive to me. For example, the first time I went to the Hot Tubs of Berkeley, a hot tub rental place near my house, I saw a friend of mine there. I wondered how regularly he went there. Consider the hypotheses of "he goes here three times a week" and "he goes here once a month". The likelihood ratio is about 12x in favor of the former hypothesis. So if I previously was ten to one against the three-times-a-week hyp
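The ratio version is also trivial to mechanize; a minimal sketch using the mammogram numbers above:

```python
# Odds-form Bayes: multiply prior odds by the likelihood ratio, elementwise.
def posterior_odds(prior, likelihood_ratio):
    """prior and likelihood_ratio are (for, against) tuples."""
    return (prior[0] * likelihood_ratio[0], prior[1] * likelihood_ratio[1])

def to_probability(odds):
    return odds[0] / (odds[0] + odds[1])

# Mammogram example from the shortform: prior 1:99, positive test is 20x more
# likely given cancer.
odds = posterior_odds((1, 99), (20, 1))
print(odds, round(to_probability(odds), 3))   # (20, 99) 0.168
```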
29 · Ben Pace · 7d
Good posts you might want to nominate in the 2018 Review
I'm on track to nominate around 30 posts from 2018, which is a lot. Here is a list of about 30 further posts I looked at that I think were pretty good but didn't make my top list, in the hopes that others who did get value out of the posts will nominate their favourites. Each post has a note I wrote down for myself about the post.
* Reasons compute may not drive AI capabilities growth [https://www.lesswrong.com/posts/hSw4MNTc3gAwZWdx9/reasons-compute-may-not-drive-ai-capabilities-growth]
  * I don't know if it's good, but I'd like it to be reviewed to find out.
* The Principled-Intelligence Hypothesis [https://www.lesswrong.com/posts/Tusi9getaQ2o6kZsb/the-principled-intelligence-hypothesis]
  * Very interesting hypothesis generation. Unless it's clearly falsified, I'd like to see it get built on.
* Will AI See Sudden Progress? [https://www.lesswrong.com/posts/AJtfNyBsum6ZzWxKR/will-ai-see-sudden-progress] DONE
  * I think this post should be considered paired with Paul's almost-identical post. It's all exactly one conversation.
* Personal Relationships with Goodness [https://www.lesswrong.com/posts/7xQAYvZL8T5L6LWyb/personal-relationships-with-goodness]
  * This felt like a clear analysis of an idea and coming up with some hypotheses. I don't think the hypotheses really capture what's going on, and most of the frames here seem like they've caused a lot of people to do a lot of hurt to themselves, but it seemed like progress in that conversation.
* Are ethical asymmetries from property rights? [https://www.lesswrong.com/posts/zf4gvjTkbcJ5MGsJk/are-ethical-asymmetries-from-property-rights]
  * Again, another very interesting hypothesis.
* Incorrect Hypotheses Point to Correct Observations [https://www.lesswrong.com/posts/MPj7t2w3nk4s9EYYh/incorrect-hypotheses-point-to-correct-observations]
14 · ozziegooen · 7d
I think one idea I'm excited about is the idea that predictions can be made of prediction accuracy. This seems pretty useful to me.
Example: Say there's a forecaster Sophia who's making a bunch of predictions for pay. She uses her predictions to make a meta-prediction of her total prediction-score on a log-loss scoring function (on all predictions except her meta-predictions). She says that she's 90% sure that her total loss score will be between -5 and -12. The problem is that you probably don't think you can trust Sophia unless she has a lot of experience making similar forecasts. This is somewhat solved if you have a forecaster that you trust that can make a prediction based on Sophia's seeming ability and honesty. The naive thing would be for that forecaster to predict their own distribution of the log-loss of Sophia, but there's perhaps a simpler solution. If Sophia's provided loss distribution is correct, that would mean that she's calibrated in this dimension (basically, this is very similar to general forecast calibration). The trusted forecaster could forecast the adjustment made to her term, instead of forecasting the same distribution. Generally this would be in the direction of adding expected loss, as Sophia probably had more of an incentive to be overconfident (which would result in a low expected score from her) than underconfident. This could perhaps make sense as a percentage modifier (-30% points), a mean modifier (-3 to -8 points), or something else. External clients would probably learn not to trust Sophia's provided expected error directly, but instead the "adjusted" forecast. This can be quite useful. Now, if Sophia wants to try to "cheat the system" and claim that she's found new data that decreases her estimated error, the trusted forecaster will pay attention and modify their adjustment accordingly. Sophia will then need to provide solid evidence that she really believes her work and is really calibrated for the trusted forecaster to bud
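A minimal sketch of the bookkeeping being described, with made-up forecasts and a made-up adjustment, just to make the scoring concrete:

```python
import math

# Total log-loss of a batch of binary forecasts: (probability assigned, outcome).
def total_log_loss(forecasts):
    return sum(math.log(p if outcome else 1 - p) for p, outcome in forecasts)

forecasts = [(0.9, True), (0.7, False), (0.8, True), (0.6, True)]
print(round(total_log_loss(forecasts), 2))   # -2.04

# Sophia's self-reported 90% interval for her total loss, plus a trusted
# forecaster's additive adjustment (illustrative numbers only).
sophia_interval = (-12, -5)
adjustment = -3            # "she's probably somewhat overconfident"
adjusted_interval = (sophia_interval[0] + adjustment, sophia_interval[1] + adjustment)
print(adjusted_interval)   # (-15, -8): what clients would actually rely on
```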
5 · Naryan Wong · 6d
Three Activities We Might Run at Our Next Rationality+ Retreat
There is lots of context about who 'we' are, why I called it 'Rationality+', and what the 'retreat' is, but for the moment I'd just like to toss the activity ideas out into the community to see what happens. Happy to answer questions on any of the context or activities in the comments.
1. Tracking - The ability to follow how the components of a group (you, others, the environment) interact to co-create a group experience. The ability to observe how the group experience in turn affects the components. Like systems thinking for groups, in a felt-sense kind of way.
a) Activity starts off with a meditation on one's own subjective experience - observing how body sensations, emotions, and thoughts arise and dissipate. Watch with equanimity if possible.
b) Next - pair up, and explore how you might understand your partner's subjective experience in the same way that you just observed your own. Try observing their body language, your own mirror neurons, asking questions, making predictions/experiments. You can take turns or play with simultaneous exploration.
c) Each pair finds another pair to make a group of four. Taking turns speaking, see if you can collectively discover a 'group subjective truth' that feels true to all of you, and observe how it changes through time. What happens when you align on a living group narrative? Are you able to co-create a direction or meaning for this group? It's possible that this activity could extend into a 1hr+ exploration with groups actually doing things together, while maintaining cohesion.
2. Meaning-making - The ability to find/create subjective relevance in things. Weaving 'random things that happen' into a more salient 'narrative of things that are important'. I speculate that this helps with learning, memory, and allows you to shape your attention to a 'thing'. By learning to 'meaning-make', one could frame experiences in a way that lets you instrumentally get mo
1 · alenglander · 7d
Something I've been thinking about recently. I've been reading several discussions surrounding potential risks from AI, especially the essays and interviews on AI Impacts. A lot of these discussions seem to me to center on trying to extrapolate from known data, or to analyze whether AI is or is not analogous to various historical transitions. But it seems to me that trying to reason based on historical precedent or extrapolated data is only one way of looking at these issues. The other way seems to be more like what Bostrom did in Superintelligence, which seems more like reasoning based on theoretical models of how AI works, what could go wrong, how the world would likely react, etc. It seems to me that the more you go with the historical analogies / extrapolated data approach, the more skeptical you'll be of claims from people claiming that AI risk is a huge problem. And conversely, the more you go with the reasoning from theoretical models approach, the more concerned you'll be. I'd probably put Robin Hanson somewhere close to the extreme end of the extrapolated data approach, and I'd put Eliezer Yudkowsky and Nick Bostrom close to the extreme end of the theoretical models approach. AI Impacts seems to fall closer to Hanson on this spectrum. Of course, there's no real hard line between the two approaches. Reasoning from historical precedent and extrapolated data necessarily requires some theoretical modeling, and vice versa. But I still think the basic distinction holds value. If this is right, then the question is how much weight should we put on each type of reasoning, and why? Thoughts?

Sunday, December 1st 2019

No posts for December 1st 2019
Shortform [Beta]
10 · eigen · 8d
Has someone re-read the sequences? Did you find value in doing so? Further, I do think the comments on each of the essays are worthy of reading, something I did not do the first time. I can pinpoint a few comments from people in this community on the essays which were very insightful! I wonder if I lost something by not participating in it or by not having read all the comments when I was reading the sequences.
2 · Hazard · 7d
This comment will collect things that I think beginner rationalists, "naive" rationalists, or "old school" rationalists (these distinctions are in my head, I don't expect them to translate) do which don't help them.

Saturday, November 30th 2019

No posts for November 30th 2019
Shortform [Beta]
10 · eigen · 9d
Eliezer has the sequences, Scott the Codex; what does Robin Hanson have? Can someone point me to a direction where I could start reading his posts in a manner that makes sense? I found this post: https://www.lesswrong.com/posts/SSkYeEpTrYMErtsfa/what-are-some-of-robin-hanson-s-best-posts which may be helpful; does someone have an opinion on this?
3 · lifelonglearner · 8d
I've been thinking about interpretable models. If we have some system making decisions for us, it seems good if we can ask it "Why did you suggest action X?" and get back something intelligible. So I read up about what sorts of things other people have come up with. Something that seemed cool was this idea of tree regularization [http://www.shallowmind.co/jekyll/pixyll/2017/12/30/tree-regularization/]. The idea being that decision trees are sort of the standard for interpretable models because they typically make splits along features. You essentially train a regularizer (which is a neural net) which proxies average tree length (i.e. the complexity of a decision tree which is comparable to the actual model you're training). Then, when you're done, you can train a new decision tree which mimics the final neural net (the one you trained with the regularizer). The author pointed out that, in the process of doing so, you can see what features the model thinks are relevant. Sometimes they don't make sense, but the whole point is that you can at least tell that they don't make sense (from a human perspective) because the model is less opaque. You know more than just "well, it's a linear combination of the inputs, followed by some nonlinear transformations, repeated a bunch of times". But if the features don't seem to make sense, I'd still like to know why they were selected. If the system tells us "I suggested decision X because of factors A, B, and C" and C seems really surprising to us, I'd like to know what value it's providing to the prediction. I'm not sure what sort of justification we could expect from the model, though. Something like "Well, there was this regularity that I observed in all of the data you gave me, concerning factor C," seems like what's happening behind the scenes. Maybe that's a sign for us to investigate more in the world, and the responsibility shouldn't be on the system. But, still, food for thought.
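The mimic step at the end (train a small tree on the final network's predictions, then read off which features it splits on) is easy to sketch; this is only that distillation-and-inspection step with sklearn stand-ins, not the tree-regularization training itself:

```python
# Sketch of the "mimic" step only: fit a black-box model, then fit a small
# decision tree to the black box's *predictions* and inspect its splits.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

black_box = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0).fit(X, y)

# Train the surrogate on the black box's outputs, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, black_box.predict(X))

# The printed tree shows which features the mimic relies on -- the part that
# may or may not "make sense" to a human inspecting the model.
print(export_text(surrogate))
print("fidelity to black box:", surrogate.score(X, black_box.predict(X)))
```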
