AIXSU - AI and X-risk Strategy Unconference
Portland SSC Meetup 10/01/19
Petrov Day Celebration 2019 - Oxford Campsite
Week Of Sunday, September 15th 2019
I've been mulling over doing a podcast in which each episode is based on acquiring a particular skillset (self-love, focus, making good investments) instead of just interviewing a particular person. I interview a few people who have a particular skill (e.g. self-love, focus, creating cash flow businesses), and model the cognitive strategies that are common between them. Then I interview a few people who struggle a lot with that skill, and model the cognitive strategies that are common between them. Finally, I model a few people who used to be bad at the skill but are now good, and model the strategies they had in common in making the switch. The episode is cut to tell a narrative of what skills are to be acquired, what beliefs/attitudes need to be let go of and acquired, and the process to acquire them, rather than focusing on interviewing a particular person. If there's enough interest, I'll do a pilot episode. Comment with what skillset you'd love to see a pilot episode on. Upvote if you'd have 50% or more chance of listening to the first episode.
There's a phenomenon I currently hypothesize to exist where direct attacks on the problem of AI alignment are criticized much more often than indirect attacks. If this phenomenon exists, it could be advantageous to the field in the sense that it encourages thinking deeply about the problem before proposing solutions. But it could also be bad because it disincentivizes work on direct attacks on the problem (if one is criticism averse and would prefer their work be seen as useful). I have arrived at this hypothesis from my observations: I have watched people propose solutions only to be met with immediate and forceful criticism from others, while other people proposing non-solutions and indirect analyses are given little criticism at all. If this hypothesis is true, I suggest it is partly or mostly because direct attacks on the problem are easier to defeat via argument, since their assumptions are made plain. If this is so, I consider it to be a potential hindrance to thought, since direct attacks are often the type of thing that leads to the most deconfusion -- not because the direct attack actually worked, but because in explaining how it failed, we learned what definitely doesn't work.
I seem to differently discount different parts of what I want. For example, I'm somewhat willing to postpone fun to low-probability high-fun futures, whereas I'm not willing to do the same with romance.
Meta-philosophy hypothesis: Philosophy is the process of reifying fuzzy concepts that humans use. By "fuzzy concepts" I mean things where we can say "I know it when I see it" but might not be able to describe what "it" is. Examples that I believe support the hypothesis:
* This shortform is about the philosophy of "philosophy" and this hypothesis is an attempt at an explanation of what we mean by "philosophy".
* In epistemology, Bayesian epistemology is a hypothesis that explains the process of learning.
* In ethics, an ethical theory attempts to make explicit our moral intuitions.
* A clear explanation of consciousness and qualia would be considered philosophical progress.
Epistemic status: Thinking out loud.
Introducing the Question
Scientific puzzle I notice I'm quite confused about: what's going on with the relationship between thinking and the brain's energy consumption? On one hand, I'd always been told that thinking harder sadly doesn't burn more energy than normal activity. I believed that and had even come up with a plausible story about how evolution optimizes for genetic fitness not intelligence, and introspective access is pretty bad as it is, so it's not that surprising that we can't crank up our brain's energy consumption to think harder. This seemed to jibe with the notion that our brain's putting way more computational resources towards perceiving and responding to perception than abstract thinking. It also fit well with recent results calling ego depletion into question and into the framework in which mental energy depletion is the result of a neural opportunity cost calculation [https://www.lesswrong.com/posts/9SSXcQ92ZJHgdqzDj/link-why-self-control-seems-but-may-not-be-limited]. Going even further, studies like this one [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4019873/] left me with the impression that experts tended to require less energy to accomplish the same mental tasks as novices. Again, this seemed plausible under the assumption that experts' brains developed some sort of specialized modules over the thousands of hours of practice they'd put in. I still believe that thinking harder doesn't use more energy, but I'm now much less certain about the reasons I'd previously given for this.
Chess Players' Energy Consumption
This recent ESPN (of all places) article [https://www.espn.com/espn/story/_/id/27593253/why-grandmasters-magnus-carlsen-fabiano-caruana-lose-weight-playing-chess] about chess players' energy consumption during tournaments has me questioning this story. The two main points of the article are: 1. Chess players burn a lot of energy during tournaments, potentially on the order of 6000 ca
Week Of Sunday, September 8th 2019
Do Anki while Weightlifting
Many rationalists appear to be interested in weightlifting. I certainly have enjoyed having a gym habit. I have a recommendation for those who do: Try studying Anki cards [https://twitter.com/michael_nielsen/status/957763229454774272?lang=en] while resting between weightlifting sets. The upside is high. Building the habit of studying Anki cards is hard, and if doing it at the gym causes it to stick, you can now remember things by choice not chance. And the cost is pretty low. I rest for 90 seconds between sets, and do about 20 sets when I go to the gym. Assuming I get a minute in once the overheads are accounted for, that gives me 20 minutes of studying. I go through about 4 cards per minute, so I could do 80 cards per visit to the gym. In practice I spend only ~5 minutes studying per visit, because I don't have that many cards. I'm not too tired to concentrate. In fact, the adrenaline high makes me happy to have something mentally active to do. Probably because of this, it doesn't at all decrease my desire to go to the gym. I find I can add simple cards to my Anki deck at the gym, although the mobile app does make it slow. Give it a try! It's cheap to experiment and the value of a positive result is high.
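The back-of-envelope estimate in that post can be written out as a short calculation. This is only a sketch: the default numbers (20 sets, about a minute of usable rest per set, about 4 cards per minute) come from the post, while the function and parameter names are made up here for illustration.

```python
# Rough estimate of Anki study time per gym visit.
# Defaults are the figures quoted in the post; names are illustrative only.

def cards_per_visit(sets=20, usable_seconds_per_rest=60, cards_per_minute=4.0):
    """Return (study_minutes, cards_reviewed) for one gym visit."""
    study_minutes = sets * usable_seconds_per_rest / 60
    cards_reviewed = study_minutes * cards_per_minute
    return study_minutes, cards_reviewed


minutes, cards = cards_per_visit()
print(f"{minutes:.0f} minutes of study, about {cards:.0f} cards")
```

Changing the defaults shows how sensitive the estimate is to the overhead assumption: with only 15 usable seconds per rest, the same 20 sets give 5 minutes and about 20 cards.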
Selected Aphorisms from Francis Bacon's Novum Organum
I'm currently working to format Francis Bacon's Novum Organum [https://en.wikipedia.org/wiki/Novum_Organum] as a LessWrong sequence. It's a moderate-sized project as I have to work through the entire work myself, and write an introduction which does Novum Organum justice and explains the novel move of taking an existing work and posting it on LessWrong (short answer: NovOrg is some serious hardcore rationality and contains central tenets of the LW foundational philosophy notwithstanding being published back in 1620, not to mention that Bacon and his works are credited with launching the modern Scientific Revolution). While I'm still working on this, I want to go ahead and share some of my favorite aphorisms from it so far:
3. . . . The only way to command reality is to obey it . . .
9. Nearly all the things that go wrong in the sciences have a single cause and root, namely: while wrongly admiring and praising the powers of the human mind, we don’t look for true helps for it.
Bacon sees the unaided human mind as entirely inadequate for scientific progress. He sees the way forward for scientific progress as constructing tools/infrastructure/methodology to help the human mind think/reason/do science.
10. Nature is much subtler than are our senses and intellect; so that all those elegant meditations, theorizings and defensive moves that men indulge in are crazy—except that no-one pays attention to them. [Bacon often uses a word meaning ‘subtle’ in the sense of ‘fine-grained, delicately complex’; no one current English word will serve.]
24. There’s no way that axioms •established by argumentation could help us in the discovery of new things, because the subtlety of nature is many times greater than the subtlety of argument. But axioms •abstracted from particulars in the proper way often herald the discovery of new particulars and point them out, thereby returning the sciences to their active status.
Bacon repeat
Facebook comment I wrote in February, in response to the question [https://www.facebook.com/bshlgrs/posts/10215817023473267?comment_id=10215823190387436] 'Why might having beauty in the world matter?': I assume you're asking about why it might be better for beautiful objects in the world to exist (even if no one experiences them), and not asking about why it might be better for experiences of beauty to exist. [... S]ome reasons I think this: 1. If it cost me literally nothing, I feel like I'd rather there exist a planet that's beautiful, ornate, and complex than one that's dull and simple -- even if the planet can never be seen or visited by anyone, and has no other impact on anyone's life. This feels like a weak preference, but it helps get a foot in the door for beauty. (The obvious counterargument here is that my brain might be bad at simulating the scenario where there's literally zero chance I'll ever interact with a thing; or I may be otherwise confused about my values.) 2. Another weak foot-in-the-door argument: People seem to value beauty, and some people claim to value it terminally. Since human value is complicated and messy and idiosyncratic (compare person-specific ASMR triggers or nostalgia triggers or culinary preferences) and terminal and instrumental values are easily altered and interchanged in our brain, our prior should be that at least some people really do have weird preferences like that at least some of the time. (And if it's just a few other people who value beauty, and not me, I should still value it for the sake of altruism and cooperativeness.) 3. If morality isn't "special" -- if it's just one of many facets of human values, and isn't a particularly natural-kind-ish facet -- then it's likelier that a full understanding of human value would lead us to treat aesthetic and moral preferences as more coextensive, interconnected, and fuzzy. If I can value someone else's happiness inherently, without needing to experience or know about i
Eliezer has written about the notion of security mindset [https://www.lesswrong.com/posts/8gqrbnW758qjHFTrH/security-mindset-and-ordinary-paranoia], and there's an important idea that attaches to that phrase, which some people have an intuitive sense of and ability to recognize, but I don't think Eliezer's post quite captured the essence of the idea, or presented anything like a usable roadmap of how to acquire it. An1lam's recent shortform post [https://www.lesswrong.com/posts/xDWGELFkyKdBpySAf/an1lam-s-short-form-feed#jBwdmYjPCkSCDngX6] talked about the distinction between engineering mindset and scientist mindset, and I realized that, with the exception of Eliezer and perhaps a few people he works closely with, all of the people I know of with security mindset are engineer-types rather than scientist-types. That seemed like a clue; my first theory was that the reason is that engineer-types get to actually write software that might have security holes, and have the feedback cycle of trying to write secure software. But I also know plenty of otherwise-decent software engineers who don't have security mindset, at least of the type Eliezer described. My hypothesis is that to acquire security mindset, you have to: * Practice optimizing from a red team/attacker perspective; * Practice optimizing from a defender perspective; and * Practice modeling the interplay between those two perspectives. So a software engineer can acquire security mindset because they practice writing software which they don't want to have vulnerabilities, they practice searching for vulnerabilities (usually as an auditor simulating an attacker rather than as an actual attacker, but the cognitive algorithm is the same), and they practice going meta when they're designing the architecture of new projects. This explains why security mindset is very common among experienced senior engineers (who have done each of the three many times), and rare among junior engineers (who haven't yet)
G Gordon Worley III
If an organism is a thing that organizes, then a thing that optimizes is an optimism.
Week Of Sunday, September 1st 2019
How Specificity Works
Someone tried to solve a big schlep of event organizing. Through this app, you: * Pledge money when signing up to an event * Lose it if you don't attend * Get it back if you attend + a share of the money from all the no-shows For some reason it uses crypto as the currency. I'm also not sure about the third clause, which seems to incentivise you to want others to no-show to get their deposits. Anyway, I've heard people wanting something like this to exist and might try it myself at some future event I'll organize. https://kickback.events/ [https://kickback.events/] H/T Vitalik Buterin's Twitter
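The pledge-and-refund rule described above can be sketched in a few lines. This is a minimal illustration, not the app's actual logic: it assumes that every participant pledges the same deposit and that forfeited deposits are split equally among attendees, and `settle_event` is a hypothetical name. The post does not say what happens if nobody attends, so that case simply pays nothing out here.

```python
from typing import Dict


def settle_event(deposit: float, attended: Dict[str, bool]) -> Dict[str, float]:
    """Return each participant's payout after the event.

    Assumption (not from the post): forfeited deposits are split
    equally among the people who showed up.
    """
    attendees = [name for name, showed in attended.items() if showed]
    no_show_count = len(attended) - len(attendees)
    if not attendees:
        # Unspecified in the post; in this sketch nothing is paid out.
        return {name: 0.0 for name in attended}
    bonus = deposit * no_show_count / len(attendees)
    return {
        name: (deposit + bonus) if showed else 0.0
        for name, showed in attended.items()
    }


payouts = settle_event(12.0, {"ann": True, "bob": True, "cy": True, "dee": False})
print(payouts)
```

Under these assumptions each attendee's payout grows with the number of no-shows, which is the incentive problem the post points out in the third clause.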
Watching my kitten learn/play has been interesting from a "how do animals compare to current AIs?" perspective. At a high level, I think I've updated slightly towards RL agents being further along the evolutionary progress ladder than I'd previously thought. I've seen critiques of RL agents not being able to do long-term planning as evidence for them not being as smart as animals, and while I think that's probably accurate, I have noticed that my kitten takes a surprisingly long time to learn even 2-step plans. For example, when it plays with a toy on a string, I'll often try putting the toy on a chair that it only knows how to reach by jumping onto another chair first. It took many attempts before it learned to jump onto the other chair and then climb to where I'd put the toy, even though it had previously done that while exploring many times. And even then, it seems to be at risk of "catastrophic forgetting" where we'll be playing in the same way later and it won't remember to do the 2-step move. Related to this, its learning is fairly narrow even for basic skills, e.g. I have 4 identical chairs around a table but it will be afraid of jumping onto one even though it's very comfortable jumping onto another. Now part of this may be that cats are known for being biased towards trial-and-error compared to other similarly-sized mammals like dogs (see Gwern's write-up [https://www.gwern.net/Cat-Sense] for more on this) and that adult cats may be better than kittens at "long-term" planning. However, a lot of critiques of RL, such as Josh Tenenbaum's, argue that our AIs don't even compare to young children in terms of their abilities. This is undoubtedly true with respect to ability to actually move around in the world, grasp objects, etc. but seems less true than I'd previously thought with respect to "higher level" cognitive abilities such as planning. To make this more concrete, I'm skeptical that my kitten could currently succeed at a real life analogue to Montezuma'
Cruxes I Have With Many LW Readers
There's a crux I seem to have with a lot of LWers that I've struggled to put my finger on for a long time but I think reduces to some combination of: * faith in elegance vs. expectation of messiness; * preference for axioms vs. examples; * identification as primarily a scientist/truth-seeker vs. as an engineer/builder. I tend to be more inclined towards the latter in each case, whereas I think a lot of LWers are inclined towards the former, with the potential exception of the author of realism about rationality [https://www.lesswrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality], who seems to have opinions that overlap with many of my own. While I still feel uncomfortable with the above binaries, I've now gathered enough examples to at least list them as evidence for what I'm talking about.
Example 1: Linear Algebra Textbooks
A few [https://www.lesswrong.com/posts/C6XJcWtxcMTeQPBs3/the-first-rung-insights-from-linear-algebra-done-right] LWers [https://www.lesswrong.com/posts/BgEfvxBHPfHdaQxLH/insights-from-linear-algebra-done-right] have positively reviewed Linear Algebra Done Right (LADR), in particular complimenting it for revealing the inner workings of Linear Algebra. I too recently read most of this book and did a lot of the exercises. And... I liked it but seemingly less than the other reviewers. In particular, I enjoyed getting a lot of practice reading definition-theorem-proof style math and doing lots of proofs myself, but found myself wishing for more examples and discussion of how to compute things like eigenvalues in practice. While I know that's not what the book's about, the difference I'm pointing to is more that I found the omission of these things bothersome, whereas I suspect the other reviewers were happy with the focus on constructing the different objects mathematically (I'm also obviously making some assumptions here). On the other hand, I've recently been reading sections of Shilov's Linear
Book Review: Civilization and Its Discontents
Freud is the most famous psychologist of all time and although many of his theories are now discredited or seem wildly implausible, I thought it'd be interesting to listen to him to try and understand why they sounded plausible in the first place. At times Freud is insightful and engaging; at other times, he falls into psychoanalytic lingo in such a way that I couldn't follow what he was trying to say. I suppose I can see why people might have assumed that the fault was with their failure to understand. It's a short read, so if you're curious, there isn't that much cost to going ahead and reading it, but this is one of those rare cases where you can really understand the core of what he was getting at from the summary on Wikipedia (https://en.m.wikipedia.org/wiki/Civilization_and_Its_Discontents). Since Wikipedia has a summary, I'll just add a few small remarks. This book focuses on a key paradox of civilization: we depend on it utterly for anything more than the most basic survival, yet it requires us to repress our own wants and desires so as to fit in with an ordered society. I find this to be an interesting answer to the question of why there is so much misery despite our material prosperity. It's interesting to re-examine this in light of the modern context. Society is much more liberal than it was in Freud's time, but in recent years people have become more scared of speaking their minds. Repression still exists; it is just of a different form. If Freud is to be believed, we should expect this repression to result in all kinds of psychological effects, many of which won't appear linked on the surface. Further thoughts: - I liked his chapter on the methods humans use to deal with suffering and their limitations, as it contained what seemed to be sound evaluations. He points out that the path of a yogi is at best the happiness of quietness, that love cannot be guaranteed to last, that sublimation through art is available only to a
Rationality 010 Meetup (Jester's Court) Principles: * The zeroth skill is being able to notice evidence at all * The point of learning is not to come to the same conclusion as the teacher: the bottom line is not yet written. * Make room for private reasoning, practice non-confrontational forms of dissent, and preserve freedom to self-direct. * Iff it passes muster, pass it on Prompts/Exercises: * Name a trivial promise you could make to someone here, yourself even. Can you make it even simpler? * How long can the group maintain a conversation made of only nods, head-shakes, finger-pointing, and raised eyebrows? * Pick a small 30s task. Imagine it vividly, start to finish, with lots of sensory detail. Then do it. Was it like you imagined? Repeat the task. Was it the same? * Exercise (If there's enough time and focus for it, try the whole Core Transformation sequence) 1. Pick an aspect / thought / behavior you don't like. 2. Recall an instance of when it came up, and where in your body it seemed to reside. 3. Assume it has a positive intent for you, and thank it as you would a well-meaning but mistaken friend. 4. Ask what outcome it's trying to achieve. 5. Thank it for what answer it can give. 6. If you can honestly promise to give that outcome-goal serious consideration next time this aspect comes up, then do so. If you can't, then DON'T. * Pick a partner and lead them (or together, teach a rubber duck) through one of the previous exercises you thought was good. Get feedback on one thing you did well, and one thing you can do differently to improve. Try it again with that in mind. * Bonus points if you record it so you can see your own presentation style * Alternative: notice how they do the exercise differently than you, try to improve your model of the person, the exercise, their engagement. * Practice: One person says some things that are factually incorrect, predicated
Week Of Sunday, August 25th 2019
Questions around Making Reliable Evaluations
Most existing forecasting platform questions [https://www.lesswrong.com/posts/kMmNdHpQPcnJgnAQF/prediction-augmented-evaluation-systems] are for very clearly verifiable questions: * "Who will win the next election" * "How many cars will Tesla sell in 2030?" But many of the questions we care about are much less verifiable: * "How much value has this organization created?" * "What is the relative effectiveness of AI safety research vs. bio risk research?" One solution attempt would be to have an "expert panel" assess these questions, but this opens up a bunch of issues. How could we know how much we could trust this group to be accurate, precise, and understandable? The topic of, "How can we trust that a person or group can give reasonable answers to abstract questions" is quite generic and abstract, but it's a start. I've decided to investigate this as part of my overall project on forecasting infrastructure [https://app.effectivealtruism.org/funds/far-future/payouts/6vDsjtUyDdvBa3sNeoNVvl]. I've recently been working with Elizabeth [https://www.lesswrong.com/users/pktechgirl] on some high-level research. I believe that this general strand of work could be useful both for forecasting systems and also for the more broad-reaching evaluations that are important in our communities.
--------------------------------------------------------------------------------
Early concrete questions in evaluation quality
One concrete topic that's easily studiable is evaluation consistency. If the most respected philosopher gives wildly different answers to "Is moral realism true" on different dates, it makes you question the validity of their belief. Or perhaps their belief is fixed, but we can determine that there was significant randomness in the processes that determined it. Daniel Kahneman apparently thinks a version of this question is important enough to be writing his new book [https://jasoncollins.blog/2018/05/09/n
I just came back from talking to Max Harms about the Crystal trilogy, which made me think about rationalist fiction, or the concept of hard sci-fi combined with explorations of cognitive science and philosophy of science in general (which is how I conceptualize the idea of rationalist fiction). I have a general sense that one of the biggest obstacles for making progress on difficult problems is something that I would describe as “focusing attention on the problem”. I feel like after an initial burst of problem-solving activity, most people, when working on hard problems, either give up, or start focusing on ways to avoid the problem, or sometimes start building a lot of infrastructure around the problem in a way that doesn’t really try to solve it. I feel like one of the most important tools/skills that I see top scientists or problem solvers in general use is utilizing workflows and methods that allow them to focus on a difficult problem for days and months, instead of just hours. I think at least for me, the case of exam environments displays this effect pretty strongly. I have a sense that in an exam environment, if I am given a question, I successfully focus my full attention on a problem for a full hour, in a way that often easily outperforms me thinking about a problem in a lower-key environment for multiple days in a row. And then, when I am given a problem set with concrete technical problems, my attention is again much better focused than when I am given the same problem but in a much less well-defined way, e.g. thinking about solving some engineering problem, but without thinking about it by trying to create a concrete proof or counterproof. My guess is that there is a lot of potential value in fiction that helps people focus their attention on a problem in a real way. In fiction you have the ability to create real-feeling stakes that depend on problem solving, and things like the final exam in Methods of Rationality show how that can be translated int
I was just re-reading the classic paper Artificial Intelligence as a Positive and Negative Factor in Global Risk [http://intelligence.org/files/AIPosNegFactor.pdf]. It's surprising how well it holds up. The following quotes seem especially relevant 13 years later.
On the difference between AI research speed and AI capabilities speed:
The first moral is that confusing the speed of AI research with the speed of a real AI once built is like confusing the speed of physics research with the speed of nuclear reactions. It mixes up the map with the territory. It took years to get that first pile built, by a small group of physicists who didn’t generate much in the way of press releases. But, once the pile was built, interesting things happened on the timescale of nuclear interactions, not the timescale of human discourse. In the nuclear domain, elementary interactions happen much faster than human neurons fire. Much the same may be said of transistors.
On neural networks:
The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
On funding in the AI Alignment landscape:
If tomorrow the Bill and Melinda Gates Foundation allocated a
I'd really like to write more. I've noticed that some ideas become much better after I write them up, and some turn out to be worse than I initially thought. I'd also like to expand my ability to have conversations to include online spaces, which, as a confirmed lurker, I didn't really have much of until after I found myself writing code for the EA Forum. I'm going to try writing a shortform post a day for a week. Acceptable places to post include here on LW, the EA Forum, Facebook, and my org's Slack. I'd like to go for at least one each. If that goes well, my next step might be to try this thing called editing and post every other day. After that I'd like to try writing some top level posts.
Friday: https://www.lesswrong.com/posts/8WWeGuEQRBYRuQcYJ/jp-s-shortform#MBNfFKQwa8LQdBceQ
Saturday: https://www.lesswrong.com/posts/8WWeGuEQRBYRuQcYJ/jp-s-shortform#pZ39ANtDxRK7X9KXv
Sunday: ✓ (FB)
Monday: https://www.lesswrong.com/posts/8WWeGuEQRBYRuQcYJ/jp-s-shortform#SopSodvjwvkdAHe2G
Tuesday: ✓ (Slack)
Wednesday: ✓ (Slack)
Thursday: https://forum.effectivealtruism.org/posts/rWoT7mABXTfkCdHvr/jp-s-shortform#rCYFRZ2YoSfyYiXrh
Live a life worth leaving Facebook for.