# All Posts


# Tuesday, October 22nd 2019

Personal Blogposts
4 · [Event] San Francisco Meetup: Board Games · 170 Hawthorne St, San Francisco, CA 94107, USA · Oct 29th

# Monday, October 21st 2019

Shortform [Beta]
13 · Chris_Leong · 1d · Book Review: Communist Manifesto

“The history of all hitherto existing society is the history of class struggles. Freeman and slave, patrician and plebeian, lord and serf, guild-master and journeyman, in a word, oppressor and oppressed, stood in constant opposition to one another, carried on an uninterrupted, now hidden, now open fight, that each time ended, either in the revolutionary reconstitution of society at large, or in the common ruin of the contending classes”

Overall summary: Given the rise of socialism in recent years, now seemed like an appropriate time to review the Communist Manifesto. At times I felt that Marx’s writing was keenly insightful; at other times I felt he was ignorant of basic facts; and at still other times I felt that he held views that were reasonable at the time but whose flaws are now obvious. In particular, I found the first half much more engaging than I expected because, say what you like about Marx, he’s an engaged and poetic writer. Towards the end, the focus shifted to particular, time-bound political disputes that I had neither the knowledge to understand nor the interest to learn about. At the start, I felt that I already had a decent grasp of the communist impulse, and I haven't become any more favourable to communism, but reading this rounded out a few more details of the communist critique of capitalism.

Capitalism: Despite being its most famous critic, Marx has a strong appreciation for the power of capitalism. He writes about it sweeping away all the old feudal bonds and drawing even the most “barbarian” nations into civilisation. He writes about it stripping every previously admired occupation of its halo and converting its practitioners into its “paid wage labourers”; undoubtedly some professions are affected far too much by market concerns, but this has to be weighed against the increase in access that it has brought.
He even writes that it has accomplished “wonders far exceeding the Egyptian Pyramids, Roman Aqueducts and
10 · Vanessa Kosoy · 21h · This is a preliminary description of what I dubbed Dialogic Reinforcement Learning (credit for the name goes to tumblr user @di--es---can-ic-ul-ar--es): the alignment scheme I currently find most promising. It seems that the natural formal criterion for alignment (or at least the main criterion) is having a "subjective regret bound": that is, the AI has to converge (in the long-term planning limit, i.e. the γ→1 limit) to achieving optimal expected user!utility with respect to the knowledge state of the user. In order to achieve this, we need to establish a communication protocol between the AI and the user that will allow transmitting this knowledge state to the AI (including knowledge about the user's values). Dialogic RL attacks this problem in the manner that seems most straightforward and powerful: allowing the AI to ask the user questions in some highly expressive formal language, which we will denote F. F allows making formal statements about a formal model M of the world, as seen from the AI's perspective. M includes such elements as observations, actions, rewards and corruption. That is, M reflects (i) the dynamics of the environment, (ii) the values of the user, and (iii) processes that either manipulate the user or damage the ability to obtain reliable information from the user. Here, we can use different models of values: a traditional "perceptible" reward function, an instrumental reward function [https://www.alignmentforum.org/posts/aAzApjEpdYwAxnsAS/reinforcement-learning-with-imperceptible-rewards], semi-instrumental reward functions, dynamically-inconsistent rewards [https://www.alignmentforum.org/posts/aPwNaiSLjYP4XXZQW/ai-alignment-open-thread-august-2019#C9gRtMRc6qLv7J6k7], rewards with Knightian uncertainty, etc. Moreover, the setup is self-referential in the sense that M also reflects the question-answer interface and the user's behavior.
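One way to write the "subjective regret bound" down, offered only as a hedged sketch: the notation is ours, not the post's. Here ζ is a prior over environments μ standing in for the user's knowledge state, r_n is the user!utility reward at step n, π^AI_γ is the AI's policy for discount γ, and utilities are normalized by (1−γ) so they stay bounded as γ→1.

```latex
\mathrm{U}^{\pi}_{\mu}(\gamma) = (1-\gamma)\,\mathbb{E}^{\pi}_{\mu}\!\left[\sum_{n=0}^{\infty} \gamma^{n} r_{n}\right],
\qquad
\mathrm{Rg}(\gamma) = \max_{\pi}\, \mathbb{E}_{\mu\sim\zeta}\!\left[\mathrm{U}^{\pi}_{\mu}(\gamma)\right]
  - \mathbb{E}_{\mu\sim\zeta}\!\left[\mathrm{U}^{\pi^{\mathrm{AI}}_{\gamma}}_{\mu}(\gamma)\right],
\qquad
\lim_{\gamma\to 1} \mathrm{Rg}(\gamma) = 0 .
```

Under this reading, the benchmark is the best policy given ζ, which lives in the user's head; that is why the AI needs a channel such as the question language F to learn it.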
A single question can consist, for example, of asking for the probability of some sentence in F or the expected
4 · mr-hire · 14h
* Today I had a great chat with a friend on the difference between #Fluidity and #Congruency.
* For the past decade+ my goal has been #Congruency (also often called #Alignment): the idea that there should be no difference between who I am internally, what I do externally, and how I represent myself to others.
* This worked well for quite a long time, and led me to great places, but the problems with #Congruency have recently started to show more obviously.
* Firstly, my internal sense of "rightness" wasn't easily encapsulated in a single set of consistent principles; it's very fuzzy and context-specific. Furthermore, what I can even define as "right" shifts as my #Ontology shifts.
* Secondly, and in parallel, as the idea of #Self starts to appear less and less coherent to me, the whole base that the house is built on starts to collapse.
* This has led me to begin a shift from #Congruency to #Fluidity. #Fluidity is NOT about behaving by an internally and externally consistent set of principles; rather, it's about being able to find that sense of "Rightness" (the right way forward) in increasingly complex and nuanced situations.
* This "rightness" in any given situation is influenced by the #Ontologies that I'm operating under at any given time, and the #Ontologies are influenced by the sense of "rightness".
* But as I hone my ability to fluidly shift ontologies, and my ability to have enough awareness to be in touch with that sense of rightness, it becomes easier to find that sense of rightness/wrongness in a given situation. This is as close as I can come to describing what is sometimes called #SenseMaking.
2 · Vanessa Kosoy · 18h · In my previous shortform [https://www.alignmentforum.org/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform#Wi65Ahs9abL63gPSe], I used the phrase "attack vector", borrowed from classical computer security. What does it mean to speak of an "attack vector" in the context of AI alignment? I use 3 different interpretations, which are mostly 3 different ways of looking at the same thing. In the first interpretation, an attack vector is a source of perverse incentives. For example, if a learning protocol allows the AI to ask the user questions, a carefully designed question can artificially produce an answer we would consider invalid, for example by manipulating the user or even by hacking the software or hardware of the system in some clever way. If the algorithm treats every answer as valid, this creates a perverse incentive: the AI knows that by phrasing the question in a particular way, a certain answer will result, so it will artificially obtain the answers that are preferable (for example, answers that produce an easier-to-optimize utility function). In this interpretation the "attacker" is the AI itself. In order to defend against the vector, we might change the AI's prior so that the AI knows some of the answers are invalid. If the AI has some method of distinguishing valid from invalid answers, that would eliminate the perverse incentive. In the second interpretation, an attack vector is a vulnerability that can be exploited by malicious hypotheses in the AI's prior. Such a hypothesis is an agent with its own goals (for example, it might arise as a simulation hypothesis [https://ordinaryideas.wordpress.com/2016/11/30/what-does-the-universal-prior-actually-look-like/]). This agent intentionally drives the system to ask manipulative questions to further these goals.
In order to defend, we might design the top-level learning algorithm so that it only takes actions that are safe with sufficiently high confidence (like in Delegative RL [https://arxiv.org/abs/1907.08
1 · ᴊᴇɴᴢ · 13h · Calling oneself an idiot is idiotic. I'm an idiot. Therefore, I am an idiot.

# Sunday, October 20th 2019

Shortform [Beta]
13 · Connor_Flexman · 2d · Remember that just like there are a lot of levels to any skill, there are a lot of levels to any unblocking! It feels to me like perhaps both parties are making a mistake when one person (the discoverer) says, "I finally figured out [how to be emotionally liberated or something]!" and the skeptic is like "whatever, they'll just come back in a few months and say they figured out even more about being emotionally liberated, what a pointless hamster wheel." (Yes, often people are unskilled at this type of thing and the first insight doesn't stick, but I'm talking about the times when it does.) In these cases, the discoverer will *still find higher levels of this* later on! It isn't that they've discovered the True Truth about [emotional liberation], they've just made a leap forward that resolves lots of their known issues. So even if the skeptic is right that they'll discover another thing in the future that sounds very similar, that doesn't actually invalidate their present insight. And for the discoverer, often it is seductive to think you've finally solved that domain. Oftentimes most or all of your present issues there feel resolved! But that's because you triangulate from the most pressing issues. In the future, you'll find other cracks in your reality, and need to figure out superficially similar but slightly skewed domains, and thinking you've permanently solved a complicated domain will only hamper this process. But that doesn't mean your insight isn't exactly as good as you think it is.
4 · Raemon · 2d · I don't know of a principled way to resolve roommate things like "what is the correct degree of cleanliness", and this feels sad. You can't say "the correct amount is 'this much'", because, well, there isn't actually an objectively correct degree of cleanliness. If you say 'eh, there are no universal truths, just preferences and negotiation', you incentivize people to treat a lot of interactions as transactional and adversarial when they don't actually need to be. It also seems to involve exaggerating and/or downplaying one's own preferences. The default outcome is something like "the person who is least comfortable with mess ends up doing most of the cleaning". If cleanliness were just an arbitrary preference this might actually be fine, especially if they really do care dramatically more about it. But usually it's more like "everyone cares at least a bit about being clean; one person just happens to care, say, 15% more and be quicker to act." So everyone else gets the benefits without paying the cost.
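A toy model of the "default outcome" above, as a hedged sketch. The simplifications are ours, not the post's: mess grows by a fixed amount per day, each roommate has a personal tolerance threshold, and whoever is bothered first cleans everything. The function name and the roommates `alex` and `blake` are hypothetical; `blake` tolerates 15% more mess.

```python
def simulate_cleaning(thresholds: dict[str, float], days: int,
                      mess_per_day: float = 1.0) -> dict[str, int]:
    """Count how often each roommate cleans in a toy shared-mess model.

    Assumptions (ours, not the post's): mess grows by a fixed amount each day,
    each roommate cleans once the mess reaches their personal threshold, and
    whoever is bothered first cleans everything back to zero.
    """
    cleanings = {name: 0 for name in thresholds}
    mess = 0.0
    for _ in range(days):
        mess += mess_per_day
        bothered = [name for name, limit in thresholds.items() if mess >= limit]
        if bothered:
            # The least mess-tolerant bothered roommate acts first.
            cleaner = min(bothered, key=thresholds.__getitem__)
            cleanings[cleaner] += 1
            mess = 0.0
    return cleanings


# Hypothetical roommates: blake tolerates 15% more mess than alex.
print(simulate_cleaning({"alex": 10.0, "blake": 11.5}, days=100))
```

This prints `{'alex': 10, 'blake': 0}`: both roommates live with the same level of mess, yet a 15% gap in tolerance is enough for one of them to do every cleaning.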

# Saturday, October 19th 2019

Personal Blogposts
1 · [Event] SSC/LW Bangalore Meetup - October · 2, Church Street, MG Road, Bengaluru · Oct 20th
Shortform [Beta]
1 · ᴊᴇɴᴢ · 3d · An airship is where I want to be, if not to live. Is that welcome?

# Friday, October 18th 2019

Shortform [Beta]
5 · hamnox · 4d · Epistemic status: wishful thinking. Imagine, for a moment, a nomadic tribe. They travel to where the need is great, or where opportunity leads. They are globalists, able to dive into bubbles but always grokking their existence in the context of the wider world. They find what needs doing and do it. They speak their own strange dialect that cuts to the heart of things. They follow their own customs, which seamlessly flex and adapt to incorporate effective local practices. Change, even drastic change, is a natural part of their culture. They seek to see. They do not hide their young and old, their blood and shit, their queer and deplorable. You don't taboo human reality. Wherever they momentarily settle, they strive to leave the place better than they found it. Some of what needs doing wherever they go is providing for their own, of course. They are always prepared to keep their infrastructure independent of their neighbors, but only exercise that option when it is efficient. They grok the worth of scale and industry, knowing the alternative. In the same vein, they seek to render aid primarily in ways that promote robust self-reliance rather than create dependence. 'Leave no trace' is the lowest bar to clear. I wish...
4 · Vanessa Kosoy · 4d · A sketch of a proposed solution to the hard problem of consciousness: an entity is conscious if and only if (i) it is an intelligent agent (i.e. a sufficiently general reinforcement learning system) and (ii) its values depend on the presence and/or state of other conscious entities. Yes, this definition is self-referential, but hopefully some fixed point theorem applies. There may be multiple fixed points, corresponding to "mutually alien types of consciousness". Why is this the correct definition? Because it describes precisely the type of agent who would care about the hard problem of consciousness.
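A toy check of the "multiple fixed points" remark, under heavy simplifying assumptions of ours, not the post's: "intelligent agent" is a given finite set, "values depend on" is a given cares-about relation, and a candidate set S of conscious entities counts as a fixed point when it equals the set of intelligent agents that care about some *other* member of S. The agents A, B and C are hypothetical.

```python
from itertools import combinations


def implied_conscious(candidate: set[str], intelligent: set[str],
                      cares_about: dict[str, set[str]]) -> set[str]:
    """Intelligent agents whose values depend on some other member of candidate."""
    return {
        agent
        for agent in intelligent
        if (cares_about.get(agent, set()) - {agent}) & candidate
    }


def fixed_points(intelligent: set[str],
                 cares_about: dict[str, set[str]]) -> list[set[str]]:
    """Enumerate every set S with implied_conscious(S) == S.

    Only intelligent agents can appear in the implied set, so checking
    subsets of `intelligent` is enough to find every fixed point.
    """
    agents = sorted(intelligent)
    found = []
    for size in range(len(agents) + 1):
        for combo in combinations(agents, size):
            candidate = set(combo)
            if implied_conscious(candidate, intelligent, cares_about) == candidate:
                found.append(candidate)
    return found


# Hypothetical example: A and B care about each other; C cares about no one.
intelligent = {"A", "B", "C"}
cares_about = {"A": {"B"}, "B": {"A"}, "C": set()}
print(fixed_points(intelligent, cares_about))
```

In this example there are two fixed points, the empty set and {A, B}: the definition alone does not decide whether the mutually caring pair is conscious, which is the kind of ambiguity the post anticipates.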
4 · Chris_Leong · 4d · Here's one way of explaining this: it's a contradiction to have a provable statement that is unprovable, but it's not a contradiction for it to be provable that a statement is unprovable. Similarly, we can't have a scenario that is simultaneously imagined and not imagined, but we can coherently imagine a scenario where things exist without being imagined by beings within that scenario. Rob Bensinger [https://www.lesswrong.com/posts/DSTuA9ohakD6k3drh/a-simple-sketch-of-how-realism-became-unpopular]: If I can imagine a tree that exists outside of any mind, then I can imagine a tree that is not being imagined. But "an imagined X that is not being imagined" is a contradiction. Therefore everything I can imagine or conceive of must be a mental object. Berkeley ran with this argument to claim that there could be no unexperienced objects, therefore everything must exist in some mind: if nothing else, the mind of God. The error here is mixing up what falls inside vs. outside of quotation marks. "I'm conceiving of a not-conceivable object" is a formal contradiction, but "I'm conceiving of the concept 'a not-conceivable object'" isn't, and human brains and natural language make it easy to mix up levels like those.
2 · Chris_Leong · 4d · What does it mean to define a word? There's a sense in which definitions are entirely arbitrary, and which word is assigned to which meaning lacks any importance. So it's very easy to miss the importance of definitions: each one emphasises a particular aspect and provides a particular lens through which to see the world. For example, if we define goodness as the ability to respond well to others, it emphasizes that different people have different needs. One person may want advice, while another wants simple encouragement. Or if we define love as acceptance of the other, it suggests that one of the most important aspects of love is that true love should be somewhat resilient and not excessively conditional.

# Thursday, October 17th 2019

Shortform [Beta]
8 · bgold · 4d · I have a cold, which reminded me that I want fashionable face masks to catch on so that I can wear them all the time in cold-and-flu season without accruing weirdness points.

# Wednesday, October 16th 2019

Shortform [Beta]
14 · Connor_Flexman · 6d · Sometimes people are explaining a mental move and give some advice on where/how it should feel, in a spatial metaphor. For example, they say "if you're doing this right, it should feel like the concept is above your head and you're reaching toward it." I have historically had trouble working well with advice like this, and I don't often see it working well for other people. But I think the solution is that for most people, the spatial or feeling advice is best used as an intermediate/terminal checksum, not as something that is constructive. For example, if you try to imagine feeling their feeling, and then see what you could do differently to get there, this will usually not work (if it does work, fine, carry on; this isn't meant for you). The best way for most people to use advice like this is to just notice that your spatial feeling is very different from theirs, be reminded that you definitely aren't doing the same thing as them, and be motivated to go back and try to understand all the pieces better. You're missing some part of the move or context that is generating their spatial intuition, and you want to investigate the upstream generators, not their downstream spatial feeling itself. (Again, this isn't to say you can't learn tricks for making the spatial intuition constructive, just don't think this is expected of you in the moment.) For explainers of mental moves, this model is also useful to remember. Mental moves that accomplish similar goals in different people will by default involve significantly different moving parts in their minds and microstrategies to get there.
If you are going to explain spatial intuitions (that most people can't work easily with), you probably want to do one of the following: 1) make sure they are great at working with spatial intuitions 2) make sure they know it's primarily a checksum, not an instruction 3) break down which parts generate that spatial intuition in yourself, so if they don't have it then you can help guide th

# Tuesday, October 15th 2019

Personal Blogposts
4 · [Event] San Francisco Meetup: Deep Questions · 170 Hawthorne St, San Francisco, CA 94107, USA · Oct 22nd

# Monday, October 14th 2019

Shortform [Beta]
4 · hunterglenn · 8d · Litany of Gendlin: "What is true is already so. Owning up to it doesn't make it worse. Not being open about it doesn't make it go away. And because it's true, it is what is there to be interacted with. Anything untrue isn't there to be lived. People can stand what is true, for they are already enduring it."

There are a few problems with the litanies, but in this case, it's just embarrassing. We have a straightforward equivocation fallacy here: no frills, no subtle twists, just unclear thinking. People are already enduring the truth(1), therefore they can stand what is true(2)? In the first usage, true(1) refers to reality, to the universe. We already live in a universe where some unhappy fact is true. Great. But in the second usage, true(2) refers to KNOWLEDGE of reality, knowledge of the unhappy fact. So, if we taboo "true" and replace it with what it means, the statement becomes: "People are already enduring reality as it is, so they must be able to stand knowing about that reality." Which is nothing but conjecture. Are there facts we should be ignorant of? The litany sounds very sure that there are not. If I accept the litany, then I too am very sure. How can I be so sure; what evidence have I seen? It is true that I can think of times when it is better to face the truth, hard though that might be. But that only proves that some knowledge is better than some ignorance, not that all facts are better to know than not. I can think of a few candidates for truths it might be worse for someone to know.

- If someone is on their deathbed, I don't think I'd argue with them about heaven (maybe hell). There are all kinds of sad truths that would seem pointless to tell someone right before they died. Who hates them, who has lied to them, how long they will be remembered: why tell any of it?
- If someone is trying to overcome an addiction, I don't feel compelled to scrutinize their crystal healing beliefs.
- I don't think I'd be doing anyone any favors
2 · An1lam · 8d · Thing I desperately want: tablet-native spaced repetition software that lets me draw flashcards, where cloze deletions are just boxes or hand-drawn occlusions.

# Sunday, October 13th 2019

Shortform [Beta]
21 · ChristianKl · 9d · Elon Musk's Starship might bring us a new x-risk. Dropping a tungsten rod [http://www.spacedaily.com/reports/US_Project_Thor_would_fire_tungsten_poles_at_targets_from_outer_space_999.html] that weighs around 12,000 kg from orbit has a destructive potential similar to that of nuclear weapons. At present launch prices, bringing a 12,000 kg tungsten rod to orbit is extremely expensive; the cost to the defense industry was estimated at around $230 million per rod. On the other hand, Starship is designed to carry 100 tons to space in a single flight, which is enough for 8 rods, and given that Elon has talked about being able to launch Starship 3 times per day at a cost that would allow transporting humans from one place on Earth to another, the launch cost might be less than a million dollars. I found tungsten prices to be around $25/kg [https://www.tungsten.com/tips/tungsten-and-costs/] for simple products, which suggests that a million dollars might be a valid price for one of the rods. When the rods are dropped, they hit within 15 minutes, which means that an attacked country has to react faster than it would to nuclear weapons. Having the weapons installed in a satellite creates the additional problem that there's no human in the loop who makes the decision to launch: anyone who succeeds in hacking a satellite with tungsten rods can deploy them.
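The post's cost arithmetic, redone with its own figures as a rough sketch. Two readings are our assumptions: "100 tons" is taken as 100 metric tonnes, and the "less than a million" launch cost is plugged in as exactly $1 million. The function name `rod_economics` is hypothetical.

```python
def rod_economics(
    rod_mass_kg: int = 12_000,
    payload_kg: int = 100_000,            # "100 tons", read as metric tonnes
    tungsten_usd_per_kg: float = 25.0,    # the post's price for simple products
    launch_cost_usd: float = 1_000_000.0, # upper end of "less than a million"
) -> tuple[int, float, float]:
    """Return (rods per flight, raw tungsten cost per rod, launch cost share per rod)."""
    rods_per_flight = payload_kg // rod_mass_kg  # whole rods only
    material_usd = rod_mass_kg * tungsten_usd_per_kg
    launch_share_usd = launch_cost_usd / rods_per_flight
    return rods_per_flight, material_usd, launch_share_usd


rods, material, launch_share = rod_economics()
print(f"{rods} rods per flight, ${material:,.0f} tungsten + ${launch_share:,.0f} launch share per rod")
```

This prints `8 rods per flight, $300,000 tungsten + $125,000 launch share per rod`, about $425,000 per rod before any manufacturing or satellite costs, which is under the post's one-million-dollar estimate and far below the $230 million figure.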
5 · Gurkenglas · 9d · Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation: it's just another substrate. Thoughts? Implications? References?
2 · Chris_Leong · 9d · As I wrote before, evidential decision theory [https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/chris_leong-s-shortform#yKRZgXjt3qvzpWQEr] can be critiqued for failing to deal properly with situations where hidden state is correlated with decisions. EDT counts differences in hidden state as part of the impact of the decision, whereas in the case of the Smoking Lesion we typically want to say that they are not. However, Newcomb's problem also has hidden state that is correlated with your decision. And if we don't want to count this when evaluating decisions in the Smoking Lesion, perhaps we shouldn't count it in Newcomb's problem either? Or is there a distinction? I think I'll try analysing this in terms of the erasure theory of counterfactuals at some point.
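A worked comparison of the shared structure, as a sketch with invented numbers (all probabilities and utilities below are hypothetical). For contrast we use causal decision theory (CDT), which holds the hidden state at its prior; that choice of contrast is ours, since the post only discusses EDT. In the Smoking Lesion model we assume cancer occurs exactly when the lesion is present, and that the lesion only influences whether you smoke.

```python
def edt_smoking_lesion(p_lesion=0.5, p_smoke_if_lesion=0.9, p_smoke_if_clear=0.1,
                       u_smoke=10.0, u_cancer=-1000.0):
    """EDT: condition on the action, so the estimate of the lesion moves with it."""

    def p_lesion_given(smoke: bool) -> float:
        # Bayes' rule over the hidden state ("clear" = no lesion).
        like_lesion = p_smoke_if_lesion if smoke else 1 - p_smoke_if_lesion
        like_clear = p_smoke_if_clear if smoke else 1 - p_smoke_if_clear
        joint_lesion = p_lesion * like_lesion
        joint_clear = (1 - p_lesion) * like_clear
        return joint_lesion / (joint_lesion + joint_clear)

    # Assumption: cancer occurs exactly when the lesion is present.
    return {"smoke": u_smoke + u_cancer * p_lesion_given(True),
            "abstain": u_cancer * p_lesion_given(False)}


def cdt_smoking_lesion(p_lesion=0.5, u_smoke=10.0, u_cancer=-1000.0):
    """CDT: the choice cannot cause the lesion, so keep it at its prior."""
    return {"smoke": u_smoke + u_cancer * p_lesion,
            "abstain": u_cancer * p_lesion}


def edt_newcomb(accuracy=0.99, big=1_000_000.0, small=1_000.0):
    """EDT: the predictor's accuracy links the choice to the opaque box's content."""
    return {"one_box": accuracy * big,
            "two_box": small + (1 - accuracy) * big}


def cdt_newcomb(p_full=0.5, big=1_000_000.0, small=1_000.0):
    """CDT: the box is already filled (or not); p_full is a fixed prior."""
    return {"one_box": p_full * big,
            "two_box": small + p_full * big}


for label, scores in [("EDT, Smoking Lesion", edt_smoking_lesion()),
                      ("CDT, Smoking Lesion", cdt_smoking_lesion()),
                      ("EDT, Newcomb", edt_newcomb()),
                      ("CDT, Newcomb", cdt_newcomb())]:
    print(f"{label}: {max(scores, key=scores.get)} {scores}")
```

With these numbers EDT abstains (about -100 vs -890) and one-boxes (about 990,000 vs 11,000), because in both problems conditioning on the action shifts the estimate of the hidden state; holding the hidden state at its prior flips both answers. The open question in the post is whether that shared structure means the two cases should be treated alike.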