Recent Discussion

A friend observed that fewer people in the effective altruism movement are married than you might expect. I was curious: what are marriage rates like within the EA community? The 2018 EA Survey asked about relationship status, and we can look at how that varies by age:

I'm using "ever married" for people who are currently married or have ever been married, including people who are now divorced, widowed, or separated. Since some of these buckets might be pretty small, let's add sample size information:
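In case it's useful, the grouping described can be sketched in a few lines of Python. The rows and field names here are illustrative toy data, not the actual 2018 EA Survey schema:

```python
from collections import defaultdict

# Toy survey rows (age bucket, relationship status) -- invented for
# illustration; the real survey has different fields and many more rows.
rows = [
    ("18-24", "single"), ("18-24", "married"),
    ("25-34", "married"), ("25-34", "divorced"), ("25-34", "single"),
    ("45-54", "widowed"),
]

# "Ever married" includes currently married plus divorced/widowed/separated.
EVER_MARRIED = {"married", "divorced", "widowed", "separated"}

counts = defaultdict(lambda: [0, 0])  # bucket -> [ever_married, total]
for bucket, status in rows:
    counts[bucket][0] += status in EVER_MARRIED
    counts[bucket][1] += 1

# Rate of "ever married" alongside the sample size for each bucket.
summary = {bucket: (ever / n, n) for bucket, (ever, n) in counts.items()}
print(summary)
```

Reporting the `n` alongside each rate makes it obvious which buckets are too small to trust.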

The anonymized survey data doesn't have 35-44 data, and the 65+ gro... (Read more)

I have two observations from the 2018 EA survey demographic data:

  • 80% of EAs are atheist/agnostic/non-religious and 3% are Buddhist, compared to 10% and 1% in the US population. These categories have lower marriage rates than the general population.
  • Because EAs tend to be so young (modal age 25), the median age of EAs in the 25-34 age group appears to be around 28, compared to around 30 for the general US population. Similarly, the median age within the 35-44 age group appears to be around 38, compared to 40. Since marriage rate increases so sharply with
... (read more)
[2] romeostevensit (29m): Utilitarians code as defectors to normal people for good reason.

Most planning around AI risk seems to start from the premise that superintelligence will come from de novo AGI before whole brain emulation becomes possible. I haven't seen any analysis that assumes both uploads-first and the AI FOOM thesis (Edit: apparently I fail at literature searching), a deficiency that I'll try to get a start on correcting in this post.

It is likely possible to use evolutionary algorithms to efficiently modify uploaded brains. If so, uploads would likely be able to set off an intelligence explosion by running evolutionary algorithms on themselves, selecting for something ... (Read more)

it seems plausible to me that getting better models of neurons would be useful for creating neuromorphic AIs while better brain scanning would not, and both technologies are necessary for brain uploading

Is the idea that you cannot scan a brain if you don't know what needs to be scanned, and that's why you need a model of neurons before you can upload? That you think "scanning everything" and waiting to figure out how it works before emulating the scanned mind is impractical?

Last week we learned there is plausibly a simple, cheap and easy way out of this entire mess. All we have to do is take our Vitamin D. In case it needed to be said, no. We are not taking our Vitamin D. There’s definitely some voices out there pushing it, including the nation’s top podcaster Joe Rogan, but I don’t see any signs of progress.

Instead, as school restarts, the outside gets colder and pandemic fatigue sets in, people’s precautions are proving insufficient to the task. This week showed that we have taken a clear step backwards across the country. 

I see three ways for things not to get... (Read more)

[2] PeterMcCluskey (2h): This Wikipedia page [] says the pre-1800 average was 4.4 million acres. So it looks like burning every 20 years was typical for a California forest.

Huh wild. I guess I have heard about redwood trees surviving forest fires, so that makes some sense, but man those'd be some big fires.

[3] crabman (18h): I've asked Zvi what he thinks about long term consequences of being ill. Due to his answer, my current thinking, which I use to calculate the cost of COVID-19 to myself in dollars, is as follows. COVID-19's long term consequences for myself have 2 components: something that lasts about half a year, and something that's permanent. (Or at least modelling it as if it has 2 components is not too bad.) The 1st component contains strong fatigue, low grade fever, headaches, or loss of taste and smell, and has probability 3% given covid. The 2nd component is permanent lung, heart, or brain damage, which I guess has probability about 0.5% given covid. However, this probability estimate is very uncertain and can easily change when new data arrives.

I've eyeballed DALY loss estimates for various diseases according to [] (which is a DALY estimate study cited by Doing Good Better) and thought about them. From this I've got estimates of how bad those two components are, if they happen: if the 1st component happens, for its duration I will lose 20% of my well-being (as measured in DALY/QALY) and 30% of my productivity. If the 2nd component happens, then for the rest of my life I will lose 8% of my well-being and 10% of my productivity.

If you want more details about how I got these percentages, I can only say which rows in table 2 of that study I found relevant (format: illness - coefficient, where lower is better, no adverse effects is 0%, and death is 100% - my comment):

  • Infectious disease: post-acute consequences (fatigue, emotional lability, insomnia) - 26% - The 1st component is basically this
  • COPD and other chronic respiratory diseases: mild - 1.5% - The 2nd component may realize as this
  • COPD and other chronic respiratory diseases: moderate - 19% - The 2nd component may realize as this
  • Heart failure: mild - 4% - The 2nd component may realize as this
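crabman's two-component model can be made concrete with a small expected-value calculation. The probabilities and severity figures below are the ones stated in the comment; the remaining-life-years figure is a placeholder assumption of mine:

```python
# Two-component model of long-term COVID consequences.
# Component 1: ~half-year syndrome (fatigue, fever, headaches, anosmia).
p1, duration1_years, wellbeing_loss1, productivity_loss1 = 0.03, 0.5, 0.20, 0.30
# Component 2: permanent lung/heart/brain damage.
p2, wellbeing_loss2, productivity_loss2 = 0.005, 0.08, 0.10

remaining_life_years = 50  # placeholder assumption, not from the comment

# Expected QALY loss: probability * duration * fractional well-being loss.
expected_qaly_loss = (
    p1 * duration1_years * wellbeing_loss1
    + p2 * remaining_life_years * wellbeing_loss2
)

# Same structure for productivity, measured in productive-year equivalents.
expected_productivity_years_lost = (
    p1 * duration1_years * productivity_loss1
    + p2 * remaining_life_years * productivity_loss2
)

print(expected_qaly_loss, expected_productivity_years_lost)
```

Note how the small-probability permanent component dominates the total because it applies over decades rather than half a year.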
[2] romeostevensit (18h): Also, the EU wants you to be deficient and therefore, in this pandemic specifically, wants you to die. No one is getting to sufficient vitamin D status on 200 IU pills. I take 5,000 IU a day.

Hi all, I've been working on some AI forecasting research and have prepared a draft report on timelines to transformative AI. I would love feedback from this community, so I've made the report viewable in a Google Drive folder here.

With that said, most of my focus so far has been on the high-level structure of the framework, so the particular quantitative estimates are very much in flux and many input parameters aren't pinned down well -- I wrote the bulk of this report before July and have received feedback since then that I haven't fully incorporated yet. I'd prefer ... (Read more)

Planned summary for the Alignment Newsletter (which won't go out until it's a Full Open Phil Report):

Once again, we have a piece of work so large and detailed that I need a whole newsletter to summarize it! This time, it is a quantitative model for forecasting when transformative AI will happen.

The overall framework

The key assumption behind this model is that if we train a neural net or other ML model that uses about as much computation as a human brain, that will likely result in transformative AI (TAI) (defined as AI that has an impact comparable to that... (read more)

[1] adamShimi (3h): Isn't that in contradiction with posting it to LW (by crossposting)? I mean, it's freely accessible to everyone, so anyone who wants to share it can find it.
[4] Ben Pace (3h): I expect the examples Ajeya has in mind are more like sharing one-line summaries in places that tend to be positively selected for virality and anti-selected for nuance (like tweets), but that substantive engagement by individuals here or in longer posts will be much appreciated.
[4] Raemon (3h): I'm assuming part of the point is that the LW crosspost still buries things in a hard-to-navigate google doc, which prevents it from easily getting cited or going viral, and Ajeya is asking/hoping for trust that they can get the benefit of some additional review from a wider variety of sources.
A crowd probably best served by a wide variety of translations

TLDR: Language translation is a decent first step for written works, but the ideal looks more like an empathetic personal tutor. There’s a lot to do in-between, both in the near term with human labor, and the longer term with Machine Learning.

Epistemic Status: I’m not an experienced researcher in this field. I’ve read a few audiobooks on language and thought about the area, but I’m sure I’m failing to reference many key papers and books. I’m fairly uncertain about all of this, I suggest taking my opinion very lightly (if at all) and... (Read more)

[2] ozziegooen (4h): By narrow I mean they are aiming to provide language-to-language translation, but it could hypothetically be done on a much more granular level. For instance, a translation that matches the very specific vernacular of some shared Dutch & Jamaican family with its own code words. And there's no reason the semantics can't be considerably changed. Maybe Hamlet could be adjusted to take place in whichever professional context a small community would be most likely to understand, and then presented as a post-modern punk musical, because that community really likes post-modern punk musicals. Whatever works.

One could argue that "liberal translations could never improve on the source, and therefore we need to force everyone to only use the source." I disagree.

Very true! There's actually a lot of discussion of this around Harry Potter, which needed a lot of translations very quickly, and does have a fair bit of wordplay and the like. See here: []. I'm sure there must be a far greater deal of similar discussion around Biblical translations. See the entire field of Hermeneutics [], for instance.

That said, I'd note I'm personally interested in this for collective epistemic reasons. I think that the value of "a large cluster of people can better understand each other and thus do much better research and epistemic and moral thinking" is a bigger priority than doing this for artistic reasons, though perhaps it's less interesting.
For instance, a translation that matches the very specific vernacular of some shared Dutch & Jamaican family with its own code words. And there’s no reason the semantics can’t be considerably changed. Maybe Hamlet could be adjusted to take place in whichever professional context a small community would be most likely to understand, and then presented as a post modern punk musical because that community really likes post modern punk musicals. Whatever works.

Yeah okay that is a far more radical definition of 'translation' than I w... (read more)

[2] ozziegooen (5h): Agreed that translations are often wrong, but I don't think this is reason to give up on them! Translations between languages often fail, but I'm thankful we have them.

The alternative to translation that I was taught in school about Shakespeare was to just give us the source and have us figure it out. I'm absolutely sure we did a terrible job at it, even worse than that bad translation. I don't remember ever having a lesson on how to translate Early Modern English to Modern English. I think I barely understood how large the difference was, let alone interpreted it correctly.

My knowledge on this topic comes from the Great Courses course "The Story of Language" by John McWhorter. Lecture 7 is great and goes into detail on the topic. Some quotes, transcribed here []:
[4] mingyuan (2h): Since we're basically just on a Shakespeare tangent now, and I really like talking about Shakespeare - I was lucky to have an extremely thorough education in Early Modern English starting from a very young age (around 7, I think). Essentially, my theater did Shakespeare completely uncut, and before memorizing your lines you had to listen to cassette tapes where the founder of the theater took you through the full meaning of every single line. I think he recorded these with multiple sources open in front of him, and he'd already devoted decades of study to Shakespeare by the time I was born.

And then school gave me a thorough education in literary analysis, and putting all that together, I claim I have a better understanding of Shakespeare than the vast majority of Shakespearean actors, and probably the majority of Shakespeare scholars as well. (I believe most professional Shakespearean actors have no fucking clue what they're saying most of the time, and how in heck is the audience supposed to understand what's going on if the actors don't?) My vocabulary in Shakespearean English is more limited than my native English vocabulary, but I'd still say I'm comfortably fluent in Early Modern English, perhaps even better than I am at French. My friends say that it's really fun to read through Shakespeare plays with me because they actually know what's going on. Shakespeare is really funny! In addition to being really beautiful and moving and incredibly fun to act.

Anyway, I'm sorry your school sucked and also that all schools suck. I wish I could give everyone the education in Shakespeare that was given to me. I have ideas on how to make that happen, but alas, it doesn't seem like a priority with the world the way it is.

Epistemic Status: I only know as much as anyone else in my reference class (I build ML models, I can grok the GPT papers, and I don't work for OpenAI or a similar lab). But I think my thesis is original.

Related: Gwern on GPT-3

For the last several years, I've gone around saying that I'm worried about transformative AI, an AI capable of making an Industrial Revolution sized impact (the concept is agnostic on whether it has to be AGI or self-improving), because I think we might be one or two cognitive breakthroughs away from building one.

GPT-3 has made me move up my timelines, because it makes me... (Read more)

[3] capybaralet (4h): No, that's zero-shot. Few-shot is when you train on those instead of just stuffing them into the context. It looks like mesa-optimization because it seems to be doing something like learning about new tasks or new prompts that are very different from anything it's seen before, without any training, just based on the context (0-shot). By "training a model", I assume you mean "an ML model" (as opposed to, e.g., a world model). Yes, I am claiming something like that, but learning vs. inference is a blurry line. I'm not saying it's doing SGD; I don't know what it's doing in order to solve these new tasks. But TBC, 96 steps of gradient descent could be a lot. MAML does meta-learning with 1.

"It's Not You, it's Me: Detecting Flirting and its Misperception in Speed-Dates" is a fascinating approach to the study of flirtation. It uses a machine learning model to parse speed-dating data and detect whether the participants were flirting. Here's a sci-hub link. I found three key insights in the paper.

First of all, people basically assume that others share their own intentions. If they were flirting, they assume their partner was too. They're quite bad at guessing whether their partner was flirting, but they do a bit better than chance.
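The projection effect in that first insight can be made concrete with a toy calculation. The data below is entirely made up for illustration (it is not from the paper); it just shows the two quantities being compared:

```python
# Each toy speed-date: (I_flirted, partner_actually_flirted, my_guess_about_partner)
# Invented data, chosen to mimic the paper's qualitative pattern.
dates = [
    (1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 1, 0),
    (1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 1, 1),
]

# How often does a guess simply mirror the guesser's own intention?
projection_rate = sum(guess == me for me, _, guess in dates) / len(dates)

# How accurate are the guesses about the partner's actual intention?
guess_accuracy = sum(guess == actual for _, actual, guess in dates) / len(dates)

print(projection_rate, guess_accuracy)
```

In this toy data, guesses track the guesser's own intention far more strongly (0.875) than they track the partner's actual intention (0.625, only modestly above chance), which is the shape of the finding described above.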

Secondly, the machine learning model was about 70% a... (Read more)

[4] ChristianKl (12h): This sounds to me like it Goodharts on the wrong thing. When on a date, your core concern isn't to signal to the other person that you are flirting but that you are a desirable mate.

My primary use case for this would be at parties where whether or not someone is flirting is the core question.

[8] AllAmericanBreakfast (12h): My take is that we're trying to do both in equal measure, and that a big part of showing you're a desirable mate is showing that you know how to flirt. In fact, my gut feeling is that signaling that you're interested/are flirting is more important.

When meeting a stranger, there's plenty of time to suss out their desirable qualities as time goes on. But if you fail to signal interest, you miss out on the opportunity to discover those desirable qualities entirely. There are plenty of people who'd make excellent mates who fail to find relationships because they don't know how to tell when somebody is interested in them. Likewise, there are people who are terrible mates but have no trouble finding relationships because they know how to tell when somebody's interested in them.

Response to: Making Beliefs Pay Rent (in Anticipated Experiences), Belief in the Implied Invisible, and No Logical Positivist I

I recently decided that some form of strong verificationism is correct - that beliefs that don't constrain expectation are meaningless (with some caveats). After reaching this conclusion, I went back and read EY's posts on the topic, and found that it didn't really address the strong version of the argument. This post consists of two parts - first, the positive case for verificationism, and second, responding to EY's argument against it.

The case fo... (Read more)

Depends. In a certain vague sense, they are both okay pointers to what I think is the fundamental thing they are about, the two truths doctrine. In another sense, no, because the map and territory metaphor suggests a correspondence theory of truth, whereas ontological and ontic are about mental categories and being or existence, respectively, and are historically tied to different approaches to truth, namely those associated with transcendental idealism. And if you don't take my stance that they're both different aspects of the same way of understanding reality ... (read more)

[1] ike (13h): I've made a number of different arguments. You can respond by taking ontological terms as primitive, but as I've argued there are strong reasons for rejecting that. Of course I do. Every one of the arguments I've put forward clearly applies only to the kinds of ontological statements I'm talking about. If an argument I believed was broader, then I'd believe a broader class of statements was meaningless. If you disagree, which specific argument of mine (not conclusion) doesn't? I'm not interested in analytical definitions right now. That's how Quine argued against it and I don't care about that construction.
[1] TAG (10h): That's not what I said. I said that you made a claim based on nothing but intuition, and that a contrary claim based on nothing but intuition is neither better nor worse than it. The argument that if it has no observable consequences, it is meaningless does not apply only to ontological statements.
[1] ike (8h): > I said that you made a claim based on nothing but intuition

This isn't true - I've made numerous arguments for this claim not purely based on intuition.

> The argument that if it has no observable consequences, it is meaningless does not apply only to ontological statements.

I did not make this argument. This is a conclusion that's argued for, not an argument, and the arguments for this conclusion only apply to ontological statements.

Given that social science research often doesn't replicate, is there a good way to search a social science finding or paper and see if it's valid?

Ideally, one would be able to type in e.g. "growth mindset" or a link to Dweck's original research, and see:

  • a statement of the idea e.g. 'When "students believe their basic abilities, their intelligence, their talents, are just fixed traits", they underperform students who "understand that their talents and abilities can be developed through effort, good teaching and persistence." Carol Dweck initially studied
... (read more)
niplav's Shortform
[1] niplav (7h): If we don't program philosophical reasoning into AI systems, they won't be able to reason philosophically.

This is an idea that's been talked about here before, but it's not even exactly clear what philosophical reasoning is or how to train for it, let alone if it's a good idea to teach an AI to do that.

Sometimes there's a concept that can be difficult to understand when entangled with everything else that needs to be understood about our physics.

If you isolate that concept in a simpler universe, it makes it easier to explain how the concept works.

What are such examples?

(I feel like I asked a similar question somewhere at some point, but can't find it)

[2] Viliam (6h): Relevant comment here []:

My question is definitely not limited to novel models. By all means, do let me know if you're aware of other toy models that have (and so can explain) relativistic-like properties, or share other interesting properties with our universe.

I have heard from several friends I trust that Wirecutter is no longer very reliable since being acquired by the New York Times in 2016, and that their Wirecutter-advised purchases have become pretty mediocre in the last year or two.

I'd be interested in people making some of that case, as answers to this post, including things like talking about purchases they made or basic errors they found in the reviews, as I don't have strong, publicly verifiable evidence on this at the minute.

This is a biased post, I'm writing this with the hope of helping propagate this info if it's true, which I suspect... (Read more)

A glaring omission in Wirecutter's laptop recommendations: they say the HP Envy x360 13 is superior to their top picks, but don't list it. This is ostensibly due to stock shortages, but a slightly different configuration (with the 4700U) has been in stock for a while. The Envy with 4700U is an extremely good laptop and $300-400 cheaper than their top picks. 

The 13-inch HP Envy x360 with an AMD Ryzen 5-4500U processor is an excellent ultrabook—it’s compact, light, and had nearly 12 hours of battery life in our tests. It has a great keyboard, a responsi

... (read more)

A lot of the discussion of mesa-optimization seems confused.

One thing that might be relevant towards clearing up the confusion is just to remember that "learning" and "inference" should not be thought of as cleanly separated, in the first place, see, e.g. AIXI...

So when we ask "is it learning? Or just solving the task without learning", this seems like a confused framing to me. Suppose your ML system learned an excellent prior, and then just did Bayesian inference at test time. Is that learning? Sure, why not. It might... (read more)

[7] capybaralet (4h): It seems like a lot of people are still thinking of alignment as too binary, which leads to critical errors in thinking like: "there will be sufficient economic incentives to solve alignment", and "once alignment is a bottleneck, nobody will want to deploy unaligned systems, since such a system won't actually do what they want". It seems clear to me that:

1) These statements are true for a certain level of alignment, which I've called "approximate value learning" in the past ([]). I think I might have also referred to it as "pretty good alignment" or "good enough alignment" at various times.
2) This level of alignment is suboptimal from the point of view of x-safety, since the downside risk of extinction for the actors deploying the system is less than the downside risk of extinction summed over all humans.
3) We will develop techniques for "good enough" alignment before we develop techniques that are acceptable from the standpoint of x-safety.
4) Therefore, the expected outcome is: once "good enough alignment" is developed, a lot of actors deploy systems that are aligned enough for them to benefit from them, but still carry an unacceptably high level of x-risk.
5) Thus, if we don't improve alignment techniques quickly enough after developing "good enough alignment", its development will likely lead to a period of increased x-risk (under the "alignment bottleneck" model).
[3] capybaralet (5h): No, I'm talking about it breaking out during training. The only "shifts" here are: 1) the AI gets smarter, and 2) (perhaps) the AI covertly influences its external environment (i.e. breaks out of the box a bit). We can imagine scenarios where it's only (1) and not (2). I find them a bit more far-fetched, but this is the classic vision of the treacherous turn... the AI makes a plan, and then suddenly executes it to attain DSA. Once it starts to execute, of course there is distributional shift, but: A) it is auto-induced distributional shift, and B) the developers never decided to deploy.

Cross-posted, as always, from Putanumonit.

I have written many posts in the shape of giving life advice. I hear back from readers who take it and those who refuse it. Either is good — I’m just a guy on the internet, to be consumed as part of a balanced diet of opinions.

But occasionally I hear: who are you to give life advice, your own life is so perfect! This sounds strange at first. If you think I’ve got life figured out, wouldn’t you want my advice? I think what they mean is that I haven’t had to overcome the hardships they have, hostile people and adverse circumstances.

I talk quite ofte... (Read more)

But occasionally I hear: who are you to give life advice, your own life is so perfect! This sounds strange at first. If you think I’ve got life figured out, wouldn’t you want my advice?

How much your life is determined by your actions, and how much by forces beyond your control, that is an empirical question. You seem to believe it's mostly your actions. I am not trying to disagree here (I honestly don't know), just saying that people may legitimately have either model, or a mix thereof.

If your model is "your life is mostly dete... (read more)

[7] Viliam (6h): Maybe a religion that wants to appeal to people with a modern sense of justice (i.e. those not satisfied with "the ingroup goes to heaven, the outgroup goes to hell, exactly as you would wish, right?") has no better option than to take the just-world hypothesis and dress it up in religious terms.
[3] Kaj_Sotala (7h): I like this post, but can't help but notice that I expect it to be unhelpful to the people who would most need it. For someone deeply mired in victim mentality, they won't be convinced by an intellectual argument for why victim mentality is bad, since to them their victimhood feels like just how things are. On the other hand, I guess it's unavoidable that the people who would most need to hear a particular piece of advice are incapable of hearing it, and this post can still be helpful to people who might otherwise slightly lean that way. (Personally, seeing some people's victim mentality has given me a strong incentive to never be like that myself, and it has felt helpful, so I expect that this post will also be helpful to some.)
[5] AllAmericanBreakfast (12h): My favorite part of this post is your comment on how rejection of your own victim mentality helped you develop empathy for the difficult dating experiences of women. I experienced the same thing, and it strikes me as both true and counterintuitive.

My hesitancy is around weaving together so many areas of life under the title "victimhood." I'm not ready to accept that Palestine's problems are due to millions of Palestinians refusing to cast off their victim mentality. Their experience is structurally different from that of a man having a tough time dating, or a person falling for a scam.

It's totally fair to critique left-wing activists for not having an in-depth understanding of the issue. Although I'm no longer a leftist, I was in college. I don't think it's quite fair to say they're all in it to burnish their radical credentials. Instead, I'd say that their collective anxiety about acceptance by the others is part of what inhibits them from that in-depth research. It's all too easy to cast victimology as the new oppressor, and I fear that this post teeters on the verge of that.

But I do like this post. The victim narrative is about demanding empathy from others, and people who support it fear that if people reject the narrative, then they are rejecting empathy. Not so. Empathy can be two-fold: acknowledging the unique external difficulties another person faces, while also feeling compassion for the ways in which their attitude and actions may compound those problems.

Here are some somewhat unconnected unconfident thoughts on criticism that I’ve been thinking about recently.


A while ago, when I started having one-on-ones with people I was managing, I went into the meetings with a list of questions I was going to ask them. After the meetings, I’d look at my notes and realize that almost all the value of the meeting came from the part where I asked them what the worst parts of their current work situation were and what the biggest mistakes I was making as a manager were.

I started thinking that almost the whole point of meetings like that is to... (Read more)

One thing that I do to invite more frank criticism from people is to ask in the frame of "I think I'm bad at X, do you have any specific thoughts or suggestions to help me get better?" (where X is a pretty broad category). This pre-commits to the position that you're bad at it, which gets rid of (most of) the status risk for them in criticizing you.

[1] deluks917 (6h): I feel like most of the value I got from this post is from the first section on management. I wish you had talked about your experiences in more detail. A serious problem with thinking about this stuff is that there are serious issues on both sides. For example, I have found that asking for advice from people who don't actually share your goals/values is often, at best, a waste of time. They give you advice that optimizes for their goals, not yours, without being explicit about what they are doing. In the case of the management you did, I think this problem is less severe because your goals were reasonably overlapping. You just had to convince people that your goals overlapped. In many similar-ish cases the actual incentive is to avoid honesty.
[2] strangepoop (13h): One (dark-artsy) aspect to add here is that the first time you ask somebody for criticism, you're managing more than your general identity; you're also managing your interaction norms with that person. You're giving them permission to criticize you (or sometimes, even think critically about you for the first time), creating common knowledge that there does exist a perspective from which it's okay/expected for them to do that.

This is playing with the charity they normally extend to you, which might mean that your words and plans will be given less attention than before, even though there might not be any specific criticism in their head. This is especially relevant for low-legibility/fluid hierarchies, which might collapse and impede functioning from the resulting misalignment, perhaps not unlike your own fears of being "crushed", but at the org level.

Although it's usually clear that you'd want to get feedback rather than manage this (at least, I think so), it's important to notice this as one kind of anxiety surrounding criticism. It is separate from any narcissistic worries about status; it can be a real systemic worry when you're acting prosocially.
[2] WorkInProgress (17h): This whole post really resonated with me. The way in which you invite criticism seems like an excellent way to find hidden flaws, or confirm or deny theories about weak points. I also try to invite criticism, but it's very difficult.

Similar to what you mentioned, I often bring up something I fear I'm doing wrong in casual conversation and explain the steps that I'm planning to take to remedy it. It's a bit of a stealthy way to invite criticism, because my colleague will either agree with me and suggest additional steps, or will state that they don't agree and add another area that they think would be more helpful to address. By mentioning a weakness and then proceeding directly to steps to remedy it, I think I'm able to get over the scary part of the weak point by engaging the problem-solving part of both my brain and my colleague's brain. A negative or weak area is threatening, but iterating on a solution is just a problem to solve and optimize.

I also have had some luck in actively asking for criticism from people with little to lose in the situation: bosses or colleagues during exit interviews, or people who are moving away. Often the transition, as well as the feeling that they have less risk in opening up to me, can give me some pretty raw feedback. The only problem here is that it's sometimes too raw. I've gotten some pretty brutal feedback after quitting, due to the boss's negative feelings about me leaving.

I really like the vitamin D inquiries that have been made.

I only stumbled a bit over this: "Of course, there may be another source for the dramatic difference between the two groups, which has not yet been identified. This would usually be the responsibility of the publishing journal to expose. In this case, the publication has been peer-reviewed and published in a small journal specializing in vitamin D. The publisher is Elsevier, which also publishes the Lancet and Cell."

Many people nowadays are stating that peer-review processes are problematic, but I... (read more)

1Kenny11h Related: Covid 9/10: Vitamin D - LessWrong

(This is a basic point about utility theory which many will already be familiar with. I draw some non-obvious conclusions which may be of interest to you even if you think you know this from the title -- but the main point is to communicate the basics. I'm posting it to the alignment forum because I've heard misunderstandings of this from some in the AI alignment research community.)

I will first give the basic argument that the utility quantities of different agents aren't directly comparable, and a few important consequences of this. I'll then spend the rest of the post discussing what to do ... (Read more)
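For readers who want the incomparability claim in symbols, here is a sketch of the standard VNM fact it rests on (this is textbook material, not a new claim from the post):

```latex
% If u_i represents agent i's preferences over lotteries, then so does any
% positive affine transformation, with a_i and b_i chosen independently per agent:
u_i'(x) = a_i\, u_i(x) + b_i, \qquad a_i > 0.
% A cross-agent sum such as \sum_i u_i(x) can therefore be tilted toward any
% particular agent just by rescaling that agent's a_i, so it carries no
% information beyond each agent's own ordering unless some extra normalization
% is imposed.
```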

2rohinmshah8h Planned summary for the Alignment Newsletter:
2abramdemski12h Well, I haven't actually given the argument that it has to be linear. I've just asserted that there is one, referencing Harsanyi and complete class arguments. There are a variety of related arguments, and these arguments have some assumptions which I haven't been emphasizing in our discussion. Here's a pretty strong argument (with correspondingly strong assumptions).

1. Suppose each individual is VNM-rational.
2. Suppose the social choice function is VNM-rational.
3. Suppose that we can also use mixed actions, randomizing in a way which is independent of everything else.
4. Suppose that the social choice function has a strict preference for every Pareto improvement.
5. Suppose that the social choice function is indifferent between two different actions if every single individual is indifferent.
6. Suppose the situation gives a nontrivial choice with respect to every individual; that is, no one is indifferent between all the options.

By VNM, each individual's preferences can be represented by a utility function, as can the preferences of the social choice function. Imagine actions as points in preference-space, an n-dimensional space where n is the number of individuals. By assumption #5, actions which map to the same point in preference-space must be treated the same by the social choice function. So we can now imagine the social choice function as a map from R^n to R.

VNM on individuals implies that the mixed action p * a1 + (1-p) * a2 is just the point p of the way along a line between a1 and a2. VNM implies that the value the social choice function places on mixed actions is just a linear mixture of the values of pure actions. But this means the social choice function can be seen as an affine function from R^n to R. Of course, since utility functions don't mind additive constants, we can subtract the value at the origin to get a linear function. But remember that points in this space are just vectors of individuals' utilities f
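The linearity step in that argument can be written out compactly (a sketch under the stated assumptions, not a full proof):

```latex
% VNM-rationality of the social preference, applied to a mixed action, gives
U_{\mathrm{soc}}\bigl(p\,a_1 + (1-p)\,a_2\bigr)
  = p\,U_{\mathrm{soc}}(a_1) + (1-p)\,U_{\mathrm{soc}}(a_2),
% and since mixing actions mixes the corresponding points
% (u_1(a), \dots, u_n(a)) in preference-space, U_soc is affine there.
% Subtracting its value at the origin leaves a linear form
U_{\mathrm{soc}}(u_1, \dots, u_n) = \sum_{i=1}^{n} c_i\, u_i,
% where respect for Pareto improvements (assumption 4) forces each c_i > 0.
```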

Ah, I think I understand better - I was assuming a much stronger statement of what social choice function is rational for everyone to have, rather than just that there exists a (very large) set of social choice functions, and it is rational for an agent to have any of them, even if it massively differs from other agents' functions.

Thanks for taking the time down this rabbit hole to clarify it for me.

1flodorner17h"The Nash solution differs significantly from the other solutions considered so far. [...] 2. This is the first proposal where the additive constants matter. Indeed, now the multiplicative constants are the ones that don't matter!" In what sense do additive constants matter here? Aren't they neutralized by the subtraction?
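For reference, the invariances the quoted passage is pointing at can be read off the Nash product itself (writing d_i for agent i's disagreement utility; this is the standard formulation, sketched here rather than quoted from the post):

```latex
% The Nash bargaining solution maximizes the product of gains over the
% disagreement point d:
\max_{u \in F}\; \prod_{i} \bigl(u_i - d_i\bigr).
% Rescaling u_i \to k_i u_i (with d_i \to k_i d_i) multiplies the product by
% \prod_i k_i > 0, leaving the argmax unchanged: multiplicative constants
% don't matter. An additive shift u_i \to u_i + c_i is neutralized only if
% d_i is shifted along with it; the solution genuinely depends on where each
% agent's zero point (the disagreement utility) sits.
```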
Dagon's Shortform

Useful pointers. I do remember those conversations, of course, and I think the objections (and valid uses) remain - one can learn from unlikely or impossible hypotheticals, but it takes extra steps to specify why some parts of it would be applicable to real situations. I also remember the decoupling vs contextualizing discussion, and hadn't connected it to this topic - I'm going to have to think more before I really understand whether Newcomb-like problems have clear enough paths to applicability that they can be decoupled by default or whether there's a default context I can just apply to make sense of them.

So I really appreciate the lessons I've learned from "Rationality", but I wish I had learned them earlier in life. We are now homeschooling my kids, and I want to volunteer to teach my kids plus others who are interested lessons about thinking rationally.

Does anyone have recommendations on how to put together a curriculum which gets at the core ideas of rationality, but is oriented towards young kids? Some criteria:

Children will likely range from 7-11, meaning they should be simple concepts and require very little prior knowledge and only the simplest math.

Lessons should be int... (Read more)

I'm really interested in this too. I have a 1-year-old and work in improving engineering education. Might be worth checking out.

3goose00013h Yes, I agree that doing good science is hard with flash; I've just had everyone telling me that that's what hooks them. Good to know that's not really true. I'm thinking along the lines of heavily leading the kids to the model, or simply giving it to them, not necessarily having them come up with it themselves and then testing it. But part of the reason I'm asking here is to see if anyone has ideas for models which are discoverable by kids this age, so that they can get there by more of their own processes.
2AllAmericanBreakfast12h That’s fair! I think that’s a good idea to explore and I think it’s great to try things out. If you try something and the kids don’t take to it, no harm done :)

One thing you could try is some probability. There’s a classic intro stats demo where you have a class come up with fake sequences of 20 coin flips in a row, and generate some real sequences of 20 coin flips as well, all while the teacher is out of the room. Then the teacher comes in and guesses which are real and which are fake. They can do that because people tend to generate fake sequences with too few stretches of repeated heads and tails. Kids can flip a coin, they’d have fun trying to trick you, and when you guessed right, it might seem like a magic trick. You can also teach them a few things about probability and dice rolls and help them see how it applies to board games.
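The teacher's "trick" in that demo can be sketched in a few lines of Python. This is a toy model: the fake-sequence generator here simply refuses to let any run exceed length 2, which is one crude stand-in for how people avoid streaks when faking randomness.

```python
import random

def longest_run(seq):
    """Length of the longest run of identical symbols in seq."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def real_sequence(n=20, rng=random):
    """n genuinely random coin flips."""
    return [rng.choice("HT") for _ in range(n)]

def fake_sequence(n=20, max_run=2, rng=random):
    """Crude model of a human-made fake: never let a run exceed max_run."""
    seq = []
    for _ in range(n):
        choices = "HT"
        if len(seq) >= max_run and all(s == seq[-1] for s in seq[-max_run:]):
            choices = "T" if seq[-1] == "H" else "H"  # forced to break the streak
        seq.append(rng.choice(choices))
    return seq

# The teacher guesses "real" whenever the longest run is 4 or more.
trials = 2000
hits = sum(longest_run(real_sequence()) >= 4 for _ in range(trials))
print(f"real sequences with a run of 4+: {hits / trials:.0%}")
```

With 20 flips, most genuinely random sequences contain a streak of four or more, while the run-capped fakes never do, which is all the "magic trick" needs.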
1goose00013h Yes, I suppose I could have been more specific about the number of kids. I will be teaching my own two at a minimum, but could have as many as seven others join. Thanks for the note about the handbook, I'll check it out.
This is a linkpost for

Previously: AGI and Friendly AI in the dominant AI textbook (2011), Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda (2014)

The 4th edition of Artificial Intelligence: A Modern Approach came out this year. While the 3rd edition published in 2009 mentions the Singularity and existential risk, it's notable how much the 4th edition gives the alignment problem front-and-center attention as part of the introductory material (speaking in the authorial voice, not just "I.J. Good (1965) says this, Yudkowsky (2008) says that, Omohundro (2008) says t... (Read more)

4abramdemski13h I would further charitably rewrite it as: "In chapter 16, we analyze an incentive which a CIRL agent has to allow itself to be switched off. This incentive is positive if and only if it is uncertain about the human objective." A CIRL agent should be capable of believing that humans terminally value pressing buttons, in which case it might allow itself to be shut off despite being 100% sure about values. So it's just the particular incentive examined that's iff.
5Zack_M_Davis1d Can I also point to this as (some amount of) evidence against concerns that "we" (members of this stupid robot cult that I continue to feel contempt for but don't know how to quit) shouldn't try to have systematically truthseeking discussions about potentially sensitive or low-status subjects because guilt-by-association splash damage from those conversations will hurt AI alignment efforts, which are the most important thing in the world? (Previously: 1, 2, 3.) Like, I agree that some nonzero amount of splash damage exists. But look! The most popular AI textbook, used in almost fifteen hundred colleges and universities, clearly explains the paperclip-maximizer problem, in the authorial voice, in the first chapter. "These behaviors are not 'unintelligent' or 'insane'; they are a logical consequence of defining winning as the sole objective for the machine." Italics in original! I couldn't transcribe it, but there's even one of those pay-attention-to-this triangles (◀) in the margin, in teal ink. Everyone who gets a CS degree from this year onwards is going to know from the teal ink that there's a problem. If there was a marketing war to legitimize AI risk, we won! Now can "we" please stop using the marketing war as an excuse for lying?!

Some predictable counterpoints: maybe we won because we were cautious; we could have won harder; many relevant thinkers still pooh-pooh the problem; it's not just the basic problem statement that's important, but potentially many other ideas that aren't yet popular; picking battles isn't lying; arguing about sensitive subjects is fun, and I don't think people are very tempted to find excuses to avoid it; there are other things that are potentially the most important in the world that could suffer from bad optics; I'm not against systematically truthseeking discussions of sensitive subjects, just against having them in public in a way that's associated with the rationalism brand.

6Zack_M_Davis1d It is an "iff" in §16.7.2 "Deference to Humans", but the toy setting in which this is shown is pretty impoverished. It's a story problem about a robot Robbie deciding whether to book an expensive hotel room for busy human Harriet, or whether to ask Harriet first. (I think this is fine as a topic-introducing story problem, but agree that the sentence in Chapter 1 referencing it shouldn't have been phrased to make it sound like it applies to machines-in-general.)
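The deference incentive in that style of story problem can be sketched numerically. This is a toy model, not the book's exact setup: assume the robot either acts now or defers, and that a rational human permits the action only when its utility u (to the human) is positive.

```python
def value_of_deference(belief):
    """belief: list of (probability, utility-to-human) pairs.

    Deferring lets the human veto negative-u outcomes, so deferring is
    worth E[max(u, 0)]; acting directly is worth max(E[u], 0). The
    difference is always >= 0, and strictly positive only when the robot
    is uncertain about the sign of u.
    """
    eu = sum(p * u for p, u in belief)
    act = max(eu, 0.0)
    defer = sum(p * max(u, 0.0) for p, u in belief)
    return defer - act

# A robot certain about the human objective gains nothing by deferring.
print(value_of_deference([(1.0, 40.0)]))
# An uncertain robot has a strictly positive incentive to defer.
print(value_of_deference([(0.6, 40.0), (0.4, -60.0)]))
```

The second belief has E[u] = 0, so acting directly is worth nothing, while deferring captures only the good branch and is worth 0.6 × 40 = 24.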