Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
The comments here are a storage of not-posts and not-ideas that I would rather write down than not.
Yesterday I noticed that I had a pretty big disconnect from this: There's a very real chance that we'll all be around, business somewhat-as-usual in 30 years. I mean, in this world many things have a good chance of changing radically, but automation of optimisation will not cause any change on the level of the industrial revolution. DeepMind will just be a really cool tech company that builds great stuff. You should make plans for important research and coordination to happen in this world (and definitely not just decide to spend everything on a last-ditch effort to make everything go well in the next 10 years, only to burn up the commons and your credibility for the subsequent 20).
Only yesterday when reading Jessica's post did I notice that I wasn't thinking realistically/in-detail about it, and start doing that.
Related hypothesis: people feel like they've wasted some period of time e.g. months, years, 'their youth', when they feel they cannot see an exciting path forward for the future. Often this is caused by people they respect (/who have more status than them) telling them they're only allowed a small few types of futures.
Er, Wikipedia has a page on misinformation about Covid, and the first example is Wuhan lab origin. Kinda shocked that Wikipedia is calling this misinformation. Seems like their authoritative sources are abusing their positions. I am scared that I'm going to stop trusting Wikipedia soon enough, which is leaving me feeling pretty shook.
Responding to Scott's response to Jessica.
The post makes the important argument that if we have a word whose boundary is around a pretty important set of phenomena that are useful to have a quick handle to refer to, then
- It's really unhelpful for people to start using the word to also refer to a phenomenon with 10x or 100x more occurrences in the world, because then I'm no longer able to point to the specific important parts of the phenomena that I was previously talking about
- e.g. Currently the word 'abuser' describes a small number of people during some of their lives. Someone might want to say that technically it should refer to all people all of the time. The argument is understandable, but it wholly destroys the usefulness of the concept handle.
- People often have political incentives to push the concept boundary to include a specific case in a way that, if it were principled, indeed makes most of the phenomena in the category no use to talk about. This allows for selective policing by the people with the political incentive.
- It's often fine for people to bend words a little bit (e.g. when people verb nouns), but when it's in the class of terms w
I will actually clean this up into a post sometime soon [edit: I retract that, I am not able to make commitments like this right now]. For now let me add another quick hypothesis on this topic whilst crashing from jet lag.
A friend of mine proposed that instead of saying 'lies' I could say 'falsehoods'. Not "that claim is a lie" but "that claim is false".
I responded that 'falsehood' doesn't capture the fact that you should expect systematic deviations from the truth. I'm not saying this particular parapsychology claim is false. I'm saying it is false in a way where you should no longer trust the other claims, and expect they've been optimised to be persuasive.
They gave another proposal, which is to say instead of "they're lying" to say "they're not truth-tracking". Suggest that their reasoning process (perhaps in one particular domain) does not track truth.
I responded that while this was better, it still seems to me that people won't have an informal understanding of how to use this information. (Are you saying that the ideas aren't especially well-evidenced? But they so... (read more)
The definitional boundaries of "abuser," as Scott notes, are in large part about coordinating around whom to censure. The definition is pragmatic rather than objective.*
If the motive for the definition of "lies" is similar, then a proposal to define only conscious deception as lying is therefore a proposal to censure people who defend themselves against coercion while privately maintaining coherent beliefs, but not those who defend themselves against coercion by simply failing to maintain coherent beliefs in the first place. (For more on this, see Nightmare of the Perfectly Principled.) This amounts to waging war against the mind.
Of course, as a matter of actual fact we don't strongly censure all cases of conscious deception. In some cases (e.g. "white lies") we punish those who fail to lie, and those who call out the lie. I'm also pretty sure we don't actually distinguish between conscious deception and e.g. reflexively saying an expedient thing, when it's abundantly clear that one knows very well that the expedient thing to say is false, as Jessica pointed out here.
*It's not clear to me that this is a good kind of concept to ... (read more)
Okay, I’ll say it now, because there’s been too many times.
If you want your posts to be read, never, never, NEVER post multiple posts at the same time.
Only do that if you don’t mind none of the posts being read. Like if they’re all just reference posts.
I never read a post if there’s two or more to read; it feels like a slog, like there’s going to be lots of clicking, and it’s probably not worth it. And they normally do badly on comments and karma, so I don’t think it’s just me.
Even if one of them is just meant as reference, it means I won’t read the other one.
I recently circled for the first time. I had two one-hour sessions on consecutive days, with 6 and 8 people respectively.
My main thoughts: this seems like a great way to get to know my acquaintances, connect emotionally, and build closer relationships with friends. The background emotional processing happening in individuals is repeatedly brought forward as the object of conversation, for significantly enhanced communication/understanding. I appreciated getting to poke and actually find out whether people's emotional states matched the words they were using. I got to ask questions like:
Not that a lot of my circling time was spent being skeptical of people's words; a lot of the time I trusted the people involved to be accurately reporting their experiences. It was just very interesting - when I noticed I didn't feel like someone was honest about some micro-emotion - to have the affordance to stop and request an honest internal report.
It felt like there was a constant tradeoff betw... (read more)
Good posts you might want to nominate in the 2018 Review
I'm on track to nominate around 30 posts from 2018, which is a lot. Here is a list of about 30 further posts I looked at that I think were pretty good but didn't make my top list, in the hopes that others who did get value out of the posts will nominate their favourites. Each post has a note I wrote down for myself about the post.
- Reasons compute may not drive AI capabilities growth
- I don’t know if it’s good, but I’d like it to be reviewed to find out.
- The Principled-Intelligence Hypothesis
- Very interesting hypothesis generation. Unless it’s clearly falsified, I’d like to see it get built on.
- Will AI See Sudden Progress? DONE
- I think this post should be considered paired with Paul’s almost-identical post. It’s all exactly one conversation.
- Personal Relationships with Goodness
- This felt like a clear analysis of an idea and coming up with some hypotheses. I don’t think the hypotheses really capture what’s going on, and most of the frames here seem like they’ve caused a lot of people to do a lot of hurt to themselves, but it seemed like progress in that conversation.
- Are ethical asymmetries from property rights?
- Again, another very intere
I was just re-reading the classic paper Artificial Intelligence as a Positive and Negative Factor in Global Risk. It's surprising how well it holds up. The following quotes seem especially relevant 13 years later.
On the difference between AI research speed and AI capabilities speed:
On neural networks:... (read more)
Reviews of books and films from my week with Jacob:
- The Big Short
- Review: Really fun. I liked certain elements of how it displays bad Nash equilibria in finance (I love the scene with the woman from the ratings agency - it turns out she’s just making the best of her incentives too!).
- Grade: B
- Spirited Away
- Review: Wow. A simple story, yet entirely lacking in cliche, and so seemingly original. No cliched characters, no cliched plot twists, no cliched humour, all entirely sincere and meaningful. Didn’t really notice that it was animated (while fantastical, it never really breaks the illusion of reality for me). The few parts that made me laugh, made me laugh harder than I have in ages.
- There’s a small visual scene, unacknowledged by the ongoing dialogue, between the mouse-baby and the dust-sprites which is the funniest thing I’ve seen in ages, and I had to rewind for Jacob to notice it.
- I liked how by the end, the team of characters are all a different order of magnitude in size.
- A delightful, well-told story.
- Grade: A+
- Stranger Than Fiction
- Review: This is now my go-to film of someone trying something original and just failing. Filled with new ideas, but none executed well, a
Hypothesis: power (status within military, government, academia, etc) is more obviously real to humans, and it takes a lot of work to build detailed, abstract models of anything other than this that feel as real. As a result people who have a basic understanding of a deep problem will consistently attempt to manoeuvre into powerful positions vaguely related to the problem, rather than directly solve the open problem. This will often get defended with "But even if we get a solution, how will we implement it?" without noticing that (a) there is no real effort by anyone else to solve the problem and (b) the more well-understood a problem is, the easier it is to implement a solution.
I'd take a bet at even odds that it's single-digit.
To clarify, I don't think this is just about grabbing power in government or military. My outside view of plans to "get a PhD in AI (safety)" seems like this to me. This was part of the reason I declined an offer to do a neuroscience PhD with Oxford/DeepMind. I didn't have any secret for why it might be plausibly crucial.
There's a game for the Oculus Quest (that you can also buy on Steam) called "Keep Talking And Nobody Explodes".
It's a two-player game. When playing with the VR headset, one of you wears the headset and has to defuse bombs in a limited amount of time (either 3, 4 or 5 mins), while the other person sits outside the headset with the bomb-defusal manual and tells you what to do. Whereas in other collaboration games you're all looking at the screen together, in this game the substrate of communication is solely conversation: the other person provides all of your inputs about how their half is going (i.e. it's not shown on a screen).
The types of puzzles are fairly straightforward computational problems but with lots of fiddly instructions, and require the outer person to figure out what information they need from the inner person. It often involves things like counting numbers of wires of a certain colour, or remembering the previous digits that were being shown, or quickly describing symbols that are not any known letter or shape.
So the game trains you and a partner in efficiently building a shared language for dealing with new problems.
More than that, as the game gets harder, often... (read more)
I talked with Ray for an hour about Ray's phrase "Keep your beliefs cruxy and your frames explicit".
I focused mostly on the 'keep your frames explicit' part. Ray gave a toy example of someone attempting to communicate something deeply emotional/intuitive, or perhaps a Buddhist approach to the world, and how difficult it is to do this with simple explicit language. It often instead requires the other person to go off and seek certain experiences, or practise inhabiting those experiences (e.g. doing a little meditation, or getting in touch with their emotion of anger).
Ray's motivation was that people often have these very different frames or approaches, but don't recognise this fact, and end up believing aggressive things about the other person e.g. "I guess they're just dumb" or "I guess they just don't care about other people".
I asked for examples that were motivating his belief - where it would be much better if the disagreers took to heart the recommendation to make their frames explicit. He came up with two concrete examples:
- Jim v Ray on norms for shortform, where during one hour they worked through the same reasons
I find "keep everything explicit" to often be a power move designed to make non-explicit facts irrelevant and non-admissible. This often goes along with burden of proof. I made a claim (a real example of this dynamic, from an unconference under Chatham House rules): that pulling people away from their existing community has real costs that hurt those communities. I was told that, well, that seems possible, but I can point to concrete benefits of taking them away, so you need to be concrete and explicit about what those costs are, or I don't think we should consider them.
Thus, the burden of proof was put upon me, to show (1) that people central to communities were being taken away, (2) that those people being taken away hurt those communities, (3) in particular measurable ways, (4) that then would impact direct EA causes. And then we would take the magnitude of effect I could prove using only established facts and tangible reasoning, and multiply them together, to see how big this effect was.
I cooperated with this because I felt like the current estimate of this cost for this person was zero, and I could easily raise that, and that was better than nothing,... (read more)
To complement that: Requiring my interlocutor to make everything explicit is also a defence against having my mind changed in ways I don't endorse but that I can't quite pick apart right now. Which kinda overlaps with your example, I think.
I sometimes will feel like my low-level associations are changing in a way I'm not sure I endorse, halt, and ask for something that the more explicit part of me reflectively endorses. If they're able to provide that, then I will willingly continue making the low-level updates, but if they can't then there's a bit of an impasse, at which point I will just start trying to communicate emotionally what feels off about it (e.g. in your example I could imagine saying "I feel some panic in my shoulders and a sense that you're trying to control my decisions"). Actually, sometimes I will just give the emotional info first. There's a lot of contextual details that lead me to figure out which one I do.
One last bit is to keep in mind that most things (or at least many) can be power moves.
There's one failure mode, where a person sort of gives you the creeps, and you try to bring this up and people say "well, did they do anything explicitly wrong?" and you're like "no, I guess?" and then it turns out you were picking up something important about the person-giving-you-the-creeps and it would have been good if people had paid some attention to your intuition.
There's a different failure mode where "so and so gives me the creeps" is something you can say willy-nilly without ever having to back it up, and it ends up being its own power move.
I do think during politically charged conversations it's good to be able to notice and draw attention to the power-move-ness of various frames (in both/all directions)
(i.e. in the "so and so gives me the creeps" situation, it's good to note both that you can abuse "only admit explicit evidence" and "wanton claims of creepiness" in different ways. And then, having made the frame of power-move-ness explicit, talk about ways to potentially alleviate both forms of abuse)
I'd been working on a sequence explaining this all in more detail (I think there's a lot of moving parts and inferential distance to cover here). I'll mostly respond in the form of "finish that sequence."
But here's a quick paragraph that more fully expands what I actually believe:
- If you're building a product with someone (metaphorical product or literal product), and you find yourself disagreeing, and you explain "This is important because X, which implies Y", and they say "What!? But, A, therefore B!" and then you both keep repeating those points over and over... you're going to waste a lot of time, and possibly build a confused frankenstein product that's less effective than if you could figure out how to successfully communicate.
- In that situation, I claim you should be doing something different, if you want to build a product that's actually good.
- If you're not building a product, this is less obviously important. If you're just arguing for fun, I dunno, keep at it I guess.
- A separate, further claim is that the reason you're miscommunicating is because you have a bunch of hidden assumptions in yo
This is the bit that is computationally intractable.
Looking for cruxes is a healthy move, exposing the moving parts of your beliefs in a way that can lead to you learning important new info.
However, there are an incredible number of cruxes for any given belief. If I think that a hypothetical project should accelerate its development time 2x in the coming month, I could change my mind if I learn some important fact about the long-term improvements of spending the month refactoring the entire codebase; I could change my mind if I learn that the current time we spend on things is required for models of the code to propagate and become common knowledge in the staff; I could change my mind if my models of geopolitical events suggest that our industry is going to tank next week and we should get out immediately.
Live a life worth leaving Facebook for.
Sometimes a false belief about a domain can be quite damaging, and a true belief can be quite valuable.
For example, suppose there is a 1000-person company. I tend to think that credit allocation for the success of the company is heavy tailed, and that there's typically 1-3 people who the company just would zombify and die without, and ~20 people who have the key context and understanding that the 1-3 people can work with to do new and live things. (I'm surely oversimplifying because I've not ever been on the inside with a 1000-person company.) In this situation it's very valuable to know who the people are who deserve the credit allocation. Getting the wrong 1-3 people is a bit of a disaster. This means that discussing it, raising hypotheses, bringing up bad arguments, bringing up arguments due to motivated cognition, and so on, can be unusually costly, and conversations about it can feel quite fraught.
Other fraught topics include breaking up romantically, quitting your job, leaving a community, club, or movement. I think taboo tradeoffs have a related feeling, like bringing up whether to lie in a situation, whether to cheat in a situation, or when to exchange money for values ... (read more)
I'm thinking about the rigor of alternating strategies. Here are three examples.
- Forward-Chaining vs Backward-Chaining
- To be rich, don't marry for money. Surround yourself by rich people and marry for love. But be very strict about not letting poor people into your environment.
- Scott Garrabrant once described his Embedded Agency research to me as the most back-chaining in terms of the area of work, and the most forward-chaining within that area. He is often quite unable to justify what he's working on in the short-term (e.g. 1, 2, 3), yet it can turn out to be very useful later on (e.g. 1).
- Optimism vs Pessimism
- Successful startup founders build a vision they feel incredible optimism and excitement about and are committed to making happen, yet falsify it as quickly as possible by building a sh*tty MVP and putting it in front of users, because you're probably wrong and the customer will show you what they want. Another name is "Vision vs Falsification".
- Finding vs Avoiding (Needles in Haystacks)
- Some work is about finding the needle, and some is about mining hay whilst ensuring that you avoid 100% of needles.
- For example when trying to build a successful Fusion Power Generator, most things yo
Trying to think about building some content organisations and filtering systems on LessWrong. I'm new to a bunch of the things I discuss below, so I'm interested in other people's models of these subjects, or links to sites that solve the problems in different ways.
So, one problem you might try to solve is that people want to see all of a thing on a site. You might want to see all the posts on reductionism on LessWrong, or all the practical how-to guides (e.g. how to beat procrastination, Alignment Research Field Guide, etc), or all the literature reviews on LessWrong. And so you want people to help build those pages. You might also want to see all the posts corresponding to a certain concept, so that you can find out what that concept refers to (e.g. what is the term "Goodhart's law" or "slack" or "mesa-optimisers" etc).
Another problem you might try to solve, is that while many users are interested in lots of the content on the site, they have varying levels of interest in the different topics. Some people are mostly interested in the posts on big picture historical narratives, and less so on models of one's own mind that help with dealing with emotions and trauma. Som... (read more)
I block all the big social networks from my phone and laptop, except for 2 hours on Saturday, and I noticed that when I check Facebook on Saturday, the notifications are always boring and not something I care about. Then I scroll through the newsfeed for a bit and it quickly becomes all boring too.
And I was surprised. Could it be that, all the hype and narrative aside, I actually just wasn’t interested in what was happening on Facebook? That I could remove it from my life and just not really be missing anything?
On my walk home from work today I realised that this wasn’t the case. Facebook has interesting posts I want to follow, but they’re not in my notifications. They’re sparsely distributed in my newsfeed, such that they appear a few times per week, randomly. I can get a lot of value from Facebook, but not by checking once per week - only by checking it all the time. That’s how the game is played.
Anyway, I am not trading all of my attention away for such small amounts of value. So it remains blocked.
I've found Facebook absolutely terrible as a way to both distribute and consume good content. Everything you want to share or see is just floating in the opaque vortex of the f%$&ing newsfeed algorithm. I keep Facebook around for party invites and to see who my friends are in each city I travel to. I disabled notifications and check the timeline for less than 20 minutes each week.
OTOH, I'm a big fan of Twitter. (@yashkaf) I've curated my feed to a perfect mix of insightful commentary, funny jokes, and weird animal photos. I get to have conversations with people I admire, like writers and scientists. Going forward I'll probably keep tweeting, and anything that's a fit for LW I'll also cross-post here.
Reading this post, where the author introspects and finds a strong desire to be able to tell a good story about their career, suggests to me that how people make decisions will be heavily constrained by the sorts of stories about one's career that are definitely common knowledge.
I remember at the end of my degree, there was a ceremony where all the students dressed in silly gowns and the parents came and sat in a circular hall while we got given our degrees and several older people told stories about how your children have become men and women, after studying and learning so much at the university.
This was a dumb/false story, because I'm quite confident the university did not teach these people the most important skills for being an adult, and certainly my own development was largely directed by the projects I did on my own dime, not through much of anything the university taught.
But everyone was sat in a circle, where they could see each other listen to the speech in silence, as though it were (a) important and (b) true. And it served as a coordination mechanism, saying "If you go into the world and tell people that your child came to university and gre... (read more)
At the SSC Meetup tonight in my house, I was in a group conversation. I asked a stranger if they'd read anything interesting on the new LessWrong in the last 6 months or so (I had not yet mentioned my involvement in the project). He told me about an interesting post about the variance in human intelligence compared to the variance in mice intelligence. I said it was nice to know people read the posts I write. The group then had a longer conversation about the question. It was enjoyable to hear strangers tell me about reading my posts.
I've finally moved into a period of my life where I can set guardrails around my slack without sacrificing the things I care about most. I currently am pushing it to the limit, doing work during work hours, and not doing work outside work hours. I'm eating very regularly, 9am, 2pm, 7pm. I'm going to sleep around 9-10, and getting up early. I have time to pick up my hobby of classical music.
At the same time, I'm also restricting the ability of my phone to steal my attention. All social media is blocked except for 2 hours on Saturday, whi... (read more)
Why has nobody noticed that the OpenAI logo is three intertwined paperclips? This is an alarming update about who's truly in charge...
I think of myself as pretty skilled and nuanced at introspection, and being able to make my implicit cognition explicit.
However, there is one fact about me that makes me doubt this severely, which is that I have never ever ever noticed any effect from taking caffeine.
I've never drunk coffee, though in the past two years my housemates have kept a lot of caffeine around in the form of energy drinks, and I drink them for the taste. I'll drink them any time of the day (9pm is fine). At some point someone seemed shocked that I was about to drink one a... (read more)
I think I've been implicitly coming to believe that (a) all people are feeling emotions all the time, but (b) people vary in how self-aware they are of these emotions.
Does anyone want to give me a counter-argument or counter-evidence to this claim?
Hot take: The actual resolution to the simulation argument is that most advanced civilizations don't make loads of simulations.
Two things make this make sense:
- Firstly, it only matters if they make unlawful simulations. If they make lawful simulations, then it doesn't matter whether you're in a simulation or a base reality, all of your decision theory and incentives are essentially the same, you want to take the same decisions in all of the universes. So you can make lots of lawful simulations, that's fine.
- Secondly, they will strategically choose to not mak
I think in many environments I'm in, especially with young people, the fact that Paul Graham is retired with kids sounds nice, but there's an implicit acknowledgement that "He could've chosen to not have kids and instead do more good in the world, and it's sad that he didn't do that". And it reassures me to know that Paul Graham wouldn't reluctantly agree. He'd just think it was wrong.
Sometimes I get confused between r/ssc and r/css.
When I’m trying to become skillful in something, I often face a choice about whether to produce better output, or whether to bring my actions more in-line with my soul.
For instance, sometimes when I’m practicing a song on the guitar, I will sing it in a way where the words feel true to me.
And sometimes, I will think about the audience, and play in a way that is reliably a good experience for them (clear melody, reliable beat, not too irregular changes in my register, not moving in a way that is distracting, etc).
Something I just noticed is that it is somet... (read more)
I am still confused about moral mazes.
I understand that power-seekers can beat out people earnestly trying to do their jobs. In terms of the Gervais Principle, the sociopaths beat out the clueless.
What I don't understand is how the culture comes to reward corrupt and power-seeking behavior.
One reason someone gave me is that it's in the power-seekers' interest to reward other power-seekers.
Is that true?
I think it's easier for them to beat out the earnest and gullible clueless people.
However, there's probably lots of ways that their sociopathic underlings ... (read more)
Striking paragraph by a recent ACX commenter (link):
Something I've thought about the existence of for years, but imagined was impossible: this 70s song by Italian Adriano Celentano. It fully registers to my mind as English. But it isn't. It's like skimming the output of GPT-2.
I've been thinking lately that picturing an AI catastrophe is helped a great deal by visualising a world where critical systems in society are performed by software. I was spending a while trying to summarise and analyse Paul's "What Failure Looks Like", which led me this way. I think that properly imagining such a world is immediately scary, because software can deal with edge cases badly, like automated market traders causing major crashes, so that's already a big deal. Then you add ML in, and can talk about how crazy it is to hand critical systems over... (read more)