All of Raemon's Comments + Replies

(Flagging that it’s still technically required to get 2 positive votes to proceed to the Review Phase)

I certainly buy that as an argument, but I don’t know that it’s obviously worth prioritizing before checking whether anyone actively cared about it positively. Lots of posts are bad, you can’t cover all of them.

2Ben Pace10h
I have the experience when voting in the review that I don't vote on most posts, and my negative votes only go on importantly bad posts. Empirically I don't expect people will downvote anywhere near all of ~4500 posts written in 2022, and I think the 10-20 that people will downvote have a ~100x chance relative to baseline of being worth reviewing. (Perhaps 100x is a bit strong but 30x seems reasonable to me.)

Posts also need at least 2 positive votes to get into the Review Phase, so you can wait to see if it seems overrated before putting the effort into a negative review. (although if you think it was already overhyped and want to correct the record anyway, that sounds fine too)

4Ben Pace10h
I discussed this with Oli and he argued that negative votes were also typically strong evidence that the post was worth reviewing. I was persuaded by the argument that it means someone went out of their way to say "there is something actively bad about this post and I really think it should not get a high score", and that probably means that a review of what's bad about the post would be worthwhile to read (that I would learn something interesting or valuable from it).

Oh, this actually feels related to my relational-stance take on this. When I decide to trust a friend, colleague or romantic partner, I'm giving them the power to hurt me in some way (possibly psychologically). There's practical versions of this, but part of it is something like "we are choosing to be the sort of people who are going to share secrets and vulnerabilities with each other."

2Sune1d
This is also just another way of saying “willing to be vulnerable” (from my answer below) or maybe “decision to be vulnerable”. Many of these answers are just saying the same thing in different words.

Fwiw I just think it's fine to get karma for sharing linkposts – you're doing a valuable service if you're sharing useful information. I don't know of other forums that draw a distinction between linkposts and regular posts in terms of where they show up. 

It makes sense that it feels a bit weird, but given our limited dev time I think I'd mostly recommend feeling more free to do linkposts as they currently are (marking down who the author is in the title, so people can see what's going on)

2Yoav Ravid1d
My main aversion is that I don't want them to drown out my own posts on my user page. 

This feels kinda straw-vulcany, sort of missing the point about what people are often using trust for.

I'm not actually sure what trust is, but when I imagine people saying the sentences at the beginning, at least 35% and maybe 75% of what's going on is more about managing a relational stance, i.e. something like "do you respect me?". 

I do expect you'll followup with "yeah, I am also just not down to respect people the particular way they want to be respected." 

So a major part of how I handle this sort of thing is usually conveying somehow "I don't... (read more)

5Gordon Seidoh Worley7h
I've been really frustrated in the past with folks who equate trust with respect. My ex-wife frequently complained that I didn't trust her. Why? Because she'd ask me to do something and, rather than simply do it, I'd ask why. Most of the time I was just curious (can you imagine? someone who posts on LessWrong was curious?) and wanted to know more, but she read it as me distrusting and thus not respecting or being committed to her. Mostly I just select myself out of relationships with people who are like this now. The flip side of this, though, is that there's a part of my life where I do things without asking why, which is part of my Zen practice. Our rituals exist because someone created them, but the intent is intentionally not communicated. This is to create the experience of not knowing why you do something and having to live with not knowing. You might eventually come up with your own reason for why you do a particular ritual, but then that's something you added that you can explore rather than something you've taken as given from a teacher, senior student, etc. For example, new people often ask why we bow. The answer: because that's the ritual. If bowing is to mean something, it's up to you to figure out what it means.

We did some experiments with Community Notes-esque things recently, although I'm not sure how it worked out. @kave?

5kave2d
I ran some experiments using only the core of the Community Notes algorithm, taking votes to be helpfulness ratings. I didn't get anything super interesting out of it, though I might have had implementation bugs. The top posts according to the model seemed fine, and then I didn't allocate much time to poking around at it any more.
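(For context on what "the core of the Community Notes algorithm" refers to: the published Community Notes write-up describes a matrix-factorization model where each rating is predicted as a global intercept plus a rater intercept, a note intercept, and a dot product of rater and note factors, with the note intercept used as that note's helpfulness score. Below is a minimal sketch assuming that standard formulation — the hyperparameters and names are illustrative, not kave's actual implementation.)

```python
import numpy as np

def fit_note_scores(ratings, n_users, n_notes, dim=1, reg_factor=0.03,
                    reg_intercept=0.15, lr=0.05, epochs=200, seed=0):
    """Fit r_hat = mu + b_user + b_note + f_user . f_note by SGD.

    `ratings` is a list of (user_idx, note_idx, rating) tuples with
    rating in {0, 1} (1 = "helpful" vote). Returns the per-note
    intercepts b_note, which act as helpfulness scores once each
    rater's overall positivity and the shared latent "viewpoint"
    factor have been accounted for.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0
    b_user = np.zeros(n_users)
    b_note = np.zeros(n_notes)
    f_user = rng.normal(0.0, 0.1, (n_users, dim))
    f_note = rng.normal(0.0, 0.1, (n_notes, dim))

    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + b_user[u] + b_note[n] + f_user[u] @ f_note[n])
            # Squared-error gradient step with L2 regularization.
            # The published algorithm regularizes intercepts more heavily
            # than factors, so a note only earns a high intercept from
            # agreement the viewpoint factor can't explain away.
            mu += lr * err
            b_user[u] += lr * (err - reg_intercept * b_user[u])
            b_note[n] += lr * (err - reg_intercept * b_note[n])
            f_user_old = f_user[u].copy()
            f_user[u] += lr * (err * f_note[n] - reg_factor * f_user[u])
            f_note[n] += lr * (err * f_user_old - reg_factor * f_note[n])

    return b_note

# Toy usage: note 0 is rated helpful by everyone, note 1 only by one "camp".
ratings = [
    (0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1),
    (0, 1, 1), (1, 1, 1), (2, 1, 0), (3, 1, 0),
]
print(fit_note_scores(ratings, n_users=4, n_notes=2))  # note 0 scores higher
```

Roughly, the design intent is that agreement explained by a shared "viewpoint" dimension doesn't inflate a note's intercept; only broad, cross-faction agreement does.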

(It’s hard to price 4-book sets at this scale of printing at a price that makes sense)

The sales are at cost and don’t make money on net.

5Raemon2d
(It’s hard to price 4-book sets at this scale of printing at a price that makes sense)

The hardest part here is ensuring that whoever we hire can actually work self-directedly, without constant management. We've spent 3 years trying to make books efficiently and not succeeded yet, which I think is making us more risk-averse to trying again (although I do have some ideas on how to do it)

I think if someone who had previously made a particularly great HPMOR, SlatestarCodex or Sequences custom book, has good project-management skills, overall good aesthetic taste, and is proficient with both AI art and reworking essay diagrams that were low res to be printable resolution...

...I'd at least personally be pretty interested in hiring that person if they seemed to clearly demonstrate all the skills.

Oh yeah I'm pretty easily sold on "Actually it's just more like $200k" for the reasons you cite, although it gets into more intangibles that are harder to quantify. ($200k seems more likely to be "our Cheerful Price", but I suspect if we got a $40k donation we'd consider it more strongly anyway, in part because it was an indication someone thought it was that valuable)

2Adam Jermyn2d
I'm guessing that the sales numbers aren't high enough to make $200k if sold at plausible markups?

Note you can use the tag-filters to filter out AI or otherwise adjust the topics in your Latest feed.

It’s historically been a couple months of salary time + a bunch of intermittent work over the course of the year. I think it’s at least $20k and plausibly like $40k. Plus the actual team time not being able to be spent on other things. (The books get sold at cost so this money is a cost to the org)

We tried hiring a bookmaker last year which didn’t work out. The hiring process was also pretty costly.

I think the actual cost is more like ‘do the headhunting to find someone who’d do a great job’.

8Raemon3d
The hardest part here is ensuring that whoever we hire can actually work self-directedly, without constant management. We've spent 3 years trying to make books efficiently and not succeeded yet, which I think is making us more risk-averse to trying again (although I do have some ideas on how to do it) I think if someone who had previously made a particularly great HPMOR, SlatestarCodex or Sequences custom book, has good project-management skills, overall good aesthetic taste, and is proficient with both AI art and reworking essay diagrams that were low res to be printable resolution... ...I'd at least personally be pretty interested in hiring that person if they seemed to clearly demonstrate all the skills.
8Ben Pace3d
Briefly registering disagreement: my first thought was an order of magnitude higher than yours.  Brief sketch of my reasoning: Losing a staff member for 1-2 months really cuts out our ability to maintain the infrastructure we have responsibility for (like Lighthaven and Lightspeed grants and LW) while running at the organizational top priority — right now that's dialogues — and we're already stretched thin with only 2 people working on the top-priority full-time who don't have any side commitments (plus 2 other people working on it as their main focus but with side commitments). I've not got a definite sense of how we'd rearrange, but I can see worlds where it would cut our focus on the top priority by as much as 30% during that period, and that's not just the cost measured in the staff member's time, but reduces the value of everyone's time in a big way.

This seems useful to be flagged as a review, so it shows up in some review UI later. Mind if I convert it?

(You can create reviews by clicking the Review button at the top of the post)

Nope. Dunno what happened.

I think the practice that'd probably make the most sense to me is just reporting the average for each thing, without making much of a claim about what it meant. 

That does update me a bit. 

(fyi, it looks like the overall outcome here is pretty good, i.e. 46% of scholars getting a 9 or 10 seems significant. But, the framing of the overview-section at the beginning feels like it's trying to oversell me on something)

1Ryan Kidd6d
Do you think "46% of scholar projects were rated 9/10 or higher" is better? What about "scholar projects were rated 8.1/10 on average" ?

I hadn't thought about the specific use-case of scholar support allowing people to get help with weaknesses without having to trust that evaluators would consider those weaknesses fairly. I found that an interesting new gear.

(I think I had had some version of the "air-gapping evaluation from information gathering" concept, but I hadn't read your previous post on it, nor thought about applying it in this particular context)

I think an ideal world somehow makes it true, and credibly communicates that it's true, that evaluators can be trusted to have this so... (read more)

"Reverse MATS"?

(I think I agree that "co-MATS" is in some sense a more accurate description of what's going on, but Reverse MATS feels like it gets the idea across better at first glance)

1mattmacdermott6d
Oops, thanks, I’ve changed it to Reverse MATS to avoid confusion.

Mentors rated 18% of scholar research projects as 10/10 and 28% as 9/10.

That does sound like pretty good actual numbers for 9 and 10, although I'm confused about how it maps onto the graph:

1Ryan Kidd6d
Yeah, I just realized the graph is wrong; it seems like the 10/10 scores were truncated. We'll upload a new graph shortly.
  • 10/10 = Very disappointed if [the research] didn't continue;
  • 5/10 = On the fence, unsure what the right call is;
  • 1/10 = Fine if research doesn't continue.

fwiw that's actually not that cruxy for me – questions like this are typically framed as if a 5 is "average", but my understanding/experience is that people still tend to give somewhat inflated scores. 

(i.e. the NPS score, "on a scale of 0-10 how likely are you to recommend this to a friend?" ranking system counts 9 and 10 as positive, 7 and 8 as neutral, and 6-and-below as negative. This is a differen... (read more)
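(For readers unfamiliar with NPS, here is a minimal sketch of the standard calculation. The example distribution is hypothetical — the 7/8 split is made up — chosen only to be consistent with the 46%-nines-and-tens and NPS-38 figures quoted in this thread, assuming both refer to the same set of ratings.)

```python
def net_promoter_score(ratings):
    """Standard NPS: percent promoters (9-10) minus percent detractors (0-6).

    `ratings` is a list of integer scores on the 0-10 scale; passives
    (7-8) count toward the denominator but toward neither bucket.
    """
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical distribution: 18% tens, 28% nines, and ~8% at 6-or-below is
# what the standard formula would need in order to land on an NPS of 38.
example = [10] * 18 + [9] * 28 + [8] * 30 + [7] * 16 + [6] * 8
print(net_promoter_score(example))  # 38.0
```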

5Neel Nanda6d
For what it's worth, as a MATS mentor, I gave a bunch of 7s and 8s for people I'm excited about, and felt bad giving people 9s or 10s unless it was super obviously justified
3Ryan Kidd6d
FYI, the Net Promoter score is 38%.
2Raemon6d
(fyi, it looks like the overall outcome here is pretty good, i.e. 46% of scholars getting a 9 or 10 seems significant. But, the framing of the overview-section at the beginning feels like it's trying to oversell me on something)
2Raemon6d
That does sound like pretty good actual numbers for 9 and 10, although I'm confused about how it maps onto the graph:

Yes, but I’m drawing a line from ‘MIRI dialogues’ through Death With Dignity and modeling Eliezer generally and I think the line just points roughly at the Time piece without FTX.

Eliezer started talking about high P(doom) around the Palmcone, which I think was more like peak FTX hype. And it seemed like his subsequent comms were part of a trend that began with the MIRI dialogues the year before. I’d bet against the FTX collapse being that causal, at least for him.

2habryka7d
I don't think Death with Dignity or the List O' Doom posts are at all FTX-collapse related. I am talking about things like the Time piece.

It won’t affect your own karma. I’m not sure offhand about coauthor.

Neato, this was a clever use of LLMs.

I don't think so

My current best guess of what's going on is a mix of:

  • It's actually fairly cognitively demanding for me to play at my peak. When I do beat level 3 with full health, I typically feel like my brain just overclocked itself. So I think during normal play, I start out playing "medium hard", and if I notice that I'm losing I start burning more cylinders or something. And if I start off playing quite hard, I get kinda tired by level 3.
  • But also, there's a survivorship bias of "sometimes if I've taken damage I just give up and start over", which may mean I'm forming an incorrect impression of how well I'd have done.

Maybe to elaborate: I had a lot of neurotypical friends, and a lot of autistic friends, and barely any of them have ever called me up years later to talk if we didn’t have some kind of social context. It seems like this is not a thing people do very often.

2lc24d
I've had two people do this to me. It didn't register to me what they were doing at the time. But I also have a kind of different friend group than most rationalists.

I think it’s not just an autism thing but something of an atomic modernity thing.

Maybe to elaborate: I had a lot of neurotypical friends, and a lot of autistic friends, and barely any of them have ever called me up years later to talk if we didn’t have some kind of social context. It seems like this is not a thing people do very often.

3Shadowslacker24d
Thank you so much, that was driving me up the wall. Have a great day!

You're the one who asked "why did Screwtape invent his own terminology", but I don't know what words you think there was an existing terminology for. From my perspective you're the one who didn't include terms.

-7M. Y. Zuo1mo

I don’t know which terms you didn’t understand and which terms you’re advocating replacing them with.

-24M. Y. Zuo1mo

I think this part of HPMOR predates CFAR?

A claim I've heard habryka make before (I don't know myself) is that there are actual rules to the kind of vague-deception that goes on in DC. And something like, while it's a known thing that a politician will say "we're doing policy X" when they don't end up doing policy X, if you misrepresent who you're affiliated with, this is an actual norm violation. (i.e. it's lying about the Simulacrum 3 level, which is the primary level in DC)

I think I liked the first half of this article a lot, and thought the second half didn't quite flesh it out with clear enough examples IMO. I like that it spells out the problem well though.

One note:

  • I don't trust an arbitrary uploaded person (even an arbitrary LessWrong reader) to be "wise enough" to actually handle the situation correctly. I do think there are particular people who might do a good enough job.
1Johannes C. Mayer1mo
Thank you for the feedback. That's useful. I agree that you need to be very careful about who you upload. There are less than 10 people I would be really confident in uploading. That point must have been so obvious in my own mind that I forgot to mention it. Depending on the setup I think an additional important property is how resistant the uploaded person is, to going insane. Not because the scan wasn't perfect, or the emulation engine is buggy, but because you would be very lonely (assuming you only upload one person and don't immediately clone yourself) if you run that much faster. And you need to handle some weird stuff about personal identity that comes up naturally, through cloning, simple self-modifications, your program being preempted by another process, changing your running speed, etc.

Melting all the GPUs and then shutting down doesn't actually count, I think (and I don't think it was intended to be the original example). Then people would just build more GPUs. It's an important part of the problem that the system continues to melt all GPUs (at least until some better situation is achieved), and that the part where the world is like "hey, holy hell, I was using those GPUs" and tries to stop the system, is somehow resolved (either by having world governments bought into the solution, or having the system be very resistant to being stopped).

(Notably, you do eventually need to be able to stop the system somehow when you do know how to build aligned AIs so you don't lose most of the value of the future)

5Algon1mo
Yeah, good point.

Oh lol I also just now got the pun.

fwiw, while the end of Ants and Grasshopper was really impactful to me, I did feel like the first half was "worth the price of admission". (Though yeah, this selkie story didn't accomplish that for me). I can imagine an alt ending to the grasshopper one that focused on "okay, but, like, literally today right now, what do I do with all these people who want resources from me that I can't afford to give?".

lol at the spellchecker choking on "Rumpelstiltskin" and not offering any alternate suggestions.

2Richard_Ngo1mo
(I think you're thinking of Spinning Silver not Uprooted btw.)

Yeah as I was writing it I realized "eh, okay it's not exactly AI, it's... transhumanism broadly?" but then I wasn't actually sure what cluster I was referring to and figured AI was still a reasonable pointer.

I also did concretely wonder "man, how is he going to pack an emotional punch sticking to this agency/decision-theory theme?". So, lol at that.

An idea fragment that just came to me is to showcase how the decision-theory applies to a lot of different situations, some of which are transhuman, but not in an escalating way, such that it feels like the who

... (read more)
2Raemon1mo
lol at the spellchecker choking on "Rumpelstiltskin" and not offering any alternate suggestions.

Spoiler response:

Man I started reading and was like "Wait, is this one still a metaphor for AI, or is it just actually about Selkies?". Halfway in, I was like "oh cool, this is kind of about different ways of conceptualizing agency/decision-theory-ish-stuff, which is AI-adjacent while also kind of its own topic, I like this variety while still sticking to some kind of overarching theme of 'parables about AI-adjacent philosophy'."

Then I got 2/3rds in and was like "oh lol, it just totally is about AI again." I do think the topic and story here were good/important things that I could use help thinking through, although part of me is sad it didn't somehow go in a different direction.

5Richard_Ngo1mo
Oh, interesting. Glad to hear your take on it. Although personally, I don't actually think of it as being about

I mean there’s also like ‘regular ol’ (possibly subtle) dystopia?’ Like, it might also be a weirdtopia but it doesn’t seem necessary in the above description. (I interpret weirdtopia to mean ‘actually good, overall, but in a way that feels horrifying or strange’. If the replacements for friendship etc aren’t actually good, it might just be bad)

2Mitchell_Porter1mo
This could be a reason for me not to call it a "w-risk". But this also highlights the slippery nature of some of the boundaries here.  My central idea of a w-risk and a weirdtopia, is that it's a world where the beings in it are happy, because it's being optimized/governed according to their values - but those values are not ours, and yet those beings are us, and/or our descendants, after being changed by some process to which we would not have consented beforehand, if we understood its nature.  On the other hand, your definition of weirdtopia could also include futures in which our present values are being satisfied, "but in a way that feels horrifying or strange" if it's described to us in the present. So it might belong to my fourth category - all risks successfully avoided - and yet we-in-the-present would reject it, at least at first. 

Curated. 

It's still unclear to me how well interpretability can scale and solve the core problems in superintelligence alignment, but this felt like a good/healthy incremental advance. I appreciated the exploration of feature splitting, beginnings of testing for universality, and discussion of the team's update against architectural approaches. I found this remark at the end interesting:

Finally, we note that in some of these expanded theories of superposition, finding the "correct number of features" may not be well-posed. In others, there is a true n

... (read more)

I think the difference between surprise and confusion is that surprise is when something-with-low-probability happens, and confusion is when something happens that my model can't explain. They sometimes (often) overlap, but not always (e.g. if lightning strikes, I'm surprised because that doesn't usually happen, but I'm not confused)

look around the room and attempt to produce three instances of something resembling tiny quiet confusion (or louder than that if it's available)

I hadn't done this particular exercise. I just tried it now and had some little microconfusio... (read more)

it's really interesting and valuable to see my thoughts contextualized from the outside and narrativized. It's usually hard for me to see forests when I'm surrounded by trees.

Curious if there are more bits about which tree/forest shifts stood out, or what felt valuable. No pressure if that feels weird.

I'm really curious how you relate to this claim six years later.
 

Well, right before you asked this question I think I'd have said "still seems fairly true to me." (I meant the claim to mean "in the wild, where you're in the middle of a bunch of other stuff." Having cultivated it a moderate amount, I think I notice it maybe a couple times a week? I think while I'm, say, doing a Thinking Physics problem, or actively trying to think through a real problem, there's more opportunities, but it's a different style of thing than I thought I meant at the time.)

But, now that you've asked the question I'm all second-guessing myself. :P

2LoganStrohl1mo
if you wanna second-guess yourself even harder,  1) look around the room and attempt to produce three instances of something resembling tiny quiet confusion (or louder than that if it's available) 2) try to precisely describe the difference between surprise and confusion 3) sketch a taxonomy of confusing experiences and then ask yourself what you might be missing

(I’ll try to followup this weekend, if I fail feel free to ping me again)
