I certainly buy that as an argument, but I don't know that it's obviously worth prioritizing before checking whether anyone actively cared about it positively. Lots of posts are bad; you can't cover all of them.
Posts also need at least 2 positive votes to get into the Review Phase, so you can wait to see if it seems overrated before putting the effort into a negative review. (Although if you think it was already overhyped and want to correct the record anyway, that sounds fine too.)
Oh, this actually feels related to my relational-stance take on this. When I decide to trust a friend, colleague or romantic partner, I'm giving them the power to hurt me in some way (possibly psychologically). There are practical versions of this, but part of it is something like "we are choosing to be the sort of people who are going to share secrets and vulnerabilities with each other."
Fwiw I just think it's fine to get karma for sharing linkposts – you're doing a valuable service if you're sharing useful information. I don't know of other forums that draw a distinction between linkposts and regular posts in terms of where they show up.
It makes sense that it feels a bit weird, but given our limited dev time I think I'd mostly recommend feeling more free to do linkposts as they currently are (marking down who the author is in the title, so people can see what's going on)
This feels kinda straw-vulcany, sort of missing the point about what people are often using trust for.
I'm not actually sure what trust is, but when I imagine people saying the sentences at the beginning, at least 35% and maybe 75% of what's going on is more about managing a relational stance, i.e. something like "do you respect me?".
I do expect you'll follow up with "yeah, I am also just not down to respect people the particular way they want to be respected."
So a major part of how I handle this sort of thing is usually conveying somehow "I don't...
The hardest part here is ensuring that whoever we hire can actually work self-directedly, without constant management. We've spent 3 years trying to make books efficiently and not succeeded yet, which I think is making us more risk-averse to trying again (although I do have some ideas on how to do it)
I think if someone who had previously made a particularly great HPMOR, SlatestarCodex or Sequences custom book, has good project-management skills, overall good aesthetic taste, and is proficient with both AI art and reworking low-res essay diagrams into printable resolution...
...I'd at least personally be pretty interested in hiring that person if they seemed to clearly demonstrate all the skills.
Oh yeah I'm pretty easily sold on "Actually it's just more like $200k" for the reasons you cite, although it gets into more intangibles that are harder to quantify. ($200k seems more likely to be "our Cheerful Price", but I suspect if we got a $40k donation we'd consider it more strongly anyway, in part because it was an indication someone thought it was that valuable.)
Note you can use the tag-filters to filter out AI or otherwise adjust the topics in your Latest feed.
It’s historically been a couple months of salary time + a bunch of intermittent work over the course of the year. I think it’s at least $20k and plausibly like $40k. Plus the actual team time not being able to be spent on other things. (The books get sold at cost so this money is a cost to the org)
We tried hiring a bookmaker last year which didn’t work out. The hiring process was also pretty costly.
I think the actual cost is more like ‘do the headhunting to find someone who’d do a great job’.
This seems useful to be flagged as a review, so it shows up in some review UI later. Mind if I convert it?
(You can create reviews by clicking the Review button at the top of the post)
I think the practice that'd probably make the most sense to me is just reporting the average for each thing, without making much of a claim about what it meant.
(fyi, it looks like the overall outcome here is pretty good, i.e. 46% of scholars getting a 9 or 10 seems significant. But, the framing of the overview-section at the beginning feels like it's trying to oversell me on something)
I hadn't thought about the specific use-case of scholar support allowing people to get help with weaknesses without having to trust that evaluators would consider those weaknesses fairly. I found that an interesting new gear.
(I think I had had some version of the "air-gapping evaluation from information gathering" concept, but I hadn't read your previous post on it, nor thought about applying it in this particular context)
I think an ideal world somehow makes it true, and credibly communicates that it's true, that evaluators can be trusted to have this so...
"Reverse MATS"?
(I think I agree that "co-MATS" is in some sense a more accurate description of what's going on, but Reverse MATS feels like it gets the idea across better at first glance)
Mentors rated 18% of scholar research projects as 10/10 and 28% as 9/10.
Those do sound like pretty good actual numbers for 9 and 10, although I'm confused about how it maps onto the graph:
- 10/10 = Very disappointed if [the research] didn't continue;
- 5/10 = On the fence, unsure what the right call is;
- 1/10 = Fine if research doesn't continue.
fwiw that's actually not that cruxy for me – questions like this are typically framed as if a 5 is "average", but my understanding/experience is that people still tend to give somewhat inflated scores.
(i.e. the NPS scoring system, which asks "on a scale of 1-10, how likely are you to recommend this to a friend?", counts 9 and 10 as positive, 7 and 8 as neutral, and 6-and-below as negative. This is a differen...
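(As an aside, here's a minimal illustrative sketch, with made-up ratings, of how that NPS-style bucketing works; the thresholds are the ones described above, and the net score is the standard "percent promoters minus percent detractors":)

```python
# Illustrative sketch only (not from the original comment): classify 1-10
# ratings into NPS-style buckets and compute a net score.
# Thresholds as described above: 9-10 positive ("promoters"),
# 7-8 neutral ("passives"), 6 and below negative ("detractors").

def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Made-up example: the 7s and 8s sound "pretty good" but contribute
# nothing positive to the net score.
print(nps([10, 9, 8, 8, 7, 7, 6, 5]))  # -> 0.0
```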
Yes, but I’m drawing a line from ‘MIRI dialogues’ through Death With Dignity and modeling Eliezer generally and I think the line just points roughly at the Time piece without FTX.
Eliezer started talking about high P(doom) around the Palmcone which I think was more like peak FTX hype. And it seemed like his subsequent comms were part of a trend that began with the MIRI dialogues the year before. I’d bet against FTX collapse being that causal at least for him.
I don't think so.
My current best guess of what's going on is a mix of:
Maybe to elaborate: I had a lot of neurotypical friends, and a lot of autistic friends, and barely any of them have ever called me up years later to talk if we didn’t have some kind of social context. It seems like this is not a thing people do very often.
You're the one who asked "why did Screwtape invent his own terminology", but I don't know what words you think there was an existing terminology for. From my perspective you're the one who didn't include terms.
I don't know which terms you didn't understand, or which terms you're advocating I replace them with.
A claim I've heard habryka make before (I don't know myself) is that there are actual rules to the kind of vague-deception that goes on in DC. And something like, while it's a known thing that a politician will say "we're doing policy X" when they don't end up doing policy X, if you misrepresent who you're affiliated with, this is an actual norm violation. (i.e. it's lying about the Simulacrum 3 level, which is the primary level in DC)
I liked the first half of this article a lot, but thought the second half didn't quite flesh it out with clear enough examples. I do like that it spells out the problem well, though.
One note:
Melting all the GPUs and then shutting down doesn't actually count, I think (and I don't think it was intended to be the original example). Then people would just build more GPUs. It's an important part of the problem that the system continues to melt all GPUs (at least until some better situation is achieved), and that the part where the world goes "hey, holy hell, I was using those GPUs" and tries to stop the system is somehow resolved (either by having world governments bought into the solution, or by having the system be very resistant to being stopped).
(Notably, you do eventually need to be able to stop the system somehow when you do know how to build aligned AIs, so you don't lose most of the value of the future)
fwiw, while the end of Ants and Grasshopper was really impactful to me, I did feel like the first half was "worth the price of admission". (Though yeah, this selkie story didn't accomplish that for me). I can imagine an alt ending to the grasshopper one that focused on "okay, but, like, literally today right now, what do I do with all these people who want resources from me that I can't afford to give?".
lol at the spellchecker choking on "Rumpelstiltskin" and not offering any alternate suggestions.
Yeah as I was writing it I realized "eh, okay it's not exactly AI, it's... transhumanism broadly?" but then I wasn't actually sure what cluster I was referring to and figured AI was still a reasonable pointer.
I also did concretely wonder "man, how is he going to pack an emotional punch sticking to this agency/decision-theory theme?". So, lol at that.
An idea fragment that just came to me is to showcase how the decision-theory applies to a lot of different situations, some of which are transhuman, but not in an escalating way, such that it feels like the who
Spoiler response:
Man, I started reading and was like "Wait, is this one still a metaphor for AI, or is it just actually about Selkies?". Halfway in, I was like "oh cool, this is kind of about different ways of conceptualizing agency/decision-theory-ish-stuff, which is AI-adjacent while also kind of its own topic. I like this variety while still sticking to some kind of overarching theme of 'parables about AI-adjacent philosophy'."
Then I got 2/3rds in and was like "oh lol, it just totally is about AI again." I do think the topic and story here were good/important things that I could use help thinking through, although part of me is sad it didn't somehow go in a different direction.
I mean there’s also like ‘regular ol’ (possibly subtle) dystopia?’ Like, it might also be a weirdtopia but it doesn’t seem necessary in the above description. (I interpret weirdtopia to mean ‘actually good, overall, but in a way that feels horrifying or strange’. If the replacements for friendship etc aren’t actually good, it might just be bad)
Curated.
It's still unclear to me how well interpretability can scale and solve the core problems in superintelligence alignment, but this felt like a good/healthy incremental advance. I appreciated the exploration of feature splitting, beginnings of testing for universality, and discussion of the team's update against architectural approaches. I found this remark at the end interesting:
...Finally, we note that in some of these expanded theories of superposition, finding the "correct number of features" may not be well-posed. In others, there is a true n
This reminds me of my pet crusade that Abstracts should be either Actually Short™, or broken into paragraphs
I think the difference between surprise and confusion is that surprise is when something-with-low-probability happens, and confusion is when something happens that my model can't explain. They sometimes (often) overlap (i.e. if lightning strikes, I'm surprised because that doesn't usually happen, but I'm not confused)
look around the room and attempt to produce three instances of something resembling tiny quiet confusion (or louder than that if it's available)
I hadn't done this particular exercise. I just tried it now and had some little microconfusio...
it's really interesting and valuable to see my thoughts contextualized from the outside and narrativized. It's usually hard for me to see forests when I'm surrounded by trees.
Curious if you have more detail on which tree/forest shifts stood out, or what felt valuable. No pressure if that feels weird.
I'm really curious how you relate to this claim six years later.
Well, right before you asked this question I think I'd have said "still seems fairly true to me." (I meant the claim to mean "in the wild, where you're in the middle of a bunch of other stuff." Having cultivated it a moderate amount, I think I notice it maybe a couple times a week? I think while I'm, say, doing a Thinking Physics problem, or actively trying to think through a real problem, there are more opportunities, but it's a different style of thing than I thought I meant at the time.)
But, now that you've asked the question I'm all second-guessing myself. :P
(Flagging it’s still technically required to get 2 positive votes to proceed to review phase)