Most of my posts and comments are about AI and alignment. Posts I'm most proud of, which also provide a good introduction to my worldview:
I also created Forum Karma, and wrote a longer self-introduction here.
PMs and private feedback are always welcome.
NOTE: I am not Max Harms, author of Crystal Society. I'd prefer for now that my LW postings not be attached to my full name when people Google me for other reasons, but you can PM me here or on Discord (m4xed) if you want to know who I am.
if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it'd be a pretty close call (I'd probably pick Claude, but it depends on the details of the setup).
This example seems like it is kind of missing the point of CEV in the first place? If you're at the point where you can actually pick the CEV of some person or AI, you've already solved most or all of your hard problems.
Setting aside that picking a particular entity is already getting away from the original formulation of CEV somewhat, the main reason I see to pick a human over Opus is that a median human very likely has morally-relevant-to-other-humans qualia, in ways that current AIs may not.
I realize this is maybe somewhat tangential to the rest of the post, but I think this sort of disagreement is central to a lot of (IMO misplaced) optimism based on observations of current AIs, and implies an unjustifiably high level of confidence in a theory of mind of AIs, by putting that theory on par with the level of confidence you can justifiably have in a theory of mind for humans. Elaborating / speculating a bit:
My guess is that you lean towards Opus based on a combination of (a) chatting with it for a while and seeing that it says nice things about humans, animals, AIs, etc. in a way that respects those things' preferences and shows a generalized caring about sentience and (b) running some experiments on its internals to see that these preferences are deep or robust in some way, under various kinds of perturbations.
But I think what models (or a median / randomly chosen human) say about these things is actually one of the less important considerations. I am not as pessimistic as, say, Wei Dai about how bad humans currently are at philosophy, but neither the median human nor any AI model that I have seen so far can talk sensibly about the philosophy of consciousness, morality, alignment, etc. nor even really come close. So on my view, outputs (both words and actions) of both current AIs and average humans on these topics are less relevant (for CEV purposes) than the underlying generators of those thoughts and actions.
In humans, we have a combination of (a) knowing a lot about evolution and neuroscience and (b) being humans ourselves. Taken together, these two things bridge the gap of a lot of missing or contentious philosophical knowledge - we don't have to know exactly what qualia are to be pretty confident that other humans have them, via introspection + knowing that the generators are (mechanically) very similar. Also, we know that the generators of goodness and sentience in humans generalize well enough, at least from the median up to the top 0.1% of humans - for the same reasons (a) and (b) above, we can be pretty confident that the smartest and most good among us feel love, pain, sorrow, etc. in roughly similar ways to everyone else, and that being multiple standard deviations above the human mean in smartness and / or goodness (usually) doesn't cause a person to do crazy / harmful things. I don't think we have similarly strong evidence about how AIs generalize even up to that point (let alone beyond).
Not sure where / if you disagree with any of this, but either way, the point is that I think that "I would pick Opus over a human" for anything CEV-adjacent implies a lot more confidence in a philosophy of both human and AI minds than is warranted.
In the spirit of making empirical / falsifiable predictions, a thing that would change my view on this is if AI researchers (or AIs themselves) started producing better philosophical insights about consciousness, metaethics, etc. than the best humans did in 2008, where these insights are grounded by their applicability to and experimental predictions about humans and human consciousness (rather than being self-referential / potentially circular insights about AIs themselves). I don't think Eliezer got everything right about philosophy, morality, consciousness, etc. 15y ago, but I haven't seen much in the way of public writing or discourse that has improved on things since then, and in many ways the quality of discourse has gotten worse. I think it would be a positive sign (but don't expect to see it) if AIs were to change that.
- MIRI was rolling their own metaethics (deploying novel or controversial philosophy) which is not a good idea even if alignment turned out to be not that hard in a technical sense.
What specifically is this referring to? The Mere Goodness sequences?
I read your recent post about not rolling your own metaethics as addressed mostly to current AGI or safety researchers who are trying to build or align AIs today. I had thought what you were saying was that those researchers would be better served by stopping what they are doing with AI research, and instead spending their time carefully studying / thinking about / debating / writing about philosophy and metaethics. If someone asked me, I would point to Eliezer's metaethics sequences (and some of your posts and comments, among others) as a good place to start with that.
I don't think Eliezer got everything right about philosophy, morality, decision theory, etc. in 2008, but I don't know of a better / more accessible foundation, and he (and you) definitely got some important and basic ideas right, which are worth accepting and building on (as opposed to endlessly rehashing or recursively going meta on).
Is your view that it was a mistake to even try writing about metaethics while also doing technical alignment research in 2008? Or that the specific way Eliezer wrote those particular sequences is so bad / mistaken / overconfident that it's a central example of what you want to caution against with "rolling your own metaethics"? Or merely that Eliezer did not "solve" metaethics sufficiently well, and therefore he (and others) were mistaken to move ahead and / or turn their attention elsewhere? (Either way / regardless, I still don't really know what you are concretely recommending people do instead, even after reading this thread.)
OK yeah, retatrutide is good. (previous / related: The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work, 30 Days of Retatrutide, How To Get Cheap Ozempic. Usual disclaimers, YMMV and this is not medical advice or a recommendation.)
I am not quite overweight enough to be officially eligible for a prescription for tirzepatide or semaglutide, and I wasn't all that interested in them anyway given their (side) effects and mechanism of reducing metabolism.
I started experimenting with a low dose (1-2 mg / week) of grey-market retatrutide about a month ago, after seeing the clinical trial results and all the anecdata about how good it is. For me the metabolic effects were immediate: I get less hungry, feel fuller for longer after eating, and generally have more energy. I am also losing weight effortlessly (a bit less than 1 lb / week, after initially losing some water weight faster at the beginning), which was my original main motivation for trying it. I am hoping to lose another 10-15 lbs or so and then reduce or maintain whatever dose I need to stay at that weight.
The only negative side effects I have experienced so far are a slight increase in RHR (mid-high 60s -> low 70s), and a small / temporary patch of red, slightly itchy skin around the injection site. I work out with weights semi-regularly and haven't noticed much impact on strength one way or the other, nor have I noticed an impact on my sleep quality, which was / is generally good.
I also feel a little bad about benefiting from Eli Lilly's intellectual property without paying them for it, but there's no way for them to legally sell it or for me to legally buy it from them right now. Once it is approved by the FDA, I'll probably try to talk my way into an actual prescription, for which I would happily pay $1000 / mo or whatever, for both peace of mind and ethical reasons.
(Grey market suppliers seem mostly fine risk-wise; it's not a particularly complicated molecule to manufacture if you're an industrial pharmaceutical manufacturer, and not that hard for independent labs to do QA testing on samples. The main risk of depending on these suppliers is that customs will crack down on importers / distributors and make it hard to get.)
The other risk is that long term use will have some kind of more serious negative side effect or permanently screw up my previously mostly-normal / healthy metabolism in some way, which won't be definitively knowable until longer-term clinical trials have completed. But the benefits I am getting right now are real and large, and carrying a bit less weight is likely to be good for my all-cause mortality even if there are some unknown long term risks. So all things considered it seems worth the risk for me, and not worth waiting multiple years for more clinical trial data.
Looking into all of this has definitely (further) radicalized me against the FDA + AMA and made me more pro-big pharma. The earliest that retatrutide is likely to be approved for prescription use is late 2026 or 2027, and initially it will likely only be approved / prescribed for use by people who are severely overweight, have other health problems, and / or have already tried other GLP-1s.
This seems like a massive waste of QALYs in expectation; there are likely millions of people with more severe weight and metabolism problems than me, for whom the immediate benefits of taking reta would outweigh most possible long term risks or side effects. And the extremely long time it takes to bring these drugs to market, plus the general insanity of the prescription drug market and of intellectual property rights in various jurisdictions, pushes up the price that Lilly has to charge to recoup the development costs, which will hurt accessibility even once the drug is actually approved.
There was some discussion about heroic responsibility here not too long ago. One thing that some people (incorrectly, IMO) attribute to heroic responsibility is that it justifies deontology violations in order to accomplish whatever goal you have.
My take is that it is more like the opposite: the thing you're describing here is mostly just ordinary high-agency behavior / executive responsibility. Where heroic responsibility comes in is that it says you're supposed to wield that agency and level of execution continuously and at all levels of meta (using comprehensive / non-naive consequentialism) until the job is actually done. It also includes tracking and recognizing when to give up, and making that call - in your example, maybe this means stepping back and realizing that running the kind of ads you were trying to run is not actually effective or not worth the cost in the first place, or that your car dealership is headed for bankruptcy regardless of what happens with the ads. Furthermore, taking heroic responsibility means that you're obligated to do all this without stepping outside the bounds of deontology or slipping into invalid / motivated reasoning.
I don't have a good sense of what the CCP's ASI policy looks like whether they get B30As or not. But, just looking at how they handled COVID and similar things, one thing that does seem likely either way is that their response will be much more top-down / consistent / coordinated, relative to the US. "Consistent" doesn't necessarily mean sane or good, of course.
So, I am sympathetic to the argument that the CCP could not possibly be worse than the US, but (a) things can always get worse, and (b) selling the chips or otherwise relaxing export controls is an action that is hard to undo, and plausibly the effect is to hand a superpower that is actually capable of strong coordination around building ASI the resources to do so.
Like, one way of looking at things is that the US is mostly not coordinating on AI right now in a real way, and any sane / actually-useful policy responses are far outside the Overton window. Whereas, for better or worse, CCP leadership wouldn't blink or think twice about taking big actions like "strict tracking / monitoring / limits of all SoTA chips used for training runs" or "massive national mobilization to race towards ASI" if they became convinced that either of those things were in the national interest. And once they do pick a direction like that, they seem much more likely to commit and go hard in that direction than the U.S.
I agree / believe you that it's common for Republican staffers to have refrained from ever donating to a Democratic cause, and that this is often more of a strategic decision than a completely uniform / unwavering opposition to every Democrat everywhere.
I still think that the precise kind of optics considerations described and recommended in this post (and other EA-ish circles) are subtly but importantly different from what those staffers are doing. And that this difference is viscerally perceptible to some "red tribe"-coded people, but something of a blind spot for traditionally blue-tribe coded people, including many EAs.
I'm not really making any strong claims about what the distribution / level of caring about all this is likely to be among people with hiring authority in a red tribe administration. Hanania was probably a bad example for me to pick for that kind of question, but I do think he is an exemplar of some aspects of "red tribe" culture that are at a zenith right now, and understanding that is important if you actually want to have a realistic chance at succeeding in a high-profile / appointee position in a red tribe administration. But none of this is really in tension with also just not donating to democrats if that's your aspiration, so I'm not really strongly dis-recommending the advice in this post or anything.
Another way of putting things: I suspect that "refrained from donating to a democrat I would have otherwise supported because I read a LW / EAF post about optics" is anti-correlated with a person's chances of actually working in a Republican administration in a high-profile capacity. But I'm not particularly confident that that's actually true in real life [edit: and not confident that the effect is causal rather than evidential], and especially not confident that the effect is large vs. the first order effect of just quietly taking the advice in the post. I am more confident that being blind to the red-tribe cultural things I gestured at is going to be pretty strongly anti-correlated, though.
Is the idea that Hanania is evidence that being very public about your contrarian opinions is helpful for policy influence?
No. I'm more saying that the act of carefully weighing up career capital / PR considerations, and then not donating to a democrat based on a cost-benefit analysis of those considerations, feels to me like very stereotypical democrat / blue-tribe behavior.
And further, that some people could have a visceral negative reaction to that kind of PR sensitivity more so than the donations themselves. The Hanania post is an example of the flavor of that kind of negative reaction (though it's not exactly the same thing, I admit).
Separately, I'm not advising people to follow in Hanania's footsteps in terms of deliberately being contrarian and courting controversy, but he is a good example of "not caring about PR / self-censoring at all" and still doing well.
I would rather guess that this pivot has been really costly to his influence on the right, and if he had self-censored, he'd be more influential.
Sure, but if he were the kind of person who would do that, he probably would not have gotten as popular as he is in the first place.
I appreciate this analysis, especially as someone who is considering donating, falls into the target audience in some ways, and is reading it at an opportune / time-sensitive moment.
That said, my gut reaction is that reading this analysis and then holding off on donating to a candidate you like because of these considerations feels... kinda democrat-coded, in a negative way.
It reminded me of this post by Richard Hanania. Of course, Hanania himself is a pretty controversial figure, and could probably not get an appointment in an administration of any political stripe at this point. But he has an influence and reach on the right that is the envy of many, and which has translated to direct impact on policy. Many of his takes are also well-regarded by more left-leaning / centrist public intellectuals and writers (though probably not so much among mainstream elected democrats), especially lately since he has become more anti-Trump.
Anyway, donating to a political candidate is much more tame / low-stakes than anything Hanania posts on Twitter or Substack. So, if you're interested in politics or policy work (even in a narrow / relatively non-partisan way) and are impressed by what Hanania has accomplished, consider reversing the advice in this post - make whatever donations you want, lean into any controversy / trouble it brings, and don't be afraid to wear and defend your honestly-held views because of PR / career considerations.
Or, turning it around: if you find that one day you're an elected official (or staffer / advisor in the PPO) tasked with screening and vetting potential political appointees or otherwise making these kinds of hiring decisions, consider whether taking someone's past political donations into account is giving in to a culture of lameness and cowardice and femininity, at least in the eyes of Richard Hanania and his fans.
[edit: Not sure if it's the source of the downvotes / soldier mindset react, but to clarify, the last paragraph is the advice I would give to a Trump staffer or hypothetical Vance staffer in the PPO who is considering whether to filter out someone for a political appointment because of past political donations, couched in terms and language (from the Hanania post) that might appeal to them.]
Putting the lessons of the Sequences into practice, reflecting on and mentally rehearsing the core ideas, making them your own and weaving them into your everyday habits of thought and action until they become a part of you - at no point should any of this cause an increase in mental anguish, emotional vulnerability, depression, psychosis, mania etc., even temporarily. The worst-case consequences of absorbing these lessons should be that you regret some of your past life choices or perhaps come to realize that you're stuck in a bad situation that you can't easily change. But rationality should also leave you strictly better-equipped to deal with that situation, if you find yourself in it.
Also, the feeling of successfully becoming more rational should not feel like a sudden, tectonic shift in your mental processes or beliefs (in contrast to actually changing your mind about something concrete, which can sometimes feel like that). Rationality should feel natural and gradual and obvious in retrospect, like it was always a part of you, waiting to be discovered and adopted.
I am using "should" in the paragraphs above both descriptively and normatively. It is partly a factual claim: if you're not better off, you're probably missing something or "doing it wrong", in some concrete, identifiable way. But I am also making a normative / imperative statement that can serve as advice or a self-fulfilling prophecy of sorts - if your experience is different or you disagree, consider whether there's a mental motion you can take to make it true.
I am also not claiming that the Valley of Bad Rationality is entirely fake. But I am saying it's not that big of a deal, and in any case the best way out is through. And also that "through" should feel natural / good / easy.
I am not very interested in meditation or jhanas or taking psychoactive drugs or various other forms of "woo". I believe that the beneficial effects that many people derive from these things are real and good, but I suspect they wouldn't work on me. Not because I don't believe in them, but because I already get basically all the plausible benefits from such things by virtue of being a relatively happy, high-energy, mentally stable person with a healthy, well-organized mind.
Some of these qualities are a lucky consequence of genetics, having a nice childhood, a nice life, being generally smart, etc. But there's definitely a chunk of it that I attribute directly to having read and internalized the Sequences in my early teens, and then applied them to thousands of tiny and sometimes not-so-tiny tribulations of everyday life over the years.
The thoughts above are partially / vaguely in response to this post and its comment section about CFAR workshops, but also to some other hazy ideas that I've seen floating around lately.
I have never been to a CFAR workshop and don't actually have a strong opinion on whether attending one is a good idea or not - if you're considering going, I'd advise you to read the warnings / caveats in the post and comments, and if you feel like (a) they don't apply to you and (b) a CFAR workshop sounds like your thing, it's worth going? You'll probably meet some interesting people, have fun, and learn some useful skills. But I suspect that attending such a workshop is not a necessary or even all that helpful ingredient for actually becoming more rational.
A while ago, Eliezer wrote in the preface for the published version of the Sequences:
It ties in to the first-largest mistake in my writing, which was that I didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory. I didn’t realize that part was the priority; and regarding this I can only say “Oops” and “Duh.”
Yes, sometimes those big issues really are big and really are important; but that doesn’t change the basic truth that to master skills you need to practice them and it’s harder to practice on things that are further away. (Today the Center for Applied Rationality is working on repairing this huge mistake of mine in a more systematic fashion.)
And has also written:
Jeffreyssai inwardly winced at the thought of trying to pick up rationality by watching other people talk about it—
Maybe I am just typical-minding / generalizing from one example here, but in my case, simply reading a bunch of blog posts and quietly reflecting on them on my own did work. In retrospect it feels like the only thing that could have worked, or at least like attending a workshop, practicing a bunch of rationality exercises from a handbook, discussing in a group setting, etc. would not have been particularly effective on its own, and might even have been distracting or counterproductive.
And, regardless of whether the caveats / warnings / dis-recommendations in the CFAR post and comments are worth heeding, I suspect they're pointing at issues that are just not that closely related to (what I think of as) the actual core of learning rationality.
There are plenty of expansions you could make to the "evolutionary subset" (some of them trivial, some of them probably interesting) for which no theorem from complexity theory guarantees that the problem of predicting how any particular instance in the superset folds is intractable.
In general, hardness results from complexity theory say very little about the practical limits on problem-solving ability for AI (or humans, or evolution) in the real world, precisely because the "standard abstraction schemes" do not fully capture interesting aspects of the real-world problem domain, and because the results are mainly about classes and limiting behavior rather than any particular instance we care about.
In many hardness and impossibility results, the "adversarial / worst-case" assumptions are doing nearly all of the work in the proof, but if you're just trying to build some nanobots you don't care about that. Or more prosaically, if you want to steal some cryptocurrency, in real life you use a side-channel or 0-day in the implementation (or a wrench attack); you don't bother trying to factor large numbers.
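(As a toy illustration of how little worst-case hardness constrains particular instances - this is my own sketch with made-up parameters, not something from the linked discussion - subset-sum is NP-hard, yet the textbook dynamic program dispatches any instance with a modest target value almost instantly:)

```python
# Toy sketch (my own, not from the linked results): subset-sum is NP-hard in the
# worst case, but the textbook dynamic program solves any particular instance in
# O(n * target) time. The hardness theorem only bites when the numbers involved
# are astronomically large relative to the input length.

import random


def subset_sum(nums: list[int], target: int) -> bool:
    """Return True iff some subset of `nums` (non-negative ints) sums to `target`."""
    reachable = {0}
    for x in nums:
        # Extend every sum reachable so far by x, discarding anything past the target.
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable


if __name__ == "__main__":
    random.seed(0)
    nums = [random.randint(1, 1_000) for _ in range(200)]
    target = sum(random.sample(nums, 20))  # satisfiable by construction
    print(subset_sum(nums, target))  # True, and it finishes in well under a second
```

The worst-case guarantee is only "activated" by instances whose numbers are exponentially large relative to the input length; nothing about the real-world problems you actually care about forces you into that regime.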
IMO it is correct to mostly ignore these kinds of things when building your intuition about what a superintelligence is likely or not likely to be able to do, once you understand what the theorems actually say. What NP-hardness says, precisely, is: if a problem is NP-hard (and P ≠ NP), then there is no deterministic algorithm anyone (even a superintelligence) can run which accepts arbitrary instances of the problem and finds a solution in a number of time steps polynomial in the size of the problem instance. This statement is precise and formal, but unfortunately it doesn't mention protein folding, and even the implications it has for an idealized formal model of protein folding are of limited use when trying to predict which specific proteins AlphaFold-N will / won't be able to predict correctly.
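(For reference, here is my own paraphrase of the quantifier structure of that statement, which is where the worst-case caveat lives:)

$$
L \text{ NP-hard} \;\wedge\; \mathsf{P} \neq \mathsf{NP} \;\implies\; \neg\,\exists A \;\exists \text{ polynomial } p \;\; \forall x :\; A \text{ correctly solves } x \text{ within } p(|x|) \text{ steps.}
$$

Pushing the negation inward: for every algorithm $A$ and polynomial $p$, there exists *some* instance $x$ on which $A$ needs more than $p(|x|)$ steps. Nothing in the theorem forces that bad instance to be one of the instances anyone actually cares about.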