To branch off the line of thought in this comment, it seems that for most of my adult life I've been living in the bubble-within-a-bubble that is LessWrong, where the aspect of human value or motivation that is the focus of our signaling game is careful/skeptical inquiry, and we gain status by pointing out where others haven't been careful or skeptical enough in their thinking. (To wit, my repeated accusations that Eliezer and the entire academic philosophy community tend to be overconfident in their philosophical reasoning, don't properly appreciate the difficulty of philosophy as an enterprise, etc.)
I'm still extremely grateful to Eliezer for creating this community/bubble, and think that I/we have lucked into the One True Form of Moral Progress, but must acknowledge that from the outside, our game must look as absurd as any other niche status game that has spiraled out of control.
My early posts on LW often consisted of pointing out places in the Sequences where Eliezer wasn't careful enough. Shut Up and Divide? and Boredom vs. Scope Insensitivity come to mind. And of course that's not the only way to gain status here - the big status awards are given for coming up with novel ideas and backing them up with carefully constructed arguments.
Reassessing heroic responsibility, in light of subsequent events.
I think @cousin_it made a good point: "if many people adopt heroic responsibility to their own values, then a handful of people with destructive values might screw up everyone else, because destroying is easier than helping people", and I would generalize it to people with biased beliefs (which are often downstream of a kind of value difference, i.e., selfish genes).
It seems to me that "heroic responsibility" (or something equivalent but not causally downstream of Eliezer's writings) is contributing to the current situation of multiple labs racing for ASI and essentially forcing the AI transition on humanity without consent or political legitimacy, each thinking or saying that they're justified because they're trying to save the world. It also seemingly justified or obligated Sam Altman to fight back when the OpenAI board tried to fire him, if he believed the board was interfering with his mission.
Perhaps "heroic responsibility" makes more sense if overcoming bias was easy, but in a world where it's actually hard and/or few people are actually motivated to do it, which we seem to live in, spreading the idea of "heroic responsibility" seems, well, irresponsible.
My sense is that most of the people with lots of power are not taking heroic responsibility for the world. I think that Amodei and Altman intend to achieve global power and influence but this is not the same as taking global responsibility. I think, especially for Altman, the desire for power comes first relative to responsibility. My (weak) impression is that Hassabis has less will-to-power than the others, and that Musk has historically been much closer to having responsibility be primary.
I don’t really understand this post as doing something other than asking "on the margin, are we happy or sad about present large-scale action?" and then saying that the background culture should correspondingly praise or punish large-scale action. Which is maybe reasonable, but might also be too high-level a gloss. As per the usual idea of rationality, I think whether you are capable of taking large-scale action in a healthy way is true in some worlds and not in others, and you should try to figure out which world you’re in.
The financial incentives around AI development are blatantly insanity-inducing on the topic and anyone should’ve been able to guess that going in, I don’t think this w...
I'm also uncertain about the value of "heroic responsibility", but this downside consideration can be mostly addressed by "don't do things which are highly negative sum from the perspective of some notable group" (or other anti-unilateralist curse type intuitions). Perhaps this is too subtle in practice.
AI labs are starting to build AIs with capabilities that are hard for humans to oversee, such as answering questions based on large contexts (1M+ tokens), but they are still not deploying "scalable oversight" techniques such as IDA and Debate. (Gemini 1.5 report says RLHF was used.) Is this more good news or bad news?
Good: Perhaps RLHF is still working well enough, meaning that the resulting AI follows human preferences even out of the training distribution. In other words, they probably did RLHF on large contexts in narrow distributions, with human raters who have prior knowledge of/familiarity with the whole context, since it would be too expensive to do RLHF with humans reading diverse 1M+ contexts from scratch, but the resulting chatbot works well even outside the training distribution. (Is it actually working well? Can someone with access to Gemini 1.5 Pro please test this?)
Bad: AI developers haven't taken alignment seriously enough to have invested enough in scalable oversight, and/or those techniques are unworkable or too costly, causing them to be unavailable.
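For concreteness, here is a toy sketch (not any lab's actual implementation) of what a Debate-style oversight loop could look like for long-context question answering: two models argue over the answer, and the judge reads only the short transcript rather than the full 1M+ token context. The `query_model` helper is a hypothetical stand-in for an LLM API call.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to some LLM API."""
    raise NotImplementedError

def debate(question: str, context: str, rounds: int = 3) -> str:
    """Two debaters argue opposite answers; a judge sees only the transcript."""
    transcript = []
    for _ in range(rounds):
        for side in ("A", "B"):
            argument = query_model(
                f"You are debater {side}. Question: {question}\n"
                f"Support your answer with short quotes from the context.\n"
                f"Context: {context}\n"
                f"Transcript so far: {transcript}"
            )
            transcript.append((side, argument))
    # The judge (a human or a weaker model) never reads the full context,
    # only the question and the much shorter debate transcript.
    return query_model(
        f"Question: {question}\n"
        f"Debate transcript: {transcript}\n"
        f"Which debater's answer is better supported? Reply 'A' or 'B'."
    )
```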
From a previous comment:
...From my experience doing early RLHF work for Gemini, larger models exploit the reward mode
I'm thinking that the most ethical (morally least risky) way to "insure" against a scenario in which AI takes off and property/wealth still matters is to buy long-dated far out of the money S&P 500 calls. (The longest dated and farthest out of the money seems to be Dec 2029 10000-strike SPX calls. Spending $78 today on one of these gives a return of $10000 if SPX goes to 20000 by Dec 2029, for example.)
My reasoning here is that I don't want to provide capital to AI industries or suppliers because that seems wrong given what I judge to be high x-risk their activities are causing (otherwise I'd directly invest in them), but I also want to have resources in a post-AGI future in case that turns out to be important for realizing my/moral values. Suggestions welcome for better/alternative ways to do this.
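As a quick sanity check of the payoff arithmetic in the first paragraph, here is a minimal sketch. It ignores the SPX contract multiplier, taxes, bid-ask spread, and counterparty risk, and just treats the quoted $78 premium and $10,000 payoff as per-unit figures.

```python
def call_payoff(spot_at_expiry: float, strike: float) -> float:
    """Intrinsic value of a call option at expiry."""
    return max(spot_at_expiry - strike, 0.0)

premium = 78.0      # quoted cost today of one Dec 2029 10000-strike SPX call
strike = 10_000.0

for spx in (5_000, 10_000, 15_000, 20_000):
    payoff = call_payoff(spx, strike)
    print(f"SPX at {spx}: payoff ${payoff:,.0f}, "
          f"multiple on premium {payoff / premium:.0f}x")
# At SPX = 20,000 the payoff is $10,000, roughly 128x the premium;
# at or below the 10,000 strike the calls expire worthless.
```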
I find it curious that none of my ideas have a following in academia or have been reinvented/rediscovered by academia (including the most influential ones so far: UDT, UDASSA, b-money). Not really complaining, as they're already more popular than I had expected (Holden Karnofsky talked extensively about UDASSA on an 80,000 Hours podcast, which surprised me); it just seems strange that the popularity stops right at academia's door. (I think almost no philosophy professor, including ones connected with rationalists/EA, has talked positively about any of my philosophical ideas? And b-money languished for a decade, gathering just a single citation in the academic literature, until Satoshi reinvented the idea, but outside academia!)
Clearly academia has some blind spots, but how big? Do I just have a knack for finding ideas that academia hates, or are the blind spots actually enormous?
I think the main reason why UDT is not discussed in academia is that it is not a sufficiently rigorous proposal, and that there is no published paper on it. Hilary Greaves says the following in this 80k episode:
Then as many of your listeners will know, in the space of AI research, people have been throwing around terms like ‘functional decision theory’ and ‘timeless decision theory’ and ‘updateless decision theory’. I think it’s a lot less clear exactly what these putative alternatives are supposed to be. The literature on those kinds of decision theories hasn’t been written up with the level of precision and rigor that characterizes the discussion of causal and evidential decision theory. So it’s a little bit unclear, at least to my likes, whether there’s genuinely a competitor to decision theory on the table there, or just some intriguing ideas that might one day in the future lead to a rigorous alternative.
I also think it is unclear to what extent UDT and updatelessness are different from existing ideas in academia that are prima facie similar, like McClennen's (1990) resolute choice and Meacham's (2010, §4.2) cohesive decision theory.[1] Resolute choice in particular h...
It may be worth thinking about why proponents of a very popular idea in this community don't know of its academic analogues, despite them having existed since the early 90s[1] and appearing on the introductory SEP page for dynamic choice.
Academics may in turn ask: clearly LessWrong has some blind spots, but how big?
I was thinking of writing a short post kinda on this topic (EDIT TO ADD: it’s up! See Some (problematic) aesthetics of what constitutes good work in academia), weaving together:
I wrote an academic-style paper once, as part of my job as an intern in a corporate research department. It soured me on the whole endeavor, as I really didn't enjoy the process (writing in the academic style, the submission process, someone insisting that I retract the submission to give them more credit despite my promise to insert the credit before publication). It was then rejected with two anonymous comments indicating that both reviewers had totally failed to understand the paper, and with no chance for me to communicate with them to find out what caused the difficulty. The cherry on top was my mentor/boss indicating that this was totally normal, and that I was supposed to just ignore the comments and keep resubmitting the paper to other venues until I ran out of venues.
My internship ended around that point and I decided to just post my ideas to mailing lists / discussion forums / my home page in the future.
Also, I think MIRI got FDT published in some academic philosophy journal, and AFAIK nothing came of it?
What is going on with Constitutional AI? Does anyone know why no LLM aside from Claude (at least none that I can find) has used it? One would think that if it works about as well as RLHF (which it seems to), AI companies would be flocking to it to save on the cost of human labor?
Also, apparently ChatGPT doesn't know that Constitutional AI is RLAIF (until I reminded it) and Gemini thinks RLAIF and RLHF are the same thing. (Apparently not a fluke as both models made the same error 2 out of 3 times.)
Isn't the basic idea of Constitutional AI just having the AI provide its own training feedback using written instructions? My guess is there was a substantial amount of self-evaluation in the o1 training with complicated written instructions, probably kind of similar to a constitution (though this is just a guess).
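For readers unfamiliar with the terms, below is a minimal RLAIF-style sketch of the idea described in this thread (hypothetical helper functions, not Anthropic's actual pipeline): the model critiques and revises its own outputs against written principles, and AI-generated preference labels replace human labels when training the reward model.

```python
CONSTITUTION = [
    "Choose the response that is more helpful and honest.",
    "Choose the response that is less harmful or toxic.",
]

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to some LLM API."""
    raise NotImplementedError

def self_critique_and_revise(prompt: str) -> str:
    """Supervised phase: the model revises its own draft against each principle."""
    draft = query_model(prompt)
    for principle in CONSTITUTION:
        critique = query_model(
            f"Principle: {principle}\nResponse: {draft}\nCritique the response."
        )
        draft = query_model(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {draft}"
        )
    return draft

def ai_preference_label(prompt: str, response_a: str, response_b: str) -> str:
    """RLAIF phase: the AI, not a human, picks the preferred response;
    these labels then train the reward model used for RL."""
    return query_model(
        f"Principles: {CONSTITUTION}\nPrompt: {prompt}\n"
        f"A: {response_a}\nB: {response_b}\n"
        f"Which response better follows the principles? Reply 'A' or 'B'."
    )
```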
I'm increasingly worried that philosophers tend to underestimate the difficulty of philosophy. I've previously criticized Eliezer for this, but it seems to be a more general phenomenon.
Observations:
Possible explanations:
Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.
If you say to someone
Ok, so, there's this thing about AGI killing everyone. And there's this idea of avoiding that by making AGI that's useful like an AGI but doesn't kill everyone and does stuff we like. And you say you're working on that, or want to work on that. And what you're doing day to day is {some math thing, some programming thing, something about decision theory, ...}. What is the connection between these things?
and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into is something like:
Well, I don't know what to do to make aligned AI. But it seems like X ∈ {ontology, decision, preference function, NN latent space, logical uncertainty, reasoning under uncertainty, training procedures, negotiation, coordination, interoperability, planning, ...} is somehow relevant.
...And, I have a formalized version of some small aspect of X which is mathematically interesting / philosophically intriguing / amenable to testing with a program, and which seems like it's kinda related to X writ large. So what I'm going to do, is I'm going to tinker with this formalized version for a week/month/year, and then I
About a week ago FAR.AI posted a bunch of talks from the 2024 Vienna Alignment Workshop to its YouTube channel, including Supervising AI on hard tasks by Jan Leike.
What do people think about having more AI features on LW? (Any existing plans for this?) For example:
I've been thinking about doing some of this myself (e.g., update my old script for loading all of someone's post/comment history into one page), but of course would like to see official implementations, if that seems like a good idea.
Math and science as original sins.
From Some Thoughts on Metaphilosophy:
Philosophy as meta problem solving Given that philosophy is extremely slow, it makes sense to use it to solve meta problems (i.e., finding faster ways to handle some class of problems) instead of object level problems. This is exactly what happened historically. Instead of using philosophy to solve individual scientific problems (natural philosophy) we use it to solve science as a methodological problem (philosophy of science). Instead of using philosophy to solve individual math problems, we use it to solve logic and philosophy of math. [...] Instead of using philosophy to solve individual philosophical problems, we can try to use it to solve metaphilosophy.
It occurred to me that from the perspective of longtermist differential intellectual progress, it was a bad idea to invent things like logic, mathematical proofs, and scientific methodologies, because it permanently accelerated the wrong things (scientific and technological progress) while giving philosophy only a temporary boost (by empowering the groups that invented those things, which had better than average philosophical competence, to spread their cu...
High population may actually be a problem, because it allows the AI transition to occur at low average human intelligence, hampering its governance. Low fertility/population would force humans to increase average intelligence before creating our successor, perhaps a good thing!
This assumes that it's possible to create better or worse successors, and that higher average human intelligence would lead to smarter/better politicians and policies, increasing our likelihood of building better successors.
Some worry about low fertility leading to a collapse of civilization, but embryo selection for IQ could prevent that, and even if collapse happens, natural selection would start increasing fertility and intelligence of humans again, so future smarter humans should be able to rebuild civilization and restart technological progress.
Added: Here's an example to illustrate my model. Assume a normally distributed population with an average IQ of 100, and that we need a certain number of people with IQ > 130 to achieve AGI. If the total population were to halve, then to get the same absolute number of IQ > 130 people as today, average IQ would have to increase by 4.5, and if the population were to fall to 1/10 of the original, average IQ would have to increase by 18.75.
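A quick check of the arithmetic above (assuming IQ is normally distributed with mean 100 and SD 15, and treating "people needed for AGI" as a fixed absolute number above the 130 threshold); small differences from the quoted 4.5 and 18.75 come from how the z-scores are rounded.

```python
from scipy.stats import norm

def required_mean_shift(pop_factor: float, threshold: float = 130,
                        mean: float = 100, sd: float = 15) -> float:
    """IQ increase needed to keep the absolute number of people above
    `threshold` constant when the population shrinks by `pop_factor`."""
    base_fraction = norm.sf((threshold - mean) / sd)   # ~2.3% above 130 today
    needed_fraction = base_fraction / pop_factor       # smaller population needs a larger fraction
    new_mean = threshold - norm.isf(needed_fraction) * sd
    return new_mean - mean

print(required_mean_shift(0.5))   # ~4.7  (population halved)
print(required_mean_shift(0.1))   # ~18.8 (population at 1/10 of original)
```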
Are humans fundamentally good or evil? (By "evil" I mean something like "willing to inflict large amounts of harm/suffering on others in pursuit of one's own interests/goals (in a way that can't be plausibly justified as justice or the like)" and by "good" I mean "most people won't do that because they terminally care about others".) People say "power corrupts", but why isn't "power reveals" equally or more true? Looking at some relevant history (people thinking Mao Zedong was sincerely idealistic in his youth, early Chinese Communist Party looked genuine about wanting to learn democracy and freedom from the West, subsequent massive abuses of power by Mao/CCP lasting to today), it's hard to escape the conclusion that altruism is merely a mask that evolution made humans wear in a context-dependent way, to be discarded when opportune (e.g., when one has secured enough power that altruism is no longer very useful).
After writing the above, I was reminded of @Matthew Barnett's AI alignment shouldn’t be conflated with AI moral achievement, which is perhaps the closest previous discussion around here. (Also related are my previous writings about "human safety" although they still used the...
Some potential risks stemming from trying to increase philosophical competence of humans and AIs, or doing metaphilosophy research. (1 and 2 seem almost too obvious to write down, but I think I should probably write them down anyway.)
I wrote Smart Losers a long time ago, trying to understand/explain certain human phenomena. But the model could potentially be useful for understanding (certain aspects of) human-AI interactions as well.
Possibly relevant anecdote: Once I was with a group of people who tried various psychological experiments. That day, the organizers proposed that we play iterated Prisoner's Dilemma. I was like "yay, I know the winning strategy, this will be so easy!"
I lost. Almost everyone always defected against me; there wasn't much I could do to get points comparable to other people who mostly cooperated with each other.
After the game, I asked why. (During the game, we were not allowed to communicate, just to write our moves.) The typical answer was something like: "well, you are obviously very smart, so no matter what I do, you will certainly find a way to win against me, so my best option is to play it safe and always defect, to avoid the worst outcome".
I am not even sure if I should be angry at them. I suppose that in real life, when you have about average intelligence, "don't trust people visibly smarter than you" is probably a good strategy, on average, because there are just too many clever scammers walking around. At the same time I feel hurt, because I am a natural altruist and cooperator, so this feels extremely unfair, and a loss for both sides.
(There were other situations in my life where the same pattern probably also applied, but most of the time, you just don't know why other people do whatever they do. This time I was told their reasoning explicitly.)
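A toy simulation (using the standard PD payoffs T=5, R=3, P=1, S=0, chosen here just for illustration) makes the anecdote's point concrete: if everyone always defects against you, even a forgiving strategy like tit-for-tat cannot come close to the scores of players who cooperate with each other.

```python
# Payoffs: (my move, their move) -> (my score, their score)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, rounds=100):
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)   # each strategy sees only the opponent's history
        move_b = strategy_b(history_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

tit_for_tat = lambda opp: "C" if not opp else opp[-1]  # cooperate first, then mirror
always_defect = lambda opp: "D"

print(play(tit_for_tat, always_defect))  # (99, 104): both do poorly
print(play(tit_for_tat, tit_for_tat))    # (300, 300): mutual cooperation
```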
What I've been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:
I'm sharing a story about the crew of Enterprise from Star Trek TNG[1].
This was written with AI assistance, and my workflow was to give the general theme to AI, have it write an outline, then each chapter, then manually reorganize the text where needed, request major changes, point out subpar sentences/paragraphs for it to rewrite, and do small manual changes. The AI used was mostly Claude 3.5 Sonnet, which seems significantly better than ChatGPT-4o and Gemini 1.5 Pro at this kind of thing.
getting an intelligence/rationality upgrade, which causes them to