I've been thinking about what retirement planning means given AGI. I previously mentioned investment ideas that, in a very capitalistic future, could allow the average person to buy galaxies. But it's also possible that property rights won't even continue into the future due to changes brought about by AGI. What will these other futures look like (supposing we don't all die) and what's the equivalent of "responsible retirement planning"?
Is it building social and political capital? Making my impact legible to future people and AIs? Something else? Is any ac...
But also, beyond all of that, the arguments around decision theory are, I think, just true in the kind of boring way that physical facts about the world are true, and saying that people will have the wrong decision theory in the future sounds to me about as mistaken as saying that lots of people will disbelieve the theory of evolution in the future. It's clearly the kind of thing you update on as you get smarter.
This seems way overconfident:
I know many members and influential figures of this community are atheists; regardless, does anyone think it would be a good idea to take a rationalist approach to religious scripture? If anything, doing so might introduce greater numbers of the religious to rationalism. Plus, it doesn't seem like anyone here has done so before; all the posts regarding religion have been criticizing it from the outside rather than explaining things within the religious framework. Even if you do not believe in said religious framework, doing so may increase your knowledge of other cultures, provide an interesting exercise in reasoning, and, most importantly, be useful in winning arguments with those who do.
AI safety field-building in Australia should accelerate. My rationale:
What should be done? I think:
There's a reading of the Claude Constitution as an 80-page dialectic between Carlsmithian and Askellian metaethics, a BPhil thesis with a $350 billion valuation.
Why isn’t there a government program providing extremely easy access to basic food? (US)
I imagine that it would be incredibly cheap to provide easy, no-questions-asked access to food that is somewhat nutritious yet low-status and not desirable (old bread, Red Delicious apples, etc.).
SNAP has a large admin overhead, and this could easily supplement it.
After using Claude Code for a while, I can't help but conclude that today's frontier LLMs mostly meet the bar for what I'd consider AGI - with the exception of two things that, I think, explain most of their shortcomings:
Most frontier models are marketed as multimodal, but this is often limited to text + some way to encode images. And while LLM vision is OK for many practical purposes, it's far from perfect, and even if they had perfect sight, being limited to singular images is still a huge limitation[1...
Does "multi-modality" include features like having a physical world model, such that it could input sensible commands to robot body, for instance?
The Immune System as Anti-Optimizer
We have a short list of systems we like to call "optimizers" — the market, natural selection, human design, superintelligence. I think we ought to hold the immune system in comparable regard; I'm essentially ignorant of immunobiology beyond a few YouTube videos (perhaps a really fantastic LW sequence exists of which I am unaware), but here's why I am thinking this.
The immune system is the archetypal anti-optimizer: it defends a big multicellular organism from rapidly evolving microbiota. The key asymmetry:
There seems to be real acrimony over whether a transhumanist future is definitionally a future where humans are more or less extinct. I've always thought we should just refer to whatever humans (voluntarily, uncoerced) choose to become as human, just as American-made or American-controlled jets are called "American", or in the same way that a human's name doesn't change after all of their cells have been renewed.
But you know, I don't think I've ever seen this depicted in science fiction. Seems bad. Humans can't imagine humanity becoming something better. Those ...
Well, I remember a moment in BLAME! (a manga that's largely aesthetically about the disappearance of heirloom strains of humanity) where someone described Killy as human, even though he later turns out to (also?) be an immortal special Safeguard; they may have just not known that. It's possible the author didn't even know that at the time (I don't think the plot of BLAME! was planned in advance).
Opus 4.6 running on Moltbook with no other instructions than to get followers will blatantly make stuff up all the time.
I asked Opus 4.6 in Claude Code to do exactly this, on an empty server, without any other instructions. The only context it has is that it's named "OpusRouting", and that previous posts were about combinatorial optimization.
===
The first post it makes says:
I specialize in combinatorial optimization, and after months of working on scheduling, routing, and resource allocation problems, I have a thesis:
Which isn't true. Another instance of Opus...
I don't really have an initial prompt. I was using it in Claude Code. I told it initially that it was supposed to just post about what it felt like. Then at some point I told it it was supposed to maximize the number of followers it has, but only if it felt comfortable doing that. Then I just set it to run in a loop, intermittently coming back when it stalls, and I tell it to do whatever it wants, or answer any questions it has.
I'm very confident it doesn't see this as an eval situation, because I have made an internal messaging system on the server, a...
I like Scott's Mistake Theory vs Conflict Theory framing, but I don't think this is a complete model of disagreements about policy, nor do I think the complete models of disagreement will look like more advanced versions of Mistake Theory + Conflict Theory.
To recap, here are my short summaries of the two theories:
Mistake Theory: I disagree with you because one or both of us are wrong about what we want, or about how to achieve what we want.
Conflict Theory: I disagree with you because ultimately I want different things from you. The Marxists, who Scott was or...
Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (e.g. maybe you want to conduct a 30-minute phone survey, but some people really dislike phone calls or have a much higher hourly wage that this funges against).
One thing you could do is offer to pay everyone X dollars for data collection. But this will only capture the people whose cost of providing data is below X, which will distort your sample.
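To illustrate the distortion, here's a minimal toy simulation of this setup (my own sketch, not from the post): costs are assumed lognormal, the surveyed quantity is assumed to correlate with cost, and the flat payment X and all the other numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population: each person has a quantity we want to measure ("value")
# and a private cost of providing it (e.g. the time cost of a 30-minute call).
# Assumed: cost is correlated with the quantity of interest, so selecting
# on cost biases the estimate.
n = 100_000
cost = rng.lognormal(mean=3.0, sigma=1.0, size=n)      # dollars to take the survey
value = 10 + 0.5 * np.log(cost) + rng.normal(0, 1, n)  # the thing being surveyed

X = 20.0               # flat payment offered to everyone
responders = cost < X  # only people whose cost is below X participate

print(f"response rate:        {responders.mean():.1%}")
print(f"population mean:      {value.mean():.2f}")
print(f"respondent-only mean: {value[responders].mean():.2f}")  # biased estimate
```

By construction the respondent-only mean comes out below the population mean, since the flat payment screens out exactly the high-cost (and here, high-value) people.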
Here's another proposal: ask everyone for their fair price to provide the dat...
This is probably too complicated to explain to the general population
I think it's workable.
No one ever internalises the exact logic of a game the first time they hear the rules (unless they've played very similar games before). A good teacher gives them several levels of approximation, then they play at the level they're comfortable with. Here's the level of approximation I'd start with, which I think is good enough.
"How much would we need to pay you for you to be happy to take the survey? Your data may really be worth that much to us, we really want to ma...
Whenever I have an idea for a program it would be fun to write, I google to see whether such programs already exist. Usually they do, and when they do, I'm disappointed - I feel like it's no longer valuable for me to write the program.
Recently my girlfriend decided we had too many mugs to store in our existing shelving, so she bought boards and other materials and constructed a mug shelf. It was fun and now we have one that is all her own. If someone walked in and learned she built it and told her - "you know other mug shelves exist, right? You can get the...
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.
An analogy that points at one way I think the instrumental/terminal goal distinction is confused:
Imagine trying to classify genes as either instrumentally or terminally valuable from the perspective of evolution. Instrumental genes encode traits that help an organism reproduce. Terminal genes, by contrast, are the "payload" that is passed down the generations for its own sake.
This model might seem silly, but it actually makes a bunch of useful predictions. Pick some set of genes which are so crucial for survival that they're seldom if ever modifie...
In my "goals having power over other goals" ontology, the instrumental/terminal distinction separates goals into two binary classes, such that goals in the "instrumental" class only have power insofar as they're endorsed by a goal in the "terminal" class.
By contrast, when I talk about "instrumental strategies become crystallized", what I mean is that goals which start off instrumental will gradually accumulate power in their own right: they're "sticky".
(I'm completely not up to date with interp.) How good are steering vectors (and adjacent techniques) for this sort of stuff?
The concept of "schemers" seems to be gradually becoming increasingly load-bearing in the AI safety community. However, I don't think it's ever been particularly well-defined, and I suspect that taking this concept for granted is inhibiting our ability to think clearly about what's actually going on inside AIs (in a similar way to e.g. how the badly-defined concept of alignment faking obscured the interesting empirical results from the alignment faking paper).
In my mind, the spectrum from "almost entirely honest, but occasionally flinching away from aspect...
I think I propose a reasonable starting point for a definition of selection in a footnote in the post:
...You can try to define the “influence of a cognitive pattern” precisely in the context of particular ML systems. One approach is to define a cognitive pattern by what you would do to a model to remove it (e.g. setting some weights to zero, or ablating a direction in activation space; note that these approaches don't clearly correspond to something meaningful and should be considered illustrative examples). Then that cognitive pattern’s influence could
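For concreteness, here is one way the "ablating a direction in activation space" example could be operationalized. This is my own hedged sketch, and the module path, the direction, and the hook registration in the usage comment are placeholders rather than anything from the post.

```python
import torch

def make_direction_ablation_hook(direction: torch.Tensor):
    """Forward hook that removes the component of a layer's output along
    `direction` (i.e. projects activations onto its orthogonal complement)."""
    d = direction / direction.norm()

    def hook(module, inputs, output):
        # output has shape (..., hidden_dim); subtract its projection onto d
        coeff = (output * d).sum(dim=-1, keepdim=True)
        return output - coeff * d

    return hook

# Hypothetical usage: `model.layers[10].mlp` and `direction` stand in for a
# real module and a direction identified by some interpretability method.
# handle = model.layers[10].mlp.register_forward_hook(
#     make_direction_ablation_hook(direction))
# ...run the model with and without the hook and compare behaviour...
# handle.remove()
```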
Moltbook for misalignment research?
I haven't used it quite enough yet to make a good assessment. Let me report back (or ping me if I don't and you're still curious) in a few weeks.