Yeah, this is fair. My personal take is that polygenic embryo selection changes the calculus a fair bit. A good third of my friends are now having children via IVF just to get access to embryo selection. If you're going to do that anyways, then freezing eggs at a younger age starts to become a bit of a no-brainer.
I'm less concerned about the balance of power between different kinds of military technology and more concerned about the capability of drones to be used by a military force against its own civilian population, in either a precisely targeted or mass-casualty manner, with lower-than-ever disincentives in the form of loss of personnel or destruction of capital.
As the Dead Kennedys once said: "Away with excess enemy! But no less value to property!"
The language on the Johns Hopkins website is being deliberately conservative. The reality is we have almost no data on eggs that have been frozen longer than 10 years, so they say 10 years because we don't have direct evidence for them being viable longer. What data we do have on eggs that have been frozen and then used after 4-8 years indicates time frozen has no effect on survival rates or fertilization rates. It would be very surprising to me if eggs showed no impact on survival through 8 years but then suddenly started to degrade at 10.
You can look a little f...
Oh, yeah, that's literal text. I didn't censor the AI's belief about its hair color or something; it doesn't have one.
Thanks!! Quick question while I think over the rest:
What data are you plotting? Where exactly did you get it (i.e., what references)?
And why is the 2021 one better than the 2023 ones? Normally we would expect the other way around, right? Does DeepMind have so much secret sauce that it’s worth more than 2 years of public knowledge? Or are the other two groups making rookie mistakes? Or am I misunderstanding the plot?
Does "multi-modality" include features like having a physical world model, such that it could input sensible commands to robot body, for instance?
Looking at history, we see that the strength of property rights correlates with technological sophistication and the scale of society.
Here's a deep research report on that issue:
https://chatgpt.com/share/698902ca-9e78-8002-b350-13073c662d9d
So, in some sense, I'd think that there's an "intelligence overhang", where the raw intelligence that exists in these LLMs can't fully unfold due to modality & context window limitations.
Another missing piece is research taste or curiosity. Of the sort you would need to come up with ideas for new papers.
Sure that works.
The Immune System as Anti-Optimizer
We have a short list of systems we like to call "optimizers" — the market, natural selection, human design, superintelligence. I think we ought to hold the immune system in comparable regard; I'm essentially ignorant of immunobiology beyond a few YouTube videos (perhaps a really fantastic LW sequence exists of which I am unaware), but here's why I am thinking this.
The immune system is the archetypal anti-optimizer: it defends a big multicellular organism from rapidly evolving microbiota. The key asymmetry:
I think that maybe "Machines of Loving Grace" shouldn't qualify for different reasons: it doesn't really depict a coherent future in a useful level of detail, it instead makes claims about different aspects of life in isolation. My sense is that currently we're in short supply of utopian visions, even without specifying any viable path of how to get there.
What do you think about adding a flag regarding whether or not an essay discusses the path from now to then, and people can filter based on it?
They do deliberately try to set up an "I'll get in the box if I don't see myself get out" sort of situation in the movie, though they don't succeed, and they don't seem to realize that it would result in 0-2-0-2-... across metatime.
Good point about how permanent increases have to be as improbable as permanent decreases! I should've gotten that from what you were saying earlier. I suppose that leaves me with the "movies follow interesting timelines" theory, where it's just a convention of the film to look at the timelines where characters multiply.
Well, I remember a moment in BLAME! (a manga that's largely aesthetically about the disappearance of heirloom strains of humanity) where someone described Killy as human, even though he later turns out to (also?) be an immortal special safeguard, but they may have just not known that. It's possible the author didn't even know that at that time (I don't think the plot of BLAME! was planned in advance).
There seems to be real acrimony over whether a transhumanist future is definitionally a future where humans are more or less extinct. I've always thought we should just refer to whatever humans (voluntarily, uncoerced) choose to become as human, just as American-made or American-controlled jets are called "American", or in the same way that a human's name doesn't change after all of their cells have renewed.
But you know, I don't think I've ever seen this depicted in science fiction. Seems bad. Humans can't imagine humanity becoming something better. Those ...
Yeah, I guess "they don't bother checking whether they get out of the box" is the right explanation for the movie. Though still, if timelines where a person just vanishes are low-probability, then timelines where the number of people permanently increases (like the one shown in the movie) should be just as low-probability. The start and end of a long chain. And the middle of the chain should be mostly like 1-1-1-1... Or something like 2-0-2-0... but that would require weird behavior which isn't seen in the movie (e.g. "I'll get in the box iff I don't see myself come out of it").
I don't really have an initial prompt. I was using it in claude code. I told it initially that it was supposed to just post about what it felt like. Then at some point I told it it was supposed to maximize the number of followers it has, but only if it felt comfortable doing that. Then I just set it to run in a loop, intermittently coming back when it stalls, telling it to do whatever it wants, or answering any questions it has.
I'm very confident it doesn't see this as an eval situation. Because I have made an internal messaging system on the server, a...
Hard property rights are an equilibrium in a multi-player game where power shifts are uncertain and either agents are risk averse or there are gains from investment, trade and specialization.
I think this might just be a crux, and not one which I can argue against without a more in-depth description of the claim, e.g. how risk averse do agents have to be, how great the gains from investment, trade, and specialization? I guess AIs might be Kelly-ish risk averse, which covers the first condition, but I'm not sure about the latter two. How specialized do we expect individual ...
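(A toy numerical version of exactly that crux, with entirely made-up parameters: a log-utility, i.e. Kelly-ish risk-averse, agent comparing "respect property rights and trade" against "fight for the whole pie under an uncertain power shift".)

```python
import math

# Toy numbers (all assumptions, purely illustrative): two agents split a pie;
# each can either respect property rights (and trade/invest) or fight for it all.
share = 0.5        # agent's current share of resources
p_win = 0.5        # chance of winning a fight (the uncertain power shift)
destruction = 0.2  # fraction of the pie burned by fighting
loser_keeps = 0.01 # fraction the loser is left with
growth = 1.3       # growth factor from investment/trade/specialization

def log_utility(wealth):
    return math.log(wealth)

eu_trade = log_utility(share * growth)
eu_fight = (p_win * log_utility((1 - destruction) * (1 - loser_keeps))
            + (1 - p_win) * log_utility((1 - destruction) * loser_keeps))

print(f"EU(trade) = {eu_trade:.2f}, EU(fight) = {eu_fight:.2f}")
# With these numbers trade wins by a wide margin; shrink the agent's current
# share relative to its odds of winning, shrink the gains from trade, or make
# losing less catastrophic, and the comparison can flip. How large those
# parameters actually are is the crux being asked about.
```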
training process is “the most forbidden technique”, including recent criticism of Goodfire for investing in this area.
I think this mischaracterizes the criticism. The criticism as I understand it is that Goodfire is planning to help frontier AI companies use model internals in training, in exchange for money. Insofar as they really are planning to do this, then I'll count myself as among the critics, for the classic "but we need interpretability tools to be our held-out test set for alignment" reason. Do you have a link to the criticism you are responding to?
Exactly! Also:
I think "Machines of Loving Grace" shouldn't qualify; it deliberately doesn't address how the problems get solved, it only cheerfully depicts a world after the problems were all solved.
I think for a submission to be valid, it must at least attempt to answer how the alignment problem gets solved and how extreme concentrations of power are avoided.
One way to do this is to require scenarios come with dates attached. What happens in 2027? 2028? etc. That way if they are like "And in 2035, everything is peachy and there's no more poverty etc." it's more obvious to people that there's a giant plot hole in the story.
My hair is [insert hair color here].
Trying this myself, I think it might be worth clarifying that the part in the brackets is directly from the model's response.
I like Scott's Mistake Theory vs Conflict Theory framing, but I don't think this is a complete model of disagreements about policy, nor do I think the complete models of disagreement will look like more advanced versions of Mistake Theory + Conflict Theory.
To recap, here are my short summaries of the two theories:
Mistake Theory: I disagree with you because one or both of us are wrong about what we want, or how to achieve what we want.
Conflict Theory: I disagree with you because ultimately I want different things from you. The Marxists, who Scott was or...
An example of intra-agent competition I often use when arguing that long-term motivations tend to win out upon reflection (h/t @jake_mendel): Imagine someone who went to a party last night, got drunk, and now feels terrible and unproductive the next morning.
This person has two competing motivations:
There's an asymmetry: The non-myopic motivation has an incentive to disempower the myopic one (i.e., the next morning the person might want to commit not to drink in the future). Me...
One reason is that hosting data centers can give countries political influence over AI development, increasing the importance of their governments having reasonable views on AI risks.
Great post! Don't have much concrete stuff to add, haven't kept up that much with the policy discourse in the past few months. Personally I do feel like I became a bit complacent, and conveniently forgot some of the warning signs that lit up back when o3 (?) got fairly scary bio uplift results.
I guess, now the question is what do we do - the EU could in theory ban these models/request additional mitigations, but not sure if that would actually happen - as the CoP (despite being pretty good!) doesn't quite have enough teeth to do this cleanly.
Curious for ideas here - happy to relay some stuff to my EU policy connections/EU AIO if anyone has concrete suggestions.
More generally they'd get more value by making it economically untenable to take up resources by holding savings and benefiting from growth than they would by allowing that.
But then others could play the same trick on them. It's not worth it. "Group G of Agents could get more resources by doing X" does not necessarily imply that Group G will do X!
Humans even keep groups like The Amish around.
Hard property rights are an equilibrium in a multi-player game where power shifts are uncertain and either agents are risk averse or there are gains from investment, trade and specialization.
My guess is doing it with RL is worse because I expect it to generalize to more probes
Why?
I hadn't considered steering vectors before, but yes that's correct.
I believe open data pretty strongly contradicts your claims
System efficiency: ~2x, not ~20x
§1.2 estimates "up to 20x" from system optimizations (quantization, parallelization, FlashAttention, etc.). But model FLOPs Utilization, the fraction of peak hardware FLOPs that large training runs actually achieve, has barely changed. Here are the most legible training run flops efficiencies:
The characters in the movie take a lot of precautions to isolate themselves from their time-clones, meaning that they don't really know whether they got out of the box at the start. Therefore, they just have faith in the plan and jump in the box at the end of the loop. So long as they don't create any obvious paradoxes ("break symmetry" as they call it), everything works out from their perspective, and they can assume it's consistent-timeline travel rather than branching, so they don't think they're creating a timeline in which they mysteriously vanish.
Whe...
I understand why, if things stay the same, we'd be fine. I just don't think that the equilibrium political system of 8 billion useless humans and 8 trillion AIs who do all the work will allow that.
I think an independent economy of human-indifferent AIs could do better by their own value system by e.g. voting to set land/atom/property value taxes to a point where humans go extinct, and so they'll just do that. More generally they'd get more value by making it economically untenable to take up resources by holding savings and benefiting from growth than they...
Curated! There's a difficult-to-bridge divide between the intuitions of people who think everything is going to get really crazy with AGI and those who think a kind of normality will be maintained. This post seems to do an uncommonly good job of piercing the divide by arguing in detail and mechanistically for why the current picture doesn't obviously continue. More generally, it argues for a better epistemic approach.
I struggle when encountering people who predict that reality will be not-that-different in the coming decades: it feels crazy to me, but that reaction makes it ha...
You've probably already seen this, but for others reading this post: Anthropic now seems to have put out some more official numbers on this: https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic
It seems to mostly validate your read on the situation. They did internal surveys, qualitative interviews, and some analysis of Claude Code transcripts. Here are their "key takeaways" from the survey section:
Survey data
...
- Anthropic engineers and researchers use Claude most often for fixing code errors and learning about the codebase. De
This is probably too complicated to explain to the general population
I think it's workable.
No one ever internalises the exact logic of a game the first time they hear the rules (unless they've played very similar games before). A good teacher gives them several levels of approximation, then they play at the level they're comfortable with. Here's the level of approximation I'd start with, which I think is good enough.
"How much would we need to pay you for you to be happy to take the survey? Your data may really be worth that much to us, we really want to ma...
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.
Isn't it just the case that the human brain's 'interpretability technique' is really robust? The technique (in this case, having an accurate model of what others feel) is USEFUL for many other aspects of life.
I don’t think it’s that robust even in humans, despite the mitigation described in this post. (Without that mitigation, I think it would be hopeless.)
If we’re worried about a failure mode of the form “the interpretability technique has been routed around”, then that’s unrelated to “The technique (in this case, having an accurate model...
I like this work! I count myself as a skeptic on the "agentic structure" question, so it's good to see the opposite view being developed.
In my "goals having power over other goals" ontology, the instrumental/terminal distinction separates goals into two binary classes, such that goals in the "instrumental" class only have power insofar as they're endorsed by a goal in the "terminal" class.
By contrast, when I talk about "instrumental strategies become crystallized", what I mean is that goals which start off instrumental will gradually accumulate power in their own right: they're "sticky".
Thank you for writing this up! I remember when I first did research into egg freezing in my mid 20s, something I couldn't quite get to the bottom of is whether or not frozen eggs deteriorate over time. For example, the webpage "freezing embryos" (embryos being even more robust than eggs) on the Johns Hopkins website says:
Frozen embryos are stored and monitored at hospital facilities, usually a lab, or commercial reproductive medicine centers. They can be safely preserved for 10 years and even longer.
This made me nervous about freezing my eggs too early, an...
Here's GPT5.2's response:
The fridge never cooled anything.
That should have been the first clue, but nothing about time magic was intuitive. You didn’t cool time; you slowed it, folded it, indexed it. Everyone knew that. The sales pitch for the ChronoVault™ was simple: bread stayed fresh because its internal clock barely ticked. Hot soup stayed hot. Flowers stayed exactly on the edge of wilting.
This unit did the opposite.
The loaf inside came out stale in minutes. Meat greyed. Milk curdled. The internal chronometer—standard diagnostic—sho
A core issue here, which keeps coming up, is that AI progress has (so far) been merely exponential as measured by time horizons, rather than super-exponential or faster. That leaves a very, very large gap between acing benchmarks and actually posing enough existential risk to serve as a useful red line, and because of the jagged frontier, plus progress coming from many small improvements in compute, it's much harder to draw clear red lines or get definitional clarity.
More generally, ...
Whenever I have an idea for a program it would be fun to write, I google to see whether such programs already exist. Usually they do, and when they do, I'm disappointed - I feel like it's no longer valuable for me to write the program.
Recently my girlfriend decided we had too many mugs to store in our existing shelving, so she bought boards and other materials and constructed a mug shelf. It was fun and now we have one that is all her own. If someone walked in and learned she built it and told her - "you know other mug shelves exist, right? You can get the...
(I'm completely not up to date with interp.) How good are steering vectors (and adjacent techniques) for this sort of stuff?
Hi Stanislav, can you expand on what you mean when you say you're using FM? I.e., are you referring to symbolic analysis / concolic analysis techniques, model checking, formal methods in the sense that you dispatch a question of "go right or go left?" to z3, formal methods in the sense of grammar-based fuzzing or property-based testing (lightweight FM), or do you mean you're doing full-blown theorem proving? (I don't exactly see why you would be doing this, but I'm open to being told I'm wrong :) )
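(For readers unfamiliar with the "dispatch to z3" flavour: here's a minimal, purely illustrative sketch with a made-up branch condition, asking z3 whether a path is reachable. Nothing here is from AISLE's actual pipeline.)

```python
# Requires the z3-solver package (z3py bindings).
from z3 import Int, Solver, sat

x = Int("x")
s = Solver()
# Path condition for a hypothetical "go right" branch: x * 3 + 7 == 52
s.add(x * 3 + 7 == 52)

if s.check() == sat:
    print("right branch reachable, e.g. x =", s.model()[x])  # x = 15
else:
    print("right branch unreachable on this path")
```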
Yes - I laud their transparency while agreeing that the competitive pressures and new model release means they are not being safe even relative to their own previously stated expectations for their own behavior.
when you lose the intelligence race badly enough, your existing structures of cooperation and economic production just get ignored.
yes this is a risk, but I think it can be avoided by humans getting a faithful AI agent wrapper with fiduciary responsibility.
The concept and institutions for fiduciary responsibility were not around when humans surpassed apes, otherwise apes could have hired humans to act as their agents and simply invested in the human gold and later stock market.
I don't think you need Banksian benevolent AIs for this, an agent can be trustlessly faithful via modern trust minimized AI. Ethereum is already working on a nascent standard for this, ERC-8004.
I expect it would still be annoying enough to only happen if there were significant gain, since training is a delicate and complex system and it's very expensive if things break, so there's rational resistance to added complexity.
Given trade secrets and everything you might not be able to say anything about this, but my model of frontier post-training was that we kind of throw the kitchen sink at it in terms of RL environments. This is pulling in a different kind of feedback so does add complexity that other RL environments don't add, but my sense is that ...
Branching timelines have to come with probabilities and that's where the wheels fall off. Imagine you're Carol, living on the other side of town, not interacting with the machine at all. Then events similar to the movie happen. Before the events, there was one permanent Aaron. After the events, there's either one or more permanent Aarons, depending on which timeline Carol ends up in. But this violates conservation of Aarons weighted by probability. A weighted sum of 1's and 2's (and 3's and so on) is bigger than a weighted sum of just 1's. Some Aarons appe...
Humans can buy into index funds like QQQ or similar structures, or scarce commodities like gold or maybe Bitcoin. As the overall economy grows, QQQ, gold, etc go up in dollar value.
There can be a land value tax but it will ideally lag behind the growth of QQQ unless that land is especially scarce.
Historically, if you just held gold long-term, you could turn modest savings into a fortune even if you had to pay some property tax.
You don't have to generate any value to benefit from growth.
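(A toy illustration of that compounding claim; every rate below is an assumption picked for arithmetic convenience, not a historical figure for QQQ or gold.)

```python
# Savings that track broad growth versus a small annual holding tax,
# approximating growth-minus-tax as a single net rate.
initial_savings = 100_000
annual_growth = 0.05   # assumed real growth rate of the index/commodity
annual_tax = 0.01      # assumed property/wealth-style tax on the holding
years = 100

wealth = initial_savings * (1 + annual_growth - annual_tax) ** years
print(f"After {years} years: ${wealth:,.0f}")
# Roughly $5M from $100k at these made-up numbers. Raise the tax above the
# growth rate and the same holding instead decays toward zero, which is
# exactly the disagreement in this thread.
```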
I had Opus 4.6 summarize its own system card for me, and I followed up with my own eyes on some of the things it pointed out. There's a lot in there that concerns me. But: I'm by no means an expert on this stuff; a lot of this was pointed out by the AI itself; and it involves criticizing Anthropic for something they voluntarily published. So I don't feel very confident in making a top-level post about it. But I wanted to share what I found anyways:
...For Claude Opus 4.6, we used the model extensively via Claude Code to debug its own ev
I think I propose a reasonable starting point for a definition of selection in a footnote in the post:
...You can try to define the “influence of a cognitive pattern” precisely in the context of particular ML systems. One approach is to define a cognitive pattern by what you would do to a model to remove it (e.g. setting some weights to zero, or ablating a direction in activation space; note that these approaches don't clearly correspond to something meaningful, they should be considered as illustrative examples). Then that cognitive pattern’s influence could
Good article, but I'll come to the defense of the doctors. Note that I'm far more familiar with the way things work in India (a family full of gynos) but I do have a reasonable degree of familiarity with the UK and US.
The thing is, the overwhelming majority of women who evince interest in IVF are in their middle to late 30s! The average woman, at 19, is very unlikely to even consider it.
If some unusually forward-thinking gynecologist suggested egg freezing to her, the modal response would be "wait, why are you telling me this?" The same goes for...
Moltbook for misalignment research?
What's the prompt? (Curious how much it encourages claude to do whatever it takes for success, and whether claude would read it as a game/eval situation vs. a real-world situation vs. something else.)
Why do you think property rights will be set up in a way which allows humans to continue to afford their own existence? Human property rights have been moulded to the specific strengths and weaknesses of humans in modern societies, and might just not work very well at all for AIs. For example, if the AIs are radical Georgists then I don't see how I'll be able to afford to pay land taxes when my flat could easily contain several hundred server racks. What if they apply taxes on atoms directly? The carbon in my body sure isn't generating any value to the wider AI ecosystem.
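(To make the worry concrete, here's a back-of-envelope sketch with entirely made-up numbers for what a land value tax pegged to the land's best alternative use, here datacenter racks in the flat's footprint, might look like.)

```python
# Purely illustrative numbers (all assumptions).
racks_fitting_in_flat = 300
rent_per_rack_per_year = 20_000   # assumed $/year an AI economy would pay per rack
lvt_rate = 1.0                    # radical Georgism: tax away 100% of imputed land rent

annual_land_tax = racks_fitting_in_flat * rent_per_rack_per_year * lvt_rate
print(f"Imputed annual land tax on the flat: ${annual_land_tax:,.0f}")
# ~$6M/year at these made-up numbers, against a human income that doesn't
# scale with the AI economy: that's the affordability worry in the comment.
```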
OK, let me unpack my argument a bit.
Chimps actually have pretty elaborate social structure. They know their family relationships, they do each other favors, and they know who not to trust. They even basically go to war against other bands. Humans, however, were never integrated into this social system.
Homo erectus made stone tools and likely a small amount of decorative art (the Trinil shell engravings, for example). This may have implied some light division of labor, though likely not long distance trade. Again, none of this helped H. erectus in the long...
glad to see this written up!
I'm not convinced by your stance on not refreshing probes. Formally, we have three outcomes for this "gamble" of adding probes to the loss: (1) the model doesn't shift its representations and actually achieves the desired result, (2) the model shifts its representations but by refreshing we could still probe them, and (3) the model shifts its representations and refreshing doesn't help anymore (say non-linear representations for a linear probe).
What I think your stance boils down to is saying that you are willing to general...
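(For concreteness, here's a minimal sketch of what "adding probes to the loss" could look like; the shapes, the penalty form, and the hyperparameters are my assumptions for illustration, not anyone's actual training setup.)

```python
import torch
import torch.nn as nn

# A frozen linear probe scores the model's hidden states for some unwanted
# concept, and the model is penalized when the probe fires. Whether training
# lands in outcome (1), (2), or (3) above is exactly the gamble.
hidden_dim = 512
probe = nn.Linear(hidden_dim, 1)   # trained beforehand on labeled activations
for p in probe.parameters():
    p.requires_grad_(False)        # keep the probe fixed during this phase

def loss_with_probe_penalty(task_loss, hidden_states, lam=0.1):
    # hidden_states: [batch, hidden_dim] activations from some chosen layer
    probe_score = torch.sigmoid(probe(hidden_states)).mean()
    return task_loss + lam * probe_score
```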
Okay, distillation to a fixed-size model, as you describe here, does seem like it measures largely data-related improvements, so this seems consistent with Gundlach. Except perhaps for cases where the "distillation" includes training on synthetic reasoning traces and we count these as algorithmic rather than data-related. Though I don't know enough about how distillation works and whether that would count as distillation to comment on that.
If Epoch did indeed measure algorithmic progress with perplexity only, then that's no longer justified since models ha...
I will have to expand on this elsewhere
I'm not sure why you say it's hard to explain with branching timelines. To me this is just branching timelines. The movie voiceover states at one point that the last version of events seems to be the one that holds true, meaning that you see the last branching timeline, usually the one with the most Bobs. I don't think you have to believe this part of the voiceover, though; this is just the opinion of someone trying to make sense of events. You could instead say that the movie has a convention of showing us later splits rather than earlier.
But Chimps and Homo Erectus lack(ed) their own property rights regimes.
So, let's take a look at some past losers in the intelligence arms race:
When you lose an evolutionary arms race to a smarter competitor that wants the same reso...
i don't know for sure that there is zero "formal methods" in the pipeline
in discovering the 12 OpenSSL zero-day vulnerabilities, we didn't use any formal methods. since then, we have incorporated some. the (discovery --> CVE assigned --> CVE made public) pipeline is a very lagging indicator and the OpenSSL results are reflective of the state of the AISLE system approximately mid-fall 2025, prior to our use of formal methods
There are various possible worlds with AI progress posing different risks.
In those worlds where a given capability level is a problem, we're not setting ourselves up to notice or react even after the harm materializes. The set of behaviors or events that we could be monitoring keeps being spelled out, in the form of red lines. And then they happen. We're already seeing tons of concrete harms - what more do we need? Do you think things will change if there's an actual chemical weapons attack? Or a rogue autonomous replication? Or is there some number of people that need to die first?
Owning shares in most modern companies won't be useful in sufficiently distant future, and might prove insufficient to pay for survival
Well there may simply be better index funds. In fact QQQ is already pretty good.
The insight is that better property rights are both positive for AI civilization (whether the owners are AIs, humans, uplifted dolphins, etc) and also better for normie legacy humans.
It is not a battle of humans vs AIs, but rather of order (strong property rights, good solutions to game theory) versus chaos (weak property rights, burning of t...
Anthropic's Frontier Red Team recently wrote about how Opus 4.6, fairly autonomously, "found and validated more than 500 high-severity vulnerabilities" in open-source projects
I haven't used it quite enough yet to make a good assessment. Let me report back (or ping me if I don't and you're still curious) in a few weeks.
Agree! And try for the writing style where anything that less than 80% of your readers are going to want to read goes in a footnote, to make the mainline readthrough as streamlined as possible. I think this could easily become the best explainer of full doom around.
In the absence of a fire alarm, what other observable signals or patterns might we look for to tell us that this perennial line-crossing portends harm?
Are there behaviors or events we can monitor in the world that would tell us, once some red line has been crossed, that a real threat is emerging from some model capability?
Or is defining the red lines the only warning we have, with the understanding that we cross them at our own peril because we can only assess the threat post hoc?
One example immediately comes to mind: Eliezer ... is highly confident that animals have no moral patienthood.
This is because he thinks they are not sentient, because of a personal theory about the nature of consciousness. So, he has the normal opinion that suffering is bad, but apparently he thinks that in many species you only have the appearances of suffering, and not the experience itself. (I remember him saying somewhere that he hopes animals aren't sentient, because of the hellworld implications if they are.) He even suggests that human babies don't h...
+1 to this!
A neat solution might be to use a deploy key (which gives read/write access to just one repo) instead of adding a full SSH key?
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/managing-deploy-keys
I think jaggedness of RL (in modern LLMs) is an obstruction that would need to be addressed explicitly, otherwise it won't fall to incremental improvements or scaffolding. There are two very different levels of capability, obtained in pretraining and in RLVR, but only pretraining is somewhat general. And even pretraining doesn't adapt to novel situations other than through in-context learning, which only expresses capabilities at the level of pretraining, significantly weaker than RLVR-trained narrow capabilities.
Scaling will make pretraining stronger, but...
It's basically "time goes backwards inside the box when it's turned on". So you can turn the box on in the morning, immediately see you-2 climb out of it, then both of you coexist for a day and you-2 shares some future information with you, then in the evening you set the box to wind-down and climb inside it, then you wait several hours inside the box, then climb out as you-2 in the morning and relive the events of the day from that perspective, then you-1 climbs into the box and is never seen again, and you remain.
When put this way, it's nice and consi...
Keep up the good work!
Many people (self included) have the experience of doing manual labor while standing next to an idle industrial machine that could move the dirt, because their hands and backs are cheaper than gasoline.
Ok, that one in particular would only be fairly annoying rather than very annoying, fair point. You would either need to have your training infra set up to allow you to apply a probe while sampling, which sounds annoying, or to rerun after sampling and apply a probe the second time. The latter is easier infra-wise, as you could run it in inference-only mode, but it still adds significant overhead since you need to run it again, even if it doesn't involve any generation, and plausibly with a somewhat different model configuration since you want to access act...
Gemini 2.0 Flash-Lite has a training cutoff of August 2024, and the 2.5 update — of January 2025. When checked in AI Studio, both models quite consistently output that they believe the current year is 2024, although 2.0 Flash-Lite occasionally stated 2023. I think 2.5 Flash-Lite is the most obvious candidate!
As a side note, it's reasonable to believe that both Flash-Lite models are related to Gemma models, but I'm not sure which ones in particular, and there don't appear to be good estimates of param counts.
There is a lot of tension between "this is how it would be nice for an optimal agent to be built" and "this is how actual brains work".
I can imagine that this kind of interpretability scheme works for, say, spatial tasks: it seems easy to track the contents of a 3D world model and reward successful accomplishment of tasks like "move object from point A to point B", and I would suspect that this system operates through the cerebellum. I don't think such a system exists for anything more complicated, like "caring about other entities' mental states".
Ramana Kumar!
For me a key benefit of maths is to answer the question "how much?", to turn qualitative intuitions into quantitative models.
For example if someone tells you "drug X binds to receptor Y which triggers therapeutic effect Z", the first question that comes to mind is "how much X do I need to take to get that much Z?".
If you don't answer that, the info is not actionable. That's where the math models (pharmacokinetics and pharmacodynamics) come in: they tell you how much, which allows you to turn info into action.
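(As a concrete illustration of the "how much" step, here's a toy one-compartment, IV-bolus pharmacokinetic calculation; every parameter value is invented for the example, not about any real drug X.)

```python
import math

volume_of_distribution_l = 40.0    # assumed V_d
half_life_h = 6.0                  # assumed elimination half-life
min_effective_conc_mg_per_l = 2.0  # assumed threshold for effect Z
hours_of_cover_needed = 12.0

k_elim = math.log(2) / half_life_h
# C(t) = (dose / V_d) * exp(-k * t)  =>  solve for the dose at t = cover time
dose_mg = min_effective_conc_mg_per_l * volume_of_distribution_l * math.exp(
    k_elim * hours_of_cover_needed
)
print(f"Dose needed: ~{dose_mg:.0f} mg")  # ~320 mg with these assumptions
```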
I am a volunteer with PauseAI Australia, so if anyone wants to connect with our very, very small group, that would be great. We are pushing politicians on superintelligence.
After using Claude Code for a while, I can't help but conclude that today's frontier LLMs mostly meet the bar for what I'd consider AGI - with the exception of two things, that, I think, explain most of their shortcomings:
Most frontier models are marketed as multimodal, but this is often limited to text + some way to encode images. And while LLM vision is OK for many practical purposes, it's far from perfect, and even if they had perfect sight, being limited to singular images is still a huge limitation[1...
Oh I never thought of the religion analogy. It feels like a very cruel thing for a religion to punish disbelief like that, and the truth is :/ I really dislike the appearance of my idea. I was really reluctant to use the word "thoughtcrime" but no other word describes it.
But... practically speaking, we're not punishing the AI for thoughtcrimes just because we hate freedom. But because we're in quite an unfortunate predicament where we really don't know about it and our future, and it's rational to shut down a mysterious power which is in the middle of cal...
I am not familiar with these debates, but I have a feeling that you're arguing against a strawman here.
I've started to watch the YouTube channel Clean That Up. It started with me pragmatically searching for how to clean something up in my apartment that needed cleaning, but then I went down a bit of a rabbit hole and watched a bunch of his videos. Now they appear in my feed and I watch them periodically. To my surprise, I actually quite enjoy them.
It's made me realize how much skill it takes to clean. Nothing in the ballpark of requiring a PhD, but I dunno, it's not trivial. Different situations call for different tools, techniques and cleaning materials. B...
Fortunately, it would be such a massive pain to change the highly optimised infrastructure stacks of frontier labs to use model internals in training that I think this is only likely to happen if there are major gains to be had and serious political will, whether for safety or otherwise. I would be very surprised if this happens in frontier model training in the near future, and I see this as a more speculative longer-term research bet.
I am confused about this. Can't you just do this in post-training in a pretty straightforward way? You do a forward pass, ...
Owning shares in most modern companies won't be useful in sufficiently distant future, and might prove insufficient to pay for survival. Even that could be eaten away by dilution, over astronomical time. The reachable universe is not a growing pie, ability to reinvest into relevant entities won't necessarily be open.
My uneducated take is, I like Hazard's observations and I think his essay is directionally true.
But I agree with your pushback regarding education. It's hard to believe these big stories like compulsory schooling being a deliberate tool that "the elite" designed to domesticate people, or that schooling makes people unprincipled. It's understandable that Hazard doesn't want to argue for every single claim, but he should have presented incriminating evidence to back up this extraordinary theory.
I don't know much about how well the gold standard worked or how...
I'm realizing now that I should have been more clear about what I meant by "feeling bad". I could see like a 1/10 or 2/10 level of "feeling bad" being worthwhile when you stumble across "basic thing gaps". But what I had in mind is more like a 5/10 or higher level of "feeling bad". Something more substantial.
That magnitude doesn't seem worthwhile. If you react that way every time you don't know a "basic thing", you'll probably end up feeling a ton of bad feelings. An amount that outweighs the benefits of added motivation.
Hasn't this been part of the religious experience of much of humanity, in the past and still in the present too? (possibly strongest in the Islamic world today). God knows all things, so "he" knows your thoughts, so you'd better bring them under control... The extent to which such beliefs have actually restrained humanity, is data that can help answer your question.
edit: Of course there's also the social version of this - that other people and/or the state will know what you did or what you planned to do. In our surveilled and AI-analyzed society, detection not just of crime, but of pre-crime, is increasingly possible.
The post-singularity regime is probably very safe
Is there some unstated premise here?
Are you assuming a model of the future according to which it remains permanently pluralistic (no all-powerful singletons) and life revolves around trade between property-owning intelligences?
Isn't inference memory-bound on KV cache? If that's the case then I think "smaller batch size" is probably sufficient to explain the faster inference, and the cost per token to Anthropic of 80 TPS or 200 TPS is not particularly large. But users are willing to pay much more for 200 TPS (Anthropic hypothesizes).
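(Here's the back-of-envelope version of that argument; all the hardware numbers below are placeholders I made up, not Anthropic's.)

```python
# Decode-step arithmetic in the memory-bound regime: each step reads the
# weights once (shared across the batch) plus each sequence's KV cache.
hbm_bandwidth_gb_s = 3000   # assumed accelerator memory bandwidth
weight_bytes_gb = 100       # assumed bytes of weights read per step
kv_bytes_per_seq_gb = 10    # assumed KV cache read per sequence

def per_user_tps_and_rel_cost(batch_size):
    step_time_s = (weight_bytes_gb + batch_size * kv_bytes_per_seq_gb) / hbm_bandwidth_gb_s
    per_user_tps = 1 / step_time_s
    cost_per_token = step_time_s / batch_size   # arbitrary units
    return per_user_tps, cost_per_token

for b in (8, 32):
    tps, cost = per_user_tps_and_rel_cost(b)
    print(f"batch {b:>2}: ~{tps:5.1f} tok/s per user, relative cost {cost:.4f}")
# When the KV term dominates, shrinking the batch buys per-user speed almost
# for free (total throughput barely changes); when weights dominate, the
# smaller batch gets noticeably more expensive per token.
```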
Native Chinese speaker here. All the translations are accurate except for 你最喜欢的小说是哪一部? "What is your favorite work of fiction?", where 小说 should be "novel" instead of "work of fiction".
Here's my vote for Epistemic Roguelikes. It seems like a riskier path, but with a lot more upside.
I don't think so – my CLAUDE.md is fairly short (23 lines of text) and consists mostly of code style comments. I also have one skill set up for using Julia via a REPL. But I don't think either of these would result in more disagreement/correction.
I've used Claude Code in mostly the same way since 4.0, usually either iteratively making detailed plans and then asking it to check off todos one at a time, or saying "here's a bug, here's how to reproduce it, figure out what's going on."
I also tend to write/speak with a lot of hedging, so that might make Claude more likely to assume my instructions are wrong.
Great article, thanks for writing about this