This is fantastic. It has me wondering what other cheap, highly effective tasks we could set modern AI to for AI safety.
Thoughts I had about this specifically:
Re: Sonnet for search and 4o for analysis, could Opus or GPT 5.2 have been cheaper or better? My impression is that Opus 4.5, despite higher token costs, uses them more efficiently.
Would it be cheaper to use Gemini/Perplexity (which, as I understand it, tend to be more efficient and powerful when searching than Claude)?
Would using Grok to pull Twitter data have been helpful?
Would a verification pass using different instances reduce hallucinations, find missed data points, and improve the code?
I’m not very experienced in doing this - these are just the first thoughts that came to mind. I’m half tempted to try to replicate your results!
"Secondarily, current models don’t operate for long enough (or on hard enough problems) for these convergent instrumental incentives to be very strong."
I'm worried that when it comes to Claude Code, this is not a base capabilities problem, but an elicitation one. It feels very plausible to me that with the correct harness, you could actually get an assemblage that is capable of arbitrarily long-horizon work.
Like - do humans actually have a long time horizon? The Basic Rest-Activity Cycle suggests we work in ~90-minute bursts at most. If true, then the base models are already there. All we would need is a way to mimic or substitute for the cognitive scaffolding that we use to pull off our own arbitrary time horizons.
I'm envisioning an "assembly line" of cognitive labor - if Claude Code has 30-90 minute time horizons on every task natively, then you figure out a way to have it orchestrate the chopping up of arbitrary tasks into bits that dedicated subagents can do. Then you add a suite of epistemic tools, permanent memory, version control, reflexes; self-improvement via self-reflection, objective tests, and external research; the ability to build out its own harness; and so on: all the individual ingredients needed to get a bootstrapping functional cognitive laborer. [Edit: I'm very concerned this is an infohazard!!!]
Do we know that this can't work right now? Has anyone really tried? I've been looking and I don't see anyone doing this to the maximal extent I'm envisioning, but it seems inevitable that someone will, either in the hacker community or at the frontier labs. I'm working on putting something like this together but, thinking about it, I'm really worried I shouldn't.
I sometimes worry that my ability to perceive social status isn't calibrated well. I wonder if you might be experiencing that? They may have been patting you on the back for your cool questions rather than your jokes, but you completely missed it.
Also, there might be some selection effects on who shows up to philosophy meetups, such that their net total epistemics are worse than a randomly selected sample of people from the general population. To spitball a low-confidence explanation - maybe they're high in open-mindedness, but haven't developed an epistemic toolkit suited for dealing with that? So they do worse than more average, closed-minded people in forming good beliefs? But honestly, I don't like thinking this way very much. It's not very charitable, and I wouldn't want to say that to the faces of people I'm judging this way.
I guess if it were me, I would worry that maybe I was Just Wrong and I failed to engage with the social reality correctly? Like there was a layer or signal that I completely missed? A while back I read an essay about how neurotypical people differ from ASD people in their relationship to the Truth, and it's stuck with me. It could be just that: they relate to Truth differently.
Hi, I'm a local who's interested in AI safety, though I don't think I'd have anything to contribute since I'm still a student. Would this be something where it would be cool if I applied or showed up? Or are you aiming for a more professional atmosphere?
Isn’t this a really small amount - like a single raisin has this much free glucose in it, right? Why not just microdose raisins? How does this create a signal against normal blood glucose variation? Your gut should be releasing a lot more glucose from just the oats. Do you have a metabolic disorder? I guess I’m trying to reason out why your body isn’t already supplying glucose to your brain from current digestion in the first place, and why glucose microdosing would get around this.
“What terms have you heard of (or can you invent right now in the comments?) that we have definitely NOT passed yet?”
Artificial Job-taking Intelligence (AJI): AI that takes YOUR job specifically
Money Printing Intelligence (MPI): AI that any random person can tell in plain language to go make money for them (more than its inference and tool use cost) and it will succeed in doing so with no further explication from them
Tool-Using Intelligence (TUI): AI that can make arbitrary interface protocols to let it control any digital or physical device responsive to an Internet connection (radio-controlled car, robot body, industrial machinery)
Automated Self-Embodier (ASE): An AI that can design a physical tool for itself to use in the world, have the parts ordered, assembled, and shipped to the location it wants the tool at, and is able to remote in and use it for its intended purpose at that location with few hiccups and little assistance from humans outside of plugging it in
“Everything feels both hopeless - my impact on risk almost certainly will round down to zero - and extremely urgent - if I don’t try now, then I won’t have a chance to.“
I have thought about individual impact a lot myself, and I don’t think this is the right way to see it. It sounds like you might not be hung up on this, but I want to attack it anyways since it’s been on my mind, and maybe you will find it useful.
So. Two alternatives:
Focus on your marginal impact, instead of your absolute impact. No one person’s marginal contributions, in expectation, are going to be able to swing p(doom) by 1%. A much more reasonable target to aim for on a personal level is .01% or .0001% or so.
Or: the paths to successful worlds are highly irregular. There might be several different lines that will get us there, many requiring a high number of steps in sequence, an unknown number of which are interchangeable. The problem is too difficult and unknowable to model with a single final probability, or is simply not even in that kind of a reference class. You just have to look for the most effective levers from your position and pull them.
One might counter that actually, we live in a world where you only need a few key ideas or visions, and a few extraordinary, keystone people to implement them. Maybe that’s true. But I think we should think about the difference between two very similar instances of that world, one where we win, one where we lose.
The first thought about that difference that comes to my mind (confidence 80%) is: the ecosystem of work on this was just slightly not robust enough, and those few keystones didn’t encounter the right precursor idea, or meet the right people, or have the right resources to implement them. Or they didn’t even have the motivation to do it in the first place, because they despaired at their own insignificance.
So given this, I think a key component of that ecosystem is morale. Morale is a correlated decision; if you don’t have it, the keystone people won’t have it either. And you won’t know in advance if you’re one of the keystones, either. Therefore, believe in yourself.
As for whether you’re even likely to be a keystone? Well, looking at your webpage, I’d say your odds are much better than those of a random person on Earth. So you should count yourself in that reference class. This probably extends to anyone who has read LessWrong, even if you’re not aiming for technical work. Some of the key actions might not even be technical, such as if an international pause is required.
And of course, if we live in the fuzzier many-paths world I described earlier, then it’s much harder to say that your actions don’t matter; so the only reason left to act as if they don’t is poor self-esteem. That should collapse once you take the time to properly integrate that there is no other reason, as long as you are also doing the other things humans need to function (socializing, taking care of your biology, etc.).
Or, I suppose, if you disagree with the whole AI safety project in general, or if you think the chances of anyone helping are truly infinitesimal and you’d rather just focus on living your best life in the shadow of the singularity. But I assume you’re not here for that. So within that frame - do your best; that’s all you have to do.
I am curious, are the kids actually doing this? Is everything you described here literally happening? If so, is there a search term for what they’re doing? And what scene is this happening in?
I encounter the same frustration when trying to talk about this. However, I think of it like this:
From the outside, arguing for AI risk looks highly analogous to evangelism.
You are concerned for the immortal soul (post singularity consciousness) of every human on earth, and that there is an imminent danger to them, posed by wildly powerful unfalsifiable unseen forces (yet-to-be-existent AGI), which cannot be addressed unless you get everyone you talk to to convert to your way of thinking (AI risk).
Or think about it like this: if an evangelist whose religion you disagree with takes every opportunity they get to try to convert you, how would you respond? If you wish to be kind to them, you sympathize with them while not taking their worldview seriously. And if they keep at it, your kindness might wear thin.
AI risk - or any potentially likely x-risk in general - is a very grabby memeplex. Taken seriously, if you think it’s at all a tractable problem, it warps your entire value system around it. Its resemblance to religion in this way is probably a major factor in giving many irreligious folk the ick when considering it. Most people who aren’t already bought into this are not down to have their entire value system upended like this, in a conversation where they likely already have their epistemic guards up.
In my view, you often have to take a softer approach. Try to retreat to arguing for the sorts of background assumptions that naturally lead to x-risk concern, rather than arguing for it directly. Share the emotions that drive you, rather than the logic. Try not to push it - people can tell when you’re doing it. Appeal to the beliefs of people your interlocutor genuinely respects. And try to be vulnerable: if you aren’t willing to change your beliefs, why should they be? In conversation, minds are usually changed only when both parties have their guard down.
Doubtless there are darker arts of persuasion you could turn to. They could even be important or necessary. But I personally don’t want to think of myself as a soldier fighting for a cause, willing to use words as weapons. So this is what I’ve come to aspire to instead. Opening the idea up to vulnerable critique from others also helps a lot with my own belief in it; if I didn’t, I would just be letting the first grabby memeplex that saw me take me over without fighting back. And, who knows - you might occasionally meet someone who shares enough assumptions to buy your worldview, and at that point you can make the easier, logical case.
Edit: upon reflection, a lot of this applies to activism generally, too, not just religion. The connection to religion seems more prominent to me, however, because of the potential totality over all affairs of life; the way x-risk concern, and transhumanism more broadly, addresses the same kind of cosmic meaning religion often does; the apocalyptic thinking; a central canon, set of institutions, and shared culture; organized gatherings explicitly framed as a secular surrogate for traditional holidays; and so on. Others have likely debated the particular distinctions elsewhere. The irreligionist in me wishes it weren’t so, but I feel I have to own the comparison if I want to see the world correctly. It’s been on my mind a lot.
A guy at my local ACX group has very strong opinions on this. He says that there are mainly two groups of people doing studies on the effect of CO2 on human health and cognition: the people who studied its effects in submarines and spaceships, and the people who study mundane indoor air quality. The submarine people are the ones finding no effects at 4000ppm, while the IAQ people are the ones finding effects at very low concentrations. He thinks the IAQ people have terrible study design. He also thinks (iirc) that body effluents (e.g. VOCs) are the real issue, and CO2 is just a proxy. I'll have to forward your post to him next time I see him (a couple weeks?).
As for practically lowering CO2 measurements in the winter, you can just build a heat/energy recovery ventilator to get fresh air while recovering some of the heat from indoors. My build is 8ft of 6" semirigid aluminum dryer duct, and 2x8ft of 4" semirigid. Join the two 4" lengths with a collar insert and tape, put the 4" inside the 6", and set up the assembly indoors so that the 6" terminates at your window and the 4" sticks a foot or two out of the window to prevent short-circuiting the air. Tilt the assembly so the 6" slopes down towards the window (.5" per foot?) so that condensate drains out, put an inline fan on the indoor end of the 4" pumping air inwards, and there you go. You now have a ~50% efficient counterflow HRV for $100: the air comes in through the 4" and exchanges heat with the air going out through the 6". I think it would get to 70% efficiency if I bothered to add another 6ft of 6" to increase the exchange surface area. The slower the fan goes, the more efficient it is.
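(If you want to sanity-check those efficiency numbers, below is a rough effectiveness-NTU sketch for a balanced counterflow exchanger. The overall heat transfer coefficient and the airflow rates are my assumptions rather than measurements, so treat the outputs as ballpark only.)

```python
# Rough effectiveness estimate for a duct-in-duct counterflow HRV (epsilon-NTU method).
# Assumed, not measured: overall heat transfer coefficient U ~ 8 W/m^2.K for air-to-air
# exchange across a thin aluminum wall, and equal supply/exhaust airflow.
import math

def hrv_effectiveness(length_m, inner_diam_m, airflow_m3_s, U=8.0, rho=1.2, cp=1005.0):
    """Effectiveness of a balanced counterflow exchanger (C_min == C_max)."""
    area = math.pi * inner_diam_m * length_m   # exchange surface = wall of the inner duct
    C = airflow_m3_s * rho * cp                # heat capacity rate of each air stream, W/K
    ntu = U * area / C
    return ntu / (1.0 + ntu)                   # counterflow effectiveness with Cr = 1

# ~8 ft (2.44 m) of 4" (0.102 m) duct inside the 6", at a gentle ~12 CFM (0.0057 m^3/s):
print(hrv_effectiveness(2.44, 0.102, 0.0057))   # ~0.48
# Same exchanger with the fan pushing ~25 CFM (0.0118 m^3/s) - effectiveness drops:
print(hrv_effectiveness(2.44, 0.102, 0.0118))   # ~0.31
```

With the extra 6ft of 6" (so roughly 14ft of exchange length), the same assumed numbers land in the 60-70% range at low flow, which roughly matches the 70% guess above.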
You can also just buy premade HRVs. Get a pair of ceramic Lunos knockoffs from Alibaba for ~$150-200 total, put them in your window, one pointing inward, the other pointing outward. Or you can get an ERV if you're worried about moisture loss too; that's more expensive though, like $500?
Mildly tempted to make a post about HRVs. I don't think there's a single post on LessWrong about them. But I should finish my post about how I fully treated my non-24 first.
If you don't wanna do that, you could do the usual thing of making a Corsi cube/fan (strap MERV 13 filters to box fans) to capture particulates. Then, for VOCs, get a huge charcoal bed canister (the kind they use for filtering exhaust from marijuana grow tents, like $50-100), wrap some filter medium around the outside, and put an inline fan on the other end to draw air through it (and preferably put a filter on the outlet side so you don't breathe charcoal dust). Those huge charcoal filters don't last as long as furnace filters though.