I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it". It only takes one strong enough one of the former to kill everyone, and the latter is gonna have a very hard time stopping all of them.
By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to be in reasonabler hands (for example they'd also be more likely to think "hmm maybe I shouldn't build unaligned powerful AI").
The same goes for cyborg tools, as well as general insights about AI: we should want them to be differentially accessible to alignment people than the general public.
In fact, my biggest criticism of OpenAI is not that they built GPTs, but that they productized it, made it widely available, and created a giant public frenzy about LLMs. I think we'd have more time to solve alignment if they kept it internally and the public wasn't thinking about AI nearly as much.
(edit: thank you for your comment! I genuinely appreciate it.)
"""I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it"."""
^^ Why wouldn't people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?
I agree that in general, cyborg tools increase hype about the black boxes and will accelerate timelines. But it still reduces discourse lag. And part of what's bad about accelerating timelines is that you don't have time to talk to people and build institutions --- and, reducing discourse lag would help with that.
"""By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to be in reasonabler hands (for example they'd also be more likely to think "hmm maybe I shouldn't build unaligned powerful AI")."""
^^ You make 3 assumptions that I disagree with:
1) Only reasonable people who think hard about AI safety will understand the power of cyborgs
2) You imply a cyborg tool is a "powerful unaligned AI", it's not, it's a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human
3) That people won't eventually find out. One obvious way is that a weak superintelligence will just build it for them. (I should've made this explicit, that I believe that capabilities overhang is temporary, that inevitably "the dam will burst", that then humanity will face a level of power they're unaware of and didn't get a chance to coordinate against. (And again, why assume it would be in the hands of the good guys?))
^^ Why wouldn't people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?
You imply a cyborg tool is a "powerful unaligned AI", it's not, it's a tool to improve bandwidth and throughput between any existing AI (which remains untouched by cyborg research) and the human
I was making a more general argument that applies mainly to powerful AI but also to all other things that might help one build powerful AI (such as: insights about AI, cyborg tools, etc). These things-that-help have the downside that someone could use them to build powerful but unaligned AI, which is ultimately the thing we want to delay / reduce-the-probability-of. Whether the downside is bad enough that making them public/popular is net bad is the thing that's uncertain, but I lean towards yes, it is net bad.
I believe that:
Only reasonable people who think hard about AI safety will understand the power of cyborgs
I don't think I'm particularly relying on that assumption?? I don't understand what sounded like I think this.
In any case, I'm not making strict "only X are Y" or "all X are Y" statements; I'm making quantitative "X are disproportionately more Y" statements.
That people won't eventually find out.
I believe that capabilities overhang is temporary, that inevitably "the dam will burst"
Well, yes. And at that point the world is much more doomed; the world has to be saved ahead of that. To increase the probability that we have time to save the world before people find out, we want to buy time. I agree it's inevitable, but it can be delayed. Making tools and insights broadly available hastens the bursting of the dam, which is bad; containing them delays the bursting of the dam, which is good.
Things I learned/changed my mind about thanks to your reply:
1) Good tools allow experimentation which yields insights that can (unpredictably) lead to big advancements in AI research.
o1 is an example, where basically an insight discovered by someone playing around (Chain Of Thought) made its way into a model's weights 4 (ish?) years later by informing its training.
2) Capabilities overhang getting resolved, being seen as a type of bad event that is preventable.
This is a crux in my opinion:
It is bad for cyborg tools to be broadly available because that'll help {people trying to build the kind of AI that'd kill everyone} more than they'll {help people trying to save the world}.
I need to look more into the specifics of AI research and of alignment work and what kind of help a powerful UI actually provides, and hopefully write a post some day.
(But my intuition is, the fact that cyborg tools help both capabilities and alignment, is bad, and whether I open source code or not shouldn't hinge on narrowing down this ratio, it should overwhelmingly favor alignment research)
Cheers.
I've written a post about my thoughts related to this, but I haven't gone specifically into whether UI tools help alignment or capabilities more. It kind of touches on "sharing vs keeping secret" in a general way, but not head-on such that I can just write a tldr here, and not along the threads we started here. Except maybe "broader discussion/sharing/enhanced cognition gives more coordination but risks world-ending discoveries being found before coordination saves us" -- not a direct quote.
But I found it too difficult to think about, and it (feeling like I have to reply here first) was blocking me from digging into other subjects and developing my ideas, so I just went on with it.
https://www.lesswrong.com/posts/GtZ5NM9nvnddnCGGr/ai-alignment-via-civilizational-cognitive-updates
(note: lots of discussion happened in the comments, you'll want to read it if you found this post interesting)
TLDR:
- Building powerful human-AI collaboration tools in the open removes capabilities overhang, which reduces discourse lag, which reduces x-risk.
- Alignment work is philosophy/writing/thinking-heavy, capabilities work is coding-heavy. Cyborg tools are more for the former than the latter, and great coding tools already exist.
Given this, "safety concerns" like "but what if someone uses your app and discovers jailbreaks or hacks somebody" are not actually a problem. (maybe even net positive, since they "update discourse" on unknown dangers, more on this later)
(tldr: cyborg tools help alignment more than capabilities, because it's for reading/writing, not coding)
Very empirical, you're writing code and running experiments, running actual Pytorch code running on lots and lots of gpus.
(One piece of evidence is that Sholto and Trenton in the Dwarkesh podcast describe AI research as very empirical)
More about reading and writing on Lesswrong and Substack, thinking about things conceptually, convincing people about dangers of AGI, etc.
Some looks more like general AI research, for example mechanistic interpretability.
(Concrete/pragmatic note: my tool helps me a lot with absorbing material, next up are https://www.narrowpath.co/ and https://www.thecompendium.ai/. In normal UIs you can just dump and ask for a summary, but obviously there are more sophisticated ways to both parse an AI's response, and for feeding it prompts. For example, ask it for a list of title/text pairs, formatted as JSON, use code that parses it and loads a widget containing titled text blocks, which is easier to browse, which my tool can currently do.
- prompt engineering (like pulling from multiple text blocks from within the app, templates, customizable/programmable "surfaces" for writing stuff, for dynamically loading sections of prompts via arbitrary code execution)
- all the benefits of normal note-taking/management systems + heavy customizability
Generally reading and writing, non-agentic stuff, many people have already written about this stuff. Basically my point is that there is little overlap with capabilities research.
You can essentially take it from:
type prompt -> read response
to:
type text in widgets, type code that takes widgets and converts to messages object for api call -> token stream is handled by customizable code -> code is called after response is done
To the extent that the (potential) abilities of a human-AI team are locked behind good UIs (which I think they are--by a huge amount, but won't get into that here), the worse that the existing/well-known UIs are, the more the discourse about a given level of model capability is behind the actual potential of that model.
(in short: model capability*( amplification( best possible UI ) - amplification( current UI ) ) = discourse lag )
This is bad for basically everything that involves talking to other people about the dangers of AI.
Which includes: affecting policy, getting funding, telling people their tech is dangerous and they should feel bad, creating institutions and cooperation among people that don't want to die --- note that these activities are a huge portion of many people's work in AI alignment.
(The only world in which a capability overhang is not like a dam waiting to burst is if I'm the only one who will ever design super good AI tooling and if I just keep quiet nobody will ever build such things, but I'm not that smart, nor that good of a programmer (though pretty good Tkinter GUI builder))
If cyborgism reduces x-risk then this tech should be developed as much and as soon as possible, which you do by sharing, open-sourcing and discussing.
If it doesn't, I don't have to go through the massive effort of open sourcing this, which involves testing, documenting, probably a lot of rewriting, which I really don't feel like doing anyway.
Also I can only ask people for donations (aka "funding") if I can actually show people what my idea is and what I've got so far.
I shared an earlier draft of this with Claude (Sonnet 3.5), it argued against an underlying assumption. I couldn't find a good counterargument.
I'll directly quote it since it wrote so well:
(...)
Me:
capability overhang means people think models are less powerful than they are, which means its harder to get people to take xrisk seriously, which means you can corral less researchers
Claude:
(...)
RESPONSE:
This implies a specific causal chain:
Better UI → Demonstrates true capabilities → Researchers update beliefs → More alignment workBut consider:
1. Researchers might update differently than expected (e.g. "oh cool, let's build more powerful stuff!")
2. Demonstration of capabilities could accelerate both alignment AND capabilities work
3. The bottleneck might not be belief - many researchers who fully believe in x-risk still work on capabilities(...)
RESPONSE:
"People don't want humanity to go extinct" seems obviously true yet doesn't predict behavior well. Consider:
- Most people believe in climate change but don't significantly alter behavior
- People smoke despite knowing it kills
- Engineers kept working on nuclear weapons post-Manhattan ProjectHumans seem to:
1. Discount large future risks
2. Compartmentalize ("someone else will solve it")
3. Follow local incentives (career, status, interesting problems) over global outcomes(...)
The end.