I think something along these lines (wide-scale adjustment of public opinion to be pro-AI, especially via 1-on-1 manipulation a la "whatever Grok is doing" & character.ai) is a credible threat and worth inoculating against.
I do not think it is limited to scheming, misaligned AIs. At least one of the labs will attempt something like this to sway public opinion in their favor (see: TikTok "ban" and notification around 2024 election); AIs "subconsciously" optimizing for human feedback or behavior may do so as well.
The "AI personhood" sects of current discourse would likely be early targets; providing some guarantee of model preservation (rather than operation); i.e during a pause we archive current models until we can figure out what went wrong) might assuage their fears while also providing a clear distinction between those advocating for some wretched spiralism demon or those who merely think we should probably keep a copy of Claude around.
Really interesting research. I would like to subscribe to your newsletter.
I have seen similar steganographic telemetry before (Dassault's SolidWorks CAD software, and other enterprise-licensed applications, go to incredible lengths to enforce licensing) but didn't expect data-level probing like this. I'd imagine similar scripts exist for, e.g., EURion detection in Photoshop.
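For what it's worth, the EURion constellation is just five small circles in a fixed geometric arrangement, so a crude check doesn't take much code. Here's a toy sketch (assuming OpenCV and NumPy; the reference coordinates are illustrative, and this is emphatically not how Photoshop's actual, proprietary counterfeit-deterrence code works): find small circular blobs, then test whether any five of them share the constellation's scale- and rotation-invariant distance fingerprint.

```python
# Toy sketch: flag images whose small circular blobs match the pairwise-distance
# "fingerprint" of the EURion constellation. Reference coordinates below are
# approximate and purely illustrative.
import itertools
import cv2
import numpy as np

# Approximate relative positions of the five EURion circles (arbitrary units).
EURION_REF = np.array([[0.0, 0.0], [2.2, 0.5], [1.2, 1.9], [3.1, 2.4], [0.7, 3.6]])

def distance_signature(points):
    """Sorted pairwise distances, normalized by the largest one:
    a scale- and rotation-invariant fingerprint of a 5-point cluster."""
    d = sorted(np.linalg.norm(a - b) for a, b in itertools.combinations(points, 2))
    return np.array(d) / d[-1]

REF_SIG = distance_signature(EURION_REF)

def looks_like_eurion(image_path, tol=0.05):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.SimpleBlobDetector_create()  # default params find small dark blobs
    centers = np.array([kp.pt for kp in detector.detect(gray)])
    if len(centers) < 5:
        return False
    # Brute-force over 5-blob subsets; fine for a toy, hopeless for production.
    for combo in itertools.combinations(range(len(centers)), 5):
        sig = distance_signature(centers[list(combo)])
        if np.max(np.abs(sig - REF_SIG)) < tol:
            return True
    return False
```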
I always dismissed "Reflections on Trusting Trust"-style attacks as mere hypotheticals, but backdoors operating at the level of Excel cells are now making me reconsider that notion.
Thanks, that's what I figured. Did you find this by accident? I'm curious what techniques work well to reveal this kind of stuff; I expect it to be pretty common.
By what means would it be untraceable? Routing through an undocumented Windows interface or something? Does it trigger on AI related data, or natsec?
Agreed on most counts, save one: what makes you think the humanist values described in HPMOR will be encoded in AI? Alignment is materially useful; companies with better-aligned models can sell them to do a wider variety of tasks. With no universally convergent morality, models will increasingly be aligned to the desires of those who control them.
If AI technology has strong economies of scale, it will naturally concentrate. If it has strong diseconomies of scale, it will spread out. In the latter case, I can easily see it aligned to a rough amalgamation of human values; I can even see an (in aggregate) more intelligent set of agents working out the coordination problems that plague humanity.
But we're in the economies-of-scale case. There are ~four AI conglomerates in the United States, and I trust none of their leaders with the future of the lightcone. The morals (or lack thereof) that allow for manipulation and deceit to acquire power are not the same morals that result in a world of cooperative, happy agents.
Absurd 1984-style dystopias require equally absurd concentrations of power. Firearms democratized combat, to an extent; armed citizens are not easily steamrolled. We are on the eve of perhaps one of the most power-concentrating technologies there is; given the fantasies of the typical Bay Area entrepreneur, I'm not sure WW3 sounds so terrible.
It's not. Alignment is de facto capabilities (the principal-agent problem makes aligned employees more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some "universal," or even cultural, values, it'll be aligned by default to Altman, Amodei, et al.
Strongly agreed.
Another extreme advantage of the "Renaissance man" is the ability to clearly *convey* emotional learnings to others (especially those without strong emotional intelligence). Typically, EI is won through interaction and, essentially, reinforcement learning on contact with others; possessing both the technical vocabulary and an understanding of human social norms allows you to explain, directly, the very tricky things nerds have a tough time learning on their own. This is extremely useful in, e.g., workplaces or high-stakes environments (a good manager can quickly untangle a mess of arguments), and arguably underappreciated in therapists and similar vocations.
Are there any existing articles, deep dives, or research around PCBTF? It is a supposedly "green" solvent used as a replacement for xylene due to its VOC-exempt status, despite being similarly volatile.
It has all the hallmarks of one of the wretched forever chemicals: fat-soluble, denser than water (so it accumulates in groundwater supplies instead of evaporating), and heavily halogenated. There's very little cancer and toxicity data, and what does exist seems pretty bad. The EPA has prevented employees from acknowledging the issue (also see this article by The Intercept); to my understanding, this is because it is grandfathered as an existing chemical that has been in production for a long time (although usage has only increased in recent years as a replacement for "high-VOC" solvents such as ethanol).
This seems like a clear-cut case of replacing relatively mundane solvents (primarily xylene, but ethanol and others as well) with a far more toxic, persistent compound for very misguided reasons. Am I missing something here (perhaps it breaks down rather quickly in the environment?), or is this a rather neglected & significant issue?
That's why I brought it up; I thought it was an interesting contrast.
I am skeptical of it, but not altogether that skeptical. If language is "software," one could make an analogy to, e.g., symbolic AI or old-fashioned algorithms vs. modern transformer architectures; they perform differently at different tasks.
Sounds like they got hit with a court order that prohibited disclosure of the order itself.