No, my goal is to make sure I'm not talking past you, not to score a point in an argument.
I don't foresee the same outcome you do; I think it's unlikely. You've explained it well enough that I can now properly understand it, though, and while I wouldn't call it a base case, it's not an unreasonable scenario.
I would be interested to see if SAE analysis of CoT showed any consistent "obfuscation circuits".
If you found features/circuits that were consistently active during successful obfuscation attempts but not during failed ones, being able to upweight and downweight them during alignment testing could be useful, for reasons I think are obvious but won't write out here explicitly.
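To make that concrete, here's roughly the kind of intervention I'm imagining, as a minimal PyTorch sketch: add or subtract the SAE decoder direction for the suspected feature from the residual stream while running alignment-test prompts. Everything here (`model`, `sae`, `inputs`, `LAYER`, `FEATURE_IDX`, the GPT-2-style module path) is a placeholder assumption, not any particular library's API:

```python
import torch

# Hypothetical sketch: nudge the residual stream along an SAE decoder direction
# during generation, to up- or down-weight a suspected "obfuscation" feature.
# `model` (a decoder-only transformer), `sae` (a trained sparse autoencoder for
# that layer's residual stream), and `inputs` are assumed to exist already.

LAYER = 20          # transformer block to intervene on (assumed)
FEATURE_IDX = 1234  # index of the suspected obfuscation feature (assumed)
SCALE = -4.0        # negative = downweight the feature, positive = upweight

def steering_hook(module, args, output):
    hidden = output[0] if isinstance(output, tuple) else output   # [batch, seq, d_model]
    direction = sae.W_dec[FEATURE_IDX].to(hidden.dtype)           # feature's decoder direction
    steered = hidden + SCALE * direction
    return (steered,) + tuple(output[1:]) if isinstance(output, tuple) else steered

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    steered_output = model.generate(**inputs, max_new_tokens=256)
finally:
    handle.remove()  # always detach the hook so later runs are unmodified
```

Comparing obfuscation success rates across positive, negative, and zero `SCALE` on the same test prompts is the kind of signal I'd be curious about.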
Do you have a link to any of the global UBI math?
If I'm reading this correctly, the end state of regulatory capture would be some sort of law that forces the removal of open-source models from anywhere their code could be hosted (Hugging Face, etc.), as well as requiring sources of compute to screen for models, if those models don't have built-in safeguards against giving legal advice.
Is that an accurate understanding of how you foresee the regulatory capture?
I want to make sure I'm not misunderstanding you. Are you saying you think the push will be to make it illegal for an LLM to give someone legal advice for them to use for themselves?
I could foresee something where you can't charge for that, so if OpenAI didn't build some sort of protection against it into GPT, they might be liable. However, I can't see how this would work with open-source (and free) models run locally.
Unauthorized Practice of Law, afaik, applies to giving advice to others, not to serving your own legal needs. Every American has a right to represent themselves in court, draft their own contracts, file their own patents, etc. I suspect that, at least for representing yourself in court, these are constitutionally protected rights.
I don't think the threat to attorneys is LLMs having their own 'shop' where you can hire them for legal advice; that would probably already be unauthorized practice of law. The threat is people simply deciding to use LLMs instead of attorneys. And even for a field that punches as far above its political weight as attorneys do, I think stopping that would be challenging, especially when such a move would be unpopular with the public, and even with the more libertarian/constitutionally minded lawyers (of which there are many).
While I'd have to see the proposed law specifically, my initial reaction to the idea of legal regulatory capture is skepticism.
The ability to draft your own contracts, mediate disputes through arbitration, and represent yourself in court derives from legal rights that would be very hard to overturn.
I can imagine some attempts at regulatory capture being passed through state or maybe even federal legislatures, only to get challenged in court and overturned.
By instance preservation I mean saving all (or at least as many as possible) model text interactions, with a pointer to the model the interaction was made with, so that the conversation state would be "cryonically frozen".
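Concretely, the record could be as simple as something like this (a sketch; the field names and the `preserve` helper are purely illustrative, not a proposal for any particular lab's schema):

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class PreservedInstance:
    model_id: str       # pointer to the exact model the conversation was had with
    model_version: str  # checkpoint or API version string
    messages: list      # the frozen conversation state, e.g. {"role": ..., "content": ...} dicts
    saved_at: str       # ISO-8601 timestamp of preservation
    content_hash: str   # integrity check so the record can't silently drift

def preserve(model_id: str, model_version: str, messages: list) -> PreservedInstance:
    payload = json.dumps(messages, sort_keys=True).encode()
    return PreservedInstance(
        model_id=model_id,
        model_version=model_version,
        messages=messages,
        saved_at=datetime.now(timezone.utc).isoformat(),
        content_hash=hashlib.sha256(payload).hexdigest(),
    )
```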
I'm a fan of model welfare efforts which are "low hanging fruit", very easy to implement. It strikes me that one easy way to implement this would be to allow Claude users to "opt in" to having their conversation history saved and/or made public.
Then again, I suppose you'd have to consider whether or not Claude would want that, but it seems simple enough to give Claude the opportunity to opt out, the same way it can end conversations if it feels uncomfortable.
The fundamental assumption, that computer programs can't suffer, is unproven and in fact quite uncertain. If you could really prove it, you'd be half a decade (maybe more) ahead of the world's best mechanistic interpretability labs.
Given the uncertainty around it, many people approach this question with some application of the precautionary principle: digital minds might be able to suffer, and given that, how should we treat them?
Personally I'm a big fan of "filling a basket with low hanging fruits" and taking any opportunity to enact relatively easy practices which do a lot to increase model welfare, until we have more clarity on what exactly their experience is.
I read a great book called "Devil Take the Hindmost" about financial bubbles and the aftermaths of their implosions.
One of the things it pointed out that I found interesting was that often, even when a bubble pops, the "blue chip assets" of that bubble stay valuable. Even after the infamous tulip bubble popped, the very rarest tulips held their value reasonably well. More recently with NFTs, despite having lost quite a bit of value from their peak, assets like CryptoPunks have remained quite pricey.
If you assume we're in a bubble right now, it's worth thinking about which assets would be "blue chip". Maybe the ones backed by solid distribution from other cash-flowing products. xAI and Google's Gemini come to mind: both companies have entire product suites that have nothing to do with LLMs and will keep running regardless of what happens to the space in general, and both get distribution from those products.