Saying ‘fuck them’ when people shift to taking actions that threaten society expresses something that should be expressed, in my view.
I see Oliver replied with that in response to two Epoch researchers leaving to found an AI start-up focused on improving capabilities. I interpret it as ‘this is bad; dismiss those people’. It’s not polite, though maybe for others who don’t usually swear it comes across as much stronger?
To me, if someone posts an intense-feeling, negatively worded text in response to what other people are doing, it usually signals that there is something they care about that they perceive to be threatened. I’ve found it productive to try to relate with that first, before responding. Jumping to enforcing general rules stipulated somewhere in the community, and then implying that the person not following those rules is not harmonious with or does not belong to the community, can be counterproductive.
(Note I’m not tracking much of what Oliver and Geoffrey have said here and on twitter. Just wanted to respond to this part.)
instead being "demonizing anyone associated with building AI, including much of the AI safety community itself".
I'm confused how you can simultaneously suggest that this talk is about finding allies and building a coalition together with the conservatives, while also explicitly naming "rationalists" in your list of groups that are trying to destroy religion
I get the concern about "rationalists" being mentioned. It is true that many (but not all) rationalists tend to downplay the value of traditional religion, and that a minority of rationalists unfortunately have worked on AI development (including at DeepMind, OpenAI and Anthropic).
However, I don't get the impression that this piece is demonising the AI Safety community. It is very much arguing for concepts like AI extinction risk that came out of the AI Safety community. This is setting a base for AI Safety researchers (like Nate Soares) to talk with conservatives.
The piece is mostly focused on demonising current attempts to develop 'ASI'. I think accelerating AI development is evil in the sense of 'discontinuing life'. A culture that commits to not doing 'evil' also seems more robust at preventing some bad thing from happening than a culture focused on trying to prevent an estimated risk while weighing it against estimated benefits. Though I can see how a call to prevent 'evil' can result in a movement causing other harms. This would need to be channeled with care.
Personally, I think it's also important to build bridges to multiple communities, to show where all of us actually care about restricting the same reckless activities (toward the development and release of models). A lot of that does not require bringing up abstract notions like 'ASI', which are hard to act on and easy to conflate. Rather, it requires relating with communities' perspectives on what company activities they are concerned about (e.g. mass surveillance and the construction of hyperscale data centers in rural towns), in a way that enables robust action to curb those activities. The 'building multiple bridges' aspect is missing from Geoffrey's talk, but then the talk seems focused on first making the case for why traditional conservatives should even care about this issue.
If we actually care to reduce the risk, let's focus the discussion on what this talk is advocating for, and on whether or not that helps people in these communities orient toward reducing it.
These are insightful points. I'm going to think about this.
In general, I think we can have more genuine public communication about where Anthropic and other companies have fallen short (of their commitments, of their legal requirements, and/or of how we as communities expect them not to do harm).
Good question. I don't know to be honest.
Having said that, Stop AI is already organising monthly open protests in front of OpenAI's office.
Above and beyond the argument over whether practical or theoretical alignment can work I think there should be some norm where both sides give the other some credit …
E.g. for myself I think theoretical approaches that are unrelated to the current AI paradigm are totally doomed, but I support theoretical approaches getting funding because who knows, maybe they're right and I'm wrong.
I understand this is a common area of debate.
Based on the reasoning I’ve gone through, neither approach works.
the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release.
These details are clarifying, thanks! Respect for how LTBT trustees are consistently kept in the loop with reports.
The class T shares held by the LTBT are entitled to appoint a majority of the board
...
Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity
...
the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven't seen any problems with that.
My main concern is that, based on the public information I've read, the board is not set up to fire people in case of a clear lapse of responsibility on "safety".
Trustees' main power is to appoint (and remove?) board members. So I suppose that's how they act as a backstop. They need to appoint board members who provide independent oversight and would fire Dario if that turns out to be necessary. Even if people in the company trust him now.
That's not to say that trustees appointing researchers from the safety community (who are probably in Dario's network anyway) robustly provides for that. For one, following Anthropic's RSP is not actually responsible in my view. And I suppose only safety folks who are already mostly in favour of the RSP framework would be appointed as board members.
But it seems better to have such oversight than not.
OpenAI's board had Helen Toner, someone who acted with integrity in terms of safeguarding OpenAI's mission when deciding to fire Sam Altman.
Anthropic's board now has the Amodei siblings and three tech leaders – one brought in after leading an investment round, and the other two brought in particularly for their experience in scaling tech companies. I don't really know these tech leaders. I only looked into Reed Hastings before, and in his case there is some coverage of his past dealings with others that makes me question his integrity.
~ ~ ~
Am I missing anything here? Recognising that you have a much more comprehensive/accurate view of how Anthropic's governance mechanisms are set up.
This is clarifying. Appreciating your openness here.
I can see how Anthropic could have started out with you and Dustin as ‘aligned’ investors, but around that time (the year before ChatGPT) there was already enough VC interest that they could probably have raised a few hundred million anyway
Thinking about your invitation here to explore ways to improve:
i'm open to improving my policy (which is - empirically - also correlated with the respective policies of dustin as well as FLI) of - roughly - "invest in AI and spend the proceeds on AI safety"
Two thoughts:
When you invest in an AI company, this could reasonably be taken as a sign that you endorse its existence. Doing so can also make it socially harder to speak out in public later (e.g. about Anthropic).
Has it been common for you to have specific concerns that a start-up could or would likely do more harm than good – but to decide to invest anyway because you expect VCs would cover the needed funds regardless (while neither granting investment returns to ‘safety’ work nor advising execs to act more prudently)?
In that case, could you put those concerns out in public before you make the investment? Having that open list seems helpful for stakeholders (e.g. talented engineers considering applying) to make up their own minds and know what to watch out for. It might also help hold the execs accountable.
The grant priorities for restrictive efforts seem too soft.
Pursuing these priorities imposes little to no actual pressure on AI corporations to refrain from reckless model development and releases. They’re too complicated and prone to actors finding loopholes, and most of them lack broad-based legitimacy and established enforcement mechanisms.
Sharing my honest impressions here, but recognising that there is a lot of thought put behind these proposals and I may well be misinterpreting them (do correct me):
I liked the liability laws proposal at the time. Unfortunately, it has since become harder to get such laws passed, given successful lobbying of US and Californian lawmakers who are open to keeping AI deregulated. Though maybe there are other state assemblies that are less tied up by tech money and tougher on tech that harms consumers (New York?).
The labelling requirements seem like low-hanging fruit. They’re useful for informing the public, but apply little pressure on AI corporations to not go further ‘off the rails’.
The veto committee proposal provides a false sense of security, with few teeth behind it. In practice, we’ve seen supposedly independent boards, trusts, committees and working groups repeatedly fail to carry out their mandates (at DM, OAI, Anthropic, the UK and US safety institutes, the EU AI Office, etc.) because nonaligned actors could influence them, restructure them, or simply ignore or overrule their decisions. The veto committee idea is unworkable, in my view, because we first need to deal with the lack of real accountability and the limited capacity of outside concerned coalitions to impose pressure on AI corporations.
Unless the committee format is meant as a basis for wider inquiry and stakeholder empowerment? A citizens’ assembly for carefully deliberating a crucial policy question (not just on e.g. upcoming training runs) would be useful because it encourages wider public discussion and builds legitimacy. Even if the citizens’ assembly’s mandate gets restricted into irrelevance or its decision gets ignored, a basis has still been laid for engaged stakeholders to coordinate around pushing that decision through.
The other proposals – data centre certification, speed limits, and particularly the global off-switch – appear to be circuitous, overly complicated and largely unestablished attempts at monitoring and enforcement of mostly unknown future risks. They look technically neat, but create little ingress capacity for different opinionated stakeholders to coordinate around restricting unsafe AI development. I actually suspect they’d be a hidden gift for AGI labs, which can go along with the complicated proceedings and undermine them once they’re no longer useful for corporate HQ’s strategy.
Direct and robust interventions could e.g. build off existing legal traditions and widely shared norms, and be supportive of concerned citizens and orgs that are already coalescing to govern clearly harmful AI development projects.
An example that comes to mind: You could fund coalition-building around blocking the local construction of and tax exemptions for hyperscale data centers by relatively reckless AI companies (e.g. Meta). Some seasoned organisers just started working there, and they are supported by local residents, environmentalist orgs, creative advocates, citizen education media, and the broader concerned public. See also Data Center Watch.
Thanks, you're right that I left that undefined. I edited the introduction. How does this read to you?
"From the get-go, these researchers acted in effect as moderate accelerationists. They picked courses of action that significantly sped up and/or locked in AI developments, while offering flawed rationales of improving safety."
Just a note here that I'm appreciating our conversation :) We clearly have very different views right now on what is strategically needed, but I'm digging your considered and considerate responses.
but also once LLMs do get scaled up, everything will happen much faster because Moore's law will be further along.
How do you account for the problem here that Nvidia's and downstream suppliers' investment in GPU hardware innovation and production capacity also went up as a result of the post-ChatGPT race (to the bottom) between tech companies to develop and release their LLM versions?
I frankly don't know how to model this somewhat soundly. It's damn complex.
Gebru thinks there is no existential risk from AI so I don't really think she counts here.
I was imagining something like this response yesterday ('Gebru does not care about extinction risks').
My sense is that the reckless abandonment of established safe engineering practices is part of what got us into this problem in the first place. That is, if the safety community had insisted that models be scoped and tested like other commercial software with critical systemic risks, we would be in a better place now.
It's a more robust place to come from than the stance that developments will happen anyway – and that we somehow have to catch up by inventing safety solutions that apply generally to models auto-encoded on our online data, with general (unknown) functionality, used by people across society to automate work.
If we managed to actually coordinate around not engineering stuff that Timnit Gebru and colleagues would count as 'unsafe to society' (according to, say, the risks laid out in the Stochastic Parrots paper), we would also robustly reduce the risk of going all the way to a mass extinction. I'm not saying that is easy at all, just that it is possible for people to coordinate on not continuing to develop risky, resource-intensive tech.
but the common thread is strong pessimism about the pragmatic alignment work frontier labs are best positioned to do.
This I agree with. So that's our crux.
This is not a very particular view – in terms of the possible lines of reasoning, and of the people with epistemically diverse worldviews, that end up arriving at this conclusion. I'd be happy to discuss the reasoning I'm working from, in the time that you have.
I agree you won't get such a guarantee
Good to know.
I was not clear enough with my one-sentence description. I actually mean two things:
The reason I think it's possible is that a corrigible and non-murderous AGI is a coherent target that we can aim at and that AIs already understand. That doesn't mean we're guaranteed success mind you but it seems pretty clearly possible to me.
I agree that this is a specific target to aim at.
I also agree that you could program an LLM system to be corrigible (for it to correct output patterns in response to human instruction). The main issue is that we cannot build an algorithm into fully autonomous AI that can maintain coherent operation towards that target.
Agreed on tracking that hypothesis. It makes sense that people are more open to considering what’s said by an insider they look up to or know. In a few discussions I saw, this seemed a likely explanation.
Also, insiders tend to say more stuff that is already agreed on and understandable by others in the community.
Here there seems to be another factor:
Whether the person is expressing negative views that appear to support vs. be dissonant with core premises. By ‘core premises’, I mean beliefs about the world that much thinking shared in the community is based on, or tacitly relies on to be true.
In my experience (yours might be different), when making an argument that reaches a conclusion contradicting a core premise in the community, I had to be painstakingly careful to be polite, route around understandable misinterpretations, and address common objections in advance, just to get to a conversation where the argument was explored somewhat openly.
It’s hard to have productive conversations that way. The person arguing against the ‘core premise’ bears by far the most cost in trying to write out responses in a way that might be insightful for others (instead of being dismissed too quickly). The time and strain this takes is mostly hidden from others.