Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Subscribe here to receive future versions.

---

How AI enables disinformation

Yesterday, a fake photo generated by an AI tool showed an explosion at the Pentagon. The photo was falsely attributed to Bloomberg News and circulated quickly online. Within minutes, the stock market declined sharply, only to recover after it was discovered that the picture was a hoax.

This story is part of a broader trend. AIs can now generate text, audio, and images that are unnervingly similar to authentic, human-created content. How will this affect our world, and what kinds of solutions are available?

The fake image generated by an AI showed an explosion at the Pentagon.

AIs can generate personalized scams. When John Podesta was the chair of Hillary Clinton’s 2016 presidential campaign, he fell for a scam that may soon become far more common. He received what looked like an email from Google asking him to confirm his password. But after Podesta provided his login credentials, he learned that the email came from Russian hackers, who then leaked thousands of his emails and private documents to the public, hindering Clinton’s campaign. 

This is an example of a spear phishing attack, and language models could make it easier to tailor these attacks towards specific individuals. A new paper used ChatGPT and GPT-4 to write personalized emails to more than 600 British Members of Parliament, referencing the hometowns and proposed policies of each MP. These emails could be sent with phony links to malware or password collectors, allowing hackers to steal private information as in the Podesta case. 

There is no silver bullet solution, but the paper proposes two countermeasures. First, AI can be used to identify and filter out spam. The author notes that this becomes less effective as language models grow better at impersonating normal human text, and as humans use language models in their own writing. Second, the author argues that companies should monitor for misuse of their AI tools and cut off access to anyone using AIs for illegal activities. Yet if companies open source their models to the public, this solution becomes impossible. 

Drowning out democratic voices. Democratic processes often encourage voters to provide input to lawmakers. But if AIs can impersonate voters, how will democracy function? 

We are already facing this problem. Three companies stole the identities of millions of voters in order to make false comments in support of a federal repeal of net neutrality rules. New York State sued the companies and, last week, they agreed to pay $615K in legal penalties. 

Others have argued that language models could be used to lobby lawmakers at scale in the names of real voters. The White House is seeking guidance on how to handle this onslaught of AI-powered disinformation, but few solutions have been identified so far. 

A fake video generated by AI showed Ukrainian President Volodymyr Zelensky supposedly surrendering to Russian forces in March of last year. 

AIs can personalize persuasion. Facebook was fined $5 billion by the FTC after the personal data of millions of users was harvested by Cambridge Analytica, a political consultancy that created personalized political ads based on users’ demographics. This personalization of political messaging can be corrosive to democracy. It allows politicians to give a different message to each voter depending on what they want to hear, rather than choosing their platform and representing it honestly to all voters.

Personalized persuasion will become easier as people begin having extended interactions with AIs. Just as individualized social media recommendations shape our beliefs today, a chatbot that supports certain political beliefs or worldviews could have a corrosive impact on people’s abilities to think for themselves. Current language models have been shown to reflect the beliefs of a small subset of users, and a variety of groups have indicated interest in building chatbots that reflect their own ideologies. Over time, persuasive AIs could undermine collective decision-making and prevent consensus on divisive topics. 

Dishonesty originating from AI systems. The two previous examples concern humans deliberately using AIs for harmful purposes. But it has also been shown that AIs will often generate false information despite the best intentions of their creators and users.

The simplest reason is that sometimes AIs don’t know any better. Companies have strong incentives to fix these hallucinations, which limits their potential for long-term damage. 

But in some cases, there are incentives for AIs to continue generating false information, even as they learn the difference between true and false. One reason is that language models are trained to mimic text from the internet, and therefore often repeat common misconceptions such as “stepping on a crack will break your mother’s back.” Another reason is that humans might penalize a model for telling unpleasant truths. For example, if an AI honestly reports that it has broken a rule, developers might train the model to behave differently next time. Instead of learning not to break rules, an AI might learn to hide its transgressions. 

Last week, we covered research that could prevent or identify these cases of AI dishonesty. But few of these techniques have been implemented to oversee cutting edge models, and more research is needed to ensure models have honest motivations as they gain capabilities.

Governance recommendations on AI safety

How can AI labs improve their safety practices? A recent survey polled 92 experts from AI labs, academia, and civil society, asking how much they support a variety of safety proposals. Responses were overwhelmingly positive, with 49 out of 50 proposals being supported by the majority of respondents, and more than 20 proposals supported by 90% of respondents.

Here’s a breakdown of some key proposals. 

Evaluate risks from AI models. Rather than adopt the “move fast, break things” attitude of much of the tech world, most AI experts agreed that companies should assess risks from AI models before, during, and after training them. Evaluations should consider both the scale of harm that could be caused by AIs and the likelihood of such a harm. Internal efforts can be complemented by external investigators, as OpenAI did before releasing GPT-4.

Be careful who you give access to. Monitoring how people use AI models can put a stop to crimes and misuse before they cause widespread damage. Some companies deploy AI systems incrementally, giving access to a small group of users first in order to test safety and security features. Only when safety measures have proven effective should the model be made available to a wider group of users. 

Other companies have made their AI available to anonymous internet users, leaving no opportunity to limit damage from AI misuse. Respondents to this poll strongly agreed with recommendations that AI labs should make models available to other researchers first, then to a wider group of users, and supported Know Your Customer (KYC) screening under which AI companies would need to verify user identities before providing access to large models. 

Cybersecurity and information security. Hackers have accessed the private financial information of millions of consumers by hacking large corporations including Target, Home Depot, and Equifax. Because of the economic and political value of advanced AI models, respondents to this survey agreed that AI labs should implement strong protections against cyberattacks and espionage. They said several layers of defenses should be used, including both initial defenses against attacks and protocols for responding to ongoing security threats. 

Technical research on AI safety. Risks from AI can be mitigated by a variety of technical solutions. But according to a recent estimate, only 2% of AI research is safety relevant. Respondents agreed that AI labs should conduct research on AI alignment and maintain a balance between capabilities progress and safety progress.

Senate hearings on AI regulation

The Senate held a hearing on AI regulation on Tuesday, featuring testimony from OpenAI CEO Sam Altman, NYU Professor Gary Marcus, and IBM executive Christina Montgomery.

A federal agency for AI. When a pharmaceutical company develops a new drug, it must prove the drug’s safety and effectiveness to the FDA before it can be sold. While there is a growing consensus that AI poses risks, there is no federal agency that reviews AI models before they are released to the public. Allowing companies to decide their own AI safety standards has already led to real world harms, with AI helping hackers create malware, giving scammers the ability to mimic people’s voices, and threatening the safety of chatbot users.

During the Senate testimony, Altman and Marcus agreed that a federal agency to regulate AI is necessary to ensure safety and minimize risks from the technology. Montgomery disagreed, arguing that “existing regulatory authorities… have the ability to regulate in their respective domains.” This domain-specific approach to governing general purpose AI systems has been rejected by AI experts as well as European Union lawmakers, who recently proposed including general purpose AI systems in the jurisdiction of the EU AI Act. Similarly, a federal agency for AI could help address the rapidly changing challenges of general purpose AI systems.

Evaluating and licensing models above a capability threshold. Only a small handful of companies build the most powerful AI models, as they typically cost millions of dollars to train. Altman argued that those companies should be held to a higher standard because their models have new capabilities that could pose unexpected threats to society. Under this framework, startups and open source projects would have more freedom to innovate, while more powerful models would need to be proven safe at a higher level of scrutiny.

International cooperation is necessary and viable. Leading AI models are trained on hardware that is produced by only a handful of firms in the world. The United States has seen significant success in controlling the computer chip supply chain. For example, last September the US banned sales of cutting edge hardware to China, and key firms in Taiwan and the Netherlands have supported the American ban. By leveraging the international supply chain of computer hardware, the US can encourage international cooperation on AI governance. 

Yesterday, OpenAI recommended that companies should begin implementing safety protocols that could later be made legally binding by an international regulatory agency. They argue that such an agency should “focus on reducing existential risk and not issues that should be left to individual countries, such as defining what an AI should be allowed to say.”

  • Former Prime Minister of Israel Naftali Bennett says that “just like nuclear tech, [AI] is an amazing invention for humanity but can also risk the destruction of humanity.”
  • The former CEO of Google argues that AI companies should regulate themselves, saying “There’s no way a non-industry person can understand what’s possible.” This objection seems less sharp after Senate hearings requesting the input of AI leaders on potential regulations.
  • Leaders of G7 nations call for global standards to ensure that AI is “accurate, reliable, safe and non-discriminatory.”
  • A New Yorker article considers the risks of accelerating AI capabilities.
  • A leak from Google’s AI team reveals their new language model was trained on more than five times as much data as its predecessors.
  • Scale AI introduces an AI platform to help the US government “rapidly integrate AI into warfighting.” The press release argues that “the only solution to the AI War is to accelerate our own military implementations and rapidly integrate AI into warfighting.”

See also: CAIS website, CAIS twitter, A technical safety research newsletter

Subscribe here to receive future editions of the AI Safety Newsletter.