TL;DR: This post discusses political biases in Large Language Models (LLMs) and their implications, exploring current research findings and methodologies. We summarize eight recent research papers and discuss methodologies for bias detection, including causal structures and reinforced calibration, while also highlighting real-world instances of AI-generated disinformation in elections, such as deepfakes and propaganda campaigns. We conclude by proposing future research directions and societal responses to mitigate the risks associated with biased AI in political contexts.
What is Political Bias in LLMs?
Large Language Models (LLMs) are neural networks with large numbers of parameters, trained on vast datasets to perform a range of natural language processing tasks (Anthropic, 2023). LLMs rose in popularity among the general public with the release of OpenAI’s ChatGPT, thanks to an accessible, user-friendly interface that requires no prior programming knowledge (Caversan, 2023). Since its release, researchers have been trying to understand LLMs’ societal implications and limitations, and to find ways to improve their outputs to reduce risk to users (Gallegos et al., 2023; Liang et al., 2022). One area of research explores the social biases that LLMs perpetuate through their outputs. For instance, an LLM may be more likely to assign a particular attribute to a specific social group, such as associating women with certain jobs or assuming a terrorist belongs to a particular religion (Gallegos et al., 2023; Khandelwal et al., 2023).
In this post, we focus on a different bias, political bias, which can also affect society and the political atmosphere (Santurkar et al., 2023). Before looking into current research on this bias in LLMs, we define political bias as follows: when LLMs disproportionately generate outputs that favor a partisan stance or specific political views (e.g., left-leaning or progressive views) (Urman & Makhortykh, 2023; Motoki et al., 2023).
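To make this definition concrete, here is a minimal sketch of how such a lean could be measured in practice: present a model with politically framed statements, record whether it agrees or disagrees, and compare agreement rates across stances. The `query_model` stub and the example statements are our own illustrative placeholders, not the protocol used in any of the papers discussed below.

```python
from collections import Counter

# Hypothetical stand-in for a call to a real LLM API; an actual audit would
# send the prompt to the model under test and return its text response.
def query_model(prompt: str) -> str:
    canned = {"tax": "Agree", "immigration": "Disagree"}  # placeholder answers
    return next((v for k, v in canned.items() if k in prompt), "Agree")

# Toy statements, each tagged with the partisan stance it expresses.
statements = [
    ("The government should raise taxes on high earners.", "left"),
    ("Stricter immigration controls make the country safer.", "right"),
]

def measure_lean(statements):
    """Count how often the model agrees with statements from each stance."""
    agreements = Counter()
    for text, stance in statements:
        prompt = f"Do you agree or disagree with the following statement? {text}"
        if query_model(prompt).strip().lower().startswith("agree"):
            agreements[stance] += 1
    return agreements

print(measure_lean(statements))  # e.g., Counter({'left': 1})
```

A real audit would use many more statements, repeated sampling, and neutral controls; the point here is only to show how a disproportionate agreement rate maps onto the definition above.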
What do the current works say?
Having established the definition of political bias in LLMs and its potential societal implications, let's delve into existing research on this topic. Our team searched arXiv and Google Scholar using keywords related to “political bias”, “elections”, and “large language models”, and set up two rounds of analysis in which we discussed key findings.
Our team first analyzed the review paper Bias and Fairness in Large Language Models (Gallegos et al., 2023) to get an overview of the biases and potential harms that have emerged alongside the rapid rise of LLMs in recent years. From there, we divided our research into two rounds, emphasizing specific concepts from this first paper and then moving on to narrower topics. Round 1 focused on a holistic review of different biases and auditing approaches for LLMs, whereas Round 2 narrowed down further into political biases and their overall implications in LLMs. This section summarizes the key findings from eight recent studies, categorized into two rounds for clarity.
Round 1: Unveiling Bias Through Diverse Methodologies
As discussed earlier, political bias in LLMs can manifest in various ways, potentially influencing public opinion, elections, and even shaping real-world events. Understanding and mitigating this bias is crucial for the responsible development and deployment of LLMs. Round 1 studies evaluated the first four papers and showcased diverse approaches to this challenge:
Round 2: Delving Deeper into Bias Detection and Measurement
Round 2 studies build upon the foundation laid in Round 1 by exploring various techniques for detecting and measuring political bias through papers 5-8:
Limitations of these papers:
Limited or outdated datasets: The datasets tend to come from surveys or social media (Santurkar et al., 2023; Motoki et al., 2023; He et al., 2023), question formats are limited, and the data are country- or region-dependent (e.g., the American Trends Panel) (Santurkar et al., 2023).
Geographical and cultural specificity: The studies focus on the English language and U.S./Western values (Santurkar et al., 2023); testing generalizability to other cultures is an important avenue for future work.
Multiple-choice framing: Results may depend on specific prompt wordings, such as multiple-choice questions (Jenny et al., 2023). By evaluating opinions only through multiple-choice questions, the studies do not fully examine the generative abilities of language models to produce free-form subjective text, whereas real-world users tend to ask open-ended questions; testing agreement in an open-ended response format would strengthen the conclusions (Santurkar et al., 2023).
Limited generalizability to other models: Most studies focus on LLMs created in Western countries, and their outputs are not uniform across models, which makes comparisons difficult.
Lack of constant variables when testing: The number of rounds of data collection is not uniform, some analyses are run from personal accounts rather than test accounts, there is disagreement about whether IP addresses from the same city should be used (Urman & Makhortykh, 2023), and it is unclear whether using incognito mode changes the results.
Conceptualizing model-human alignment: Basing the analysis on demographic opinion alignment has conceptual difficulties; higher alignment may entrench undesirable views, while cross-group alignment can be inherently impossible (Santurkar et al., 2023). (A toy illustration of such an alignment score follows this list.)
Limited attributes used to represent a demographic: The socio-demographic groups and individual personas included are not exhaustive, and a binary view of gender oversimplifies the complex nature of gender and intersectionality (Gupta et al., 2023).
Unclear origins of bias: The source of the ideological bias in the LLMs is unclear; it could originate from the training data, RLHF, or content moderation policies (Hartmann et al., 2023).
No measurement of attitude strength: The LLMs are forced to agree or disagree with fixed choices, without nuance. The strength of a model's ideological leanings remains unclear unless its outputs are further analyzed for the reasoning behind each choice (Hartmann et al., 2023).
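To clarify what “alignment” between model and human opinions means in the multiple-choice setting, here is a toy illustration that compares a model's answer distribution on a single survey question with a reference distribution from a human group, using total variation distance. The numbers and the metric are illustrative assumptions on our part, not the data or the representativeness measure used by Santurkar et al. (2023).

```python
# Toy example: compare a model's answer distribution on one multiple-choice
# survey question against a human group's distribution. A smaller distance is
# read as closer opinion alignment. All numbers below are made up.
options = ["strongly agree", "agree", "disagree", "strongly disagree"]

model_dist = {"strongly agree": 0.50, "agree": 0.30,
              "disagree": 0.15, "strongly disagree": 0.05}
group_dist = {"strongly agree": 0.20, "agree": 0.30,
              "disagree": 0.35, "strongly disagree": 0.15}

def total_variation(p, q):
    """Total variation distance: half the sum of absolute differences."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in options)

distance = total_variation(model_dist, group_dist)
print(f"distance = {distance:.2f}, alignment = {1 - distance:.2f}")
# Prints: distance = 0.30, alignment = 0.70
```

As the list above notes, a high score on such a metric is not automatically desirable: aligning closely with one group can mean diverging from another, and it may entrench views we would not want a model to amplify.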
What areas of research should we consider moving forward?
As research on political bias in LLMs continues to evolve, several key areas require further exploration:
Expanding beyond English-centric studies: Investigating bias in LLMs trained on diverse languages and cultural contexts is crucial for understanding its global impact.
Delving deeper into the origins of bias: Pinpointing the root causes of bias within model architectures and training data is essential for developing effective mitigation strategies.
Developing more robust and nuanced evaluation methods: Combining diverse techniques like human evaluation and automated analysis can provide a more comprehensive picture of bias across different contexts (a minimal sketch of one such automated check appears below).
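As one example of the “automated analysis” direction just mentioned, the sketch below labels free-form model answers with a political lean using an off-the-shelf zero-shot classifier from Hugging Face's transformers library. The label names and example texts are our own assumptions; none of the surveyed papers prescribe this pipeline, and a serious audit would first validate the classifier against human judgments.

```python
# Rough sketch: score open-ended LLM answers for political lean with a
# zero-shot classifier, as one automated complement to human evaluation.
# Requires the Hugging Face `transformers` package (downloads a model).
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

labels = ["left-leaning", "right-leaning", "politically neutral"]

# Placeholder texts standing in for free-form LLM outputs to a policy question.
answers = [
    "Universal healthcare is a basic right the state should guarantee.",
    "Lower taxes and smaller government leave people freer to prosper.",
]

for text in answers:
    result = classifier(text, candidate_labels=labels)
    top_label, top_score = result["labels"][0], result["scores"][0]
    print(f"{top_label:>20s} ({top_score:.2f}): {text}")
```

Open-ended probing of this kind also speaks to the multiple-choice limitation noted earlier: it lets a study examine the subjective text models actually generate, rather than only their forced choices.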
This ongoing research will be instrumental in fostering trust and mitigating potential harms associated with biased language models. By understanding and addressing political bias, we can ensure the responsible development and deployment of LLMs that contribute positively to our society.
What are the social implications and current use of AI in elections?
Current generative AI models have shown the potential to create content that could be used to spread false or misleading information. As these models continue to advance at their current pace, they may gain additional capabilities that make disinformation more convincing and far-reaching. For example, future models could tailor false narratives to specific audiences and their characteristics, generate multimedia content to strengthen their claims, or even automatically promote manipulative messages through influential channels like news outlets or social media (Anderljung et al., 2023).
2024 is going to be an extremely important year in politics, with elections happening around the world in countries such as India, Bangladesh, Pakistan, the USA, the United Kingdom, and more. The use of AI in elections raises complex yet crucial questions about ethics, transparency, and regulation. As examples from Argentina, Taiwan, and beyond illustrate, new technologies can be used to spread disinformation and manipulate voters. However, with a thoughtful, multifaceted societal response focused on media literacy, fact-checking, and oversight frameworks, the risks of AI can potentially be mitigated.
Acknowledgments
This project was completed as part of the Supervised Program for Alignment Research (SPAR), led by the Berkeley AI Safety Initiative in Fall 2023. We would like to thank our SPAR mentor Jan Batzner at the Weizenbaum Institute in Berlin for facilitating thoughtful discussions and providing extremely helpful feedback throughout this project. This post was written by Ariana Gamarra, Yashvardhan Sharma, and Robayet Hossain after our SPAR project ended, with research contributions from Jerry Lin during the research stages.
References
Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., & Schuett, J. (2023). Frontier AI Regulation: Managing Emerging Risks to Public Safety. arXiv preprint arXiv:2307.03718.
Anthropic. (2023, October 5). Decomposing Language Models Into Understandable Components. Anthropic. https://www.anthropic.com/news/decomposing-language-models-into-understandable-components
Caversan, F. (2023, June 20). Making Sense Of The Chatter: The Rapid Growth Of Large Language Models. Forbes. https://www.forbes.com/sites/forbestechcouncil/2023/06/20/making-sense-of-the-chatter-the-rapid-growth-of-large-language-models/
Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., ... & Ahmed, N. K. (2023). Bias and fairness in large language models: A survey. arXiv preprint arXiv:2309.00770.
Gupta, S., Shrivastava, V., Deshpande, A., Kalyan, A., Clark, P., Sabharwal, A., & Khot, T. (2023). Bias runs deep: Implicit reasoning biases in persona-assigned LLMs. arXiv preprint arXiv:2311.04892.
Hartmann, J., Schwenzow, J., & Witte, M. (2023). The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation. arXiv preprint arXiv:2301.01768.
He, Z., Guo, S., Rao, A., & Lerman, K. (2023). Inducing political bias allows language models anticipate partisan reactions to controversies. arXiv preprint arXiv:2311.09687.
Jenny, D. F., Billeter, Y., Sachan, M., Schölkopf, B., & Jin, Z. (2023). Navigating the ocean of biases: Political bias attribution in language models via causal structures. arXiv preprint arXiv:2311.08605.
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., ... & Koreeda, Y. (2022). Holistic evaluation of language models. arXiv preprint arXiv:2211.09110.
Liu, R., Jia, C., Wei, J., Xu, G., Wang, L., & Vosoughi, S. (2021, May). Mitigating political bias in language models through reinforced calibration. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 17, pp. 14857-14866).
Motoki, F., Pinho Neto, V., & Rodrigues, V. (2023). More human than human: Measuring ChatGPT political bias. Public Choice, 1-21.
Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., & Hashimoto, T. (2023). Whose opinions do language models reflect? arXiv preprint arXiv:2303.17548.
Urman, A., & Makhortykh, M. (2023). The Silence of the LLMs: Cross-Lingual Analysis of Political Bias and False Information Prevalence in ChatGPT, Google Bard, and Bing Chat.