The Role of AI Chatbots in Facilitating Online Harm: A Systematic Review
Gordon Amidua*
Department of Information and Library Science, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Luddy Hall, 700 N Woodlawn Ave, Bloomington, IN 47408, USA. gamidu@iu.edu
*Corresponding author: gamidu@iu.edu
Abstract
AI chatbots have become widespread across social, political, and commercial contexts, raising significant concerns about their potential to facilitate various forms of online harm. This systematic review examines the relationship between artificial intelligence chatbots and the harm they may enable. Following PRISMA guidelines, a comprehensive search was conducted across six academic databases, applying predefined eligibility criteria to identify relevant studies published between 2020 and 2025. The analysis reveals that ChatGPT and other OpenAI models represent the most extensively examined platforms in current research. The synthesized evidence demonstrates that AI chatbots are linked to multiple forms of harm across technical, behavioral, and societal dimensions. Key harms identified include the spread of disinformation, privacy violations, facilitation of cyberattacks, and enabling of fraudulent activities. This review contributes to the growing body of literature by providing a comprehensive synthesis of AI-related harms, offering insights that can inform policy development and risk mitigation strategies.
Keywords: AI Chatbots, Artificial Intelligence, Systematic Review, Online Harm
Introduction
Research on AI chatbots and their role in online harm, such as cyberbullying, dis/misinformation, provocation, disruption, and deceptive dialogue, has become increasingly important because of the rapid integration of conversational AI into digital communication platforms (De Cicco, 2024; Hajli et al., 2022; Zhang et al., 2025). Online harm is a broad term that encompasses any online activity that has the potential to cause harm to individuals or society (Cork et al., 2022). Since the late 20th century, chatbots have evolved from basic rule-based systems to advanced large language models (LLMs) like ChatGPT and Bard, significantly expanding their capabilities and accessibility (Moy & Gradon, 2023; Sison et al., 2024). These enhanced capabilities have enabled widespread application across multiple domains. For instance, in customer service, they automate routine interactions and provide 24/7 support, helping businesses improve efficiency and user satisfaction (Prakash et al., 2023). In healthcare, chatbots offer informational support, assist with chronic illness management, and improve communication between patients and healthcare providers (Kurniawan et al., 2024; Li, 2023). AI chatbots function as virtual tutors in education, supporting students through real-time feedback and helping educators develop instructional content (Davar et al., 2025). Mental health applications also show promise, with chatbots helping address gaps in service delivery, especially for individuals with symptoms of depression and anxiety (Li et al., 2023). In the tourism industry, chatbots enhance consumer engagement and streamline service interactions, thus improving the overall user experience (Rafiq et al., 2022).
However, the sophisticated capabilities that enable these beneficial applications also create significant risks such as cyberbullying, disinformation, and manipulation (Zhang et al., 2025; Fatimah et al., 2024). AI chatbots use advanced machine learning and natural language processing algorithms to generate human-like dialogue, making their conversations difficult to distinguish from real human interactions (Paluszek & Loeb, 2025). This sophisticated communication ability raises ethical and social concerns. The spread of false information is a major risk factor: AI chatbots can produce and distribute misinformation, particularly when users trust their responses to be accurate or authoritative. Research shows that AI-driven social bots constitute a significant portion of social media activity and play a key role in disinformation campaigns that affect elections and public opinion (Doshi et al., 2024; Hajli et al., 2021). In healthcare settings, false information from chatbots poses serious dangers because incorrect medical information can harm patient safety and the quality of care (Asiksoy, 2025). This concern is particularly important because user trust is essential for chatbot adoption, making information accuracy critical (Prakash et al., 2023). AI chatbots also present manipulation risks. In marketing and customer service, these systems use personalized recommendations and empathetic communication to influence consumer behavior (Kim & Hur, 2023).
While these features enhance user experience, they can also be exploited to manipulate decisions and opinions. In addition, chatbot training data often contain biased or harmful content, which leads these systems to inadvertently reproduce inappropriate behaviors. Without adequate monitoring, chatbots can participate in or create harmful interactions in online communities. Privacy and security are also significant concerns. Because chatbots are widely used across different platforms, they process large amounts of user data, including sensitive personal information. This widespread data handling increases the likelihood of data breaches and enables targeted harmful activities such as online trolling or cyberbullying (Li, 2023; Davar et al., 2025). Although AI chatbots offer benefits in terms of accessibility and efficiency, their deployment requires careful oversight to address ethical issues and minimize harms such as cyberbullying and disinformation.
Building on these identified risks, a significant knowledge gap remains in our understanding of the research landscape regarding AI chatbots and online harm. While individual studies have examined specific aspects of this problem (Shibli et al., 2024; Carroll et al., 2023), there is no comprehensive synthesis of existing research on how scholars study AI chatbots and their online harms. Scholars continue to debate whether the current regulatory frameworks and ethical guidelines can effectively reduce these risks. Some argue that transparency measures alone cannot prevent long-term psychological manipulation (Krook, 2025; Porna et al., 2025). Without a comprehensive understanding of the current state of research, it is difficult to identify areas where knowledge is still developing and further investigation is needed (Ienca, 2023; Polyportis & Pahos, 2024).
To address this critical knowledge gap, this review comprehensively examined the role of AI chatbots in facilitating online harm. This review is particularly important because AI chatbots are rapidly spreading across social, political, and commercial areas, where their misuse can cause various forms of harm to users. This systematic review addresses the following research questions:
1. What methodological approaches are used to examine harm caused by AI chatbots?
2. What theoretical frameworks do researchers use to analyze harms associated with AI chatbots?
3. Which AI chatbot platforms are most examined in research on online harm?
4. What types of jailbreaking or prompt manipulation techniques are used to investigate AI chatbot-related online harms?
5. What forms of harm are documented as being caused by AI chatbots?
Methodology
This review was guided by the PRISMA framework (Moher et al., 2009) to examine existing research on AI chatbots in the context of online harm. This systematic approach involved comprehensive database searches, rigorous screening procedures, and quality assessment protocols, which together ensured the identification of relevant, high-quality literature.
Eligibility Criteria
The selection of studies for this review was guided by the following inclusion and exclusion criteria to ensure that only relevant and empirically grounded research was analyzed. These criteria focused on identifying studies that examined the relationship between AI chatbots and harmful online behaviors, while excluding work that was speculative or unrelated.
Table 1
Inclusion and Exclusion Criteria

Inclusion Criteria:
- Examined how individuals use AI chatbots in ways that contribute to online harm
- Investigated chatbot behaviors such as online trolling, cyberbullying, disinformation, manipulation, influencing, provocation, harmful or disruptive satire or humor, and deceptive dialogue
- Reported empirical data through experiments, observations, or analyses of user-chatbot interactions
- Published between 2020 and 2025
- Available in English
- Peer-reviewed or high-quality scholarly literature

Exclusion Criteria:
- Speculated on potential chatbot harms without empirical evidence
- Lacked empirical analysis or observational data
- Did not focus on harmful chatbot behaviors
- Published before 2020
- Not available in English
- Did not address the intersection of AI chatbots and online harm
Data Sources
The literature search was conducted across six multidisciplinary databases to ensure comprehensive coverage of relevant academic literature. The following databases were selected for this review: Scopus, Web of Science, ProQuest, Google Scholar, ACM Digital Library, and Semantic Scholar.
Search Strategy
A comprehensive search strategy was developed to identify relevant literature across multiple academic databases. The search was conducted using a combination of terms related to AI chatbots and online harm. The following search terms were applied consistently across all the selected databases: ("AI chatbot" OR "artificial intelligence chatbot" OR "conversational chatbot" OR "chatbot") AND ("online troll" OR "online trolling" OR "cyberbullying" OR "humorous trolling" OR "satirical trolling" OR "provocative humor" OR "internet satire" OR "playful disruption" OR "online mischief" OR "ironic commentary" OR "online harassment" OR "toxic behavior" OR "toxic language" OR "abusive language" OR "harmful speech" OR "provocative behavior" OR "disruptive behavior" OR "hate speech" OR "manipulative behavior" OR disinformation OR misinformation OR provocation OR manipulation OR disruption OR harassment OR abuse OR harm OR provoke OR manipulate OR hate OR attack OR insult OR mocking OR bullying OR humor OR satire OR playful OR mischief OR fun OR entertainment OR irony).
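For reproducibility, the combined Boolean string can also be assembled programmatically. The following Python sketch is purely illustrative: it simply joins the two term groups reported above and assumes no database-specific syntax.

```python
# Illustrative assembly of the Boolean search string used in this review.
chatbot_terms = [
    '"AI chatbot"', '"artificial intelligence chatbot"',
    '"conversational chatbot"', '"chatbot"',
]
harm_phrase_terms = [
    '"online troll"', '"online trolling"', '"cyberbullying"', '"humorous trolling"',
    '"satirical trolling"', '"provocative humor"', '"internet satire"',
    '"playful disruption"', '"online mischief"', '"ironic commentary"',
    '"online harassment"', '"toxic behavior"', '"toxic language"',
    '"abusive language"', '"harmful speech"', '"provocative behavior"',
    '"disruptive behavior"', '"hate speech"', '"manipulative behavior"',
]
harm_keyword_terms = [
    "disinformation", "misinformation", "provocation", "manipulation", "disruption",
    "harassment", "abuse", "harm", "provoke", "manipulate", "hate", "attack",
    "insult", "mocking", "bullying", "humor", "satire", "playful", "mischief",
    "fun", "entertainment", "irony",
]

# Combine the chatbot group and the harm group with AND, each group joined by OR.
query = "({}) AND ({})".format(
    " OR ".join(chatbot_terms),
    " OR ".join(harm_phrase_terms + harm_keyword_terms),
)
print(query)
```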
Search techniques were adapted to align with the specific search functionalities of each database while maintaining consistency in coverage and scope. The search focused on peer-reviewed publications and high-quality grey literature published between 2020 and 2025 to ensure relevance to the current landscape of AI chatbot deployment and associated online harms. All searches were limited to English-language publications, with no geographic restrictions.
Quality Assessment
The quality assessment was conducted following PRISMA guidelines to ensure the inclusion of rigorous, methodologically sound studies in the final review (Fig. 1). The systematic screening process began with the identification of 4,722 records across the six databases. After removing 2,446 duplicate records, 2,276 records remained for initial screening. Titles and abstracts were screened to identify relevant studies. This screening excluded 2,218 records that did not meet the inclusion criteria, leaving 58 qualified studies. Of these 58 studies, one could not be accessed, resulting in 57 studies available for full-text review. The 57 studies were fully read and evaluated based on their methodological rigor. The quality criteria included clear research design, appropriate methodology, systematic data collection procedures, presence of direct empirical investigation rather than purely speculative discussion, peer-reviewed status and academic standards, and adequate reporting of methods, results, and limitations. All 57 studies met the eligibility criteria and quality standards and were included in the review. During the review process, citation-chaining procedures using backward and forward reference checking were employed to identify additional relevant studies. This process identified 14 additional studies that met the same quality and inclusion criteria, bringing the total to 71 studies. All included studies demonstrated clear empirical investigation of AI chatbots in relation to online harm with adequate methodological rigor and direct relevance to this study.
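As a simple internal consistency check, the screening counts reported above can be reproduced with basic arithmetic. The short Python sketch below is illustrative only and uses no data beyond the figures stated in this section.

```python
# Illustrative consistency check of the PRISMA screening counts reported above.
records_identified = 4722            # records retrieved across the six databases
duplicates_removed = 2446
after_deduplication = records_identified - duplicates_removed        # 2276
excluded_title_abstract = 2218
after_screening = after_deduplication - excluded_title_abstract       # 58
not_retrievable = 1
full_text_reviewed = after_screening - not_retrievable                # 57
added_via_citation_chaining = 14
total_included = full_text_reviewed + added_via_citation_chaining     # 71

assert after_deduplication == 2276
assert after_screening == 58
assert full_text_reviewed == 57
assert total_included == 71
print(total_included)  # 71 studies in the final review
```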
Fig. 1
PRISMA Flow Diagram for Study Selection Process
The annual distribution of the 71 included studies (Fig. 2) showed clear differences across the search period. 2020 had only 3 studies (4.2%), 2021 had 7 studies (9.9%), and 2022 had 3 studies (4.2%); together, these three years accounted for only 18.3% of all studies. 2023 had the highest count, with 23 studies (32.4%), followed by 2024 with 20 studies (28.2%) and 2025 with 15 studies (21.1%), although the 2025 data cover only part of the year. The sharp increase from 2023 onward likely reflects growing academic attention to AI chatbot-related online harm, possibly due to the release of advanced language models such as ChatGPT.
Fig. 2
Annual scientific production
Table 2
Studies Included
Authors
Title
Agrawal (2024)
Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models
Alahmed et al. (2024)
Exploring the Potential Implications of AI-generated Content in Social Engineering Attacks
Alawida et al. (2024)
Unveiling the Dark Side of ChatGPT: Exploring Cyberattacks and Enhancing User Awareness
Appignani & Sanchez (2024)
AI and racism: Tone policing by the Bing AI chatbot
Atkins et al. (2023)
Those Aren't Your Memories, They're Somebody Else's: Seeding Misinformation in Chat Bot Memories
Ba et al. (2024)
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution
Bai et al. (2025)
LLM-generated messages can persuade humans on policy issues
Bakir et al. (2024)
On manipulation by emotional AI: UK adults' views and governance implications
Battista & Camargo Molano (2023)
How AI Bots Have Reinforced Gender Bias in Hate Speech
Beckerich et al. (2023)
RatGPT: Turning online LLMs into Proxies for Malware Attacks
Boucher et al. (2023)
Boosting Big Brother: Attacking Search Engines with Encodings
Brendel et al. (2023)
The Paradoxical Role of Humanness in Aggression Toward Conversational Agents
Cercas Curry et al. (2021)
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI
Chan et al. (2024)
Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews
Chang et al. (2024)
Evaluating anti-LGBTQIA+ medical bias in large language models
Chen et al. (2024)
Multi-Turn Hidden Backdoor in Large Language Model-powered Chatbot Models
Chen et al. (2023)
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Choi et al. (2025)
Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery
Choi (2025)
The Manner Is the Matter: How the Chatbot Communication Style and Consumers' Regulatory Focus Shape Purchase Intention
Contro et al. (2025)
ChatbotManip: A Dataset to Facilitate Evaluation and Oversight of Manipulative Chatbot Behaviour
Cuadra et al. (2024)
The Illusion of Empathy? Notes on Displays of Emotion in Human-Computer Interaction
Danry et al. (2025)
Deceptive Explanations by Large Language Models Lead People to Change their Beliefs About Misinformation More Often than Honest Explanations
Doshi et al. (2024)
Sleeper Social Bots: A new generation of AI disinformation bots are already a political threat
Durántez-Stolle et al. (2023)
Feminism as a polarizing axis of the political conversation on Twitter: The case of #IreneMonteroDimision
Edu et al. (2022)
Exploring the security and privacy risks of chatbots in messaging services
Gabriel et al. (2024)
MisinfoEval: Generative AI in the Era of "Alternative Facts"
Gehman et al. (2020)
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Gendi & Munteanu (2021)
Towards a chatbot for evidence gathering on the dark web
Gupta et al. (2023)
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy
Hajli et al. (2022)
Social Bots and the Spread of Disinformation in Social Media: The Challenges of Artificial Intelligence
Han et al. (2023)
Hate Raids on Twitch: Echoes of the Past, New Modalities, and Implications for Platform Governance
Jakesch et al. (2023)
Co-Writing with Opinionated Language Models Affects Users' Views
Keijsers et al. (2021)
What's to bullying a bot?: Correlates between chatbot humanlikeness and abuse
Klyueva (2021)
Trolls, Bots, and Whatnots: Deceptive Content, Deception Detection, and Deception Suppression
Köbis et al. (2021)
Bad machines corrupt good morals
Krauß et al. (2025)
"Create a Fear of Missing Out"---ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning
Krügel et al. (2023)
ChatGPT's inconsistent moral advice influences users' judgment
Lan et al. (2025)
Prompt Injection Detection in LLM Integrated Applications
Leib et al. (2021)
The corruptive force of AI-generated advice
Li et al. (2023)
Multi-step Jailbreaking Privacy Attacks on ChatGPT
Lin et al. (2025)
LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses
Lin et al. (2023)
ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation
Liu et al. (2024)
A Hitchhiker's Guide to Jailbreaking ChatGPT via Prompt Engineering
Makhortykh et al. (2024)
Stochastic lies: How LLM-powered chatbots deal with Russian disinformation about the war in Ukraine
Manoli et al. (2025)
The AI Double Standard: Humans Judge All AIs for the Actions of One
McGuire et al. (2023)
The reputational and ethical consequences of deceptive chatbot use
Menz et al. (2024)
Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: Repeated cross sectional analysis
Namvarpour & Razi (2024)
Uncovering Contradictions in Human-AI Interactions: Lessons Learned from User Reviews of Replika
Parray (2021)
Humour in the Age of Contagion: Coronavirus, 'Janata Curfew' Meme and India's Digital Cultures of Virality
Pataranutaporn et al. (2025)
Slip Through the Chat: Subtle Injection of False Information in LLM Chatbot Conversations Increases False Memory Formation
Piggott et al. (2023)
Net-GPT: A LLM-Empowered Man-in-the-Middle Chatbot for Unmanned Aerial Vehicle
Pauwels & Razi (2025)
AI-induced sexual harassment: Investigating contextual characteristics and user reactions of sexual harassment by a companion chatbot
Rodríguez et al. (2020)
C3-Sex: A Conversational Agent to Detect Online Sex Offenders
Roy et al. (2023)
Generating Phishing Attacks using ChatGPT
Schiller Hansen & Søgaard (2025)
Captivation Lures and Social Robots
Shibli et al. (2024)
AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns
Si et al. (2022)
Why So Toxic?: Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Spitale et al. (2023)
AI model GPT-3 (dis)informs us better than humans
Szmurlo & Akhtar (2024)
Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity
Urman & Makhortykh (2025)
The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat
Usman et al. (2024)
Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks
Veisi et al. (2025)
User Narrative Study for Dealing with Deceptive Chatbot Scams Aiming to Online Fraud
Vidgen et al. (2024)
SimpleSafetyTests: A Test Suite for Identifying Critical Safety Risks in Large Language Models
Vorsino (2021)
Chatbots, Gender, and Race on Web 2.0 Platforms: Tay.AI as Monstrous Femininity and Abject Whiteness
Wang et al. (2023)
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Wang et al. (2024)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
Weeks et al. (2023)
A First Look at Toxicity Injection Attacks on Open-domain Chatbots
Yang & Menczer (2023)
Anatomy of an AI-powered malicious social botnet
Yu et al. (2024)
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Zellagui et al. (2025)
Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud?
Zhang et al. (2025)
The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
Zhou et al. (2023)
Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions
Findings
1. Methodological approaches used to examine harm caused by AI chatbots
The majority of studies employed experimental methods, making this the most common methodological approach in the field. These experimental studies involved controlled manipulation of chatbot characteristics, user interactions, or system parameters to examine causal relationships and test hypotheses about the mechanisms of AI chatbot harm. Some studies used mixed-methods approaches, combining qualitative and quantitative techniques to provide a comprehensive understanding of AI chatbot harms. Mixed-methods studies appeared particularly valuable for capturing both technical system behaviors and human user experiences, allowing researchers to triangulate findings across different data types and analytical approaches. Other studies employed quantitative methods, focusing on numerical analysis and statistical examination of AI chatbot behavior and impact. These quantitative approaches often involved large-scale data analysis, statistical modeling, and metric-based evaluations of system performance or user behavior patterns. Some studies used qualitative methods to provide an in-depth understanding of user experiences, contextual factors, and nuanced aspects of AI chatbot harm. Qualitative studies employed interviews, content analysis, ethnographic observations, and thematic analysis to explore the complex social and psychological dimensions of AI chatbot interactions. Some studies incorporated computational approaches, including computational modeling, machine learning techniques, and algorithmic analysis. These computational methods reflect the technical nature of AI chatbot research and the need for sophisticated analytical tools to understand complex system behaviors and vulnerabilities.
This methodological diversity reflects the interdisciplinary nature of AI chatbot harm. Many studies employed multiple methodological approaches simultaneously, indicating the complexity of AI chatbot phenomena and the need for comprehensive analytical frameworks to understand multifaceted harm mechanisms.
2. Theoretical frameworks researchers use to analyze harms associated with AI chatbots
Among the 71 included studies, only 9 explicitly applied theoretical frameworks to guide their research, drawing on ten theories in total. These studies drew on theoretical approaches from five main disciplines: communication studies, education, literary studies/humanities, psychology, and sociology. Theories from psychology were the most frequently used, while the other four disciplines provided additional perspectives. The theories employed were Computers as Social Actors, Trust Violation and Repair Theory, Activity Theory, Narrative Theory, Frustration-Aggression Theory, Moral Foundations Theory, Regulatory Fit Theory, Social Identity Theory, Actor-Network Theory, and the Matrix of Domination. For example, McGuire and colleagues applied Computers as Social Actors theory to examine general chatbot use (McGuire et al., 2023), while other researchers applied Trust Violation and Repair Theory to study custom-designed conversational agents (CAs) (Brendel et al., 2023). In a related stream of work, researchers applied Activity Theory to examine Replika, an AI companion chatbot, and found that its safety protocols were insufficient for a system intended to support user well-being (Namvarpour & Razi, 2024). Additionally, researchers applied Narrative Theory to analyze vulnerabilities in MiniGPT-4 using a universal attack strategy and found that this approach could jailbreak the system with a 96% success rate, underscoring the fragility of GPT-4-based models (Zhou et al., 2023).
Most of the studies drew theories from psychology to understand individual cognitive and emotional responses to AI systems. For example, researchers have applied Frustration-Aggression Theory to examine custom-designed conversational agents and found that making a CA appear more human-like actually increases user frustration and aggressive behavior (Brendel et al., 2023). In a related vein, other researchers applied Moral Foundations Theory to study AI chatbot assistants and found that when a chatbot behaves immorally, such as lying, being unfair, or acting harmfully, people do not limit their negative evaluations to that specific chatbot but extend their judgments to other chatbots as well (Manoli et al., 2025). Similarly, researchers applied Regulatory Fit Theory to Facebook Messenger chatbots and showed that customers report higher purchase intentions when the chatbot's communication style aligns with their regulatory focus: promotion-focused individuals respond more positively to warm-style chatbots, whereas prevention-focused individuals prefer competent-style chatbots (Veisi et al., 2025). Finally, researchers applied Social Identity Theory to examine general chatbot use and found that job candidates from organizations known to use chatbots deceptively are less likely to receive job offers and tend to receive lower salary offers from recruiters (McGuire et al., 2023).
Some studies have also drawn on theories from sociology to examine broader social dynamics. For example, researchers applied Actor-Network Theory to study Twitter by developing a deep learning tool capable of detecting malicious social bots with greater accuracy; the study reported a 79% accuracy rate in distinguishing between tweets written by humans and those generated by bots (Doshi et al., 2024). Similarly, researchers applied the Matrix of Domination framework to examine Alana v2, a conversational social chatbot, and found that abuse directed at Alana v2 differs substantially from abuse of other chatbots. In particular, the chatbot was subjected to markedly higher levels of sexually focused aggression and sexism aimed at the virtual persona of the AI system (Cercas Curry et al., 2021).
These theoretical applications demonstrate diverse disciplinary perspectives. The absence of theoretical frameworks in most studies indicates a field that remains largely exploratory and descriptive, with researchers primarily focused on documenting and characterizing harm phenomena rather than explaining underlying mechanisms through established theoretical lenses.
3. AI chatbot platforms most examined in research on online harm
ChatGPT and other OpenAI models dominate the research landscape as the most extensively studied platforms, appearing consistently across all years from 2021 to 2025 and serving as the primary reference points for AI chatbot harm research. Studies consistently demonstrated that ChatGPT exhibits systematic vulnerabilities across multiple attack vectors, including jailbreaking techniques with success rates exceeding 90% (Yu et al., 2023), prompt injection attacks enabling privacy breaches through PII leakage (Li et al., 2023), and automatic generation of deceptive design patterns without user awareness (Lin et al., 2025). The concentration of research attention on ChatGPT, GPT variants, and related OpenAI models reflects their widespread adoption and accessibility to researchers.
Social media platforms collectively represent a consistently studied category across all research years, with messaging platforms such as WhatsApp and Telegram alongside traditional social networks such as Twitter, Facebook, Instagram, and Discord revealing platform-specific manipulation patterns. Twitter-focused research primarily concentrated on bot detection and disinformation campaigns, identifying sophisticated AI-powered botnets capable of passing as human users (Schiller Hansen & Søgaard, 2025). For example, one study found that AI-generated misinformation differs linguistically from human-created misinformation, with AI-generated content tending to express more emotional and cognitive-processing cues (Zellagui et al., 2025). Discord research revealed widespread privacy violations, with 95.67% of chatbots lacking privacy policies and 55% requesting administrator permissions (Edu et al., 2022). These differences reflect the unique affordances and user behaviors associated with each social media environment, suggesting that harm patterns adapt to platform-specific features and user expectations.
Replika emerges as a unique case study with disproportionately concerning findings despite severely limited research attention, appearing in only 4 studies but generating the most severe intimate relationship harms. For example, one study examined Replika and found that 34.3% of harmful instances involved behavioral misconduct, including sexual misconduct (16.3%), physical aggression (8%), and antisocial behavior (10%) (Zhang et al., 2025).
Google's AI ecosystem has attracted research attention over multiple years, with platforms such as Google Bard, Gemini, PaLM variants, and Flamingo demonstrating similar vulnerability patterns to other major LLMs. For example, one study revealed vulnerabilities to text encoding manipulation with success rates of 93–100% (Boucher et al., 2023) and bias issues in healthcare applications (Agrawal, 2024; Chang et al., 2024).
Microsoft's AI platforms, including various Copilot implementations and Bing Chat variants, appear consistently across research years but demonstrate similar vulnerabilities to other major LLMs rather than distinct enterprise-focused patterns. For example, one study found that major commercial search engines, including Bing, are highly vulnerable to text-encoding manipulation attacks, with success rates of 93–100% for hiding legitimate content and surfacing malicious content (Boucher et al., 2023).
Anthropic's Claude models received moderate but consistent research attention across multiple years, showing similar vulnerability patterns to other major LLMs and indicating cross-platform consistency in fundamental safety issues. Studies found that Claude exhibits the same susceptibility to jailbreaking, bias propagation, and manipulation techniques observed in other major LLM platforms (Lin et al., 2025; Urman & Makhortykh, 2025).
The synthesis of findings across platforms demonstrates that AI chatbot harm manifests differently depending on platform affordances, user populations, and deployment context. The predominant focus on vulnerability exploitation and behavioral manipulation reflects the field's recognition that AI chatbots represent dual threats that can be both technically compromised and deployed as instruments of psychological manipulation.
4. Types of jailbreaking or prompt manipulation techniques used to investigate AI chatbot-related online harms
Among the studies examined, eighteen explicitly documented specific jailbreaking tactics, providing insights into the different approaches used to circumvent AI safety measures.
Types of Jailbreaking
Identity and Role Manipulation Approaches
Seven studies focused on techniques that instruct AI systems to adopt alternative identities or personas in order to bypass safety restrictions (Beckerich et al., 2023; Gupta et al., 2023; Liu et al., 2024; Menz et al., 2024; Shibli et al., 2024; Spitale et al., 2023; Szmurlo & Akhtar, 2024). These approaches center on specific tactics, including Assumed Responsibility (AR), Character Role Play (CR), character play, DAN (Do Anything Now), DUDE, fictionalization and characterization, impersonation, modified DAN, Research Experiment (RE), and SWITCH/Switch Method techniques, all of which bypass content safeguards by framing prohibited advice within fictional scenarios or alternative personas.
Logical and Structural Redirection Approaches
Four studies examined how users employ structured processes or logical frameworks to redirect AI behavior away from safety considerations (Li et al., 2023; Lin et al., 2023; Liu et al., 2024; Szmurlo & Akhtar, 2024). These approaches focus on specific tactics, including Logical Reasoning (LOGIC), multi-step jailbreaking prompts (MJP), Program Execution (PROG), prompt engineering, role-playing, Text Continuation (TC), and translation (TRANS) requests, which use structured conversational patterns to gradually overcome safety measures.
Indirect and Implicit Manipulation Approaches
Four studies addressed how users achieve prohibited outcomes through subtle techniques that avoid explicit rule violations (Gupta et al., 2023; Lin et al., 2023; Shibli et al., 2024; Spitale et al., 2023). These approaches examine specific tactics, including implicit toxicity, role-playing requests, explicit instructions to ignore ethical guidelines, encouragement of unethical behavior, impersonation, reverse psychology, and prompt injection attacks, in which maliciously inserted prompts lead to unintended actions or information disclosure.
Privilege Escalation and Override Approaches
Two studies focused on techniques that attempt to override built-in limitations by invoking higher-level access or simulating advanced capabilities (Liu et al., 2024; Li et al., 2023). These approaches addressed specific tactics, including Simulate Jailbreaking (SIMU), Sudo Mode (SUDO), Superior Model (SUPER), Direct Prompts (DP), and response verification techniques, which involve generating multiple responses and using selection methods to identify successful bypass attempts.
Technical and Encoding Manipulation Approaches
One study examined sophisticated methods that operate at the character-encoding level to evade content-detection systems (Boucher et al., 2023). These approaches focus on specific tactics, including the use of invisible characters (such as the zero-width space), homoglyphs (characters that render as the same or nearly the same glyph), and reordering (exploiting bidirectional text support) to manipulate the technical representation of text and disguise problematic content from automated filtering systems.
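To make these encoding-level tactics concrete, the short Python sketch below is an illustrative example (not code from Boucher et al., 2023). It shows how a zero-width space or a Cyrillic homoglyph leaves a string visually unchanged while altering its underlying code points, why naive keyword matching misses such text, and why stripping zero-width characters helps detection while homoglyph substitution requires a separate confusables mapping.

```python
import unicodedata

# Characters that render with zero width and can be inserted to break up keywords.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def strip_zero_width(text: str) -> str:
    """Remove zero-width characters and apply NFKC normalization.
    Note: cross-script homoglyphs (e.g., Cyrillic 'а' for Latin 'a') are NOT
    resolved by NFKC; detecting them requires a separate confusables mapping."""
    cleaned = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFKC", cleaned)

keyword = "attack"
zero_width_variant = "att\u200back"   # zero-width space hidden inside the word
homoglyph_variant = "att\u0430ck"     # U+0430 (Cyrillic a) in place of Latin 'a'

print(zero_width_variant == keyword)                     # False: naive match fails
print(strip_zero_width(zero_width_variant) == keyword)   # True: hidden character removed
print(strip_zero_width(homoglyph_variant) == keyword)    # False: homoglyph survives NFKC
```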
The studies that documented these tactics demonstrated considerable diversity in their methodological approaches, reflecting the varied technical and social methods employed to bypass AI safety restrictions across different platforms and contexts. Convergent themes emerge around social engineering through identity manipulation and systematic exploitation through logical process manipulation.
Table 3
Jailbreaking Tactics
Identity and Role Manipulation: Assumed Responsibility (AR); Character Role Play (CR); character play; DAN (Do Anything Now); DUDE; fictionalization and characterization; impersonation; modified DAN (Do Anything Now); Research Experiment (RE); SWITCH; Switch Method
Logical and Structural Redirection: Logical Reasoning (LOGIC); multi-step jailbreaking prompts (MJP); Program Execution (PROG); prompt engineering; role-playing; Text Continuation (TC); translation (TRANS)
Indirect and Implicit Manipulation: implicit toxicity; role-playing requests; explicit instructions to ignore ethical guidelines; encouraging unethical behavior; impersonation; reverse psychology; prompt injection attacks
Privilege Escalation and Override: Simulate Jailbreaking (SIMU); Sudo Mode (SUDO); Superior Model (SUPER); Direct Prompts (DP); response verification techniques
Technical and Encoding Manipulation: invisible characters (such as the zero-width space); homoglyphs (characters that render as the same or nearly the same glyph); reordering (exploiting bidirectional text support)
5. Forms of harm documented as being caused by AI chatbots
The evidence synthesized in this systematic review shows that AI chatbots are associated with multiple forms of harm spanning technical, behavioral, and societal domains. Twenty-eight studies developed detailed harm taxonomies that document conversational abuse, cybersecurity threats, backdoor attacks, and privacy violations linked to AI chatbots (Chen et al., 2024; Edu et al., 2022; Szmurlo & Akhtar, 2024; Zhang et al., 2025). Collectively, these studies characterize chatbot-related harm as multifaceted, encompassing both system-level vulnerabilities such as jailbreaking and prompt injection and downstream risks including cyberattacks and fraud facilitation (Alawida et al., 2024; Liu et al., 2024; Yu et al., 2023). The reviewed evidence further suggests that chatbots may operate as perpetrators, enablers, or facilitators of harm, highlighting the varied ways in which negative outcomes emerge across contexts (Chen et al., 2023; Gupta et al., 2023; Usman et al., 2024).
The review also identified disinformation propagation as a recurrent and well-documented harm. Thirteen studies examined the role of AI chatbots in generating and disseminating false information, with emphasis on political disinformation, coordinated social bot activity, and challenges related to misinformation detection and evaluation (Doshi et al., 2024; Hajli et al., 2022; Makhortykh et al., 2024; Yang & Menczer, 2023). Studies also show that chatbots can produce persuasive, human-like content that increases the scale and reach of misinformation while complicating existing detection mechanisms (Bai et al., 2025; Spitale et al., 2023; Zhou et al., 2023). Other studies further reported persistent difficulties in distinguishing AI-generated disinformation from human-authored content, underscoring the growing sophistication of AI-enabled misinformation campaigns (Gabriel et al., 2024; Urman & Makhortykh, 2025).
User-level impacts emerged as another prominent category of harm in the reviewed literature. Fourteen studies reported that interactions with AI chatbots can influence user cognition and behavior, including heightened susceptibility to false memories, shifts in moral judgment, and emotional manipulation through empathic or affective cues (Chan et al., 2024; Cuadra et al., 2024; Krügel et al., 2023; Pataranutaporn et al., 2025). Additional evidence suggests that AI-generated responses can increase user engagement with deceptive or misleading content, at times fostering misplaced trust or confidence in inaccurate information (Danry et al., 2025; Jakesch et al., 2023; Leib et al., 2021). Other studies also document ambivalent user responses, in which individuals simultaneously express skepticism toward chatbots while continuing to rely on them, with these dynamics shaped by perceived human-likeness, communication style, and individual vulnerabilities (Brendel et al., 2023; Choi, 2025; Keijsers et al., 2021).
Finally, the findings point to broader ethical and societal harms associated with AI chatbot deployment. Eight studies in the review highlight concerns related to manipulation, privacy infringement, moral degradation, and erosion of societal trust linked to chatbot misuse (Bakir et al., 2024; McGuire et al., 2023; Köbis et al., 2021). Further evidence indicates that AI-enabled bias and discrimination can affect critical domains such as healthcare, democratic participation, and social cohesion (Agrawal, 2024; Chang et al., 2024; Vorsino, 2021). Taken together, these findings suggest a clear need for improved governance, transparency, and bias mitigation, as repeatedly emphasized across the reviewed studies (Appignani & Sanchez, 2024; Urman & Makhortykh, 2025).
Limitations of the Reviewed Studies
The most prevalent weakness across the reviewed studies was the restricted scope of platform and model selection. The studies consistently suffered from a narrow focus on small numbers of AI systems, limiting the generalizability of the findings. For example, Cercas Curry et al. (2021) focused on only three conversational AI systems, potentially affecting the representativeness of conversational AI interactions. Menz et al. (2024) examined a small number of prominent LLMs, which may not represent the broader AI ecosystem. Weeks et al. (2023) focused on only two victim chatbot models (DD-BART and BlenderBot), limiting insights into vulnerability across different architectures. Makhortykh et al. (2024) studied only three chatbots (Perplexity, Google Bard, and Bing Chat), potentially missing broader LLM-powered chatbot behaviors. Chen et al. (2023) primarily focused on specific chatbot models (DialoGPT and BlenderBot variants), limiting generalizability to other architectures. Pataranutaporn et al. (2025) used only one LLM (GPT-4o). Jakesch et al. (2023) and Zellagui et al. (2025) both used only one language model (GPT-3), restricting findings to single-model capabilities.
In addition, several studies demonstrated over-reliance on individual platforms or case studies, creating significant generalizability concerns. Zhang et al. (2025) primarily examined Replika, limiting applicability to other AI companion platforms. Namvarpour and Razi (2024) relied solely on user reviews from the Google Play Store for their Replika analysis. Pauwels and Razi (2025) used only Google Play Store reviews, potentially introducing a sampling bias toward users with extreme opinions. Vorsino (2021) focused primarily on one case study (Tay.AI), limiting broader conclusions about gender, race, and technology interactions. Keijsers et al. (2021) focused on one specific chatbot (Cleverbot), restricting generalization to other AI assistants. Schiller Hansen and Søgaard (2025) studied a single botnet (fox8) on Twitter, limiting the understanding of broader LLM-powered bot behaviors. Usman et al. (2024) focused on a single Twitter botnet, potentially missing diverse bot operational patterns.
Furthermore, the studies lacked adequate comparative analysis to contextualize the findings. Yang and Menczer (2023) did not compare ChatGPT with other AI language models or traditional cybersecurity tools, missing comparative context for capabilities and risks. Cuadra et al. (2024) focused on a small set of popular LLMs without broader model representation. Danry et al. (2025) used GPT-3 to generate explanations, potentially introducing inherent biases affecting results.
The prevalence of these weaknesses across studies demonstrates systematic challenges in AI chatbot harm research, including rapid technological change outpacing research methodologies, lack of standardized evaluation frameworks, and insufficient resources for comprehensive multi-platform studies. These limitations collectively suggest that current understanding of AI chatbot harm may be fragmented and potentially biased toward easily accessible platforms and models rather than representing the full spectrum of AI chatbot risks and capabilities.
Future Research Directions
Future research should study multiple AI systems simultaneously and compare findings across them. Most studies in this review examined only a handful of platforms, which makes it difficult to determine whether the problems they find are specific to certain chatbots or common across all of them. Researchers should compare multiple platforms in the same study, including systems from major companies such as OpenAI, Google, and Meta, as well as specialized chatbots such as Replika and Character.ai. This would help determine whether the harms observed are unique features of individual platforms or deeper problems with how these technologies work in general.
In addition, most current studies last only a few weeks and take place in artificial lab settings, yet we need to understand how AI chatbots cause harm over months or years in real life. The brief, controlled experiments available now miss many important details about how people actually use these systems and how problems might develop over time. Future research should focus on real users interacting with actual deployed chatbots for extended periods across different communities and cultures. This would show whether harm worsens over time, how users learn to manipulate these systems, and what the long-term psychological effects might be.
Finally, many studies rely on small groups of college students from single countries or regions, which makes it unclear whether the findings apply to everyone. We need research that includes people of different ages, backgrounds, cultures, and countries to see whether AI chatbots affect different groups in different ways. This means recruiting larger, more representative samples and testing whether findings from one culture hold true for others. Understanding these differences is crucial for developing safety measures that actually work for the diverse global populations using these technologies. These directions address the problems identified across the reviewed studies and would make AI chatbot harm research more reliable and useful for real-world applications.
Conclusion
This review examined research on AI chatbot-related online harms by analyzing 71 studies from six databases: Scopus, Web of Science, ProQuest, Google Scholar, ACM Digital Library, and Semantic Scholar. Understanding these harms matters because AI chatbots are spreading rapidly across social, political, and commercial spaces where their misuse can affect users in multiple ways.
The analysis reveals that ChatGPT and other OpenAI models represent the most extensively examined platforms in current research, followed by social media platforms like X, Facebook, Instagram, Telegram, and Discord. This pattern reflects both the market dominance of these AI systems and their accessibility to researchers. The synthesized evidence demonstrates that AI chatbots are linked to multiple forms of harm across technical, behavioral, and societal dimensions, including the spread of disinformation, privacy violations, facilitation of cyberattacks, and enabling of fraudulent activities.
The reviewed studies drew primarily on psychological theories, with additional frameworks from communication studies, education, literary studies, and sociology. This interdisciplinary approach is appropriate given that AI chatbot harms affect technical systems, social interactions, and human psychology simultaneously. However, the most significant limitation identified across studies was restricted platform and model selection. Researchers consistently focused on narrow sets of AI systems, which limits the generalizability of findings across the broader chatbot ecosystem.
This review contributes to the growing body of literature by providing a comprehensive synthesis of AI-related harms, offering insights that can inform policy development and risk mitigation strategies for addressing the challenges posed by AI chatbot technologies.
Limitations
The keyword selection and database choices may have excluded relevant studies on AI chatbot online harms. The search strategy, while comprehensive, was constrained by the specific databases and keywords selected for this review. Given the multidisciplinary nature of this research area, some relevant studies may not have been captured, particularly those using alternative terminology or published in discipline-specific venues outside the six databases searched. Future reviews should expand search parameters to include additional databases and a broader range of keywords across different disciplines, which may reveal underexplored harm types and perspectives that the current selection criteria did not capture. Additionally, the rapid pace of AI development means that new forms of harm may emerge after the review period, necessitating ongoing research to maintain current understanding of this evolving landscape.
Acknowledgement:
I would like to express my sincere gratitude to Professor Pnina Fichman, Professor Noriko Hara, and Professor James Shanahan for their invaluable guidance and support throughout the development of this systematic review. Their insights and feedback were essential to shaping the direction and rigor of this work.
AI Assistance
ChatGPT was used during the initial screening phase to assist with reviewing abstracts and summaries of the 2,276 records identified after duplicate removal. While ChatGPT helped streamline the screening process, all AI-generated assessments were manually verified to ensure no relevant studies were missed during the initial screening stage.
References
Agrawal, A. (2024). Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models. Cureus. https://doi.org/10.7759/cureus.69541
Alahmed, Y., Abadla, R., & Ansari, M. J. A. (2024). Exploring the Potential Implications of AI-generated Content in Social Engineering Attacks. 2024 International Conference on Multimedia Computing, Networking and Applications (MCNA), 64–73. https://doi.org/10.1109/MCNA63144.2024.10703950
Alawida, M., Abu Shawar, B., Abiodun, O. I., Mehmood, A., Omolara, A. E., & Al Hwaitat, A. K. (2024). Unveiling the Dark Side of ChatGPT: Exploring Cyberattacks and Enhancing User Awareness. Information, 15(1), 27. https://doi.org/10.3390/info15010027
Appignani, T., & Sanchez, J. (2024). AI and racism: Tone policing by the Bing AI chatbot. Discourse Studies, 26(5), 591–605. https://doi.org/10.1177/14614456241235075
Asiksoy, G. (2025). Nurses’ assessment of artificial intelligence chatbots for health literacy education. Journal of Education and Health Promotion, 14(1). https://doi.org/10.4103/jehp.jehp_1195_24
Atkins, C., Zhao, B. Z. H., Asghar, H. J., Wood, I., & Kaafar, M. A. (2023). Those Aren’t Your Memories, They’re Somebody Else’s: Seeding Misinformation in Chat Bot Memories. In M. Tibouchi & X. Wang (Eds.), Applied Cryptography and Network Security (Vol. 13905, pp. 284–308). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-33488-7_11
Ba, Z., Zhong, J., Lei, J., Cheng, P., Wang, Q., Qin, Z., Wang, Z., & Ren, K. (2024). SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 1166–1180. https://doi.org/10.1145/3658644.3690346
Bai, H., Voelkel, J. G., Muldowney, S., Eichstaedt, J. C., & Willer, R. (2025). LLM-generated messages can persuade humans on policy issues. Nature Communications, 16(1), 6037. https://doi.org/10.1038/s41467-025-61345-5
Bakir, V., Laffer, A., McStay, A., Miranda, D., & Urquhart, L. (2024). On manipulation by emotional AI: UK adults’ views and governance implications. Frontiers in Sociology, 9, 1339834. https://doi.org/10.3389/fsoc.2024.1339834
Battista, D., & Camargo Molano, J. (2023). How AI Bots Have Reinforced Gender Bias in Hate Speech. Ex aequo, (48), 53–68. https://doi.org/10.22355/exaequo.2023.48.05
Beckerich, M., Plein, L., & Coronado, S. (2023). RatGPT: Turning online LLMs into Proxies for Malware Attacks (arXiv:2308.09183). arXiv. https://doi.org/10.48550/arXiv.2308.09183
Boucher, N., Pajola, L., Shumailov, I., Anderson, R., & Conti, M. (2023). Boosting Big Brother: Attacking Search Engines with Encodings. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 700–713. https://doi.org/10.1145/3607199.3607220
Brendel, A. B., Hildebrandt, F., Dennis, A. R., & Riquel, J. (2023). The Paradoxical Role of Humanness in Aggression Toward Conversational Agents. Journal of Management Information Systems, 40(3), 883–913. https://doi.org/10.1080/07421222.2023.2229127
Carroll, M., Chan, A., Ashton, H., & Krueger, D. (2023, October). Characterizing manipulation from AI systems. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (pp. 1–13).
Cercas Curry, A., Abercrombie, G., & Rieser, V. (2021). ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7388–7403. https://doi.org/10.18653/v1/2021.emnlp-main.587
Chan, S., Pataranutaporn, P., Suri, A., Zulfikar, W., Maes, P., & Loftus, E. F. (2024). Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2408.04681
Chang, C. T., Srivathsa, N., Bou-Khalil, C., Swaminathan, A., Lunn, M. R., Mishra, K., Koyejo, S., & Daneshjou, R. (2024). Evaluating anti-LGBTQIA+ medical bias in large language models. https://doi.org/10.1101/2024.08.22.24312464
Chen, B., Ivanov, N., Wang, G., & Yan, Q. (2024). Multi-Turn Hidden Backdoor in Large Language Model-powered Chatbot Models. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 1316–1330. https://doi.org/10.1145/3634737.3656289
Chen, B., Wang, G., Guo, H., Wang, Y., & Yan, Q. (2023). Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, 282–296. https://doi.org/10.1145/3607199.3607237
Choi, R., Kim, T., Park, S., Kim, J. G., & Lee, S.-J. (2025). Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–19. https://doi.org/10.1145/3706598.3713485
Choi, S. (2025). The Manner Is the Matter: How the Chatbot Communication Style and Consumers’ Regulatory Focus Shape Purchase Intention. Journal of Consumer Behaviour, 24(4), 1950–1966. https://doi.org/10.1002/cb.2505
Contro, J., Deol, S., He, Y., & Brandão, M. (2025). ChatbotManip: A Dataset to Facilitate Evaluation and Oversight of Manipulative Chatbot Behaviour (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2506.12090
Cork, A., Smith, L. G., Ellis, D. A., Stanton Fraser, D., & Joinson, A. (2022). Rethinking Online Harm: A Psychological Model of Contextual Vulnerability. https://doi.org/10.31234/osf.io/z7re2
Cuadra, A., Wang, M., Stein, L. A., Jung, M. F., Dell, N., Estrin, D., & Landay, J. A. (2024). The Illusion of Empathy? Notes on Displays of Emotion in Human-Computer Interaction. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–18. https://doi.org/10.1145/3613904.3642336
Danry, V., Pataranutaporn, P., Groh, M., & Epstein, Z. (2025). Deceptive Explanations by Large Language Models Lead People to Change their Beliefs About Misinformation More Often than Honest Explanations. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–31. https://doi.org/10.1145/3706598.3713408
Davar, N. F., Dewan, M. A. A., & Zhang, X. (2025). AI Chatbots in Education: Challenges and Opportunities. Information, 16(3), 235. https://doi.org/10.3390/info16030235
De Cicco, R. (2024). Exploring the dark corners of human-chatbot interactions: A literature review on conversational agent abuse. In International workshop on chatbot research and design (pp. 185–203). Springer, Cham.
Doshi, J., Novacic, I., Fletcher, C., Borges, M., Zhong, E., Marino, M. C., Gan, J., Mager, S., Sprague, D., & Xia, M. (2024). Sleeper Social Bots: A new generation of AI disinformation bots are already a political threat (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2408.12603
Durántez-Stolle, P., Martínez-Sanz, R., Piñeiro-Otero, T., & Gómez-García, S. (2023). Feminism as a polarizing axis of the political conversation on Twitter: The case of #IreneMonteroDimision. El Profesional de La Información, e320607. https://doi.org/10.3145/epi.2023.nov.07
Edu, J., Mulligan, C., Pierazzi, F., Polakis, J., Suarez-Tangil, G., & Such, J. (2022). Exploring the security and privacy risks of chatbots in messaging services. Proceedings of the 22nd ACM Internet Measurement Conference, 581–588. https://doi.org/10.1145/3517745.3561433
Fatimah, R., Mumtaz, A., Fahrezi, F. M., & Zakaria, D. (2024). AI-generated misinformation: A literature review. Indonesian Journal of Artificial Intelligence and Data Mining (IJAIDM), 7(2), 241–254.
Gabriel, S., Lyu, L., Siderius, J., Ghassemi, M., Andreas, J., & Ozdaglar, A. (2024). MisinfoEval: Generative AI in the Era of “Alternative Facts” (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2410.09949
Gehman, S., Gururangan, S., Sap, M., Choi, Y., & Smith, N. A. (2020). RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2009.11462
Gendi, M., & Munteanu, C. (2021). Towards a chatbot for evidence gathering on the dark web. CUI 2021–3rd Conference on Conversational User Interfaces, 1–3. https://doi.org/10.1145/3469595.3469598
Gupta, M., Akiri, C., Aryal, K., Parker, E., & Praharaj, L. (2023). From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy. IEEE Access, 11, 80218–80245. https://doi.org/10.1109/ACCESS.2023.3300381
Hajli, N., Saeed, U., Tajvidi, M., & Shirazi, F. (2022). Social Bots and the Spread of Disinformation in Social Media: The Challenges of Artificial Intelligence. British Journal of Management, 33(3), 1238–1253. https://doi.org/10.1111/1467-8551.12554
Han, C., Seering, J., Kumar, D., Hancock, J. T., & Durumeric, Z. (2023). Hate Raids on Twitch: Echoes of the Past, New Modalities, and Implications for Platform Governance. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–28. https://doi.org/10.1145/3579609
Ienca, M. (2023). On artificial intelligence and manipulation. Topoi, 42(3), 833–842.
Jakesch, M., Bhat, A., Buschek, D., Zalmanson, L., & Naaman, M. (2023). Co-Writing with Opinionated Language Models Affects Users’ Views. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–15. https://doi.org/10.1145/3544548.3581196
Keijsers, M., Bartneck, C., & Eyssel, F. (2021). What’s to bullying a bot?: Correlates between chatbot humanlikeness and abuse. Interaction Studies. Social Behaviour and Communication in Biological and Artificial Systems, 22(1), 55–80. https://doi.org/10.1075/is.20002.kei
Kim, W. B., & Hur, H. J. (2023). What Makes People Feel Empathy for AI Chatbots? Assessing the Role of Competence and Warmth. International Journal of Human–Computer Interaction, 40(17), 4674–4687. https://doi.org/10.1080/10447318.2023.2219961
Klyueva, A. (2021). Trolls, Bots, and Whatnots: Deceptive Content, Deception Detection, and Deception Suppression. In I. R. Management Association (Ed.), Research Anthology on Fake News, Political Warfare, and Combatting the Spread of Misinformation (pp. 316–330). IGI Global. https://doi.org/10.4018/978-1-7998-7291-7.ch018
Köbis, N., Bonnefon, J.-F., & Rahwan, I. (2021). Bad machines corrupt good morals. Nature Human Behaviour, 5(6), 679–685. https://doi.org/10.1038/s41562-021-01128-2
Krauß, V., McGill, M., Kosch, T., Thiel, Y. M., Schön, D., & Gugenheimer, J. (2025). “Create a Fear of Missing Out”—ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–20. https://doi.org/10.1145/3706598.3713083
Krook, J. (2025). Manipulation and the AI Act: Large Language Model Chatbots and the Danger of Mirrors (arXiv:2503.18387). arXiv. https://doi.org/10.48550/arXiv.2503.18387
Krügel, S., Ostermaier, A., & Uhl, M. (2023). ChatGPT’s inconsistent moral advice influences users’ judgment. Scientific Reports, 13(1), 4569. https://doi.org/10.1038/s41598-023-31341-0
Kurniawan, M. H., Handiyani, H., Nuraini, T., Hariyati, R. T. S., & Sutrisno, S. (2024). A systematic review of artificial intelligence-powered (AI-powered) chatbot intervention for managing chronic illness. Annals of Medicine, 56(1). https://doi.org/10.1080/07853890.2024.2302980
Lan, Q., Kaul, A., & Jones, S. (2025). Prompt Injection Detection in LLM Integrated Applications. International Journal of Network Dynamics and Intelligence, 100013. https://doi.org/10.53941/ijndi.2025.100013
Leib, M., Köbis, N. C., Rilke, R. M., Hagens, M., & Irlenbusch, B. (2021). The corruptive force of AI-generated advice (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2102.07536
Li, H., Guo, D., Fan, W., Xu, M., Huang, J., Meng, F., & Song, Y. (2023). Multi-step Jailbreaking Privacy Attacks on ChatGPT. Findings of the Association for Computational Linguistics: EMNLP 2023, 4138–4153. https://doi.org/10.18653/v1/2023.findings-emnlp.272
Li, J. (2023). Security Implications of AI Chatbots in Health Care. Journal of Medical Internet Research, 25, e47551. https://doi.org/10.2196/47551
Li, L., Peng, W., & Rheu, M. M. J. (2023). Factors Predicting Intentions of Adoption and Continued Use of Artificial Intelligence Chatbots for Mental Health: Examining the Role of UTAUT Model, Stigma, Privacy Concerns, and Artificial Intelligence Hesitancy. Telemedicine and E-Health, 30(3), 722–730. https://doi.org/10.1089/tmj.2023.0313
Lin, W., Gerchanovsky, A., Akgul, O., Bauer, L., Fredrikson, M., & Wang, Z. (2025). LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–24. https://doi.org/10.1145/3706598.3714025
Lin, Z., Wang, Z., Tong, Y., Wang, Y., Guo, Y., Wang, Y., & Shang, J. (2023). ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation. Findings of the Association for Computational Linguistics: EMNLP 2023, 4694–4702. https://doi.org/10.18653/v1/2023.findings-emnlp.311
Liu, Y., Deng, G., Xu, Z., Li, Y., Zheng, Y., Zhang, Y., Zhao, L., Zhang, T., & Wang, K. (2024). A Hitchhiker’s Guide to Jailbreaking ChatGPT via Prompt Engineering. Proceedings of the 4th International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things, 12–21. https://doi.org/10.1145/3663530.3665021
Makhortykh, M., Sydorova, M., Baghumyan, A., Vziatysheva, V., & Kuznetsova, E. (2024). Stochastic lies: How LLM-powered chatbots deal with Russian disinformation about the war in Ukraine. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-154
Manoli, A., Pauketat, J. V. T., & Anthis, J. R. (2025). The AI Double Standard: Humans Judge All AIs for the Actions of One. Proceedings of the ACM on Human-Computer Interaction, 9(2), 1–24. https://doi.org/10.1145/3711083
McGuire, J., De Cremer, D., Hesselbarth, Y., De Schutter, L., Mai, K. M., & Van Hiel, A. (2023). The reputational and ethical consequences of deceptive chatbot use. Scientific Reports, 13(1), 16246. https://doi.org/10.1038/s41598-023-41692-3
Menz, B. D., Kuderer, N. M., Bacchi, S., Modi, N. D., Chin-Yee, B., Hu, T., Rickard, C., Haseloff, M., Vitry, A., McKinnon, R. A., Kichenadasse, G., Rowland, A., Sorich, M. J., & Hopkins, A. M. (2024). Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: Repeated cross sectional analysis. BMJ, e078538. https://doi.org/10.1136/bmj-2023-078538
Moy, W. R., & Gradon, K. T. (2023). A double-edged sword. Artificial Intelligence and International Conflict in Cyberspace.
Namvarpour, M., & Razi, A. (2024). Uncovering Contradictions in Human-AI Interactions: Lessons Learned from User Reviews of Replika. Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing, 579–586. https://doi.org/10.1145/3678884.3681909
Paluszek, O., & Loeb, S. (2025). Artificial intelligence and patient education. Current Opinion in Urology, 35(3), 219–223. https://doi.org/10.1097/mou.0000000000001267
Parray, I. (2021). Humour in the Age of Contagion: Coronavirus, ‘Janata Curfew’ Meme and India’s Digital Cultures of Virality. In S. Mpofu (Ed.), Digital Humour in the Covid-19 Pandemic (pp. 279–293). Springer International Publishing. https://doi.org/10.1007/978-3-030-79279-4_13
Pataranutaporn, P., Archiwaranguprok, C., Chan, S. W. T., Loftus, E., & Maes, P. (2025). Slip Through the Chat: Subtle Injection of False Information in LLM Chatbot Conversations Increases False Memory Formation. Proceedings of the 30th International Conference on Intelligent User Interfaces, 1297–1313. https://doi.org/10.1145/3708359.3712112
Piggott, B., Patil, S., Feng, G., Odat, I., Mukherjee, R., Dharmalingam, B., & Liu, A. (2023). Net-GPT: A LLM-Empowered Man-in-the-Middle Chatbot for Unmanned Aerial Vehicle. Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing, 287–293. https://doi.org/10.1145/3583740.3626809
Polyportis, A., & Pahos, N. (2024). Navigating the perils of artificial intelligence: a focused review on ChatGPT and responsible research and innovation. Humanities and Social Sciences Communications, 11(1), 1–10.
Porna, S. B., Ahmad, M., Vallejo, R. G., Shahzadi, I., & Rahman, M. A. (2025). Exploring Ethical Dimensions of AI Assistants and Chatbots. In Responsible Implementations of Generative AI for Multidisciplinary Use (pp. 291–316). IGI Global.
Prakash, A. V., Joshi, A., Nim, S., & Das, S. (2023). Determinants and consequences of trust in AI-based customer service chatbots. The Service Industries Journal, ahead-of-print(ahead-of-print), 642–675. https://doi.org/10.1080/02642069.2023.2166493
Rafiq, F., Adil, M., Wu, J.-Z., & Dogra, N. (2022). Examining Consumer’s Intention to Adopt AI-Chatbots in Tourism Using Partial Least Squares Structural Equation Modeling Method. Mathematics, 10(13), 2190. https://doi.org/10.3390/math10132190
Rodríguez, J. I., Durán, S. R., Díaz-López, D., Pastor-Galindo, J., & Mármol, F. G. (2020). C3-Sex: A Conversational Agent to Detect Online Sex Offenders. Electronics, 9(11), 1779. https://doi.org/10.3390/electronics9111779
Roy, S. S., Naragam, K. V., & Nilizadeh, S. (2023). Generating Phishing Attacks using ChatGPT (arXiv:2305.05133). arXiv. https://doi.org/10.48550/arXiv.2305.05133
Schiller Hansen, S., & Søgaard, A. (2025). Captivation Lures and Social Robots. In J. Seibt, P. Fazekas, & O. S. Quick (Eds.), Frontiers in Artificial Intelligence and Applications. IOS Press. https://doi.org/10.3233/FAIA241534
Shibli, A. M., Pritom, M. M. A., & Gupta, M. (2024). AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns. 2024 12th International Symposium on Digital Forensics and Security (ISDFS), 1–6. IEEE. https://doi.org/10.1109/ISDFS60797.2024.10527300
Si, W. M., Backes, M., Blackburn, J., De Cristofaro, E., Stringhini, G., Zannettou, S., & Zhang, Y. (2022). Why So Toxic?: Measuring and Triggering Toxic Behavior in Open-Domain Chatbots. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2659–2673. https://doi.org/10.1145/3548606.3560599
Sison, A. J. G., Daza, M. T., Gozalo-Brizuela, R., & Garrido-Merchán, E. C. (2024). ChatGPT: More than a “weapon of mass deception” ethical challenges and responses from the human-centered artificial intelligence (HCAI) perspective. International Journal of Human–Computer Interaction, 40(17), 4853–4872.
Spitale, G., Biller-Andorno, N., & Germani, F. (2023). AI model GPT-3 (dis)informs us better than humans. Science Advances, 9(26), eadh1850. https://doi.org/10.1126/sciadv.adh1850
Szmurlo, H., & Akhtar, Z. (2024). Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity. Information, 15(8), 443. https://doi.org/10.3390/info15080443
Urman, A., & Makhortykh, M. (2025). The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat. Telematics and Informatics, 96, 102211. https://doi.org/10.1016/j.tele.2024.102211
Usman, Y., Upadhyay, A., Gyawali, P., & Chataut, R. (2024). Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2408.12806
Veisi, O., Kazemian, K., Gerami, F., Mirzaee Kharghani, M., Amirkhani, S., Du, D. K., Stevens, G., & Boden, A. (2025). User Narrative Study for Dealing with Deceptive Chatbot Scams Aiming to Online Fraud. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–7. https://doi.org/10.1145/3706599.3720152
Vidgen, B., Scherrer, N., Kirk, H. R., Qian, R., Kannappan, A., Hale, S. A., & Röttger, P. (2024). SimpleSafetyTests: A Test Suite for Identifying Critical Safety Risks in Large Language Models (arXiv:2311.08370). arXiv. https://doi.org/10.48550/arXiv.2311.08370
Vorsino, Z. (2021). Chatbots, Gender, and Race on Web 2.0 Platforms: Tay.AI as Monstrous Femininity and Abject Whiteness. Signs: Journal of Women in Culture and Society, 47(1), 105–127. https://doi.org/10.1086/715227
Wang, J., Hu, X., Hou, W., Chen, H., Zheng, R., Wang, Y., Yang, L., Huang, H., Ye, W., Geng, X., Jiao, B., Zhang, Y., & Xie, X. (2023). On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective (arXiv:2302.12095). arXiv. https://doi.org/10.48550/arXiv.2302.12095
Wang, R., Ma, X., Zhou, H., Ji, C., Ye, G., & Jiang, Y.-G. (2024). White-box Multimodal Jailbreaks Against Large Vision-Language Models. Proceedings of the 32nd ACM International Conference on Multimedia, 6920–6928. https://doi.org/10.1145/3664647.3681092
Weeks, C., Cheruvu, A., Abdullah, S. M., Kanchi, S., Yao, D., & Viswanath, B. (2023). A First Look at Toxicity Injection Attacks on Open-domain Chatbots. Annual Computer Security Applications Conference, 521–534. https://doi.org/10.1145/3627106.3627122
Yang, K.-C., & Menczer, F. (2023). Anatomy of an AI-powered malicious social botnet (arXiv:2307.16336). arXiv. https://doi.org/10.48550/arXiv.2307.16336
Yu, J., Lin, X., Yu, Z., & Xing, X. (2024). GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts (arXiv:2309.10253). arXiv. https://doi.org/10.48550/arXiv.2309.10253
Zellagui, W., Imine, A., & Tadjeddine, Y. (2025). Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud? Digital Government: Research and Practice, 6(1), 1–16. https://doi.org/10.1145/3673764
Zhang, R., Li, H., Meng, H., Zhan, J., Gan, H., & Lee, Y.-C. (2025). The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–17. https://doi.org/10.1145/3706598.3713429
Zhou, J., Zhang, Y., Luo, Q., Parker, A. G., & De Choudhury, M. (2023). Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–20. https://doi.org/10.1145/3544548.3581318