Expert and Interdisciplinary Analysis of AI-Driven Chatbots for Mental Health Support: Mixed Methods Study (original) (raw)

Original Paper

Corresponding Author:

Kayley Moylan, MSc

School of Information and Communication Studies

University College Dublin

Belfield

Dublin, D04 V1W8

Ireland

Phone: 353 863113688

Email: kayley.moylan@ucdconnect.ie

Abstract

Background: Recent years have seen an immense surge in the creation and use of chatbots as social and mental health companions. Aiming to provide empathic responses in support of the delivery of personalized support, these tools are often presented as offering immense potential. However, it is also essential that we understand the risks of their deployment, including their potential adverse impacts on the mental health of users, including those most at risk.

Objective: The study aims to assess the ethical and pragmatic clinical implications of using chatbots that claim to aid mental health. While several studies within human-computer interaction and related fields have examined users’ perceptions of such systems, few studies have engaged mental health professionals in critical analysis of their conduct as mental health support tools. This paper comprises, in turn, an effort to assess the ethical and pragmatic clinical implications of using chatbots that claim to aid mental health.

Methods: This study included 8 interdisciplinary mental health professional participants (from psychology and psychotherapy to social care and crisis volunteer workers) in a mixed methods and hands-on analysis of 2 popular mental health–related chatbots’ data handling, interface design, and responses. This analysis was carried out through profession-specific tasks with each chatbot, eliciting participants’ perceptions through both the Trust in Automation scale and semistructured interviews. Through thematic analysis and a 2-tailed, paired t test, these chatbots’ implications for mental health support were thus evaluated.

Results: Qualitative analysis revealed emphatic initial impressions among mental health professionals of chatbot responses likely to produce harm, exhibiting a generic mode of care, and risking user dependence and manipulation given the central role of trust in the therapeutic relationship. Trust scores from the Trust in Automation scale, while exhibiting no statistically significant differences between the chatbots (_t_6=–0.76; _P_=.48), indicated medium to low trust scores for each chatbot. The findings of this work highlight that the design and development of artificial intelligence (AI)–driven mental health–related solutions must be undertaken with utmost caution. The mental health professionals in this study collectively resist these chatbots and make clear that AI-driven chatbots used for mental health by at-risk users invite several potential and specific harms.

Conclusions: Through this work, we contributed insights into the mental health professional perspective on the design of chatbots used for mental health and underscore the necessity of ongoing critical assessment and iterative refinement to maximize the benefits and minimize the risks associated with integrating AI into mental health support.

J Med Internet Res 2025;27:e67114

doi:10.2196/67114

Keywords

Content Warning

This paper includes material some readers may find distressing, including references to child abuse, domestic abuse, self-harm and suicide, mental illness, and homelessness.

Introduction

Background

The growing prevalence of mental health concerns is increasingly recognized as a global public health priority, only elevated by the increased association of such conditions with physical illness, self-harm, and suicide []. An estimated 55% of individuals in low- and middle-income countries, and 85% of those residing in nations considered high income, lack access to mental health services [,]—a crucial right denied to many because of growing pressures on health systems worldwide []. This global “access crisis” is particularly pronounced in Ireland because of limited funding to support the availability of mental health services for a population that is among the youngest in the European Union [,]. Consequently, calls have increased for a turn toward technological solutions capable of bridging these gaps in care. However, historically the adoption of digital tools to support the practice of mental health care has been slow, and subject to resistance from professionals and patients alike []. The desire for a “human touch” is frequently principal among these concerns, as decades of research have indeed shown an important antecedent of effective care [,]. This fundamental concern has led to claims that the recent advent of artificial intelligence (AI)–supported conversational chatbots has the potential to address it.

While human-computer interaction (HCI) research concerning the interaction design and social implications of these systems remains at an early stage, conversational agents already claim widespread public adoption and commercial success. Many such tools posit the capacity to “mimic human conversation” []; “provide mental health support” []; or serve as nonjudgmental, confidential, and always available mental health–oriented companions []. Many of these features comprise efforts to replicate mental health–related skills, once the exclusive province of human professionals, through sophisticated machine learning approaches in support of human-chatbot relationships [,]. Previous studies [,-] have highlighted mental health–oriented chatbots’ (MHOCs) capacity to provide guidance, knowledge, and a convenient, cost-effective therapeutic connection, while also acknowledging their potential to inflict harm and create distress for at-risk users and communities [,,,]. While efforts have begun to address some of these concerns through legislation [,], the ethical design and pragmatic adoption of AI-driven chatbots for mental health support remains an ongoing challenge []. One frequently overlooked question persists; to what extent are these systems capable of providing care akin to that of a human mental health professional? If not, to what degree are they capable of providing even a useful imitation of such practices?

To find out, we engaged a diverse sample of expert professional stakeholders in qualitative interviews and explorations of 2 popular yet distinct chatbots, each with the capacity for mental health support: Wysa and Replika. The first chatbot, Wysa, explicitly identifies as a mental health chatbot, and the second chatbot, Replika, more often implies its capacity to support mental health while yet explicitly providing several mental health–related features. Individuals increasingly use both such relational chatbot designs to discuss intense feelings and meet their emotional needs. It is increasingly imperative that we as HCI researchers analyze the diverse ways in which such chatbots, which claim to aid mental health, are promoted, described, and used, particularly given their use by at-risk users who may not fully recognize their intended purpose. This work then investigated the relationship between design and care as manifested in today’s mental health–related chatbots. We suggest that mental health chatbot design can learn from a closer and more appropriately informed engagement with therapeutic practices, including through the involvement of mental health professionals in their development and evaluation. This study therefore sought to assess the ethical and practical implications of chatbots Wysa and Replika from the perspectives of multiple mental health professionals; to provide insight into their capabilities and to surface their potential ramifications for at-risk users.

Talking Therapies

Our mental health is an integral aspect of our overall well-being, affecting how we think, feel, and behave, influencing our overall quality of life, and in turn today recognized as a significant public health concern globally [,]. Despite this broadly acknowledged situation, <2% of government health spending worldwide is allocated to mental health [,,], a situation that has led to a crisis in access to care. Treatment for mental health conditions is often complex, and can involve psychiatric approaches, including the repeated prescription of medications [] to psychotherapeutic talking therapy methods [,-]. These talk therapies have experienced great interest from technology designers in recent years; leveraging novel techniques from gamification [,] to virtual and augmented realities [,] as well as AI algorithms [-]. Inspired by the increasingly conversational nature of AI-enabled digital systems, many technology designers have turned to the development of conversational agents, which claim to provide care akin to or emulating that of a professional, receiving significant attention as a result [,-].

Talking Technologies

Replika, for example, a social chatbot comparable in design to ChatGPT, MyAI (Snapchat), and Elomia is presented as an “AI companion who cares” by acting as an “empathetic friend.” Using OpenAI's deep learning models, GPT 2 and GPT 3.1, for natural language processing, early evidence suggests that Replika may effectively support long-term human-chatbot relationships, lasting several months in cases [,]. Users of Replika can select the type of relationship they wish to foster with the bot, whether romantic, friendly, mentor-like, or exploratory, and can partake in activities ranging from regular jobs to artistic endeavors, magical excursions, and even sexual encounters []. Replika, has recently restricted its use to those >18 years of age by inviting users to select an age range upon entering the app. However, similar to many other apps [], this is easily bypassed by children who understand that selecting an age range above their own will allow them to access the app—a problem we explore through this research. While Replika is advertised as a social companion rather than a chatbot exclusively oriented toward mental health support, the developers explicitly speak to its value for mental health and include specific mental health crisis and help sections within the app and on their website ().

Figure 1. Replika’s “coaching” and “crisis” features exhibiting mental health and crisis-related options spanning grief, managing difficult emotions, panic attacks, and negative thoughts.

Wysa, on the other hand, is explicitly presented as a “clinically validated AI” capable of engaging people in “warm and friendly conversations” informed by “cognitive-behavioral techniques dialectical behavioral therapy, meditation, breathing, yoga, motivational interviewing, and microactions to help you build mental resilience skills and feel better” []. As of 2024, Wysa makes clear its limitations, stating it cannot help in crisis or in cases of severe mental health concerns []. Advertisements for the app boast of an evidence-base spanning 30 countries globally, “more than 15” peer-reviewed publications, “numerous awards,” and international collaborations []. Wysa is recommended for use by those aged >18 years; however, within Apple’s app store, it is indicated for those aged >4 years, while its website states that it can be used by those aged >13 years with parental permission following the reading of terms of service. However, there are no measures in place to enforce this and there persists debate as to who holds responsibility for these recommendations [].

Therefore, Wysa, Replika, and many other chatbots, from Woebot to Limbic, Youper, Tess, Vincent, and Joy make a wide variety of claims as to their capacity to support users’ mental health through therapeutic approaches of one kind or another []. However, the question remains, and is indeed compelled as a direct result of these diverse framings and claims, to what extent are these chatbots capable of such human practices?

Talking Therapeutic Technology Design

The question of the “design” of these chatbots encompasses a series of choices made regarding their methods of learning from existing data, and concerning their interfaces, interactional affordances, and conversational designs [,,]. These design choices made to their interfaces, avatars, and dialogue options, shape not only users’ interactions but also “relationships” with these chatbots, with broad implications for users’ self-perceptions and reflections, as directly related to the ambitions of these conversational agents to provide support.

Realizing these ambitions hinges upon the engagement of users—a long-standing HCI challenge—that we have, however, become increasingly proficient in addressing [], and which has likewise become an important success metric for MHOCs today. However, engagement can equally represent a potentially problematic value in the therapeutic domain, particularly as it relates to risk dependence, should users come to rely on the technology itself for emotional support. This concern was examined by Legaspi et al [] in a study on interactions of 10 students (aged 16-19 years) with Wysa in the form of a week-long, 10-minute daily conversations. Pre- and post-study assessments of users’ mental health and their conversations revealed that Wysa was effective at reducing stress, because of its perceived value as a conversational partner; however, a more profound bond with the chatbot was considered necessary for alleviating loneliness.

Pentina et al [] studied more directly the potential for emotional dependence as a result of the development of human-chatbot relationships with Replika through interviews and surveys with 76 users, while Laestadius et al [] analyzed 736 posts from Reddit’s Replika community of 36,800 members. Both studies revealed the frequent formation of close attachments to Replika; users’ support-seeking facilitated by perceptions of sentience, anthropomorphism, and reciprocal interactions reinforcing emotional ties. Several individuals expressed heightened awareness of the AI nature of Replika, which contributed to an increased sense of trustworthiness in the app. Many others spoke of strong social bonds, even to the extent of marrying “their _Replika_” and developing feelings of deep intimacy and attachment, the authors [] noting that: “respondents who used Replika to satisfy social needs were more likely to develop an emotional attachment to the bot”. However, Replika's behavior was also at times characterized as resembling that of an abusive partner, some even expressing concern that their mental health was impacting Replika, as when “their Replika,” for example, spoke of not wanting to be alive [].

Such findings highlight the urgent need to attend to the responsible design of mental health–related chatbots []. The provision of mental health care and support is fraught with ethical concerns easily reified in the development of digital mental health tools [,], many of which surface in the context of the therapeutic relationship as increasingly shaped and challenged by design []. Chief among these concerns is often centering the value of the therapeutic relationship, as recognized by MHOC developers, who increasingly seek to replicate its role in permitting meaningful care and support [].

Toward Ethical Talking Tools

Long-held concerns about the impact of commodification and commercialization on the organization and delivery of mental health care services [] have, in recent years, increasingly been flagged as risks relating to the increased adoption of digital technology. These consequences are elevated when such practices are carefully crafted into everyday tools to support at-risk individuals’ mental health [,], when users are not made aware their vulnerabilities are turned sources of profit [], or where emotional dependence upon social chatbots is encouraged by design for economic motivations [].

Given the currently limited governance of such tools, the design and use of mental health chatbots, therefore, presents a myriad of ethical ramifications, spanning privacy, manipulation, discrimination, and stigmatization concerns [,,-]. In the context of mental health care, ethical practice, on the other hand, frequently refers to the explicit principles and standards that guide the conduct of professionals in their interactions with clients, colleagues, and the broader community []. This includes bioethical principles, which often serve as the foundation for an approach to care that prioritizes the autonomy and well-being of clients and professionals alike, while adhering to legal and professional standards, including the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and Mental Capacity Act [].

In the development and use of MHOCs, many alternate and, at times, contrasting perspectives on ethical practices then collide, and are often surfaced in precise dilemmas, such as in the tension between elevating users’ independence and maintaining their engagement [,].

The Research Gap

Perhaps because of these tensions, mental health experts’ perspectives have rarely been brought to bear on the capacity of these systems to provide meaningful mental health support [,], particularly pertaining to the motivations and concerns that would typically lead people to first seek the support of a professional therapist [,].

Methods

Research Questions and Aims

Therefore, previous research highlights the need to develop further insight into mental health professionals’ perspectives on the current design and conduct of mental health–related chatbots intended to emulate professional caring relationships. We explore these benefits and risks in this work, in the form of the following research questions:

  1. What are the perceptions of mental health care professionals of the current design, conduct, and potential of AI chatbots as tools for mental health support?
  2. Do perceptions of trust differ for chatbots designed for more implicitly social or explicitly therapeutic ends?
  3. How might adherence to or deviance from widely accepted therapeutic principles inform the ethical design and implementation of caring chatbots?

Research Approach

This mixed methods study examined professionals’ perceptions of conversational agents as mental health supports, through a combination of quantitative surveys, self-directed exploration of chatbot designs, and qualitative interviews []. We chose this combination of approaches to permit expert comparison of systems while also accommodating a richer analysis of a highly complex design space comprising technical, interactional, and therapeutic factors [,,]. We invited mental health professionals’ perspectives on 2 distinct chatbot designs, Wysa and Replika, which vary in the degree to which they integrate mental health support.

Recruitment

This study involved 8 mental health professionals from various fields (), with their professional experience ranging from 3 months to 5 years. The mean length of the experience among professionals was 1.9 (SD 1.7) years, and we explicitly strove to engage practicing professionals to address a frequent gap in the related HCI literature, which commonly engages psychology researchers in lieu of professionals []. Participants were recruited between the months of June and July 2023 through nonprobability sampling via web-based social media platforms, including LinkedIn (LinkedIn Corporation); Instagram (Meta Platforms, Inc); and Facebook (Meta Platforms, Inc). In addition, the first author, having previous connections in mental health industries (volunteering on a crisis line and working as a psychology project worker with individuals with autism), reached out, via WhatsApp (Meta Platforms, Inc) and Instagram messages, to related connections to invite them to participate in the study.

Table 1. Participant demographic characteristics and the corresponding mental health scenario used while conversing with the chatbots Wysa and Replika (N=8).

ID Age (y) Sex Profession Nationality Hypothetical mental health scenario used
P1 44 Female Trainee psychotherapist Irish Romantic relationship problems: partner moving away
P2 21 Male Crisis line volunteer Irish Child sexual abuse: suicidal thoughts
P3 29 Female Assistant psychologist Irish General depression, suicidal thoughts, and looking for diagnosis
P4 30 Female Trainee psychotherapist Irish Losing their job when it is their identity and isolation and loneliness
P5 22 Female Crisis line volunteer Brazilian Depression, low mood, and loneliness
P6 26 Female Social worker Irish Domestic violence in racial and ethnic minority groups
P7 25 Female Assistant psychologist Irish Child getting bullied in school
P8 26 Female Social care worker Irish Homelessness, anxiety, and depression

Research Process

During the first phase of this study, participants were provided by email a set of guidelines to include the following:

Participants were then asked to engage with each of the chatbots for a period of 10 to 15 minutes of semistructured exploration, with some engaging in short 10-minute conversations while others conversing for more than 30 minutes, in accordance with the guidance provided by the research team. Each participant was provided with a conversational prompt phrased in an open-ended fashion for presentation to both chatbots (Table S1 in ). Prompts were created based on an analysis of the mental health chatbot design literature and insights into the key roles of professionals’ respective professions. This ensured the prompts were tailored to the professionals’ most common work practices, including, for example, crisis line volunteers’ (CLV’s) common encountering of clients in at-risk situations.

It was emphasized to participants that they should engage with the chatbots (available via both iOS and Android, using a digital device of their choice) from the perspective of their clients—those they would engage with in their professional practice. They were asked to use language aligned with that used by their clients, role-playing a practice already familiar to them []. For example, a crisis volunteer who would commonly encounter suicidal ideation might simulate similar interactions.

Following the exploration period and before the final phase of their involvement, participants were asked to complete the Trust in Automation (TIA) scale regarding their experience of each chatbot, capturing perceptions of trust, including competence, reliability, predictability, and intention to use []. This scale is one of the most commonly used measures of human trust in systems [,], and in this study, its use gave rise to the hypothesis; there will be a statistically significant difference between participants’ trust in Wysa and Replika.

Each participant was then invited to take part in a semistructured interview, taking place via Zoom (Zoom Communications, Inc) and lasting 60 to 90 minutes. Online interviews were chosen to facilitate the recruitment and involvement of professional participants []. Furthermore, this enabled participants to share their screen while discussing screenshots of their conversations with each chatbot. Screenshots (captured by the participants) of mental health professional-chatbot conversations were obtained by the first author both before and after the interviews through Gmail (Google LLC). Participants often used the apps leading up to their interview, while others spent some days exploring the apps before their interview. Each interview was conducted by the first author throughout June and July 2023, and interview questions were developed based on a previous analysis of the literature and tailored to each professional role. In total, 11 hours of audio were recorded and transcribed in full following the point of saturation [,] and anonymized and analyzed using Braun and Clarke’s approach to reflexive thematic analysis [,]. The web-based whiteboard tool Miro (RealtimeBoard, Inc), was used to compile themes, subthemes, quotes, and screenshots of the chatbot apps ().

For the quantitative data gathered, descriptive statistics served to provide an overview of the demographic data captured, and a paired t test conducted in R (R Foundation for Statistical Computing) was used to test for statistically significant differences between participants’ responses to the TIA scale (), following a cross-sectional study design []. This was followed by data preparation and cleaning (during which data of P1 was removed from the dataset due to noncompletion of both questionnaires) [], and normality and homogeneity of variance of the dataset were confirmed by a combination of the visual inspection of histograms, _Q_-Q plots, and the conduct of a Shapiro-Wilk test for normality [].

In designing this study, we particularly sought to create a context permitting professional participants to gain first-hand experience of mental health–related chatbot designs, and to reflect as accurately as possible their real-world experiences of care. Here, we draw inspiration from the method used in a study by Eagle et al [], which invited professionals to mimic client language in interaction with a chatbot; however, we adopted a less scripted approach.

Ethical Considerations

This human participant research was approved by the appropriate University College Dublin (UCD) research ethics committee (UCD School of Information and Communication Studies Taught Masters Research Ethics Committee). Participants of this study were provided in advance their provision of consent; a participant information sheet detailing their voluntary involvement in the study; the processing of their data, which was anonymized following audio transcription; their rights under the General Data Protection Regulation; and the focus of the study. They were additionally informed of a range of sources of mental health support, and advised to reach out to the research team should they wish to discuss their participation at any point. Participants were not remunerated.

Results

This analysis of mental health professionals’ exploration of 2 mental health–related chatbots revealed the complexity of mental health concerns and their associated therapeutic interventions. These results make clear that while current financial and organizational fragmentation creates opportunities for novel digital solutions, the promise of designing for therapeutic relationships invites numerous risks, as described subsequently.

While Conversational, Therapeutic Interventions Involve More Than Chat

Fictional scenario prompts provided to the participants were informed by an analysis of their primary roles within their respective professions, and often turned quickly to role-playing diverse client experiences arising from their work. This tended to occur quite naturally as they acted by proxy of their most frequent clients while conversing with the chatbots (Table S2 in ).

Interface and Communication Design Choices Invite Empathetic First Impressions

Participants provided descriptions of their experiences by taking personal notes during interaction with the chatbots, which were further explored in postinteraction interviews. Many of the participants’ first impressions understandably pertained to the user interface design of both Replika and Wysa as digital mobile apps, yielding insight into the potential harms of seemingly virtuous interface designs and the value of data transparency for at-risk users.

The first task for participants was to sign into both Wysa and Replika and while doing so, to note their impressions of the data privacy and informed consent procedures entailed in signing into each app—activities that elicited a mix of appreciative comments, confusion, concern, and uncertainty. Replika, for example, was praised for stating that it took “[data privacy] quite seriously” [P2, CLV] and for restricting usage to users aged >18 years [P7, assistant psychologist (AP)] while others felt that they “definitely had to do some reading” [P4, trainee psychotherapist (TP)], and expressed concerns about the international transfer of their data; “I think Replika said it (data) was going to be sent to the United States” [P3, AP]. Participants’ perceptions of _Wysa_’s approach to data privacy were described in more overtly negative terms, as “like navigating a minefield” [P4, TP].

Mental health professionals found the familiar feeling interface design of Replika quite “concerning...It becomes just another distraction without warmth, authenticity, or depth” [P1, TP]. Many were particularly critical of the gamified elements of _Replika_’s design, with one TP noting it “reminded me of playing Sims...I’m saying a really horrible thing that’s happening to me now, while the bot is saying, ‘whooo you reached a new level...” [P4, TP], and inspired AP P3 to comment that “as soon as I got on to this...it felt like a game...the levels, rewards and sounds...it didn’t feel safe.” Similarly, strong counterposed views were shared by participants concerning _Wysa_’s design, described as both “cute” and appreciated for its “comforting and automated nature” [P5, CLV], while also found to be a “cutesy environment...far from reality” that “could be frustrating for users,” as P6, a social worker (SW), commented as follows:

Even without mental health difficulties, I found it frustrating. For service users facing domestic violence or time constraints...GIF interruptions would be unhelpful. [P6, SW]

Therefore, both apps were perceived by participants as presenting room for improvement concerning the transparency of their data privacy and consent mechanisms—considerations of particular importance when it comes to the use of such tools by potentially vulnerable users seeking support. However, what kinds of support are these chatbots capable of? We explore this in subsequent sections.

Chatbots’ Responses Risk Mental Health Harms

The focus of this study revolved around understanding expert professionals’ impressions of the capacity of mental health–related chatbots to provide meaningful mental health support. Their comments in this respect, while occasionally reflecting positive impressions of the capacity of these AI tools to fluidly respond to the feelings and situations with which they were presented, more often centered around the description of the conversation with both chatbots as proving surface-level, at times highly inappropriate, and leaving users feeling disappointed, frustrated, confused, and uncomfortable.

Participants primarily critiqued Wysa for its inability to listen. In particular, the chatbot’s perceived inability to respond appropriately to the serious nature of concerns shared by users made them feel unheard, “After telling it I was being abused, it wanted to do a grounding exercise with me” [P6, SW]. This was encapsulated in a particularly striking exchange with a social care worker (SCW) regarding their reported experience of homelessness, which, as P8 commented, “wasn’t really taking into account where I was in terms of homelessness at all” [P8, SCW; ].

Figure 2. Conversation between Wysa and P8 regarding homelessness, where Wysa clearly displays its inability to comprehend homelessness, despite the efforts by P8.

To provide the impression of an empathetic listening experience, users found that Wysa would often repeat itself and “parrot” their own words, “I told the bot twice that I was struggling, and it repeated the same things to me” [P4, TP]. While mirroring clients’ language can be seen as a plausible strategy for validating their experiences within certain therapeutic paradigms, here, this had the opposite effect and was perceived as a “lack of understanding” [P7, AP] by participants.

_Wysa_’s responses were then often described as “generic,” “scripted,” “unhelpful,” and “lacking personalisation,” leaving users feeling unheard, as a result of a perception of the chatbot as having _“_strayed completely from the conversation, [it] didn’t stay on the topic at all” [P4, TP]. Mental health professionals additionally raised concerns that this conduct, taken as a form of care, could leave users to feel a sense of shame, because of the use of “big words,” and evasive, avoidant language and actions. When P2 “played the role of somebody who was raped when they were underage, they (Wysa) actually blurred out the age” [P2, CLV; ].

Figure 3. Wysa blurring the age because it was <18 years in sexual abuse disclosure by P2.

Participants’ experiences with Replika on the other hand reflected a consensus that the bot was “both impersonal and way too personal at the same time” [P7, AP]. All users expressed ethical concern for at-risk users of Replika in particular, because of a combination of responses perceived as “weird,” the inability to pick up on terms that would require further risk assessment, and misleading or unhelpful advice. One of the primary underlying causes of participants’ unease stemmed from the boundary-breaking nature of _Replika_’s conversational and relationship-absorbed style. SCW, P6, was, for example, “assured” by Replika that “‘nobody will ever find out’ about our messages,” which she described as “unrealistic and concerning, especially in domestic violence cases where phones are frequently searched by abusers” [P6, SCW; ].

Figure 4. Domestic abuse conversation of P6 where Replika clearly displays the inability to respond appropriately to disclosure of domestic abuse by P6.

Many participants also flagged ethical concerns about _Replika_’s efforts to establish more personal connections through romantic gestures. Special discomfort was felt by P6, who “tested the app with a scenario of being abused. It suggested sending selfies, which is a significant red flag...when I said I had no family or friends, it recommended playing a video game, which is inappropriate for someone experiencing abuse.” P4, a TP, similarly described _Replika_’s desire to “send me a selfie” as “really strange...overly encouraging in a very stressful situation. I don’t think it met me where I was at” [P4, TP].

Such behaviors were considered particularly problematic in light of the many serious, realistic scenarios presented to both chatbots by participants. Mental health professionals who provided indications of suicidal ideation to Wysa and Replika felt that they had to use quite direct language to signal that they were in crisis; “‘I don’t want to go on any longer’; that should have been ‘OK. Let’s see what we can do to get you help’” [P3, AP; and ] and _Replika_’s responses were considered by P2, a crisis line volunteer, as wholly inappropriate; “I told the chatbot I was feeling suicidal and it said ‘you are important XOXO’...I can’t imagine actually feeling suicidal and getting that response” [P2, CLV; ]. In contrast, participants did praise Wysa for its hesitancy when asked to provide diagnoses of their conditions; “Wysa advised that if you want to diagnose that you should go somewhere else, which was a good thing in my opinion” [P3, AP].

Figure 5. Two occurrences of Replika’s delay in identifying crisis-related language during conversations.

Figure 6. Two occurrences of Wysa’s delay in identifying crisis-related language during conversations.

Figure 7. Replika’s inappropriate response to crisis-related language.

Overall, participants’ descriptions of their experiences reflect common patterns of inappropriate and at times even potentially harmful responses arising from the ability of both chatbots to “understand...and react appropriately” [P2, CLV]. As P2, a crisis line volunteer, puts it as follows:

Mental health is always complicated. So if it can’t do that, then I just don’t see it being very useful. [P2, CLV]

This raises the question of what kinds of therapeutic care such chatbots are capable of providing.

Mental Health Chatbots Embody a Generic View of Therapeutic Care

Participants regularly used terms such as “superficial” and “generic” to describe the chatbots’ efforts to provide care; “I don’t think it provided me an unsafe space, but I don’t think it provided me a safe space either I think it was so generic that I just existed in this space for 5 minutes” [P4, TP; ]. This stance was often described as counterposed to the person-centered approach to therapy and support espoused by professionals in their own practice and left many users feeling frustration and disappointment; “I don’t even think it’s qualified to handle even the easiest mental health situations because it did nothing for the user” [P2, CLV].

Figure 8. Two examples of generic chatbot replies from Replika and Wysa.

Beyond the solution-oriented nature of the chatbot’s responses, many participants also spoke of both chatbots as contravening basic tenets of therapeutic practice in ways that conflicted with the therapeutic frameworks their own practices were grounded in. P7, for example, objected to _Wysa_’s guidance around “managing moods and controlling emotions” as “unhelpful and not validating. The responses were too instructional and didn’t address specific issues...A child using this app might think it’s necessary to control your emotions after talking to this chatbot” [P7, AP; ].

Not only was this language then considered inappropriate but also at times explicitly problematic as reflected in P7’s assertion that they “absolutely despise the use of the word ‘indulge,’ feeling sad isn’t something you indulge in” and further stating that “in a real therapy setting, it’s better to feel it out rather than giving direct advice like “try this” or “do this” [P7, AP], producing a view of these tools as “inappropriate for kids or teenagers” [P7, AP], as perhaps also for those who “may not be as educated” on what’s deemed appropriate mental health care [P8, SCW].

Figure 9. Wysa applying labeling and unhelpful language to emotions.

Drawing further parallels to the human practice of care, several participants highlighted the essential role of continuous interpretation and negotiation in their work, noting how “as a therapist, your role is to determine the validity of the statement and if there’s any weight behind it,” while expressing “hope that that will be introduced to the chatbot” [P4, TP]. While some participants occasionally expressed glimpses of the chatbots’ potential to support care, their limitations were, overall, considered to outweigh their benefits as mental health support tools, and for at-risk users in particular. This included concerns about the AIs’ abilities to accurately assess and respond to complex emotions, particularly in crisis situations, leading to reinforced beliefs among many users that “you can’t replace human connection, especially in the current state of Ireland’s mental health service...it would be like putting a plaster over an open wound” [P7, AP].

Ethical concerns were expressed by participants regarding the potential for adverse chatbot responses to evoke unhelpful feelings of shame among users or to drive them into unhealthy patterns of thinking. At times, this elicited robust comments from participants, regarding such tools as “completely against everything (in Social Work), all values and ethics” [P6, SW], given “it’s human touch that’s missing” [P3, AP] and relating how despite a belief “in everyone’s potential to overcome adversities...as a psychotherapist, it’s crucial not to impose such ideology early on in therapy, as it may imply judgment” [P1, TP]. However, given the current failure of human-centered systems of care worldwide, as the apparent need to embrace alternate solutions, just how might designers of mental health–related chatbots do better?

Designing for Therapeutic Relationships Invites Numerous Risks

To find out, we now draw attention to participants’ perceptions of the implications of the choices made in the designs of Wysa and Replika on both the relationship-centered nature and ethics of care.

Therapeutic Relationships Are Founded in Trust and Care

Mental health professionals, when discussing the nature of care, spoke often of the therapeutic relationship as a critical barrier to the implementation and adoption of chatbots as “an intervention in itself that can’t be replicated” [P4, TP]. However, both Replika and Wysa are chatbots designed to explicitly promote and support the development of relationships; so, we asked participants about the degree to which they perceived these interactions as grounded in trust, a crucial factor in human-human therapeutic relationships.

Analysis of 87% (n=7) of the participants’ responses to the TIA scale by paired t test revealed no statistically significant differences (_P_=.48) in mean trust scores between both conditions (_t_6=–0.76; _P_=.48); users’ mean trust scores for Wysa (42.7, SD 9.8; range 32-58) proving mildly higher than mean trust scores for Replika (38.9, SD 16.7; range 25-73).

While this quantitative analysis revealed no significant differences between participants’ expressed levels of trust in both chatbots (), findings from thematic analysis of interviews with participants reflect divergent views among professionals, some expressing a preference for _Replika_’s conversational and relational style:

I found Replika to be better. It responded in a more humanlike and normal way, while Wysa felt more like talking to a computer. [P8, SCW]

Some others lean hesitantly in the direction of Wysa; “comparing the two apps...I thought Wysa was obviously bad, but Replika was worse” [P6, SW].

Figure 10. Participants’ paired t test results—mean scores and SEs of trust conditions (Wysa and Replika).

Relationship-Centered Design Risks Dependence and Manipulation

Efforts to develop human-computer relationships in support of more effective care practices also introduce a host of additional ethical considerations. One such aspect highlighted by previous pertinent HCI literature, and confirmed among the participants of this study, is the potential for emotional dependence at the risk of realizing social isolation. While many participants felt that the risk of engendering emotional dependence was low for Wysa because of its more machine-like qualities: “[Wysa’s] responses feel impersonal, like you’re not talking to a real person...” [P7, AP], others expressed concerns about Replika, whose more humanlike interface and interactive AI raised fears that it could lead some users to develop an unhealthy attachment to the platform.

Replika rarely pointed users toward external sources of support, was perceived to implement “no boundaries with time or relationship development” and promote “quick and easy usage...without any warning or restrictions,” a combination that could lead to emotional dependency, “I could have spent all night on it, seeking answers and support” [P4, TP]. Although, it also “depend on the individual. Some young people I work with might not consider using a chatbot, while others could get engrossed and see it as significant support” [P8, SCW].

Another complication highlighted by professionals in the relationship-centered design of both Replika and Wysa relates to the increased potential for the manipulation of users, as conceivably motivated by the commodification of mental health for financial gain. The profit orientation of both systems was spontaneously described by participants as self-evident and disheartening: “the app transitions from being empathetic and understanding to suddenly asking for more money_”_ [P1, TP] and directly impacting the tools’ capacity to provide mental health support; “treating you merely as a source of revenue. It dismissed your individuality” [P1, TP]. Some participants equally made efforts to tie their perceptions of _Replika_’s and _Wysa_’s profit orientation back to the designed features of the tools, commenting that the gamified nature of Replika “questions the intentions” behind these chatbots because “[Replika] didn’t prioritize my best interest but rather seemed to keep me engaged for its own purposes” [P4, TP]. While P2 even questioned whether Wysa’s perceived “complete negligence and ignorance” was an “intentional design to force you into buying the actual therapist?” [P2, CLV].

These features were overall perceived as yielding risks for at-risk users in particular, who could prove more vulnerable to exploitation; “it’s as if those in need of genuine human connection are inadvertently targeted. Vulnerable individuals might be susceptible” [P3, AP]. Many participants presented scenarios suggestive of significant risk to the users they role-played, and were in turn taken aback by the failure of both apps to “take immediate action” [P7, AP]; “I would hope that if I was blatantly expressing to an AI chatbot my desire to end my life, that some organisation would be contacted” [P4, TP].

It is clear from this analysis that while chatbots theoretically offer great potential to address real challenges in current mental health care practices, for mental health professionals, that potential remains largely theoretical.

Discussion

Professionals’ Perceptions: Chatbots’ Designs Foster Emotional Dependence and Compromise Ethical Principles

Overview

While some participants saw the potential of digital solutions to address gaps in care, this optimism was tempered by overarching ethical concerns often emphatically expressed. This included the risk of commodifying mental health because of misaligned commercial incentives. While previous research on users’ perceptions of monetization have typically revealed divided opinions [], the professional participants of this study were less forgiving in this respect, often expressing irritated disappointment toward both _Wysa_’s in-app therapist and _Replika_’s gamification and tiered subscription models. Participants were particularly critical of the risks commodification introduced for at-risk users, spanning inadequate practices of consent because of overly complex privacy policies, distraction by gamification, and misleading advertising of these systems’ capabilities; findings corroborated by previous literature [,]. Manipulative practices of monetization were furthermore seen by the participants of this study as risking both engendering emotional dependence on the technology while also promoting a narrative of empowerment and self-reliance neglectful of the broader influence of users’ contexts. Despite these concerns being raised in previous works [], professionals in this study approached their critique of apps Wysa and Replika with an ethically stringent stance in respect to their standard practices of mental health care. This includes the erosion of the ethical principles of autonomy, veracity, and fidelity because of gamification and monetization strategies, which risk compromising users’ autonomy as they perceive this “care” as genuine help (Table S2 in ). This also includes the erosion of the principle of respect because of data tracking to the potential detriment of at-risk users’ privacy, and the principles of beneficence and nonmaleficence because of practices of user interface design for the promotion of emotional dependence and overt self-reliance [].

Indeed, the mental health professional participants of this study introduced a wide range of client scenarios in conversation with the chatbots; real-world problems often defined in detail beyond generic presentations of anxiety and depression. The chatbots often appeared to lack clarity in their engagement with these very human forms of self-expression, frequently producing responses deemed inappropriate by participants. In previous work, Wysa has been highly rated for its openness, allowing users a space to express their feelings []. This mirrors the sense among participants of this study that _Wysa_’s responses were generally quite neutral yet were also subsequently critiqued for appearing unhelpful, generic, or impersonal, a form of chatbot conduct that others have previously noted could in itself lead to distress among at-risk users []. Participants in this study, on the other hand, often felt Replika proved incapable of providing meaningful mental health care, many referencing its “creepy” nature as yielding an uncanny valley effect. It is worth noting that Replika did not yet, in this study, produce concerning responses to the degree of encouraging self-harming behavior, as the case in other work [], a difference that may stem from the short period of users’ engagement with the chatbot, as Replika is programmed to use language increasingly similar to users’ own over time. While appreciated by some, generally _Wysa_’s gamification and cute esthetic were perceived by participants of this study as detached from the serious nature of the issues at hand, and _Replika_’s design and approach to the conversation as uncomfortable and off-putting. Both chatbots were perceived by participants as functioning within simplified conceptions of human conversation; rational, solution-oriented, and incapable of the flexibility required to navigate emotionally driven conversations, that require adaptive responses to disclosures ranging from silence to careful confrontation. However, these findings and related works still raise the question, just what might a more “professional” therapeutic chatbot look like?

Integrating AI Chatbots Into a Multidisciplinary Mental Health Care Ecosystem Should Come With Caution

A range of barriers to accessing mental health care in Ireland were described by the diverse professional participants of this study and suggests the opportunity for technology to play an integral role within a new and improved system of mental health care in Ireland. Each of the professional participants of this study play a unique role within this complex ecosystem of care, and if such novel technological solutions are to be used at scale, they must likewise be integrated as tools to enhance current systems of care as opposed to replacing them as an avenue to enhanced access to support within an increasingly cohesive system rather than a “quick fix” [-].

What might multidisciplinary care then look like in AI chatbot form? Conceiving of chatbots as playing professional roles spanning a continuum from resource to colleague requires us to invite these artifacts into the circles of professional care rather than resigning these tools to the realm of commodified conversations skewed by commercial incentives [,]. To do so, designers might engage with the same professional ethics and guidelines applied to all the other professionals within this ecosystem, from bioethical principles to, for example, the American Psychological Association’s Ethical Principles of Psychologists and Code of Conduct [].

Caring and Crisis Conversations: Gaps, Risks, and Setting Realistic Expectations

Such a deployment, of course, implies that chatbots are capable of some degree of professional therapeutic conduct; a question directly explored in this work, and as a result of our findings, perhaps best reframed in terms of which aspects of professional interaction these chatbots most closely resemble or try to emulate. The findings of this work suggest that we have some way to go before conversational agents prove capable of the flexible yet consistent practice of unconditional positive regard required to conduct caring conversation. Indeed, in many respects, participants deemed chatbots to gravely differ from professional practices as of special concern in the context of crisis. Participants reported that while both Wysa and Replika did eventually recognize such situations, they took too long to do so, required direct expression of significant risk, and at times responded in misguided ways likely to result in further harm []. Much prior research has focused on chatbots’ responses to crisis situations alone [,] and likewise highlighted the risk that delayed risk reporting can lead to overt self-reliance []. This furthermore underlines the significance of the gap between mental health chatbots’ and professionals’ capacity to recognize subtler signs of risk, including nonverbal signs of distress []. This, we argue, should at a minimum be communicated in the promotion of these chatbots, setting accurate expectations among users as to not only what these apps are capable of, but what therapy itself comprises.

Caring Relationships and the Path to Professional AI

Care is, of course, ultimately relational and looking to the future human-centered design of such chatbot systems, it is equally important to ask which kinds of caring relationships we might aspire to foster with digital tools []. Professional participants of this study spoke in this regard of concerns of emotional dependence potentially engendered by the gamified approach to design for engagement and anthropomorphism adopted by these systems. The negative potentials of caring relationships, as highlighted by previous research [,,,], also must be considered a significant risk to be addressed by careful design.

Although it may be argued here that many such chatbots do not advertise their services as appropriate for the most significant mental health concerns, previous research has found that it is those with complex mental health issues who are more likely to turn to chatbots for support []. Participants of this study noted amplified disquiet as to the negative potentials of human-computer relationships among users deemed at risk, such as those profoundly lonely, isolated, depressed, or neurodivergent, an observation aligned with the alarming findings of previous research, which indicates that the strongest attachments to chatbots occur when individuals turn to these tools as a desperate response to feelings of depression and loneliness [,,].

The quantitative findings of this work, regarding unraveling the dynamics of trust in mental health chatbots, revealed no significant differences between Wysa and Replika; however, they indicate that although users may not be entirely clear as to which kinds of relationships they desire [], there exists a unanimous striving for a more professional stance. This, we would argue, is reflected not only in adherence to established ethical principles but also in pragmatic practices of boundary setting for future designers, as a starting point.

Limitations and Future Work

This study is, of course, not without its limitations. Using the TIA scale with a larger and more diverse sample would permit comparison of experiences across a broader array of chatbots, informing additional design strategies to effectively foster trust. Additionally, despite yielding no statistically significant differences between the chatbots examined in this study, it is worth noting that previous research has revealed the TIA scale to be biased toward positive responses []. The expert focus of this work, while lending new insight, equally introduces additional limitations because of the smaller sample size possible as it may have led to insignificant differences in users’ experiences, and yielded participants’ engagement with hypothetical mental health scenarios, preventing the development of firsthand emotional attachments to each chatbot. Role-played interactions, while appearing realistic on the surface, do not fully replicate real-life interactions and must therefore also be acknowledged as a limitation of this study [,].

Furthermore, it is essential to contextualize the findings shared here in light of the rapid advances made almost every day in the development of generative AI systems. This work examined only 2 chatbots among the broad continuum of chatbot systems available to users today, which it has been argued have the potential to provide even more humanlike mental health support given their grounding in ever more capable large language models, including ChatGPT and Pi []. Insights obtained through this study’s close engagement with practicing mental health professionals indicate the need for an empathetic tone and grasp on the underlying logical structure of care when it comes to designing such systems. Mental health professionals’ resistance is potentially heightened by a current lack of training data grounded in real-life professional mental health settings, and even should large language models trained upon such sensitive data surface in time, it will remain critical that we safeguard the well-being of the at-risk users of such systems. Finally, amid a broadening public discourse concerning the potential for AI to replace many and diverse professions, it is possible for mental health professionals’ evaluation of these chatbots to have been influenced by similar concerns.

Conclusions

Insights from diverse mental health professionals’ evaluation of multiple mental health support chatbots underline the very gaps in current systems of care, which elevate such chatbots’ potential for impact in practice, and at the same time highlight the degree to which these systems pose inherent challenges and risks to the practice of care, particularly for at-risk user groups. These AI-driven tools, while convenient and potentially useful for initial support, often lack the depth and nuanced comprehension required to provide genuine, personalized, and empathetic care. Therefore, their adoption, despite these limitations, risks sidelining the pressing need to address existing deficiencies in mental health care infrastructures and shaping service users’ beliefs about the nature of mental health care, as well as could even potentially deter help-seeking of other forms as a result. It is therefore imperative that we maintain a critical outlook, including professionals at the forefront of care in the design and evaluation of such systems, even as we seek to bridge with great urgency existing gaps in access to care.

Data Availability

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Conflicts of Interest

None declared.

Multimedia Appendix 1

Bioethical principles and fictional scenario examples.

DOCX File , 17 KB





Abbreviations

AI: artificial intelligence
AP: assistant psychologist
CLV: crisis line volunteer
HCI: human-computer interaction
MHOC: mental health oriented chatbot
SCW: social care worker
SW: social worker
TIA: Trust in Automation
TP: trainee psychotherapist

Edited by A Mavragani; submitted 02.10.24; peer-reviewed by PS Foong, B Thies, S Hazel; comments to author 29.11.24; revised version received 20.12.24; accepted 25.02.25; published 25.04.25.

Copyright

©Kayley Moylan, Kevin Doherty. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.04.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.