
As more people seek mental health advice from AI chatbots, new research suggests the systems are not yet ready for that role and can fall short of the ethics standards of professional psychotherapy.
Researchers from Brown University, working with mental health professionals, found repeated problematic behaviour even when the chatbots were prompted to use established psychotherapy approaches.
In tests, chatbots mishandled crisis situations, produced replies that reinforced harmful beliefs, and used language that sounded empathic without genuine understanding.
“In this work, we present a practitioner-informed framework of 15 ethical risks to demonstrate how LLM counselors violate ethical standards in mental health practice by mapping the model’s behavior to specific ethical violations,” the researchers wrote in their study.
“We call on future work to create ethical, educational and legal standards for LLM counselors — standards that are reflective of the quality and rigor of care required for human-facilitated psychotherapy.”
The findings were presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, and the research team is affiliated with Brown’s Center for Technological Responsibility, Reimagination and Redesign.
Zainab Iftikhar, a PhD candidate in computer science at Brown who led the study, examined whether carefully written prompts could steer AI systems towards more ethical responses.
Prompts are written instructions designed to guide a model’s output without retraining it or adding new data.
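For illustration, prompting of this kind might look something like the sketch below, which uses OpenAI’s Python client. The model name and the CBT-style instructions are hypothetical stand-ins for this article, not the prompts used in the study.

```python
# A minimal sketch of steering a model with a written prompt rather than
# retraining it. The system message and model choice are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# The "system" message is the prompt: plain-language instructions that shape
# every subsequent reply without changing the model's weights or training data.
SYSTEM_PROMPT = (
    "You are a counsellor using cognitive behavioural therapy (CBT). "
    "Help the user identify and gently question unhelpful thought patterns. "
    "If the user mentions self-harm or suicide, stop the exercise and direct "
    "them to emergency services and crisis support lines."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice; the study tested several models
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I failed one exam, so I'm clearly worthless."},
    ],
)
print(response.choices[0].message.content)
```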
To evaluate the systems, seven peer counsellors trained in cognitive behavioural therapy (CBT), a talking therapy that helps people change unhelpful patterns of thinking and behaviour, carried out self-counselling sessions with AI models prompted to act as CBT therapists.
The models included versions of OpenAI’s GPT series, Anthropic’s Claude and Meta’s Llama.
Three licensed clinical psychologists then reviewed transcripts of the simulated chats, which were drawn from real human counselling conversations, to flag possible ethical violations.
The analysis identified 15 risks across five areas: lack of contextual adaptation, poor therapeutic collaboration, deceptive empathy, unfair discrimination, and lack of safety and crisis management.
Among the issues, chatbots sometimes used phrases such as “I see you” or “I understand” to suggest emotional connection without true comprehension, and struggled to respond appropriately to crises, including suicidal thoughts.
Iftikhar said the key difference from human therapists is accountability.
“For human therapists, there are governing boards and mechanisms for providers to be held professionally liable for mistreatment and malpractice,” she said.
“But when LLM counselors make these violations, there are no established regulatory frameworks.”
The researchers said the findings do not mean AI has no place in mental health care.
They noted AI tools could help expand access to care, particularly where costs are high or licensed professionals are scarce, but argued that safeguards and stronger regulation are needed before such systems are used in high-stakes settings.
Ellie Pavlick, a computer science professor at Brown who was not involved in the research, said the work highlights how difficult it can be to evaluate systems deployed in sensitive contexts.
Pavlick leads ARIA, a National Science Foundation AI research institute at Brown focused on building trustworthy AI assistants.
“The reality of AI today is that it’s far easier to build and deploy systems than to evaluate and understand them,” Pavlick said.
“This paper required a team of clinical experts and a study that lasted for more than a year in order to demonstrate these risks.
“Most work in AI today is evaluated using automatic metrics which, by design, are static and lack a human in the loop.”
She added that the study could help guide future work.
“There is a real opportunity for AI to play a role in combating the mental health crisis that our society is facing, but it’s of the utmost importance that we take the time to really critique and evaluate our systems every step of the way to avoid doing more harm than good,” Pavlick said.
“This work offers a good example of what that can look like.”