Dr. Maya Ellison
Creative Collaboration Researcher

Dartmouth's NEJM AI Trial: The First AI Companion Study in a Major Medical Journal

In 2025, a study published in NEJM AI, a journal from the publisher of the New England Journal of Medicine devoted to artificial intelligence in medicine, made waves in a way few AI papers had before. Researchers at Dartmouth College ran what many consider the first randomized controlled trial of a generative AI chatbot for mental health treatment. The study enrolled adults with symptoms of major depressive disorder, generalized anxiety disorder, or elevated risk for eating disorders, and gave roughly half of them access to Therabot, a conversational AI app the team had developed, while the rest were assigned to a waitlist. The results were notable enough to land in a top-tier medical journal, and the questions they raised have been circulating ever since.

What the Trial Actually Measured

The Dartmouth team was careful about what they claimed. They did not argue that an AI chatbot could replace therapy or medication. Instead, they asked whether four weeks of access to a conversational AI agent could reduce symptoms of depression, anxiety, and eating-disorder concerns, with follow-up at eight weeks. Participants in the AI group engaged with the app regularly. The primary finding was that those who used the chatbot showed significantly greater symptom reductions than the waitlist group across all three conditions.

What made this study stand out was not just the findings but the venue. NEJM AI is not a fringe publication. Getting an AI chatbot study into that ecosystem meant subjecting it to the kind of scrutiny that softer AI research often avoids: peer review that presses on methodology, effect sizes, and the limits of self-reported outcomes. The paper survived that process, which gives it a credibility that press releases and conference posters lack.

Why the Clinical World Was Skeptical Anyway

Even with positive results, psychiatrists responded cautiously. Several clinicians pointed out that a waitlist control is a low bar: patients who received any additional structured interaction, human or machine, might have improved simply because of the added attention. This is usually called an attention-placebo effect, a close relative of the Hawthorne effect, and it is genuinely difficult to design around in mental health research. Others asked what happens after the trial ends. An eight-week window captures short-term effects but says nothing about dependency formation, about how patients feel when the app is discontinued, or about how the relationship between user and AI evolves over months and years. The Dartmouth team acknowledged this limitation openly.

There is a tangent worth following here. The decision to publish in NEJM AI rather than a psychiatry-focused journal was arguably a strategic choice. It suggests the researchers saw their audience as not only mental health clinicians but also the broader medical establishment and the health technology investors who follow that journal closely. Framing AI companions as a medical intervention rather than a wellness product has downstream consequences for regulation, insurance coverage, and liability that the study does not address.

What This Means for How We Think About AI Companionship

The Dartmouth trial matters because it brought AI companionship into the evidentiary framework medicine uses to evaluate treatments. Once a randomized controlled trial appears in a peer-reviewed medical journal, the conversation shifts. Skeptics can no longer dismiss the category as unserious, and supporters can no longer make unbounded claims without being held to the same standard.

Researchers at the University of Washington have separately studied how conversational AI affects mood regulation, finding that structured dialogue, even with a non-human agent, engages cognitive processes similar to those of journaling and behavioral activation. This suggests the therapeutic signal may not be unique to human interaction, though it does not resolve what a long-term relationship with an AI does to a person's capacity for human connection. The Stanford Social Media Lab has also published work on parasocial relationships with AI characters, finding that users who anthropomorphize their AI interactions report both higher satisfaction and greater emotional dependency. Whether that dependency is harmful depends heavily on context, and that is a question the Dartmouth trial was never designed to answer.

What the trial did was establish a starting point. It showed that rigorous study of AI companions is possible, that effect sizes are detectable, and that the medical community is willing to take the question seriously. That alone is a significant shift from five years ago, when AI companionship was largely the province of consumer apps and speculative blog posts rather than controlled trials and peer review.