Technology
Tell Me About Borderline Personality Disorder! Comparing ChatGPT-3’s Responses About Borderline Personality Disorder to an Expert-Written Fact Sheet
Ariana Santos, B.A.
Student
Texas A&M University-Corpus Christi
Corpus Christi, Texas
Kelli R. Lahman, M.S.
Student
University of Houston
Richmond, Texas
Sean A. Lauderdale, Ph.D.
Assistant Professor
University of Houston – Clear Lake
Richmond, Texas
With advances in internet search capabilities, more people will rely on the internet to find mental health information. Recently, OpenAI released one such advance, ChatGPT-3. An artificial intelligence language model chatbot, ChatGPT-3 delivers search results in a narrative that mimics human speech (Ramponi, 2022). Although ChatGPT-3 was trained with feedback from human responders, it can generate inaccurate and biased results (Ramponi, 2022). Users who are unaware of this may find the narrative responses persuasive. Additionally, most internet searchers do not practice digital media literacy (Stvilia et al., 2009) and trust health information provided by the internet (Pew Internet & American Life Project, 2006) and by chatbots (Abd-Alrazaq et al., 2021).
Given the uncertainty about ChatGPT-3’s information about mental health conditions, we compared ChatGPT-3’s output about Borderline Personality Disorder (BPD) to an expert-written, peer- and consumer-reviewed online fact sheet published by the Royal Australian and New Zealand College of Psychiatrists (RANZCP), using text analysis tools that assess content and narrative characteristics affecting readability. We compared the total ChatGPT-3 output to the total text of the RANZCP fact sheet, and we also compared individual sections of the fact sheet (e.g., “What is BPD?”) to ChatGPT-3’s responses to matching queries.
Specific statistics include cosine similarity, a measure of document similarity based on the cosine of the angle between the documents’ word-frequency vectors. Cosine similarity ranges from 0 to 1, with higher values indicating greater similarity. The non-parametric Mann-Whitney U test was used to compare median differences in vocabulary density (the ratio of unique words to total words), readability index (Coleman-Liau; the grade level required to comprehend the text), number of positive words, and number of negative words. Finally, content will be evaluated by raters masked to the source but knowledgeable about BPD using DISCERN (Charnock et al., 1999; Grohol et al., 2013), a measure validated for rating the quality of mental health content from internet sources across 15 categories.
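To illustrate these measures, the minimal Python sketch below shows how cosine similarity, vocabulary density, and the Coleman-Liau index can be computed from raw text. It assumes a simple regex tokenizer and sentence splitter, which may differ from the text analysis tools actually used in the study; the fact_sheet and chatgpt_text variables are hypothetical placeholders for the two documents.

```python
# Illustrative sketch only; not the study's actual analysis pipeline.
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens from a simple regex tokenizer (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-frequency vectors (0 to 1)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def vocabulary_density(text):
    """Ratio of unique words to total words."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def coleman_liau(text):
    """Coleman-Liau index: approximate U.S. grade level needed to read the text."""
    words = tokenize(text)
    if not words:
        return 0.0
    letters = sum(len(w) for w in words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

fact_sheet = "..."    # placeholder for the RANZCP fact sheet text
chatgpt_text = "..."  # placeholder for the ChatGPT-3 output
print(cosine_similarity(fact_sheet, chatgpt_text))
print(vocabulary_density(fact_sheet), vocabulary_density(chatgpt_text))
print(coleman_liau(fact_sheet), coleman_liau(chatgpt_text))
```

Per-section values computed this way could then be compared non-parametrically, for example with scipy.stats.mannwhitneyu.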
In comparing the total RANZCP fact sheet to ChatGPT-3’s output, the cosine similarity (.67) indicated moderate similarity. There were no significant differences in vocabulary density (Z = 0.56, p > .05), number of positive words (Z = 0.08, p = .85), or number of negative words (Z = 0.62, p = .70). The RANZCP fact sheet’s readability level (mean = 11.07 years of education) was significantly lower than that of ChatGPT-3’s output (mean = 14.81 years of education; Z = 4.01, p < .01). Although ChatGPT-3 provided more information on potential causes of BPD, the RANZCP fact sheet provided more information about treatment and about supporting someone with BPD.
Preliminary findings suggest that ChatGPT-3’s output about BPD is similar to the RANZCP fact sheet in several ways (e.g., vocabulary density). The RANZCP fact sheet had a lower reading level, making it accessible to a wider range of readers. Further data analysis may reveal differences in the content categories of accuracy, relevance, and coverage of treatment options.