Technology
Tell Me About Non-Suicidal Self-Injury (NSSI)! Comparing ChatGPT-3 Output About NSSI to Expert-Written Web Pages
Shealyn Tomlinson, B.A.
Student
Texas A&M University-Corpus Christi
Corpus Christi, Texas
Sean A. Lauderdale, Ph.D.
Assistant Professor
University of Houston – Clear Lake
Richmond, Texas
With advances in internet search capabilities, more people will rely on the internet to find mental health information. Recently, OpenAI released one such advance, ChatGPT-3. An artificial intelligence language model chatbot, ChatGPT-3 delivers search results in a narrative mimicking human speech (Ramponi, 2022). Although ChatGPT-3 was trained with human feedback, it can generate inaccurate and biased results (Ramponi, 2022). Users unaware of this may find the narrative responses influential. Additionally, most internet searchers do not practice digital media literacy (Stvilia et al., 2009) and trust health information provided by the internet (Pew Internet & American Life Project, 2006) and by chatbots (Abd-Alrazaq et al., 2021).
Given the uncertainty about ChatGPT-3's mental health information, we compared ChatGPT-3's output about NSSI to expert-written web pages published by the National Alliance on Mental Illness (non-profit advocacy; United States), the Victoria Department of Health (Australia), Rethink Mental Illness (non-profit advocacy; United Kingdom), and the National Health Service (United Kingdom). These sites were chosen because they provide a breadth of information about NSSI not found on any single webpage. Comparisons between ChatGPT-3's output and the webpages were made using text analysis tools to assess narrative content and characteristics affecting readability and accuracy. We compared the total ChatGPT-3 output to the total text of the webpages, and we also compared individual sections of the web pages (e.g., "What is self-harm?") to the corresponding queries answered by ChatGPT-3. Content will also be evaluated by raters knowledgeable about NSSI, masked to the source, using DISCERN (Charnock et al., 1999; Grohol et al., 2013), a validated measure for rating the quality of internet mental health information. Statistics included cosine similarity, a measure of document similarity based on the cosine of the angle between the documents' word-frequency vectors; greater values indicate greater similarity. The non-parametric Mann-Whitney U test was used to compare differences in vocabulary density (ratio of unique vocabulary words to total words), readability index (Coleman-Liau; grade level required to comprehend the text), number of positive words, and number of negative words. The cosine similarity was .75, indicating moderate similarity. ChatGPT-3's output had a lower vocabulary density (Z = 4.14, p < .01) but more positive words (Z = 3.34, p < .01) than the webpages. The webpages' readability level (mean = 11.69 years of education) was lower than that of ChatGPT-3's output (mean = 13.46 years of education; Z = 2.34, p < .05).
There were no differences in the number of negative words (Z = 1.61, p > .05). Although ChatGPT-3's output was similar to the web pages, there were some qualitative differences in content: the web pages provided more information about the process of self-harm, self-stigmatizing beliefs, and how NSSI may differ across cultures. Findings suggest that ChatGPT-3's output used easier vocabulary and more positive words, but the advocacy and governmental webpages were easier to read overall. Further data analysis may reveal other differences in accuracy and treatment coverage.
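The cosine similarity statistic described above can be illustrated with a minimal sketch. This is not the tool used in the study; it assumes simple whitespace tokenization and raw word counts, whereas dedicated text analysis software may normalize or weight terms differently.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two word-frequency vectors.

    Returns a value in [0, 1]: 1.0 for identical word distributions,
    0.0 for documents sharing no words.
    """
    # Build word-frequency vectors with naive whitespace tokenization.
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())

    # Dot product over the shared vocabulary.
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)

    # Euclidean norms of each frequency vector.
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, two texts with heavily overlapping vocabulary yield a value near 1, while disjoint texts yield 0; a value such as the .75 reported above falls in between, reflecting substantial but incomplete overlap in word use.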