Technology
Tell Me About Schizophrenia! Comparing ChatGPT-3 Output About Schizophrenia to the NIMH and ABCT Fact Sheets
Sean A. Lauderdale, Ph.D.
Assistant Professor
University of Houston – Clear Lake
Richmond, Texas
With advances in internet search capabilities, more people will rely on the internet to find mental health information. Recently, OpenAI released one of these advances, ChatGPT-3, an artificial intelligence language model chatbot that delivers search results in a narrative mimicking human speech (Ramponi, 2022). Although ChatGPT-3 was trained with feedback from human responders, it can generate inaccurate and biased results (Ramponi, 2022). Users who are unaware of this may find its narrative responses unduly influential. Additionally, most internet searchers do not practice digital media literacy (Stvilia et al., 2009) and trust health information provided by the internet (Pew Internet & American Life Project, 2006) and by chatbots (Abd-Alrazaq et al., 2021).
For this investigation, we compared ChatGPT-3's output about schizophrenia to expert-written, peer-reviewed, online fact sheets published by the National Institute of Mental Health (NIMH) and the Association for Behavioral and Cognitive Therapies (ABCT), using text analysis tools to assess narrative content and characteristics affecting readability and accuracy. We compared the total ChatGPT-3 output to the total text of the NIMH and ABCT fact sheets, and we also compared individual fact sheet sections (e.g., "What is schizophrenia?") to ChatGPT-3's answers to corresponding queries. Content will also be evaluated by schizophrenia experts masked to the source using DISCERN (Charnock et al., 1999; Grohol et al., 2013), a validated measure for rating the quality of internet mental health information.
Specific statistics included cosine similarity, a measure of document similarity based on the cosine of the angle between the documents' word-frequency vectors; it ranges from 0 to 1, with higher values indicating greater similarity. The non-parametric Mann-Whitney U test was used to compare median differences in vocabulary density (the ratio of unique words to total words), readability index (Coleman-Liau; the grade level required to comprehend the text), number of positive words, and number of negative words.
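For readers unfamiliar with these metrics, the Python sketch below illustrates how cosine similarity, vocabulary density, and the Coleman-Liau index are commonly computed. It is a simplified illustration rather than the analysis code used in this study; the tokenization rules and the example sentences are placeholders.

from collections import Counter
import math
import re

def tokenize(text):
    # Lowercase word tokens; a simplification of a full text-analysis tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def cosine_similarity(text_a, text_b):
    # Cosine of the angle between the two documents' word-frequency vectors
    # (0 = no shared vocabulary, 1 = identical frequency profile).
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(freq_a[w] * freq_b[w] for w in set(freq_a) & set(freq_b))
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vocabulary_density(text):
    # Ratio of unique words to total words (type-token ratio).
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def coleman_liau(text):
    # Coleman-Liau index: approximate U.S. grade level needed to comprehend the text.
    tokens = tokenize(text)
    words = len(tokens)
    letters = sum(len(w.replace("'", "")) for w in tokens)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    L = letters / words * 100    # letters per 100 words
    S = sentences / words * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

# Example with placeholder snippets (not the actual fact-sheet or ChatGPT-3 text):
doc_a = "Schizophrenia is a serious mental illness that affects how a person thinks, feels, and behaves."
doc_b = "Schizophrenia is a chronic mental disorder affecting thoughts, feelings, and behavior."
print(round(cosine_similarity(doc_a, doc_b), 2))
print(round(vocabulary_density(doc_a), 2))
print(round(coleman_liau(doc_a), 1))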
The cosine similarity between the ABCT fact sheet and the ChatGPT-3 output was high (0.71). There were no differences in vocabulary density (Z = 0.89, p = .37), readability index (Z = -0.38, p = .70), number of positive words (Z = 0.19, p = .85), or number of negative words (Z = 0.38, p = .70). Similar results were obtained when comparing the NIMH fact sheet to the ChatGPT-3 output (average of all Z's = .40, all p's > .05). The average reading level of the fact sheets and ChatGPT-3's output was 15.75 years of education, exceeding the estimated average reading level (7th-8th grade) of the US population.
Preliminary findings suggest that ChatGPT-3's output about schizophrenia is similar to the ABCT and NIMH fact sheets in content and readability. There were some differences in content: the ABCT fact sheet provided more information about the phases of schizophrenia, whereas ChatGPT-3 provided more information about diagnostic criteria and psychotherapy. Further data analysis may reveal differences in content categories of accuracy, relevance, and coverage of treatment options.