ADHD - Child
Tell Me About ADHD! Comparing ChatGPT-3 Output About ADHD to Expert-Written, NIMH and ABCT Fact Sheets
Kaley Galipp
Student
Texas A&M University-Corpus Christi
Corpus Christi, Texas
Sean A. Lauderdale, Ph.D.
Assistant Professor
University of Houston – Clear Lake
Richmond, Texas
Introduction: With advances in internet search capabilities, more people will rely on the internet to find mental health information. Recently, OpenAI released one of these advances, ChatGPT-3. Considered an artificial intelligence language model chatbot, ChatGPT-3 delivers search results in a narrative that mimics human speech (Ramponi, 2022). Although ChatGPT-3 was trained with feedback from human responders, it can generate inaccurate and biased results (Ramponi, 2022). Users unaware of this may find the narrative responses persuasive. Additionally, most internet searchers do not practice digital media literacy (Stvilia et al., 2009) and trust health information provided by the internet (Pew Internet & American Life Project, 2006) and chatbots (Abd-Alrazaq et al., 2021).
Method: Given the uncertainty about ChatGPT-3's information about mental health conditions, we compared ChatGPT-3 output about ADHD to expert-written, online fact sheets published by the National Institute of Mental Health (NIMH) and the Association for Behavioral and Cognitive Therapies (ABCT), using text analysis tools to assess narrative content and characteristics affecting readability and accuracy. We sought to learn how similar ChatGPT-3's output was to these credible sources and to compare the quality of the information across several domains. We specifically compared the total ChatGPT-3 output to the total text of the NIMH and ABCT fact sheets. We also compared individual sections of the fact sheets (e.g., "What is ADHD?") to the corresponding queries answered by ChatGPT-3. Content will also be evaluated by raters who are knowledgeable about ADHD and masked to the source, using DISCERN (Charnock et al., 1999; Grohol et al., 2013), a validated measure for rating the quality of internet mental health information.
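As a rough illustration (not the study's actual analysis pipeline), the kind of document comparison described above can be sketched in Python with scikit-learn; the two snippet texts below are hypothetical placeholders rather than the real fact sheet or ChatGPT-3 passages:

# Minimal sketch of comparing two texts on cosine similarity and vocabulary density.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fact_sheet = "ADHD is a disorder marked by inattention, hyperactivity, and impulsivity."
chatgpt_output = "ADHD is a condition involving inattention, hyperactivity, and impulsive behavior."

# Build word-frequency vectors for both documents over a shared vocabulary.
vectors = CountVectorizer().fit_transform([fact_sheet, chatgpt_output])

# Cosine similarity of the two frequency vectors (1.0 = identical word use).
similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Vocabulary density: unique words divided by total words in a document.
def vocabulary_density(text):
    words = text.lower().split()
    return len(set(words)) / len(words)

print(f"Cosine similarity: {similarity:.2f}")
print(f"Fact sheet density: {vocabulary_density(fact_sheet):.2f}")
print(f"ChatGPT output density: {vocabulary_density(chatgpt_output):.2f}")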
Results: Statistics included cosine similarity, a measure of document similarity based on the cosine of the angle between the documents' word-frequency vectors; higher values indicate greater similarity. The non-parametric Mann-Whitney U test was used to compare differences in vocabulary density (the ratio of unique words to total words), readability (Coleman-Liau index; the grade level required to comprehend the text), number of positive words, and number of negative words. The NIMH fact sheet had fewer words than the ChatGPT-3 output (Z = 2.96, p < .01). The cosine similarity was moderate (.77). The NIMH fact sheet had greater vocabulary density (mean = .70) than the ChatGPT-3 output (mean = .49; Z = 4.01, p < .01). The ChatGPT-3 output had more positive (Z = 2.94, p < .01) and negative words (Z = 2.36, p < .05) than the NIMH fact sheet. There were no differences in readability level (Z = 0.24, p > .05). The ABCT fact sheet and ChatGPT-3 output were moderately similar (.70). There were no other differences between the ABCT fact sheet and the ChatGPT-3 output (p's > .05). The ChatGPT-3 output provided more detailed support information; however, its medication information was inaccurate, and the complementary treatments it described were not evidence-based.
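For reference, these measures follow their standard definitions, where A and B are the word-frequency vectors of the two documents, L is the average number of letters per 100 words, and S is the average number of sentences per 100 words:

$$\text{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}, \qquad \text{density} = \frac{\text{unique words}}{\text{total words}}, \qquad \text{CLI} = 0.0588L - 0.296S - 15.8$$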
Conclusion: Findings suggest that the ChatGPT-3 output was easier to read and provided more information than the fact sheets; however, there were substantial concerns, including inaccurate medication information and coverage of complementary treatments that are not evidence-based. Further data analysis may reveal other differences in accuracy and treatment coverage.