Bipolar Disorders
Isaac Thao
Health Science Specialist - Research Assistant
Minneapolis VAMC
Saint Paul, Minnesota
Kasey Stack, B.S.
Research Specialist
Georgetown University Medical Center
Washington, District of Columbia
Helen Frieman, B.A.
Project Coordinator
Minneapolis VAMC
Minneapolis, Minnesota
John J. Curtin, Ph.D.
Professor
Department of Psychology, University of Wisconsin - Madison
Madison, Wisconsin
John Ferguson, Ph.D.
Associate Professor
University of Minnesota
Minneapolis, Minnesota
Tasha Nienow, Ph.D.
Staff Psychologist, Clinician-Investigator
Minneapolis VAMC
Minneapolis, Minnesota
David J. Bond, M.D., Ph.D.
Associate Professor
Johns Hopkins University School of Medicine
Baltimore, Maryland
Eric Kuhn, Ph.D. (he/him/his)
Clinical Psychologist | Associate Professor
National Center for PTSD
Menlo Park, California
Snezana Urosevic, Ph.D. (she/her/hers)
Clinician Investigator Team Program Manager
Minneapolis VA Health Care System
Minneapolis, Minnesota
Background: Bipolar disorders (BD), which can lead to impaired functioning and high suicide risk, are characterized by swings among the clinical states of (hypo)mania, depression, and euthymia. A clinical tool that detects acute clinical states of BD in real time could improve patient outcomes by enabling timely intervention. A promising approach is mHealth, i.e., smartphone apps and wireless devices for real-time patient tracking. Smartphone apps have audio recording capabilities that can provide speech features for analysis. Previous research indicates that speech features such as increased fundamental frequency (f0) and first and second formants (F1, F2) are reliable diagnostic markers of BD, and that a greater number and length of speech pauses differentiate depression from euthymia and mania. Still, most speech analysis research in BD is cross-sectional and focuses on identifying diagnostic markers of BD rather than within-person indicators of clinical state. The present study investigated the feasibility of using automated procedures to analyze mHealth voice data for within-person speech-feature changes, which could serve as indices of acute clinical states in BD.
Methods: For 13 weeks, 30 Veterans with BD were asked to record a daily voice diary using a phone app. Biweekly interviews assessed current BD symptoms via the Young Mania Rating Scale and a modified Hamilton Depression Rating Scale, psychosocial functioning via Patient-Reported Outcomes Measurement Information System items, and suicidality via the Depressive Symptom Inventory Suicidality Subscale. An exit interview assessed participant feedback on the voice app and study procedures. Files were transcribed orthographically by hand and aligned phonetically with the Montreal Forced Aligner (MFA). Three acoustic features (f0, F1, F2) and two rhythmic features (pause, speech rate) were computed using Praat software. A randomly selected 10% of files were manually checked for reliability of the Praat-derived speech features. Files were rejected if transcribers could not comprehend the speech or if MFA/Praat could not distinguish speech from noise.
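To make the rhythmic-feature step concrete, the sketch below shows how pause count, total pause time, and speech rate could be derived from forced-alignment output. This is an illustrative example only, not the study's actual pipeline: the interval format mimics MFA word alignments (start, end, label, with empty labels marking silence), and the minimum-pause threshold is a hypothetical value.

```python
# Illustrative sketch of rhythmic-feature computation from forced-alignment
# output. Intervals mimic MFA word alignments: (start_s, end_s, label);
# empty labels denote silence. The pause threshold below is an assumption,
# not a value from the study.

MIN_PAUSE_S = 0.25  # assumed minimum silence duration to count as a pause

def rhythmic_features(intervals):
    """Return pause count, total pause time (s), and speech rate (words/s)."""
    pauses = [end - start for start, end, label in intervals
              if label == "" and (end - start) >= MIN_PAUSE_S]
    words = [label for _, _, label in intervals if label != ""]
    total_s = intervals[-1][1] - intervals[0][0]
    return {
        "pause_count": len(pauses),
        "pause_time_s": round(sum(pauses), 2),
        "speech_rate_wps": round(len(words) / total_s, 2),
    }

# Hypothetical two-second diary snippet with one above-threshold pause.
diary = [
    (0.00, 0.40, "today"),
    (0.40, 0.45, ""),      # short gap, below the pause threshold
    (0.45, 0.80, "was"),
    (0.80, 1.60, ""),      # long enough to count as a pause
    (1.60, 2.00, "hard"),
]
print(rhythmic_features(diary))
# → {'pause_count': 1, 'pause_time_s': 0.8, 'speech_rate_wps': 1.5}
```

In practice the acoustic features (f0, F1, F2) would come from Praat itself rather than from alignment intervals; the example covers only the pause/speech-rate side of the pipeline.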
Results: On average, participants provided 47 voice diaries. 43% of participants provided recordings of sufficient length (based on prior studies of acute mood states) for at least two clinical states. We judged 47% of files to be of sufficient quality for analysis; these had a mean duration of 91 seconds. Sampled manual quality checks identified common factors underlying unreliable Praat-derived speech features: verbal background noise, white noise, low volume, and low spectral resolution. In exit interviews, participants identified self-reflection as a facilitator of app use, and depression and technical issues as barriers. Analysis examining speech features across clinical states within participants is ongoing.
Conclusions: Automated mHealth speech analysis for acute clinical state detection in BD faces several obstacles. The reliability and validity of automated speech-feature measurements were affected by variable audio quality across participants and even across recordings from the same participant. Future research recommendations include enhanced recording instructions and individualized speech-feature thresholds.