Data Science Challenges in Behavioral Health Care Analysis

Bonnie Ray. VP Data Science, Talkspace.

Applications of machine learning and data science are becoming commonplace in health research that draws on Electronic Health Records (EHRs), large biological sample collections (i.e., –omics data), medical imaging data, and sensor data collected using medical devices. Applications of large-scale machine learning in the behavioral health care space are less prevalent. Much of the research has dealt with predicting mental health status from medical records, or with identifying suicidal intent from social media posts, rather than with more detailed exploration of the underlying psychotherapeutic process as captured through patient-therapist dialog. This is in part due to the lack of large bodies of anonymized recorded transcripts available to the research community. Stewart and Davis (2016) provide a summary of ‘big data’ research in the area of mental health.

Over the last few years, companies have begun providing therapy services online, through live video or audio or through synchronous or asynchronous text-based messaging. This has created an opportunity to collect large bodies of client-therapist interactions for study, along with other data typically gathered by internet-based companies, such as in-app usage behavior. From such archives, it is now possible to explore the use of image analysis, speech recognition, and advanced natural language processing techniques to determine the impact of facial expressions, physical gestures, tone of voice, and language-specific actions and engagement patterns on therapy outcomes. Hoermann et al. (2017) discuss applications of synchronous text-based dialogue systems in mental health interventions. See Calvo et al. (2017) for additional examples of natural language processing techniques in mental health applications.

I currently lead data science efforts at Talkspace, a NYC-based startup that enables behavioral health care for all through providing a secure, affordable platform for messaging-based psychotherapy. The company shares attributes of consumer-facing subscription businesses, such as Audible, Harry’s Razors, and Freshly, while also falling in the digital health care space with its focus on data-driven analysis for improving health outcomes. In this post, I’ll describe some of the data science challenges the team addresses in our daily work.

Subscription businesses are typically maniacally focused on subscription renewals, meaning that understanding customer engagement over time and the factors that impact it is critical.

For businesses such as Talkspace and Freshly, which provide only a single service, there is little to no user browsing or purchase behavior within a subscription period. Other metrics, such as how often the client uses the service, on which platform, and at what periods of the day, are typically more important, as is information on customer service contact or in-app reviews. Talkspace integrates this information to build customer engagement models with features that vary over time. However, the priority at Talkspace is not subscription renewal for renewal’s sake. Customer engagement plays a key role in treatment success, as measured by reduction of clinical symptoms. Completion of treatment, which often requires interacting with a therapist for longer than the length of the typical subscription period, and future return to treatment, as needed, are driven by successful clinical outcomes. Hence continuously updated engagement estimates are used to guide targeted intervention actions and engagement recommendations that have been found to increase the likelihood of clinical improvement.
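As a rough illustration of engagement features that vary over time, the sketch below counts each client's active days per week from an activity log. The log layout, the `weekly_engagement` function, and the sample records are all hypothetical and are not taken from Talkspace's systems; a real pipeline would likely use SQL or pandas and many more signals (platform, time of day, customer service contacts).

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (client_id, activity_date, platform).
EVENTS = [
    ("c1", date(2018, 1, 1), "ios"),
    ("c1", date(2018, 1, 2), "ios"),
    ("c1", date(2018, 1, 9), "web"),
    ("c2", date(2018, 1, 3), "android"),
]


def weekly_engagement(events, start, n_weeks):
    """Return {client_id: [active days in week 0, ..., week n_weeks - 1]}."""
    days = defaultdict(lambda: [set() for _ in range(n_weeks)])
    for client, day, _platform in events:
        week = (day - start).days // 7  # index of the 7-day window since start
        if 0 <= week < n_weeks:
            days[client][week].add(day)  # a set, so each day counts once
    return {client: [len(s) for s in weeks] for client, weeks in days.items()}


features = weekly_engagement(EVENTS, start=date(2018, 1, 1), n_weeks=2)
```

Each client ends up with one value per week, which could be recomputed as new activity arrives and fed into an engagement model as a time-varying feature.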

Talkspace differs from many internet-based subscription services in that a Consultation Therapist messages with a potential subscriber prior to purchase of the service. The information captured through text messages exchanged in this initial consultation stage allows for customized matching of clients and therapists, i.e., a client can be matched to a therapist who has relevant experience working with similar clients, for example clients who are similar in terms of demographics, communication preferences, and presenting conditions. After a match is made, a client communicates directly with their selected psychotherapist. For Talkspace, understanding the trajectory of the actual discussion between a customer and their therapist, analyzed in the aggregate after anonymization, is key to providing high-quality clinical service. Identification of psycholinguistic markers within the messaging content using Natural Language Processing techniques provides a way to characterize conversations in terms of expected efficacy and can lead to suggestions for more efficient and effective techniques personalized to the individual customer. Knowledge of state-of-the-art machine learning methods for text analysis and their implementation is an important skill set for the aspiring Talkspace data scientist. From a health analytics perspective, Talkspace data scientists must also be familiar with health outcomes research methods, as Talkspace uses standardized clinical surveys appropriate to a client’s diagnosis to measure the severity of clinical symptoms regularly and to monitor changes in symptoms over time.
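To make the idea of psycholinguistic markers concrete, here is a minimal sketch that computes the share of tokens in a message falling into a few word categories. The `MARKERS` word lists and the `marker_rates` function are invented for illustration; they are not a validated lexicon and not the method used at Talkspace, where more sophisticated NLP models would be expected.

```python
import re
from collections import Counter

# Hypothetical, illustrative word lists; NOT a validated psycholinguistic lexicon.
MARKERS = {
    "first_person": {"i", "me", "my", "myself"},
    "negative_emotion": {"sad", "anxious", "worried", "hopeless"},
}


def marker_rates(message):
    """Return, for each category, the fraction of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", message.lower())
    counts = Counter()
    for token in tokens:
        for category, words in MARKERS.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty messages
    return {category: counts[category] / total for category in MARKERS}


rates = marker_rates("I feel anxious and I worry about my job")
```

Rates like these, aggregated over anonymized conversations, are one simple way to turn message text into numeric features that can be compared against clinical outcomes.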

In summary, Talkspace data science activities cover multiple aspects of the modern data science playbook, allowing for continuing growth of team members’ technical and business skills. From a technical perspective, SQL and Python skills (particularly pandas, NumPy, scikit-learn, NLTK, and statsmodels) are expected, as is some exposure to writing repeatable, efficient code. The team is also currently exploring the use of deep neural networks for natural language processing using Python packages such as Keras and PyTorch. A typical day might involve writing Python code for an analysis task, meeting with engineers to discuss how the results of a data science model can be incorporated into the product, reviewing customer engagement metrics with the marketing team, and creating new reports for the Talkspace clinical team to monitor therapist quality. Most critical to success is the ability to think about data science tasks from a business perspective, e.g., how can my model be used to drive a business action or support a new product feature?

If you are interested in contributing to the Talkspace mission through data science, please feel free to reach out to me with questions.

January 2018

  1. Stewart, R., & Davis, K. (2016). “Big data” in mental health research: current status and emerging possibilities. Social Psychiatry and Psychiatric Epidemiology, 51, 1055–1072.
  2. Hoermann, S., McCabe, K. L., Milne, D. N., & Calvo, R. A. (2017). Application of Synchronous Text-Based Dialogue Systems in Mental Health Interventions: Systematic Review. Journal of Medical Internet Research, 19(8), e267.
  3. Calvo, R., Milne, D., Hussain, M., & Christensen, H. (2017). Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering, 23(5), 649–685. doi:10.1017/S1351324916000383

