Development of a 3-step theory of suicide ontology to facilitate 3ST factor extraction from clinical progress notes

Abstract: OBJECTIVE: Suicide risk prediction algorithms at the Veterans Health Administration (VHA) do not include predictors based on the 3-Step Theory of suicide (3ST), which builds on hopelessness, psychological pain, connectedness, and capacity for suicide. These four factors are not available from structured fields in VHA electronic health records, but they are found in unstructured clinical text. No ontology and controlled vocabulary exists that maps psychosocial and behavioral terms to these factors. The objectives of this study were 1) to develop an ontology with a controlled vocabulary of terms that map onto classes representing the 3ST factors as identified within electronic clinical progress notes, and 2) to determine the accuracy of automated extractions based on terms in the controlled vocabulary.
METHODS: A team of four annotators linguistically annotated 30,000 clinical progress notes, drawn from the VHA electronic health records of 231 Veterans who attempted suicide or died by suicide, for terms relating to the 3ST factors. Annotation involved manually assigning a label to words or phrases that indicated the presence or absence of a factor (polarity). These words and phrases were entered into a controlled vocabulary, which our computational system then used to tag 14 million clinical progress notes from Veterans who attempted or died by suicide after 2013. Tagged text was extracted and machine-labeled for presence or absence of the 3ST factors. The accuracy of these machine labels was assessed on 1000 randomly selected extractions per factor against a ground truth created by our annotators.
RESULTS: Linguistic annotation identified 8486 terms that related to 33 subclasses across the four factors and polarities. Precision of machine-labeled extractions ranged from 0.73 to 1.00 for most factor-polarity combinations, whereas recall was somewhat lower, ranging from 0.65 to 0.91.
CONCLUSION: The ontology that was developed consists of classes that represent each of the four 3ST factors, subclasses, relationships, and terms that map onto those classes; the terms are stored in a controlled vocabulary (https://bioportal.bioontology.org/ontologies/THREE-ST). The use case that we present shows how scores based on clinical notes tagged for terms in the controlled vocabulary capture meaningful change in the 3ST factors during the weeks preceding a suicidal event.
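
The tagging and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the vocabulary entries, the names `VOCAB`, `build_pattern`, `tag_note`, and `precision_recall`, and the simple case-insensitive phrase matching are all assumptions made for illustration. The real terms are those published in the ontology.

```python
import re
from collections import namedtuple

# Hypothetical controlled-vocabulary entries (lowercase term -> factor, polarity).
# These are illustrative only; they are not taken from the published ontology.
VOCAB = {
    "feels hopeless": ("hopelessness", "present"),
    "denies hopelessness": ("hopelessness", "absent"),
    "lives alone": ("connectedness", "absent"),
    "supportive family": ("connectedness", "present"),
    "access to firearms": ("capacity_for_suicide", "present"),
}

Tag = namedtuple("Tag", "start end term factor polarity")


def build_pattern(vocab):
    """Compile one case-insensitive regex over all terms, longest first,
    so that longer phrases win over shorter ones that share a prefix."""
    terms = sorted(vocab, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_note(text, vocab, pattern):
    """Return a Tag for every vocabulary term found in the note text."""
    tags = []
    for m in pattern.finditer(text):
        term = m.group(0).lower()
        factor, polarity = vocab[term]
        tags.append(Tag(m.start(), m.end(), term, factor, polarity))
    return tags


def precision_recall(machine, truth):
    """Compare a set of machine labels with a set of annotator labels."""
    true_positives = len(machine & truth)
    precision = true_positives / len(machine) if machine else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    pattern = build_pattern(VOCAB)
    note = "Veteran feels hopeless and lives alone; has supportive family nearby."
    for tag in tag_note(note, VOCAB, pattern):
        print(tag.term, tag.factor, tag.polarity)
```

In this sketch, precision is the share of machine labels that the annotators confirm, and recall is the share of annotator labels that the machine recovers, which is how the two figures reported in RESULTS are conventionally defined.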

Related articles

    Identifying opioid relapse during COVID-19 using natural language processing of nationwide Veterans Health Administration electronic medical record data

    Abstract: Novel and automated means of opioid use and relapse risk detection are needed. Unstructured electronic medical record data, including written progress notes, can be mined for clinically relevant information, including the presence of substance use and relapse-critical markers of risk and recovery from opioid use disorder (OUD). In this study, we used natural language processing (NLP) to automate the extraction of opioid relapses, and the timing of these occurrences, from veteran patients' electronic medical records. We then demonstrated the utility of our NLP tool via analysis of pre-/post-COVID-19 opioid relapse trends among veterans with OUD. For this demonstration, we analyzed data from 107,606 veterans with OUD enrolled in the Veterans Health Administration, comparing a pandemic-exposed cohort (n = 53,803; January 2019-March 2021) to a matched prepandemic cohort (n = 53,803; October 2017-December 2019). The recall of our NLP tool was 75% and our precision was 94%, demonstrating moderate sensitivity and excellent specificity. Using the NLP tool, we found that the odds of opioid relapse postpandemic onset were proportionally higher compared to prepandemic trends, despite patients having fewer mental health encounters from which to derive instances of relapse postpandemic onset. In this research application of the tool, and as hypothesized, we found that opioid relapse risk was elevated postpandemic. The application of NLP methods to identify and monitor relapse risk holds promise for future surveillance, risk prevention, and clinical outcome research.