Investigating the differential impact of psychosocial factors by patient characteristics and demographics on Veteran suicide risk through machine learning extraction of cross-modal interactions
Accurate prediction of suicide risk is crucial for identifying patients with elevated risk burden and helping ensure that these patients receive targeted care. The US Department of Veterans Affairs' suicide prediction model primarily leverages structured electronic health record (EHR) data. This approach largely overlooks unstructured EHR data, a format that could be used to enhance predictive accuracy. This study aims to improve the predictive accuracy of suicide risk models by developing a model that incorporates both structured EHR predictors and semantic NLP-derived variables from unstructured EHR data. XGBoost models were fit to predict suicide risk. The interactions identified by these models were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was then compared with a ridge regression model without interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.
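To illustrate what a cross-modal interaction term means in a linear model such as ridge regression, the following minimal sketch builds a product feature from one NLP-derived variable and one structured variable. The variable names, coefficients, and helper functions are hypothetical and are not taken from the study; the abstract does not specify how the interactions were encoded, so this is only one common convention.

```python
# Hypothetical sketch: a cross-modal interaction as a product feature.
# All names and numbers below are illustrative assumptions, not study values.

def add_interaction(rows, a, b):
    """Return copies of rows with an extra feature 'a:b' equal to rows[a] * rows[b]."""
    name = f"{a}:{b}"
    return [{**row, name: row[a] * row[b]} for row in rows]


def linear_score(row, coefs, intercept=0.0):
    """Linear predictor: intercept plus the coefficient-weighted sum of features."""
    return intercept + sum(coefs[k] * row[k] for k in coefs)


# One NLP-derived psychosocial indicator and one structured patient flag.
patients = [
    {"psychosocial_nlp": 1, "patient_flag": 0},
    {"psychosocial_nlp": 1, "patient_flag": 1},
    {"psychosocial_nlp": 0, "patient_flag": 1},
]

augmented = add_interaction(patients, "psychosocial_nlp", "patient_flag")

# Made-up coefficients; a fitted ridge model would estimate these from data.
coefs = {
    "psychosocial_nlp": 0.5,
    "patient_flag": 0.25,
    "psychosocial_nlp:patient_flag": 0.125,
}

scores = [linear_score(row, coefs) for row in augmented]
```

With a nonzero interaction coefficient, the contribution of the psychosocial indicator differs between patients with and without the flag, which is the sense in which a risk factor's effect depends on patient context.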
Abstract: Novel and automated means of opioid use and relapse risk detection are needed. Unstructured electronic medical record data, including written progress notes, can be mined for clinically relevant information, including the presence of substance use and relapse-critical markers of risk and recovery from opioid use disorder (OUD). In this study, we used natural language processing (NLP) to automate the extraction of opioid relapses, and the timing of these occurrences, from veteran patients' electronic medical records. We then demonstrated the utility of our NLP tool via analysis of pre-/post-COVID-19 opioid relapse trends among veterans with OUD. For this demonstration, we analyzed data from 107,606 veterans with OUD enrolled in the Veterans Health Administration, comparing a pandemic-exposed cohort (n = 53,803; January 2019-March 2021) to a matched prepandemic cohort (n = 53,803; October 2017-December 2019). The recall of our NLP tool was 75% and its precision was 94%, demonstrating moderate sensitivity and high positive predictive value. Using the NLP tool, we found that the odds of opioid relapse after pandemic onset were proportionally higher than prepandemic trends, despite patients having fewer mental health encounters from which to derive instances of relapse after pandemic onset. In this research application of the tool, and as hypothesized, we found that opioid relapse risk was elevated postpandemic. The application of NLP methods to identify and monitor relapse risk holds promise for future surveillance, risk prevention, and clinical outcome research.
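For readers less familiar with these metrics: recall is the same quantity as sensitivity, precision is the positive predictive value, and specificity additionally depends on the true-negative count. The sketch below computes all three from confusion-matrix counts. The counts are hypothetical, chosen only so that recall and precision reproduce the reported 75% and 94%; the abstract does not report the underlying counts, and the true-negative value here is arbitrary.

```python
# Hypothetical sketch: precision, recall, and specificity from raw counts.
# The counts are illustrative assumptions, not figures from the study.

def recall(tp, fn):
    """Sensitivity: share of true relapses that the tool detects."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Positive predictive value: share of flagged relapses that are real."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """Share of true non-relapses that the tool correctly leaves unflagged."""
    return tn / (tn + fp)


# Assumed counts chosen to match the reported recall and precision.
tp, fp, fn = 141, 9, 47
tn = 803  # arbitrary; precision and recall do not depend on it

tool_recall = recall(tp, fn)
tool_precision = precision(tp, fp)
tool_specificity = specificity(tn, fp)
```

Because precision and recall ignore true negatives, the two reported figures alone do not determine specificity; a different assumed `tn` would change `tool_specificity` while leaving the other two values unchanged.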