Stroke recurrence prediction using machine learning and segmented neural network risk factor aggregation

Published in Discover Public Health, 2024

Stroke has long been a major cause of mortality and disability in the United States, and recurrent stroke substantially increases these risks. Traditional data aggregation methods have limitations in handling the numerous subcategories of stroke risk factors when predicting stroke recurrence. This pilot study proposed a Segmented Neural Network-Driven Aggregation (SNA) method to improve the accuracy of the prediction model. Using the TriNetX diagnosis dataset, we processed various risk factors and demographic information with both traditional aggregation techniques and the proposed method. We then applied logistic regression and random forest classifiers to predict stroke recurrence. The SNA method significantly outperformed the other aggregation methods with both classifiers. Combined with a random forest classifier, SNA achieved higher accuracy (84.2%) than the other combinations, along with a better balance between sensitivity and specificity (ROC AUC = 0.928) and between precision and recall (PR AUC = 0.940). These results demonstrate the potential of supervised machine-learning encoding methods for stroke recurrence prediction, with implications for clinical practice and future epidemiological research.
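To make the contrast between traditional aggregation and supervised encoding concrete, here is a minimal Python sketch on hypothetical toy data. It does not reproduce SNA. Instead it uses plain smoothed target encoding (a per-subcategory recurrence rate learned from the labels) as a simpler supervised encoder and compares it with a traditional "any subcategory present" flag. The records, the subcategory names (SUB_A, SUB_B, SUB_C), the smoothing parameter, the max-based combination, and the helper function names are all illustrative assumptions, not details from the study.

```python
from collections import defaultdict

# Hypothetical toy data (assumption): for each patient, the diagnosed subcategory
# codes of ONE risk factor (placeholder names, not real TriNetX or ICD codes) and
# whether the stroke recurred (1) or not (0).
RECORDS = [
    ({"SUB_A"}, 1),
    ({"SUB_A", "SUB_B"}, 1),
    ({"SUB_B"}, 0),
    ({"SUB_C"}, 0),
    (set(), 0),
    ({"SUB_A"}, 1),
    ({"SUB_C"}, 1),
    (set(), 0),
]

NONE = "NONE"  # pseudo-subcategory for patients without this risk factor


def tokens(codes):
    """Treat 'no subcategory recorded' as its own category."""
    return codes if codes else {NONE}


def binary_aggregate(codes):
    """Traditional aggregation: 1 if any subcategory is present, else 0."""
    return 1 if codes else 0


def fit_target_encoder(records, smoothing=1.0):
    """Supervised encoding (a stand-in, not SNA): smoothed recurrence rate per subcategory."""
    prior = sum(y for _, y in records) / len(records)
    count, positive = defaultdict(int), defaultdict(int)
    for codes, y in records:
        for c in tokens(codes):
            count[c] += 1
            positive[c] += y
    # Shrink each subcategory's rate toward the overall prior.
    rates = {c: (positive[c] + smoothing * prior) / (count[c] + smoothing) for c in count}
    return prior, rates


def supervised_aggregate(codes, prior, rates):
    """Collapse a patient's subcategories into one score: the highest learned rate.
    Subcategories unseen during fitting fall back to the overall prior."""
    return max(rates.get(c, prior) for c in tokens(codes))


def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [y for _, y in RECORDS]
prior, rates = fit_target_encoder(RECORDS)
binary_scores = [binary_aggregate(c) for c, _ in RECORDS]
supervised_scores = [supervised_aggregate(c, prior, rates) for c, _ in RECORDS]

print("binary AUC:", roc_auc(binary_scores, labels))          # 0.75
print("supervised AUC:", roc_auc(supervised_scores, labels))  # 0.9375
```

The binary flag maps SUB_A, SUB_B, and SUB_C to the same value, so it cannot distinguish a high-risk subcategory from a low-risk one; the supervised encoding keeps that distinction (SUB_A scores 0.875, SUB_B 0.5). These AUCs are computed in-sample and are therefore optimistic: in a real evaluation the encoder would be fitted on the training split only, and the aggregated features would feed classifiers such as logistic regression or random forest, as in the study.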

Keywords: Stroke recurrence; Data aggregation; Machine learning; Interpretable neural network; Supervised encoder; Logistic regression; Random forest

Recommended citation: Ding, X., Meng, Y., Xiang, L. et al. Stroke recurrence prediction using machine learning and segmented neural network risk factor aggregation. Discov Public Health 21, 119 (2024). https://doi.org/10.1186/s12982-024-00199-6 https://link.springer.com/article/10.1186/s12982-024-00199-6