Learning Classification Trees for Personalized Cardiovascular Risk Stratification
Cardiovascular disease is the leading cause of death worldwide. Many effective treatments are available, but identifying the high-risk patients who are most likely to benefit from the various therapies remains an unsolved problem. Risk stratification would benefit from principled, data-driven methods that systematically combine prognostic information from many risk variables into a clinically useful classification tree. In this paper, we present a classification tree induction algorithm and show that it produces trees that can be used for personalized cardiovascular risk stratification. A key challenge is the high class imbalance in medical datasets. Our algorithm uses non-symmetric entropy measures for two critical tasks in classification tree learning: discretizing continuous variables and assigning a variable to a node. We tested our algorithm on 4219 cardiovascular patients for two risk stratification tasks: prediction of cardiovascular death and prediction of myocardial infarction. For both tasks, our classification tree-based models outperformed other types of classification trees and SVMs.
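To make the role of a non-symmetric entropy concrete, the following is a minimal sketch of how such a measure could drive the discretization of one continuous risk variable. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: the abstract does not specify which measure is used, so the sketch adopts one asymmetric entropy from the decision-tree literature, h_w(p) = p(1 - p) / ((1 - 2w)p + w^2), which reaches its maximum at p = w rather than at p = 0.5. The function names, the parameter w (taken here to be something like the overall event rate), and the toy data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def asymmetric_entropy(p: float, w: float) -> float:
    """Impurity that peaks (value 1) at p == w instead of at p == 0.5.

    p: fraction of positive (minority-class) cases in a node.
    w: assumed reference rate with 0 < w < 1, e.g. the overall event rate.
    This specific formula is an assumption made for illustration.
    """
    return p * (1.0 - p) / ((1.0 - 2.0 * w) * p + w * w)


def best_threshold(
    values: Sequence[float], labels: Sequence[int], w: float
) -> Optional[Tuple[float, float]]:
    """Return (threshold, score) for the binary cut with the lowest
    size-weighted asymmetric entropy, or None if no cut is possible.

    labels are 0/1, with 1 marking the rare event.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best: Optional[Tuple[float, float]] = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        # Only cut between two distinct values of the variable.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        right_pos = total_pos - left_pos
        score = (i / n) * asymmetric_entropy(left_pos / i, w) + (
            (n - i) / n
        ) * asymmetric_entropy(right_pos / (n - i), w)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Toy, made-up data: 10 patients, 2 events, event rate w = 0.2.
toy_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
result = best_threshold(toy_values, toy_labels, w=0.2)
```

With a symmetric entropy, a node holding 20% events already looks fairly pure when the overall event rate is low. Under the measure assumed here, the same node is scored as maximally impure when w = 0.2, so a split is rewarded only if it moves the child nodes away from the baseline event rate. The same score could in principle be used to compare candidate variables at a node, which is the second task named above.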