Clinical Applications of Machine Learning for Urolithiasis and Benign Prostatic Hyperplasia: A Systematic Review - Beyond the Abstract

In this article, we systematically reviewed the literature for articles pertaining to the use of machine learning (ML) for urolithiasis and benign prostatic hyperplasia. After identifying 63 articles for data extraction, we evaluated them according to the Standardized Reporting of Machine Learning Applications in Urology (STREAM-URO) framework, a 26-item checklist designed to promote standardized, high-quality ML studies within urology.1 This evaluation is what distinguishes our review from previous systematic reviews examining the use of ML within urology. A detailed view of the STREAM-URO criteria is provided in Table 1.

In our review, we highlight two criteria that were addressed by almost none of the included articles: the inclusion of a bias assessment and the presence of a reference standard. A bias assessment examines whether the evaluation metrics of the ML model and the reference standard are stratified by relevant factors such as age, gender, ethnicity, or socioeconomic status. Performing a bias assessment is important because ML algorithms may lack generalizability across diverse populations. Indeed, previous studies have shown that the performance of ML algorithms may vary according to race.2 Consequently, failing to evaluate an algorithm across different subgroups can create disparities when ML tools are implemented in clinical care.3
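To make this concrete, a subgroup-stratified evaluation of the kind STREAM-URO asks for can be sketched in a few lines. The sketch below is illustrative only: the records, group labels, and scores are hypothetical and not drawn from any study in the review.

```python
# Minimal sketch of a bias assessment: compute a model's discrimination
# (AUC) separately within each subgroup. All data here are hypothetical.

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive outscores a randomly chosen negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_auc(records, group_key):
    """Group prediction records by a demographic key, then score each group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r)
    return {g: auc([r["label"] for r in rs], [r["score"] for r in rs])
            for g, rs in groups.items()}

# Hypothetical model predictions, tagged with a subgroup identifier:
records = [
    {"group": "A", "label": 1, "score": 0.90},
    {"group": "A", "label": 0, "score": 0.20},
    {"group": "A", "label": 1, "score": 0.70},
    {"group": "A", "label": 0, "score": 0.60},
    {"group": "B", "label": 1, "score": 0.55},
    {"group": "B", "label": 0, "score": 0.50},
    {"group": "B", "label": 1, "score": 0.40},
    {"group": "B", "label": 0, "score": 0.45},
]
print(stratified_auc(records, "group"))  # → {'A': 1.0, 'B': 0.5}
```

A pooled AUC over these hypothetical records would mask the fact that the model discriminates well in group A but no better than chance in group B, which is exactly the kind of subgroup disparity a bias assessment is meant to surface.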

In addition, we noted that most articles failed to compare their ML algorithm to a reference standard. A reference standard serves as a baseline against which the model can be compared, and can take the form of a nomogram, an existing model from the literature, or a traditional regression model using the same features. Most studies included in our review simply demonstrated the feasibility of their algorithm. However, when critically appraising an ML study, it is also essential to evaluate whether the model is superior to the current standard of care.1 Carrying out these comparisons allows investigators to advance the field of ML and develop practice-changing tools.
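As an illustration of such a comparison, one might score an ML model and a nomogram-style reference standard on the same held-out outcomes. The numbers below are hypothetical and chosen only to show the mechanics; a real study would also compare discrimination, calibration, and clinical utility, with appropriate statistical testing.

```python
# Minimal sketch of comparing an ML model against a reference standard
# (eg, a nomogram or regression model) on the same test set.
# All outcomes and predicted probabilities here are hypothetical.

def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and outcomes
    (lower is better)."""
    return sum((p - l) ** 2 for l, p in zip(labels, probs)) / len(labels)

labels   = [1, 0, 1, 1, 0, 0, 1, 0]                              # observed outcomes
ml_probs = [0.85, 0.20, 0.70, 0.90, 0.30, 0.15, 0.60, 0.25]      # ML model
nomogram = [0.60, 0.40, 0.55, 0.70, 0.45, 0.35, 0.50, 0.40]      # reference standard

print("ML model  Brier:", round(brier_score(labels, ml_probs), 3))   # → 0.062
print("Reference Brier:", round(brier_score(labels, nomogram), 3))   # → 0.168
```

Reporting both scores side by side, rather than the ML model's score alone, is what lets a reader judge whether the new model actually improves on what is already available.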

Table 1. STREAM-URO Criteria. Retrieved from Kwong et al.1

STREAM-URO criterion: Definition of criterion
Title: Identify the report as an ML application to a specific urological question. If applicable, state whether DL was used
Background: Describe the urological problem and rationale for implementing ML models
Objective: Clearly state what the proposed ML model(s) aims to address with respect to study population and outcome
Problem: State whether the study is a supervised or unsupervised, classification or regression problem
Source of data: Describe how the dataset was obtained (eg, single/multicenter or local/national database) and the study period
Eligibility criteria: Specify all criteria for inclusion/exclusion of patients and features, and provide rationale
Label: Define the label of interest and how it was assessed
Data abstraction: Describe the methods used to develop the final dataset, with consideration of the following: feature abstraction, handling of missing data, feature engineering, and removal of features (eg, clinical intuition, principal component analysis, recursive feature elimination, or correlation analysis)
Data splitting: Describe how the dataset was split into training, validation (if used), and testing cohorts, and provide the rationale for the splitting strategy
Reference standard: Outline the reference standard that will serve as the baseline for comparison for the study (eg, existing models from the literature or a regression model using the same features)
Model selection: Describe the ML model(s) and version(s) used
Hyperparameter tuning: Specify all model hyperparameters that were optimized, the search space for hyperparameter tuning, and the evaluation metric(s) used to optimize parameters
Model evaluation: List the evaluation metrics used to assess performance and clinical utility, including the justification for selection
Cohort characteristics: Provide the sample size and summary statistics of the training, validation (if used), and testing cohorts, including incidence of the label of interest
Model specification: Present the final ML model and specify the final panel of features included and hyperparameters tuned
Model evaluation: Compare evaluation metrics for the ML model(s) and reference standard
Bias assessment: Compare evaluation metrics for the ML model(s) and reference standard when stratified by relevant factors such as age group, gender, ethnicity, or socioeconomic status, to identify subgroups that benefit from, are not helped by, or are harmed by the models
Limitations: Discuss the limitations of the ML model(s), with consideration of the data, features, model(s), and/or biases
Critical analysis: Describe the main findings of the study, including the following: new predictors of the label of interest identified using ML, strengths of the ML model(s) compared to current models in the urological literature, and why the ML model(s) performed better/worse than what is currently available
Clinical utility: Describe how the ML model(s) can be applied to urological practice, with respect to the potential to improve patient care, clinical decision-making, and/or efficiency
Disclosures: Disclose all financial relationships, sources of funding, and potential conflicts of interest
Abbreviations: DL: deep learning; ML: machine learning.

Written by: David Bouhadana, Xing Han Lu, Jack W Luo, Anis Assad, Claudia Deyirmendjian, Abbas Guennoun, David-Dan Nguyen, Jethro C C Kwong, Bilal Chughtai, Dean Elterman, Kevin Christopher Zorn, Quoc-Dien Trinh, Naeem Bhojani

McGill University Faculty of Medicine and Health Sciences, Montreal, Quebec, Canada; McGill University School of Computer Science, Montreal, Quebec, Canada; McGill University Faculty of Medicine and Health Sciences, Montreal, Quebec, Canada; University of Montreal Hospital Centre, Urology, Montreal, Quebec, Canada; Université de Montréal, Medicine, Montreal, Quebec, Canada; University of Montreal Hospital Centre, Urology, Montreal, Quebec, Canada; University of Toronto, Urology, Toronto, Ontario, Canada; University of Toronto, Urology, Toronto, Ontario, Canada; Weill Cornell Medical Center, Urology, New York, New York, United States; University of Toronto, Urology, Toronto, Ontario, Canada; University of Montreal Hospital Centre, Urology, Montreal, Quebec, Canada; Brigham and Women's Hospital, Urology, Boston, Massachusetts, United States; University of Montreal Hospital Centre, Urology, Montreal, Quebec, Canada

References:

  1. Kwong JC, McLoughlin LC, Haider M, et al. Standardized Reporting of Machine Learning Applications in Urology: The STREAM-URO Framework. Eur Urol Focus 2021;7:672-682.
  2. Nayan M, Salari K, Bozzo A, et al. Predicting survival after radical prostatectomy: Variation of machine learning performance by race. Prostate 2021;81:1355-1364.
  3. Checcucci E, De Cillis S, Granato S, et al. Applications of neural networks in urology: a systematic review. Curr Opin Urol 2020;30:788-807.
