The advent of digital pathology has seen the introduction of algorithms capable of automating tedious, repetitive laboratory tasks and of identifying and grading certain types of cancer. While many of these algorithms have demonstrated appealing evidence of high diagnostic accuracy, few have been validated on large, heterogeneous datasets resembling real-life use cases. Several outstanding questions remain: Are these algorithms as accurate and reliable as they need to be for clinical use? How do they compare with their human counterparts, both in detecting cancer and in Gleason grading? Do they work in real-life applications, where different staining protocols, scanners, and other variables can affect slide quality?
To investigate these questions, we validated the tumor detection and Gleason grading performance of an AI tool, HALO Prostate AI for the HALO AP digital pathology platform from Indica Labs*, in a retrospective review of seven cohorts totaling 5,922 H&E sections representing 7,473 biopsy cores from 423 patient cases, sourced from five independent institutes and scanned on three scanners at two magnifications. This ensured that the data analyzed were representative of the heterogeneity present across multi-institutional patient populations. For Gleason grading, two cohorts containing tumors spanning all Gleason grades were analyzed, and the AI tool's evaluations were compared with those of an international group of board-certified pathologists, including nine experienced genitourinary pathologists. This study represents one of the largest to date focused on the validation of an AI-based digital pathology tool.
Eleven pathologists from eight countries with differing grading practices (USA, Germany, the Netherlands, Austria, Japan, Vietnam, Russia, and Israel) participated in the Gleason grading validation study, several of them renowned experts in genitourinary pathology.
The AI tool showed high accuracy in detecting prostatic adenocarcinoma, with a sensitivity of 0.971–1.000, a specificity of 0.875–0.976, and a negative predictive value of 0.988–1.000 across the test cohorts. Most false positive tumor classifications and alerts occurred in the setting of known carcinoma mimickers, as well as in regions with dense, histiocyte-rich inflammatory infiltrates. On reviewing these regions, most pathologists perceived the alerts as useful flags for further evaluation and IHC workup. The few false negative results were associated with slide preparation quality control issues (out-of-focus regions, crush, and other mechanical artifacts). The AI tool detected cancer in up to 13 cores per cohort that had been missed by the reviewing pathologists.
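For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and negative predictive value follow from per-core confusion-matrix counts. The counts used here are hypothetical, chosen only for illustration; they are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Derive sensitivity, specificity, and NPV from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of cancer-containing cores detected
    specificity = tn / (tn + fp)  # fraction of benign cores correctly cleared
    npv = tn / (tn + fn)          # probability that a negative call is truly benign
    return sensitivity, specificity, npv

# Hypothetical counts for illustration only (not from the study):
sens, spec, npv = diagnostic_metrics(tp=485, fp=12, tn=490, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} NPV={npv:.3f}")
# prints: sensitivity=0.990 specificity=0.976 NPV=0.990
```

Note that in a screening setting a high NPV is the key safety property: it bounds the chance that a core the tool clears as benign actually harbors tumor.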
In terms of Gleason grading, the weighted kappa value for the AI tool was 0.77 in the first cohort (pathologists' average kappa values 0.62–0.80) and 0.72 in the second cohort (pathologists' average kappa values 0.64–0.76). Agreement between the AI tool and pathologists was especially high in cases where consensus among pathologists could be reached and at the diagnostically critical Gleason grade group 1, where the decision between active surveillance and active therapy is made.
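The weighted kappa statistic quantifies agreement between two raters on an ordinal scale while penalizing larger disagreements more heavily. The paper does not specify the weighting scheme; the sketch below assumes quadratic weights, which are commonly used for Gleason grade group comparisons, and uses toy ratings rather than study data.

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, n_classes):
    """Cohen's kappa with quadratic weights for two ordinal rating lists."""
    # Observed confusion matrix of the two raters' labels
    obs = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        obs[i][j] += 1
    n = len(a)
    ca, cb = Counter(a), Counter(b)  # marginal label counts per rater
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j]               # observed weighted disagreement
            den += w * ca[i] * cb[j] / n       # disagreement expected by chance
    return 1.0 - num / den

# Toy example: two raters grading six cores into grade groups 0..4
rater1 = [0, 1, 1, 2, 3, 4]
rater2 = [0, 1, 2, 2, 3, 4]
print(round(quadratic_weighted_kappa(rater1, rater2, 5), 3))  # → 0.952
```

A kappa of 1.0 means perfect agreement and 0.0 means chance-level agreement, so the tool's values of 0.72–0.77 falling within the pathologists' own range (0.62–0.80) is what supports the "comparable to experts" claim.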
This study represents a significant milestone in digital pathology, paving the way for optimizing prostate cancer diagnosis through the combination of AI tools and human expertise. It demonstrates that the AI tool achieves a high level of diagnostic accuracy for prostate cancer detection, with Gleason grading agreement comparable to that of experienced genitourinary pathologists. While further prospective studies are needed to capture even greater data variation, this work sets a promising precedent, suggesting that the collaborative efforts of AI tools and pathologists can elevate diagnostic accuracy.
*HALO Prostate AI and HALO AP are CE-marked for in-vitro diagnostic use in Europe and the UK. HALO Prostate AI and HALO AP are For Research Use Only in the US and are not FDA cleared for clinical diagnostic use.
Written by: Yuri Tolkach,1 Vlado Ovtcharov,2 Alexey Pryalukhin,3 Marie-Lisa Eich,4 Nadine Therese Gaisa,5 Martin Braun,6 Abdukhamid Radzhabov,7 Alexander Quaas,4 Peter Hammerer,7 Ansgar Dellmann,8 Wolfgang Hulla,3 Michael C Haffner,9 Henning Reis,10 Ibrahim Fahoum,11 Iryna Samarska,12 Artem Borbat,13 Hoa Pham,14,15 Axel Heidenreich,16 Sebastian Klein,4 George Netto,17 Peter Caie,2 Reinhard Buettner4
1. Institute of Pathology, University Hospital Cologne, Cologne, Germany.
2. Indica Labs, Albuquerque, NM, USA.
3. Institute of Pathology, Landesklinikum Wiener Neustadt, Wiener Neustadt, Austria.
4. Institute of Pathology, University Hospital Cologne, Cologne, Germany.
5. Institute of Pathology, University Hospital Aachen, Aachen, Germany.
6. Institute of Pathology Troisdorf, Troisdorf, Germany.
7. Urology Clinic, Municipal Clinic of Brunswick, Brunswick, Germany.
8. Institute of Pathology, Municipal Clinic of Brunswick, Brunswick, Germany.
9. Divisions of Human Biology and Clinical Research, Fred Hutch Cancer Center, Seattle, WA, USA.
10. Dr. Senckenberg Institute of Pathology, University Hospital Frankfurt, Goethe University Frankfurt, Frankfurt am Main, Germany.
11. Department of Pathology, Sourasky Medical Center, Tel Aviv, Israel.
12. Department of Pathology, University Hospital Maastricht, Maastricht, The Netherlands.
13. Department of Pathology, Burnasyan Federal Medical Biophysical Center of Federal Medical Biological Agency, Moscow, Russia.
14. Department of Pathology, Bach Mai Hospital, Hanoi, Vietnam.
15. Department of Pathology, University of Nagasaki, Nagasaki, Japan.
16. Clinic of Urology, University Hospital Cologne, Cologne, Germany.
17. Department of Pathology, University of Alabama, Birmingham, AL, USA.