AI-Derived Tumor Volume from Multiparametric MRI and Outcomes in Localized Prostate Cancer - Martin King & David Yang
January 2, 2025
Martin King and David Yang join Zachary Klaassen to discuss their study on AI-derived tumor volume assessment in prostate cancer MRI. The research demonstrates how their AI model successfully segments and measures prostate tumors from MRI images, showing strong prognostic value for cancer outcomes in both radiation and surgery patients. The study reveals that AI-derived tumor volume serves as an independent prognostic factor and outperforms NCCN risk categories in predicting seven-year metastasis risk for radiation patients. The researchers highlight the model's potential to automate and standardize tumor assessment while acknowledging current limitations, including decreased performance with PI-RADS 3 lesions and the need for multi-institutional validation. They emphasize the technology's potential to complement clinical decision-making and maximize the value of existing imaging data without requiring additional testing.
Biographies:
Martin King, MD, PhD, Director, Brachytherapy Clinical Operations, Brigham and Women’s Hospital, Dana-Farber Cancer Institute; Senior Physician; Assistant Professor of Radiation Oncology, Harvard Medical School, Boston, MA
David Yang, MD, Radiation Oncologist, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Boston, MA
Zachary Klaassen, MD, MSc, Urologic Oncologist, Assistant Professor of Surgery/Urology at the Medical College of Georgia at Augusta University, Wellstar MCG, Georgia Cancer Center, Augusta, GA
Full Video Transcript
Zachary Klaassen: Hi, my name is Zach Klaassen. I’m a urologic oncologist at the Georgia Cancer Center in Augusta, Georgia. I’m delighted to be joined on UroToday by Dr. David Yang and Dr. Martin King, radiation oncologists at Dana-Farber Cancer Institute. We’re going to be talking about their recently published paper called “AI-derived tumor volume from multiparametric MRI and outcomes in localized prostate cancer.” David and Martin, thank you so much for joining us today.
Martin King: Our pleasure. Thank you for the invitation.
Zachary Klaassen: Absolutely.
David Yang: Thank you so much for having us.
Zachary Klaassen: So I’d love for you guys to walk us through some of your key slides. And then we’ll have a nice discussion afterwards.
Martin King: My name is Martin King. I’m from Brigham and Women’s Hospital and Dana-Farber Cancer Institute. And, David, do you want to just tell us where you’re from?
David Yang: Sure, sounds great. I am also a radiation oncologist based out of Brigham and Women’s Hospital and Dana-Farber Cancer in Boston.
Martin King: And together, we’re going to discuss our recent publication in the journal Radiology entitled “AI-derived tumor volume from multiparametric MRI and outcomes in localized prostate cancer.” Here’s our outline. I’ll talk about the introduction and methods, and Dr. Yang will talk about the results and the summary.
So as we all know, prostate MRI provides valuable information that has been associated with the risk of cancer recurrence. This information includes PI-RADS scores, radiographic staging, as well as tumor size. However, these characteristics are subject to inter-observer variability. Artificial intelligence algorithms can analyze images in a consistent manner, and the objective of this particular study was to evaluate whether the volume of AI-segmented intraprostatic tumor provided prognostic information.
So we conducted a single-institution retrospective analysis of 732 patients who underwent an MRI prior to radiation therapy between 2009 and 2017, or radical prostatectomy between 2015 and 2017. For all of our patients, we had reference segmentations delineated for all PI-RADS 3 to 5 lesions. And so I just wanted to show you some of the delineations that we did. So if you look at this image set, we have three channels or three sequences: diffusion-weighted, the apparent diffusion coefficient in the middle, and T2 on the right.
And we had delineations of a larger right peripheral zone tumor on the ADC image, as well as a left anterior transitional zone tumor that was PI-RADS 4. These were the images that were utilized. Based on these segmentations, we could also calculate the volume of the tumors. The total tumor volume in this case, based on just the reference segmentations, was 3.5 milliliters.
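As a hedged illustration (not the authors' code), the volume calculation from a segmentation mask is straightforward: count the segmented voxels and multiply by the per-voxel volume. The mask shape and spacing below are invented for the example.

```python
import numpy as np

# Illustrative sketch: total tumor volume from a binary segmentation mask,
# as voxel count times per-voxel volume. Spacing values are hypothetical.
def tumor_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Return total segmented volume in milliliters.

    mask       -- binary array (1 = tumor voxel)
    spacing_mm -- per-axis voxel spacing in millimeters
    """
    voxel_volume_mm3 = float(np.prod(spacing_mm))
    return float(mask.sum()) * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3

# Example: a 10 x 10 x 35 voxel lesion at 1 x 1 x 1 mm spacing is 3.5 mL.
mask = np.zeros((64, 64, 64), dtype=np.uint8)
mask[:10, :10, :35] = 1
print(tumor_volume_ml(mask, (1.0, 1.0, 1.0)))  # 3.5
```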
For the AI algorithm, we utilized nnU-Net. This is an open-source deep learning framework that has achieved excellent performance across multiple medical imaging datasets. We trained this algorithm using a cross-validation subgroup of patients treated with radiation—about two-thirds of the radiation cohort, or about 288 patients—and performed cross-validation. We then combined all the fold models together and applied them to a subcohort of radiation patients that we called the test RT cohort, as well as to all patients treated with RP. In this way, we were able to get AI delineations for all patients included in this analysis.
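The cross-validation ensembling step described above can be sketched conceptually as follows. This is not nnU-Net's internal code, just an illustration of the general idea: average the tumor-probability maps produced by the per-fold models, then threshold to obtain a single ensemble segmentation. The fold count and threshold are assumptions for the example.

```python
import numpy as np

# Conceptual sketch of cross-validation ensembling: average each fold model's
# predicted tumor-probability map, then threshold into a binary segmentation.
def ensemble_predict(fold_probability_maps, threshold=0.5):
    """fold_probability_maps: list of arrays, one probability map per fold."""
    mean_map = np.mean(fold_probability_maps, axis=0)
    return (mean_map >= threshold).astype(np.uint8)

# e.g. five folds voting on a tiny 2 x 2 probability map
folds = [np.array([[0.9, 0.2], [0.4, 0.6]]) for _ in range(5)]
print(ensemble_predict(folds))  # [[1 0] [0 1]]
```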
We then conducted statistical analysis. We first wanted to evaluate how the AI algorithm performed. We calculated the patient-level and lesion-specific F1 scores, which tell us what the balance is between the sensitivity and the positive predictive value. We also calculated the sensitivities for PI-RADS 3, 4, and 5 lesions. Then we performed Cox regression analyses for the RT group and the RP group separately. For each group, we adjusted for clinical and radiographic staging factors to better understand whether AI volume provided additional prognostic information. We also calculated time-dependent AUC values for biochemical failure and metastasis in the RT and RP groups. And I will turn this over to Dr. Yang.
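The F1 score mentioned above can be written out explicitly: it is the harmonic mean of sensitivity (recall) and positive predictive value (precision). A minimal sketch, with lesion counts invented purely for illustration:

```python
# Hedged sketch: F1 as the harmonic mean of sensitivity and positive
# predictive value, computed from true-positive (tp), false-positive (fp),
# and false-negative (fn) lesion counts. Counts below are illustrative.
def f1_score(tp: int, fp: int, fn: int) -> float:
    sensitivity = tp / (tp + fn)  # recall: fraction of reference lesions found
    ppv = tp / (tp + fp)          # precision: fraction of detections that are real
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# e.g. 70 true-positive, 30 false-positive, 30 false-negative lesions
print(round(f1_score(70, 30, 30), 2))  # 0.7
```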
David Yang: Thank you, Dr. King. So in terms of our results, we first looked at how well this model that we trained performed in terms of identifying the lesions correctly on both a patient-level and on a lesion-level analysis. So what we can see, in each column here in this table—I’ll turn our attention to the table on the left here—each column is one of the subgroups. So again, we trained our model using these cross-validation radiation therapy subgroups. So this is the CVRT.
And then we examined the performance of the model on two held-out test sets which do not contain any of the training data: the test RT subcohort and the RP subcohort. What we can see is that the patient-level F1 scores ranged from 84% to 87%, so relatively stable across all three subgroups. As a reference for the F1 score: a score of 0 essentially means the model does not identify anything correctly, and a score of 100% means excellent, top performance. So on a patient level, 87% means it’s doing a pretty good job.
As we can see, though, on a lesion-level analysis, the model’s performance does go down: the F1 scores now range from 65% to 70% across all three cohorts. And when we looked into why that was, by examining the model’s performance across the three PI-RADS scores—3, 4, and 5—we see that the model’s sensitivity is highest for PI-RADS 5 lesions and lowest for PI-RADS 3.
And lastly, we looked at the differences in lesion volume among the different types of lesions: the true positives, the false positives (lesions identified by the model but not present in the reference segmentations), and the false negatives (lesions present in the reference but not identified by the model). We see that the true-positive lesions are significantly larger than the false positives and false negatives.
And on the right here is an example of what the output from this model looks like. We see here that the AI model, again, identified a lesion in the right peripheral zone—this lesion, which is bright on DWI, very dark on the ADC. And you can see some presence there in the T2, some amount of hypointensity. We see that the model appears to have missed a PI-RADS 4 lesion in the left transitional zone, something which was a bit more subtle on the DWI and the ADC, and quite subtle on T2. It did miss this lesion.
In addition, we turn next to think about, could this model help us? Essentially, can it become a tool with prognostic value? Can it help us identify patients who may be at a higher risk for worse cancer-related outcomes in a way which is independent of other known clinical, pathologic, and radiologic factors? To do so, we performed a Cox regression analysis looking at the outcomes—here we’ll look at metastasis-free survival. And we also have data for biochemical failure as well.
We see that in this cohort, now looking at the patients treated with radiation therapy, even after adjusting for a number of other factors, including clinical variables such as the NCCN risk group, the PI-RADS score, and the radiographic T stage, we see that the total volume of the AI-segmented tumors in the prostate was independently associated with a higher risk of developing metastasis in this cohort, which is treated with radiation. And this is with a median follow-up of 7.9 years, almost eight years.
And in the group here now looking at patients treated with radical prostatectomy, we see a similar trend, which is that even after adjusting for a number of clinical and radiologic covariates that could be confounding the volume, the total volume of the AI-identified prostatic disease was an independent risk factor for worse outcomes again. And we see this perhaps best illustrated in the Kaplan-Meier curves on the right, where we see that patients with larger tumors, especially the group with a volume of 2.0 ml or above, appear to have a much higher risk of biochemical failure as well as developing metastasis. And this is with a median follow-up of 5.5 years.
And lastly, we want to see how well this model performs in terms of predicting outcomes at seven years for the group which received radiation and five years for the group which underwent a radical prostatectomy. And we chose the values for seven and five years because that was the closest to the median follow-up of the group, which allows us to perform a balanced analysis. And what we see here is that when looking at the area under the receiver operating characteristic curve for the seven-year risk of metastasis, the performance of looking at—if you were to just look at the NCCN risk category, you had an AUC of 0.74.
And again, an AUC of 0.5 is essentially a coin flip; 0.74 is saying that the NCCN risk group is certainly better than a coin flip in predicting the risk of metastasis. However, we see that the volume of the AI-identified intraprostatic tumor has an AUC of 0.84, and compared to the NCCN AUC of 0.74, that difference was statistically significant with a p-value of 0.02. And now, turning to the subgroup which received a radical prostatectomy, you see that the AUC values are 0.79 versus 0.89, a difference here which numerically favors the AI-identified tumor volume, though this p-value here, I would say, is not statistically significant.
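The AUC comparison described here can be sketched with toy numbers. This simplified illustration (not the study code) computes the AUC of a continuous marker, such as AI tumor volume, against a binary seven-year metastasis outcome using the rank-statistic definition; it ignores censoring, whereas the paper used time-dependent AUCs. All values below are invented.

```python
import numpy as np

# Illustrative only: AUC at a fixed horizon for a continuous marker
# (e.g. AI tumor volume) against event status by that horizon.
# Ignores censoring for simplicity; the study used time-dependent AUCs.
def auc(marker, event_by_horizon):
    marker = np.asarray(marker, dtype=float)
    y = np.asarray(event_by_horizon, dtype=bool)
    pos, neg = marker[y], marker[~y]
    # Probability a random event case has a higher marker value than a
    # random non-event case (ties count as half).
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical volumes (mL) and 7-year metastasis indicators
volume = [0.5, 1.0, 2.5, 3.0, 0.8, 4.1]
met7yr = [0,   0,   1,   1,   0,   1]
print(auc(volume, met7yr))  # 1.0 for this perfectly separated toy example
```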
So in summary, I think there are a couple of points to take away. The first is that it appears that this AI, this deep learning model, can be trained to segment lesions from prostate MRIs with good performance. And secondly, when looking at the volume of the identified tumor, that would appear to be an independent prognostic factor for cancer outcomes for patients with localized prostate cancer, treated with either surgery or radiation. And for the radiation cohort, for patients with radiation, the volume of prostatic tumor based on AI segmentation exhibited a higher AUC for predicting the seven-year risk of distant metastasis than the NCCN risk category.
There are a couple of strengths to our study. These include that the segmentations can be obtained in this automated fashion from staging MRIs, and they can be checked by radiologists. So essentially, it’s a nice way to make a process that otherwise may be time-consuming much faster and potentially—hopefully—more easily integrated into current workflows. But there are some limitations as well. Our cohort was from a single institution, and some of the scans involved older scanning configurations. We also know that our model did miss some PI-RADS 5 lesions, and we’re investigating why that may be.
And so our future work is centered around validating this model on multi-institutional cohorts obtained with multiple different scan parameters, to understand how well our model and similar approaches may perform in other kinds of scans and patient cohorts. So I just want to again, on behalf of Dr. King and myself, thank you all today for the opportunity to discuss the results and hopefully disseminate them throughout the community. Thanks.
Martin King: Thank you.
Zachary Klaassen: So first of all, congratulations on this excellent work. I mean, this is really well done and really showing—we’ve seen AI already in prognostication of pathology, we’re seeing it now in MRI. And so this is excellent work and we’re delighted to have you share it with us on UroToday. So just a couple of discussion points. Martin, when I look at this—I mean, we know MRIs are difficult to read. We know there are some issues with the PI-RADS scoring system. There’s interobserver variability among readers. On a global scale, on a really high-level scale, what’s the potential implication for work like this?
Martin King: Yeah, I mean, I think that there could be big potential implications. What we ultimately would love to see is that the AI tumor volume is included on radiology reports. And if we can associate, in larger datasets, the potential risk after treatment with standard-of-care treatment with these volumes, then I think it could also be used by clinicians for deciding or discussing potential treatment options. For this to happen, there are a couple of things. Number one is that the radiologist would need to make sure that the image quality is sufficient to be able to perform this analysis. So if there’s a lot of rectal gas or hip artifacts, those MRIs likely would not be good candidates.
But the other thing is that the tumor volume itself can be second-checked by a radiologist. And so the radiologist can say, “This is a really heterogeneous tumor that I thought was large. The AI volume missed a large portion of it. Maybe this analysis is not good and we should not present this to the patient.” And with those two checks and balances, I think that we can give patients a reliable result that could be used to inform decision-making, with additional validation across multiple scanners and institutions.
Zachary Klaassen: Yeah, absolutely. And David, so I know you looked at biochemical failure and metastasis in this particular study. If we look at this—and I know you haven’t done this work yet—but from an active surveillance standpoint, there are a lot of patients that get MRIs either before biopsy, after biopsy, as part of their active surveillance follow-up algorithm. What’s the potential implication of a technology like this on patients with active surveillance?
David Yang: That’s a great question and I think a very important clinical scenario in which AI methods, such as our approach toward AI-based intraprostatic tumor identification and segmentation, may have value. I think, as you noted, this is not quite the patient cohort we looked at, which was patients with localized disease treated with surgery or radiation. With that being said, it is very possible that an AI model could also do a nice job of segmenting and identifying the volume of intraprostatic lesions for patients on active surveillance.
I think what we know from our work is that the performance of the model does go down, especially for patients with PI-RADS 3 disease. And we’re working on ways to try and improve that performance, especially if trained on a cohort that’s enriched for this patient population, such as one for patients on active surveillance. I think that’s certainly—I think it’s a foreseeable, achievable goal.
And so really, I think what I foresee is that in the future, a similar approach of identifying, segmenting, and calculating the volume of AI-based segmentation of disease may have potential as a way to risk-stratify patients on active surveillance as well. Perhaps it’s possible that the volume may be associated with time to progression or likelihood of progression or risk classification, if you will. And so I think those are all interesting and important directions to be taken in the future.
Zachary Klaassen: That’s great. So Martin, you mentioned multi-institutional validation, etc. What other exciting things are you guys planning for the model over the next, say, one to two or three years?
Martin King: Yeah, I mean, I think that one of the nice things about being in this field is that there are so many new AI algorithms that are being developed, and they just keep getting better and better. And so we’re also looking at implementing many of these algorithms. A lot of the source code is just on the web. We’ve looked at even a couple of small maneuvers, and we’ve gotten significantly better performance in some of our datasets. And so definitely improving that performance.
But also trying to better understand what is the reaction for the end users, such as clinicians—whether it’s radiation oncologists or radiologists—and getting more prospective experience in understanding how we can really utilize this on the clinical level. And so I think that prospective protocols are something that I’m also very excited about.
Zachary Klaassen: That’s great. Again, congratulations on some great work. I’d love to give you guys each a minute to give our listeners a take-home message from your awesome study.
David Yang: Yeah, that sounds great. I think I would say that one of the main takeaways of our study is that I think these kinds of AI methods have—we’re starting to see their potential as ways to not so much replace the work that we do as clinicians but rather, I think, nicely complement our current capabilities in ways that help us personalize patient recommendations and management in the future. For something like this, where we may be able to improve risk stratification if validated in a prospective setting, that may allow us to tell the patient in front of us with more precision what exactly is the best treatment or regimen for them. And that, for me, is especially exciting.
Zachary Klaassen: Yeah, absolutely. All right, Martin, you have the last word.
Martin King: Yeah, I mean, I completely agree with everything that David has said. I think another thing that I’m really excited about is that this work really maximizes the value of images that have already been obtained by patients. And so we wouldn’t need to tell the patients, “Oh, we have to wait for x number of weeks to get another test.” We really just maximize everything that we can obtain from the MRI images, and we can use that to hopefully better inform the patients about their risk of recurrence and utilize not only just tumor volume but also the other good stuff that we get from the reports—extracapsular extension, seminal vesicle invasion, nodal involvement—and really integrate that together to give the patient a full picture of what we think is going on, or at least what we can see radiographically. And so that’s one thing that I’m really excited about with this.
Zachary Klaassen: That’s super. Again, we thank you for your time and expertise. We look forward to hearing more about your exciting work as we move forward. Thank you both again.
Martin King: Thank you.
David Yang: Thank you for having us.
Zachary Klaassen: Hi, my name is Zach Klaassen. I’m a urologic oncologist at the Georgia Cancer Center in Augusta, Georgia. I’m delighted to be joined on UroToday by Dr. David Yang and Dr. Martin King, radiation oncologists at Dana-Farber Cancer Institute. We’re going to be talking about their recently published paper called “AI-derived tumor volume for multiparametric MRI and outcomes in localized prostate cancer.” David and Martin, thank you so much for joining us today.
Martin King: Our pleasure. Thank you for the invitation.
Zachary Klaassen: Absolutely.
David Yang: Thank you so much for having us.
Zachary Klaassen: So I’d love for you guys to walk us through some of your key slides. And then we’ll have a nice discussion afterwards.
Martin King: My name is Martin King. I’m from Brigham and Women’s Hospital and Dana-Farber Cancer Institute. And, David, do you want to just tell us where you’re from?
David Yang: Sure, sounds great. I am also a radiation oncologist based out of Brigham and Women’s Hospital and Dana-Farber Cancer in Boston.
Martin King: And together, we’re going to discuss our recent publication in Radiology Journal entitled “AI-derived tumor volume for multiparametric MRI and outcomes in localized prostate cancer.” Here’s our outline. I’ll talk about the introduction and methods, and Dr. Yang will talk about the results and the summary.
So as we all know, prostate MRI provides valuable information that has been associated with the risk of cancer recurrence. This information includes PI-RADS scores, radiographic staging, as well as tumor size. However, these characteristics are subject to inter-observer variability. Artificial intelligence algorithms can analyze images in a consistent manner, and the objective of this particular study was to evaluate whether the volume of AI-segmented intraprosthetic tumor provided prognostic information.
So we conducted a single-institution retrospective analysis of 732 patients who underwent an MRI prior to radiation therapy between 2009 and 2017, or radical prostatectomy between 2015 and 2017. For all of our patients, we had reference segmentations delineated for all PI-RADS 3 to 5 lesions. And so I just wanted to show you some of the delineations that we did. So if you look at this image set, we have three channels or three sequences: diffusion-weighted, the apparent diffusion coefficient in the middle, and T2 on the right.
And we had delineations of a larger right peripheral zone tumor on the ADC image, as well as a left anterior transitional zone tumor that was PI-RADS 4. And so basically these were the images that were utilized. We could also segment based on these segmentations, calculate the volume of the tumors. And so the total tumor volume in this case based on just the reference segmentations was 3.5 milliliters.
Per AI algorithm, we utilized nnU-Net. This is an open-source deep learning algorithm that has achieved excellent performance across multiple medical datasets. We trained this algorithm using a cross-validation subgroup of patients treated with radiation. This was about two-thirds of patients in the radiation cohort, or about 288 patients. And then we did cross-validation. We also combined all the models together and applied them to a subcohort of radiation patients that we called the test RT cohort, as well as all patients treated with RP. And in this way, we were able to get AI delineations of all patients included in this analysis.
We then conducted statistical analysis. We first wanted to evaluate how the AI algorithm performed. We calculated the patient-level and lesion-specific F1 scores, which can tell us what the balance is between the sensitivity and the positive predictive value. We also calculated the sensitivities for PI-RADS 3, 4, and 5 lesions. Then we looked at Cox regression analysis for both the RT group and the RP group separately. For each group, we adjusted for clinical and radiographic staging factors to better understand how AI volume provided—whether AI volume improved or provided additional prognostic information. We also calculated time-dependent AUC values of biochemical failure and metastasis for the RT and RP groups. And I will turn this over to Dr. Yang.
David Yang: Thank you, Dr. King. So in terms of our results, we first looked at how well this model that we trained performed in terms of identifying the lesions correctly on both a patient-level and on a lesion-level analysis. So what we can see, in each column here in this table—I’ll turn our attention to the table on the left here—each column is one of the subgroups. So again, we trained our model using these cross-validation radiation therapy subgroups. So this is the CVRT.
And then we examined the performance of the model on two held-out test sets which do not contain any of the training data. And this includes the test RT subcohort as well as the RP subcohort. And what we can see is that on the patient-level F1 scores, this ranged from 84 to 87%, so relatively stable across all three subgroups. And some sort of reference for the F1 score: a score of 0 essentially is saying that the model does not identify anything correctly, and a score of 100% means excellent, top performance. So on a patient level, 87% means it’s doing a pretty good job.
As we can see, though, on a lesion-level performance, the model’s performance does go down. So the F1 scores now are ranging from 65% to 70% across all three cohorts. And when we looked at the reasoning behind why that was by looking at the performance of the model across the three PI-RADS scores—3, 4, and 5—we see that the performance of the model, sensitivity-wise, is the highest for the PI-RADS 5 lesions and the lowest for PI-RADS 3.
And lastly, we looked at the differences in the tumor volume for the lesion volume for the different types of lesions. So the true positives, the false positives (which means that these were lesions identified by the model but not present in the actual reference images), and the false negatives (which means that it was present in the reference but not identified). We see that the true-positive lesions are numerically significantly bigger than the false positives and false negatives.
And on the right here is an example of what the output from this model looks like. We see here that the AI model, again, identified a lesion in the right peripheral zone—this lesion, which is bright on DWI, very dark on the ADC. And you can see some presence there in the T2, some amount of hypointensity. We see that the model appears to have missed a PI-RADS 4 lesion in the left transitional zone, something which was a bit more subtle on the DWI and the ADC, and quite subtle on T2. It did miss this lesion.
In addition, we turn next to think about, could this model help us? Essentially, can it become a tool with prognostic value? Can it help us identify patients who may be at a higher risk for worse cancer-related outcomes in a way which is independent of other known clinical, pathologic, and radiologic factors? And to do so, we performed a Cox regression analysis looking at the outcomes—here we’ll look at algorithms for metastasis-free survival. And we also have data for biochemical failure as well.
We see that in this cohort, now looking at the patients treated with radiation therapy, even after adjusting for a number of other factors, including clinical variables such as the NCCN risk group, the PI-RADS score, and the radiographic T stage, we see that the total volume of the AI-segmented tumors in the prostate was independently associated with a higher risk of developing metastasis in this cohort, which is treated with radiation. And this is with a median follow-up of 7.9 years, almost eight years.
And in the group here now looking at patients treated with radical prostatectomy, we see a similar trend, which is that even after adjusting for a number of clinical and radiologic covariates that could be confounding the volume, the total volume of the AI-identified prostatic disease was an independent risk factor for worse outcomes again. And we see this perhaps best illustrated in the Kaplan-Meier curves on the right, where we see that patients with larger tumors, especially the group with a volume of 2.0 ml or above, appear to have a much higher risk of biochemical failure as well as developing metastasis. And this is with a median follow-up of 5.5 years.
And lastly, we want to see how well this model performs in terms of predicting outcomes at seven years for the group which received radiation and five years for the group which underwent a radical prostatectomy. And we chose the values for seven and five years because that was the closest to the median follow-up of the group, which allows us to perform a balanced analysis. And what we see here is that when looking at the area under the receiver operating characteristic curve for the seven-year risk of metastasis, the performance of looking at—if you were to just look at the NCCN risk category, you had an AUC of 0.74.
And again, the AUC of 0.5 is essentially a coin flip; 0.74 is saying that the NCCN risk group is certainly better than a coin flip in predicting the risk of metastasis. However, we see that the volume of the AI-identified intraprosthetic tumor has an AUC of 0.84, and compared to the NCCN AUC of 0.74, that difference was statistically significant with a p-value of 0.02. And now, turning to the subgroup which received a radical prostatectomy, you see that the AUC values are 0.79 versus 0.89, a difference here which numerically favors the AI-identified tumor volume, though this p-value here, I would say, is not statistically significant.
So in summary, I think there are a couple of points to take away. The first is that it appears that this AI, this deep learning model, can be trained to segment lesions from prostate MRIs with good performance. And secondly, when looking at the volume of the identified tumor, that would appear to be an independent prognostic factor for cancer outcomes for patients with localized prostate cancer, treated with either surgery or radiation. And for the radiation cohort, for patients with radiation, the volume of prostatic tumor based on AI segmentation exhibited a higher AUC for predicting the seven-year risk of distant metastasis than the NCCN risk category.
There are a couple of strengths to our study. And I think these include that the segmentations can be obtained in this automated fashion from staging MRIs, and this can be checked by radiologists. So essentially, it’s a nice way to make a process that otherwise may be time-consuming much faster and potentially—hopefully—more easily integrated into current workflows. But there are some limitations as well. Our cohort was from a single institution, and some of these involve older scanning configurations. And we also know that our model did miss some PI-RADS 5 lesions, and we’re investigating why that may be.
And so our future work is centered around validating this model on multi-institutional cohorts obtained with multiple different scan parameters to understand how well our model or some approaches may perform in other kinds of scans and patient cohorts. So I just want to again, on behalf of Dr. King and myself, thank you all today for the opportunity to discuss the results and hopefully disseminate that throughout the community. Thanks.
Martin King: Thank you.
Zachary Klaassen: So first of all, congratulations on this excellent work. I mean, this is really well done and really showing—we’ve seen AI already in prognostication of pathology, we’re seeing it now in MRI. And so this is excellent work and we’re delighted to have you share it with us on UroToday. So just a couple of discussion points. Martin, when I look at this—I mean, we know MRIs are difficult to read. We know there are some issues with the PI-RADS scoring system. There’s interobserver variability among readers. On a global scale, on a really high-level scale, what’s the potential implication for work like this?
Martin King: Yeah, I mean, I think that there could be big potential implications. What we ultimately would love to see is that the AI tumor volume is included on radiology reports. And if we can associate, in larger datasets, the potential risk after treatment with standard-of-care treatment with these volumes, then I think it could also be used by clinicians for deciding or discussing potential treatment options. For this to happen, there are a couple of things. Number one is that the radiologist would need to make sure that the image quality is sufficient to be able to perform this analysis. So if there’s a lot of rectal gas or hip artifacts, those MRIs likely would not be good candidates.
But the other thing is that the AI-derived tumor volume and segmentation can be second-checked by a radiologist. And so the radiologist can say, “This is a really heterogeneous tumor that I thought was large. The AI segmentation missed a large portion of it. Maybe this analysis is not good and we should not present this to the patient.” And with those two checks and balances, I think that we can give patients a reliable result that could be used to inform decision-making with additional validation across multiple scanners and institutions.
Zachary Klaassen: Yeah, absolutely. And David, so I know you looked at biochemical failure and metastasis in this particular study. If we look at this—and I know you haven’t done this work yet—but from an active surveillance standpoint, there are a lot of patients that get MRIs either before biopsy, after biopsy, as part of their active surveillance follow-up algorithm. What’s the potential implication of a technology like this on patients with active surveillance?
David Yang: That’s a great question and I think a very important clinical scenario in which AI methods, such as our approach toward AI-based intraprostatic tumor identification and segmentation, may have value. I think, as you noted, this is not quite the patient cohort we looked at, which was patients with localized disease treated with surgery or radiation. With that being said, it is very possible that an AI model could also do a nice job of segmenting and identifying the volume of intraprostatic lesions for patients on active surveillance.
I think what we know from our work is that the performance of the model does go down, especially for patients with PI-RADS 3 disease. And we’re working on ways to try and improve that performance, for example by training on a cohort that’s enriched for this patient population, such as patients on active surveillance. I think that’s certainly a foreseeable, achievable goal.
And so really, I think what I foresee is that in the future, a similar approach of identifying, segmenting, and calculating the volume of disease from AI-based segmentations may have potential as a way to risk-stratify patients on active surveillance as well. Perhaps it’s possible that the volume may be associated with time to progression or likelihood of progression or risk classification, if you will. And so I think those are all interesting and important directions to be taken in the future.
Zachary Klaassen: That’s great. So Martin, you mentioned multi-institutional validation, etc. What other exciting things are you guys planning for the model over the next, say, one to two or three years?
Martin King: Yeah, I mean, I think that one of the nice things about being in this field is that there are so many new AI algorithms that are being developed, and they just keep getting better and better. And so we’re also looking at implementing many of these algorithms. A lot of the source code is freely available on the web. We’ve even looked at a couple of small modifications, and we’ve gotten significantly better performance in some of our datasets. So definitely improving that performance.
But also trying to better understand the reaction from end users, such as clinicians—whether it’s radiation oncologists or radiologists—and getting more prospective experience in understanding how we can really utilize this at the clinical level. And so I think that prospective protocols are something that I’m also very excited about.
Zachary Klaassen: That’s great. Again, congratulations on some great work. I’d love to give you guys each a minute to give our listeners a take-home message from your awesome study.
David Yang: Yeah, that sounds great. I think I would say that one of the main takeaways of our study is that I think these kinds of AI methods have—we’re starting to see their potential as ways to not so much replace the work that we do as clinicians but rather, I think, nicely complement our current capabilities in ways that help us personalize patient recommendations and management in the future. For something like this, where we may be able to improve risk stratification if validated in a prospective setting, that may allow us to tell the patient in front of us with more precision what exactly is the best treatment or regimen for them. And that, for me, is especially exciting.
Zachary Klaassen: Yeah, absolutely. All right, Martin, you have the last word.
Martin King: Yeah, I mean, I completely agree with everything that David has said. I think another thing that I’m really excited about is that this work really maximizes the value of images that have already been obtained from patients. And so we wouldn’t need to tell the patients, “Oh, we have to wait for x number of weeks to get another test.” We really just maximize everything that we can obtain from the MRI images, and we can use that to hopefully better inform the patients about their risk of recurrence and utilize not only the tumor volume but also the other information that we get from the reports—extracapsular extension, seminal vesicle invasion, nodal involvement—and really integrate that together to give the patient a full picture of what we think is going on, or at least what we can see radiographically. And so that’s one thing that I’m really excited about with this.
Zachary Klaassen: That’s super. Again, we thank you for your time and expertise. We look forward to hearing more about your exciting work as we move forward. Thank you both again.
Martin King: Thank you.
David Yang: Thank you for having us.