Predicting Prostate Cancer Molecular Subtype with Deep Learning on Histopathologic Images - Tamara Lotan

August 17, 2023

Andrea Miyahira hosts Tamara Lotan to discuss the paper, "Predicting Prostate Cancer Molecular Subtype with Deep Learning on Histopathologic Images." Dr. Lotan details the collaborative work with Angelo De Marzo's lab at Johns Hopkins and AIRA MATRIX, an AI deep learning company in India. The study aims to predict the underlying molecular subtypes of prostate cancer by applying deep learning algorithms to hematoxylin and eosin (H&E)-stained slides. The algorithms showed high accuracy in identifying genomic alterations such as the TMPRSS2:ERG rearrangement and PTEN tumor suppressor status, suggesting they could serve as an inexpensive screening test and potentially lead to more personalized treatment strategies. Calling the results promising, Dr. Lotan emphasizes the need to evaluate the algorithms across multiple institutions and to train them on racially diverse cohorts to ensure robustness.

Biographies:

Tamara Lotan, MD, Johns Hopkins University School of Medicine, Baltimore, MD

Andrea K. Miyahira, PhD, Director of Global Research & Scientific Communications, The Prostate Cancer Foundation



Andrea Miyahira: Hi, everyone. I'm Andrea Miyahira here at the Prostate Cancer Foundation. Thanks for joining us today. I'm here with Dr. Tamara Lotan, a professor at Johns Hopkins. She'll be discussing her group's recent paper, "Predicting Prostate Cancer Molecular Subtype with Deep Learning on Histopathologic Images," which was recently published in Modern Pathology. Thanks so much for joining me today, Dr. Lotan.

Tamara Lotan: Great, thank you so much for having me. Yeah, so I'm going to talk about some work that we did together with Angelo De Marzo's lab at Johns Hopkins and AIRA MATRIX, an AI deep learning company based in India. We were interested in whether we could use deep learning on ordinary hematoxylin and eosin-stained slides to predict the underlying molecular subtype of prostate cancer. This was really a proof-of-principle study. In the field of pathology, we know that histology is tightly correlated with underlying molecular alterations in some tumor types. Here I'm picturing two different kinds of renal tumors. On the left we have a clear cell renal cell carcinoma, and on the right we have a benign tumor called an oncocytoma. You can see that they have very obviously different histologies, which are easily recognizable, and we know that those histologies correlate very tightly with underlying genomic alterations.

So 3p deletions involving the VHL, or von Hippel-Lindau, tumor suppressor are essentially uniformly present in clear cell renal cell carcinomas, whereas oncocytomas, for example, have very minimal copy number alterations. So just by looking at the hematoxylin and eosin-stained slides, the diagnostic slides that pathologists use for the initial diagnosis of these tumor types, we can make a very strong prediction of the underlying molecular status of these tumors. In prostate cancer, unfortunately, we don't think that morphology, at least as seen by the human eye on a hematoxylin and eosin, or H&E, slide, is predictive of molecular subtype. So I've pictured here two prostate tumors of a similar grade group. The one on the left is positive for the TMPRSS2:ERG rearrangement, a genomic rearrangement that's present in about half of prostate cancers and that we think occurs very early in tumorigenesis. The one on the right, conversely, does not carry this rearrangement.

And looking at these images, I really cannot see a difference between these two tumors. Studies have not found reproducible, visually identifiable morphologic features that differ by molecular subtype in these tumors. So we really can't look at them and know what the underlying molecular status is.

The other real motivator for our study was that germline testing and next-generation sequencing assays of all kinds are expensive, are not widely used in early-stage prostate cancer, and are really not accessible to many patients. That's despite recommendations that patients, at least those with more advanced or aggressive tumors, undergo germline or somatic sequencing to find out whether they have underlying alterations that could be actionable for therapy down the road. Because these tests are not accessible, it would be really beneficial to have a very inexpensive screening test that we could use at the time of diagnosis, on the diagnostic H&E, to predict who really should go for genetic testing, either germline or somatic sequencing, to prove definitively that they have an underlying alteration.

So our idea was to use deep learning algorithms. If we could train them to identify or predict the underlying molecular subtype of prostate cancer, we could then potentially use them to screen cases and recommend sequencing for the subsets of patients who would particularly benefit from it. This would be an inexpensive way to screen patients, and perhaps also a better way to make sure that specific patients follow up on a recommendation for sequencing.

So I'm not going to get into the details of the deep learning algorithm that we developed, but essentially we used several hundred tumors with known genomic status for one of two genomic alterations that we were interested in. The first was the TMPRSS2:ERG rearrangement, or ERG expression. The second was the most commonly deleted tumor suppressor in prostate cancer, PTEN. We started from algorithms that AIRA MATRIX had already developed to identify tumor on an H&E image. These deep learning algorithms automatically identify all of the tumor tissue, shown by the blue annotations on the right, within this H&E-stained quadrant of a radical prostatectomy, and they can distinguish the tumor from the surrounding stroma and normal glands. Once tumor identification was performed, we trained our algorithms on sets of tumors with either known ERG status or known PTEN status, using feature representation learning, feature extraction, and classification, so that the algorithm would predict which tumors were more likely to have the underlying alterations.

The algorithm works on small tiles extracted from these larger images, which makes it computationally more efficient, and the tiles are composed entirely of tumor identified by the tumor identification algorithm. So we're feeding in hundreds to thousands of tiles from several hundred prostate cancers with known positive or negative ERG status, or known positive or negative PTEN status, and training the algorithm to identify which tiles are likely to have the underlying alteration and which likely do not. It essentially spits out a probability that any given tile has the underlying genomic alteration, and from those we can derive a probability that the whole tumor has that alteration.
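
The interview does not say how the tile probabilities are combined into a whole-tumor probability. As a minimal sketch, assuming simple mean pooling over tumor tiles (a common baseline, not necessarily the study's method; the function name `slide_probability` and the example values are hypothetical):

```python
from statistics import mean


def slide_probability(tile_probs: list[float]) -> float:
    """Combine per-tile probabilities into one tumor-level probability.

    Assumption: plain mean pooling over tumor tiles. The study's actual
    aggregation (for example, a transformer over tile features) may differ.
    """
    if not tile_probs:
        raise ValueError("no tumor tiles were extracted")
    if any(not 0.0 <= p <= 1.0 for p in tile_probs):
        raise ValueError("tile probabilities must lie in [0, 1]")
    return mean(tile_probs)


# Hypothetical per-tile ERG probabilities from a single tumor
tiles = [0.92, 0.85, 0.40, 0.77, 0.88]
print(round(slide_probability(tiles), 3))
```

Mean pooling lets a few ambiguous tiles be outweighed by many confident ones; alternatives such as max pooling or learned attention weights trade that robustness for sensitivity to small positive regions.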

This slide gives a little more detail for those who are interested in the actual deep learning algorithms. It shows the tiles that we generate from each image and the concatenation of the feature maps extracted from these tiles, which are then fed into vision transformer algorithms.
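
The tiling step described above keeps only tiles made up entirely of tumor. A minimal sketch of that selection, assuming the tumor-identification output is available as a boolean pixel mask and using non-overlapping square tiles; the mask, tile size, and function name are hypothetical, and the study's actual tiling parameters are not given in the interview:

```python
def tumor_tile_origins(mask: list[list[bool]], tile: int) -> list[tuple[int, int]]:
    """Return the top-left (row, col) of every non-overlapping tile x tile
    window that lies entirely within the tumor mask.

    Assumption: mask[r][c] is True where the tumor-identification step
    labeled pixel (r, c) as tumor. Partial windows at the edges are skipped.
    """
    if tile <= 0:
        raise ValueError("tile size must be positive")
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    origins = []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            # Keep the tile only if every pixel in it is tumor
            if all(mask[r + i][c + j] for i in range(tile) for j in range(tile)):
                origins.append((r, c))
    return origins


# Hypothetical 4 x 6 tumor mask (True = tumor)
mask = [
    [True, True, True, True, False, False],
    [True, True, True, True, False, False],
    [True, True, False, True, True, True],
    [True, True, True, True, True, True],
]
print(tumor_tile_origins(mask, tile=2))
```

In a vision-transformer setup, each kept tile would then be passed through a feature extractor, and the resulting per-tile feature vectors stacked into a sequence for the transformer; that stage is not sketched here.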

So this is what our outcome looked like. As I said, we trained on just over 200 images and then tested on held-out images from the same cohort. These are previously published radical prostatectomy cohorts that were constructed for various reasons, such as comparing men by self-identified race or by natural history in terms of developing metastasis. What they all had in common is that we knew both the PTEN and the ERG status in these cohorts. Of the 64 images held out from our original training cohort, 26 were ERG positive, a little less than half, as we would have predicted, and the algorithm was quite accurate in predicting which of those were likely to be ERG positive, with an AUC of about 0.9. AUC is on a zero-to-one scale, where 1 is the maximum and 0.5 is no better than chance, so that's really an excellent result.
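
The AUC quoted above can be read as the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case, with ties counted as half. A minimal sketch of that definition; the labels and scores are invented for illustration and are not data from the study:

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is
    O(n_pos * n_neg), which is fine for cohorts of a few hundred cases.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical ERG labels (1 = positive) and tumor-level probabilities
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.91, 0.80, 0.35, 0.40, 0.20, 0.15, 0.05]
print(roc_auc(labels, scores))  # 11 of 12 pairs ranked correctly
```

Here one positive tumor (score 0.35) is outranked by one negative (score 0.40), so 11 of the 12 positive-negative pairs are ordered correctly.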

But of course this was done on images held out from our training dataset, so we wanted to test the algorithm on independent radical prostatectomy cohorts; that's what the "RP" here stands for. We tested it on an additional 248 images from a case-cohort study with metastasis as the outcome, and from the cohort I mentioned comparing patients of different self-identified races, so a very diverse cohort. You can see that the AUCs remained very high, close to 0.9, between 0.86 and 0.89, so excellent results.

Next we wanted to see whether these algorithms can work with smaller amounts of tumor. As I mentioned on the previous slide, the algorithm extracts tiles of tumor tissue from the radical prostatectomy, and of course it can extract many hundreds of them from the large tumor present in a radical prostatectomy sample.

A needle biopsy is a very tiny sample compared with a radical prostatectomy, just a millimeter or so across, so far fewer tiles of tumor can be extracted, and we wondered how accurate our algorithm would be in that setting. You can see a slight decrease in the AUC in the two needle biopsy cohorts that we tested: one of patients undergoing radiation therapy, and one of patients on active surveillance here at Johns Hopkins. But the AUCs are still pretty reasonable overall, especially considering how little tumor tissue is being fed into these algorithms, particularly in the active surveillance cohort, which tends to have very low-volume tumors. So we thought these were really encouraging results: even on diagnostic needle biopsy samples, we can predict ERG rearrangement. Now, of course, ERG by itself is not predictive of response to any therapy. So this was really more of a proof-of-principle experiment to see whether, for a very common alteration with a large training dataset, it is possible to train an algorithm to detect an underlying molecular alteration.

We also looked at whether we could predict PTEN status. PTEN is arguably a better potential predictive biomarker, and it is also a prognostic biomarker. So we did a very similar experiment. Because PTEN loss is often subclonal and heterogeneous within the tumor, we restricted our training, and also our initial testing, to cases with homogeneous PTEN loss. That's why the numbers here are a little lower than they were on the previous slide for ERG. Nonetheless, you can see again that on the cases held out from the training cohort we have very nice AUCs, close to 0.8, and very similar AUCs in the additional independent radical prostatectomy cohorts that we tested. And remarkably, in a needle biopsy cohort we see an AUC fairly similar to what we saw in the radical prostatectomy cohorts. So the algorithm apparently works fairly well even with very few tiles, that is, with relatively little tumor.

In the final slide, I'll just mention what I thought was a really useful test: how the algorithm performed on radical prostatectomy sections with heterogeneous PTEN status, that is, tumors with subclonal loss of PTEN. On the left you can see a tumor that has been immunostained for PTEN. In blue, the pathologist has circled the areas with PTEN deletion, and in red are the areas of tumor where PTEN is still intact. You can see a lot of intermixing between these areas: a lot of clonal admixture and a very complex pattern of lost and intact PTEN. On the right you're looking at the deep learning algorithm's prediction of which areas of the tumor have PTEN loss, again in blue, and which have intact PTEN. I think you can appreciate that we see very similar patterns. The algorithm did a fairly good job of reproducing what we know is the ground truth from this genetically validated immunohistochemistry assay.
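
One way to turn a tile-level prediction map like the one described above into a single "percentage of tumor with predicted PTEN loss" is to threshold each tumor tile's probability and count the fraction at or above the cutoff. A minimal sketch, assuming equal-area tiles and a hypothetical cutoff of 0.5; `None` marks non-tumor tiles, and none of these choices are confirmed as the study's method:

```python
from typing import Optional


def percent_predicted_loss(
    tile_map: list[list[Optional[float]]], cutoff: float = 0.5
) -> float:
    """Percentage of tumor tiles whose PTEN-loss probability meets the cutoff.

    tile_map is a 2D grid of tile probabilities; None marks non-tumor tiles,
    which are excluded. Every tumor tile is assumed to cover the same area.
    """
    tumor = [p for row in tile_map for p in row if p is not None]
    if not tumor:
        raise ValueError("no tumor tiles in map")
    lost = sum(1 for p in tumor if p >= cutoff)
    return 100.0 * lost / len(tumor)


# Hypothetical 3 x 4 grid of per-tile PTEN-loss probabilities
grid = [
    [None, 0.8, 0.7, 0.2],
    [0.9, 0.1, 0.3, None],
    [0.6, 0.4, None, None],
]
print(percent_predicted_loss(grid))  # 4 of 8 tumor tiles at or above 0.5
```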

So you can see these broad areas of loss along here, an area of intact PTEN through the center of the section, more loss at the top, and some intact areas toward the edge of the section, so it was actually remarkably accurate. When we did this across a number of cases, the percentage of tumor area with PTEN loss estimated by the pathologist from the immunohistochemically stained slide was highly correlated with the percentage of loss predicted by the deep learning algorithm. So that's a nice validation, in heterogeneous cases, that we can potentially predict subclonal genomic alterations.
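
The interview reports that the pathologist's and the algorithm's percentages were "highly correlated" without naming the statistic. As a sketch, assuming Pearson correlation, the coefficient can be computed directly from paired per-case estimates; the percentages below are invented for illustration:

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical % of tumor area with PTEN loss, one pair per case
pathologist = [10.0, 35.0, 50.0, 70.0, 90.0]
algorithm = [15.0, 30.0, 55.0, 65.0, 85.0]
print(f"Pearson r = {pearson_r(pathologist, algorithm):.3f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; a rank-based measure such as Spearman's would be a reasonable alternative if the relationship were monotonic but not linear.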

So just to wrap up, this is obviously still proof of principle. We need to test this in a lot of multi-institutional cohorts, because H&E quality can differ between institutions, and we need to make sure the algorithm is robust on other institutions' material. But I think it's a nice proof of principle that we can use simple diagnostic H&E-stained slides to make molecular predictions. Obviously these are just probabilities, so the patient would still require definitive testing, but this could help us screen for patients who might benefit from sequencing of the BRCA1 or BRCA2 genes, for example, or of the mismatch repair genes. We can think of a number of potentially actionable underlying genomic alterations where we really want to make sure those patients get sequenced, so that we can spend our resources directing them to sequencing rather than taking all comers diagnosed with prostate cancer.

Andrea Miyahira: Okay, thank you for that, Dr. Lotan. How would other alterations in the same pathways, or with a similar molecular impact, affect the performance of your AI algorithm? For instance, do other PI3-kinase or AKT pathway alterations affect the output of the PTEN algorithm?

Tamara Lotan: Yeah, that's a great question, and actually one that we're trying to study right now. PTEN alterations are very common in prostate cancer, but PI3K alterations are rare, so we don't have many of those cases. In the case of ERG translocations, though, ERG is part of the larger ETS family of transcription factors, and we know that there are other rearrangements, in ETV1, ETV4, and ETV5, that we think are functionally very similar to ERG rearrangements. So one thing that we're looking at right now is whether we can glean anything from the false-positive cases, in other words, cases where the algorithm predicts the tumor is positive but, based on our immunohistochemistry, we don't think it's actually ERG rearranged. Are those cases in fact rearranged for ETV1, ETV4, or ETV5?

We're testing that right now. We expect the transcriptional consequences of these rearrangements for the tumor to overlap substantially. So if transcription is related to morphology, we would expect the morphology, or whatever the algorithm is picking up in the morphology, to be similar between ETV-rearranged and ERG-rearranged tumors. But that's a great question. Hopefully we will be able to pick up alterations across the same pathway.

Andrea Miyahira: Okay, awesome. When considering the feasibility and access for patients to genomic testing and precision therapies, what types of algorithms do you think are most needed and how do you envision this tool being rolled out in widespread clinical use?

Tamara Lotan: Yeah, that's also a really good question. As I said, I think at best this is a screening algorithm; we're really only estimating probabilities of underlying alterations. I also think ERG, and perhaps one could argue even now PTEN, are not yet predictive biomarkers. But it's the subsets of tumors with DNA repair alterations where we could immediately see the utility of being able to screen. These are fairly rare among all the primary tumors diagnosed daily in the US. So if we had a way of searching for that needle in the haystack very rapidly at the time of diagnosis, you could imagine an algorithm on the pathologist's computer: as they look at the slide digitally, they just run the algorithm while signing out the case, or releasing the diagnosis, to estimate the probability that there is an underlying BRCA2 mutation or mismatch repair gene mutation. Then we could write a note that this patient absolutely should be sent for germline testing and further downstream confirmatory sequencing.
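
The sign-out workflow envisioned above amounts to flagging cases whose estimated probability meets a referral cutoff. A minimal sketch under stated assumptions: the `CaseResult` record, the `flag_for_sequencing` function, the case IDs, the probabilities, and the 0.3 cutoff are all hypothetical, and the interview does not describe an existing BRCA2 or mismatch repair model:

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: str        # hypothetical accession identifier
    probability: float  # model-estimated probability of an actionable alteration


def flag_for_sequencing(results: list[CaseResult], cutoff: float) -> list[str]:
    """IDs of cases to recommend for germline and confirmatory sequencing,
    highest estimated probability first.

    A flag is a screening signal only; it does not establish that the
    alteration is present.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie in [0, 1]")
    flagged = [r for r in results if r.probability >= cutoff]
    flagged.sort(key=lambda r: r.probability, reverse=True)
    return [r.case_id for r in flagged]


# Hypothetical batch of cases at sign-out
batch = [
    CaseResult("S-001", 0.08),
    CaseResult("S-002", 0.41),
    CaseResult("S-003", 0.73),
    CaseResult("S-004", 0.19),
]
print(flag_for_sequencing(batch, cutoff=0.3))
```

For a screening use, the cutoff would presumably be set low enough to favor sensitivity, accepting more referrals that confirmatory sequencing later rules out.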

In pathology, we already do a lot of similar screening using other methods. For example, in colorectal cancers we routinely do mismatch repair immunohistochemistry at the time of diagnosis. Again, that just screens for an underlying genomic alteration in the mismatch repair genes, but it helps us say that this case should get sequenced to look for possible Lynch syndrome or somatic inactivation of an MMR gene. So I think it's really in these actionable subsets that this will be most useful. And of course, pathology practices across the US are not fully digital yet, so initially this might be something sent to a central facility for screening. But within 10 or 20 years we expect everyone to be digital, and hopefully it will just run as an app on the computer as pathologists are releasing the case.

Andrea Miyahira: Okay, cool. So I think all the samples that you analyzed in this study were from Johns Hopkins. Have you evaluated the performance of your algorithms on samples from other institutions? And then the follow-up question is, how variable can H&E staining and pathology slide preparation be from center to center?

Tamara Lotan: Yeah, a great question. That's a huge pitfall of developing these deep learning algorithms, and anyone who's spent any time doing it will tell you that you can really go down the garden path if you're working with just a single institution. So this is definitely preliminary, and we're planning to collaborate, hopefully with some very large multi-institutional cohorts where PTEN and ERG status have already been assessed, to see how the algorithm performs. Yes, H&E quality can differ from institution to institution. There are also pre-analytic variables: how the tissue is fixed, the time until it's fixed, how long it's stored, and under what conditions. All of these things can alter very subtle details in the H&E. Pathologists' eyes have been trained to read around that, but these deep learning algorithms can really stumble if you don't train them on multi-institutional cohorts.

The other thing I would say is that we need really racially diverse cohorts. We probably shouldn't be training these algorithms on cohorts that are all from one ancestral background, because we know from genomic studies that that can really get you into trouble. For example, ERG, as I mentioned, is present in about half of all prostate cancers, but that's really half of prostate cancers in white men; in Black men it's only about a quarter. So there are lots of differences that may relate to morphology, and I think they argue for training on very diverse cohorts, both institutionally and racially and ethnically.

Andrea Miyahira: Okay. Well, congratulations on this study. It's nice to see this coming along so well, and thanks again for coming on today and sharing this.

Tamara Lotan: Well, thanks so much for having me.