Exploring the Role of Artificial Intelligence in Radiology - Giorgio Brembilla

January 10, 2023

Giorgio Brembilla presents an extensive overview of AI's integration into radiology and its potential applications. He explains how AI and radiomics transform qualitative image evaluation into high-dimensional, mineable data, laying the foundation for AI-driven medical imaging. Detailing AI techniques such as machine learning and deep learning, he emphasizes their potential to enhance imaging procedures and reduce subjectivity, and he outlines ongoing studies evaluating tools like Quantib Prostate for cancer detection. Dr. Brembilla underscores challenges such as overfitting, limited generalizability, and bias, as well as the need for standardization and rigorous methodology. He concludes with a call for clinical knowledge to drive AI development, emphasizing the need to be aware of both the potential and the limitations of AI, and recognizing that, despite obstacles, AI's impact on radiology is inevitable and promising.

Biographies:

Giorgio Brembilla, MD, PhD, Department of Radiology, IRCCS San Raffaele Scientific Institute, Milan, Italy



Giorgio Brembilla: Thank you very much for the invitation. My name is Dr. Brembilla, and I'm a radiologist at San Raffaele Hospital. I will give a quick overview of the AI studies under way at our institution, and I will attempt the impossible: to give some very basic concepts of AI and the AI literature, which is, as you know, a very complex topic. My focus will of course be on radiology, but most of the concepts also apply to other fields, for example optics, robotics, or epidemiology.

Let's start. As you know, to date, the interpretation of images relies on a qualitative evaluation by radiologists, with obvious limitations in terms of subjectivity, interobserver variability, and dependence on the training and experience of the reader. But with advances in medical imaging equipment, the digitization of diagnostic images, and increased computational power, we are now able to convert medical images into mineable data, extracting a variety of quantitative features to produce high-dimensional data. This is, in fact, the foundation of radiomics and AI.

Let's start with some simple but very important definitions. Radiomics is the high-throughput analysis of images to extract quantitative features, which can be morphological, statistical, or textural, that are missed by the human eye. These features can serve as imaging biomarkers. We can use radiomic features to train AI models, and we can use AI to extract radiomic features, so the two are interconnected.
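As an illustration of what "quantitative features" means in practice, here is a minimal sketch that computes a few first-order (histogram-based) radiomic features from the voxels inside a region of interest. It assumes a 2D image and a binary mask given as nested Python lists; the function name `first_order_features` and the fixed bin count are hypothetical choices for this example. Real radiomics work would typically rely on standardized feature definitions, such as those of the Image Biomarker Standardisation Initiative, and a dedicated library such as PyRadiomics.

```python
import math
from collections import Counter


def first_order_features(image, mask, n_bins=8):
    """Hypothetical helper: first-order radiomic features inside a binary ROI.

    image and mask are 2D lists of the same shape; mask entries are 0 or 1.
    """
    # Keep only the intensities of voxels that lie inside the ROI
    values = [v for img_row, mask_row in zip(image, mask)
              for v, m in zip(img_row, mask_row) if m]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    # Skewness (asymmetry of the histogram); set to 0 for a perfectly flat ROI
    skewness = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std > 0 else 0.0

    # Discretize into n_bins fixed-width bins, then compute Shannon entropy (bits)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"mean": mean, "variance": variance,
            "skewness": skewness, "entropy": entropy}


# Hypothetical 2x3 image; the mask excludes the bright column on the right
image = [[1, 2, 100],
         [3, 4, 100]]
mask = [[1, 1, 0],
        [1, 1, 0]]
print(first_order_features(image, mask, n_bins=4))
# {'mean': 2.5, 'variance': 1.25, 'skewness': 0.0, 'entropy': 2.0}
```

Note how a choice such as the bin count directly changes a feature like entropy; this is one reason why standardized definitions matter when radiomic features are compared across studies.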

Artificial intelligence, per se, is a very broad term that refers to any computer method that performs tasks normally requiring human intelligence, even simple tasks like face recognition. Machine learning is one type of artificial intelligence that develops algorithms enabling computers to learn from data and make predictions on data they have never seen before; it's the type of artificial intelligence that we usually use in radiology. Deep learning, in turn, is a type of machine learning that uses a specific architecture: interconnected layers of software-based calculators, referred to as neurons, that form so-called neural networks. Its peculiarity is that deep learning can automatically extract relevant features from images on its own.

Finally, convolutional neural networks are a type of deep learning model that uses a mathematical operation, convolution, to handle the large amount of data that images contain. Here we have a quick example of pipelines. When we use machine learning, we first have to delineate, for example, the tumor within the prostate. We then extract relevant features that we know to be associated with cancer, and use them to train a model that will learn to detect tumors autonomously.
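To make "convolution" concrete: a convolutional layer slides a small kernel of weights across the image and computes a weighted sum at each position, so the same few weights are reused everywhere instead of having one weight per pixel. The sketch below is a minimal pure-Python illustration with a hand-picked edge-detecting kernel; in a real CNN the kernel values are learned from data, and the operation is, strictly speaking, cross-correlation (the kernel is not flipped), which is the convention most deep learning libraries follow. The function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding ("valid" mode) and return
    the map of weighted sums. No kernel flip, as in most CNN implementations."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel at (i, j)
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Non-linearity typically applied after each convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 3x4 image with a vertical edge between columns 1 and 2
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
edge_kernel = [[-1, 1]]  # responds to a left-to-right intensity increase
print(relu(conv2d_valid(image, edge_kernel)))
# [[0.0, 9.0, 0.0], [0.0, 9.0, 0.0], [0.0, 9.0, 0.0]]
```

The kernel here has only two weights whatever the image size; this weight sharing is what lets convolutional networks cope with the very large number of pixels in medical images.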

Deep learning is a little different: we have an input, for example, a magnetic resonance imaging study, and we define the desired output; the network then defines its own rules to recognize the pattern, that is, to recognize the pathology. This is potentially a far more efficient and powerful tool that can be even more accurate, but it comes at a cost: it requires much more computational power and much more data to be adequately trained. It is also a sort of black box. We know the input and we know the output, but we cannot know what happens in between.

We cannot know the rules of the machine, so it can sometimes be difficult to apply these tools in clinical practice, because we don't know what is going on inside. This is a typical pathway for developing AI tools. First, we need a training set, which should be as large as possible, to train the algorithm. The trained algorithm is then validated to see how it performs and which algorithm performs best, and the validation set is also used to optimize the algorithm's hyperparameters. After this final validation, we end up with a predictive model that must then be tested.

We need to be aware that the test set must be different. Testing is a different concept from validation, even though the two terms are sometimes mistakenly used interchangeably. Once we have the final model, we test it, and this should be external testing. Confusing the two can have a huge impact on the validity and interpretation of the results: if we use the test set, for example, to go back and adjust the algorithm, we will of course have overfitting and an overestimated performance. There are stringent rules for developing AI tools, and we should be very aware of them. There has been a dramatic increase in interest in AI research, in medicine and beyond, in recent years, but the literature is now becoming a bit of a jungle, and we need to bring more clarity to it.
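The distinction between validation and testing can be made concrete with a small sketch. Assuming a toy dataset of (feature, label) pairs and a simple k-nearest-neighbours classifier whose k is the hyperparameter, the data are split once into disjoint training, validation, and test sets; k is chosen using only the validation set, and the test set is used a single time, after every choice has been frozen. All names and the synthetic data are hypothetical, and a random split like this one only gives internal testing: the external testing recommended in the talk would use data from a different institution or scanner.

```python
import random


def split_train_val_test(samples, fractions=(0.6, 0.2), seed=0):
    """Shuffle once, then cut into three disjoint subsets.
    The test set receives whatever remains after training and validation."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def knn_predict(train, x, k):
    """Majority vote among the k training samples closest to x."""
    neighbours = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    positive_votes = sum(label for _, label in neighbours)
    return 1 if 2 * positive_votes > k else 0


def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def develop_model(samples, candidate_ks=(1, 3, 5, 7)):
    train, val, test = split_train_val_test(samples)
    # Hyperparameter tuning: compare candidates on the validation set only
    best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
    # Final, one-off evaluation on data never used for any decision
    test_accuracy = accuracy(train, test, best_k)
    return best_k, test_accuracy


# Hypothetical data: one feature per patient, label 1 above a cut-off
samples = [(x, 1 if x > 50 else 0) for x in range(100)]
best_k, test_accuracy = develop_model(samples)
print(best_k, test_accuracy)
```

Going back to change k after looking at `test_accuracy` would turn the test set into a second validation set, which is exactly the leakage that produces an overestimated performance.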

Where, in radiology, are the potential applications of AI? In the very early days, the focus was on replacing radiologists, on having machines detect tumors. But I'm still here, so the reality is probably a little different. In fact, we miss a great opportunity if this is our only focus. AI can be applied to virtually every step of the radiologic workflow: to patient selection; to acquisition, to improve image quality and reduce acquisition time, for example, in magnetic resonance imaging; of course, to image interpretation; to reporting, reducing the workload; and also to management, for example, biopsy planning and pretreatment decisions. Here is one example. This group used a deep learning algorithm that allowed biparametric MRI of the prostate to be acquired in roughly four minutes instead of 15, reducing time and costs and, of course, the need for contrast media.

This could potentially increase, dramatically, the availability of MRI and its accessibility for patients. You can see that, beyond detection, AI could have a very dramatic impact. In this other study, for example, the authors developed and validated an AI algorithm to decide whether dynamic contrast-enhanced sequences are needed in prostate MRI, and the model performed more or less like an experienced reader. If we had something like this, it could make on-table monitoring by the radiologist obsolete, and we could perform contrast-enhanced MRI only when needed. And this is just one example. When it comes to the detection and characterization of tumors, there is a plethora of studies in the literature that more or less all say the same thing: AI performs really well in these tasks.

I cannot do a comprehensive review of the literature; it's really vast. But know that every model proposed in AI, for any tool and any purpose, at the end of the day performs very well. We had probably better concentrate on the issues, because we are still not using these tools, and why is that? We know the potential of AI very well, but we should also be aware of its limitations. Many studies in the literature are of insufficient quality: they are retrospective, have selection bias, have problems with standardization of design or reporting, and show high heterogeneity. For example, in a survey of radiomics studies, the average quality score was only 23%, and only a minority of the studies included a stability assessment, an evaluation of clinical utility, or sufficient transparency.

Above all, there are not enough multicenter studies, and head-to-head comparisons with radiologists are lacking. We still face many challenges in AI, the main ones, as I said, being numerous sources of methodological heterogeneity, bias in model validation, as we already saw, and the lack of evidence of clinical translation. One very important concept is that, by their nature, machine learning systems consist of a set of rules trained to operate under certain conditions. They can perform very well in certain scenarios, and then, if we transfer them to other scenarios, they can perform very poorly. This is the problem of generalizability, which sometimes comes at the cost of sacrificing performance. It is probably still early: one possible way to overcome this issue will be a long-term increase in training data.
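The generalizability problem can be illustrated with a deliberately simple, hypothetical model: a single intensity threshold learned at one center. If a second center's scanner shifts all intensities by a constant offset (an assumption made purely for illustration), the rule that was perfect under its training conditions drops to chance level, even though the underlying pattern is unchanged. Function names and values are hypothetical.

```python
def accuracy_at(threshold, values, labels):
    """Fraction of cases where 'value >= threshold' matches the label."""
    return sum((v >= threshold) == bool(y) for v, y in zip(values, labels)) / len(labels)


def fit_threshold(values, labels):
    """Choose the cut-off (among observed values) that maximizes training accuracy."""
    return max(sorted(set(values)), key=lambda t: accuracy_at(t, values, labels))


# Hypothetical feature values; label 1 = lesion, 0 = benign
labels = [0, 0, 0, 0, 1, 1, 1, 1]
center_a = [10, 12, 14, 16, 30, 32, 34, 36]
# Same pattern at center B, but its scanner adds a constant intensity offset
center_b = [v + 20 for v in center_a]

threshold = fit_threshold(center_a, labels)
print(threshold)                                  # 30
print(accuracy_at(threshold, center_a, labels))   # 1.0
print(accuracy_at(threshold, center_b, labels))   # 0.5
```

Training on data that already contains this kind of variation, which is what a larger, multicenter training set provides, is one way to make a learned rule robust to it.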

We need a huge amount of data to train an algorithm to be generalizable. In terms of heterogeneity, this is, for example, the pipeline for a radiomics study; if each one of these steps can introduce variability, you can see that the overall variability can be huge, and we need standardization. Many quality evaluation tools have already been proposed, both for AI system development and for evaluating usefulness in clinical trials. It is important that we follow them, both when we design studies and when we interpret them. We also already have many commercially available, CE-marked, FDA-approved tools, but we need to be aware that FDA approval or CE marking does not mean effective and does not mean useful. Only a minority of these tools have peer-reviewed evidence, and most of those studies demonstrate only lower levels of efficacy.

It's not surprising, then, that to date (not forever, but to date) radiologists' satisfaction is pretty low, because they sometimes feel that there is no added value and that the tools do not perform as well as advertised. And sometimes the tools add too much workload. They are supposed to reduce workload, but if they are not well tuned, they will add to it. Finally, there is the issue of what we do with the AI's advice. There is a concrete risk of overreliance, that is, doing what the AI says without critically appraising its advice. We also need to understand in advance how we will use AI tools. To recap the introduction: AI is here to stay, no doubt, and it will have a huge impact, especially on the radiologic workflow.

It represents a unique opportunity to improve the efficiency of the clinical and radiological workflow, increasing performance and reducing workload, but we need to be rigorous in methodology when performing and interpreting AI studies, and we need to set realistic expectations for AI tools. And clinical knowledge must drive AI development to address clinically relevant needs. I will now also give a quick overview of ongoing AI studies at our institution. The first study is, of course, on multiparametric MRI for prostate cancer detection. This is, again, the same slide as before: we have commercially available tools, but we don't know how they perform, or whether they are clinically usable and clinically effective. We are evaluating one such tool, the Quantib Prostate software. It's a very promising tool designed to help the radiologist, not only by improving the interpretation of MRI, but also by allowing a more efficient workflow and reducing the radiologist's workload.

A few preliminary studies already indicate a possible improvement in diagnostic accuracy, especially for less experienced radiologists, and reduced reading and reporting times for both experienced and inexperienced radiologists. The aim of the study we are conducting to validate Quantib Prostate is to investigate its implementation in our workflow with regard to time efficiency, interobserver agreement, and diagnostic performance, compared with the traditional pathway. It will be a single-center, multi-reader, retrospective study of 200 men who underwent MRI, and eventually biopsy, at our institution. We expect to have at least three experienced radiologists, two inexperienced radiologists, and also urologists as readers. Basically, we will include all patients with high-quality MRI who had no prior prostate cancer diagnosis or treatment. To summarize the workflow: each of us will review the images both with and without the AI.

We will then compare performance. The primary aim will be to compare the diagnostic accuracy of radiologists or urologists, with or without the artificial intelligence tool, for the detection of clinically significant cancer. The secondary aims, which are still very important, are to evaluate changes in reporting time, interobserver agreement, and the benefit of the tool for experienced and inexperienced readers, as well as for urologists. We will first have to train with the software. We hope to start soon and to complete the study within one year. The other very interesting study will develop AI in another setting, multiparametric MRI of the bladder, to identify complete response in patients undergoing neoadjuvant immunotherapy. I do not need to explain how immunotherapy has changed the treatment of bladder cancer. We are also now realizing that multiparametric MRI could actually be a very effective tool for predicting complete response after neoadjuvant chemotherapy, and could potentially become one of the standards for treatment assessment.
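For the Quantib Prostate reader study, two of the quantities involved can be sketched briefly. Diagnostic accuracy for clinically significant cancer is often summarized as sensitivity and specificity against the reference standard, and agreement between two readers on a categorical call is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This is a minimal illustration on hypothetical binary calls (1 = suspicious, 0 = not suspicious); the study's statistical methods are not specified in the talk, and with more than two readers, multi-reader statistics (for example, Fleiss' kappa) would typically be used instead of pairwise kappa.

```python
from collections import Counter


def sensitivity_specificity(calls, truth):
    """calls, truth: lists of 0/1 (1 = reader calls cancer / cancer present)."""
    tp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 1)
    fn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 1)
    tn = sum(1 for c, t in zip(calls, truth) if c == 0 and t == 0)
    fp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(reader_a, reader_b):
    """Chance-corrected agreement between two readers' categorical calls."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected if both readers assigned categories independently
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        # Both readers used one identical category: kappa is undefined,
        # and returning 1.0 is just a convention of this sketch
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical calls on 8 patients, with biopsy as reference standard
truth = [1, 1, 1, 0, 0, 0, 0, 1]
reader_no_ai = [1, 0, 1, 0, 1, 0, 0, 0]
reader_ai = [1, 1, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(reader_no_ai, truth))  # (0.5, 0.75)
print(sensitivity_specificity(reader_ai, truth))     # (1.0, 0.75)
print(cohens_kappa(reader_no_ai, reader_ai))
```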

It is already the standard in other, similar settings, like rectal and esophageal cancer. We have hopes and expectations, and this has already been explored nicely by Professor Necchi and his team. AI in bladder cancer can be applied in many ways, but in terms of response evaluation, basically in two: at cystoscopy, to aid the identification of residual disease, and on imaging, using MRI. Current data on AI for bladder imaging, compared with prostate imaging, for example, are very limited.

To date, no study has investigated AI for MRI of bladder cancer in the post-treatment setting. Here is an example: this row shows the pre-treatment scan, with a very evident tumor, and after treatment, it's still there. There is no doubt. This is a very simple case, but I can assure you that, especially in the post-TURBT setting, things can be much more complicated. It would be very useful to have a tool that helps us identify residual disease and that also incorporates all the clinical and non-radiological data to predict complete response to immunotherapy.

We have an ongoing collaboration with Owkin Inc. and Columbia University on this. In our study, we aim to develop an algorithm to predict response to neoadjuvant pembrolizumab on bladder MRI in 160 men who underwent MRI before and after neoadjuvant treatment, for a total of 320 scans. These patients will be drawn from the PURE-01 study. This will be a single-center, non-randomized, retrospective, observational study, with the primary endpoint of correlating MRI with pathologic complete response. We will analyze the MRI scans to extract radiomic features associated with complete response, and use them to build a machine learning algorithm that will be trained, validated, and tested for this task.

The algorithm's performance will also be evaluated against, and together with, clinical predictors, which will also be used as covariates in multivariable logistic regression models to predict complete response. A satisfactory algorithm will be one that outperforms the known clinical predictors of response in terms of discrimination and of net benefit on decision curve analysis. This is also a very important concept: when we use AI, we can also exploit multimodality. We can include clinical information in AI tools that also use radiologic information, and this is known to perform better than either approach alone. Thank you very much for your attention.
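The two success criteria named for the bladder study, discrimination and net benefit on decision curve analysis, can be sketched as follows. Discrimination is commonly summarized by the area under the ROC curve (AUC), computed here as the probability that a randomly chosen responder receives a higher predicted probability than a randomly chosen non-responder. Net benefit at a threshold probability p_t is TP/n - (FP/n) x p_t/(1 - p_t), and a decision curve plots it across thresholds for the model, for a comparator such as the clinical predictors, and for the default "treat all" strategy. The predicted probabilities and outcomes below are hypothetical, and the function names are illustrative rather than the study's analysis code.

```python
def auc(probs, outcomes):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on cases whose predicted probability >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def decision_curve(probs, outcomes, thresholds):
    """(threshold, model net benefit, 'treat all' net benefit) per threshold."""
    treat_all = [1.0] * len(outcomes)
    return [(t, net_benefit(probs, outcomes, t), net_benefit(treat_all, outcomes, t))
            for t in thresholds]


# Hypothetical predicted probabilities of pathologic complete response (1 = pCR)
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
model_probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.4, 0.1]
clinical_probs = [0.6, 0.5, 0.4, 0.6, 0.5, 0.3, 0.3, 0.4]

print(auc(model_probs, outcomes), auc(clinical_probs, outcomes))  # 0.9375 0.5
for t, nb_model, nb_all in decision_curve(model_probs, outcomes, [0.2, 0.4, 0.6]):
    print(t, round(nb_model, 3), round(nb_all, 3))
```

A model is preferable on a decision curve where its net benefit exceeds both the comparator's and that of "treat all" over the range of thresholds considered clinically reasonable; in the study as described, the comparator would be the logistic regression model built on the clinical predictors.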