Comparison of ChatGPT and Traditional Patient Education Materials for Men’s Health - Yash Shah & Mihir Shah

March 7, 2024

Ruchika Talwar hosts Yash Shah, a medical student, and Mihir Shah, a urologist, both at Thomas Jefferson University in Philadelphia. They delve into their research on the utility of artificial intelligence, specifically ChatGPT, in creating patient education materials for men's sexual health. Highlighting the challenge that most online health materials do not meet the recommended sixth-grade reading level, Shah and Shah explored whether AI could simplify complex medical information. Their study compared materials crafted by ChatGPT with those from the Urology Care Foundation across six men's health topics, adjusting ChatGPT responses to a sixth-grade comprehension level. Interestingly, while the initial ChatGPT materials were more complex and verbose, upon adjustment they achieved better readability. Both sources provided high-quality, accurate information, underscoring the potential of AI tools like ChatGPT to enhance patient education by making information more accessible and understandable.

Biographies:

Yash Shah, MD Candidate, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA

Mihir S. Shah, MD, Clinical Assistant Professor, Department of Urology, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA

Ruchika Talwar, MD, Urologic Oncology Fellow, Department of Urology, Vanderbilt University Medical Center, Nashville, TN


Read the Full Video Transcript

Ruchika Talwar: Hi everyone. Welcome back to UroToday's Health Policy Center of Excellence. I'm excited to be joined by Yash Shah, who's a medical student at Jefferson, and Mihir Shah, who's a urologist at Jefferson, both in Philadelphia. They'll be presenting some recent work on patient education materials and the use of AI, specifically ChatGPT. Thank you both so much for being here with us today.

Mihir Shah: Thanks for having us.

Yash Shah: All right. Hi, Dr. Talwar. Thank you for having us. So our project was titled Utility of Artificial Intelligence in Patient Educational Materials for Men's Sexual Health. Just to give you some background, over the last several years there's been more and more use of social media, and eventually AI, by patients, given the improving technology and all the different platforms that have been coming out. Within this space in particular, given the stigma that surrounds sexual health, it's a sensitive topic that patients might feel uncomfortable discussing with their provider or with other people in the community. And so people often go online first to get initial answers. Unfortunately, many studies over many years have shown that online educational information for patients almost always exceeds the sixth-grade reading level, which is what's recommended by both the NIH and the AMA.

And that's obviously a problem, because if there are materials out there, whether produced by physicians or posted on social media, that patients are not able to understand, those materials don't do them much good. So we were curious to see if AI could address that challenge. There have been several studies in the last few years, ever since AI chatbots were introduced, that used these platforms for physician tasks. I've listed a few of those here: there have been studies showing their use in routine paperwork and charting, and more recently even in clinical decision-making and supporting decisions based on guidelines. In fields like radiology and pathology, these tools aren't necessarily being used in the clinic yet, but studies have shown that they have potential.

So our study, again, was to see if patients can use these tools for their benefit as well. We sought to analyze resources created by ChatGPT and compare them with resources that are already out there, created by the Urology Care Foundation, which is supported by the AUA, for six men's health topics. We first analyzed those, and then separately we adjusted the ChatGPT responses by adding the command "explain it to me like I'm in sixth grade," based on that official guideline and target. And we used a few different analyses to see how the responses did. I've listed six formulas that are validated to measure readability, which is basically how easy it is for someone to understand something that's written. We also had two independent urologists rate quality, which we defined as accuracy and comprehensiveness, using a validated Likert scale. So I'll pass it along to Dr. Shah, who can talk about what we found.
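For readers who want to try this kind of readability analysis themselves, here is a minimal sketch in Python, assuming the open-source textstat package; the six indices shown are common validated readability formulas, not necessarily the study's exact six, and the sample text is invented for illustration.

```python
# Minimal sketch of a readability analysis, assuming the open-source
# textstat package (pip install textstat). The six indices below are
# common validated readability formulas; the study's exact six are not
# named in the transcript.
import textstat

# Hypothetical snippet of patient education material (illustration only).
sample = (
    "Erectile dysfunction means trouble getting or keeping an erection. "
    "It is common, and it can often be treated with pills, devices, or surgery."
)

# Each index except Flesch Reading Ease estimates a U.S. school grade level;
# Flesch Reading Ease is a 0-100 score where higher means easier to read.
scores = {
    "Flesch Reading Ease": textstat.flesch_reading_ease(sample),
    "Flesch-Kincaid Grade": textstat.flesch_kincaid_grade(sample),
    "Gunning Fog Index": textstat.gunning_fog(sample),
    "SMOG Index": textstat.smog_index(sample),
    "Coleman-Liau Index": textstat.coleman_liau_index(sample),
    "Automated Readability Index": textstat.automated_readability_index(sample),
}

for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

Running the same indices over materials from two sources, topic by topic, is what enables the kind of box-plot comparison described next.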

Mihir Shah: So the results were interesting, but consistent with what we know about available information online: most of what's out there is not at a sixth-grade reading level. When we compared the Urology Care Foundation material, it was actually more readable for patients than the ChatGPT responses, across all topics and across all of the formulas.

And here you can see the box plot showing that for all of the readability tests. Additionally, the ChatGPT responses were actually longer; they were more verbose and used a lot more complex words. However, if we prompted ChatGPT to adjust the readability to a sixth-grade level, it actually did pretty well and was able to bring the complexity of the language down to where it was more readable for the patient. And ultimately, in terms of quality and accuracy, when the responses were independently reviewed by the two urologists, the Urology Care Foundation materials and ChatGPT performed equally well.

And so both materials are fairly high quality and accurate. However, in terms of picking up and reading the material, the Urology Care Foundation did better than ChatGPT. You can, though, have ChatGPT adjust its reading grade, after which it actually performs better and becomes more readable. So, in conclusion, what we found is that while ChatGPT materials are less accessible upfront, both sources actually offer good-quality information. And given the current landscape of AI, I think it has a lot of potential: if we can make material that is more readable, especially with ChatGPT or other chatbots, it could open the door for patients to have excellent access to high-quality, fairly readable information at their fingertips. In terms of the target, material really should be written between the sixth- and eighth-grade levels so that the average American can pick it up and read it easily.

Ruchika Talwar: Thank you both. This is a really interesting study, and I think it highlights several notable points that are relevant not just for the materials you analyzed and assessed but for medical materials in general. First of all, it's a great reminder that when clinicians opt to use tools like ChatGPT or other forms of artificial intelligence in preparing their own patient materials, we have to ensure those materials are prompted to be at a sixth-grade reading level, because I think that's something we are potentially not as meticulous about. But this really highlights the fact that just because we understand something certainly doesn't mean our patients will.

And although AI is a time-saver that should be utilized more, prompting for readability is something we need to do intentionally. Another point worth noting is that for urologic problems, like you talked about, a lot of times patients do turn to the internet before turning to their physicians, or there may be certain questions that they don't feel comfortable asking in an office setting. So we can even present this as an option that's accurate; we now have the data to tell us that the accuracy met the standards of the Urology Care Foundation's official materials. We can even tell our patients that if things come up in counseling that we haven't touched on today, it is a reliable source. So I think those are two important points. Mihir, I'm curious, is this something that you have implemented in your practice thus far?

Mihir Shah: So I have to be honest, I haven't, and part of that was because we were still trying to see what the data would show us before we jumped right in. Now, obviously, I've played around with it, I have the app on my phone, and a lot of you might've played with it too and seen that the responses are quick but often pretty lengthy. So I think it would be really helpful to have some guidelines or statements from our associations, like the AUA or other societies, to give us guidance on how to incorporate this into clinical practice. But I do tell patients, if they come in asking me, "Hey, I looked this up and ChatGPT told me this," then I often say, actually, the quality of the material is pretty good and you can use it as a resource, but if it's too difficult to understand, here's a way to improve its accessibility for yourself, or obviously you can reach out to us and ask us the questions as well.

Ruchika Talwar: Yeah. Another important point to note is that the Urology Care Foundation materials themselves did not meet the recommended reading level. So I'm curious, Yash, do you foresee ChatGPT perhaps being a way for patients to copy and paste things they don't understand and prompt the chatbot with, "Please explain this to me in simpler terms"? Tell me a little bit about how you foresee that being implemented into patient care.

Yash Shah: No, you're absolutely right. That was one thing we talked about in our discussion. The Urology Care Foundation is not unique: almost every study that has looked at patient- or physician-written material, for any specialty, has found that it usually exceeds the recommended level. And I think that's because you want trustworthy medical experts to write this information, and it's really hard for those experts to gauge what the average patient may or may not understand. So, like you suggested, I think the first and probably easiest thing patients could do, even now, is exactly what you said: copy-pasting the content and asking the chatbot, "Please explain this to me at X grade level."

I think one thing we could do there is run further studies to define the best prompt. In our study we had just that one prompt, and it yielded pretty good results, but it was simply a prompt we came up with, and I'm sure we could improve on it. So I think that's the first step. But looking toward the future, I think physicians could start using this as well: if you're writing a new resource, you could use it to reword the text before you put it online, or ultimately, if we really develop it, you could even use it to create patient-specific resources. Articles online might be really helpful, but each person's disease, each person's problem, is a little bit different. If we're able to create really personalized educational materials very quickly, I think that would be the ultimate goal.
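As a rough illustration of the copy-and-simplify workflow described here, the sketch below uses OpenAI's Python client; the model name, prompt wording, and pasted text are illustrative assumptions, not the study's exact setup.

```python
# Hypothetical sketch of the copy-and-simplify workflow, assuming the
# OpenAI Python client (pip install openai) with OPENAI_API_KEY set in
# the environment. Model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simplify(text: str, grade: int = 6) -> str:
    """Ask the chatbot to rewrite pasted patient material at a target grade level."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model; substitute whichever is available
        messages=[{
            "role": "user",
            # State the grade target explicitly, in the spirit of the study's
            # "explain it to me like I'm in sixth grade" prompt.
            "content": f"Explain this to me like I'm in grade {grade}:\n\n{text}",
        }],
    )
    return response.choices[0].message.content

# Example: a patient pastes a paragraph they found online (invented text).
pasted = (
    "Pharmacologic management of erectile dysfunction centers on "
    "phosphodiesterase type 5 inhibitors, with intracavernosal injection "
    "therapy reserved for refractory cases."
)
print(simplify(pasted))
```

In practice, the rewritten output could be fed back through readability checks like the ones sketched earlier to confirm it actually landed at the target grade, echoing the study's approach of adjusting the prompt and re-measuring readability.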

Ruchika Talwar: Yeah, I think the sky's the limit there, and over the next few years we really are going to see this and other aspects of AI integrated into the way we care for patients. As we wrap up, Mihir, I'll ask you: what do you think the take-home messages are for the urologic community from your data? How should we use these findings to think about the integration of ChatGPT into patient education materials?

Mihir Shah: Yeah, so I think the first takeaway is that even the material we, as urologists, put forward through the Urology Care Foundation exceeds the recommended reading level. So we need to be very cognizant of trying to meet patients where they are, and for a layperson, we really need to be able to explain the details at a lower grade level. That's where AI can really help us. But the challenge is that it's a chatbot, so it's open-ended: which questions are appropriate to ask? And these models can hallucinate; that's well documented. So I think having some guidelines, or a baseline set of questions to be asked sequentially that we have analyzed, might be the best next step as we look to generate our own material or patient-facing material where this could be helpful. If we standardize that, at least we know patients are not likely to end up with hallucinated output or in a place where we can't really trust the data they're getting.

The other thing is that the model's training data is currently limited, to 2021 at the latest, and it's not really geared toward medical use. So if we can leverage it in the future and focus it on medical or urologic use, it may improve further and expand access for our patients. And ultimately, like you alluded to, the sky's the limit, but I think starting to think about ways to standardize it and make it more patient-friendly would be the next best step.

Ruchika Talwar: Yeah, and I think as time goes on there will be lots of ethical considerations to look at here, because with so much possibility, we're really going to find ourselves in places where, like you said, the chatbot may not be giving accurate information, or it may be superimposing or assuming certain things about a specific patient's situation. So I totally agree. All really important points, and thank you both for being here with us today and sharing your expertise on this topic. I know it's really relevant to our viewership, and we are excited to see what you have in store for future work in this space. And to our audience, as always, we hope you'll continue to join us as we spotlight important health policy articles on UroToday. See you next time.