An Analysis of Large Language Models in the HealthCare Domain

Name: Omobolanle Oladeji
Major: Computer Science
Advisor: Kowshik Bhowmik

Most Timely Research Award

The purpose of this study is to analyze the use of Large Language Models (LLMs) for question-answering in a medical context. This is vital because healthcare workers are routinely overworked owing to inadequate resources and an overwhelmingly large number of patients. We explore the possibility of conversational agents, or chatbots, as a tool to answer common medical questions, especially in the context of developing countries. The methodology of this research is informed by an analysis of Natural Language Processing, particularly Neural Networks and Transformers. We design an artificially intelligent conversational agent using Google’s BERT, Microsoft’s DialoGPT, and Google’s T5 language models. We evaluate these models on BLEU score and perplexity, and supplement those metrics with a survey to establish user preference. We also develop a web-based application for users to test the models in a real-world setting. We then attempt to improve the user-preferred model by integrating it with a heuristic-based model and connecting its context to a medical corpus. Next, we discuss the results of our analysis, especially concerning the models' potential use in developing countries. Though our results indicate great potential, we find that our models contain bias and are capable of misinforming users. Finally, the thesis concludes with a discussion of the limits of LLMs and recommendations for making these models inclusive.
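For readers unfamiliar with the two automatic metrics named above, the following is a minimal sketch of how they are commonly defined. It is not the evaluation code used in the study, which is not shown here. The function names (`ngrams`, `bleu`, `perplexity`) are hypothetical, the BLEU shown is an unsmoothed sentence-level variant against a single reference, and perplexity is assumed to be computed from per-token natural-log probabilities supplied by a model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (illustrative only)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def perplexity(token_log_probs):
    """Exponential of the average negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Higher BLEU means a generated answer shares more n-grams with the reference answer, while lower perplexity means the model assigns higher probability to the reference text. In practice, evaluations usually rely on an established library and apply smoothing or corpus-level aggregation rather than a hand-written function like this one.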

Posted in Comments Enabled, Independent Study, Symposium 2023 on April 14, 2023.

4 responses to “An Analysis of Large Language Models in the HealthCare Domain”

  1. Saralee says:

    This is super cool research. Congratulations!!!

  2. Anjolaoluwa Olubusi says:

    Great IS. Got a few questions.
    1.) Why did you pick DialoGPT, BERT and T5 to train?
    2.) How difficult was it to train the LLMs?
    3.) How complex of a question could the LLMs answer? Were there questions the models could not answer?

  3. Bolanle says:

    Thank you so much Saralee!

  4. Bolanle says:

    Anjola: Thank you!

    1. I picked DialoGPT, BERT, and T5 to train because they are open-source models. Together they also cover the three major architecture types that most Transformer-based models fall into: encoder-only (BERT), decoder-only (DialoGPT), and encoder-decoder (T5).

    2. It was not difficult in terms of putting the code together. However, training these models was very computationally heavy and expensive. It also took a lot of time (more than 20 hours on average).

    3. They could answer some complex questions in the medical realm; sometimes I was surprised by the answers. However, because the models were not trained on data covering the most recent diseases, they could not answer questions about COVID-19, for example.

    If you have any more questions, please do not hesitate to contact me by email or Instagram. Thank you for the thoughtful questions!