Episode 1: LLMs for Automated Medical Coding
Replacing Medical Coders in under 10 lines of Code
Hey Hackers!
Ok, I’ll admit it, LLMs aren’t ready to replace medical coders just yet, but I am stoked about what I could accomplish in under 30 minutes with a few lines of code.
In this first ever installment of Health Tech Hacks, we’re going to talk about medical coding and how LLMs can be used to automatically pull codes from doctor’s notes.
What is coding? Other than being a pain in the butt, medical coding allows us to standardize how we talk about diagnoses and procedures done in the healthcare system. In some cases, it helps automate claims approval or denial (even to a fault).
With OpenAI out of the box, a lot of medical coding can be automated. Although I’m not a medical coder myself, I’m hoping to provide a good starting point for interested health tech hackers to build some really cool products.
Access all code here: https://github.com/alearjun/healthtechhacks/
For those of you new to coding, I’ve included a brief overview of medical coding below, but feel free to click here and skip (and here’s a great article on medical billing from OOP Health).
A Quick Overview of Medical Coding
Medical coding is the process of translating medical diagnoses, procedures, and services into standardized codes used by healthcare providers, insurance companies, and government agencies. These codes serve as a standardized language that facilitates communication, billing, and reimbursement for medical services. The two most widely used coding systems in the United States are the International Classification of Diseases, Tenth Revision (ICD-10) and the Current Procedural Terminology (CPT) codes.
ICD-10 Codes
The International Classification of Diseases, Tenth Revision (ICD-10) is a globally recognized system used for classifying and coding medical diagnoses. Developed by the World Health Organization (WHO), ICD-10 codes are alphanumeric and describe a patient's illness or injury, along with any underlying conditions or symptoms. These codes are used by healthcare providers to document diagnoses in medical records and by insurance companies to determine coverage and reimbursement levels.
ICD-10 codes consist of a combination of letters and numbers, with the first character being an alphabetical letter followed by a series of digits. The codes can be quite specific, providing detailed information on the patient's condition and its severity, location, and other factors.
CPT Codes
Current Procedural Terminology (CPT) codes are a standardized system of medical codes developed by the American Medical Association (AMA) to describe medical procedures, services, and treatments provided by healthcare professionals. CPT codes are used by healthcare providers for billing and reimbursement purposes, as well as for statistical analysis and quality improvement initiatives.
CPT codes consist of five-digit numbers and are organized into three categories:
Category I: These codes describe procedures and services performed by healthcare providers, such as surgeries, diagnostic tests, and medical consultations.
Category II: These codes are used for tracking performance measures and quality improvement initiatives. They are not used for billing purposes.
Category III: These codes represent emerging technologies, services, and procedures that may not yet have sufficient data to support widespread adoption. They are used for tracking and reporting purposes.
Usage in Claims
Both ICD-10 and CPT codes are essential in the medical billing process. Healthcare providers use these codes to create claims that detail the services provided to a patient during a visit or procedure. The claims are then submitted to insurance companies, which use the codes to determine the reimbursement amounts for the services provided.
By using standardized codes, the medical billing process becomes more efficient and accurate, reducing the likelihood of errors and improving communication between healthcare providers and insurance companies. Additionally, these codes help to streamline data collection for statistical analysis and quality improvement initiatives.
Medical Coding is a Bore
Medical coding is a time-consuming and intricate process that requires a thorough understanding of medical terminology, anatomy, physiology, and the various coding systems (such as ICD-10 and CPT). The complexity of medical coding can be attributed to several factors:
Detailed documentation: Medical coders must carefully review and interpret complex medical records, including physician notes, laboratory results, and diagnostic findings, to accurately identify the diagnoses and procedures performed. This requires meticulous attention to detail and a deep understanding of medical concepts.
Specificity of codes: Both ICD-10 and CPT codes can be highly specific, with thousands of codes available to describe various conditions, procedures, and services. Medical coders must select the most accurate codes to ensure proper reimbursement and avoid claim denials or delays.
Constant updates: Medical coding systems are continually updated to accommodate new diagnoses, procedures, and advances in medical technology. Coders must stay current with these changes to ensure accurate coding and compliance with industry standards.
Compliance and regulatory requirements: Medical coders must adhere to strict guidelines and regulations, including those set forth by the Health Insurance Portability and Accountability Act (HIPAA) and other governing bodies. Failure to comply with these regulations can result in significant financial penalties and legal ramifications.
Error prevention and quality assurance: Inaccurate coding can lead to claim denials, delayed payments, and potential audits. Medical coders must double-check their work and maintain a high level of accuracy to prevent errors and ensure timely reimbursement for healthcare providers.
Due to these complexities, medical coding requires extensive training and experience, and it can be quite time-consuming. In many healthcare settings, medical coders work full-time, spending hours each day reviewing medical records and assigning the appropriate codes. However, the time and effort involved in medical coding also highlight the need for more efficient and automated solutions to help streamline the process and reduce the burden on healthcare professionals.
Automating Medical Coding with OpenAI
You can’t spend a day in the tech world without hearing about OpenAI and GPT4. It’s already revolutionized a wide range of fields and has fueled the next big thing in startups (just look at YC’s most recent batch).
In healthcare, there are still some questions about how and when to use LLMs, as well as the HIPAA compliance involved when using them. But given their immense power, they’ll soon be ubiquitously used across the healthcare industry.
As described above, a piece of the claims submission process is medical coding, but the process can be time consuming and expensive for healthcare systems. Let’s take a stab at automating insurance coding for physician dictations.
Overview: Converting your Dr’s voice to code
In this issue of Health Hack, our focus will be on taking a recording of a physician’s dictation of a procedure and determining which CPT and ICD-10 codes apply. We’ll start with transcribing the audio file, which many software already do for providers (e.g. Dragon from Nuance). We’ll then prompt OpenAI with the transcription to pull the ICD-10 and CPT codes related to the visit, and return those for the user to review.
Converting the audio file
As I mentioned above, lots of software can transcribe audio files. I’ve chosen to use OpenAI simply to show the range of capabilities that it offers with one platform, but you could use any transcription product you’d like, e.g. Amazon Comprehend, GCP speech-to-text, etc.
Let’s define a function that takes the audio file path as its input:
def mp3_to_text(file_path):
Then, we’ll read the file and use OpenAI’s Audio model to transcribe the audio file. Currently, the only model available via API is “whisper-1”:
with open(file_path, "rb") as f:
response = openai.Audio.transcribe("whisper-1", file=f)
Finally, we’ll return the transcript from the API response:
transcript = response['text']
return transcript
Now we have the transcript, we can use OpenAI completions to get the codes!
Using OpenAI’s Completions for Medical Coding
Out of the box, OpenAI is very capable at providing medical codes from provider notes (it was trained on billions of data points, so that seems to make sense).
We’ll start again by defining a function:
def get_codes(transcript):
Now comes a bit of a fun part, prompt engineering to get the best results. Since the highest model that completions works with is GPT-3.5 (text-davinci-003), I used ChatGPT to test some prompts and get responses. Starting broad, I wanted to see how good of a response it would give if I was a healthcare noob and just asked if for “Dr Codes”:
It wasn’t bad, but it didn’t let me know which codes were ICD or CPT codes, and it was definitely wrong, as there was no new patient visit (the last code provided). Let’s try again.
In my second try, I added the specificity, which got me better results I was satisfied with (for now at least):
Using this prompt, we can write the code for OpenAI’s completions API and obtain a response:
prompt = f"Given the following doctor's dictation transcript, identify the ICD-10 and CPT codes:\n\n{transcript}\n\nICD-10 and CPT Codes:"
response = openai.Completion.create(
engine="text-davinci-003",
prompt=prompt,
max_tokens=1000,
n=1,
stop=None,
temperature=0.5,
)
We’ll format the response a little:
codes_text = response.choices[0].text.strip()
And return the codes in an array:
codes_table = codes_text.split("\n")
return codes_table
Running the script
I won’t cover this in depth, but if you want to run the functions locally, you can write a main function to process a file and print the results:
def main():
audio_file_path = "neurosurgery_dictation.mp3"
transcript = mp3_to_text(audio_file_path)
codes_table = get_codes(transcript)
print("\nICD-10 and CPT Codes Table:")
print(codes_table)
if __name__ == "__main__":
main()
Testing the script
I had a hard time finding audio files of dictations, so I made my own by screen reading a transcription of a neurosurgery I found online. Running the script on the audio file originally gave me a limited result, but was resolved after I increased max tokens.
Running the script with some quick python formatting gave the following output:
ICD-10: M48.06 - Spondylosis with myelopathy, cervical region
CPT: 22845 - Arthrodesis, anterior interbody technique (eg, discectomy, corpectomy, laminectomy, fusion), including minimal discectomy to prepare interspace (other than for decompression); cervical below C2 (with or without interbody device)
Awesome! But the issue is that the codes aren’t actually correct. The right ICD-10 code should be M48.02 for cervical region, whereas M48.06 is for the lumbar region. My hypothesis for this is the fact that lumbar spinal stenosis is much more common than cervical, so when the model is determining the next word, it sees lumbar as having a higher likelihood than cervical. I’ll continue to play around with prompt engineering to see if we can get a more accurate response, but I’ll take the ability to recognize spinal stenosis (ICD M48) as a small win!
A few limitations to think about
I’m not a medical coder, but these two codes look pretty accurate. If you are a coder, let me know how well you think the script performed!
When using OpenAI or other LLMs in this way, there are a few considerations (among many others) I’d want to stress.
Accuracy
As there’s not an abundance of medical transcriptions available, it’s quite reasonable to think that training your own models could achieve higher accuracy than OpenAI out of the box. This is certainly demonstrated by GPT-3.5’s mix up with codes in the above example. I tried the same with GPT-4 using chat completions instead of completions, and it gave the same error in returning the .06 code.
In addition, LLMs are known for result variability, which means that each time you ask the same question, you may receive different results, which could result in errors in billing that affect some people, but not others. A human-in-the-loop approach is definitely best when it comes to people’s healthcare.
Training data
While OpenAI was trained on billions of data points, it was only trained up to June 2021 in this model’s case, so new code changes won’t be recognized and will likely be missed. For example, over 1,000 code changes were added to the ICD codes in 2022, and the AMA releases new CPT codes each year.
Medical Coders won’t be replaced by AI just yet
I hope this overview gave some exciting insight into how technology can be utilized to streamline healthcare, but it’s clear to me that the role of the medical coder won’t be subsumed by LLM models given their limitations on healthcare data. Better training data to maximize accuracy can be used to build improved models with the support of coders to maximize efficiency within the healthcare infrastructure.
But it’s still awesome to see what we could do in a few lines of code.
interesting and very informative. I am trying to see if we can customize this for each doctor/practice so its more accurate based on all past notes + billing codes as training data.
What do you think? Will that be better?