University automates healthcare document processing with an AI tool

Document Processing with AI Tool

In collaboration with the data management department of Semmelweis University, we built a prototype tool that automates the structuring of healthcare data and supports high-level decision-making. Most healthcare organizations still work with unstructured formats such as medical histories and discharge summaries, which creates a critical bottleneck for effective analysis. Our tool uses open-source LLMs to provide a scalable, easy-to-validate way of structuring this data.

95% standardization accuracy

Challenge

Semmelweis University’s clinical database stores medical data transformed to the OMOP standard. However, much of this data sits in free-text documents (e.g., discharge summaries, medical histories, physical status notes), which makes analysis ineffective.

Manual standardization is labor-intensive. Large Language Models have been applied to data extraction before, but there is no out-of-the-box solution for extracting specific medical data.

With a limited scope of only a few parameters, our pilot focused on validating the methodology.

Solution

To prepare the source data for use with LLMs, we wrote dedicated prompts for efficient parameter extraction and structuring. Working with medical subject matter experts, we created a golden set of about 730 documents for validation. In each document, the targeted parameters were hand-labelled.
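As an illustration of prompt-based parameter extraction, the following is a minimal sketch, not the prompts used in the pilot. The parameter names (`smoking_status`, `body_weight_kg`), the prompt wording, and the JSON reply format are all assumptions made for the example; the call to the model itself is left out.

```python
import json

# Hypothetical parameters; the pilot's actual parameter list is not published.
PARAMETERS = ["smoking_status", "body_weight_kg"]


def build_prompt(document_text: str, parameters=PARAMETERS) -> str:
    """Build a prompt asking for one JSON key per targeted parameter."""
    keys = ", ".join(f'"{p}"' for p in parameters)
    return (
        "Extract the following parameters from the clinical document below. "
        f"Reply with a single JSON object with exactly these keys: {keys}. "
        "Use null for any parameter the document does not mention.\n\n"
        f"Document:\n{document_text}"
    )


def parse_reply(reply: str, parameters=PARAMETERS) -> dict:
    """Parse a model reply, keeping only the expected keys.

    Falls back to all-None values when no valid JSON object is found.
    """
    empty = {p: None for p in parameters}
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {p: data.get(p) for p in parameters}
```

Restricting the reply to a fixed set of keys keeps the output easy to validate against hand labels, since every document yields the same fields.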

The pilot achieved 95% accuracy, and the LLMs even recognized parameter instances that manual labeling had missed. Once the extracted data was post-processed and mapped to the OMOP standard, it was ready for integration into the data warehouse.
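How the accuracy figure was calculated is not described here. One plausible approach, sketched below under that caveat, is to compare each extracted value with its hand label and report the share of exact matches. The function name, the data layout (document id mapped to a parameter dictionary), and the exact-match rule are assumptions for illustration.

```python
def score_against_golden(predicted: dict, golden: dict):
    """Compare extracted values with hand labels.

    Both arguments map a document id to {parameter: value}.
    Returns (accuracy, disagreements), where each disagreement is
    (doc_id, parameter, expected, got).
    """
    total = 0
    correct = 0
    disagreements = []
    for doc_id, labels in golden.items():
        extracted = predicted.get(doc_id, {})
        for param, expected in labels.items():
            total += 1
            got = extracted.get(param)
            if got == expected:
                correct += 1
            else:
                disagreements.append((doc_id, param, expected, got))
    accuracy = correct / total if total else 0.0
    return accuracy, disagreements
```

Returning the disagreements alongside the score matters in this setting: a case where the hand label is empty but the model returned a value may be a labeling miss rather than a model error, so those rows are candidates for expert review.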

Service

AI

Industries

Healthcare

Technologies

Azure

Databricks

MLflow

Together AI
