
In collaboration with the data management department of Semmelweis University, we built a prototype tool that automates the structuring of healthcare data to support high-level decision-making. Most healthcare organizations still work largely with unstructured documents such as medical histories and discharge summaries, which creates a critical bottleneck for analysis. Our tool uses open-source LLMs to provide a scalable, easy-to-validate way to structure this data.
95%
standardization accuracy
Semmelweis University’s clinical database stores medical data transformed to the OMOP standard. However, much of the source data sits in free-text documents (e.g., discharge summaries, medical histories, physical status), which makes it hard to analyze.
Manual standardization is labor-intensive. Large Language Models have been applied to data extraction before, but there is no out-of-the-box solution for extracting specific medical data.
Our pilot was deliberately limited to a few parameters and focused on validating the methodology.
To prepare the source data for use with LLMs, we wrote dedicated prompts for efficient parameter extraction and structuring. Working with medical subject-matter experts, we built a golden set of about 730 documents for validation, in which the targeted parameters were hand-labelled.
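As a minimal sketch of how prompt-based extraction can be scored against a hand-labelled golden set, the Python below builds an extraction prompt, parses a JSON reply, and computes accuracy. The parameter names, the JSON reply format, and the per-parameter exact-match definition of accuracy are all assumptions made for illustration; the source does not describe the pilot's actual prompts or metric.

```python
import json

# Hypothetical parameter names; the pilot's actual parameters are not listed.
PARAMETERS = ["ejection_fraction", "smoking_status"]


def build_prompt(document: str, parameters: list[str]) -> str:
    """Assemble an extraction prompt asking for a JSON object keyed by parameter."""
    keys = ", ".join(f'"{p}"' for p in parameters)
    return (
        "Extract the following parameters from the clinical document below.\n"
        f"Return a single JSON object with exactly these keys: {keys}.\n"
        "Use null for any parameter that is not mentioned.\n\n"
        f"Document:\n{document}"
    )


def parse_response(raw: str, parameters: list[str]) -> dict:
    """Parse the model's reply, keeping only the expected keys.

    Malformed or non-object replies yield None for every parameter.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {p: data.get(p) for p in parameters}


def accuracy(predictions: list[dict], golden: list[dict], parameters: list[str]) -> float:
    """Fraction of (document, parameter) pairs where the prediction equals the hand label."""
    total = len(golden) * len(parameters)
    if total == 0:
        return 0.0
    correct = sum(
        pred.get(p) == gold.get(p)
        for pred, gold in zip(predictions, golden)
        for p in parameters
    )
    return correct / total
```

Exact-match scoring treats every disagreement as a model error. Since hand labels can themselves miss instances, it may be worth reviewing disagreements manually rather than counting them automatically.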
The pilot achieved 95% accuracy, and the LLMs even recognized parameter instances that manual labeling had missed. After post-processing and mapping to the OMOP standard, the extracted data was ready for integration into the data warehouse.
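The post-processing and OMOP mapping step might look roughly like the Python sketch below. It assumes that the extracted parameter is a numeric measurement destined for the OMOP MEASUREMENT table and that values may use a decimal comma. The concept IDs in the lookup tables are placeholders, not real vocabulary entries, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Placeholder lookups: real concept IDs would come from the OMOP vocabulary tables.
# The parameter name and the IDs below are hypothetical, for illustration only.
CONCEPT_LOOKUP = {"ejection_fraction": 1001}
UNIT_LOOKUP = {"%": 2001}


@dataclass
class MeasurementRow:
    """A subset of the columns of the OMOP MEASUREMENT table."""
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    value_as_number: Optional[float]
    unit_concept_id: int
    measurement_source_value: str
    unit_source_value: Optional[str]
    value_source_value: str


def normalize_number(raw: str) -> Optional[float]:
    """Turn strings like '55 %' or '55,5' into a float; return None if not numeric."""
    cleaned = raw.strip().rstrip("%").strip().replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_measurement(person_id: int, parameter: str, raw_value: str,
                   unit: Optional[str], when: date) -> MeasurementRow:
    """Map one extracted value to a MEASUREMENT-style row.

    Unknown parameters or units fall back to concept ID 0,
    which OMOP uses for "no matching concept".
    """
    return MeasurementRow(
        person_id=person_id,
        measurement_concept_id=CONCEPT_LOOKUP.get(parameter, 0),
        measurement_date=when,
        value_as_number=normalize_number(raw_value),
        unit_concept_id=UNIT_LOOKUP.get(unit, 0) if unit else 0,
        measurement_source_value=parameter,
        unit_source_value=unit,
        value_source_value=raw_value,
    )
```

Keeping the raw extracted string in `value_source_value` alongside the normalized number leaves a trail back to the original text, which helps when validating the output.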
AI
Healthcare
Azure
Databricks
MLflow
Together AI