In collaboration with the data management department of Semmelweis University, we created a tool prototype to automate the structuring of healthcare data and support high-level decision-making. In most cases, healthcare organizations still operate with unstructured data formats such as medical histories and discharge summaries, which creates a critical bottleneck for effective analysis. Our tool uses open-source LLMs to provide a scalable, easy-to-validate solution to the challenge of unstructured data.
95%
standardization accuracy
Semmelweis University’s clinical database stores medical data transformed to the OMOP standard. However, much of the data is held in free-text documents (e.g., discharge summaries, medical histories, physical status notes), which makes it difficult to analyze.
Manual standardization is labor-intensive. Large Language Models (LLMs) have been applied to data extraction before; however, there is no out-of-the-box solution for extracting specific medical data.
With a limited scope of only a few parameters, our pilot focused on validating the methodology.
To prepare the source data for use with LLMs, we created specific prompts for efficient parameter extraction and structuring. Working with medical subject matter experts, we built a golden set of about 730 documents for validation, with the targeted parameters hand-labeled in each document.
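The extraction and validation steps can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the pilot's actual code: the parameter names in `PARAMETERS`, the prompt wording, the JSON reply format, and the `llm` callable (a stand-in for a call to whichever open-source model is hosted) are all hypothetical. The scoring rule in `field_accuracy`, exact match per document and parameter, is also an assumption, since the case study does not say how accuracy was computed.

```python
import json
import re
from typing import Callable, Dict, List, Optional

# HYPOTHETICAL target parameters; the pilot's real parameter list is not published.
PARAMETERS = ["smoking_status", "bmi", "ejection_fraction"]

# HYPOTHETICAL prompt wording: ask for a JSON object with a fixed set of keys.
PROMPT_TEMPLATE = """You are extracting structured data from a clinical document.
Return only a JSON object with exactly these keys: {keys}.
Use null for any value that is not stated in the text.

Document:
{document}
"""


def build_prompt(document: str) -> str:
    """Fill the template with the target keys and the free-text document."""
    return PROMPT_TEMPLATE.format(keys=", ".join(PARAMETERS), document=document)


def parse_response(raw: str) -> Dict[str, Optional[object]]:
    """Take the text from the first '{' to the last '}' and keep only known keys."""
    empty = {key: None for key in PARAMETERS}
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    return {key: data.get(key) for key in PARAMETERS}


def run_extraction(documents: List[str], llm: Callable[[str], str]) -> List[dict]:
    """Send one prompt per document and parse each reply into a flat record."""
    return [parse_response(llm(build_prompt(doc))) for doc in documents]


def field_accuracy(predictions: List[dict], gold: List[dict]) -> float:
    """Share of (document, parameter) pairs where the prediction equals the hand label."""
    total = correct = 0
    for pred, label in zip(predictions, gold):
        for key in PARAMETERS:
            total += 1
            if pred.get(key) == label.get(key):
                correct += 1
    return correct / total if total else 0.0


# Usage with a stub in place of a real model call.
def fake_llm(prompt: str) -> str:
    return 'Sure: {"smoking_status": "former", "bmi": 27.4, "ejection_fraction": null}'


preds = run_extraction(["Former smoker, BMI 27.4."], fake_llm)
gold = [{"smoking_status": "former", "bmi": 27.4, "ejection_fraction": None}]
print(field_accuracy(preds, gold))  # 1.0
```

Keeping the model call behind a plain callable makes it easy to swap models or replay stored replies when re-scoring against the golden set.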
The pilot achieved 95% accuracy, and the LLMs even recognized parameter instances that manual labeling had missed. After post-processing and mapping to the OMOP standard, the extracted data was ready for integration into the data warehouse.
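Post-processing and OMOP mapping might look roughly like the sketch below, assuming numeric parameters are loaded into the OMOP CDM `MEASUREMENT` table. The column names come from that table, but only a subset is filled (required columns such as `measurement_id` and `measurement_type_concept_id` are omitted). The concept IDs in `CONCEPT_MAP` are hypothetical placeholders to be replaced with standard concepts from the OMOP vocabulary, and the decimal-comma handling in `to_number` is an assumption about the source text. Categorical values such as the hypothetical `smoking_status` would need separate handling.

```python
import re
from datetime import date
from typing import Dict, List, Optional

# HYPOTHETICAL placeholder concept IDs: replace them with standard concepts
# looked up in the OMOP vocabulary tables before loading real data.
CONCEPT_MAP = {
    "bmi": {"measurement_concept_id": -1, "unit_concept_id": -2, "unit": "kg/m2"},
    "ejection_fraction": {"measurement_concept_id": -3, "unit_concept_id": -4, "unit": "%"},
}


def to_number(value: object) -> Optional[float]:
    """Turn a raw extracted value into a float; accepts '27,4 kg/m2' or 55."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # First number in the string; a decimal comma is converted to a point.
    match = re.search(r"-?\d+(?:[.,]\d+)?", str(value))
    return float(match.group(0).replace(",", ".")) if match else None


def to_measurement_rows(person_id: int, measured_on: date, extracted: Dict[str, object]) -> List[dict]:
    """Build MEASUREMENT-style rows for the parameters that have a concept mapping."""
    rows = []
    for param, value in extracted.items():
        mapping = CONCEPT_MAP.get(param)
        number = to_number(value)
        if mapping is None or number is None:
            continue  # unmapped parameter or no usable value: leave it out
        rows.append({
            "person_id": person_id,
            "measurement_concept_id": mapping["measurement_concept_id"],
            "measurement_date": measured_on.isoformat(),
            "value_as_number": number,
            "unit_concept_id": mapping["unit_concept_id"],
            "unit_source_value": mapping["unit"],
            "measurement_source_value": param,
            "value_source_value": str(value),
        })
    return rows


record = {"smoking_status": "former", "bmi": "27,4 kg/m2", "ejection_fraction": None}
rows = to_measurement_rows(42, date(2023, 5, 17), record)
print(len(rows), rows[0]["value_as_number"])  # 1 27.4
```

Skipping unmapped or unparseable values, rather than guessing, keeps the warehouse load conservative; such cases can be routed back for manual review.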
AI
Healthcare
Azure
Databricks
MLflow
Together AI