Flatiron Health leverages AI to extract cancer progression data from EHRs, enhancing oncology research and patient care through advanced data analysis.
Flatiron Health announced research findings showing that large language models (LLMs) can accurately and efficiently extract cancer progression data from unstructured electronic health records, potentially boosting oncology research and care.
The study, presented at a recent conference on artificial intelligence in cancer research, found that AI tools achieved F1 scores comparable to those of expert human abstractors across 14 cancer types. The LLM used in the study, provided by Anthropic, also produced estimates of real-world progression-free survival nearly identical to those derived from human-abstracted data, according to the researchers.
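Real-world progression-free survival is usually summarized with a Kaplan-Meier curve and its median. One simple way to see whether machine-extracted progression dates would change a study's conclusions is to run the same estimator twice, once on human-abstracted dates and once on LLM-extracted dates, and compare the results. The minimal Python sketch below does that for a handful of made-up patients; the function names and data are hypothetical, and this is not Flatiron's analysis code.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    times:  months from index date to progression/death, or to last follow-up
    events: True if progression or death was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Tally events and all exits (events plus censorings) tied at time t.
        n_events = n_exits = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_exits += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone with an event or censoring at t leaves the risk set.
        at_risk -= n_exits
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up for six patients (months). The LLM places
# patient 2's progression one month later than the human abstractor did.
events = [True, True, False, True, False, True]
human_times = [3, 5, 5, 8, 12, 15]
llm_times = [3, 6, 5, 8, 12, 15]

human_curve = kaplan_meier(human_times, events)
llm_curve = kaplan_meier(llm_times, events)
print(median_time(human_curve), median_time(llm_curve))
```

In a real analysis an established survival library with confidence intervals would be the natural choice; the point here is only that feeding both abstraction sources through one estimator makes the resulting curves directly comparable.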
“AI and machine learning are fundamentally transforming how we generate and use real-world evidence in oncology,” said Stephanie Reisinger, senior vice president and general manager of real-world evidence at Flatiron Health. “This research exemplifies how Flatiron is harnessing AI and multimodal data to unlock new insights from oncology real-world data—accelerating clinical research, improving patient outcomes, and setting a new standard for evidence generation in cancer care.”
To validate the quality of the AI-extracted data, Flatiron used its VALID (Validation of Accuracy for LLM/ML-Extracted Information and Data) framework, comparing the AI’s performance to both a primary and duplicate expert human abstractor.
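F1 is the harmonic mean of precision (the share of extracted events that are correct) and recall (the share of true events that are found). One hedged way to frame a comparison against a primary and a duplicate abstractor: treat each patient's progression events as dates, count an extracted event as correct if it falls within a tolerance window of an event recorded by the primary abstractor, and score the duplicate abstractor the same way to obtain a human-level benchmark. The 30-day window, function names, and data below are illustrative assumptions, not details of the VALID framework.

```python
from datetime import date
from typing import Dict, List


def match_events(predicted: List[date], reference: List[date], window_days: int) -> int:
    """Count true positives: each reference event may be matched by at most one
    predicted event dated within window_days of it (simple greedy pairing)."""
    unused = sorted(predicted)
    matched = 0
    for ref in sorted(reference):
        for i, pred in enumerate(unused):
            if abs((pred - ref).days) <= window_days:
                matched += 1
                del unused[i]  # safe: we stop iterating right after deleting
                break
    return matched


def pooled_f1(predicted: Dict[str, List[date]],
              reference: Dict[str, List[date]],
              window_days: int = 30) -> float:
    """Event-level F1 pooled over all patients, scoring predicted against reference."""
    tp = fp = fn = 0
    for patient in set(predicted) | set(reference):
        pred = predicted.get(patient, [])
        ref = reference.get(patient, [])
        hits = match_events(pred, ref, window_days)
        tp += hits
        fp += len(pred) - hits  # extracted events with no reference match
        fn += len(ref) - hits   # reference events the source missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical progression dates for three patients from each source.
primary = {"p1": [date(2023, 3, 1)],
           "p2": [date(2023, 6, 10), date(2023, 11, 2)],
           "p3": []}
duplicate = {"p1": [date(2023, 3, 8)], "p2": [date(2023, 6, 10)], "p3": []}
llm = {"p1": [date(2023, 3, 1)],
       "p2": [date(2023, 6, 20), date(2023, 11, 2)],
       "p3": [date(2023, 9, 1)]}

human_benchmark = pooled_f1(duplicate, primary)  # duplicate vs. primary abstractor
model_score = pooled_f1(llm, primary)            # LLM vs. primary abstractor
print(round(human_benchmark, 3), round(model_score, 3))
```

Framed this way, "comparable to human abstractors" means the model's score against the primary abstractor is close to the duplicate abstractor's score against that same reference, rather than close to a perfect 1.0.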
“Scalable, high-quality extraction of such an important and complex endpoint like progression will open new doors for novel research, predictive modeling, and more personalized patient care,” said Aaron B. Cohen, lead author and practicing oncologist at Bellevue Hospital in New York City.
Flatiron also presented work on AI fairness in data extraction, emphasizing the need for bias evaluation as AI continues to play a larger role in health care.
As artificial intelligence continues to evolve, its application in oncology has moved beyond imaging and diagnostics into the complex world of electronic health records. Traditionally, extracting useful clinical insights from unstructured EHR data—such as physician notes, pathology reports, and treatment narratives—has required manual review by trained professionals. This process is not only time-consuming and expensive but also difficult to scale.
Recent advances in natural language processing and machine learning have made it possible to automate the extraction of clinically meaningful data with a high degree of accuracy. These tools can now identify cancer progression events, treatment responses, comorbidities, and adverse events buried within unstructured text. Importantly, they do so at a speed and scale that far exceeds human capacity.
Key to these advances is the development of validation frameworks and performance metrics that allow AI-generated data to be compared directly with human-extracted data. The focus is shifting toward not only accuracy but also fairness, transparency, and reproducibility. Researchers are increasingly incorporating bias detection protocols to ensure these systems work equally well across diverse populations.
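A basic version of such a bias check computes the same accuracy metric separately for each patient subgroup and examines the spread. The sketch below assumes a simplified per-patient yes/no label (was progression documented?) and hypothetical subgroup names; the protocols researchers actually use may rely on different metrics, thresholds, and statistical tests.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subgroup, human label, model label): did the patient have documented progression?
Record = Tuple[str, bool, bool]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 written as 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def subgroup_f1(records: List[Record]) -> Dict[str, float]:
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn]
    for group, truth, pred in records:
        if truth and pred:
            counts[group][0] += 1
        elif pred:   # model says progression, human reference does not
            counts[group][1] += 1
        elif truth:  # human reference has progression, model missed it
            counts[group][2] += 1
        # True negatives do not enter F1, so a group with only true
        # negatives never appears in counts (its F1 is undefined).
    return {g: f1(*c) for g, c in counts.items()}


def max_gap(scores: Dict[str, float]) -> float:
    """Largest F1 difference between any two subgroups."""
    return max(scores.values()) - min(scores.values())


# Hypothetical labels for two subgroups.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", False, False), ("group_a", True, False),
    ("group_b", True, True), ("group_b", False, True),
    ("group_b", True, False), ("group_b", True, True),
]
scores = subgroup_f1(records)
print({g: round(s, 3) for g, s in sorted(scores.items())}, round(max_gap(scores), 3))
```

A large gap is a reason for closer review rather than proof of bias; with small subgroups, a difference like this can easily be noise.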
Beyond data extraction, AI is also being used to synthesize multimodal datasets—including genomic, clinical, and imaging data—to drive predictive modeling, identify candidates for clinical trials, and inform personalized treatment pathways.
As these systems mature, experts believe they will become integral to real-world evidence generation, allowing researchers and clinicians to answer complex clinical questions more quickly, identify emerging treatment patterns, and ultimately improve outcomes for cancer patients worldwide.