June 17, 2026

Protected health information and the AI goldrush putting it at risk

Author: Denis Whelan
Fact checked by: Todd Shryock

Key Takeaways

  • A January breach of a telehealth clinician network exposed identifiers and medical data for 716,000 people, highlighting PHI’s value to attackers and reputational/regulatory downside.
  • HIPAA and BAAs do not permit unrestricted reuse of a provider’s patient data for AI training, nor downstream exposure to other vendors or customers.

Vibe-coded AI implementations can present serious risks to health data privacy, protection and, ultimately, to healthcare function.

The New York Times recently published an article about the entrepreneur behind Medvi, an AI-powered telehealth provider of weight-loss drugs, and the two-man startup’s projected $1.8 billion in sales this year. A few days later, AI eminence Gary Marcus took issue with what the story left out, listing a string of Medvi criticisms he felt were inadequately addressed in the NYT piece.

The story touches on a bunch of ethical and philosophical arguments around AI use that are worth discussing, but one specific point jumped out at me: A Medvi critic noted that its clinician network suffered a data breach in January this year that exposed a huge number of patient records.

Reporting from The HIPAA Journal confirms that the breach took place, that it affected 716,000 people, and that the exfiltrated files included information such as names, addresses, email addresses, dates of birth and medical information. This is a troubling incident, and it raises the question of how we handle protected health information in the age of AI.

As a provider of AI-powered technology solutions for the healthcare industry, I am very interested and invested in driving innovation with new technology. But protected health information (PHI) has been an enormous cybercrime target since the dawn of digitalization. So, I worry that “vibe-coded” AI implementations can present serious risks to health data privacy, protection and, ultimately, to healthcare function.

The current AI rush is really no different than any historical gold rush, and there are tons of entrepreneurs just like the Medvi founder who are using the tools to stake their claim. But working with AI in healthcare demands understanding the nuances of HIPAA, how AI models and PHI need to be treated in that context, and what’s at stake if you take shortcuts.

While there is no such thing as 100% cybersecurity for any data, including PHI, the truth is that there are solid standards and practices that reduce risk and defend against misuse and unauthorized access — and those standards and practices should be applied to AI function in healthcare at a foundational level.

For example, there are a lot of healthcare technology companies focused on using AI to untangle communications and unstructured information exchange via existing and entrenched channels, such as fax (yes, fax!). The basic function of such capabilities involves wrapping agentic AI around a legacy health channel to speed the flow of, say, lab or radiology results from a clinic into an EHR at a hospital without staff on both ends having to handle a bunch of manual administration throughout the process. Thus, the truly amazing capabilities of AI tackle bothersome and time-consuming bottlenecks in healthcare workflows. But because these tools deal with PHI exchange, they require some extra safeguards in order to work in a responsible and compliant way, and those safeguards should be specifically designed for healthcare settings.

Like all AI models, agentic AI requires data to perform effectively. It has to be trained on what, for example, lab results actually look like and indicate in order to do all that AI processing from input to output. But lab results are PHI. And even if a tech company has a BAA with a particular health provider, it cannot just feed that provider’s patient PHI into its AI product and use it however it wants or potentially expose that sensitive data to some other entity it may be doing business with.

The solution is to design healthcare-specific AI models and prompts that protect PHI by not using any PHI at all for model training or testing. A standard classification model can be trained on anonymized data scrubbed of all 18 HIPAA identifiers, and samples can be generated in a synthetic data factory. Healthcare providers can also be supplied with their own custom AI models that have absolutely no data crossover with any of the tech company’s other customers. The healthcare provider supplies samples of the anonymized and redacted documents they typically receive, or expect to receive, and the tech company builds a model specific to their needs for recognizing and processing the type of information contained in those documents. The synthetic data factory uses these examples to generate a large, statistically relevant library of synthetic healthcare documents to train and hone the accuracy of the AI services.
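To make the idea concrete, here is a minimal Python sketch of the two steps described above: masking identifiers in a sample document, and generating synthetic lab reports to train on. Everything in it is an assumption for illustration. The patterns, the names `redact`, `synthetic_lab_report` and `build_corpus`, and the analyte ranges are hypothetical, and the regexes cover only a few pattern-matchable identifiers. Pattern matching alone does not satisfy HIPAA de-identification; names, addresses and the rest of the 18 identifiers need far more than this.

```python
import random
import re

# Hypothetical patterns for a FEW pattern-matchable identifiers only.
# This is an illustration, not a complete de-identification method.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text):
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented placeholder values; none of this comes from real patients.
SYNTHETIC_NAMES = ["Alex Sample", "Jordan Example", "Casey Placeholder"]
ANALYTES = {
    "Glucose": (70, 140, "mg/dL"),
    "Sodium": (130, 150, "mmol/L"),
}


def synthetic_lab_report(rng):
    """Build one labeled, fully synthetic lab report from a fixed template."""
    analyte, (low, high, unit) = rng.choice(sorted(ANALYTES.items()))
    text = (
        f"Patient: {rng.choice(SYNTHETIC_NAMES)}\n"
        f"Test: {analyte}\n"
        f"Result: {rng.randint(low, high)} {unit}"
    )
    return {"text": text, "label": "lab_result"}


def build_corpus(n, seed=0):
    """Generate n synthetic training documents, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [synthetic_lab_report(rng) for _ in range(n)]
```

In a real pipeline the template would be derived from the redacted samples a provider supplies, so the synthetic documents mirror the layouts that provider actually sees while containing no patient data.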

So by segmenting customer by customer, and by anonymizing and redacting data in a HIPAA-compliant way from the get-go, the AI can be scaled compliantly and responsibly for healthcare use. It requires a little extra consideration and effort to achieve the technology’s benefits, but that is the assignment.
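Customer-by-customer segmentation can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: `TenantModelStore`, `CrossTenantAccessError` and the vocabulary-only "model" are all hypothetical stand-ins. The point it illustrates is that each customer's model is built only from that customer's de-identified documents, and a request for another customer's model is refused.

```python
class CrossTenantAccessError(Exception):
    """Raised when one customer asks for another customer's model."""


class TenantModelStore:
    """Keeps documents and models strictly separated per customer ID."""

    def __init__(self):
        self._docs = {}    # customer_id -> list of de-identified documents
        self._models = {}  # customer_id -> model built from that list only

    def add_document(self, customer_id, deidentified_text):
        self._docs.setdefault(customer_id, []).append(deidentified_text)

    def train(self, customer_id):
        # Stand-in for real training: collect the vocabulary of this
        # customer's documents and nothing else.
        docs = self._docs.get(customer_id, [])
        vocabulary = sorted({word.lower() for doc in docs for word in doc.split()})
        self._models[customer_id] = {
            "customer_id": customer_id,
            "vocabulary": vocabulary,
        }

    def get_model(self, requester_id, customer_id):
        if requester_id != customer_id:
            raise CrossTenantAccessError(
                f"{requester_id} may not access the model for {customer_id}"
            )
        return self._models[customer_id]


store = TenantModelStore()
store.add_document("clinic_a", "Glucose panel")
store.add_document("clinic_b", "Chest radiograph")
store.train("clinic_a")
store.train("clinic_b")
```

A production system would enforce this boundary in storage and access control as well, not only in application code.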

This cuts to the heart of the biggest risk people don’t seem to really understand when they’re evaluating AI use in healthcare. The very first question has to be “What data is being used, and where is it going?” If you, or any of the toolsets or services you use, just throw ChatGPT or Claude or Gemini at a healthcare function, you cannot answer that question. So you’re not actually protecting PHI or protecting your organization.

These are issues I discuss with my team and my customers often, and they touch on another core consideration in the age of AI. Bots don’t live in the real world, but we do. How we use AI in any healthcare function will ultimately impact real human lives and real healthcare outcomes, so it is important to actually work with other real humans to navigate context, compliance and risks together responsibly. The lonely prospector out in the digital wild wielding AI to whip up a billion-dollar healthcare marketing engine may not be considering what data is being used and where it is going, but he should be. So should we all.

Denis Whelan is the CEO of Documo, an AI-forward healthcare SaaS company building workflow and interoperability infrastructure for unstructured healthcare documents. Denis is focused on building the AI bridge between the document-centric world healthcare operates in today and the fully interoperable future ahead. Previously, Denis served as CEO of Projector PSA, a leading developer of cloud-based professional services automation software, where he led transformational growth at the company resulting in an acquisition by BigTime Software and Vista Equity Partners. Denis holds a BA in Business from the Isenberg School of Management at the University of Massachusetts, Amherst.