AI can aid diagnoses, but human touch still matters

Key Takeaways

  • AI excels in accuracy and consistency for factual medical questions but struggles with nuanced, context-sensitive issues, lacking empathy and adaptability.
  • Human clinicians adjust language and tone based on sensitivity, offering context and compassion, unlike AI's templated responses.

A new study finds artificial intelligence can match doctors on facts, but struggles with empathy, nuance and consistency.

AI can aid diagnoses, but human touch still matters © SOMKID - stock.adobe.com

Artificial intelligence (AI) can process vast amounts of medical data and deliver clinical-grade responses, but a new international study suggests it still falls short where human connection matters most.

The study, published May 27 in the Journal of Health Organization and Management, compared 7,165 medical queries answered by either AI models or human clinicians in the United States and Australia. The findings highlight where AI tools excel — and where they still need some work.

The results offer new insights for physicians, health systems leaders and policymakers exploring AI’s future role in patient care.

“This isn’t about replacing doctors and nurses,” said C. Matt Graham, Ph.D., the study’s author and associate professor of information systems and security management at the Maine Business School. “It’s about augmenting their abilities. AI can be a second set of eyes; it can help clinicians sift through mountains of data, recognize patterns and offer evidence-based recommendations in real time.”

Performance and pitfalls

Graham’s analysis found that AI-generated responses often matched expert standards for accuracy and professionalism, particularly for factual or procedural medical questions. In some cases, AI outperformed human clinicians in consistency and clarity, notably in response length and formatting: most AI responses ran between 400 and 475 words, regardless of query complexity.

However, when it came to more nuanced “why” or “how” questions, performance broke down. AI models sometimes failed to fully grasp the clinical context or emotional weight of sensitive cases, like mental health concerns or end-of-life discussions.

Crucially, consistency also proved fragile. Although responses were generally stable within the same session, they varied when researchers rephrased or repeated the same question later, a red flag for conditions that demand diagnostic precision.

The human element

Graham’s team also considered tone, vocabulary and empathy in AI responses, comparing them to answers given by practicing clinicians. While doctors adapted their language and tone based on the sensitivity of the topic at hand, AI models leaned heavily on clinical jargon, sometimes alienating or confusing patients.

“Health care professionals offer healing that is grounded in human connection, through sight, touch, presence and communication — experiences that AI cannot replicate,” said Kelley Strout, Ph.D., RN, associate professor at the University of Maine’s School of Nursing, who was not involved in the study.

According to the analysis, clinicians adjusted their response lengths based on question complexity, offered more context and often expressed compassion, especially in discussions involving terminal illness or mental health. AI models, by contrast, delivered templated responses that lacked emotional resonance.

The study’s cross-country design offered further insight. In Australia’s universal health care system, patients reported greater satisfaction and lower out-of-pocket costs, roughly one-quarter of U.S. levels. Americans, meanwhile, faced longer waits for care, often double those of Australians.

Those systemic differences may shape how AI tools are adopted and integrated into care. In the study, Graham notes that health system, regulatory and cultural differences like these will ultimately influence how AI is received and used. He urges that AI models be trained on diverse datasets that reflect real-world variability.

The study comes amid staffing shortages in both countries. Primary care shortages in the U.S. are nothing new, but they’ve only intensified — especially in rural areas.

Maine, for example, ranks 47th in the nation for doctor-to-patient ratios, with more than 115 patients per provider, according to the Health Resources and Services Administration. A separate 2024 report from the Maine Nursing Action Coalition forecasts a shortfall of 2,800 nurses by 2030.

Ethical AI integration

AI-driven virtual assistants and patient portal tools may help extend clinicians’ reach — but experts say that scaling must be intentional, inclusive and ethically grounded.

“Technology is only one part of the solution,” said Graham. “We need regulatory standards, human oversight and inclusive datasets. Right now, most AI tools are trained on limited populations. If we’re not careful, we risk building systems that reflect and even magnify existing inequalities.”

Strout added that health care leaders should avoid repeating past mistakes. “The electronic health record (EHR), for example, was largely developed around billing models rather than patient outcomes or provider workflows,” Strout said. “As a result, EHR systems have often contributed to frustration among providers and diminished patient satisfaction. We cannot afford to repeat that history with AI.”

As AI tools continue to evolve, experts increasingly stress that they are best used to support, not supplant, human decision-making. The study calls for further research into ethical safeguards, clinical governance and AI model training that better reflects the diversity of patient experiences.

“Technology should enhance the humanity of medicine, not diminish it,” Graham said. “That means designing systems that support clinicians in delivering care, not replacing them altogether.”
