Every day, patients send hundreds of thousands of messages to their doctors through MyChart, a communications platform that is almost universally used in U.S. hospitals.
They describe their pain, share symptoms like the texture of a rash or the color of their stool, and trust the doctor on the other end to offer advice.
But increasingly, replies to these messages are not written by doctors, at least not in their entirety. About 15,000 physicians and assistants across more than 150 health systems are using MyChart’s new artificial intelligence capabilities to craft these responses.
Many patients who receive these replies don’t know they were written with the help of artificial intelligence: In interviews, officials at several health systems that use MyChart’s tools acknowledged that they don’t disclose that the messages contain AI-generated content.
The trend troubles experts, who worry that doctors may not be careful enough to spot potentially dangerous errors in medically significant messages created by AI.
In health care, AI has so far been used mainly for administrative tasks like keeping track of appointments and challenging insurance denials; critics worry that the widespread adoption of MyChart’s tool is pushing the technology into clinical decision-making and the doctor-patient relationship.
The tool can already be prompted to write in an individual doctor’s voice, but it doesn’t always produce correct answers.
“The selling point for doctors was that it would save them time and give them more time to talk to their patients,” said Atmeya Jayaram, a researcher at the Hastings Center for Bioethics in Garrison, New York.
“In this case, they’re using generative AI to save time talking to patients.”
During the peak of the pandemic, when in-person visits were often reserved for the most seriously ill patients, many people turned to MyChart messaging as a valuable way to communicate directly with their doctors.
Health care providers didn’t realize there was a problem until years later: Even after most aspects of health care had returned to normal, they were still inundated with messages from patients.
Already overworked physicians were suddenly responding to patient messages during their lunch breaks and early evenings, and hospital leaders worried that unless they found a way to reduce this extra (and largely non-billable) work, patient messages would become a major cause of physician burnout.
So when Epic, the software giant that makes MyChart, began offering a new tool in early 2023 that uses AI to generate responses, some of the nation’s largest academic medical centers were eager to adopt it.
Instead of starting each message with a blank screen, doctors see an automatically generated response above the patient’s question, created using a version of GPT-4 (the technology that ChatGPT is based on) that complies with medical privacy law.
The MyChart tool, called “In Basket Art,” extracts context from a patient’s previous messages and information from the electronic medical record, such as medication lists, and creates a draft that the provider can approve or modify.
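In code terms, the workflow described here can be sketched in a few lines. The sketch below is an illustration under assumptions, not Epic’s implementation: the call_gpt4 client, the function names and the shape of the chart data are all invented.

```python
# Illustrative sketch only -- not Epic's actual code or API.
# It assumes a privacy-compliant GPT-4 endpoint wrapped in a `call_gpt4`
# function supplied by the caller, and plain lists standing in for chart data.

def build_prompt(patient_message: str, prior_messages: list[str], medications: list[str]) -> str:
    """Assemble context from the message thread and the chart into one prompt."""
    return (
        "You are drafting a reply for a clinician to review before it is sent.\n"
        f"Current medications: {', '.join(medications) or 'none on file'}\n"
        f"Recent messages: {' | '.join(prior_messages[-3:])}\n"
        f"Patient asks: {patient_message}\n"
        "Write a brief, plain-language draft reply."
    )

def draft_reply(patient_message, prior_messages, medications, call_gpt4):
    """Return a draft for the clinician to approve, edit, or discard."""
    draft = call_gpt4(build_prompt(patient_message, prior_messages, medications))
    # Nothing is sent automatically; a clinician must act on the draft.
    return {"draft": draft, "status": "pending_clinician_review"}
```

The design point the article keeps returning to is the last step: the model’s output is only ever a draft that a clinician must review.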
The hope was that by putting doctors in an editor-like role, health systems could process patient messages more quickly and with less mental energy.
Early research partly bears this out: Art does seem to reduce burnout and cognitive load, but it doesn’t necessarily save time.
The AI tool will be accessible to hundreds of clinicians at UC San Diego Health, more than 100 health care workers at UW Health in Wisconsin, and all licensed clinicians, including doctors, nurses and pharmacists, in Stanford Health Care’s primary care practices.
Dozens of physicians at Northwestern Health, NYU Langone Health and University of North Carolina Health are piloting Art, and leaders are considering expanding it more broadly.
In the absence of strong federal regulations or a widely accepted ethical framework, each health system decides for itself how to test the tool’s safety and whether to inform patients about its use.
Some hospital systems, like UC San Diego Health, include a disclosure statement at the bottom of each message explaining that the message was “automatically generated” and then reviewed and edited by physicians.
“Personally, I don’t see any downside to being transparent,” said Dr. Christopher Longhurst, the health system’s chief clinical and innovation officer.
Patients have generally been receptive to the new technology, he said. (One doctor emailed him to say, “I want to be the first to congratulate you on the introduction of AI Copilot and be the first to send an AI-generated patient message.”)
Other systems, including Stanford Health Care, UW Health, UNC Health and NYU Langone Health, have determined that informing patients would do more harm than good.
Some administrators worry that doctors would see the disclaimers as an excuse to send messages to patients without properly vetting them, said Dr. Brian Patterson, physician administrative director for clinical AI at UW Health.
And telling patients that a message contains AI content could devalue the clinical advice, even when their own doctor has endorsed it, said Dr. Paul Testa, chief medical information officer at NYU Langone Health.
For Dr. Jayaram, the decision to disclose the use of the tool comes down to a simple question: what do patients expect?
When patients send messages about their health, they assume that their doctor will take into account their medical history, treatment preferences, family dynamics and other intangible factors that come from a long-standing relationship, he said.
“When you read your doctor’s note, you read it in your doctor’s voice,” he said. “If patients find out that the messages they’re receiving from their doctor are actually generated by an AI, they’re going to feel betrayed.”
For many health systems, creating algorithms that convincingly mimic the “voice” of a particular physician would help make the tool more useful. Indeed, Epic recently gave the tool more access to past messages, allowing drafts to mimic each physician’s individual writing style.
Brent Lamb, UNC Health’s chief information officer, said this was in response to common complaints he heard from doctors, such as, “They don’t have my personal voice” or, “I’ve known this patient for seven years. They’re going to realize it’s not me.”
Health care administrators often describe Art as a low-risk use of AI because, ideally, providers always read the drafts and correct any errors.
That portrayal has frustrated researchers who study how humans and artificial intelligence interact. Ken Holstein, a professor at Carnegie Mellon University’s Human-Computer Interaction Institute, said the framing “runs counter to about 50 years of research.”
Humans have a well-documented tendency, known as automation bias, to accept algorithmic recommendations even when they contradict our own expertise, he said. That bias could make doctors less critical when reviewing AI-generated drafts and let dangerous errors reach patients.
And Art is not immune to error: A recent study found that seven of 116 AI-generated drafts contained so-called hallucinations, the fabrications the technology is notorious for producing.
Dr. Vinay Reddy, a family medicine physician at UNC Health, recalled an instance in which a patient messaged a colleague of his to ask whether she needed the hepatitis B vaccine.
The AI-generated draft confidently assured her that she had been vaccinated and even supplied a date. That was completely wrong, he said, and it happened because the model had no access to her vaccination records.
A small study published in The Lancet Digital Health found that GPT-4, the same AI model that underpins Epic’s tool, made more subtle errors when answering questions from fictitious patients.
Doctors who reviewed the responses found that, if left unedited, about 7% of the drafts posed a risk of serious harm.
Dr. Eric Poon, Duke Health’s chief medical information officer, said he finds it reassuring that the model produces drafts of only “moderate quality,” which he believes keeps doctors skeptical and vigilant enough to catch errors.
On average, fewer than a third of AI-generated drafts are sent to patients unedited, Epic said, which hospital administrators read as a sign that doctors are not simply approving the messages without reviewing them.
“The question in the back of my mind is, what happens as the technology gets more advanced?” he said. “What happens when clinicians start to let their guard down? Will errors slip through?”
Epic built guardrails into the program to prevent Art from giving clinical advice, said Garrett Adams, the company’s vice president of research and development.
Adams said the tool is perfect for answering common administrative questions such as, “When is my appointment?” or “Can I reschedule my medical exam?”
But researchers have yet to develop a way to reliably get models to follow instructions, Dr. Holstein said.
Dr. Anand Choudhury, who oversaw the rollout of Art at Duke Health, said he and his colleagues repeatedly adjusted the tool’s instructions to stop it from giving clinical advice, but with little effect.
“No matter how hard I tried, I couldn’t get rid of that instinct to help,” he said.
Three medical institutions told The New York Times they had removed some guardrails from their instructions.
Dr. Longhurst of UC San Diego Health said the model “performed better” when they removed the line instructing Art to “not respond with clinical information.” Administrators felt comfortable giving the AI more freedom, since doctors would review the messages.
Dr. Christopher Sharp, chief medical information officer at Stanford Health Care, said he took a “managed risk” in allowing Art to “think more like a clinician” after some of the strictest guardrails seemed to make the drafts generic and unhelpful.
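The guardrails in question are essentially lines of instruction sent to the model alongside each message, so loosening them means editing that text. A minimal sketch of the idea, with wording invented for illustration rather than Epic’s real prompt, might look like this:

```python
# Hypothetical illustration of a prompt-level guardrail. The wording is
# invented for this sketch; it is not Epic's actual instruction text.

BASE_INSTRUCTIONS = [
    "Write a draft reply for a clinician to review before anything is sent.",
    "Answer routine administrative questions, such as scheduling, directly.",
]

CLINICAL_GUARDRAIL = "Do not respond with clinical information."

def build_instructions(allow_clinical_content: bool) -> str:
    """Systems that loosened the guardrail simply leave that line out."""
    lines = list(BASE_INSTRUCTIONS)
    if not allow_clinical_content:
        lines.append(CLINICAL_GUARDRAIL)
    return "\n".join(lines)
```

As Dr. Holstein notes, though, such instructions are requests rather than guarantees; the model may not follow them.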
Beyond issues of safety and transparency, some bioethicists have a more fundamental concern: Do we want to use AI in health care in this way?
Unlike many other AI tools in health care, Art isn’t designed to improve clinical outcomes (though one study suggests its responses may be more empathetic and positive), and it isn’t strictly aimed at administrative tasks.
Rather, AI seems to be stepping in at the rare moments when patients and doctors can actually communicate with each other directly, the kind of exchange that technology ought to be enabling, said Daniel Schiff, co-director of the Governance and Responsible AI Lab at Purdue University.
“Even if it were perfect, would you want to automate one of the few ways we still interact?”