New “AI doctor” predicts risk of death with 85% accuracy

Some of its predictions were better than those made by a team of doctors.

June 13, 2023

An “AI doctor” was able to make accurate predictions about patients based on medical notes doctors had written in their charts — suggesting a new way for technology to help guide healthcare.

“These results demonstrate that large language models make the development of ‘smart hospitals’ not only a possibility, but a reality,” said neurosurgeon Eric K. Oermann, the study’s senior author.

AI doctor: By training AI models on troves of medical data, researchers around the world have created AI systems capable of predicting patients’ disease risks. A UK team’s AI can look at retinal scans to predict a patient’s risk of cardiovascular disease, for example, while an AI developed at Mass General can predict the risk of melanoma recurrence just by looking at pictures of the initial skin cancer.

In a new study, published in Nature, researchers at NYU set out to see whether they could train an AI to make predictions about patients based on medical notes that doctors and nurses jot down when treating patients.

“One thing that’s common in medicine everywhere, is physicians write notes about what they’ve seen in clinic, what they’ve discussed with patients,” Oermann told AFP. “So our basic insight was, can we start with medical notes as our source of data, and then build predictive models on top of it?”

How it works: While AIs have proven capable of making predictions based on very structured data, like scans and test results, medical notes can have far more variance — two doctors treating the same patient might use different language or abbreviations, or choose to focus on different things. [These notes are all in electronic health records (EHRs), so the AIs did not have to decipher doctors’ handwriting, at least.]

To decode patterns in these notes, the NYU team built a “large language model” (LLM) called “NYUTron” — the same type of AI that powers OpenAI’s popular ChatGPT and Google’s Bard.

To train NYUTron, researchers fed it millions of medical notes written by doctors in more than 380,000 patients’ EHRs. These included progress reports, discharge instructions, observations on lab results, and more, with the final dataset totalling about 4.1 billion words.

They then fine-tuned the AI to make five predictions about a patient based on their medical notes:

Length of stay in the hospital
Risk of being readmitted within 30 days after discharge
Risk of dying in the hospital before discharge
Risk of developing a new, related health issue
Risk of having an insurance claim denied

The results: After training, NYUTron was tested against traditional formulas based on standardized data — its ability to predict readmissions, for example, was compared to that of the LACE index, which looks at factors such as the length of a patient’s current stay and how many times they’ve been hospitalized in the past six months.
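For readers curious what a "traditional formula" of this kind looks like, here is a minimal sketch of a LACE-style score. The point values follow the commonly published LACE scheme as we understand it and should be treated as an assumption, not as the scoring used in the NYU study. The function and parameter names are hypothetical, and the final component is taken here as emergency department visits in the prior six months, which is also an assumption on our part.

```python
def lace_score(length_of_stay_days: int,
               acute_admission: bool,
               charlson_index: int,
               ed_visits_6mo: int) -> int:
    """Illustrative LACE-style readmission score (0 to 19).

    Point values are assumed from the commonly published scheme;
    verify them against the original source before any real use.
    """
    # L: length of the current stay, in days
    if length_of_stay_days < 1:
        l_points = 0
    elif length_of_stay_days <= 3:
        l_points = length_of_stay_days  # 1, 2, or 3 points
    elif length_of_stay_days <= 6:
        l_points = 4
    elif length_of_stay_days <= 13:
        l_points = 5
    else:
        l_points = 7

    # A: acuity, i.e. whether the admission was urgent
    a_points = 3 if acute_admission else 0

    # C: comorbidity burden (Charlson index), with 4 or more scored as 5
    c_points = charlson_index if charlson_index <= 3 else 5

    # E: emergency department visits in the past six months, capped at 4
    e_points = min(ed_visits_6mo, 4)

    return l_points + a_points + c_points + e_points


# Hypothetical patient: 5-day urgent stay, Charlson index 2, one prior visit
print(lace_score(5, True, 2, 1))
```

A score like this uses only a handful of structured fields, which is the contrast with NYUTron: the language model reads the free-text notes themselves.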

NYUTron outperformed the standard models on all five tasks, correctly identifying 85% of patients who died in the hospital and 80% of those who were readmitted, compared with 78% and 75%, respectively, for the traditional models. It also correctly estimated the length of stay for 79% of patients, compared with 68% for the standard model.

NYUTron also beat a group of six physicians who were tasked with predicting the likelihood of readmission for 20 patients based on the discharge notes in their EHRs. The doctors’ median accuracy at the task was 62.8%, while the AI’s was 77.8%.

However, the AI wasn’t the best at predicting the risk of readmission: a human doctor took the top spot.

“The most senior physician, who’s actually a very famous physician, he had superhuman performance, better than the model,” said Oermann. But the fact that the model can outperform the typical physician, even if it does not beat the very best, is still significant.

“The sweet spot for technology and medicine isn’t that it’s going to always deliver necessarily superhuman results, but it’s going to really bring up that baseline,” Oermann added.

Looking ahead: NYUTron has already been integrated with EHRs at NYU-affiliated hospitals throughout New York, but the researchers note the need for randomized clinical trials that compare interventions based on the AI’s predictions with those based on traditional methods. Such trials would show whether the system can actually improve patient outcomes.

They also warn in their paper that doctors shouldn’t over-rely on the system, noting that more research is needed to identify any unexpected potential failure points or sources of biases.

Even after that research is conducted, they say the AI should be viewed as a tool for doctors — not a replacement for them. Clinicians are still ultimately the source of the observations and judgments that make it into the notes to begin with.
