The patient was a 39-year-old woman who visited the emergency room at Beth Israel Deaconess Medical Center in Boston. Her left knee had been hurting her for several days. The day before, she had had a fever of 102 degrees. It was gone now, but she still had chills. And her knee was red and swollen.
What was the diagnosis?
On a recent steamy Friday, Dr. Megan Landon, a medical resident, presented this real-life case to a room full of medical students and residents. They had been brought together to learn a skill that can be devilishly difficult to teach: how to think like a doctor.
“Doctors are terrible at teaching other doctors how we think,” said Dr. Adam Rodman, an internist and medical historian at Beth Israel Deaconess who organized the event.
But this time, they could call on an expert to help them make a diagnosis: GPT-4, the latest version of a chatbot released by OpenAI.
Artificial intelligence is transforming many aspects of the practice of medicine, and some healthcare professionals are using these tools to aid in diagnosis. Doctors at Beth Israel Deaconess, a teaching hospital affiliated with Harvard Medical School, decided to explore how chatbots could be used — and misused — in training future doctors.
Instructors like Dr. Rodman hope medical students can turn to GPT-4 and other chatbots for something akin to what doctors call a curbside consult — when they pull a colleague aside and ask for an opinion on a difficult case. The idea is to use the chatbot in the same way doctors turn to one another for suggestions and ideas.
For more than a century, doctors have been portrayed as detectives who gather clues and use them to find the culprit. But experienced doctors actually use a different method, pattern recognition, to figure out what is wrong. In medicine, it is called an illness script: the signs, symptoms and test results that doctors put together to tell a coherent story based on similar cases they know about or have seen themselves.
If the illness script doesn’t help, Dr. Rodman said, doctors turn to other strategies, such as assigning probabilities to the various diagnoses that might fit.
Researchers have tried for more than half a century to design computer programs to make medical diagnoses, but nothing has really succeeded.
Doctors say GPT-4 is different. “It will create something that is remarkably similar to an illness script,” Dr. Rodman said. In that way, he added, “it’s fundamentally different from a search engine.”
Dr. Rodman and other physicians at Beth Israel Deaconess asked GPT-4 for possible diagnoses in difficult cases. In a study published last month in the medical journal JAMA, they found that it performed better than most doctors on the weekly diagnostic challenges published in the New England Journal of Medicine.
But, they learned, there is an art to using the program, and there are pitfalls.
Dr. Christopher Smith, director of the medical center’s internal medicine residency program, said medical students and residents are “definitely using it.” But, he added, “whether they learn something is an open question.”
The problem is that they could rely on AI for diagnostics the same way they would rely on a calculator on their phone to solve a math problem. This, Dr. Smith said, is dangerous.
Learning, he said, involves trying to understand things: “That’s how we remember things. Part of learning is the struggle. If you outsource learning to GPT, that struggle is gone.”
At the meeting, the students and residents broke into groups and tried to figure out what was wrong with the patient with the swollen knee. Then they turned to GPT-4.
The groups tried different approaches.
One group used GPT-4 to search the internet, much the way one would use Google. The chatbot spat out a list of possible diagnoses, including trauma. But when the group members asked it to explain its reasoning, the bot was disappointing, justifying its choice by saying, “Trauma is a common cause of knee injury.”
Another group brainstormed possible hypotheses and asked GPT-4 to check them. The chatbot’s list lined up with the group’s: infections, including Lyme disease; arthritis, including gout, a type of arthritis that involves crystals in the joints; and trauma.
GPT-4 added rheumatoid arthritis to the top possibilities, although it was not high on the group’s list. Gout, the instructors later told the group, was unlikely for this patient because she was young and female. And rheumatoid arthritis could probably be ruled out because only one joint was inflamed, and for only a few days.
As a curbside consult, GPT-4 seemed to pass the test, or at least to agree with the students and residents. But in this exercise, it offered no insights and no illness script.
One reason could be that the students and residents were using the bot more like a search engine than like a curbside consult.
To use the bot correctly, the instructors said, they should start by telling GPT-4 something like, “You are a doctor seeing a 39-year-old woman with knee pain.” Then they should list her symptoms before asking for a diagnosis and following up with questions about the bot’s reasoning, as they would with a medical colleague.
This, the instructors said, is a way to harness the power of GPT-4. But it is also crucial to recognize that chatbots can make mistakes and “hallucinate,” providing answers with no basis in fact. Using them requires knowing when the answers are wrong.
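For readers curious to experiment on their own, here is a minimal sketch of what that kind of prompt might look like through OpenAI’s Python library. The instructors worked in the chat interface rather than in code, so the library calls, model name and wording below are illustrative assumptions, not a record of the session.

```python
# A rough, hypothetical sketch of the prompting approach the instructors described,
# written against OpenAI's Python SDK (v1+). Assumes the OPENAI_API_KEY environment
# variable is set; the model name and wording are illustrative, not from the session.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # Frame the role first, as the instructors suggested ...
        {
            "role": "system",
            "content": "You are a doctor seeing a 39-year-old woman with knee pain.",
        },
        # ... then list the symptoms before asking for a diagnosis and the reasoning.
        {
            "role": "user",
            "content": (
                "Her left knee has been red, swollen and painful for several days. "
                "She had a fever of 102 degrees the day before that has since resolved, "
                "but she still has chills. What diagnoses would you consider, and what "
                "is your reasoning for each?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```

Asking about the reasoning, rather than just collecting a list of diagnoses, is what makes the exchange resemble a conversation with a colleague instead of a search query.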
“It’s not wrong to use these tools,” said Dr. Byron Crowe, an internal medicine physician at the hospital. “You just have to use them the right way.”
He gave the group an analogy.
“Pilots use GPS,” Dr. Crowe said. But, he added, the airlines “have a very high level of reliability.” In medicine, he said, using chatbots “is very tempting,” but the same high standards should apply.
“It’s a great thought partner, but it’s no substitute for deep mental expertise,” he said.
At the end of the session, the instructors revealed the real reason for the patient’s swollen knee.
It turned out to be a possibility that each group had considered and that GPT-4 had proposed.
She had Lyme disease.
Olivia Allison contributed reporting.