In This Round, Humans 1, AI LLMs 0

A study by AI firm Kahun found that human medical professionals outperformed OpenAI's GPT-4 and Anthropic's Claude 3 Opus, both large language models (LLMs), in answering medical questions. When tested on 105,000 evidence-based medical questions drawn from Kahun's Knowledge Graph, each LLM answered roughly a third of the questions incorrectly. The human participants, two doctors and four medical students, achieved 82.3% accuracy, compared with 64.3% for Claude 3 Opus and 55.8% for GPT-4. The study concluded that although the LLMs performed better on semantic questions, they are not yet reliable enough for clinical use.
Source: medcitynews.com