Recruiting | Not Applicable

Diagnostic Accuracy of GPT-4o and Claude 4.6 Sonnet in Turkish ED Anamnesis Notes

Turkey (Türkiye) | 600 participants | Started 2026-06

Plain-language summary

This retrospective diagnostic accuracy study evaluates how well two large language models (LLMs), GPT-4o (gpt-4o-2024-11-20; OpenAI) and Claude 4.6 Sonnet (claude-sonnet-4-6; Anthropic), generate correct diagnoses from anonymized Turkish-language emergency department (ED) anamnesis notes, and compares their performance with the diagnosis entered by the treating emergency physician. The reference standard is a consensus of three independent, board-certified emergency medicine specialists, each of whom reviews every note under blinded conditions and votes on the primary diagnosis using ICD-10 three-character codes; the majority vote (at least 2 of 3 specialists agreeing) is taken as the gold standard. Both LLMs are evaluated with a standardized zero-shot direct prompting strategy (temperature=0, stateless API sessions).

The primary outcomes are diagnostic accuracy (the proportion of ICD-10 chapter-level matches) and Cohen's kappa for each LLM against the gold standard. Secondary outcomes include top-3 accuracy, treating physician accuracy, inter-model agreement, and subgroup analyses by ESI triage level and ICD-10 chapter. Inter-rater reliability among the three specialists is quantified using Fleiss' kappa. Analyses are performed in Jamovi. According to the investigators, this is the first evaluation of LLM diagnostic accuracy using Turkish-language clinical notes and the first to benchmark LLM performance against an independent three-specialist majority-vote gold standard rather than against the treating physician's own diagnosis.
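The majority-vote reference standard and the Cohen's kappa comparison described in the summary can be illustrated with a small sketch. This is not the investigators' analysis pipeline (the listing states that analyses are run in Jamovi); the function names, the example codes, and the choice to drop cases without a 2-of-3 majority are all assumptions made for illustration, since the listing does not say how such cases are handled.

```python
from collections import Counter


def majority_vote(votes):
    """Return the code chosen by at least 2 of the 3 raters, else None."""
    code, count = Counter(votes).most_common(1)[0]
    return code if count >= 2 else None


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(rater_a)
    # Observed agreement: share of cases where both labels are identical.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the marginal frequencies per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical specialist votes (ICD-10 three-character codes), one tuple per note.
specialist_votes = [
    ("I21", "I21", "I20"),
    ("J18", "J18", "J18"),
    ("K35", "N20", "R10"),  # no majority: excluded here (an assumption)
    ("N39", "N39", "R30"),
]
reference = [majority_vote(v) for v in specialist_votes]

# Hypothetical model outputs for the same notes.
model_output = ["I21", "J18", "K35", "R30"]

# Keep only the notes for which a reference diagnosis exists.
pairs = [(r, m) for r, m in zip(reference, model_output) if r is not None]
kappa = cohen_kappa([r for r, _ in pairs], [m for _, m in pairs])
print(reference, round(kappa, 3))
```

The same `cohen_kappa` function works on chapter-level labels if the codes are first mapped to their ICD-10 chapters.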

Who can participate

Age range: 18 years and older

Sex: All

INCLUSION CRITERIA:
* Adult patients (aged 18 years and older) presenting to the emergency department.
* Complete electronic health record available in the hospital information system (HBYS) containing a detailed anamnesis note with chief complaint, symptom duration, associated symptoms, and relevant medical history.
* A definitive primary diagnosis recorded by the treating emergency physician using ICD-10 codes at the time of patient file closure.

EXCLUSION CRITERIA:
* Emergency department anamnesis notes containing fewer than 50 words or completely lacking substantive clinical content.
* Pediatric cases (age under 18 years).
* Patients critically ill and triaged to high-acuity resuscitation areas (Emergency Severity Index [ESI] level 1).
* Clinical notes containing residual identifying information that cannot be fully de-identified, preventing compliance with data privacy regulations.
* Non-independent clinical notes consisting solely of a brief cross-reference to a prior hospital visit without a new history entry.
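As a rough illustration of how the machine-checkable parts of these criteria might be applied to extracted records, the sketch below screens a single note. The `EDNote` record and its field names are hypothetical and not taken from the HBYS system, and counting words by splitting on whitespace is an assumption. Judgments such as whether a note lacks substantive clinical content would still need human review.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_WORDS = 50  # notes with fewer words are excluded


@dataclass
class EDNote:
    """Hypothetical record layout for one extracted ED visit."""
    age_years: int
    esi_level: int                    # Emergency Severity Index, 1 to 5
    note_text: str                    # anamnesis note
    physician_icd10: Optional[str]    # treating physician's primary diagnosis
    fully_deidentified: bool
    is_cross_reference_only: bool


def exclusion_reasons(note: EDNote) -> List[str]:
    """Return the criteria a note fails; an empty list means it passes."""
    reasons = []
    if note.age_years < 18:
        reasons.append("pediatric case")
    if note.esi_level == 1:
        reasons.append("ESI level 1")
    if len(note.note_text.split()) < MIN_WORDS:
        reasons.append("note shorter than 50 words")
    if not note.physician_icd10:
        reasons.append("no physician ICD-10 diagnosis")
    if not note.fully_deidentified:
        reasons.append("residual identifying information")
    if note.is_cross_reference_only:
        reasons.append("cross-reference only")
    return reasons


# Placeholder text stands in for a real note; only its length matters here.
eligible = EDNote(45, 3, "word " * 60, "J18", True, False)
excluded = EDNote(16, 1, "see previous visit", None, True, True)
print(exclusion_reasons(eligible))
print(exclusion_reasons(excluded))
```

Returning the list of failed criteria, rather than a single boolean, makes it straightforward to report how many notes were excluded for each reason.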

Questions worth asking your doctor

Bring these to your next appointment. They're a starting point for a shared conversation — not a sign you qualify or a recommendation to enrol.

1. Based on my diagnosis and history, is this trial worth exploring for me, or is there a standard treatment we should try first?
2. What does this trial's phase tell us about how much is already known about its safety and benefit?
3. What would taking part actually involve for me: visits, tests, time, and travel?
4. What are the known and possible risks or side effects I should weigh, and how would they be monitored?
5. If this trial isn't the right fit, what other options or trials would you suggest I look into?

Generated to help you prepare — always confirm anything about your own eligibility and care with the study team and your doctor.

Questions for the trial coordinator

The trial coordinator is the person who runs the study day to day. These cover the practical side — logistics, costs, and what taking part would actually mean for your life. The study team confirms whether you meet the criteria; these are questions to ask, not a sign you qualify.

1. What does taking part actually involve week to week: how many visits, where, and how long does each one take?
2. What costs are covered by the study, and what might I have to pay for myself, including travel, parking, or time off work?
3. What happens during screening, and what happens if the study team confirms I don't meet the criteria after those tests?
4. Who pays for the scans, blood work, and other tests the trial requires: the study, my insurance, or me?
5. How will being in the trial affect my regular care, and will my own doctor stay informed and involved?
6. Can I leave the trial at any point if I change my mind, and what would happen to my care if I do?

A starting point for the conversation — always confirm anything about your own eligibility, costs, and care with the study team and your doctor.

What they're measuring

Diagnostic Accuracy of GPT-4o for ICD-10 Chapter-Level Diagnosis

Timeframe: At the time of single-session algorithmic evaluation (each case evaluated once following data extraction in June 2026).

Diagnostic Accuracy of Claude 4.6 Sonnet for ICD-10 Chapter-Level Diagnosis

Timeframe: At the time of single-session algorithmic evaluation (each case evaluated once following data extraction in June 2026).
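Both primary measures are scored at the ICD-10 chapter level, so a predicted code counts as correct when it falls in the same chapter as the reference code. The sketch below shows one way this could be computed. It assumes the WHO ICD-10 chapter ranges, and the function names and example codes are illustrative. Scoring top-3 accuracy at chapter level is also an assumption, since the listing does not define how top-3 matches are judged.

```python
import re

# WHO ICD-10 chapter ranges for three-character codes (assumed edition).
ICD10_CHAPTERS = [
    ("A00", "B99", "I"), ("C00", "D48", "II"), ("D50", "D89", "III"),
    ("E00", "E90", "IV"), ("F00", "F99", "V"), ("G00", "G99", "VI"),
    ("H00", "H59", "VII"), ("H60", "H95", "VIII"), ("I00", "I99", "IX"),
    ("J00", "J99", "X"), ("K00", "K93", "XI"), ("L00", "L99", "XII"),
    ("M00", "M99", "XIII"), ("N00", "N99", "XIV"), ("O00", "O99", "XV"),
    ("P00", "P96", "XVI"), ("Q00", "Q99", "XVII"), ("R00", "R99", "XVIII"),
    ("S00", "T98", "XIX"), ("V01", "Y98", "XX"), ("Z00", "Z99", "XXI"),
    ("U00", "U85", "XXII"),
]


def icd10_chapter(code):
    """Map an ICD-10 code to its chapter numeral, or None if unrecognized."""
    key = code.strip().upper()[:3]
    if not re.fullmatch(r"[A-Z]\d{2}", key):
        return None
    # Three-character codes sort correctly as plain strings.
    for start, end, chapter in ICD10_CHAPTERS:
        if start <= key <= end:
            return chapter
    return None


def same_chapter(predicted, reference):
    chapter = icd10_chapter(reference)
    return chapter is not None and icd10_chapter(predicted) == chapter


def chapter_accuracy(predicted, reference):
    """Proportion of cases whose predicted code shares the reference chapter."""
    hits = sum(same_chapter(p, r) for p, r in zip(predicted, reference))
    return hits / len(reference)


def top3_chapter_accuracy(ranked_predictions, reference):
    """Proportion of cases where any of the first three predictions matches."""
    hits = sum(
        any(same_chapter(p, r) for p in ranked[:3])
        for ranked, r in zip(ranked_predictions, reference)
    )
    return hits / len(reference)


# Hypothetical reference and model codes for four notes.
reference_codes = ["I21", "J18", "N39", "S72"]
model_codes = ["I20", "J45", "R30", "T14"]
accuracy = chapter_accuracy(model_codes, reference_codes)
print(accuracy)
```

In this example three of the four predictions share a chapter with the reference code, so the chapter-level accuracy is higher than an exact-code comparison would give (zero exact matches).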

Trial details

NCT ID: NCT07632859

Sponsor: Marmara University Pendik Training and Research Hospital

Sponsor type: Other

Study type: Observational

Primary completion: 2026-07

Contact for this trial

Emir Ünal, Assistant Professor

+905327766010 emirunal@gmail.com
