Completed | Not Applicable

The Diagnostic and Triage Capacity of Laypeople-Large Language Model Collaboration in China

China | 6,360 participants | Started 2025-04-27

Plain-language summary

The goal of this randomized controlled trial is to evaluate the role of large language models in enhancing laypeople's ability to self-diagnose and triage common diseases. The main questions it aims to answer are:

* Does using an LLM help participants make more accurate self-diagnoses and care decisions for common illnesses, compared to their first guess without any help?
* How much better is it when people work together with an LLM, compared to using a regular search engine, using the LLM alone, or how doctors would decide?

Researchers will compare participants who were randomly assigned to either the LLM group (using DeepSeek) or the search engine group to see if the LLM-assisted approach leads to better clinical judgments.

Participants will:

* Read one of 48 short, realistic health vignettes;
* Make an initial guess about what might be wrong by listing up to three possible causes, ranked from most to least likely, and choose a care level: seek immediate care, see a doctor within one day, see a doctor within one week, or manage at home without medical care;
* Use their assigned tool (either DeepSeek or a standard search engine) to look up information and update their guess and care decision;
* Submit their final diagnosis and care choice after using the tool.

In addition, the study team evaluated the performance of four other AI models (GPT-4o, GPT-o1, DeepSeek-v3, and DeepSeek-r1) and 33 experienced general physicians on the same vignettes.

Who can participate

Age range: 18 years and older

Sex: All

Inclusion Criteria:

* Age 18 years or older
* Current resident of mainland China
* History of high-quality participation in online surveys on the Credamo platform (historical survey acceptance rate ≥ 80% and personal credit score ≥ 70)

Exclusion Criteria:

* Incomplete survey responses
* Failure on embedded quality-check items
* Implausibly short completion time (<180 seconds for the search engine group; <360 seconds for the LLM group)
* Provision of non-diagnostic or irrelevant responses (e.g., "unknown", "don't know")
* Consistent pattern of identical responses across all items
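The exclusion rules above can be read as a filter applied to each submitted survey. The following is a minimal sketch of that reading, not the study team's actual pipeline: the `Response` record, its field names, and the `exclusion_reasons` function are hypothetical, and treating a response as non-diagnostic when any listed diagnosis matches one of the two quoted examples is an assumption. Only the time thresholds and the two example phrases come from the criteria.

```python
from dataclasses import dataclass

# Minimum plausible completion time in seconds, per the exclusion criteria.
MIN_SECONDS = {"search_engine": 180, "llm": 360}

# The two examples quoted in the criteria; any fuller list is an assumption.
NON_DIAGNOSTIC = {"unknown", "don't know"}


@dataclass
class Response:
    """Hypothetical layout for one participant's survey submission."""
    group: str                    # "search_engine" or "llm"
    seconds: float                # total completion time
    complete: bool                # all required items answered
    passed_quality_checks: bool   # embedded quality-check items passed
    diagnoses: list               # free-text diagnoses entered by the participant
    item_answers: list            # answers to the closed-ended items


def exclusion_reasons(r):
    """Return the exclusion criteria a response triggers (empty list = keep)."""
    reasons = []
    if not r.complete:
        reasons.append("incomplete")
    if not r.passed_quality_checks:
        reasons.append("failed quality check")
    if r.seconds < MIN_SECONDS[r.group]:
        reasons.append("too fast")
    # Assumption: one non-diagnostic entry is enough to flag the response.
    if any(d.strip().lower() in NON_DIAGNOSTIC for d in r.diagnoses):
        reasons.append("non-diagnostic response")
    # Assumption: "identical responses" means every item has the same answer.
    if len(r.item_answers) > 1 and len(set(r.item_answers)) == 1:
        reasons.append("identical responses")
    return reasons
```

Returning the list of triggered criteria, rather than a single yes/no flag, makes it possible to report how many responses were dropped under each rule.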

What they're measuring

Top-3 Diagnostic Accuracy

Timeframe: Immediately after intervention (within the same survey session)

Triage Accuracy (4-class exact match)

Timeframe: Immediately after intervention (within the same survey session)
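As an illustration of the two outcome measures, the sketch below scores a single answer for top-3 diagnostic accuracy and for 4-class exact-match triage accuracy, then averages the hits. It assumes a diagnosis counts as correct when it matches the reference after lowercasing and whitespace normalization; the listing does not say how the study matched free-text diagnoses, so this rule is only a stand-in. All function names and the label strings in `TRIAGE_LEVELS` are hypothetical.

```python
# The four care levels participants choose from (label wording is illustrative).
TRIAGE_LEVELS = (
    "seek immediate care",
    "see a doctor within one day",
    "see a doctor within one week",
    "manage at home",
)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def top3_hit(ranked_guesses, reference):
    """True if the reference diagnosis is among the first three ranked guesses."""
    return normalize(reference) in [normalize(g) for g in ranked_guesses[:3]]


def triage_hit(chosen, reference):
    """True only if the chosen care level exactly equals the reference level."""
    if chosen not in TRIAGE_LEVELS or reference not in TRIAGE_LEVELS:
        raise ValueError("unknown triage level")
    return chosen == reference


def accuracy(hits):
    """Share of correct answers; returns 0.0 for an empty input."""
    hits = list(hits)
    return sum(hits) / len(hits) if hits else 0.0
```

Under exact match, choosing "see a doctor within one day" when the reference is "seek immediate care" scores the same as choosing "manage at home": both count as misses, with no partial credit for being one level off.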

Trial details

NCT ID: NCT07250516

Sponsor: Huazhong University of Science and Technology

Sponsor type: OTHER

Study type: INTERVENTIONAL

Primary completion: 2025-07-01