Not Yet Recruiting · Not Applicable

Large Language Models Versus Human Examiners for Grading Physiotherapy Clinical Cases

Spain · 65 participants · Started 2026-08-01

Plain-language summary

This study evaluates whether large language models (LLMs) can reliably assess written clinical-reasoning case examinations completed by undergraduate physiotherapy students, compared with faculty assessment. In the course "Specific Methods in Physiotherapy" (third year of the Physiotherapy Degree), students solve complex clinical cases that require clinical reasoning, technical knowledge, and therapeutic decision-making. These cases are traditionally graded by faculty, a time-consuming process that may show inter-rater variability.

A set of de-identified student case examinations will be assessed using the rubric currently applied in the course, which covers clarity and structure of clinical reasoning, integration of the biopsychosocial model (ICF and APTA frameworks), accuracy in identifying pain mechanisms, coherence between diagnosis, hypotheses, and treatment, originality and depth of analysis, and professional writing. Each examination will be scored independently by three LLMs (for example, Claude, ChatGPT, and Gemini), each receiving an identical standardized prompt that embeds the same rubric, and by faculty serving as the reference standard. To avoid overloading faculty, full double human grading may not be feasible; the human reference will therefore consist of expert faculty grading by one independent rater or, when resources allow, two independent raters. In contrast, paired assessment is fully implemented across the AI models: each examination is scored by several LLMs, and each model is queried in duplicate, allowing the study to estimate agreement between models and the test-retest stability of each model.

The primary aim is to quantify agreement between LLM-generated scores and the faculty reference score. Secondary aims include agreement among the LLMs, test-retest reliability of each model, criterion-level agreement, the quality and usefulness of the qualitative feedback generated, the time and cost associated with each approach, and students' perceptions of the usefulness of human versus AI feedback.

The findings will clarify the strengths and limitations of LLMs as supportive tools for formative assessment in health-professions education and will inform criteria for their responsible and effective use. No LLM output will affect students' official grades, which remain the sole responsibility of faculty.

Who can participate

Age range

18 Years

Sex

ALL

Inclusion Criteria:

* Students officially enrolled in the course "Specific Methods in Physiotherapy" (third year of the Physiotherapy Degree) during the study period.
* Submission of a completed written clinical-reasoning case examination as part of the course.
* Provision of informed consent for the anonymized examination to be used for educational-research purposes.

Exclusion Criteria:

* Refusal to provide, or withdrawal of, informed consent.
* Blank, incomplete, or non-evaluable examinations (e.g., no developed written response).
* Examinations that cannot be reliably de-identified prior to assessment.

Questions worth asking your doctor

Bring these to your next appointment. They're a starting point for a shared conversation — not a sign you qualify or a recommendation to enrol.

1. Based on my diagnosis and history, is this trial worth exploring for me, or is there a standard treatment we should try first?
2. What does this trial's phase tell us about how much is already known about its safety and benefit?
3. What would taking part actually involve for me: visits, tests, time, and travel?
4. What are the known and possible risks or side effects I should weigh, and how would they be monitored?
5. If this trial isn't the right fit, what other options or trials would you suggest I look into?

Generated to help you prepare — always confirm anything about your own eligibility and care with the study team and your doctor.

Questions for the trial coordinator

The trial coordinator is the person who runs the study day to day. These cover the practical side — logistics, costs, and what taking part would actually mean for your life. The study team confirms whether you meet the criteria; these are questions to ask, not a sign you qualify.

1. What does taking part actually involve week to week: how many visits, where, and how long does each one take?
2. What costs are covered by the study, and what might I have to pay for myself, including travel, parking, or time off work?
3. What happens during screening, and what happens if the study team confirms I don't meet the criteria after those tests?
4. Who pays for the scans, blood work, and other tests the trial requires: the study, my insurance, or me?
5. How will being in the trial affect my regular care, and will my own doctor stay informed and involved?
6. Can I leave the trial at any point if I change my mind, and what would happen to my care if I do?

A starting point for the conversation — always confirm anything about your own eligibility, costs, and care with the study team and your doctor.

What they're measuring

Agreement between LLM global scores and the faculty reference global score

Timeframe: Single cross-sectional assessment during the data-collection period (approximately 2 months)
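
The listing does not name the statistic used to quantify this agreement. As one possible illustration, the sketch below computes Lin's concordance correlation coefficient (CCC) between paired global scores. Choosing the CCC is an assumption, as are the function name `concordance_ccc` and the example scores, which are invented and are not study data.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Uses population (1/n) variances and covariance. The result lies in
    [-1, 1]; a value of 1 means the paired scores are identical.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")

    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both raters give one constant score")
    return 2 * cov_xy / denominator


# Invented example: the LLM scores each exam exactly one point above faculty.
faculty_scores = [1.0, 2.0, 3.0]
llm_scores = [2.0, 3.0, 4.0]
ccc_shifted = concordance_ccc(faculty_scores, llm_scores)
```

Unlike a plain correlation, the CCC penalizes a systematic offset between raters: in the invented example the two score lists are perfectly correlated, yet the constant one-point shift pulls the coefficient well below 1.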

Trial details

NCT ID: NCT07677202

Sponsor: Neuron, Spain

Sponsor type: OTHER

Study type: OBSERVATIONAL

Primary completion: 2026-08-10

Contact for this trial

Alfredo Lerín Calvo, MSc

+34620187457 alfredo.lerin@lasallecampus.es
