This single-center, retrospective, observational study aims to construct a standardized benchmark evaluation system for intelligent breast ultrasound image interpretation and to systematically assess the diagnostic performance of current mainstream multimodal artificial intelligence (AI) models. De-identified B-mode breast ultrasound images with confirmed pathological diagnoses will be retrospectively collected from the institutional archive (2018-2025) and supplemented with images from published open-access datasets. Expert radiologists with varying experience levels will independently annotate all images according to the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) v2025 criteria, including glandular tissue composition, lesion characterization (mass vs. non-mass lesion), morphological descriptors, and final BI-RADS classification. Baseline deep learning models (CNN-based ResNet-50 and Transformer-based USFM) will be trained to establish performance baselines and to stratify cases by diagnostic difficulty through cross-architecture consensus. Multiple multimodal large language models (MLLMs), including both general-purpose and medical-domain models, will then be evaluated via standardized API calls using BI-RADS-guided chain-of-thought prompts at temperature 0 for reproducibility. Primary endpoints include BI-RADS classification accuracy and diagnostic AUC for benign-malignant differentiation. Model robustness and safety will be assessed through out-of-distribution rejection testing, temperature-stability experiments, and thinking-mode ablation studies. This study adheres to the FLAIR and TRIPOD-LLM reporting guidelines.
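The description does not say how cross-architecture consensus maps to difficulty tiers. A minimal sketch of one possible rule follows, assuming that a case both baselines classify correctly is "easy", a case only one gets right is "moderate", and a case both miss is "hard". The function name, tier names, and toy labels are hypothetical and are not taken from the study protocol.

```python
from collections import Counter


def stratify_by_consensus(truth, preds_cnn, preds_transformer):
    """Assign a difficulty tier per case from agreement of two baselines.

    Assumed rule (not from the protocol): count how many of the two
    baseline predictions match the reference label.
    """
    tier_by_correct = {2: "easy", 1: "moderate", 0: "hard"}
    tiers = []
    for y, a, b in zip(truth, preds_cnn, preds_transformer):
        n_correct = int(a == y) + int(b == y)
        tiers.append(tier_by_correct[n_correct])
    return tiers


# Toy labels for illustration only
truth = ["benign", "malignant", "malignant", "benign"]
cnn = ["benign", "malignant", "benign", "malignant"]
tfm = ["benign", "benign", "benign", "malignant"]

tiers = stratify_by_consensus(truth, cnn, tfm)
print(Counter(tiers))
```

Stratifying on agreement between two different architectures, rather than on a single model's errors, reduces the chance that a tier reflects one model's idiosyncrasies.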
Age range
18 Years – 75 Years
Sex
FEMALE
Diagnostic Accuracy for Pathological Diagnosis
Timeframe: At study completion, approximately 12 months
BI-RADS Classification Accuracy
Timeframe: At study completion, approximately 12 months
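As a rough illustration of the two outcome measures, the sketch below computes exact-match BI-RADS classification accuracy and the AUC for benign-malignant differentiation. It assumes each model yields a BI-RADS category string and a numeric malignancy score per image; how a score is derived from an MLLM response is not specified in the listing, so the scores here are hypothetical. The AUC is computed as the probability that a malignant case scores higher than a benign one, with ties counted as one half.

```python
def accuracy(truth, pred):
    """Fraction of cases where the predicted category equals the reference."""
    if not truth:
        raise ValueError("no cases to score")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; labels are 1 = malignant, 0 = benign."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    # Each malignant/benign pair contributes 1 if ranked correctly, 0.5 if tied
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only
birads_truth = ["3", "4A", "5", "2"]
birads_pred = ["3", "4B", "5", "2"]
pathology = [0, 0, 1, 1]            # 1 = malignant on pathology
scores = [0.1, 0.4, 0.35, 0.8]      # hypothetical malignancy scores

print(accuracy(birads_truth, birads_pred))  # 0.75
print(auc(pathology, scores))               # 0.75
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable for large test sets.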