Substance use disorders (SUDs) show considerable clinical heterogeneity, which limits the usefulness of traditional categorical diagnoses. This observational, cross-sectional study aims to apply an unsupervised deep learning method, an autoencoder, to learn continuous latent representations from standardised psychometric data and to explore whether these representations can help stratify clinical subpopulations.

The investigators will recruit 155 adults undergoing residential treatment for SUD. Participants will complete six validated instruments assessing impulsivity (BIS-11), anger regulation (STAXI-2), behavioural activation/avoidance (BADS), borderline symptomatology (BSL-23), generalised anxiety (GAD-7), and environmental reward (EROS). Demographic and clinical variables (age, sex, primary substance, years of use, prior treatments) will also be recorded.

After data cleaning and standardisation (z-scores), a symmetric autoencoder with a 12-dimensional bottleneck (architecture 21-32-24-12-24-32-21, i.e. 21 input variables) will be trained using mean squared error (MSE) loss, with L2 weight decay and dropout as regularisation. To assess stability, the model will be trained 30 times with different random seeds; the five best models (by validation pseudo-R²) will be combined into a weighted ensemble. Five-fold cross-validation will evaluate generalisation. For comparison, principal component analysis (PCA) will be applied to the same data. Gaussian mixture models (GMMs) will be fitted on the latent space to explore potential clinical subgroups.

The primary outcome is the stability of the latent representation, measured as the coefficient of variation of the validation MSE across runs. Secondary outcomes include the reconstruction performance (pseudo-R²) of the ensemble, the comparison with PCA, and the interpretability of latent dimensions via their correlations with the original variables. GMM results will be described using the Bayesian information criterion (BIC), silhouette width, bootstrap stability, and clinical characterisation of the clusters.
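The stability and reconstruction metrics named in the summary can be illustrated with a minimal plain-Python sketch. It is not the investigators' code and rests on stated assumptions: pseudo-R² is taken to be 1 - SSE/SST with SST computed around column means, and the ensemble weights are assumed proportional to validation pseudo-R², because the summary defines neither. All function names (`zscore_columns`, `pseudo_r2`, `coefficient_of_variation`, `top_k_weights`, `ensemble_reconstruction`) are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardise each column to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def mse(x, x_hat):
    """Mean squared reconstruction error over all cells."""
    errs = [(a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh)]
    return sum(errs) / len(errs)


def pseudo_r2(x, x_hat):
    """Assumed definition: 1 - SSE / SST, with SST taken around column means."""
    means = [statistics.mean(c) for c in zip(*x)]
    sse = sum((a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh))
    sst = sum((a - m) ** 2 for r in x for a, m in zip(r, means))
    return 1.0 - sse / sst


def coefficient_of_variation(values):
    """Primary-outcome metric: SD / mean of validation MSE across seeded runs."""
    return statistics.stdev(values) / statistics.mean(values)


def top_k_weights(val_r2, k=5):
    """Keep the k runs with the highest validation pseudo-R2.

    Weighting proportional to pseudo-R2 is an assumption; the protocol does
    not state the scheme. Assumes the selected scores are positive.
    """
    best = sorted(range(len(val_r2)), key=lambda i: val_r2[i], reverse=True)[:k]
    total = sum(val_r2[i] for i in best)
    return {i: val_r2[i] / total for i in best}


def ensemble_reconstruction(recons, weights):
    """Weighted average of reconstructions; recons maps run index -> matrix."""
    first = recons[next(iter(weights))]
    out = [[0.0] * len(row) for row in first]
    for i, w in weights.items():
        for r, row in enumerate(recons[i]):
            for c, v in enumerate(row):
                out[r][c] += w * v
    return out
```

In use, one would collect the validation MSE and pseudo-R² from each of the 30 seeded runs, pass the MSE list to `coefficient_of_variation`, and pass the pseudo-R² list to `top_k_weights` to select and weight the five ensemble members.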
This study does not involve any intervention. Results will be hypothesis-generating and require external validation. No automated clinical decisions will be made.
Age range
18 Years – 60 Years
Sex
ALL
Latent dimension scores
Timeframe: Baseline (single assessment, cross-sectional)
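How latent dimension scores could be obtained from the encoder half of the registered 21-32-24-12-24-32-21 architecture is sketched below in plain Python. This is an illustration only: the registry does not specify activation functions, initialisation, or how scores are aggregated across ensemble members, so ReLU hidden layers, a linear bottleneck, and random He-style weights are assumptions, and the names `init_layer`, `dense`, `relu`, and `encode` are hypothetical. A trained model would supply real weights; dropout is inactive at inference and therefore omitted.

```python
import random

# Encoder half of the registered architecture 21-32-24-12-24-32-21.
ENCODER_SIZES = [21, 32, 24, 12]


def init_layer(n_in, n_out, rng):
    """Random weights for illustration only; a trained model would supply these."""
    scale = (2.0 / n_in) ** 0.5  # He-style scale (assumption)
    w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def dense(x, w, b):
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def encode(z_scores, layers):
    """Map one participant's 21 z-scored variables to 12 latent dimension scores.

    Assumes ReLU on hidden layers and a linear (no activation) bottleneck.
    """
    if len(z_scores) != ENCODER_SIZES[0]:
        raise ValueError(f"expected {ENCODER_SIZES[0]} inputs, got {len(z_scores)}")
    h = z_scores
    for depth, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if depth < len(layers) - 1:  # skip the activation on the bottleneck
            h = relu(h)
    return h


rng = random.Random(0)  # a seeded generator, as in each of the 30 training runs
layers = [init_layer(a, b, rng) for a, b in zip(ENCODER_SIZES, ENCODER_SIZES[1:])]
participant = [rng.gauss(0.0, 1.0) for _ in range(21)]  # synthetic, not study data
latent = encode(participant, layers)
```

The 12 values in `latent` correspond to one participant's position in the latent space; at baseline these scores would feed the correlations with the original variables and the GMM clustering described in the summary.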