Abstract

Background

Lesbian, gay, bisexual, transgender, queer and related community (LGBTQ+) individuals have a significantly increased risk of mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited, because LGBTQ+ status is usually recorded only in unstructured, free-text sections of electronic health records.

Aims

This study investigated whether natural language processing (NLP), specifically the large language model Bidirectional Encoder Representations from Transformers (BERT), can identify LGBTQ+ status from this unstructured text in mental health records.

Method

Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 strings containing them was extracted. Each string comprised 100 characters on either side of a search term. These strings were annotated, and a BERT model was trained to classify LGBTQ+ status.

Results

Of the 10 000 annotations, 14% (1449) confirmed LGBTQ+ status and 86% (8551) did not; the latter comprised annotations indicating LGBTQ+ negative status, irrelevant annotations and unclear cases. Tested on 2000 annotations, the final BERT model achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).

Conclusion

LGBTQ+ status can be identified from free-text mental health records using this NLP application with high precision and recall. The application opens these records to a variety of research questions involving LGBTQ+ status and should be explored further. Future work should extend it by developing an application that distinguishes between different LGBTQ+ groups, so that inequalities between these groups can be examined.
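A minimal Python sketch of the snippet-extraction step described in the Method, assuming the free-text notes are available as plain strings in memory: each case-insensitive, whole-word match of a search term yields the term plus up to 100 characters on either side, and snippets pooled across notes are then randomly sampled. The term list, the whole-word matching rule and the names `extract_snippets` and `sample_snippets` are hypothetical illustrations; the abstract does not give the study's actual search terms or extraction code.

```python
import random
import re

# Hypothetical search terms for illustration only; the study's actual
# term list is not given in the abstract.
SEARCH_TERMS = ["gay", "lesbian", "bisexual", "transgender", "queer"]

WINDOW = 100  # characters kept on each side of the matched term


def extract_snippets(text: str, terms: list[str], window: int = WINDOW) -> list[str]:
    """Return one snippet per term match: the match plus up to `window`
    characters either side, clipped at the start and end of the note."""
    # Assumption: whole-word, case-insensitive matching. \b stops a term
    # matching inside a longer word; re.escape guards regex metacharacters.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets


def sample_snippets(documents: list[str], terms: list[str], n: int, seed: int = 0) -> list[str]:
    """Pool snippets from all notes and draw a simple random sample of n
    (or all of them, if fewer than n exist)."""
    pool = [s for doc in documents for s in extract_snippets(doc, terms)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(pool, min(n, len(pool)))
```

In the study the sample size would be 10 000, e.g. `sample_snippets(notes, SEARCH_TERMS, n=10_000)`; overlapping windows from nearby matches in the same note are kept as separate snippets in this sketch, which is one of several reasonable choices.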
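The reported evaluation metrics can be sketched in outline as follows. This assumes a binary framing (1 = LGBTQ+ status confirmed, 0 = any other category) and a percentile bootstrap over the test annotations for the 95% CIs; the abstract does not state how the intervals were computed, and `prf` and `bootstrap_ci` are hypothetical names.

```python
import random


def prf(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive class.

    Assumption: binary labels, 1 = LGBTQ+ status confirmed and
    0 = any other category (negative, irrelevant or unclear).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bootstrap_ci(
    y_true: list[int],
    y_pred: list[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> list[tuple[float, float]]:
    """Percentile bootstrap CIs for (precision, recall, F1).

    Assumption: test annotations are resampled with replacement;
    the abstract does not say how its 95% CIs were obtained.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(prf([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    lo_idx = int(n_boot * alpha / 2)  # e.g. 25 of 1000 for a 95% interval
    hi_idx = n_boot - lo_idx - 1      # symmetric upper cut-off (974 of 1000)
    intervals = []
    for k in range(3):  # 0 = precision, 1 = recall, 2 = F1
        values = sorted(s[k] for s in stats)
        intervals.append((values[lo_idx], values[hi_idx]))
    return intervals
```

As a consistency check on the reported figures, F1 is the harmonic mean of precision and recall, so 2 × 0.95 × 0.93 / (0.95 + 0.93) ≈ 0.94, which agrees with the reported F1 score.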

Publication Date

2025-10-13

Publication Title

BJPsych Open

ISSN

2056-4724

Acceptance Date

2025-08-27

Deposit Date

2025-08-28

Funding

This paper represents independent research part-funded by the NIHR Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. R.S. is additionally part-funded by (a) the National Institute for Health Research (NIHR) Applied Research Collaboration South London (NIHR ARC South London) at King’s College Hospital NHS Foundation Trust; (b) UKRI – Medical Research Council (MRC) through the DATAMIND HDR UK Mental Health Data Hub (MRC reference no. MR/W014386); and (c) the UK Prevention Research Partnership (Violence, Health and Society, no. MR-VO49879/1), an initiative funded by UK Research and Innovation Councils, the Department of Health and Social Care (England) and the UK devolved administrations, and leading health research charities. A.R. is part-funded by (a) UKRI – Medical Research Council through the DATAMIND HDR UK Mental Health Data Hub (MRC reference no. MR/W014386) and RE-STAR: Regulating Emotions – Strengthening Adolescent Resilience (MRC reference no. MR/W002493/1); and (b) the UK Prevention Research Partnership (Violence, Health and Society, no. MR-VO49879/1), an initiative funded by UK Research and Innovation Councils, the Department of Health and Social Care (England) and the UK devolved administrations, and leading health research charities. The views expressed are those of the author(s) and are not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. M.H. reports funding from NIHR and Maudsley Charity.
