Abstract

Background: Electronic health care databases are widely used for epidemiological studies. However, they may contain inactive records of individuals who no longer participate in the health care system. These inactive records create a methodological challenge because they systematically appear as unexposed with no recorded outcomes. Given the widespread engagement with the health care system during the COVID-19 pandemic, the English National Health Service (NHS), which hosts a national pandemic planning and research dataset linked to COVID-19 vaccination and emergency care data, is an ideal setting in which to quantify the overrepresentation caused by inactive health care records and to assess ways to mitigate it.

Objective: This study aimed to report any differences between the size of the general practitioner–registered adult population based on health care records and census estimates for England, and to apply a methodology that could be used to correct for such differences.

Methods: We compared the number of adult patients within the General Practice Extraction Service Data for Pandemic Planning and Research (GDPPR) who had a valid general practitioner registration as of 1 October 2021 with estimates published by the Office for National Statistics (ONS) for the English population. We used an approach adapted from a weighting method that corrects for non-response bias in surveys, and we down-weighted individuals with no evidence of recent activity in their records.

Results: There were 61,194,033 registered NHS patients in the GDPPR compared with 56,550,138 in the ONS census-based population. De-duplication on NHS number reduced the population to 57,876,641, including 46,835,968 adults; the most overrepresented group was aged 30-45 years. Of the 46,835,986 adults, 1,121,954 (2.4%) had their initial weights reduced because they had not engaged with the health care system since January 2019. The down-weighting removed most of the differences between the NHS and ONS populations.

Conclusions: The adult population size in the GDPPR differs notably from census estimates. Although the overall GDPPR population was inflated relative to ONS census estimates, the degree of inflation varied by sociodemographic characteristics. A weighting-based approach can be applied to correct for the inflated denominator. Failing to correct for it in large health care datasets, including the English NHS data, could introduce selection bias in epidemiological studies.
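The down-weighting idea described in the Methods can be illustrated with a minimal sketch. It is not the study's procedure, which the abstract does not specify. It assumes a hypothetical scheme in which records with recent activity keep a weight of 1 and inactive records within each stratum share whatever population remains up to the census total, capped between 0 and 1. The function name `downweight_inactive`, the field names `stratum` and `active`, and the toy counts are all illustrative assumptions.

```python
from collections import defaultdict


def downweight_inactive(records, census_totals):
    """Return one weight per record (hypothetical scheme, not the study's).

    records: list of dicts with keys 'stratum' and 'active' (bool).
    census_totals: dict mapping stratum -> census population count.
    Active records keep weight 1.0; inactive records in a stratum share
    the gap between the census total and the active count.
    """
    n_active = defaultdict(int)
    n_inactive = defaultdict(int)
    for rec in records:
        if rec["active"]:
            n_active[rec["stratum"]] += 1
        else:
            n_inactive[rec["stratum"]] += 1

    inactive_weight = {}
    for stratum, n_in in n_inactive.items():
        shortfall = census_totals[stratum] - n_active[stratum]
        # Clamp to [0, 1]: never up-weight, never assign a negative weight.
        inactive_weight[stratum] = min(1.0, max(0.0, shortfall / n_in))

    return [
        1.0 if rec["active"] else inactive_weight[rec["stratum"]]
        for rec in records
    ]


# Toy data: 12 registered records in one stratum, census count of 10.
records = (
    [{"stratum": "30-45", "active": True}] * 8
    + [{"stratum": "30-45", "active": False}] * 4
)
weights = downweight_inactive(records, {"30-45": 10})
print(sum(weights))  # weighted denominator now equals the census count
```

In this toy stratum the four inactive records each receive a weight of 0.5, so the weighted denominator falls from 12 to 10. A real analysis would need to define "recent activity" and the strata explicitly, as the study does with its own criteria.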

Publication Date

2025-10-27

Publication Title

JMIR Public Health and Surveillance

Volume

11

Acceptance Date

2025-07-31

Deposit Date

2025-10-28

Keywords

Adolescent; Adult; Aged; COVID-19/epidemiology; Censuses; Cohort Studies; Databases, Factual/statistics & numerical data; England/epidemiology; Female; Humans; Male; Middle Aged; State Medicine/statistics & numerical data; Young Adult

First Page

64788

Last Page

64788
