INTRODUCTION: Electronic health records (EHR) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, the process of constructing algorithms for phenotyping (e.g., identifying disease characteristics) or health characteristics (e.g., smoker) is very time consuming and resource costly. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not a common practice and many published studies do not detail the clinical code lists used by the researchers in the disease/characteristic definition. To address these challenges, a number of centres across the world have developed health data portals which contain concept libraries (e.g., algorithms for defining concepts such as disease and characteristics) in order to facilitate disease phenotyping and health studies. OBJECTIVES: This study aims to review the literature of existing concept libraries, examine their utilities, identify the current gaps, and suggest future developments. METHODS: The five-stage framework of Arksey and O'Malley was used for the literature search. This approach included defining the research questions, identifying relevant studies through literature review, selecting eligible studies, charting and extracting data, and summarising and reporting the findings. RESULTS: This review identified seven publicly accessible Electronic Health data concept libraries which were developed in different countries including UK, USA, and Canada. The concept libraries (n = 7) investigated were either general libraries that hold phenotypes of multiple specialties (n = 4) or specialized libraries that manage only certain specialities such as rare diseases (n = 3). There were some clear differences between the general libraries such as archiving data from different electronic sources, and using a range of different types of coding systems. However, they share some clear similarities such as enabling users to upload their own code lists, and allowing users to use/download the publicly accessible code. In addition, there were some differences between the specialized libraries such as difference in ability to search, and if it was possible to use different searching queries such as simple or complex searches. Conversely, there were some similarities between the specialized libraries such as enabling users to upload their own concepts into the libraries and to show where they were published, which facilitates assessing the validity of the concepts. All the specialized libraries aimed to encourage the reuse of research methods such as lists of clinical code and/or metadata. CONCLUSION: The seven libraries identified have been developed independently and appear to replicate similar concepts but in different ways. Collaboration between similar libraries would greatly facilitate the use of these libraries for the user. The process of building code lists takes time and effort. Access to existing code lists increases consistency and accuracy of definitions across studies. Concept library developers should collaborate with each other to raise awareness of their existence and of their various functions, which could increase users' contributions to those libraries and promote their wide-ranging adoption.



Publication Date


Publication Title

Int J Popul Data Sci





First Page


Last Page


Embargo Period


Organisational Unit

School of Nursing and Midwifery


Data Collection, Electronic Health Records, Libraries, Publications, Research Report, concept libraries, linked Electronic health records, phenotype algorithms, review