Methods of Usability Testing in the Development of eHealth Applications: A Scoping Review

Background: The number of eHealth applications has increased exponentially in recent years, with over 325,000 health apps now available on all major app stores. This is in addition to eHealth applications available on other platforms, such as PC software, websites and even gaming consoles. As with other digital applications, usability is one of the key factors in the successful implementation of eHealth apps. Reviews of the literature on empirical methods of usability testing in eHealth were last published in 2015. In the context of an exponentially increasing rate of app development year on year, an updated review is warranted.
Objective: To identify, explore, and summarize the current methods used in the usability testing of eHealth applications.
Methods: A scoping review was conducted on literature available from April 2014 up to October 2017. Four databases were searched. Literature was considered for inclusion if it (1) focused on an eHealth application (which includes websites, PC software, smartphone and tablet applications), (2) provided information about usability of the application, (3) provided empirical results of the usability testing, and (4) was a full or short paper (not an abstract) published in English after March 2014. We then extracted data pertaining to the usability evaluation processes described in the selected studies.
Results: 133 articles met the inclusion criteria. The methods used for usability testing, in decreasing order of frequency, were: questionnaires (n=105), task completion (n=57), ‘Think-Aloud’ (n=45), interviews (n=37), heuristic testing (n=18) and focus groups (n=13). The majority of the studies used one (n=45) or two (n=46) methods of testing. The rest used a combination of three (n=30) or four (n=12) methods of testing usability. None of the studies used automated mechanisms to test usability. The System Usability Scale (SUS) was the most frequently used questionnaire (n=44). The ten most frequent health conditions or diseases for which eHealth apps were being evaluated for usability were: mental health (n=12), cancer (n=10), nutrition (n=10), child health (n=9), diabetes (n=9), telemedicine (n=8), cardiovascular disease (n=6), HIV (n=4), health information systems (n=4) and smoking (n=4). Further iterations of the app were reported in a minority of the studies (n=41). The use of the ‘Think-Aloud’ method (Pearson Chi-squared test: χ2=11.15, p<0.05) and the heuristic walkthrough (Pearson Chi-squared test: χ2=4.48, p<0.05) were significantly associated with at least one further iteration of the app being developed.
Conclusion: Although there has been an exponential increase in the number of eHealth apps, the number of published studies that report the results of usability testing on these apps has not increased at an equivalent rate. The digital health applications that publish their usability evaluation results remain only a small fraction of the total. Questionnaires are the most prevalent method of evaluating usability in eHealth applications; they provide an overall measure of usability but do not pinpoint the problems that need to be addressed. Qualitative methods may be more useful in this regard. The use of multiple evaluation methods has increased. Automated methods such as eye tracking have not gained traction in evaluating health apps. Further research is needed into which methods are best suited to the different types of eHealth applications, according to their target users and the health conditions being addressed.


Introduction
eHealth is emerging as a key sector for delivering health care in the UK. There are government calls to enable this, and funding is being made available for national and regional programmes to expand the use of eHealth. This is outlined in the National Health Service (NHS) Five Year Forward View, which aims to put together "An expanding set of NHS accredited health apps that patients will be able to use to organise and manage their own health and care" [1]. In the recently released NHS Long Term Plan, one of the stated aims is for digitally enabled care to go mainstream across the NHS. This includes working with the wider NHS, the voluntary sector, developers, and individuals in creating a range of apps to support particular conditions [2]. There has also been simultaneous phenomenal growth in the eHealth application market. A recent report stated that there were over 3.7 billion downloads of mobile health applications in 2017, an increase of 16% from the year before. There were 325,000 health apps (health & fitness and medical apps) available on all major app stores, with 78,000 new health apps added to major app stores in 2017 alone [3]. However, fitting digital solutions onto health problems is not an easy task. Attempts to scale up digital health implementations from pilots and demonstrators have proven to be difficult or, in some cases, unsuccessful [4][5][6].
According to a report published by the Institute of Medicine, "usability and health literacy strategies should guide the development of mHealth apps" [7]. Usability has been identified as a key component of good practice in the development of digital applications [8], and a number of published standards have identified usability as an essential criterion for the assessment of digital applications in health, such as the NHS Digital Assessment Questionnaire [9], the guidance from the Medicines and Healthcare products Regulatory Agency [10], the Organisation for the Review of Care and Health Applications [11], and Our Mobile Health [12]. The International Organization for Standardization has defined usability as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [13]. Usability becomes a vital factor in the adoption of digital health applications, as the people who need to use them may have problems when using mobile devices due to their health conditions [8]. There is a need to ensure that health technologies are appropriately designed and targeted to the end-users' needs before they are used as health interventions [14]. This can be achieved by applying robust methods of evaluation to ensure good usability. Conducting usability evaluation on eHealth applications will have enormous value for patients, as better usability can lead to a number of benefits, including improved productivity, enhanced user well-being, avoidance of stress, increased accessibility and reduced risk of harm, as stated in the ISO standard on the ergonomics of human-system interaction (ISO 9241-210) [15]. Another benefit would be greater acceptance, as clinicians' acceptance of and attitudes towards electronic health record (EHR) systems have been shown to relate closely to system usability [16][17][18].
In 2014, Zapata et al reviewed empirical usability methods for mobile applications in health, analysing 22 studies [8]. They identified several areas for further research, including: (a) a combination of two or more different types of usability methods, (b) automation of usability evaluation methods, (c) adoption of iterative usability evaluation processes and (d) validation of the reliability of the evaluation methods employed. At the time of that review, the number of medical applications in app stores was estimated at 28,000 (20,000 iOS and 8,000 Android). Since then, the number of available health apps has increased more than tenfold. It is very likely that the health conditions they address, the publication channels for usability studies, and perhaps the types of usability evaluation methods employed have changed or broadened. Thus, it is time to re-investigate how usability testing methods for eHealth applications are described in the literature published since April 2014.
The aim of this study was to identify, explore, and summarise the current state of the literature on usability testing of eHealth applications since 2014 through a scoping review. We chose to do a scoping review because our aim was to map the literature or evidence rather than to answer a specific question by looking only for the best available information, as defined in the Joanna Briggs Institute reviewers' manual 2015: Methodology for JBI scoping reviews [19]. This is similar to the mapping studies in software engineering described by Kitchenham [20]. We used the following research questions to guide our review: (1) What is the current state of the literature that addresses usability testing for developing eHealth applications? (2) What are the usability testing methods that are being used in the development of eHealth applications? (3) What health conditions / diseases are being addressed by the apps that employ usability testing? (4) What types of people are being recruited to be the participants in the usability tests? (5) How has the number of published studies regarding usability testing of eHealth applications changed over time? (6) What are the types of journals where usability evaluations of eHealth applications are reported? And (7) How many of the published studies employed an iterative development method?
The inclusion of a usability section in the NHS Digital Assessment Questionnaire for apps seeking to be included in the NHS apps library indicates that usability evaluation is a crucial part of the acceptance of eHealth apps into the healthcare system. Thus, knowledge about the proper use of the methods of usability testing will be useful for developers, commissioners, healthcare professionals, patient participation groups and other researchers. This review will provide an overview of the state of usability evaluation in eHealth as reported in the literature. This can then provide a guide for developers, as well as inform the other eHealth stakeholders about the methods used for usability evaluation. Our main aim is to investigate what academia has contributed to the employment of usability evaluation methods in the development of eHealth. We are conscious that a lot of eHealth app development is done outside of academia, but in the context of increasing policy standards, governance bodies are looking for evidence-based standards, with peer-reviewed evidence being the gold standard [21]. This review of the peer-reviewed literature will provide a baseline for that process.

Materials and methods
We selected a systematic scoping review as the method, keeping in mind that our aim was to map the literature on usability testing in eHealth since 2014.

Information sources
We examined a variety of information sources, searching four electronic databases from medicine, nursing, allied health, computer and engineering sciences: The Association for Computing Machinery Digital Library (ACM DL), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library, and Medline / PubMed.

Search strategy
The search strategy was developed by one of the authors (IM). The general search terms were eHealth, mHealth and usability. Searches were conducted between June 2017 and November 2017 for articles published up to 31 October 2017.
For Medline / PubMed, the following search string was used:
(("telemedicine"[MeSH Terms] OR "telemedicine"[All Fields] OR "ehealth"[All Fields]) AND usability[All Fields]) OR (("telemedicine"[MeSH Terms] OR "telemedicine"[All Fields] OR "mhealth"[All Fields]) AND usability[All Fields])
For all other databases, this search string was employed:
(ehealth OR mhealth or telemedicine) AND (usability)

Inclusion and exclusion criteria
Any literature about eHealth applications using empirical methods of usability testing published between April 2014 and October 2017 was considered. Literature was considered for inclusion if it met the following criteria:
IC1. The paper is focused on an eHealth application, which includes websites, PC software, smartphone and tablet applications.
IC2. The paper provides information about usability of the application.
IC3. The paper provides empirical results.
IC4. The paper evaluates an application for final users, not just a wireframe or low fidelity prototype.
IC5. The paper must be a full or short paper (not an abstract).
Studies were excluded if:
EC1. The paper is not written in English.
EC2. The paper was published after October 2017 or before April 2014.
EC3. The paper evaluates a medical device, or smart device, a service, a smartphone or tablet feature (not an app), or software that is not an eHealth application.
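The criteria above can be read as a checklist applied to each retrieved record. The following is a minimal sketch of such a screening step, not the procedure the reviewers actually used: the `Record` class, its field names and the `screen` function are hypothetical, and the judgement calls behind each boolean (for example, whether a paper reports empirical results) would still be made by a human reviewer.

```python
from dataclasses import dataclass
from datetime import date

# Publication window from the criteria: April 2014 to October 2017 inclusive.
WINDOW_START = date(2014, 4, 1)
WINDOW_END = date(2017, 10, 31)


@dataclass
class Record:
    """One retrieved record. All field names are hypothetical."""
    title: str
    language: str
    published: date
    is_ehealth_application: bool   # IC1, and not excluded by EC3
    reports_usability: bool        # IC2
    has_empirical_results: bool    # IC3
    evaluates_end_user_app: bool   # IC4
    is_full_or_short_paper: bool   # IC5


def screen(record: Record) -> list[str]:
    """Return the labels of failed criteria; an empty list means 'include'."""
    failed = []
    if not record.is_ehealth_application:
        failed.append("IC1/EC3")
    if not record.reports_usability:
        failed.append("IC2")
    if not record.has_empirical_results:
        failed.append("IC3")
    if not record.evaluates_end_user_app:
        failed.append("IC4")
    if not record.is_full_or_short_paper:
        failed.append("IC5")
    if record.language.lower() != "english":
        failed.append("EC1")
    if not (WINDOW_START <= record.published <= WINDOW_END):
        failed.append("EC2")
    return failed


# Hypothetical records, for illustration only.
in_window = Record("Example app study", "English", date(2016, 5, 1),
                   True, True, True, True, True)
too_early = Record("Older study", "English", date(2013, 12, 1),
                   True, True, True, True, True)
print(screen(in_window))  # no failed criteria
print(screen(too_early))  # fails the publication window
```

Returning the list of failed criteria, rather than a bare yes or no, keeps a record of why each paper was excluded.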
Patients / Carers were participants in the usability testing of 99 of the studies. Health care professionals (HCPs) were involved as participants in 44 of the studies and heuristic experts in 19. Twenty studies had both Patients / Carers and HCPs as participants, while the combination of Patients / Carers and heuristic experts was involved in 9 studies. Five studies utilised both HCPs and heuristic experts, and three of the studies involved Patients / Carers, HCPs and heuristic experts in usability testing. Of the studies that used only one class of tester, 73 were tested by Patients / Carers only, 22 by HCPs only, and eight by heuristic experts only.
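The participant counts above can be cross-checked with the inclusion-exclusion principle. The sketch below assumes that each pairwise count (20, 9 and 5) includes the three studies that used all three participant groups; the text does not state this explicitly. Under that assumption the single-group counts and the total number of studies reproduce the reported figures.

```python
# Reported counts of studies by participant group.
patients, hcps, experts = 99, 44, 19
# Pairwise overlaps, ASSUMED to include the three-way studies.
patients_hcps, patients_experts, hcps_experts = 20, 9, 5
all_three = 3

# Inclusion-exclusion for the number of distinct studies.
total = (patients + hcps + experts
         - patients_hcps - patients_experts - hcps_experts
         + all_three)

# Studies that used exactly one participant group.
patients_only = patients - patients_hcps - patients_experts + all_three
hcps_only = hcps - patients_hcps - hcps_experts + all_three
experts_only = experts - patients_experts - hcps_experts + all_three

print(total)                                   # 131
print(patients_only, hcps_only, experts_only)  # 73 22 8
```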
Only 25.95% (n=34) of the studies cited a reference as their justification for the number of participants.
The number of participants varied according to the type of testing used, whether heuristic evaluation, qualitative, quantitative or multi-modal. The number of participants according to the type of testing is shown in Table 4. Studies that used only heuristic methods had the fewest participants, and studies that used only quantitative methods had the most.

Timeline and publication channels for reports of usability testing of eHealth applications
Figure 6 shows the number of articles published for each of the years included in the search.
Figure 6: Timeline of publication of usability studies.
NB. 2018 was cited in the database as the publication year for 2 of the articles, although they were available online in 2017.
Figure 7 shows the types of journals in which the articles were published, according to the year of publication.
Figure 7: Types of journals in which usability studies were published.
As can be seen in the graphs, most of the selected papers were published in 2016 (n=59) and 2017 (n=57).
Health Informatics journals were the main publication channel in the selected literature, accounting for 65% (n=86) of the selected articles. Other publication channels were medical journals, allied health, computer science, and engineering journals. The table showing all the publication types is shown here:

Iterative model of development
We wanted to see if any of the articles mentioned the development of further iterations of the app as a result of the usability testing, as the iterative approach is cited as an important component of health intervention development [15,16]. We found that 41 out of 131 (31.3%) of the studies reported that at least one further iteration of the app was developed following the results of the usability testing.
We performed a Chi-squared test of association using 2x2 tables to see if iterative development was associated with the type of usability testing done. The use of the Think Aloud protocol and Heuristic testing were significantly associated with a report of further iterative development, whereas questionnaires, task completion, interviews and focus groups were not associated with a report of another iteration of the app. The
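As an illustration of the kind of test described above, the following minimal sketch computes the Pearson Chi-squared statistic for a 2x2 table (without continuity correction) and its p-value for one degree of freedom. The function names are illustrative, and the cell counts in the example are hypothetical, not the review's data. Only the two statistics passed to `p_value_df1` at the end (11.15 and 4.48) are taken from the abstract.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson Chi-squared statistic for the table [[a, b], [c, d]].

    Rows: method used / not used. Columns: further iteration reported / not.
    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a Chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# HYPOTHETICAL counts, for illustration only.
stat = pearson_chi2_2x2(20, 10, 15, 35)
print(round(stat, 2), p_value_df1(stat) < 0.05)

# Statistics reported in the abstract for Think-Aloud and heuristic walkthrough.
print(p_value_df1(11.15) < 0.05, p_value_df1(4.48) < 0.05)
```

A 2x2 table has one degree of freedom, which is why the p-value can be obtained from the complementary error function without a statistics library.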

Key findings
Findings in this scoping review suggest that together with the rapid growth of the number of eHealth applications, the number of studies that report the usability testing findings in eHealth app development is likewise increasing. Twenty-two studies were included in the review for the period of 2010-2014, when there were 28,000 health apps on the app stores [8]. For the years 2014-2017, the number of studies that reported the results of usability testing has increased to 131, a six-fold increase, while the number of apps has grown more than 10 times, with 325,000 reported in 2017 [3]. The number of published usability studies has therefore grown at a slower rate than the number of digital health applications available. It should be noted that most digital health applications are found in commercial "app stores" such as those of Apple and Google, and are developed by commercial developers rather than by academia. This sector does not normally publish the results of its usability studies, which it may view as giving away a competitive advantage. It also illustrates an apparent non-involvement of academia in this rapidly growing area.
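The growth comparison above can be made explicit with a few lines of arithmetic. This is a rough sketch using only the four figures quoted in the paragraph; the variable names are illustrative, and the two study counts come from reviews that searched different databases, so the ratio is indicative rather than exact.

```python
# Figures quoted in the text.
studies_earlier, studies_current = 22, 131      # 2010-2014 vs 2014-2017
apps_earlier, apps_current = 28_000, 325_000    # health apps in app stores

study_growth = studies_current / studies_earlier
app_growth = apps_current / apps_earlier

# Published usability studies per 1,000 available apps in each period.
per_1000_earlier = 1000 * studies_earlier / apps_earlier
per_1000_current = 1000 * studies_current / apps_current

print(f"studies grew {study_growth:.1f}x, apps grew {app_growth:.1f}x")
print(f"studies per 1,000 apps: {per_1000_earlier:.2f} -> {per_1000_current:.2f}")
```

The number of studies grew roughly six-fold while the number of apps grew more than eleven-fold, so the number of published studies per 1,000 apps roughly halved.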
The health conditions / clinical areas being targeted by the apps have also clearly expanded: 13 were reported in 2014, whereas we found 48 distinct clinical areas being addressed by the apps in the selected literature.
However, the methods being used to test usability have remained unchanged since 2014. Despite the recommendation of a previous review [8] to utilise more objective and automated methods of usability testing, such as eye tracking and remote monitoring, none of the selected studies used these methods. A few of the studies used transmitted logs to record simple measures such as the number of times the app was used and task completion. While these automated methods are well reported and utilised in other domains [33], there may be factors such as the cost of equipment (e.g. eye trackers) that make the adoption of these methods prohibitive for developers, especially small to medium enterprises (SMEs). There may also be factors unique to the healthcare domain that inhibit the adoption of these methods.
Patients and caregivers are very much involved in usability testing, accounting for the largest proportion of participants. Health care professionals are also involved in the testing, mostly when the app is made for use by the health care professional, but also in cases where patient-entered data is meant to be reviewed by the health care professional. The need to test both types of user is being recognised in these cases. However, sample size is not being given much attention in these studies, as only a quarter of them cited a reference to justify the choice of sample size. Studies where heuristic experts were the participants constituted only a small proportion of the selected literature, indicating a shift towards a more patient-centred approach to eHealth app development. This reflects recent calls for a more participatory design approach to eHealth application development, as well as the adoption of iterative methods [34].
Most of the selected literature was published in 2016 and 2017, coinciding with the growth in the eHealth app market as well as the increase in the number of channels for publication of these types of studies. Health Informatics journals, which have increased in number in the past few years, were the publication channel for most of the selected articles. After the health informatics journals, allied health and medical journals were the second most employed publication channel. In contrast to the 2014 review, computer science and engineering journals were in the minority with respect to publication channels, although this may have been affected by the choice of the databases searched in 2014 (i.e., the non-inclusion of Medline / PubMed and CINAHL). It may be useful for future usability studies of eHealth applications to be submitted to Human Computer Interaction and User Experience journals to improve awareness and increase uptake of more robust and objective methods of usability testing.
Iterative design has been recognised as the key to enabling rapid development of successful products, using usability data to remove human factors as a barrier to success. Iterative development is the means of accommodating the life cycle of a product in an ever changing market [35,36]. Yet the studies where the usability testing results were used to create another iteration of the app were in the minority, accounting for less than a third of the included studies. The use of an iterative development strategy was seen in 41 out of the 131 papers reviewed (31.3%). This is very similar to the proportion of papers found in a previous review [8], where 7 out of 22 (31.8%) studies used an iterative development strategy. It may be that the majority of the apps were in the final stages of development and some iteration had already taken place prior to the study being reported, or that the initial iteration of the application already had good usability. We noted that the Think Aloud protocol and the heuristic walkthrough were significantly associated with iterative development; however, various factors, including study aims and previous work, would have influenced the choice of the evaluation method.

Gaps and potential for future research
We see several areas that offer opportunities for, and a need for, future research. The use of objective automated methods of evaluating usability has already been established in other domains. Further research is needed to find ways to employ these methods, such as eye tracking and remote monitoring, in the development of eHealth applications. There are also other automated methods that may potentially be useful, such as electroencephalogram (EEG) headsets [37], which can record brainwave patterns associated with attention, interest, relaxation and other mental states whilst the user is evaluating the app. If validated, this could be a useful objective measure of app usability. Eye tracking is another automated method that was cited in a previous review [8] as potentially useful in evaluating the usability of eHealth applications. At least one recent study has already started exploring the validity of eye tracking in the evaluation of eHealth applications [38].
Validation of sample size estimates would also contribute to more efficient use of resources in usability investigations. In the selected studies, only 25.95% cited a reference to justify their sample size. Often only one method, such as questionnaires, was used in the evaluation, because resources are finite and the investigators want as large a sample as possible to improve validity. However, a large sample size is wasteful if a smaller sample size is sufficient to ensure validity. The resources saved by using a smaller sample could then be spent on more cycles of testing, giving a more complete picture of what is needed to improve the usability of the app.
We also found that the manner of reporting user experience evaluation lacks uniformity, making it difficult to compare results. Some studies merely reported that their participants found the applications to be usable, whereas others reported the scores using validated instruments such as the System Usability Scale. The use of many types of questionnaires, some validated and some that were not validated, also made the comparison of results across studies very difficult.
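For readers unfamiliar with the System Usability Scale, its standard scoring rule is short enough to sketch. The scale has ten items rated from 1 to 5; odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name below is illustrative, and the sketch makes no claim about how the selected studies computed or reported their scores.

```python
def sus_score(responses: list[int]) -> float:
    """Score one completed SUS questionnaire (ten items, each rated 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("the SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# A respondent who answers 3 ("neutral") to every item.
print(sus_score([3] * 10))  # 50.0
```

Reporting the score from a single, consistently applied rule such as this is one way to make results comparable across studies.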
As new types of health apps and new platforms for them are developed, new methods of usability testing will need to evolve. For example, there has been a growth in the number of health apps developed for smart speaker platforms, such as the Amazon Echo and Google Home product lines [39][40][41]. These apps, which use voice recognition, will require different methods to assess their usability. Further research needs to be done to develop usability testing methods for these platforms.
As noted earlier, most digital health applications are developed in the commercial sector rather than in academia, and this sector rarely publishes in the academic literature. There is scope for further research into the methods of usability evaluation employed by eHealth developers in the commercial sector, using the methodologies found in the work of Eshet, who conducted interviews and surveys amongst IT professionals [42,43]. This new research would give a more complete picture of the methods used by eHealth developers for usability evaluation.
Commissioning bodies will be looking for evidence of effectiveness for digital health applications, and in response to the need for guidance, the National Institute for Health and Care Excellence (NICE) has published an Evidence Standards Framework for Digital Health Technologies (DHT) [44]. In the framework, user experience falls under the Acceptability portion of the Tier 1 level of evidence. The minimum accepted level of evidence is being able to show relevant user involvement in the design, development and testing of the DHT as well as user satisfaction data, while the "best practice standard" is publicly available or published evidence of user involvement and user satisfaction. Thus, there will now be an onus on DHT developers to publish the results of their user experience evaluations, as they will be required to submit these as evidence when seeking to have their

Implications
The findings of this scoping review provide an update to the field and highlight the fact that while the number of available digital health applications has greatly increased, the proportion of these applications that report the results of their usability research in peer-reviewed publications has not increased and has in fact decreased. The methods that were used three years ago are still being used, but there are obvious areas for further research: to evaluate these approaches and to develop and test new approaches to usability evaluation. Patient participation groups would also want to know how involved patients are in the development and testing process of eHealth apps. Researchers who are looking for new areas of usability research will find opportunities in sample size validation, and in evolving ways of testing new platforms for eHealth apps such as smart speakers and virtual reality. Finally, as this was a scoping review of usability testing methods in eHealth applications, there is room to further qualitatively explore the underlying themes revealed by user experience studies of digital health technologies, as well as scope for further quantitative work.

Limitations
One limitation of this review is the exclusion of articles not published in English. This is common in scoping reviews, but we may have missed some relevant papers, especially for apps that are not published in English. We noted, however, that some apps in other languages, such as Korean, Chinese and Spanish, were included in the review. Another limitation is that many of the apps in the app stores are not developed by academics, and their developers do not report the findings of their usability tests in the academic literature. In the future, we would like to see more information sharing from the developers of eHealth applications with regard to their usability testing methods, without necessarily giving away trade secrets. As mentioned previously, mixed-methods research with eHealth developers [42,43] may be useful in this regard.

Conclusions
This scoping review gives a descriptive map of the literature on the methods used for usability testing of eHealth apps since 2014. This is a rapidly expanding area, with a tenfold increase in the number of eHealth apps in just three years, and yet the number of articles has not expanded accordingly, and the proportion of apps with published usability evaluations has even decreased. There are still gaps in the research that need to be addressed, especially as commissioning bodies who wish to deploy digital health applications as part of services are demanding evaluation evidence as a prerequisite to deployment. As eHealth becomes increasingly relied upon to help deliver efficient and effective health care, there must be assurance that eHealth apps are usable, effective and fit for purpose.

Table 4: Number of participants according to type of usability evaluation method.

Table 6: Usability method and iterative development.

Table A1: Description of Included Studies (Alphabetically by Author name)