Building and Developing a valid method to study adaptive behaviours with regard to IEQ in primary schools

Adaptive behaviour impacts the classroom's environment and the student's comfort. Therefore, a deep understanding of students' adaptive behaviour is required. This study aims to develop a valid and reliable method to understand how children in their late middle childhood (9-11) practise adaptive behaviours as a response to the classroom's Indoor Environmental Quality (IEQ). A self-reported questionnaire accompanied by an observation form is designed based on children's 'here and now' sensations and their cognitive and linguistic competence. Validity and reliability of the questionnaire were tested by running pilot and field studies in eight primary schools from July 2017 to May 2018. Through transverse sampling, 805 children were observed, and 1390 questionnaires were collected in 31 classrooms. Questions and responses of the designed questionnaire were validated by monitoring the answer-process, non-participant observations, cross-checking questions and statistical tests. The validation process improved the wording of the questions and response categories and resulted in a questionnaire with a high and valid response rate. The reliability of the questionnaire was tested by measuring the variability and standard deviations of responses under similar conditions. To conclude, the study introduces a questionnaire and an observation form that should be used together to provide a valid and reliable method for studying adaptive behaviour of primary school children.


Introduction
Children spend a quarter of their waking life in classrooms [1]. Poor classroom environments affect children's health [2] and academic performance [3,4]. Hence, concern over the environmental quality of primary school classrooms is growing [5]. Adaptive behaviours of school occupants affect the classroom environment and children's comfort [6–8]. Therefore, adaptive behaviours should be facilitated in schools to achieve higher levels of comfort for children [9]. The UK National Adaptation Programme 2018 (NAP) proposes a change in occupant behaviour to mitigate the risk of overheating due to global warming [10]. It is therefore vital to gain a deeper knowledge of children's behaviour as a response to Indoor Environmental Quality (IEQ) in schools.
It is important to collect information on children's opinions and behaviours directly from them rather than through proxy reporting, as society is becoming more interested in and concerned with children's rights [11,12]. However, ambiguity in children's questionnaires decreases response quality, especially when survey forms do not suit children's cognitive and linguistic competence [11]. Therefore, designing valid and reliable questionnaires is more vital for children than for adults [13]. Methods for engaging children to obtain their views about the school environment can be found in several studies [2,14–16]. However, questionnaire surveys are the main research technique to study their adaptive behaviours [17], especially when considering personal adaptive behaviours such as adjusting clothing, posture and activity that cannot be easily measured by sensors.
Several studies have developed questionnaires to record children's perception of the environment and their clothing level [18–24]. One of the first studies on children's adaptive behaviour is the study by Humphreys in the 1970s, who develops a self-reported questionnaire consisting of a thermal comfort rating scale and a clothing checklist [25]. Kwok and Chun (2003) design a questionnaire that presents sketch drawings of clothing items to help students identify their clothing more quickly [21]. Haddad et al. (2012) add custom-designed cartoon illustrations to verbal descriptions of thermal sensations to improve the clarity of the rating scale [23]. Teli et al. (2012) ask about children's thermal sensation, preference, overall comfort and the state of the jumper using pictorial illustrations and colours [18]. Fabbri (2013, 2015) evaluates the thermal comfort of children aged 4-5 by using a "pedagogical approach" in which thermal comfort was debated with children using references related to ideas from school programs, i.e. "it is freezing cold or sizzling hot" [24,26]. De Dear et al. (2015) question clothing level by creating twelve clothing ensembles based on combinations of school uniform garments [22]. Another study asks about the state of children's jumpers and also their primary behaviours when they feel hot through an open-ended question [27]. Kim and De Dear (2018) ask about students' general adaptive strategy when feeling discomfort: 'What could you do to feel more comfortable?' [20].
https://doi.org/10.1016/j.buildenv.2019.02.018 Received 10 November 2018; Received in revised form 9 February 2019; Accepted 12 February 2019
Although the questionnaires above ask about clothing behaviour, few of their questions cover both personal and environmental adaptive behaviours in relation to overall comfort. Therefore, this study adds behavioural questions to sensational questions, based on children's cognition, to understand how children adjust themselves or the environment to reach comfort. This study proposes a valid and reliable questionnaire to record the adaptive behaviour of primary school children in relation to indoor environmental quality.

Methodology
Surveying children can lead to distinctive methodological complexities; therefore, data quality should be improved by paying special attention to the questionnaire structure and pretesting it [12]. Pretesting, preferably by following a pilot study, is an important part of survey development, especially when little is known about the survey population [28]. To demonstrate the rigour of research findings and achieve good quality outcomes, reliability and validity are considered two important indicators [12,28–30].

Validity
Validity describes the closeness of what we measure to what we intend to measure or to the concept it claims to measure [12,28,30–32]. It also describes respondents' understanding of what was asked and examines whether the data obtained truly reflect what is under investigation [33]. External validity describes the ability to apply the findings of the research to other studies with confidence [30], i.e. results should be generalizable [34]. To maximise the generalizability of the results, it is important to achieve high response rates [35], which are another indicator of the quality of responses [12,28]. Internal validity addresses the reasons for the outcome of the study [30] with three main approaches: content, construct and criterion validity [30–32]. Content validity measures the degree to which the content of a questionnaire adequately reflects the intended concept [30–32]. Construct validity shows the relationship between the concepts under study and the related hypothesis [30–32], i.e. whether the relation between variables and factors conforms to what might be expected [36]. Criterion validity is established if a tool can be compared to other similar related measures of the same concept [30,31], which is not applicable to this study.
Factors affecting validity of questionnaires include design of the questionnaire, sampling, non-intentional errors in responding (due to misunderstanding of the questions, difficulties in remembering, lack of knowledge or time) and intentional errors (due to non-confidentiality) [33]. Methods that are used to validate questionnaire data include interviews, observations, instrument monitoring [37] and cognitive pretests [11]. Cognitive pre-tests discover which questions or wordings are problematic and why, discover sources of misunderstanding and confusion, and suggest solutions for improving the questionnaire [11]. What concerns this study in terms of validity is a) Whether children understand what is being asked to identify wording and concept-related problems and b) Whether questions, responses and scales truly reflect what is under investigation and c) Whether invalid responses can be removed to provide more robust findings.

Reliability
Reliability means reproducibility of results [33], i.e. repeated administration of the test gives similar results [38], with the assumption that nothing has changed [30]. To measure internal consistency, the relationship between all the results obtained from a single questionnaire should be studied [30] and responses should be consistent across constructs [31]. Cronbach's alpha is the most common measure of internal consistency to determine if the scale is reliable [31]; however, it is mostly applicable when multiple Likert questions form a scale to evaluate one topic, for example, job satisfaction. In this study, the questionnaire asks different questions on demographic, behavioural and sensational information of the respondents. Therefore, Cronbach's alpha does not provide an appropriate method to measure the internal consistency of this questionnaire. Another way to measure internal consistency is test and re-test correlations to investigate if votes are stable over time [31]. As environmental conditions and adaptive behaviours change, test and re-test correlations do not account for the reliability of results in this questionnaire. Considering that reliability also estimates individual differences [39], what concerns this study in terms of reliability is the within-test variation of children's votes for an individual question under similar environmental conditions. The variability can be calculated by the Standard Deviation (SD). A low SD indicates that the data are clustered around the mean, and a high SD shows that the data are spread over a wider range of values [39].
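The within-test variability described above can be illustrated with a short sketch. It assumes, purely for illustration, that "similar environmental conditions" are defined by grouping votes into 1 °C bins of operative temperature; the data, the bin width and the function name are hypothetical and are not taken from the study, which performed its analyses in SPSS.

```python
from collections import defaultdict
from statistics import mean, stdev

BIN_WIDTH = 1.0  # assumed bin width in °C; not specified in the study

# Hypothetical records: (operative temperature in °C, vote on a 5-point scale from -2 to 2)
votes = [
    (21.2, 0), (21.4, 0), (21.3, -1), (21.1, 0),
    (24.6, 1), (24.8, 2), (24.7, 1), (24.9, 1),
]

def sd_by_condition(records, bin_width=BIN_WIDTH):
    """Group votes into temperature bins and return {bin_start: (n, mean, SD)}."""
    bins = defaultdict(list)
    for temperature, vote in records:
        bin_start = (temperature // bin_width) * bin_width
        bins[bin_start].append(vote)
    # The sample SD is only defined for bins with at least two votes.
    return {
        start: (len(group), mean(group), stdev(group))
        for start, group in sorted(bins.items())
        if len(group) >= 2
    }

summary = sd_by_condition(votes)
for start, (n, avg, sd) in summary.items():
    print(f"{start:.0f}-{start + BIN_WIDTH:.0f} °C: n={n}, mean={avg:.2f}, SD={sd:.2f}")
```

A low SD within a bin would suggest that children exposed to similar conditions voted similarly for that question.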

Methodological framework
The methodological framework consisted of four stages: Designing, Running, Testing and Developing. 1) Designing a questionnaire for 9-11 years old children based on their cognitive and linguistic competence by reviewing relevant literature 2) Running designed questionnaire in pilot study and then main field studies 3) Testing quality of questions and responses by evaluating reliability and validity 4) Developing a valid and reliable questionnaire and consequently a method to study both personal and environmental behaviours of primary school children.
Reliability is tested by calculating standard deviations of the votes for a set of questions that are filled out under similar conditions. The validity of questions and responses is tested by the following methods: a) Cognitive pre-tests (monitoring the answer-process) help reveal children's interpretations of questions and their misunderstandings of wording. A cognitive pre-test is usually written up in a report that suggests and explains question amendments [28]. b) Observational forms are used to observe controls, children and behaviours for cross-checking with questionnaire results. Observation is used to identify factors that are difficult to measure or explain [40]. c) Cross-checking questions with each other is applied to remove invalid responses. d) Statistical tests provide evidence of construct validity for the responses by testing correlation between variables. Statistical analyses are performed using SPSS 25 software [41].

Analysis
Children's cognitive and linguistic competence are studied to design an appropriate questionnaire in terms of wording, mode of administration, number of questions, scale of responses and layout.

Respondents
Primary school children can describe their perception, tell their own viewpoint, and structure their memory [11,24]. Therefore, 7-11-year-old children can answer a structured questionnaire. However, children in their late middle childhood (9-11), rather than children in their early middle childhood, are chosen for the scope of this study for five main reasons:
• Development of language and literacy skills [12].
• The ability to think productively and evaluate facts [24].
• Development of attention span [24].
• Increase in data quality and consistency of findings [12].
• Children in their early middle childhood have a higher tendency to please and gain social desirability 1 [11,12]; therefore, there is a risk that they reply to questions to please the researcher or teacher.
• Type of Questions: It is important to draw attention to the logical order of questions to get valid and reliable responses [11,24,28,37,38]. There are two types of questions in this study, factual and non-factual questions. Factual questions ask about facts and have true responses. Non-factual questions ask for opinions or attitudes, and there is no such thing as true attitude or perception [38]. In this study, questions on gender, way of commuting to school and adaptive behaviours that are considered 'factual' are asked first, followed by 'non-factual questions' on sensations and preferences.
• Wording of Questionnaire: As capacity, speed and processing time of memory are still developing in middle childhood, it is better to use simple words for children without ambiguity and complexity [11,12]. There are various types of questions: single choice, multiple choice, nested, and open-ended questions [24]. It is recommended to ask one question at a time and not use vague quantifiers in questions about the frequency of behaviours [11,12], as questions on periodical behaviours, such as 'how often … ?', are memory demanding [38]. Nested questions, in which a following question clarifies a previous question, can be confusing for children. Negatively formulated questions can make the intended meaning ambiguous for children and should be avoided [11,12,28]. When questions are clear and concrete about 'here and now', children can provide more credible responses [11,28,58]. Adaptive behavioural studies usually ask about occupants' "general" perception of controls and adaptive behaviours [43,55,59–61]. This study asks about children's 'here and now' feelings and focuses on adaptive behaviours during the recent session through single or multiple-choice questions.
• Number of Questions: When preparing a questionnaire for children, the number of questions and response categories are important. There is a risk of reading fatigue after a maximum of 15 questions [24]; however, the risk of irritating respondents can be reduced by asking a minimal number of questions [37]. In long questionnaires, lack of motivation and difficulties in concentration result in poor data quality [12]. In this study, a variable such as age, which is already defined (9-11 years old), is removed from the questionnaire to make it shorter. A total of 14 questions is designed for morning surveys, reduced to 12 questions for afternoon surveys by removing the two questions on gender and way of commuting to school.
• The Scale of Responses: Scales are commonly used to evaluate personal experiences of environmental conditions [62]. In this study, respondents were provided with a 5-point rating scale for sensational questions for the following reasons: 1) Accuracy vs. precision: To obtain more accurate rather than more precise responses 2 and increase children's ability to discriminate between scale points. According to Nicol (2008), "it is generally agreed that accuracy is not improved significantly by adding more points to the scale" [63]. 2) Improving understanding and reducing confusion: To improve children's understanding of the questionnaire, which is also supported by a similar study [64]. A 5-point scale is more comprehensible to respondents [65] and communicates better with them [66]. It also increases response rate and response quality by reducing respondents' "confusion" and "frustration level" [67]. Several other studies have reported higher reliabilities for five-point rating scales [68–71]. 3) Consistency through the questionnaire: To provide a consistent rating scale for all thermal, air quality and visual sensational questions so that the effect of each aspect on overall comfort can be consistently evaluated.
• Layout of Questionnaire: All questions are printed on a single page to make the layout easier to follow. Following feedback from teachers, the Comic Sans MS font in bold, size 12, was chosen for the questionnaire.
There is a strong connection between respondents' vocabulary knowledge and their comprehension [72,73]. Therefore, the designed questionnaire was provided to 46 heads and teachers a week before
Fig. 1. Methodological framework.
1 The tendency for respondents to present a positive image of themselves on questionnaires, or in a way that is consistent with societal norms or beliefs [37].
2 While precision is a measure of the variation among survey estimates, accuracy is a measure of the difference between the survey estimate [101].

Running stage
Field studies (one pilot study to evaluate the first version of the questionnaire and seven field studies to test the revised version) were carried out in eight naturally ventilated primary schools located in the West Midlands, UK, from July 2017 to April 2018. Naturally ventilated buildings can be prone to outside noise through open windows [74], and window operation can be restricted to provide acoustic comfort in schools located in regions with a high background noise level [75]. Therefore, schools eligible for this study were selected from quiet regions at a considerable distance from the main road to allow window operation without any disturbance from excessive external noise. The location of all schools was checked with the England Noise Map Viewer [76], and the results showed that the regional Road Noise, LAeq 16h, is less than 55 dB in all cases. This is the maximum allowable external noise level that permits natural ventilation under local control of the teacher to prevent overheating [77]. A recruitment email was sent to principals (heads) of eligible schools. Among schools that showed interest in the project, priority was given to schools with different architectural designs that could provide different levels of opportunities for practising adaptive behaviours. There were no teacher restrictions or school rules on children's window operation in the studied classrooms.
In total, 805 children were observed in 31 classrooms and 1390 questionnaires were collected from morning and afternoon sessions (Table 1). The numbers of girls (51%) and boys (49%) were approximately the same, which can reduce bias and increase the credibility of results. This study obtained its ethical approval before the start of the project and all ethical considerations were followed during the project, including getting consent from heads, teachers and students.
The pilot study was done during summer in a school with a high number of openable windows (eight windows per classroom); hence, opportunities for adaptive behaviours were sufficiently provided (Figs. 2 and 3).

Sampling mode
An overview of studies shows that most behavioural studies use transverse sampling [18,19,23,24,45–54,56,78]; however, there are several studies with longitudinal sampling in offices [42,44] and repeated transverse sampling in residential buildings [55,57]. The problem with longitudinal sampling in this type of study is that many intervening variables affect the variables under study over a long time [38]. In this study, transverse sampling was carried out to study 805 children during different seasons. There is evidence that spending enough time studying respondents and their environment promotes validity [34]. Therefore, each student was asked to fill out the paper-based questionnaire twice a day. Questionnaires were usually administered once at the end of the morning session and once at the end of the afternoon session for two reasons: 1) To ensure all adaptive behaviours practised during the whole session were reflected in the questionnaire 2) To let children adapt to the classroom's environment to safeguard thermoregulation. It was ensured that children maintained a stable activity level for at least 30 min before filling out the questionnaire [18,79].

Objective measurements
Environmental variables were recorded at 5-min intervals by multifunctional SWEMA equipment [80], temperature and humidity data loggers with USB [81], a CO2 meter (TGE-0011) [82] and a Light Meter [83]. Details of the equipment including their range, resolution and accuracy are provided in Table 2. SWEMA equipment, designed to comply with ISO 7726 [84] and ISO 7730 [80,85] standards, collects data from three sensors: air velocity and temperature, air humidity and temperature, and radiant temperature. The location of the sensors varied in each classroom considering the set-up criteria and children's health and safety. The measurement station was located away from the main airflows (e.g. windows), away from heat sources (e.g. projectors) and also away from sun patches, at a height of 1.1 m as recommended by ISO 7726 [84]. Equipment was placed within the vicinity of students' desks without impairing their visual access and seating arrangement. The instruments were usually set up in the classrooms before children's arrival in the morning to let instruments acclimatize to the classrooms' environment before reading [86]. Time-lapse cameras were installed inside the classrooms to record the state of windows, blinds and doors at 5-min intervals. Calibrated light meters measured the illuminance level on each student's working desk when students were filling out the questionnaire. Outdoor variables were taken from local weather stations that were a maximum of 3 miles away from each study site [87].
Table 1. The number of schools, classrooms and observed children.

Non-participant observation
Observation is a very useful tool that gives records of settings and enriches data collected from other techniques [88]. In this study, what to observe and why to observe it was clear to the observer; therefore, the method of non-participant observation was applied [88]. Briggs et al. (2012) propose several points to reduce bias of observation [88], which were considered in this study. First, the observation was accompanied by a variety of other means including subjective and objective measurements and time-lapse cameras. Second, the semi-structured observation procedure underwent piloting to obtain a structured observation procedure. Third, the procedure of observation was reviewed by other members of the research team to obtain their opinion on the applied method. Fourth, the observer tried not to appear different from the studied samples in the classroom. Therefore, the observer remained silent at the back of the classroom without interrupting classroom activities or operation of controls. Furthermore, to make children feel at ease, the procedure of observation was not explained; however, it was explained that time-lapse cameras record the state of controls and not children. The observer was accurate in reporting descriptive information of observations to achieve descriptive validity [34].
The observation form is designed as an extended component of the designed questionnaire with three main parts (Appendix-B). The first part focuses on the school's and classroom's architectural features and was filled out with the help of the head teacher. The second part focuses on children's personal and environmental behaviours and was filled out at 10-min intervals by the observer. The third part is designed mainly for validating responses and is focused on an individual child with a reference number. Classrooms' maps were drawn in the observer's logbook and students' seats were given a matching reference number. This reference number was placed on top of the questionnaire with one distinct sticker for each reference number. Reference numbers helped in grouping morning and afternoon surveys.
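As an illustration of how reference numbers can be used to group morning and afternoon surveys, the following minimal sketch pairs questionnaire records by reference number. The record layout, field names and values are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical questionnaire records; field names and answers are illustrative only.
questionnaires = [
    {"ref": 12, "session": "morning", "jumper": "yes"},
    {"ref": 12, "session": "afternoon", "jumper": "no"},
    {"ref": 7, "session": "morning", "jumper": "yes"},
]

def pair_sessions(records):
    """Return {reference number: {session: record}} so that both sessions of a child sit together."""
    paired = defaultdict(dict)
    for record in records:
        paired[record["ref"]][record["session"]] = record
    return dict(paired)

paired = pair_sessions(questionnaires)

# Reference numbers for which both a morning and an afternoon questionnaire exist.
complete = sorted(
    ref for ref, sessions in paired.items()
    if {"morning", "afternoon"} <= set(sessions)
)
print(complete)
```

Keeping the reference number as the only key means that the same number can also be matched against the seat map drawn in the observer's logbook.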

Validity and reliability stage
To develop a valid and reliable questionnaire, each question was tested during the pilot study for necessary content modifications to obtain an accurate questionnaire for the main field studies. Questions are divided into four main categories: 'gender and way of commuting', 'adaptive behaviours', 'sensations and preferences' and 'comfort and tiredness'. The procedure of testing, modifying and validating questions is explained as follows:  Table 5. Children were also asked 'How did you get to school today?', provided with a 5-point descriptive rating scale. This question shows the effect of activity level and metabolic rate on children's sensations and consequently their adaptive behaviours, especially upon arrival. While monitoring the answer-process of this question during the pilot study, several children commented 'I usually get to school by X but today I got to school by Y' and several mentioned 'I got to school by X and Y'. Therefore, the word 'today' was underlined, and the phrase 'You may need to check more than one box' was added to the question (Question 2, Appendix-A). This question also had a response rate of 100% during both the pilot study and the main field studies (Table 5). However, there is no way to test the accuracy of the responses through observations.
3.3.1.2. Personal behaviour. The procedure of revising and validating questions on personal behaviours (i.e. clothing, fanning and drinking) is explained below.
3.3.1.2.1. Clothing behaviour. There is a school uniform policy in the UK that can restrict available clothing choices [89], and children have a specific range of school uniform options [18]. Children's clothing layers consist of fixed layers (i.e. worn for the whole day) and adjustable layers (i.e. jumper) that can be adjusted according to the classrooms' conditions and the preference of children. In this study, each primary school has a unique uniform; however, the most observed clothing combinations and their estimated Clo values according to ISO 7730 [85] can be seen in Table 3. All combinations include underwear, and when the jumper/cardigan is worn, a value of 0.25 is added to the combination [18].
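The jumper/cardigan adjustment described above amounts to a simple addition, sketched below. The 0.25 increment is the value stated in the text; the base value used in the example is hypothetical and stands in for the combination values of Table 3.

```python
JUMPER_CLO = 0.25  # value added when a jumper/cardigan is worn, as stated in the text

def ensemble_clo(base_clo, wearing_jumper):
    """Return the Clo value of a uniform combination, adding the jumper/cardigan when worn."""
    return round(base_clo + (JUMPER_CLO if wearing_jumper else 0.0), 2)

# Hypothetical base value for a fixed-layer combination (not taken from Table 3).
example_base = 0.40
with_jumper = ensemble_clo(example_base, wearing_jumper=True)
without_jumper = ensemble_clo(example_base, wearing_jumper=False)
print(with_jumper, without_jumper)
```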
To make the questionnaire more straightforward, the top part of the school uniform is not questioned, as children mostly wear a short sleeve shirt/blouse. Note that a long sleeve light-weight shirt/blouse has the same Clo value as a short sleeve shirt/blouse [85]. Figs. 4 and 5 show the most observed clothing uniforms in the studied schools.
Table 2 (fragment). Range, resolution and accuracy of the equipment.
Parameter | Range | Resolution | Accuracy
CO2 | 0-5000 ppm | 1 ppm | 50 ppm
Light level (Light Meter [83]) | 0 to 50000 Lux/Fc | 0.1 Lux/Fc | ± 5% ± 10d (< 10000 Lux); ± 10% ± 10d (> 10000 Lux)
The third and fourth questions focus on children's clothing to discover how children adjust clothing in different seasons. To achieve content validity, children and their answer-process were observed during the pilot study. Initially, the question on clothing layer was 'What are you wearing today?'; it was then changed to 'What are you wearing now?'. This is because visual observations showed that students might change their school uniform for Physical Education (PE) or for a performance. Indeed, what they were wearing at the time of filling out the questionnaire might differ from what they were wearing for the rest of the day. Furthermore, 'now' is less memory demanding than 'today', which can lead to more valid responses (Question 3, Appendix-A). According to Presser et al. (2004), the validity of surveys can be evaluated by comparing revised versions of surveys with original ones [13]. By improving this question, the validity of responses in the revised version improved by 3% (Table 5). To check 'construct validity' for this question and to see if clothing level is linked to a stimulus such as outdoor temperature, responses were compared with the recorded outdoor temperature. Results show that the average outdoor temperature differs significantly between response categories (p < 0.001, ANOVA). Fig. 6 shows that when the outdoor temperature is lower, students mostly wear 'trouser' or 'skirt with tights' with higher Clo values. On the other hand, when the outdoor temperature is higher, students mostly wear 'shorts' or 'skirt with socks' with lower Clo values. Seventy-nine invalid votes on the children's clothing question were identified by observations (Table 5). Furthermore, eleven invalid votes were identified by both observation and cross-checking the gender and clothing questions. Those votes are from boys who voted 'I am wearing skirt with socks or tights'.
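The cross-check between the gender and clothing questions can be sketched as a simple filter. The record layout and answer labels below are hypothetical; in the study, such votes were confirmed by observation as well, so the sketch only flags candidates for review rather than deciding validity on its own.

```python
# Hypothetical responses; keys and answer labels are illustrative only.
responses = [
    {"ref": 1, "gender": "boy", "clothing": "trousers"},
    {"ref": 2, "gender": "boy", "clothing": "skirt with socks"},
    {"ref": 3, "gender": "girl", "clothing": "skirt with tights"},
    {"ref": 4, "gender": "boy", "clothing": "shorts"},
]

def flag_for_review(records):
    """Return reference numbers whose gender and clothing answers look inconsistent."""
    return [
        r["ref"]
        for r in records
        if r["gender"] == "boy" and r["clothing"].startswith("skirt")
    ]

flagged = flag_for_review(responses)
print(flagged)
```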
Similarly, to show the clothing layer at the time of filling out the questionnaire and achieve content validity, the question 'Did you take off your jumper this morning?' was replaced with 'Are you wearing a jumper/cardigan now?'. The question provided a discrete two-point scale (yes/no) during the pilot study but was changed to a descriptive three-point scale after monitoring the answer-process of children in the pilot study. Several students commented 'What if I do not have a jumper/cardigan with me today'; therefore, 'I don't have a jumper/cardigan today' was added to the response categories to ensure content validity (Question 4, Appendix-A). By observing the state of the jumper/cardigan and comparing the observations with questionnaire results, the percentage of valid responses can be obtained. As a result of improving this question, the validity of responses improved by 3% from the pilot study to the main field studies (Table 5). For construct validity, responses were compared with the recorded operative temperature to check if the state of the jumper/cardigan is linked to a stimulus such as indoor operative temperature. Results show that the average operative temperature differs significantly between response categories (p < 0.001, ANOVA). Fig. 7 shows that when the mean operative temperature is lower, students wear a jumper/cardigan, and when the operative temperature is higher, students take off the jumper/cardigan or do not have one with them.
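To make the ANOVA comparison concrete, the sketch below computes the one-way ANOVA F statistic by hand for temperatures grouped by response category. The data are hypothetical, and the study itself ran its tests in SPSS; the sketch stops at the F statistic and its degrees of freedom, since the p-value additionally requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical operative temperatures (°C) per response category of the jumper question.
wearing = [20.0, 21.0, 22.0]
not_wearing = [23.0, 24.0, 25.0]
no_jumper_today = [24.0, 25.0, 26.0]

f_stat, df_b, df_w = one_way_anova_f([wearing, not_wearing, no_jumper_today])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with these degrees of freedom indicates that mean temperature differs between response categories, which is the pattern used as evidence of construct validity.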
3.3.1.2.2. Fanning behaviour. Children were asked about fanning behaviour by 'Did you fan yourself this morning?' with a discrete two-point scale (yes/no). The question provided a 99% response rate during the pilot study and a 98% response rate during the main field studies, Table 5 (Question 5, Appendix-A). It is difficult to validate responses through observations. Results show that the average operative temperature is significantly different between the two categories 'Yes' and 'No' (p < 0.001, T-test). This question achieves construct validity since fanning behaviour increases as the mean operative temperature rises (Fig. 8).

Table 4. Cross-checking questionnaire results with visual observations and time-lapse photos.
The total number of window adjustments and the number of window adjustments by children (obtained from visual observations and questionnaires) are presented in Table 4. According to observations, only 13% of operations were carried out by children; however, the questionnaire results claim that 62% of adjustments were done by children. This shows a significant gap between what has been claimed and what has really happened. Furthermore, a low response rate for this question (33%) and a low percentage of valid responses (55%) show that children failed to provide valid responses for this question (Table 5). Several possible reasons for children misunderstanding this question can be suggested: 1) The pronoun 'you' is the second-person pronoun that is both singular and plural; hence, children might interpret 'you' as the whole class. 2) Children might have ignored adverbs of time (this morning/this afternoon), and their window operation in the recent past could have driven them to check 'yes, I opened/closed the window'. 3) It was observed that several children looked at windows when answering this question and, if they found the window open, they checked the box 'Yes, I opened the window'. Therefore, this question was removed from the questionnaire to leave environmental behaviours to visual observations and time-lapse cameras.
Table 5. This is probably because children preferred to check boxes without challenging their writing skills. Children tend to write much more slowly than they can read [38], which can be another reason for skipping open-ended questions. Therefore, open-ended questions were removed from the questionnaire to make it shorter and easier for children.
3.3.1.5. Sensation and preference. The next part of the questionnaire focuses on non-factual questions on sensation and preference regarding the classroom's environment, to find out how adaptive behaviours affect, and are affected by, perception of the classroom's environment and comfort. Children's personal external factors such as social factors [90][91][92] are not the focus of this part of the questionnaire.
To check 'construct validity', statistical tests show that Thermal Sensation Vote (TSV) and operative temperature (T op ) are positively correlated (p < 0.001, Spearman's correlation) and children find the classroom warmer as T op increases, Fig. 10.
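As an illustration of the kind of test reported here, the following is a minimal sketch of Spearman's rank correlation between operative temperature and thermal sensation votes. The data values and the helper names (`average_ranks`, `spearman_rho`) are hypothetical, the sketch computes only the coefficient and not the p-value, and it is not the analysis code used in this study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired records: operative temperature (deg C) and TSV (-2..+2).
t_op = [21.5, 22.3, 23.0, 24.2, 25.1, 26.4]
tsv = [-1, 0, 0, 1, 1, 2]

rho = spearman_rho(t_op, tsv)
print(f"Spearman's rho = {rho:.2f}")
```

A positive coefficient corresponds to children voting warmer as the operative temperature increases; a significance test would still be needed to support a claim such as p < 0.001.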
Children's thermal preference was questioned by 'How would you like the classroom to be now?' with a five-point rating scale 'warmer', 'a little

Cross-checking the thermal sensation and preference questions identified three inconsistent responses in the dataset (less than 1%), Fig. 11. These responses are from children who found the classroom hot (TSV = 2) and preferred it to be warmer (TPV = 2), or children who found the classroom cold (TSV = −2) and preferred it to be cooler (TPV = −2), i.e. TSV + TPV = ±4, Table 6. This method for removing inconsistent responses has already been applied in similar studies [27,94]. Inconsistent data constituted 7% of the gathered data in one of the studies [94] and 5-8% of the data in another [27]. The low percentage of inconsistent data in this study (less than 1%) highlights the validity of the responses and the applied method.

3.3.1.5.2. Indoor air quality. Indoor air quality was assessed by two questions on the freshness and the smell of classrooms. The level of freshness was questioned by 'How is the air in the classroom now?' with a five-point rating scale of 'very fresh', 'fresh', 'OK', 'stuffy', 'very stuffy' (Question 9, Appendix-A). The same rating scale is used in another study for evaluating schools' indoor air quality in the West Midlands [95]. To test 'construct validity', statistical tests show that children's votes on freshness and CO2 levels are correlated (p < 0.001, Spearman's correlation). Children find the classroom stuffier as the mean CO2 level increases, Fig. 12.
Children's preference for air quality was questioned by 'I like the air to be fresher/as it is now' (Question 10, Appendix-A). Cross-checking the questions on air quality sensation and preference shows that less than 1% of votes (12 out of 1390) are inconsistent. Inconsistent responses are from children who found the classroom 'stuffy' or 'very stuffy' and preferred the classroom 'as it is'.
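The cross-checking rules described for thermal and air quality votes can be sketched as two small predicates. This is a minimal illustration under stated assumptions: the numeric coding (TSV and TPV from −2 to +2, with TPV = +2 meaning 'warmer'), the response labels, the function names and the example records are all hypothetical stand-ins, not the study's data or code.

```python
STUFFY_VOTES = {"stuffy", "very stuffy"}


def thermal_inconsistent(tsv, tpv):
    """Flag a vote pair as inconsistent when TSV + TPV = +4 or -4.

    Assumed coding: TSV runs from -2 (cold) to +2 (hot); TPV runs from
    -2 (prefer cooler) to +2 (prefer warmer).
    """
    return abs(tsv + tpv) == 4


def air_inconsistent(freshness, preference):
    """Flag 'stuffy'/'very stuffy' votes paired with a preference for no change."""
    return freshness in STUFFY_VOTES and preference == "as it is now"


# Hypothetical responses for illustration only.
responses = [
    {"tsv": 2, "tpv": 2, "freshness": "OK", "air_pref": "fresher"},
    {"tsv": 1, "tpv": -1, "freshness": "very stuffy", "air_pref": "as it is now"},
    {"tsv": -2, "tpv": 2, "freshness": "fresh", "air_pref": "as it is now"},
]

thermal_flags = [thermal_inconsistent(r["tsv"], r["tpv"]) for r in responses]
air_flags = [air_inconsistent(r["freshness"], r["air_pref"]) for r in responses]
print("thermal inconsistent:", sum(thermal_flags), "of", len(responses))
print("air quality inconsistent:", sum(air_flags), "of", len(responses))

# Share of inconsistent air quality votes reported in the text: 12 of 1390.
reported_share = round(100 * 12 / 1390, 2)
print("reported share (%):", reported_share)
```

Applied to the reported counts, 12 inconsistent votes out of 1390 is about 0.86%, in line with the 'less than 1%' stated above.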
Another question for evaluating children's perception of indoor air quality during the pilot study was 'Is your classroom smelly now?'. Out of 193 responses collected for this question, only two children checked the box 'Yes', even when the CO2 level was high. Children mostly relate this question to strong smells. Therefore, this question was removed for the main field studies. Another study supports the finding that students' perception of air freshness better accounts for the level of CO2 in the classroom than their perception of smell [95].
3.3.1.5.3. Visual Comfort. Among the different aspects of the visual environment, illuminance level (lux), which is more likely to be affected by adaptive behaviours, was questioned by 'My classroom is a) Very bright, b) Bright, c) OK, d) Dark or e) Very dark'. Students' votes were compared with the measured illuminance level on each student's working desk and no correlation was found between the two (p = 0.288, Spearman's correlation). This suggests that the scale conveys the colour of the classroom rather than the level of light in the classroom. The question and scale were changed to 'The light in my classroom is a) Much b) Enough c) OK d) Not enough e) Little' (Question 11, Appendix-A). The scale is comparable to the scale in a similar study that questioned light availability by the following scale ['a) Much b) Enough c) Average d) Not enough e) Little'] [96]. The present study used 'OK' instead of 'average' or 'neutral' to make it more understandable for children. The use of 'OK' instead of 'Neutral' in the thermal comfort scale is applied in similar studies [18,27,89,92,94]. After the question was changed for the main field studies, illuminance level was found to be correlated with visual sensations (p < 0.001, Spearman's correlation). As illuminance level increases, children report higher light levels in the classroom, Fig. 13. Visual sensation in schools is questioned in Ref. [64] by 'Do you think there is

Table 6. Inconsistent votes are from children who found the light in the classroom 'not enough' or 'little' and preferred the light to be 'less' (84 votes), and children who found the light in the classroom 'much' and preferred the light to be 'more' (13 votes).

3.3.1.5.4. Overall comfort and Tiredness. Evidence shows that poor environmental conditions reduce overall comfort and academic performance [3,4,98,99]. Therefore, the last two questions focus on students' comfort and tiredness.
Both questions provided two-point scale responses during the pilot study: 'Yes, I am comfortable/tired' and 'No, I am not comfortable/tired'. However, the scale was changed to a three-point response category after the pilot study because several children asked questions such as 'What if I am a bit tired?' or 'What if I am a little comfortable?'. Therefore, 'I am a little tired/comfortable' was added to the questionnaire to provide content validity (Questions 13 & 14, Appendix-A). This scale is supported by a similar study that questioned 'level of tiredness' with a '3-point rating scale' [18].
Monitoring the answer process showed that children relate discomfort and tiredness to many factors: 'The chair is not comfortable', 'I am hungry', 'I do not like math' or 'I want to go home'. However, unacceptable environmental conditions were among the most common factors related to students' tiredness and discomfort: 'I am boiling' or 'It is so hard to breathe in here'. CO2 level is one of the environmental factors that affect children's level of tiredness and comfort. Results of the study show that with higher levels of CO2, children feel more tired (p < 0.01, Spearman's correlation) and less comfortable (p < 0.01, Spearman's correlation), Figs. 14 and 15.
Among children who provided invalid responses to the clothing questions, 46% felt tired, 42% felt a little tired and 12% did not feel tired, Fig. 16. This shows the effect of tiredness on the number of invalid responses. Similarly, among children who provided inconsistent responses to the sensation and preference questions, 30% felt tired, 46% felt a little tired and 24% did not feel tired. The percentage of invalid responses is clearly lower among students who were not tired, Fig. 16. This part of the study indicates that high CO2 levels in classrooms impact children's overall comfort and tiredness, and consequently their errors in responding.
The response rate of the last two questions (96%) is lower than the response rate of the other sensation questions, which can be attributed to children getting bored at the end of the questionnaire and skipping these two questions, Table 6.

Reliability
To test the reliability of the questionnaire, it is necessary to check how responses vary under similar conditions. In this study, mean clothing values increase from 0.41 in summer (T op = 26.4°C) to 0.69 in winter (T op = 22.3°C); however, they remain around 0.6 during the mid-seasons of spring (T op = 22.4°C) and autumn (T op = 24.2°C), Table 7. Standard Deviations (SD) of clothing values for summer, autumn, winter and spring are 0.12, 0.14, 0.09 and 0.14, respectively. SDs are higher during mid-seasons (autumn and spring) than during extreme   [21]. In this study, mean TSVs for summer, autumn, winter and spring are 0.52, 0.35, 0.36 and 0.31, respectively, and SD TSV values for the same seasons are 0.99, 1.08, 0.99 and 1.06, Table 7. Summer results in this study (T op = 26.4°C, TSV mean = 0.52 and SD TSV = 0.99) are comparable with counterpart results in Australia (T op = 25.1°C, TSV mean = 0.45 and SD TSV = 1.38) [22]. Furthermore, TSV mean and SD TSV [94]. Standard deviations of TSVs in this study are lower than their counterparts in the above-mentioned studies [22,94], suggesting the reliability of TSVs, which are clustered around the mean.
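The reliability check described above, comparing the mean and standard deviation of responses within each season, can be sketched as follows. The votes, the function name `summarise_by_group` and the use of the sample standard deviation are assumptions made for illustration; the text does not state which SD estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_group(records):
    """Return {group: (mean, sample SD)} for (group, value) pairs.

    Each group needs at least two values, because the sample standard
    deviation is undefined for a single observation.
    """
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {g: (mean(v), stdev(v)) for g, v in groups.items()}


# Hypothetical thermal sensation votes (-2..+2) tagged with the survey season.
votes = [
    ("summer", 1), ("summer", 0), ("summer", 2), ("summer", -1),
    ("winter", 0), ("winter", 1), ("winter", 0), ("winter", -1),
]

summary = summarise_by_group(votes)
for season, (avg, sd) in summary.items():
    print(f"{season}: mean = {avg:.2f}, SD = {sd:.2f}")
```

A smaller SD within a season indicates that votes given under similar conditions cluster more tightly around the seasonal mean.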

Discussion
The paper has designed a self-reported questionnaire for children aged 9-11 years to study their adaptive behaviours. The validity and reliability of the questionnaire and responses were investigated by different methods, Table 8. Results of the study show that monitoring answer-process is the main method for validating the content of all

Fig. 11. Comparing thermal sensation votes with thermal preference votes to find inconsistent votes.

children find the question on jumper/cardigan easier to respond to than the question on the fixed part of clothing. According to Table 6, the number of 'inconsistent' responses is higher on visual sensations (7%) than on indoor air quality and thermal sensation questions (less than 1%). This indicates that children have a better understanding of thermal and indoor air conditions than of visual conditions. Note that there is a true response to the factual and behavioural questions, and invalid responses can be removed by observations. However, there is no true response to sensation questions, and the validating process can only remove inconsistent responses. Reliability in this study refers to the variability of the responses under similar conditions and is evaluated by calculating standard deviations. Children's responses are expected not to vary significantly under the same environmental conditions and seasons. This expectation was met when considering the variability of responses for clothing values and TSVs.
This study demonstrates that the questionnaire is mainly effective for children's sensations and their personal adaptive behaviours rather than their environmental adaptive behaviours. This follows from the significant discrepancy between children's claims in the questionnaire (62%) and visual observations on windows (13%), highlighted in Table 4. Furthermore, the questionnaire is more effective when checkbox questions are provided rather than open-ended questions. Children and their approach to controlling the environment were observed through the designed observation form (Appendix-B). The adaptive behaviour questionnaire should be accompanied by the observation form to give a valid and reliable method for studying both personal and environmental adaptive behaviour of primary school children. Applying both observations and self-reported questionnaires to study adaptive behaviours is supported in similar studies in different contexts and climates [25,43,51-53,100]. The study provides a guide on how to use the self-reported questionnaire and observation form together (Appendix-C).
To generalize the proposed method, the results should meet another form of validity, called external validity [34]. To maximise the generalizability of the results, it is important to achieve high response rates [35], because the number of non-response items is an indicator of response quality [12,28]. In this study, all questions received a high response rate between 96% and 100%, which can be attributed to distributing paper-based questionnaires and respondents' interest in the project, as already supported in similar studies [11,38]. High response rates are also due to the clear structure of the check-box questions, scales and response categories for children. When the validity and reliability of the designed questionnaire are supported by other studies, there will be greater support for the claim of external validity [34]. This method can be generalized to other contexts and climates, considering that the observation form focuses on classrooms with different architectural features and controls under various climatic conditions and seasons.

Conclusion
The study proposes a valid and reliable method to study the adaptive behaviour of primary school children with regard to IEQ. The study designs a questionnaire based on children's cognitive and linguistic abilities, consisting of factual questions on adaptive behaviours and non-factual questions on 'here and now' environmental sensations, Appendix-A. The questionnaire was first tested during the pilot study and then run in the main field studies. The four main methods used to validate the questionnaire and responses are monitoring answer-process, non-participant observations, cross-checking questions and statistical tests. The validating process improved the wording of the questions and response categories during the pilot study. Part of the validating process removed invalid votes on the fixed and adjustable parts of clothing and removed inconsistent votes on thermal, indoor air quality and visual sensations. The validated questionnaire achieved high response rates and a high percentage of valid responses. 'Window operation' and 'open-ended' questions were removed from the questionnaire due to their low response rates and low percentage of valid responses. The developed questionnaire is mainly effective for recording children's sensations and their personal adaptive behaviours rather than environmental adaptive behaviours. To record environmental adaptive behaviours, the study introduces an observation form that needs to be completed alongside the self-reported questionnaire (Appendix-B). The combination of the self-reported questionnaire and the observation form provides a method that can be used for studying personal and environmental adaptive behaviour of primary school children with regard to IEQ. A guide is also provided on how to use the questionnaire and observation form together, Appendix-C.