
Labour Force Survey, Methodology © NISR, 2024

Customized allocation

Starting from one of the options described above, this allocation makes it possible to oversample households in specific domains of interest, for example urban or rural areas. However, it implies a loss of precision in the national estimates compared with the proportionate or Neyman allocations.

The most suitable allocation scheme for the new LFS sample was decided based on three criteria:

– To increase the countrywide urban sample size and improve the precision of the estimates in this domain, districts with the largest proportions of urban population were assigned larger samples. In particular, larger samples were assigned to the three districts that form Kigali (Nyarugenge, Gasabo and Kicukiro) and the six districts with the largest proportions of urban population outside Kigali (Rubavu, Musanze, Bugesera, Rwamagana, Rusizi, and Kamonyi). In addition, within these six districts, PSUs were stratified into urban and rural, and the district sample was allocated equally between these two substrata.
– The previous criterion was applied only as long as the annual sample for the remaining 21 districts still permitted estimating the unemployment ratio by district with an expected coefficient of variation (CV) of at most 15%.
– The expected coefficients of variation for the annual and quarterly estimates of the unemployment ratio were assessed for five domains: Urban, Rural, Kigali, Urban Non-Kigali, and Rwanda.

Table 1 shows the final quarterly and annual sample allocation among the 30 districts in Rwanda, including the number of PSUs and households.

Table 1. LFS Sample Size by District
(npsu/quarter: PSUs per quarter; nhh/quarter: households per quarter; nhh int/year: household interviews per year)

Province   District     npsu/quarter   nhh/quarter   nhh int/year
Rwanda                           552         6,624         26,496
Kigali     Nyarugenge             28           336          1,344
Kigali     Gasabo                 28           336          1,344
Kigali     Kicukiro               28           336          1,344
Southern   Nyanza                 16           192            768
Southern   Gisagara               12           144            576
Southern   Nyaruguru              12           144            576
Southern   Huye                   20           240            960
Southern   Nyamagabe              16           192            768
Southern   Ruhango                16           192            768
Southern   Muhanga                20           240            960
Southern   Kamonyi                24           288          1,152
Western    Karongi                16           192            768
Western    Rutsiro                12           144            576
Western    Rubavu                 24           288          1,152
Western    Nyabihu                20           240            960
Western    Ngororero              12           144            576
Western    Rusizi                 24           288          1,152
Western    Nyamasheke             12           144            576
Northern   Rulindo                16           192            768
Northern   Gakenke                12           144            576
Northern   Musanze                24           288          1,152
Northern   Burera                 16           192            768
Northern   Gicumbi                12           144            576
Eastern    Rwamagana              24           288          1,152
Eastern    Nyagatare              20           240            960
Eastern    Gatsibo                16           192            768
Eastern    Kayonza                20           240            960
Eastern    Kirehe                 12           144            576
Eastern    Ngoma                  16           192            768
Eastern    Bugesera               24           288          1,152

The expected coefficient of variation of the unemployment ratio can be derived from the sample size formula.

The following tables present the final sample sizes for five domains (Urban, Rural, Kigali, Urban Non-Kigali, and Rwanda) under the sample allocation among districts shown above.² They also indicate the expected coefficients of variation of the unemployment ratio estimate by domain. A population unemployment ratio of 6% is assumed.

Table 2. Expected Annual Sample Size of Household Interviews by Domain

Domain             nhh int/year     Domain   nhh int/year
Kigali                    4,032     Rural          17,550
Rural non-Kigali         17,147     Urban           8,946
Urban non-Kigali          5,317     Rwanda         26,496
Rwanda                   26,496

Table 3. Expected Annual Unemployment Ratio CV by Domain (under p=6%)

Domain                CV     Domain      CV
Kigali              5.9%     Rural     2.7%
Rural non-Kigali    2.7%     Urban     4.9%
Urban non-Kigali    6.4%     Rwanda    2.3%
Rwanda              2.3%

Table 4. Expected Quarterly Unemployment Ratio CV by Domain (under p=6%)

Domain                CV     Domain      CV
Kigali             11.8%     Rural     5.4%
Rural non-Kigali    5.4%     Urban     9.8%
Urban non-Kigali   12.7%     Rwanda    4.7%
Rwanda              4.7%

² File “Rwanda LFS Sample Size and Allocation.xlsx” includes the calculations and simulations performed to obtain the LFS national sample size and the sample allocation across districts and urban/rural areas.
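As a rough consistency check on Tables 3 and 4, the sketch below assumes that, with all other design parameters held fixed, the CV scales with the inverse square root of the sample size. Because a quarter carries one fourth of the annual interviews, the quarterly CV should then be about twice the annual CV. This scaling rule and the function name `quarterly_cv` are illustrative assumptions, not the formula used in the source workbook; the CV values are copied from Tables 3 and 4.

```python
import math

# Annual CVs in percent, copied from Table 3.
annual_cv = {
    "Kigali": 5.9, "Rural non-Kigali": 2.7, "Urban non-Kigali": 6.4,
    "Rural": 2.7, "Urban": 4.9, "Rwanda": 2.3,
}
# Quarterly CVs in percent, copied from Table 4.
table4_cv = {
    "Kigali": 11.8, "Rural non-Kigali": 5.4, "Urban non-Kigali": 12.7,
    "Rural": 5.4, "Urban": 9.8, "Rwanda": 4.7,
}


def quarterly_cv(cv_annual, quarters_per_year=4):
    """Assumed rule: CV is proportional to 1/sqrt(n), and n_year = 4 * n_quarter."""
    return cv_annual * math.sqrt(quarters_per_year)


for domain, cv in annual_cv.items():
    approx = quarterly_cv(cv)
    print(f"{domain:18s} approx {approx:5.1f}%   Table 4: {table4_cv[domain]:5.1f}%")
```

Under this assumption the approximated values stay within 0.1 percentage points of Table 4, which is compatible with rounding of the published annual CVs.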

1.8 Panel Rotation Scheme

The LFS sample will have a 2-2-2 panel rotation scheme. This means that the sample in each stratum is randomly distributed into four equal-size groups, called “rotation groups”, and each of these groups is randomly split into equal-size subgroups, called “panels”. Thus, each PSU in the sample and the households in it are randomly allocated to a panel that will be part of the LFS sample for two consecutive quarters, leave the sample for the following two quarters, and then return to the sample for another two quarters. After this, the panel PSU and the households in it will leave the sample for good and be replaced by a new panel of PSUs and households; this is why these are referred to as “rotating panels”. In sum, each household in the LFS sample will be visited for the first time in one specific quarter and for the fourth and last time eighteen months later.

In practice, to construct the rotation groups, the sampling frame of PSUs in each stratum is first split randomly into four equal-size subsamples (the rotation groups), each of them representative of the stratum population. Next, equal-size subsamples of PSUs (the panels) are selected within each rotation group independently.

The LFS 2-2-2 rotation scheme leads to a 50% sample overlap between two consecutive quarters and a 50% overlap between the same quarter of two successive years. As a result of the panel rotation pattern, every quarter the LFS sample will be composed of one part including households from the previous quarter and another part formed by households not included in the previous quarter.

The diagram below shows the LFS sample corresponding to each quarter in the columns, and the four rotation groups and their panels in the rows. Each panel has a different color. For example, the Quarter 3 sample in year y+1 is formed by four panels of PSUs and households corresponding to the four rotation groups.
The panels in rotation groups B and D (panels B2 and D3) were also included in Quarter 2 of the same year y+1, so 50% of the Quarter 3 sample overlaps with the Quarter 2 sample within year y+1. The diagram also shows that in Quarter 3 of year y+1, the panels in rotation groups A and B (panels A2 and B2) were also included in Quarter 3 of the previous year y, so 50% of the Quarter 3 sample of y+1 also overlaps with the Quarter 3 sample of the previous year y.

Figure 1. LFS Sample 2-2-2 Panel Rotation Scheme
[Diagram: rotation groups RG A to RG D in the rows; quarters Q1 to Q4 of Year y, Year y+1 and Year y+2 in the columns; each cell shows the panel in the sample, e.g. Panel A2, Panel B2, Panel D3.]

Also, when the LFS samples corresponding to the four quarters in a year are aggregated into a larger yearly sample, the resulting 26,496 household interviews will include panel households interviewed more than once over the year. In other words, the interviews within a year do not all come from distinct households, because most households are interviewed more than once within the year. Specifically, under the 2-2-2 rotation scheme, any given year will comprise 9 independent panels of PSUs and households. Seven of them will be included in two quarters within the year and the other two will be included just once during that year. As a result, each PSU and the households within it will be in the annual sample an average of $(7 \times 2 + 2 \times 1)/9 = 16/9 \approx 1.78$ times. This means that the 26,496 interviews throughout a year will come from $26{,}496 \times 9/16 = 14{,}904$ unique panel households.

Annex 1 details the complete LFS schematic 2-2-2 rotation scheme. This scheme was created by combining all sample PSUs scheduled for implementation from Quarter 1 of 2024 to Quarter 4 of 2034 and aligning each rotation group with its respective panel.

The objective of seeking overlapping samples over successive survey rounds as part of the sampling design is to achieve as large a positive correlation as possible for the variables of interest between consecutive quarters or quarters one year apart. This correlation reduces the variance of the estimator of the difference of a given variable between two quarters. The expression below shows how the sampling variance of the change estimator for variable $Y$ is reduced (and thus the precision of the change estimate increased) due to the existing correlation between $\hat{Y}_1$ and $\hat{Y}_2$, which is in part determined by the magnitude of the overlap between the two successive samples. Since respondents in a panel are the same over the consecutive samples, the covariance between $\hat{Y}_1$ and $\hat{Y}_2$ is expected to be non-zero. The larger the overlap between samples in two quarters, the higher the covariance and the larger the precision of the change estimate.

$$\mathrm{Var}(\hat{\Delta}) = \mathrm{Var}(\hat{Y}_2 - \hat{Y}_1) = \mathrm{Var}(\hat{Y}_1) + \mathrm{Var}(\hat{Y}_2) - 2\,\mathrm{Cov}(\hat{Y}_1, \hat{Y}_2)$$

where $\hat{Y}_1$ indicates the point estimate of variable $Y$ in sample 1, corresponding to time point 1; $\hat{Y}_2$ is the point estimate of variable $Y$ in sample 2, corresponding to time point 2; and $\hat{\Delta} = \hat{Y}_2 - \hat{Y}_1$ denotes the point estimate of the change of variable $Y$ between time points 1 and 2.

However, even if maintaining a fixed panel sample over time (i.e. keeping a complete overlap between successive samples) would yield the largest precision of the change estimates, this is not done in official surveys with numerous consecutive rounds, as it would result in increasing levels of attrition over time because of respondent fatigue. This is why official regular surveys like the Rwanda LFS use rotating panels, where households are re-interviewed for a limited number of rounds.
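The 2-2-2 pattern described in this section can be illustrated with a minimal Python sketch. It assumes a steady state in which one new panel enters the sample every quarter and identifies each panel by its entry quarter; the function names are hypothetical, and the mapping of panels to rotation groups A to D is not modelled.

```python
def quarters_in_sample(start):
    """Quarters in which a panel entering at quarter `start` is interviewed
    under a 2-2-2 pattern: two quarters in, two out, two in."""
    return [start, start + 1, start + 4, start + 5]


def panels_in_quarter(t):
    """Entry quarters of the panels interviewed in quarter t,
    assuming one new panel enters every quarter."""
    return {s for s in range(t - 5, t + 1) if t in quarters_in_sample(s)}


def overlap(t1, t2):
    """Share of the quarter-t2 panels that were also interviewed in quarter t1."""
    shared = panels_in_quarter(t1) & panels_in_quarter(t2)
    return len(shared) / len(panels_in_quarter(t2))


t = 20                      # any quarter far enough from the survey start
visits = {}                 # panel entry quarter -> interviews within one year
for q in range(t, t + 4):   # four consecutive quarters forming a year
    for s in panels_in_quarter(q):
        visits[s] = visits.get(s, 0) + 1

print("panels per quarter:", len(panels_in_quarter(t)))
print("overlap, consecutive quarters:", overlap(t, t + 1))
print("overlap, same quarter one year apart:", overlap(t, t + 4))
print("panels in the year:", len(visits), "visit counts:", sorted(visits.values()))
print("average visits per panel in the year:", sum(visits.values()) / len(visits))
```

Under these assumptions the sketch reproduces the figures quoted in the text: four panels per quarter, 50% overlap in both comparisons, and nine panels per year, of which seven are interviewed twice and two once.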
Annex 4 includes the Stata code for testing the change of a variable between any two LFS waves, accounting for both the panel overlap and the sample design features. The test output shows the point estimate of the change and the corresponding standard error, t-score, p-value, and 95% confidence interval.

1.9 Sample Selection

The new Rwanda LFS sample has a two-stage stratified probability design. It includes 36 strata, formed by 24 entire districts and 6 districts partitioned into urban and rural areas (Rubavu, Musanze, Bugesera, Rwamagana, Rusizi and Kamonyi). Each stratum is divided into four equal-sized rotation groups.

In the first sampling stage, the PSUs are census EAs, or groups of EAs. PSUs were selected independently within each rotation group with probabilities proportionate to their size (PPS), using the number of households reported by the 2022 Census as the measure of size. The selection probability of PSU $i$ in rotation group $g$ in stratum $h$ is

$$P_{hgi} = \frac{n_{hg}\, M_{hgi}}{\sum_{i} M_{hgi}}$$

where $M_{hgi}$ represents the number of households registered by the 2022 Census in PSU $i$ in rotation group $g$ in stratum $h$, $n_{hg}$ is the number of PSUs selected in the rotation group and stratum, and the sum in the denominator runs over all PSUs in the frame of that rotation group and stratum.

In the second sampling stage, a subsample of households was drawn within each of the PSUs selected in the first stage. Specifically, a listing operation was implemented in each PSU to update the census list of households, and a sample of households was then selected with systematic sampling. The selection probability of household $j$ in PSU $i$, rotation group $g$ and stratum $h$ is

$$P_{j|hgi} = \frac{m}{M'_{hgi}}$$

where $m$ is the number of households selected in the PSU and $M'_{hgi}$ denotes the number of households listed in PSU $i$ in rotation group $g$ and stratum $h$. The subsample size $m$ is fixed across all sample PSUs. Then, the final selection probability of a household is³

$$P_{hgij} = P_{hgi} \times P_{j|hgi} = \frac{n_{hg}\, M_{hgi}}{\sum_{i} M_{hgi}} \times \frac{m}{M'_{hgi}}$$

³ In PSUs where the household count from the listing operation is the same as the census household count ($M'_{hgi} = M_{hgi}$), the household final selection probability simplifies to $P_{hgij} = n_{hg}\, m / \sum_{i} M_{hgi}$. If this scenario held in all PSUs, then the final selection probability of all households would be the same and the design would be self-weighting. However, as the census data gets outdated over time, it is expected that $M'_{hgi}$ will diverge from $M_{hgi}$, so the household final selection probabilities will vary more across PSUs. As a result, some precision will be lost for the survey estimates in exchange for a reduction in coverage error due to the listing conducted every quarter.

In every sample household in the LFS, all individuals are interviewed, either directly or through a proxy respondent. Therefore, the selection probability of an individual within their household is 1, and their final selection probability is the same as that of the household.

1.10 Weighting

The estimation of population parameters from a probability sample is based on the premise that each sample unit represents a certain number of other units in the population in addition to itself. For example, a specific household with a sampling weight of 210 represents itself and another 209 households in the population. Correspondingly, the total number of units in the population with a given characteristic is estimated by summing the weights of units in the sample with that characteristic.

The LFS, as most household surveys, will use a weight equal to the inverse of the probability of selection of each unit. Thus, the base weight (or design weight) of a sample household is the inverse of its final selection probability, $d_{hgij} = 1 / P_{hgij}$. As with the final selection probabilities, the base weight of any sample individual is equal to the weight of the household to which it belongs. In practice, however, household and individual base weights are often modified for numerous reasons and are not directly used to obtain the survey estimates.

In addition to sampling error (i.e., the error due to working with a sample as opposed to the entire population), every survey is subject to different types of nonsampling error, such as nonresponse, underrepresentation of specific population groups, measurement error, etc. Sampling and nonsampling errors combined contribute to the total survey error affecting the estimates. Nonsampling errors are hard to quantify and are generally systematic, i.e. non-random, introducing some degree of bias to the survey estimates.

Minimizing the effect of the different sources of nonsampling error on the survey estimates is one of the central subjects of survey methodology. This is achieved by keeping sampling frames complete and up-to-date, testing the questionnaire design, assessing the field protocols in pilot tests, training enumerators sufficiently, visiting households several times and at different times of day until they are reached, supervising interviewers closely, and having a comprehensive quality control system in place to allow for corrective actions as soon as any issue in the field is detected. Nonetheless, nonsampling errors cannot be entirely eliminated even when all these measures are taken. In particular, potential bias from undercoverage and nonresponse can be addressed and reduced, though not eliminated, through weighting adjustments.
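The two-stage selection probabilities of the Sample Selection section and the base weight defined above can be sketched in Python as follows. All function names and input values are hypothetical; only the take of 12 households per PSU is implied by Table 1 (6,624 households in 552 PSUs per quarter).

```python
def psu_probability(n_psu, census_hh, census_hh_total):
    """First stage: PPS selection probability of a PSU, with the census
    household count as the measure of size."""
    return n_psu * census_hh / census_hh_total


def household_probability(m, listed_hh):
    """Second stage: probability of selecting one of the listed households
    when m households are drawn systematically."""
    return m / listed_hh


def base_weight(n_psu, census_hh, census_hh_total, m, listed_hh):
    """Base (design) weight: inverse of the final selection probability."""
    p_final = (psu_probability(n_psu, census_hh, census_hh_total)
               * household_probability(m, listed_hh))
    return 1.0 / p_final


# Hypothetical rotation group: 4 PSUs selected, 12,000 census households in
# the frame, and a PSU with 150 census households but 180 listed households.
w_outdated = base_weight(n_psu=4, census_hh=150, census_hh_total=12000,
                         m=12, listed_hh=180)
# Same PSU if the listing had matched the census count exactly.
w_matching = base_weight(n_psu=4, census_hh=150, census_hh_total=12000,
                         m=12, listed_hh=150)

print(round(w_outdated, 2))  # weight when the PSU has grown since the census
print(round(w_matching, 2))  # weight in the self-weighting case
```

When the listing count equals the census count, the weight depends only on the frame total, the number of PSUs and the household take, which is the self-weighting case described in the footnote on final selection probabilities.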
1.11 Weighting Nonresponse Adjustment

Nonresponse occurs when an eligible household⁴ is selected as part of the sample but cannot be interviewed, for example because all household members refuse to cooperate, household members are temporarily absent during the survey period, or the interview started but could not be completed.⁵

Every household survey should try to reduce nonresponse as much as possible. Common practices include training enumerators in techniques for persuading respondents who initially refuse to participate, and revisiting households several times when nobody is at home before discarding them. Even so, nonresponse is an ever-present phenomenon in any survey and a source of potential bias. Though the magnitude of the bias due to nonresponse is generally unknown, it is related to the level of nonresponse and to the difference in the characteristics under study between respondent and nonrespondent households.

The base weights described in the previous section were adjusted to compensate for potential nonresponse bias using a class-based adjustment, with each PSU acting as a class. The weighting class nonresponse adjustment is the inverse of the weighted response rate estimated in each PSU, that is, the ratio of the sum of the base weights of all sample households (respondents and nonrespondents) to the sum of the base weights of respondents in that PSU. It is given by

$$a_{hgi} = \frac{\sum_{j \in R} d_{hgij} + \sum_{j \in NR} d_{hgij}}{\sum_{j \in R} d_{hgij}}$$

where $a_{hgi}$ is the adjustment factor applied to each responding household in PSU $i$ in rotation group $g$ and stratum $h$ to compensate for nonresponse. Subindices $R$ and $NR$ indicate the respondent and nonrespondent households respectively, and $d_{hgij}$ is the base weight of household $j$.

⁴ Eligible households are those that are part of the survey target population, i.e. excluding vacant dwellings, business stores, etc.
⁵ The total nonresponse rate in the LFS has usually been below 5%.

Finally, the nonresponse-adjusted weight of each respondent household $j$ in PSU $i$ is

$$d'_{hgij} = a_{hgi} \times d_{hgij}$$
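The class-based nonresponse adjustment described in this section can be sketched in Python as follows, with one PSU acting as the weighting class. The function names and the example PSU are hypothetical. Giving nonrespondents a zero weight and raising an error for a PSU without respondents are assumptions of this sketch; the source does not say how such PSUs are handled.

```python
def nonresponse_factor(base_weights, responded):
    """Inverse of the weighted response rate in one PSU: sum of base weights
    of all eligible sample households over the sum for respondents."""
    total = sum(base_weights)
    respondents = sum(w for w, r in zip(base_weights, responded) if r)
    if respondents == 0:
        # Assumption: handling of PSUs with no respondents is outside this sketch.
        raise ValueError("no respondents in this PSU")
    return total / respondents


def adjust_weights(base_weights, responded):
    """Nonresponse-adjusted weights; nonrespondents are assigned zero."""
    factor = nonresponse_factor(base_weights, responded)
    return [w * factor if r else 0.0 for w, r in zip(base_weights, responded)]


# Hypothetical PSU: 12 sampled households with base weight 300, one nonrespondent.
base = [300.0] * 12
flags = [True] * 11 + [False]
adjusted = adjust_weights(base, flags)

print(round(nonresponse_factor(base, flags), 4))   # adjustment factor
print(round(sum(adjusted), 2), sum(base))          # weight totals after and before
```

The adjustment redistributes the weight of nonrespondents among respondents of the same PSU, so the sum of weights in the PSU is preserved.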

1.12 Weighting Calibration Adjustment

After obtaining the nonresponse-adjusted weights, these were calibrated to known population projections for four demographic groups living in private households: males under 16 years old, females under 16 years old, males 16 years old and over, and females 16 years old and over. The population projections were derived from the NISR census publication.⁶ The projections were adjusted by deducting estimated values for the institutional population not living in private households.

The calibration procedure followed the methodology of Deville and Särndal.⁷ Accordingly, the final calibrated weights were obtained from the formula

$$\text{CalibratedWeight}(hh_k) = w_k = d'_k \times (1 + \lambda x'_k)$$

where $d'_k$ is the weight adjusted for nonresponse, $\lambda$ is a regression vector obtained from the calibration formula, and $x'_k$ is the vector of the counts of males under 16 years old, males 16 years old and over, females under 16 years old, and females 16 years old and over for interviewed households in enumeration area $k$. All individuals in the same household are assigned the weight of the household to which they belong.

⁶ National Institute of Statistics of Rwanda, Fifth Population and Housing Census, Rwanda, 2022, Thematic Report Population Projections, July 2023.
⁷ Deville, J.C., and Särndal, C.E., “Calibration Estimators in Survey Sampling,” Journal of the American Statistical Association, Vol. 87, 1992, pp. 376-382.

1.13 Estimation

Estimation from survey data is the inferential process of obtaining estimates or approximations to unknown population parameters. Let $w_{hgij}$ be the final weight of household $j$ in PSU $i$ in rotation group $g$ in stratum $h$, that is, the base weight adjusted for nonresponse and calibrated. Final weights allow the creation of simple expressions for the estimators. If $Y$ and $Z$ are two survey variables of interest measured at the household level, their most commonly used estimators are totals, means and ratios.

Total: $$\hat{Y} = \sum_{h} \sum_{g} \sum_{i=1}^{n_{hg}} \sum_{j=1}^{m_{hgi}} w_{hgij}\, y_{hgij}$$

Mean: $$\hat{\bar{Y}} = \frac{\hat{Y}}{\hat{W}}, \qquad \hat{W} = \sum_{h} \sum_{g} \sum_{i=1}^{n_{hg}} \sum_{j=1}^{m_{hgi}} w_{hgij}$$

Ratio: $$\hat{R} = \frac{\hat{Y}}{\hat{Z}}$$

where $y_{hgij}$ is the value of $Y$ for household $j$; $n_{hg}$ is the number of PSUs selected in rotation group $g$ in stratum $h$, with $g = 1, \dots, 4$ and $h = 1, \dots, 36$; and $m_{hgi}$ is the number of households selected in PSU $i$ in rotation group $g$ in stratum $h$.

1.14 Sampling Error Estimation

As mentioned above, the LFS has a complex sample design, i.e., a design featuring stratification, clustering, and unequal selection probabilities due to the disproportionate sample allocation across strata adopted to improve the precision of the survey estimates in some districts and urban areas. Such sample design features make unequal base weights necessary which, added to the adjustments introduced after fieldwork, result in weights with a certain degree of variability.

When estimating sampling errors (through the sampling variance, standard errors, confidence intervals or coefficients of variation) for statistics such as means, proportions, ratios and regression parameters, all features of the complex LFS sample design must be accounted for. If they are not, standard statistical software will “assume” that the sample is a simple random sample (SRS), resulting in biased estimates and unrealistically low sampling errors.
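The total, mean and ratio estimators of the Estimation section can be sketched in Python as follows. The function names and the four-household data set are hypothetical; `y` stands for the number of unemployed persons and `z` for the number of persons in the labour force in each household, both chosen only for illustration.

```python
def weighted_total(weights, values):
    """Estimated population total: sum of final weight times household value."""
    return sum(w * v for w, v in zip(weights, values))


def weighted_mean(weights, values):
    """Estimated mean: weighted total divided by the sum of the weights."""
    return weighted_total(weights, values) / sum(weights)


def weighted_ratio(weights, y_values, z_values):
    """Estimated ratio of two totals."""
    return weighted_total(weights, y_values) / weighted_total(weights, z_values)


# Hypothetical final weights and household-level values.
w = [250.0, 300.0, 300.0, 400.0]
y = [0, 1, 0, 2]   # unemployed persons in the household
z = [2, 2, 1, 3]   # persons in the labour force in the household

print(weighted_total(w, y))                 # estimated total of Y
print(round(weighted_mean(w, y), 3))        # estimated mean of Y per household
print(round(weighted_ratio(w, y, z), 3))    # estimated ratio Y / Z
```

Only the point estimates are computed here; their sampling errors require the design information discussed in the Sampling Error Estimation section.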
If the sample design is ignored, standard errors and coefficients of variation are underestimated, confidence intervals are erroneously narrow, and test statistics are biased.

The two most common approaches to estimating sampling errors for complex sample data are 1) Taylor series linearization (TSL) of the estimator and the corresponding approximation of its variance, or 2) replication variance estimation techniques, such as jackknife repeated replication (JRR), bootstrapping, and balanced repeated replication (BRR). Stata, SAS and other statistical software packages use the ultimate cluster estimator and the TSL method by default to estimate sampling errors for complex sample data. Annex 3 indicates the Stata syntax that should be used when analyzing the LFS data to account for its sample design features and weighting.

Under the TSL method, the sampling variance of the mean or proportion of variable $Y$ can be approximated as

$$\mathrm{var}(\hat{\bar{Y}}) \approx \frac{1}{\hat{W}^2}\left[\mathrm{var}(\hat{Y}) + \hat{\bar{Y}}^2\,\mathrm{var}(\hat{W}) - 2\,\hat{\bar{Y}}\,\mathrm{cov}(\hat{Y}, \hat{W})\right]$$

where $\hat{\bar{Y}} = \hat{Y}/\hat{W}$ is the estimator of the mean or proportion of variable $Y$, $\hat{Y}$ is the weighted total of variable $Y$, and $\hat{W}$ is the total of the sampling weights (the sum of the weights). The sampling variances and covariances of totals $\hat{Y}$ and $\hat{W}$ are estimated using the ultimate cluster estimator, a set of simple formulae that require only knowledge of the totals by PSU.
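The TSL approximation with the ultimate cluster estimator can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of Annex 3: it uses the common with-replacement form of the ultimate cluster estimator without finite population correction, treats each stratum as the variance stratum, requires at least two PSUs per stratum, and uses hypothetical data and function names.

```python
def cluster_cov(a, b):
    """Ultimate cluster covariance of two estimated totals.
    a, b: dict mapping stratum -> list of PSU-level weighted totals."""
    cov = 0.0
    for h in a:
        n = len(a[h])                      # PSUs in the stratum (needs n >= 2)
        mean_a = sum(a[h]) / n
        mean_b = sum(b[h]) / n
        cov += n / (n - 1) * sum((x - mean_a) * (y - mean_b)
                                 for x, y in zip(a[h], b[h]))
    return cov


def tsl_mean_variance(data):
    """data: dict stratum -> list of PSUs, each PSU a list of (weight, y).
    Returns the weighted mean and its linearized variance."""
    ty = {h: [sum(w * y for w, y in psu) for psu in psus] for h, psus in data.items()}
    tw = {h: [sum(w for w, _ in psu) for psu in psus] for h, psus in data.items()}
    total_y = sum(sum(v) for v in ty.values())
    total_w = sum(sum(v) for v in tw.values())
    ybar = total_y / total_w
    var = (cluster_cov(ty, ty) + ybar ** 2 * cluster_cov(tw, tw)
           - 2 * ybar * cluster_cov(ty, tw)) / total_w ** 2
    return ybar, var


# Hypothetical sample: two strata, (final weight, 0/1 indicator) per household.
data = {
    "urban": [[(250, 1), (250, 0), (250, 0)],
              [(300, 0), (300, 0), (300, 1)],
              [(280, 1), (280, 1), (280, 0)]],
    "rural": [[(400, 0), (400, 0), (400, 0)],
              [(420, 0), (420, 1), (420, 0)]],
}

ybar, var = tsl_mean_variance(data)
print(round(ybar, 4), round(var ** 0.5, 4))   # estimated proportion and standard error
```

The same result can be obtained by applying the ultimate cluster estimator to the PSU totals of the linearized variable $w(y - \hat{\bar{Y}})/\hat{W}$, which is a convenient cross-check of an implementation.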
