Data Processing & Appraisal

Data Editing

Data Editing (see external resource entitled: Final Data Processing Report)

Questionnaires were reviewed by the controller in the field before they were dispatched for data entry. A control sheet was provided to the controllers to assist in manually editing the questionnaires. Questionnaire structures were verified when the questionnaires were checked in prior to data entry. Three contracted persons reviewed the questionnaires and filled in a form that served as a primary data control sheet. Automated data editing was largely done during the data entry phase (see "Other Data Processing" for details). Some batch edit programs were used to identify inconsistent data.

Data Imputation

Data imputation was largely done by analysts during the analysis phase. However, a "structural" imputation on the microdata was required for the own consumption data. This was done to adjust for erroneous pricing when the unit for measuring own consumption was buckets. For more information, please refer to the SPSS syntax files and the data processing report.

Primary Data Issues

Coding of products was based on sequential codes for each section. Sequential coding was used so that the indexed position of an item, which locates the record in the data processing system, corresponds to the actual row number or sequence. For the poverty study, expenditures were recoded to the EICV-1 codes. The recodes are available in the syntax files. However, a general recode to standardize commodities to a standard classification (such as COICOP) was not done.

Other Processing

Data Entry (see external resource entitled: Final Data Processing Report)

New systems and techniques were used to capture and edit the data for the EICV-2, and many improvements were made to the data entry system. The EICV-1 used the DOS-based software called IMPS for both data entry and data editing (CENTRY and CONCOR modules respectively).
In addition, EICV-1 used various short-term and intermittent consultant inputs for the design and implementation of the data processing system. The first five months of the data entry process during the EICV-1 suffered greatly from a lack of quality control. This lack of cohesive support during both the design phase and the initiation of the data processing system likely affected the quality of the data, despite attempts to correct the system mid-survey.

For the EICV-2, long-term and continuous technical support was provided by the OPM consulting firm, and better-trained, more committed local supervisors followed through in implementing and maintaining the system. More importantly, the EICV-2 data processing activities followed quickly behind the processing of the DHS (Demographic and Health Survey). It was clearly advantageous to simply adapt the DHS data processing system for the EICV-2. The DHS data processing system is a broadly used and dynamic system designed for use with the data processing software CSPro (Census and Survey Processing System). In fact, CSPro is designed with the DHS as its model survey. Furthermore, this system of managing the data processing activities is also being used by UNICEF to process the MICS.

Applying a robust system and modifying it for use during the EICV-2 saved a great deal of time and effort in training and development. The staff were already familiar with the DHS data processing and editing system, and porting it to the EICV-2 for the full duration of the survey proved very useful. Some of the specifications used by the DHS, MICS and the EICV are:

a. An integrated sample design control sheet used to check in questionnaires.
b. A data entry system designed as "system control". A system-controlled application is a very tight control system in which the data entry clerk cannot circumvent the path of data entry. The path is fully programmed and must include skips and pre-defined keys for missing, other or incoherent data.
c. Full double entry for independent verification.
d. Systematic control of data files through the sequence primary, verified, raw, edit, final.
e. Full reconstruction of the consolidated data file with the primary cluster file.
f. All corrections done at the lowest ASCII cluster level.

Enquête Intégrale sur les Conditions de Vie des Ménages 2005-2006 - Overview
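The double-entry verification in item c can be illustrated with a minimal sketch. The survey itself used CSPro for this step; the Python below is not that system, only an illustration of the idea. The function name, the record keys and the field names (such as S1Q2) are hypothetical.

```python
from typing import Dict, List, Tuple

# A keyed record: field name -> value as typed by the clerk (hypothetical layout).
Record = Dict[str, str]


def compare_double_entry(
    first: Dict[str, Record], second: Dict[str, Record]
) -> List[Tuple[str, str, str, str]]:
    """Compare two independent keyings of the same cluster.

    Returns one (record_key, field, first_value, second_value) tuple for
    every disagreement, so a supervisor can check it against the paper form.
    """
    mismatches = []
    for key in sorted(set(first) | set(second)):
        rec_a = first.get(key)
        rec_b = second.get(key)
        if rec_a is None or rec_b is None:
            # A whole record was keyed in one pass but not in the other.
            mismatches.append((
                key,
                "<record>",
                "present" if rec_a is not None else "missing",
                "present" if rec_b is not None else "missing",
            ))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            val_a = rec_a.get(field, "")
            val_b = rec_b.get(field, "")
            if val_a != val_b:
                mismatches.append((key, field, val_a, val_b))
    return mismatches


# Example: the second clerk transposed two digits in one field.
first_pass = {"0010001": {"S1Q1": "1", "S1Q2": "34"}}
second_pass = {"0010001": {"S1Q1": "1", "S1Q2": "43"}}
differences = compare_double_entry(first_pass, second_pass)
```

Because the two passes are keyed independently, a value that agrees in both is unlikely to be a keying error, which is why remaining inconsistencies can be attributed mainly to the field.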
The data entry was done centrally in the NISR headquarters. Activity was initiated in the old Census building in Remera on October 20. On December 16, 2006, the NISR consolidated its offices and moved the Census activities to its current location in the old MINIPLAN building. The move required setting up the data entry operation in the new building and transferring all machinery there. The move did not adversely affect keying operations. The remainder of the survey was keyed in the MINIPLAN building. All computers were networked in a LAN, with data copied to the supervisor machines and backed up daily.

The questionnaires were received and checked into a central repository. Data were entered by cluster (9 urban questionnaires or 12 rural questionnaires). Two archivists managed the check-in and distribution of questionnaires to the data entry supervisors. A sample of the check-in forms is provided in Annex 1. Once the questionnaires were received and logged on a control sheet, the data entry supervisors entered the control sheet in an automated control system before assigning the questionnaires to a data entry clerk. This system, maintained by the supervisors, ensured that the sample design was strictly adhered to and that the coding and tracking of the questionnaires was properly initiated and followed. It was built on the DHS control system and used CSPro to manage the flow and assignment of the questionnaires.

There was 100% independent double data entry of the questionnaires. This made it virtually certain that inconsistencies found in the data were mostly due to errors and misreported items from the field. The average time to process all three questionnaires for a cluster was 21.3 days.

Estimates of Sampling Error

Given that the survey estimates are subject to sampling variability, it is important to calculate the sampling errors for the most important estimates from each survey.
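The precision measures discussed in this section (standard error, coefficient of variation, 95 percent confidence interval and design effect) can be illustrated with a short numerical sketch. This is not CENVAR's computation: the two variances are taken as given rather than estimated from the stratified, clustered sample, and the function names, the example figures and the normal multiplier of 1.96 are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Precision:
    standard_error: float
    cv_percent: float
    ci_lower: float
    ci_upper: float
    design_effect: float


def precision_measures(
    estimate: float,
    design_variance: float,  # variance under the actual sample design
    srs_variance: float,     # variance under simple random sampling, same n
    z: float = 1.96,         # assumed normal multiplier for a 95% interval
) -> Precision:
    """Derive the usual precision measures from an estimate and its variances."""
    se = math.sqrt(design_variance)
    return Precision(
        standard_error=se,
        cv_percent=100.0 * se / estimate,
        ci_lower=estimate - z * se,
        ci_upper=estimate + z * se,
        design_effect=design_variance / srs_variance,
    )


def intervals_overlap(a: Precision, b: Precision) -> bool:
    """Conservative check: non-overlapping intervals imply a significant difference."""
    return a.ci_lower <= b.ci_upper and b.ci_lower <= a.ci_upper


# Made-up figures for two survey rounds (not taken from EICV1 or EICV2).
round_1 = precision_measures(0.60, design_variance=0.0004, srs_variance=0.0001)
round_2 = precision_measures(0.50, design_variance=0.0001, srs_variance=0.0001)
significant = not intervals_overlap(round_1, round_2)
```

In this made-up example the first round has a design effect of about 4, meaning the design variance is four times what a simple random sample of the same size would give, while the second round has a design effect of 1.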
The sampling error is measured by the standard error, or square root of the variance of the estimate. The CENVAR software, a component of the Integrated Microcomputer Processing System (IMPS) developed by the U.S. Census Bureau, was used for tabulating the standard errors and other measures of precision, taking into account the stratification and clustering in the sample design. The CENVAR output tables show the value of the estimates, standard errors, coefficients of variation, 95 percent confidence intervals, design effects and number of observations.

Given that the confidence intervals provide a user-friendly interpretation of the sampling variability, an annex was produced with tables showing the 95 percent confidence intervals for the most important estimates from the EICV1 and EICV2 data appearing in the preliminary report. These tables provide a quick conservative test of whether any difference between the EICV1 and EICV2 estimates is statistically significant: if the two intervals do not overlap, the difference is significant. The NISR was also provided with tables showing the full CENVAR results.

The design effect is defined as the variance of an estimate based on the actual sample design divided by the corresponding variance based on a simple random sample of the same size; it is a measure of the relative efficiency of the sample design. In comparing the CENVAR results from EICV1 and EICV2, it was found that the design effects are generally lower for EICV2, indicating that the stratification used for this survey was very effective. The fact that the EICV1 was based on an older sampling frame, from the 1991 Rwanda Census, also contributed to the higher design effects for the EICV1 estimates.

Accessibility

Contact(s)

Emmanuel Gatera (NISR), www.statistics.gov.rw, [email protected]

Confidentiality

Individual confidentiality and responses are secured by law. The current data set provides geographic disaggregation only to the old provincial level.
A district code is provided (new district) since there is a demand to examine results at this level. The identifying key for the household is not a geographic key. It is based on a sequential cluster number and a sequential household number.

Access Conditions

Access to the microdata at this stage is only with the permission of the NISR. The data set is currently not distributable. With some exceptions, the microdata can be accessed and used on the NISR premises. All data must remain on NISR computers.

Citation Requirements

The following citation should be provided when producing results or tables using the microdata:

"Source: National Institute of Statistics-Rwanda, Enquête Intégrale sur les Conditions de Vie des Ménages 2005-2006 (EICV 2005), Version 1.1"

Rights & Disclaimer

Disclaimer

The National Institute of Statistics-Rwanda releases these data but cannot guarantee, nor be held responsible for, any published results produced by external users.

Copyright

Copyright 2007: National Institute of Statistics of Rwanda
Files Description

Dataset contains 46 file(s)

eng_eicv2_s0_id

Cases: 6900
Variable(s): 17

File Structure
Type: relational
Key(s): KEY (Household identification)

File Content
Section 0 (eng_eicv2_s0_id): Introductory (structure: household level file): Contains introductory observations and records the response rate and replacement households as well as the dates of the interview.

Producer
NISR (National Institute of Statistics)

Version
Version 1.0

Processing Checks
Processing checks were conducted in CSPro, mostly during data entry. Where coded product lists are provided, the products have been given a sequential id to facilitate correlating each product with the internal index. No standard codes have been assigned to products across sections. Some products have been recoded to make them comparable with the EICV-1 results; the recode syntax is provided in an external file (see Study Description-Edits). At the time of archiving these data, an internal standard classification scheme had not been institutionally adopted. Each file also contains the household weight, the design strata variable, and the age and sex of the person (where applicable).

Missing Data
In most cases, missing data were coded as follows:
The code 0 was used for pre-coded questions.
The code -1 was used for expenditure and monetary-type data, as 0 is a valid reported amount.
BLANK values are out of the interview path and are not applicable.
If there are questions regarding BLANK records, refer to the stated contents of the file or to the universe description provided at the variable level.

eng_eicv2_s1_demo
Cases: 34785
Variable(s): 26

File Structure
Type: relational
Key(s): KEY (Household identification), PID (Person ID)

File Content
Section 1 (eng_eicv2_s1_demo): Demographics (structure: person level file): Contains general demographic information on the persons present in the household during the survey and determines who is a household member based on the appropriate selection criteria (see the variable description for household member for more information).

Producer
NISR (National Institute of Statistics)

Version
Version 1.0

Processing Checks
Processing checks were conducted in CSPro, mostly during data entry. Where coded product lists are provided, the products have been given a sequential id to facilitate correlating each product with the internal index. No standard codes have been assigned to products across sections. Some products have been recoded to make them comparable with the EICV-1 results; the recode syntax is provided in an external file (see Study Description-Edits). At the time of archiving these data, an internal standard classification scheme had not been institutionally adopted. Each file also contains the household weight, the design strata variable, and the age and sex of the person (where applicable).

Missing Data
In most cases, missing data were coded as follows:
The code 0 was used for pre-coded questions.
The code -1 was used for expenditure and monetary-type data, as 0 is a valid reported amount.
BLANK values are out of the interview path and are not applicable.
If there are questions regarding BLANK records, refer to the stated contents of the file or to the universe description provided at the variable level.

eng_eicv2_s2_education
Cases: 28018
Variable(s): 72

File Structure
Type: relational
Key(s): KEY (Household identification), PID (Person ID)

File Content
Section 2 (eng_eicv2_s2_education): Education (structure: person level file): Covers all household members aged 6 years and over. Contains information on school attendance (current and past), expenditures, literacy, etc.

Producer
NISR (National Institute of Statistics)

Version
Version 1.0

Processing Checks
Processing checks were conducted in CSPro, mostly during data entry. Where coded product lists are provided, the products have been given a sequential id to facilitate correlating each product with the internal index. No standard codes have been assigned to products across sections. Some products have been recoded to make them comparable with the EICV-1 results; the recode syntax is provided in an external file (see Study Description-Edits). At the time of archiving these data, an internal standard classification scheme had not been institutionally adopted. Each file also contains the household weight, the design strata variable, and the age and sex of the person (where applicable).

Missing Data
In most cases, missing data were coded as follows:
The code 0 was used for pre-coded questions.
The code -1 was used for expenditure and monetary-type data, as 0 is a valid reported amount.
BLANK values are out of the interview path and are not applicable.
If there are questions regarding BLANK records, refer to the stated contents of the file or to the universe description provided at the variable level.

eng_eicv2_s3_health
Cases: 34785
Variable(s): 72