en-1707152307-Quality Guideliness 26 November 2012(SMSC).pdf

A Handbook of Quality Guidelines for Statistical Production in Tanzania

3.8.1.2 Items to be coded
3.8.1.3 Coding protocol (manual or automatic)
3.8.1.4 Percentage of manually coded cases to be check coded
3.8.1.5 Data editing protocol
3.8.1.6 Appropriate statistical software
3.8.1.7 Appropriate statistical adjustments (e.g. imputation, weights)
3.8.1.8 Appropriate standard error estimation

3.8.2 Guidelines

3.8.2.1
Use coding to classify survey responses into categories with associated numeric values. This can be done as follows:

3.8.2.1.1 Review survey answers for response patterns and make any necessary modifications to the pre-coded response options so that they accurately represent the range of the collected data. Use the same review to create codes for each variable that was not pre-coded. Create code structures systematically as follows:

a. Design the code framework with the following attributes:
(i) One value for each code number
(ii) A text label for each code number
(iii) A code number for each possible response category (remember to include code numbers for item-missing data, e.g. “Don’t know,” “Refused,” and “Not Applicable”)
(iv) Mutually exclusive response categories for each variable
(v) The appropriate number of categories to meet the analytic purpose.

b. With hierarchical code structures, have the first character represent the main coding category, with subsequent characters representing subcategories.

c. Use consistent codes across survey items. For example:
(i) A “Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree” scale would always have values ranging from 1 = Strongly Agree to 5 = Strongly Disagree.
(ii) A “Yes-No” item would always have the values 1 = Yes and 2 = No.

(iii) Refused item-missing data would always have the value 9 (or, with two-digit code numbers, the value 99).

d. Keep a link from the codes to the verbatim data to facilitate quality control.
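
The attributes listed under 3.8.2.1.1 can be sketched as a small code frame with a validation helper. This is a minimal illustration, not a prescribed implementation: the names `AGREE_SCALE`, `YES_NO`, `validate_code_frame` and `main_category` are hypothetical, and the code 8 for "Don't know" is an assumption (the text fixes only 9 for "Refused").

```python
# Hypothetical code frames. Only the 1-5 scale, Yes/No = 1/2 and Refused = 9
# come from the guideline text; code 8 is assumed for illustration.
AGREE_SCALE = {
    1: "Strongly Agree",
    2: "Agree",
    3: "Neither Agree nor Disagree",
    4: "Disagree",
    5: "Strongly Disagree",
    8: "Don't know",
    9: "Refused",
}
YES_NO = {1: "Yes", 2: "No", 8: "Don't know", 9: "Refused"}


def validate_code_frame(frame):
    """Return a list of problems found in a {code number: text label} mapping."""
    problems = []
    seen_labels = set()
    for code, label in frame.items():
        if not isinstance(code, int):
            problems.append(f"code {code!r} is not a number")
        text = (label or "").strip()
        if not text:
            problems.append(f"code {code!r} has no text label")
            continue
        # The same label under two codes means the categories overlap.
        if text.lower() in seen_labels:
            problems.append(f"label {text!r} appears under more than one code")
        seen_labels.add(text.lower())
    return problems


def main_category(hierarchical_code):
    """The first character of a hierarchical code identifies the main category."""
    return str(hierarchical_code)[0]
```

Storing each frame as a mapping gives one label per code number by construction, and reusing the same code for "Refused" in every frame keeps the codes consistent across items.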

3.8.2.1.2 Generate a data dictionary entry for each survey item. Each entry should contain the following information:

(i) Variable ID, name, and label.
(ii) Data format.
(iii) Response options and associated code numbers.
(iv) Universe statement.
(v) Interviewer and respondent instructions.
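
As an illustration, a data dictionary entry with these five elements could be represented as follows. The class name, field names and the example item are all hypothetical; they are not taken from any actual survey.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """One data dictionary entry per survey item (field names are assumptions)."""
    variable_id: str
    name: str
    label: str
    data_format: str
    response_options: dict   # code number -> text label
    universe: str            # who is asked this item
    instructions: str = ""   # interviewer and respondent instructions

    def accepts(self, value):
        """True if the value is one of the defined code numbers."""
        return value in self.response_options


# Hypothetical example entry.
ENTRY = DictionaryEntry(
    variable_id="Q12",
    name="owns_radio",
    label="Household owns a radio",
    data_format="integer, width 1",
    response_options={1: "Yes", 2: "No", 9: "Refused"},
    universe="All interviewed households",
    instructions="Interviewer: ask the head of household.",
)
```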

3.8.2.1.3 Building upon the data dictionary, develop a code-book that describes how the survey responses are associated with all the data. The code-book includes additional metadata on the survey items, such as the question text and the raw frequency of responses.

3.8.2.1.4 For automated coding, feed the responses into a computer with software that assigns appropriate code numbers by matching the responses to a data dictionary.
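
A minimal sketch of dictionary-based automated coding is shown below, assuming exact matching after simple normalisation. The lookup table `OCCUPATION_CODES` and its codes are hypothetical. Responses that do not match are returned separately so that they can be coded manually.

```python
# Hypothetical lookup from normalised verbatim answers to code numbers.
OCCUPATION_CODES = {
    "farmer": 1,
    "teacher": 2,
    "fisherman": 3,
}


def normalise(text):
    """Lower-case the answer and collapse surrounding or repeated spaces."""
    return " ".join(text.lower().split())


def auto_code(responses, lookup):
    """Return ({verbatim: code}, [verbatims left for manual coding])."""
    coded, for_manual_coding = {}, []
    for response in responses:
        code = lookup.get(normalise(response))
        if code is None:
            for_manual_coding.append(response)
        else:
            # Keying by the verbatim answer keeps the link back to the raw data.
            coded[response] = code
    return coded, for_manual_coding
```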

3.8.2.1.5 Properly train coders on the study’s coding design, and periodically assess their abilities.

3.8.2.2 Capture the data into an electronic format. This can be done as follows:

3.8.2.2.1 Use similar conventions in programming the data entry application as used when programming the survey instrument application. For example, maintain the question order and the measurement units of the survey in the data entry system.

3.8.2.2.2 When entering values, allow for interviewer/keyer edit checks to reduce processing error.

3.8.2.2.3 With a paper-and-pencil questionnaire, minimize the amount of interviewer judgment required by having an expert, such as a supervisor, check the responses before data entry. The expert should mark the questionnaire with the value to be entered when the response is not clearly indicated.

3.8.2.2.4 Perform independent rekey verification.
(a) Have two keyers work separately and then compare their work.
(b) Settle the discrepancies with a computer or an adjudicator.
(c) Strive to verify 100% of the data entry.
(d) Look for the following keyer errors:
(i) Wrong column/field
(ii) Corrected/modified (misspelled) responses.
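
The comparison step in rekey verification can be sketched as follows. The function name and the record layout (record ID mapped to field values) are assumptions made for illustration. Every field where the two keyers disagree is listed for adjudication.

```python
def compare_keyings(first, second):
    """Compare two independent keyings of the same questionnaires.

    first, second: {record_id: {field: value}} as entered by each keyer.
    Returns a list of (record_id, field, value_keyer_1, value_keyer_2).
    """
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        rec1 = first.get(record_id, {})
        rec2 = second.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                discrepancies.append((record_id, field, v1, v2))
    return discrepancies
```

A record or field keyed by only one of the two keyers is also reported, with `None` standing for the missing value.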

3.8.2.2.5 Consider automated alternatives to key entry, including:
(i) Optical Character Recognition (OCR) to read machine-generated characters.
(ii) Intelligent Character Recognition (ICR), commonly known as scanning, to interpret handwriting.
(iii) Mark Character Recognition (MCR) to detect markings.

3.8.2.3 Edit the data as a final check for errors as follows:

3.8.2.3.1 Create editing rules that the interviewers and editing staff can follow both during and after data collection. This can include checking for the following:
(i) Wild values (such as out-of-range responses, unspecified response categories, etc.)
(ii) Imbalanced values (e.g. subcategories that do not sum to the aggregate)
(iii) Inconsistent values (e.g. males who report pregnancies)
(iv) Implausible outliers (e.g. extremely high or low values)
(v) Entirely blank variables.
(vi) Confirming the proper flow of skip patterns.
(vii) Flagging omitted or duplicated records.
(viii) Ensuring a unique identification number for every sample element, as well as a unique identification number for each interviewer.
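
A few of these editing rules can be expressed as simple programmatic checks. The sketch below is illustrative only: the field names (`hh_id`, `sex`, `pregnant`, `food_exp`, `nonfood_exp`, `total_exp`) and the codes 1 = male, 2 = female, 1 = pregnant are assumptions, not part of any actual questionnaire.

```python
from collections import Counter

VALID_SEX = {1, 2}  # assumed coding: 1 = male, 2 = female


def check_record(rec):
    """Return the list of edit flags raised by one record."""
    flags = []
    # Wild value: response outside the defined categories.
    if rec.get("sex") not in VALID_SEX:
        flags.append("wild value: sex")
    # Inconsistent value: a male reporting a pregnancy.
    if rec.get("sex") == 1 and rec.get("pregnant") == 1:
        flags.append("inconsistent: male reports pregnancy")
    # Imbalanced value: subcategories that do not sum to the aggregate.
    if rec.get("food_exp", 0) + rec.get("nonfood_exp", 0) != rec.get("total_exp", 0):
        flags.append("imbalance: expenditure parts do not sum to total")
    return flags


def duplicated_ids(records):
    """Return identification numbers that occur on more than one record."""
    counts = Counter(rec["hh_id"] for rec in records)
    return sorted(hh_id for hh_id, n in counts.items() if n > 1)
```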

3.8.2.3.2 Create a flag that indicates that a change has been made to the collected data, and keep an unedited dataset in addition to the corrected dataset. Comparing the two helps to decide whether the editing process adds value. If the unedited data are not kept, it is impossible to establish whether or not improvements have been made.
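
One way to keep the unedited data while flagging changes is sketched below. The function names and the `edit_flag` field are hypothetical. Corrections are applied to a copy, so the raw dataset is never modified.

```python
import copy


def apply_corrections(raw, corrections):
    """Apply corrections to a copy of the raw data and flag changed records.

    raw: list of record dicts (left untouched).
    corrections: list of (record_index, field, new_value).
    """
    edited = copy.deepcopy(raw)
    for rec in edited:
        rec["edit_flag"] = 0
    for index, field, new_value in corrections:
        if edited[index][field] != new_value:
            edited[index][field] = new_value
            edited[index]["edit_flag"] = 1
    return edited


def edit_rate(edited):
    """Share of records changed during editing."""
    return sum(rec["edit_flag"] for rec in edited) / len(edited)
```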

3.8.2.3.3 Assess a random sample of each interviewer’s completed questionnaires by examining the captured data. Review the use of skip patterns and the frequency of item-missing data to see if the interviewer needs additional training on navigating the instrument or probing for complete answers.

3.8.2.4 Develop survey weights for each interviewed element on the sampling frame.
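
The guideline does not prescribe a weighting method. As one common approach, assumed here purely for illustration, the base weight can be taken as the inverse of the selection probability and then inflated within an adjustment class to compensate for unit nonresponse. The function names are hypothetical.

```python
def base_weight(selection_probability):
    """Design weight: the inverse of the element's probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def nonresponse_adjusted(base_weights, responded):
    """Adjust weights within one adjustment class.

    Respondents' weights are inflated so that they also carry the weight of
    the nonrespondents in the class; nonrespondents receive a weight of zero.
    """
    total = sum(base_weights)
    respondent_total = sum(w for w, r in zip(base_weights, responded) if r)
    if respondent_total == 0:
        raise ValueError("no respondents in this adjustment class")
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(base_weights, responded)]
```

The adjusted weights sum to the same total as the base weights, so the weighted sample still represents the full class.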

3.8.2.5 Consider using single or multiple imputation to compensate for item-missing data. Single imputation replaces each missing item with a single value based on the distribution of the non-missing data or on auxiliary data. Multiple imputation replaces each missing item with several plausible values, with the goal of reflecting the additional uncertainty that imputed values carry compared with observed values, which single imputation understates.
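
The difference between the two approaches can be sketched as follows. This is a simplified illustration under stated assumptions: single imputation is shown as mean imputation, and multiple imputation is approximated by repeated random hot-deck draws whose results are combined with Rubin's rules. Random hot-deck draws are used only to keep the sketch short; production work would normally rely on a proper imputation model in an established statistical package. All function names are hypothetical.

```python
import random
import statistics


def mean_impute(values):
    """Single imputation: replace each missing item (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def hot_deck_impute(values, rng):
    """Replace each missing item with a randomly drawn observed value."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def multiply_imputed_mean(values, m=5, seed=0):
    """Estimate a mean from m completed datasets and combine with Rubin's rules.

    Returns (combined estimate, total variance), where
    total variance = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = hot_deck_impute(values, rng)
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    return statistics.mean(estimates), within + (1 + 1 / m) * between
```

The between-imputation term is what carries the extra uncertainty due to the missing items; it is absent when a single completed dataset is analysed as if it were fully observed.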

3.8.2.6 When calculating the sampling variance of a complex survey design, use a statistical software package with the appropriate procedures and commands to account for the complex features of the sample design.
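
To illustrate what such procedures take into account, the sketch below estimates a weighted total and its sampling variance for a stratified cluster sample, using the standard with-replacement ("ultimate cluster") approximation in which variability is measured between primary sampling units (PSUs) within each stratum. It is an illustration only and not a substitute for a survey analysis package; the function name and the row layout are assumptions.

```python
from collections import defaultdict


def total_and_variance(rows):
    """Weighted total and its variance for a stratified cluster design.

    rows: iterable of (stratum, psu, weight, y), one row per interviewed element.
    Variance: sum over strata of n_h / (n_h - 1) * sum_i (z_hi - mean_h)^2,
    where z_hi is the weighted total of PSU i in stratum h.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in rows:
        psu_totals[stratum][psu] += weight * y

    estimate = 0.0
    variance = 0.0
    for stratum, psus in psu_totals.items():
        z = list(psus.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z_bar = sum(z) / n_h
        estimate += sum(z)
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)
    return estimate, variance
```

Treating the same data as a simple random sample would ignore the stratification, the clustering and the weights, which is why the guideline calls for software with design-based procedures.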

3.8.2.7 Document the steps taken in data processing and statistical adjustment.

3.8.3 Quality indicators
Main quality dimensions and elements: Interpretability

3.9 Data dissemination
Goal: To ensure that data producers and users of all cultures involved in a project follow the accepted standards for the long-term preservation and dissemination of data to the social science research community and the wider public.

3.9.1 Quality inputs
3.9.1.1 Procedures for testing accessibility of archives with knowledgeable users
3.9.1.2 Procedures for electronic preservation of files
3.9.1.3 Procedures for testing files with major statistical packages

3.9.2 Guidelines

3.9.2.1 Make a dissemination and data preservation plan early in the statistical project lifecycle, covering archiving, publishing and distribution. Verify that the data released after all the processing steps are consistent with the source data. For derived variables, this means that one should be able to reproduce the same results from the source data.
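
The reproducibility check for derived variables can be sketched as follows. The derived variable (per capita expenditure), the field names and the function names are hypothetical. The check recomputes the variable from the source data and lists the records whose released value does not match.

```python
def per_capita_expenditure(source_record):
    """Hypothetical derivation rule: total expenditure divided by household size."""
    return source_record["total_exp"] / source_record["hh_size"]


def unreproducible(source, released, derive, field, tolerance=1e-9):
    """Return the IDs of records whose released derived value cannot be reproduced.

    source, released: {record_id: record dict}.
    derive: function that computes the derived value from a source record.
    """
    mismatched = []
    for record_id, source_record in source.items():
        expected = derive(source_record)
        actual = released.get(record_id, {}).get(field)
        if actual is None or abs(actual - expected) > tolerance:
            mismatched.append(record_id)
    return sorted(mismatched)
```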

3.9.2.2 Preserve sustainable copies of all key data and documentation files produced during the data collection process, as well as those files made available for secondary analyses. Consider:

3.9.2.2.1 Defining the long-term preservation standards and protocols used.

3.9.2.2.2 Maintaining older versions of important data and documentation files so that users can follow the changes made from one version to the next.

3.9.2.2.3 Archiving collections in one archive, which would keep master copies of files in several locations but minimize the possibility of conflicting versions of data and documentation files.
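
One possible way to detect conflicting versions of archived files, offered here as an assumption rather than a stated requirement, is to record a checksum for every file in a release and compare the resulting manifests between versions or between storage locations. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Files that were added, removed or modified between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Keeping the manifest of each release alongside the files lets users see exactly which files changed from one version to the next.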

3.9.2.3 Conduct a disclosure analysis to protect respondent confidentiality. The key goal of disclosure risk analysis is to ensure that the data retain the greatest potential usefulness while offering the strongest possible protection of the confidentiality of individual respondents.
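
The guideline does not name a disclosure analysis method. One simple check, assumed here for illustration, counts how many records share each combination of potentially identifying variables and lists the combinations shared by fewer than k records. The variable names, the threshold and the function name are hypothetical.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=3):
    """Return {combination: count} for combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}
```

Records in the listed combinations are candidates for protective measures such as coarser categories or release only in a restricted-use file.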

3.9.2.4 Think about producing both public-use and restricted-use data files, considering the following:

(a) Make data files fully available to the research community by establishing clear rules under which researchers can obtain the data.

(b) Establish clear policies for how users may access the restricted data files by creating a set of application materials and a restricted-use data agreement that specify how users can obtain and use such data.

(c) In order to provide optimal utility for the users, produce a variety of products for varied constituencies:

(i) Produce set-up files and ready-to-use portable files in SPSS and STATA to address the needs of those who seek to do intensive statistical analyses with particular software packages.
(ii) Consider disseminating data on removable media, e.g. CD-ROM or DVD, if appropriate.

3.9.2.5 Consider disseminating research findings. This can be done by creating a dissemination plan and making research results accessible to the desired audiences, such as study participants, community members, agencies and service providers, and policy makers.

3.9.3 Quality indicators
Main quality dimensions and elements: Accessibility, Timeliness, Relevance and Coherence.

ANNEXES

Annex I Tanzania Household Budget Survey
CONTROL FORM - Complete for every Issued Household

REGION

ENUMERATION AREA (EA)

DISTRICT

HOUSEHOLD NUMBER

Urban 1
Rural 2
DSM 3

WARD

Household Number from the Listing Form

Name of Interviewer Interviewer Code Name of Supervisor Supervisor Code

Original Household 1 Replacement Household 2 Name and surname of Head of Household (HoH)

1 – HoH from listing 2 – New HH Head

Telephone number with dialling codes

New name of HH Head (write in)

/

/

Total number of household members

Date interview completed

/

/

Number of Visits to the Household

Call No.
Time of call Outcome of call Date

Time

Comments. For REFUSALS GIVE A FULL DESCRIPTION OF REASON FOR REFUSAL AND PERSONS WHO REFUSED (GENDER, AGE, ETC.)
1

2

3

Total number of visits

Final Household Outcome

1 Household interviewed
2 Non-contact after 3 calls
3 Address not found
4 Address empty/derelict
5 Address temporarily empty
6 Household refused
7 Household refused to do Diary
8 Replacement Address not contacted

ENTER FINAL HOUSEHOLD OUTCOME CO

National Bureau of Statistics

Director General
P.O. Box 796, Dar es Salaam
Telephone +255 22 2122724

General Office
P.O. Box 796, Dar es Salaam
Telephone +255 22 2122722/3
Fax: +255 22 2130852
E-mail: [email protected]
Website: www.nbs.go.tz

Vision To be a preferable source of official statistics in Tanzania

Mission To facilitate informed decision-making process, through provision of relevant, timely and reliable user-driven statistical information, coordinating statistical activities and promoting the adherence to statistical methodologies and standards
