Contents

1. distinguished from the number 1 by putting two horizontal bars onto the letter I; the letter Z should be written with one horizontal bar in order to distinguish it from the number 2; the letter G should be carefully distinguished from the number [illegible].

Annex: coding sheet form of the CENTRO INTERNACIONAL DE MEJORAMIENTO DE MAIZ Y TRIGO (International Maize and Wheat Improvement Center), with header fields for program, date and sheet number.
2. the first questionnaire have been coded into the first line(s) of the coding sheet(s), then all variables from the second questionnaire will be coded into the second line(s) of the coding sheet(s), occupying the same data fields for each variable.

Footnote 1/: However, this will rarely occur when the questions in the questionnaire are based on the information obtained by a good exploratory survey.

Example (figure not recoverable from the source).

This type of coding has the following advantages:

1) Coding all variables from one questionnaire in one step allows detection of further inconsistencies in [illegible].
2) After coding has been finished, each data field may be easily checked in a vertical manner, and values which do not correspond to the category range established for each data field may be detected and corrected.

5.0 Key Punching

After finishing the task of transferring the data onto coding sheets 2/, the data usually will be punched onto punch cards.

Footnote 2/: It is not always necessary to transfer the data onto coding sheets, since the preparation of a precoded questionnaire allows key punching right from the questionnaire. However, such a precoded questionnaire is less flexible, e.g. it does not allow omission of certain irrelevant variables from coding and does not allow open questions. Enumerators also may have more difficulties in handling the codes, and a precoded questionnaire usually becomes more voluminous. Inconsistencies which often are detected in the process of c
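The vertical check of a data field described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names follow Table 1, but the record layout, the `VALID_RANGE` table and the function name `out_of_range` are hypothetical and do not come from any of the packages named in this note.

```python
# Hypothetical category ranges per variable: (lowest valid code, highest valid code).
VALID_RANGE = {
    "V4": (1, 3),    # topography: 1 flat, 2 some slope, 3 steep
    "V25": (0, 1),   # tractor use: 0 no, 1 yes
}


def out_of_range(records, valid_range):
    """List (farm, variable, value) for every value outside its category range."""
    problems = []
    for name, (low, high) in valid_range.items():   # one data field at a time
        for rec in records:
            value = rec.get(name)
            if value is not None and not low <= value <= high:
                problems.append((rec["FARM"], name, value))
    return problems


# Made-up records for three farmers.
records = [
    {"FARM": 1, "V4": 2, "V25": 1},
    {"FARM": 2, "V4": 7, "V25": 0},   # 7 is not a topography code
    {"FARM": 3, "V4": 1, "V25": 3},   # 3 is not a yes/no code
]
print(out_of_range(records, VALID_RANGE))
```

Checking one variable across all farmers mirrors reading down one column of the coding sheet, which is why the outer loop runs over the data fields rather than over the farmers.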
3. to be analysed through a relatively large number of cross tabulations. Surveys with more than 100 farmers can nearly always be analysed more efficiently by a computer if appropriate computational facilities are available.

This note provides a short overview of presently used computers (hardware) and computer programs (software), assuming that the researcher has no experience with computers. It also presents the steps involved in preparing the data for computer analysis.

2.0 Hardware

There are effectively three types of computers: microcomputers, minicomputers and the large mainframe computers. The differences most noticed between these computer types lie in their speed of computation and in the amount of memory. The computer memory is usually expressed in numbers of K bytes. Any computer has a main or central part consisting of the power supply and the central processing unit (CPU). Additionally there may be devices such as video displays with keyboards, card readers, disk and tape drives, printers and other peripheral devices.

Microcomputers often are not larger than a normal typewriter. They usually have an in-built video display and their memory generally does not exceed 64 K bytes. Data are typed in on a keyboard and may be stored on a floppy disk or on a cassette. As microcomputers are relatively new, a shortage of appropriate software still exists. However, it is expected that these computers will be increasingly used in agricul
4. A GENERAL GUIDE TO DATA PREPARATION FOR COMPUTER ANALYSIS OF FARM SURVEY DATA

Edith Hesse de Polanco, Research Assistant, CIMMYT Economics Program, Mexico
Economics Training Note, January 1982

The views expressed are not necessarily those of CIMMYT.

Source: CIMMYT Institutional Multimedia Publications Repository (http://repository.cimmyt.org), CIMMYT Socioeconomics; Hesse, E. (1982).

1.0 Introduction

If a computer and an appropriate program for survey data analysis are available, the researcher has to decide whether or not these computer facilities should be used to analyse the survey data, or whether an analysis by hand is sufficient. In almost all cases a preliminary analysis by hand is very useful to get a "feel" for the data. In general, a complete manual analysis will be more efficient if the number of farmers in the sample is less than 50. If the sample size is about 50 to 100 farmers, a computer analysis may be helpful when the study area is rather complex and where practices and circumstances need
5. atistical Package for the Social Sciences. McGraw-Hill, 1970, 1975.

cleared up while the enumerators have the interviews still fresh in mind. Unlikely values or illegible data may be noted, and sometimes a revisit is required. In the case of serious problems, certain questionnaires might be discarded.

In cases where the sample size is rather small (less than 50 farmers), values for the most important variables are written by hand for each farmer onto a large sheet of paper. This facilitates the manual calculation of simple frequency distributions and means. If these calculations are to be performed for specific groups of farmers, the task becomes increasingly time consuming [illegible] analysis.

Once the decision to use a computer for data analysis has been made, the data have to be coded. This means that all important information from the questionnaire has to be transferred onto coding sheets and later onto punch cards according to precise rules. Every variable has to be identified by a variable name or number, and code categories for each variable have to be determined. This is usually done by preparing a so-called code book. The beginning of a typical code book is shown in Table 1. It contains three major pieces of information: 1) a number and/or a shortened name for each variable of the questionnaire, 2) the code categories for each variable, and 3) the card number and column range into which codes for these variables have to be punched onto card
6. d idea to identify those variables to be coded by consecutive numbers, i.e. V1, V2, etc. In certain cases it might be useful to identify certain variables by partial names instead of numbers, especially for those variables that are analysed frequently. For example, if topography is an important variable for cross tabulation, the variable V4 in Table 1 might be better labeled TOPOGR. In the same way, if a number of cross tabulations by village should be done, VILLAGE could be used as a variable name. However, it is impractical to choose partial names for the bulk of the variables.

4.2 Code Categories

In general, codes should be numbers, although some statistical packages allow the use of letters or special characters. Some questions which often arise when appropriate code categories are to be chosen are discussed below.

1) Quantitative or continuous data (e.g. number of hectares of a given crop) should always be coded as actual numbers. One should never categorize quantitative data, as this can be done much more easily afterwards by data transformations in the program. For example, never code area as 1 = 0-10 ha, 2 = 11-20 ha, etc. if the actual area is known. This implies a loss in available information and flexibility.

2) Qualitative or discrete data can be coded by assigning a number to each category. For example, seed source might be coded as follows: 1 = own seed, 2 = from a neighbor, 3 = from the bank, 4 = other. It is oft
7. en sufficient to use a separate code only for the most commonly occurring categories and to code all residual observations as "other".

3) Subjective data (e.g. opinions) and qualitative data should be grouped into similar categories.

Example: Why didn't you plough?
a) Ground was too hard
b) Not enough moisture
c) Couldn't obtain tractor
d) Tractor has been out of working order
e) Off-farm work
f) Busy in other farm work
g) Other

In such an open question any number of subjective reasons might appear in the questionnaires. In the process of editing it is usually convenient to group together some of the answers in order to end up with a reasonable number of categories. However, in certain cases one farmer might mention two or more categorized reasons to the same question. In such a case the researcher has to decide if it is worthwhile to introduce an additional code which indicates the combination of two categories, e.g. a code for "not enough moisture and off-farm work".

4) For coding dates (e.g. weeks or month in which a given practice has been done) it is often best to use the number of days or weeks from a key reference point. Example: 0 = harvest month of the previous crop, 1 = one month after harvest, 2 = two months after harvest, etc. In this case the time range from one reference point to another may easily be calculated in the computer program.

5) If the farmer is asked to indicate the quanti
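The two coding rules above (code actual quantities, and code dates as offsets from a reference point) both rely on deriving new variables later "in the program". A minimal Python sketch of such transformations follows. The class limits, the field names and the sowing variable are hypothetical and serve only as an illustration.

```python
def area_class(hectares):
    """Derive a size class from the actual area; the class limits are illustrative."""
    if hectares <= 10:
        return 1   # up to 10 ha
    if hectares <= 20:
        return 2   # above 10 and up to 20 ha
    return 3       # more than 20 ha


def months_between(first_practice, second_practice):
    """Both arguments are months counted from the reference point (0 = harvest month)."""
    return second_practice - first_practice


# Made-up data for two farmers; the actual area and month offsets were coded.
farms = [
    {"farm": 1, "wheat_ha": 4.5, "plough_month": 1, "sowing_month": 3},
    {"farm": 2, "wheat_ha": 18.0, "plough_month": 2, "sowing_month": 3},
]
for f in farms:
    f["area_class"] = area_class(f["wheat_ha"])
    f["plough_to_sowing"] = months_between(f["plough_month"], f["sowing_month"])
    print(f)
```

Because the actual values are kept, the class limits can be changed and the classes recomputed at any time, which would not be possible if only the classes had been coded.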
8. ne survey to the other in order to make the coding task more straightforward. For example, one should always use a 0 for no and a 1 for yes, or a 1 for manual, a 2 for animal and a 3 for tractor, etc.

4.3 Column Range 1/

In our experience we found it useful to assign the same number of columns to each coded variable. This means that even in the case of a yes/no question the data field should be three columns wide, even though the code for yes (usually a 1) and the code for no (usually a 0) will only occupy one column. This code is entered into the right-justified column of the data field, leaving the left two columns blank. In the very few cases where certain quantities might occupy a four-column data field (e.g. tractor rental of 1800 per ha), the values of this tractor rental variable should be coded for every farmer by dividing all values by 10.

4.4 Identification Code

A complete identification code allows the researcher to identify each data card in a unique manner. In cases where a two-stage sampling of farmers is used (e.g. a village sample and then a farmer sample), the identification normally will consist of two different codes, one referring to the village and the other referring to the farmer 2/. In most cases the coding of all important variables

Footnote 1/: The column range assigned to a certain variable is usually called a data field.
Footnote 2/: In the computer language you will later use the expression observation, case or unit ins
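The fixed three-column data field and the identification code described above can be sketched as follows. This is a minimal illustration, assuming the identification layout of Table 1 (village in column 1, farm in columns 2-4, card number in column 5); the function names `encode_field` and `encode_card` are hypothetical.

```python
FIELD_WIDTH = 3  # same width for every coded variable, as suggested in the text


def encode_field(value, width=FIELD_WIDTH):
    """Right-justify an integer code in a fixed-width data field."""
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{value!r} does not fit into {width} columns")
    return text.rjust(width)


def encode_card(village, farm, card, values):
    """Build one card line: identification code first, then the data fields."""
    # Assumes a one-digit village code, a farm number up to 999 and a one-digit card number.
    ident = f"{village:1d}{farm:3d}{card:1d}"
    return ident + "".join(encode_field(v) for v in values)


rental = 1800                 # tractor rental per ha would need four columns
rental_coded = rental // 10   # coded in tens so that it fits into three columns

line = encode_card(village=2, farm=17, card=2, values=[rental_coded, 1])
print(repr(line))
```

Raising an error for a value that is too wide, rather than silently truncating it, makes the rare four-digit quantity visible at coding time, when it can still be rescaled.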
9. o variety groups, new and old, respectively. Additional codes identify each variety in the two groups. Interpretation of frequency tables is easier if coding is organized this way.

8) The researcher always must have in mind what type of analysis is to be done with each variable in order to determine the correct form for coding, i.e. coding several variables from one question, or coding only one variable with several code categories from the same question.

Example: In an irrigated area in northern Mexico where two crops by season were grown, researchers were interested in knowing to what extent the weed problems observed in the field were related to the preceding crops. In this case the farmer was asked: Which crop did you plant on this field in

1980: Summer / Winter
1979: Summer / Winter
1978: Summer / Winter

There was an initial temptation to code this question using six different variables, i.e. one for each crop cycle. But as the actual analysis was a cross tabulation of weed problems by previous rotation, it was decided to code it in the following way: only one variable, called "previous crop", was coded using the following categories:

1 = Cotton 1980
2 = Safflower 1980
3 = Other row crop 1980
4 = One year continuous wheat
5 = Two years continuous wheat
6 = Three years continuous wheat

9) Code categories should be uniform not only within one survey but also from o
10. ocial Sciences (SPSS) also has facilities for the analysis of survey data. Both packages consist of a large number of statistical procedures and are highly flexible with respect to data manipulation and representation. An inexperienced researcher needs only a day or so to get acquainted with the basic instructions contained in the special user's guides, although some guidance from the local computer staff may be useful.

Data preparation begins before the questionnaires arrive in the office. The researcher has to check and edit every questionnaire thoroughly as soon as possible. At this stage inconsistencies should be

Footnotes:

1/ FAO: FARMAP User's Manual, Farm Management Data Analysis Package. Rome, 1981. Hesse de Polanco, E. & P. Walker: A User's Guide to FASAP, A FORTRAN Program for the Analysis of Farm Survey Data. CIMMYT Economics Working Paper, Sept. 1980. The latter program has been developed for a minicomputer but is also readily usable in larger computers. It has facilities for data transformation and missing values, and performs one-way frequencies, cross tabulations and tables of means by group, all with the associated statistics.

SAS Institute: SAS Introductory Guide, 1978. SAS Circle, Box 8000, Cary, North Carolina 27511. SAS Institute: SAS User's Guide, 1979 Edition. Post Office Box 10066, Raleigh, North Carolina 27605. Both guides are available at US$ 10.00 at their respective addresses.

Nie, N. H. et al.: St
11. oding and which can still be cleared up cannot be detected by a key puncher. In the case of microcomputers, data usually are directly typed on a keyboard and stored on a floppy disk or a cassette.

Key punching usually is done by a specially trained key punch operator and is verified by redoing it on a verification machine. In certain cases cards may have to be interspersed after [illegible]. A data printout should be requested immediately afterwards to allow the researcher himself a thorough check of his coded data. Errors will be marked on the punch cards, and corrections often may be done by the researcher himself.

After data checking and correction, data usually will be put onto a disk or tape file, because punch cards easily become damaged if they are put into the card reader many times. It is also more expensive to read the data from the cards instead of reading them from a tape or disk file, since a unit cost per card read will be charged. However, a temporary file on tape or disk might be deleted accidentally, so cards always should be stored in a safe place, considering that humid conditions may cause deterioration.

Program instructions are generally also written onto the coding sheets and later punched onto punch cards. Some simple rules should be observed in order to prevent mistakes during key punching:

- the letter O should be distinguished from the number zero by crossing the letter O with a slash (Ø);
- the letter I has to be
12. s 1/

Footnote 1/: A typical coding sheet is included in the annex. Each line of a coding sheet is later punched onto one card. The above mentioned preliminary manual analysis might also be done from the coding sheets or from a later computer listing of the data.

Table 1. CODE BOOK

Variable name      Code                                               Column range
-----------------  -------------------------------------------------  ------------
Identification:
  VILLAGE          1 = Tequesquinahuac, 2 = Huexotla, 3 = Tlaixpan    1
  FARM             No. 1-100                                          2-4
  CARD             Card number 1                                      5
V1                 Number of plots                                    6-8
V2                 Hectares of wheat                                  9-11
V3                 Hectares of maize                                  12-14
V4                 1 = flat, 2 = some slope, 3 = steep                15-17
...
V25                1 = tractor use yes, 0 = tractor use no            78-80
Identification:
  VILLAGE          the same as in card number 1                       1
  FARM             the same as in card number 1                       2-4
  CARD             Card number 2                                      5
V26                Tractor rental per ha (see paragraph 3.2)          6-8

Continuation of Table 1

Variable name      Code                                               Column range
-----------------  -------------------------------------------------  ------------
V27                1 = not enough moisture                            9-11
                   2 = tractor not available
                   3 = didn't have time
                   4 = not enough moisture and not enough time
V28                1 = owned, 2 = rented, 3 = community,              12-14
                   4 = government
V29                1 = fertilizer use yes, 2 = fertilizer use no      18-20
...
V50                lt of water per ha used for herbicide application  78-80
Identification:
  VILLAGE          the same as in card number 1                       1
  FARM             the same as in card number 1                       2-4
  CARD             Card number 3                                      5
V51                1 = manual, 2 = animal, 3 = tractor                6-8
etc.

4.1 Variable Names

In most computer programs variable names should begin with a letter, should not be longer than 8 characters and should not include blank spaces between the characters. It is normally a goo
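To show how the code book of Table 1 maps onto a data card, the following Python sketch reads one 80-column card image using the column ranges given for card 1. It is only an illustration: the dictionary `CODE_BOOK_CARD_1`, the function `parse_card` and the sample line are hypothetical, and only the first few variables of the card are included.

```python
# Column ranges for card 1 as listed in Table 1 (1-based, inclusive).
CODE_BOOK_CARD_1 = {
    "VILLAGE": (1, 1),
    "FARM": (2, 4),
    "CARD": (5, 5),
    "V1": (6, 8),     # number of plots
    "V2": (9, 11),    # hectares of wheat
    "V3": (12, 14),   # hectares of maize
    "V4": (15, 17),   # topography: 1 flat, 2 some slope, 3 steep
}


def parse_card(line, code_book):
    """Split one card image into integer fields; blank fields become None."""
    card = line.ljust(80)   # pad short lines to the full card width
    record = {}
    for name, (first, last) in code_book.items():
        text = card[first - 1:last].strip()
        record[name] = int(text) if text else None
    return record


# Made-up card: village 2, farm 17, card 1, 3 plots, 12 ha wheat, 5 ha maize, some slope.
sample = "2" + " 17" + "1" + "  3" + " 12" + "  5" + "  2"
print(parse_card(sample, CODE_BOOK_CARD_1))
```

Keeping the column ranges in one table, separate from the reading logic, is the same idea as the code book itself: the layout is documented once and every card is read against it.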
13. sing values before data are coded.

Footnote 1/: Using SAS for our data analysis we coded this type of missing value as R, and using FASAP we coded it as 1.
Footnote 2/: Using SAS we coded it as N, and using FASAP as [illegible].

4.6 Additional Hints for Data Preparation

In the process of coding, the order of questions (i.e. variables) should not be changed. That is, each variable should be coded in the same order as it appears in the questionnaire. This does not mean that clearly worthless variables should not be omitted. For example, the variable "tractor use (yes/no)" should not be coded if all sampled farmers were using a tractor, and in the same way the variable "herbicide use (yes/no)" should not be coded if no sampled farmer used a herbicide.

In our experience the best way to transfer the data from the questionnaires onto the coding sheets is the following. All variables from one questionnaire should be coded in one step, using different coding sheets if necessary. For example, if 25 variables can be coded into the 80 columns of one line of the coding sheet (i.e. the 80 columns of one punch card), it is best to use one coding sheet for the first 25 variables; the following set of 25 variables is coded onto a second coding sheet, the next 25 variables onto a third coding sheet and so on, repeating the identification code and identifying each coding sheet by a consecutive number (see example below and also Table 1). When all variables of
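The splitting of one questionnaire across several cards, with the identification code repeated on each, can be sketched as below. The sketch assumes the layout of Table 1 (a 5-column identification code followed by 25 three-column fields, 80 columns in total); the function name and the sample values are hypothetical.

```python
VARS_PER_CARD = 25  # 25 three-column fields after a 5-column identification code


def cards_for_questionnaire(village, farm, values):
    """Return one 80-column line per group of 25 values, each with the ID repeated."""
    lines = []
    for start in range(0, len(values), VARS_PER_CARD):
        card_no = start // VARS_PER_CARD + 1          # consecutive card number
        chunk = values[start:start + VARS_PER_CARD]
        # Assumes a one-digit village code and fewer than ten cards per farm.
        ident = f"{village:1d}{farm:3d}{card_no:1d}"
        fields = "".join(f"{v:3d}" for v in chunk)
        lines.append((ident + fields).ljust(80))
    return lines


values = list(range(60))   # 60 made-up codes for one questionnaire
cards = cards_for_questionnaire(1, 42, values)
for line in cards:
    print(line)
```

With 60 values the sketch produces three lines, the last one only partly filled; every line starts with the same village and farm codes and differs only in the card number in column 5.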
14. tead of farmer.

from one questionnaire will occupy more than one punch card (i.e. more than 80 columns). In these cases the card number also has to be coded (see Table 1). Some people even use an identification code for the survey, e.g. they put a special code for the survey in the beginning or final columns of each data card.

4.5 Missing Value Indicators

At the beginning of any computer analysis it is important to determine how missing data are to be handled. In survey data two types of missing data are usually found. In some cases the farmer uses a certain input but does not remember the quantity or the date when he applied it. In very few cases the farmer may simply decline to answer a certain question. For these cases a missing value indicator for "no response" should be coded.

The second type of missing value indicator is used in those cases where a certain question is not appropriate to the specific situation of the farmer. For example, it is senseless to ask the farmer whether he used an owned or rented tractor if we know from a previous question that he used no tractor at all. A missing value indicator 2/ for "not appropriate question" should then be coded into the data field of the variable "tractor ownership".

The form and the handling of missing value indicators depend on the software. Therefore it is important to know which program or system package will be used in order to observe the existing rules with respect to the mis
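The effect of the two kinds of missing value indicator on a simple statistic can be illustrated with a short Python sketch. The indicator values -1 and -2 are arbitrary choices made for this example only; they are not the codes required by any particular package, and the function name `summarize` is hypothetical.

```python
NO_RESPONSE = -1       # hypothetical indicator: farmer did not answer or remember
NOT_APPROPRIATE = -2   # hypothetical indicator: question does not apply

MISSING = {NO_RESPONSE, NOT_APPROPRIATE}


def summarize(values):
    """Mean of the valid values plus a count of each kind of missing value."""
    valid = [v for v in values if v not in MISSING]
    return {
        "n_valid": len(valid),
        "mean": sum(valid) / len(valid) if valid else None,
        "n_no_response": values.count(NO_RESPONSE),
        "n_not_appropriate": values.count(NOT_APPROPRIATE),
    }


# Made-up herbicide quantities (lt/ha) for six farmers; two did not use herbicide.
herbicide = [2, NOT_APPROPRIATE, 3, NO_RESPONSE, NOT_APPROPRIATE, 1]
print(summarize(herbicide))
```

Keeping the two indicators distinct lets the analysis report how many farmers the question did not apply to separately from how many failed to answer, while both are excluded from the mean.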
15. tural research in developing countries, primarily because of their relatively low costs 2/.

Footnote 1/: All computers use the binary system, i.e. any number, letter or special character is expressed by a combination of zeros or ones. Each 0 or 1 is called a bit. The minimum amount of bits necessary to represent a character is called a byte. Finally, 1024 bytes form one K byte.
Footnote 2/: A 48 K byte microcomputer system with a floppy disk drive and a printer presently costs about US$ 3000-4000.

Minicomputers normally have a memory larger than 64 K bytes. They may be connected to larger mainframe computers, and data are usually read in through a card reader. Data can later be stored on tapes or disks. Minicomputers normally use computer programs similar to those of large mainframe computers, although the prepared statistical packages described below usually are too large for minicomputers.

Large mainframe computers, such as the IBM 360 and the IBM 370, have a much larger memory and work at a considerably higher speed. Apart from their wide use in developed countries, they are now installed in many research and governmental institutions in developing countries. However, in some cases the appropriate software is not available or has not been successfully implemented, so that some of these computers are not being used at their full potential.

The above distinctions between mini and micro and micro and mainframe computers can serve as a S
16. ty of a given input, it is always important to ask and code first a so-called yes/no question.

Example:
1) Did you use herbicide? yes / no
2) How much did you apply? ____ lt/ha

If the first question had not been included, the temptation to put a zero for a non-user into the data field for question No. 2 becomes evident. For non-users, question No. 2 has to be coded with a missing value indicator 1/.

Footnote 1/: See paragraph 3.5 for the description of missing value indicators.

The same occurs with the performance of certain practices with their related questions.

Example:
1) Did you plough?
2) Date of ploughing
3) Implement for ploughing, etc.

If the first question is answered by "no", all following related questions should be coded with a missing value indicator.

6) The coding of fertilizer data may cause problems, since there often exist a number of N and P products and compounds. The best way to handle it is to manually calculate the nutrients applied and then to code these nutrient quantities. A possible exception is when the form of fertilizer is itself a variable.

7) Code categories themselves may be categorized to facilitate the analysis and data interpretation. For example, barley varieties might be coded as follows:

New varieties: 11 = Cerro Prieto, 12 = Puebla, 13 = Centinela
Old varieties: 24 = Apizaco, 25 = Porvenir, 26 = Chevalier

In this example a code 1 and 2 were chosen to identify the tw
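The grouped barley codes above lend themselves to a simple transformation: the first digit of the two-digit code gives the variety group. A minimal Python sketch follows, with made-up observations and a hypothetical helper `variety_group`.

```python
from collections import Counter

GROUP = {1: "new", 2: "old"}   # first digit of the two-digit variety code


def variety_group(code):
    """The tens digit of a variety code (e.g. 11, 24) identifies the group."""
    return GROUP[code // 10]


observed = [11, 24, 12, 11, 26, 13]   # made-up variety codes for six farmers
by_group = Counter(variety_group(c) for c in observed)
print(by_group)
```

A frequency table by group can thus be obtained without recoding the data, while the second digit still identifies the individual variety.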
17. vsetical guide. However, the rapid changes in computer technology are making the described differences less recognizable.

3.0 Software

The crucial factor in the usefulness of a given computer for the researcher is the availability of software. There are a number of programs and software packages which may be used for the analysis of farm survey data. Some are only available on large computers, others are designed for minicomputers but can also be used on large computers, and very few are presently available for microcomputers, although this situation is changing rapidly. For example, a small program for the analysis of experimental data, including economic analysis, will soon be available for a microcomputer 2/. Two other small programs specially designed for the analysis of farm survey data are presently available for minicomputers. However, if a large computer with one of the more widely used statistical packages is available, the researcher should try to get access to it, especially for the analysis of larger surveys.

Footnote 1/: This can range from 256 K bytes up to several Megabytes (1 Megabyte = 1024 K bytes).
Footnote 2/: Stilwell, T. C.: Manual del usuario del sistema de estadística agrícola para el Digital PDP 11/45. Consortium for International Development, Cochabamba, Bolivia.

The Statistical Analysis System (SAS) allows the analysis of experimental data as well as of survey data. The Statistical Package for the S
