Home

STATA 11 for Windows SAMPLE SESSION

1. gt district monapo quarts 1 Variable Obs Mean Std Dev Min Max cret_ae 28 1171 714 420 7546 224 4898 1806 867 gt district monapo quarts 2 Variable Obs Mean Std Dev Min Max cret_ae 27 2239 088 199 4202 1888 33 2554 892 gt district monapo quarts 3 Variable Obs Mean Std Dev Min Max cret_ae 2a 3343 003 461 9159 2685 971 4303 122 gt district monapo quarts 4 Variable Obs Mean Std Dev Min Max cret_ae 27 7619 101 3557135 4359 737 20873 97 gt district ribaue quarts 1 Variable Obs Mean Std Dev Min Max cret_ae 30 T29 1391 358 8783 429 2929 1790 432 gt district ribaue quarts 2 Variable Obs Mean Std Dev Min Max cret_ae 30 2171 697 205 3644 1835 298 2566 006 gt district ribaue quarts 3 Variable Obs Mean Std Dev Min Max cret_ae 30 3165 192 330 2283 2578 604 3731 045 gt district ribaue quarts 4 Variable Obs Mean Std Dev Min Max cret_ae 29 5828 97 1632 9 3825 879 9464 901 gt district angoche quarts 1 Variable Obs Mean Std Dev in Max cret_ae 29 929 4182 388 3228 207 9077 1395 962 gt district angoche quarts 2 Variable Obs Mean Std Dev in Max cret_ae 29 1718 789 166 1601 1447 059 1984 059 gt district angoche quarts 3 Variable Obs Mean Std Dev in Max cret_ae 29 2442 247 347 8035 1997 711 3063 996 gt district angoche quarts 4 Variable Obs Mean Std Dev Min Max cret_ae 28 5022 29 2443 45
2. Variables leave empty for all variables Column widths Default Compress width of columns in both table and display formats Use display format of each variable C Overtide minimum abbreviation of variable names C Truncate string variables 0 C Do not list observation numbers C Display all levels of factor variables The list dialog box list List values of variables has 5 tabs where you can set specific parameters for the data that you want to list On the Main tab you can specify the variables to be listed or leave it blank to list all variables The default column width separates each variable by 5 spaces and shows the variables in display format Below is an example district angoche vil ca2 monari son daugh ca univ single arizona 1 Select the variables using the drop down arrow district vil hh mem cal ca2 ca3 ca4 cad ca6 Note if you wished to include all variables leave the box empty 2 Click on the tab labeled by if in In this tab we can limit the number of cases that are displayed 3 Check the box V next to Use a range of obser vations Specify the range to be from 1 to 10 4 Click on the Options tab Under Table options check the box V next to Force a clean table Note value labels will be displayed To see the numeric 33 Stata 11 Sample Session Section I Basic functions Files De
3. relation to Age group head 0 to 10 Ti to 9 20 to 60 61 and older Total head 6 296 41 343 wife husband 25 280 5 310 son daugher 503 184 3t 718 mother father 5 J 6 other relative 70 55 16 2 143 Total 573 270 628 49 1 520 Copying tables from the Results window relation to head 0 told 11 to 19 head 0 6 0 00 1 75 0 00 2 22 wife husband 0 0 00 8 06 0 00 9 26 son daugher 503 70 06 25 63 87 78 68 15 mother father 0 0 00 0 00 0 00 0 00 You can also copy the information from the Results window into your word processor Stata provides three choices from the Edit menu for copying tables Click on Edit to look at the choices 1 Copy text Ctrl C copies the table as straight text 2 Copy table Shift Ctrl C copies the table and includes tabs where it thinks there should be tabs 3 Copy table as HTML Shift Ctrl Alt C copies the table into HTML format You may encounter problems with the second and third options if you use these Stata determines if there should be tabs and may not make the correct decision You might need to increase the width of the columns in the output to make sure that tabs are included Below is an example of the same table using the Shift Ctrl C Age group 20 to 60 61 andol Total 296 41 343 86 30 11 95 100 00 47 13 83 67 22 57 25 280 5 310 90 32 1 61 100 00 44 59 10 20 20 39 184 31 0 718 4 32 0 00 100 00 4 94 0 00 47 24 0 5 1 6 83 33 16 67 100 00 0 80 2 04 0 39
4. 2 Select List value labels Select district and vil click on and switch to the do file editor to paste the command Switch back to the dialog box and click on Ok The listing shows you what values are assigned to a label Note A label name can be assigned to multiple variables You can create a label name for 1 yes 2 no and assign that label name to several different variables In the Command window you can also type label list district vil To document all the variables including those that do not have value labels another command is available 1 From the Data menu select Describe data 2 Select Describe data contents codebook Click on to copy the command and switch to the do file to paste the command Switch back to 25 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Generating descriptive statistics The summarize and tabulate commands Continuous variable Categorical variable the dialog box and click on Ok In the Command window you can also type In this output every variable is listed The type of variable is given the range of values in the variable number of unique values how many cases have a missing value and it also includes descriptive statistics for variables The output for the descriptives is based on whether Stata thinks the variables are continuous or categorical Stata cannot always tell if the variable is categorical so it d
5. Assign value label to variables The Label values Assign value label to a variable dialog box opens The default choice is to attach a value label to variables In the Variables box select age_gp This is the variable that we want to attach a label to In the Value label box select age_gp Click on the copy button switch to the do file editor paste the command switch back and click on the Ok button The Stata command is label values age_gp age_gp The steps to label a variable and define value labels has been made much easier in this version of Stata Another method we can use is to generate the new variable assign the new values and assign the labels for the values in one step Select Create or change variables from the Data menu Select Other variable transformation commands Select Recode categorical variable In the Main tab select ca3 in the Variables box In the Required box specify the range you want and the new value to be assigned as well as the label for that new value e g 0 10 1 0 to 10 In the Optional boxes continue to specify the ranges and value to be assigned e g 10 001 19 2 11 to 19 19 001 60 3 20 to 60 60 001 max 4 61 and older Note examples on how to specify the value can be see if you click on the Examples button Click on the Options tab Click on the radio button next to Generate new variables In the box type the name
6. cprod_ae if district z nquantiles 4 create a new variable generate quart replace values from quartile1 quartile2 quartile3 into quart levelsof district local levels foreach z of local levels replace quart quartile z if district z delete the 3 extra variables levelsof district local levels foreach z of local levels drop quartile z tabulate quart district label variable quart Calorie production quartile produce the table by district quart sort summarize cprod_ae sort file by key variables sort district vil hh save hh file3 dta replace close log file log close 85 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Exercise 2 1 as eas HR Produce similar output using calories retained production minus sales instead of calories produced It will show calories retained per adult equivalent per day from the total of the same six food crops The output should be broken down by district and calories retained quartile Hints a The procedure is very similar to the work that we just completed Open a new do file to save your commands for this exercise Sales come from c q5 dta Check the file for the appropriate variable for the quantity of sold production Note that the product codes are the same as for c q4 dta Also check for the variables by which to sort You can start
7. unmatched master _merge _merge The above command tells Stata to merge the working data file or master the file in memory with the 58 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Check the resulting data file conver dta file or using data file using conver dta as a table lookup to add the conver variable to our working data file We had renamed p1a to unit Key variables are required in any procedure to merge two files when one of the files is being used as a keyed table Our key variables specify how to merge the lookup file using product and unit the grouping variables because we have a different conversion factor for each product unit combination If we had used only prod Stata would expect each product to have only a single conversion factor with the same value regardless of the unit of measurement used For example it would expect the same conversion factor for rice whether it was in a 100 kg bag or a 20 liter can This would be incorrect The new working file produced by the join contains the needed conversion factor variable conver For every product unit combination conver is equal to the number of kilograms in that unit It is always important to verify if the join was successfully completed Click on the Data Browser button to look at some cases to verify that the conversion factors match the products We could also use the list comman
8. 3 Click on the icon to copy the command to the clipboard and then click on Ok In the Command window you can also type labelbook to obtain the same results This command describes only those variables with value labels It is a good command to document these variables This output is quite long You will see more at the bottom of the Results screen more indicates there is 24 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The label list command The codebook command more information to be displayed but the display has paused so that you can view the first part of the output You will need to click on more several times to see the complete output To continue to the next screen click the lt spacebar gt or you can click on the more or you could also click on the green button on the tool bar This button only shows on the icon bar if there is more output to be seen in the Results window If you wish not have the output displayed one screen at a time you can turn this feature off The command is You can include it at the beginning of the do file so that when you want to run the do file another time more will be turned off If you want to stop the listing from completing you can click on the You can select specific variables to only look at those labels From the menus 1 From the Data menu select Data utilities Label utilities
9. See reshape notes below fillin varlist adds observations with missing data so that all combinations of varlist exist thus rectangularizing the file the variable _fillin is added to the data _fillin is 1 for created observations and 0 for previously existing observations svy commands these are commands prefixed with svy and they pertain to commands used in analyzing survey data calculates and displays tables of statistics format varlist fmt formats numeric variables as follows number before the decimal indicates the length of the variable number after the decimal indicates number of decimal places g general numeric format 5 0g t f fixed numeric format e g 5 2f e base 10 power strings are formatted as follows and can be 81 chars long its e g 10s Reshape notes The reshape command is particularly useful for files such as that shown in the following example Households were asked about the number of livestock owned for three types of livestock coded 330 331 and 335 To save on data entry time only those entries reporting any livestock were entered Missing livestock codes in the file therefore means that the household did not own the livestock associated with the code The file looks like this 115 Stata 11 Sample Session Annex I I Survey Instrument hh animcode num 206 331 70 217 331 65 217 335 8 221 330 1200 221 331 200 The above file could have been organized such t
10. What if we had 20 districts This method would be a bit cumbersome Another method is to use a counter Add the command for z in num 1 3 in front of the xtile command where z is a temporary variable that loops through the values specified with the num 1 3 The value of 3 would be replaced with the number of values in the district variable Note that the z is added to the quart variable and that z is used instead of the actual numeric value for the value to use for the value for district Stata provides another looping command that we can use to compute the new ranking variable It is not available through the menus The looping command can be found in the Programming manual and is called foreach Stata added a new command called levelsof The values are stored in temporary variables called r levelsof That information can be stored in a local variable and the variable used to cycle through the values 1 Type the following command in the Command window levelsof district The results should display the values of the districts e g 123 2 Now let s store that information in a local variable To make a temporary local level we include the word local which means the variable only exists with the do file We need this 76 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation command to be placed in the do file Switch to the do file ed
11. nameofdofile do which you can see in the Results window Stata stores your data in a data file In addition to the values themselves a data file contains such things as variable labels and value labels formatting information missing value specifications notes etc Before you can do any data analysis in Stata 11 you must first tell Stata to open a Data file Select File from the menu select Open highlight a data file example c hh dta and click on The command is immediately run The data in the file are now available to be viewed in the Data Editor window In the Review window you see the command that opened the data file In the Variables window you see the list of variables that are available There are 2 methods that you can use to look at the data The first opens the file in the Data Editor window In this window you can manually change the data so be careful when you use this method The other method opens the data in a browser window where you cannot change any of the values but you can sort and look at the data 1 The first method to view the data is to open the Data Editor window Click on the Data Editor button Eh or in the Command window you can type edit and press lt Enter gt If value labels have been assigned to the values in a variable you will see the value label rather than the actual value Below is an example of a data file with value labels displayed for some variables and values only for other varia
12. ncrop 8 136 households had no increased sales on any of the crops ncrop 0 C What is the distribution of the crops For this question we can use the summarize command but we could also use the tabstat command 1 From the menus click on Statistics then Summaries tables amp tests then Tables then Table of summary statistics tabstat 2 Under the Main tab type h64 in the Variables box 3 Under Statistics to display place a tick in the first box and select Sum as the statistic 4 Under the Options tab in the Use as Columns change to Statistics 5 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok The Stata command is tabstat h64 statistics sum columns statistics You see all the h64 variables with a count of the number of cases where yes was specified Manioc h64b was the most frequent crop for which households had increased sales 115 sorghum h64g was the least 14 Multiple Response The other type of multiple response question is where the survey question asks the respondent to list up to xx choices from a set of ten choices If four 96 Stata 11 Sample Session Section 3 Tables and other Types of Analysis responses are requested four variables must be used to code the responses The set of possible responses are assigned sequential values and the same set of values are used for each of the 4
13. on the icon You can see a list of all the variables with the details for each variable To exit the Data Editor click on the x in the upper right hand corner of the Data Editor The window closes You will often get a data file compute new variables make transformations and finally save the modified set of data to a new name to be used at another time For example you might retrieve a data file with land area per crop add to it production per crop from another file and then calculate yield If you want to use the new production and yield variables at a later time you must make sure that the data file is saved with the new variables in it Never save data that has been modified to the same file name unless these are permanent changes to be made to the original data set We do not want to save any changes that were made to this data file Below are instructions on how to save the file if you wanted save the data file You can close the Data Editor or you can also save the file within the Data Editor From the menus select File Save As and enter the name From the Command window you can also type save newfilename or if you want to use the same name type save replace 15 Stata 11 Sample Session C The Brower Window The browse command D The Stata Results Window E The Command Window F The Viewer Section 0 File structure and Basic Operations for Stata 11 The same name as
14. 100 2 sack 50 3 kilo 4 liter 5 can 20 Period of sale 1 planting Aug Dec 2 hungry period Jan April 3 this year s harvest 4 various times Motive for sale at this time 1 needed money 2 buyers available 3 consumer goods available 4 attractive price 1 lojista 2 wholesaler 3 AGRICOM 4 ambulante 5 brigada 6 company Locale of sale 1 farmgate house 2 village 3 locality 4 district 5 province Distance from the farm enter the kms between farmer and point of sale Who in the household is responsible for the sale 1 husband 2 wife Why sold to this buyer Value of Sales meticais Unit 1 unit price 2 total value 1 the only one available 2 always sell to this one 3 best price 4 transportation provided 5 carries consumer goods gt N B Not all of the variables that appear in the printed table are in file c q5 dta Only variables VEN V2a V2b V9a and V9b were kept for this exercise The PROD variable replaces the V1 variable 122
15. 239 09 3 343 00 7 619 10 37 0 2101 1 806 87 2 554 89 4 303 12 20 873 97 20 873 97 224 49 1 888 33 2 685 97 4 359 74 224 49 ribaue 1 291 39 2 171 70 3 165 37 5 828 97 3 081 46 1 790 43 2 566 01 3 734 49 9 464 90 9 464 90 429 29 783530 2 578 60 3 825 88 429 29 angoche 929 42 Le 1819 2 442 25 57022429 2 506 50 1 395 96 1 984 06 3 064 00 12 674 86 12 674 86 209I 1 447 06 Tp GOT EEE 3 134 74 207 91 Total 1 118 42 2 040 13 2 977 30 6 135 48 3 044 26 1 806 87 2 566 01 4 303 12 20 873 97 20 873 97 207 91 1 447 06 1 997 6 721 3 134 74 207 91 Multiple Response Questions 1 Multiple dichotomy yes no questions One of the types of question used in survey research asks the respondent to select multiple answers A single variable cannot record the answers to this type of question adequately because a variable can have only one value The solution is to record each possible response in a different variable The responses can be analyzed separately using commands you have already seen tabulate but ideally we want to analyze these related variables jointly Multiple dichotomy yes no questions Ifa survey question asks the respondent to check all that apply from a set of ten choices a separate variable is required for each of the ten responses Each variable has a value to indicate whether the response was checked 1 or yes or not checked 2 orno An example of this type of question can be found in the household level survey question
16. 55 2 030 40 3 175 78 5 066 72 28 465 75 28 465 75 294 10 1 984 11 3 009 46 5 021 75 294 10 Print a table from the Viewer Exercise 3 1 The table command permits you to specify more than one variable to summarize and also permits formatting of the contents of the table A simple way to print a table you have just created is to open the Viewer select the table and print 1 Open the Viewer Click on File then View A dialog box opens asking for the name of the file 2 Click on the Browse button and select the file session3 smcl and click on Ok 3 Scroll down to the table you want to print and block it 4 Click on File then Print then Viewer The Print dialog box opens Under Page Range click on the radio button next to Selection Then click on Print 5 Another dialog box opens labeled Output Settings In this box you can specify a Header a Name and a Project If you do not want line numbers and the Stata logo to print you should remove the ticks next to the boxes labeled Print Line s and Print Logo 6 Click on OK to print the selection Produce a similarly formatted table using calories retained using the data file that was created in Exercise 2 1 Include totals by 93 Stata 11 Sample Session Section 3 Tables and other Types of Analysis retained quartile Your table should look similar to the table below Calories retained quartile district 1 2 3 4 Total monapo a Taree be Derg 2
17. R KR resets the information in the dialog box so that nothing has been selected The third icon will copy the command to the clipboard You can then switch to the do file and paste the command into the do file On the right hand side you have the choices to click on Ok Cancel or Submit If you choose the dialog box remains open so that you can select another option within the dialog box without having to open the box again If you choose the dialog box closes The command is automatically executed whether you choose Submit or Ok We want a description of all variables therefore we can leave the list of variables blank Before you click on Ok click on the Copy button Switch to the do file and press lt Ctrl V gt or right click and choose Paste to paste the command Switch back to the dialog box and click on Ok In the Results window you will see the description of 22 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations the variables To obtain the same results from the Command window you can type The output shows the file name the number of observations the number of variables the size and then information about each of the variables the storage type the display format the value label and variable label Contains data from c qla dta obs 1 524 vars 11 size 73 152 93 0 of memory free storage display value variable name type format label va
18. Stata command is histogram ca3 width 5 frequency See a copy of the graph below If you want to save this graph to a word processing document you can lt right click gt on the graph select copy graph then switch to your word processor and 31 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations paste it into the document If you want to save the graph to disk lt right click gt and choose save graph Note Only one graph appears in the graph window at a time If you run multiple graph commands at one time from a do file only the last graph will be visible You must run one command save or copy the graph then run the next graph command save or copy that graph For a more detailed description of the sub commands available for Summarize and Tabulate refer to the Guide for STATA References S Z oO oO ap Sa N gt g 6b O D LL oO d E 80 The list command You may want to look at the data selecting only specific cases rather than scrolling down through the data set to find a specific case or cases The list command gives you the option to select all or specific cases 1 From the Data then Describe Data menu select List Data 32 Stata 11 Sample Session Section l Basic functions Files Descriptives Data Transformations E list List values of variables Main by f in Options Summary Advanced
19. Type the following block and run the commands delete temporary variables levelsof district local levels foreach z of local levels drop quartile z Always check the new variables that are created to see if the values are what you expect to see We can use the tabulate command with 2 variables district and quart to check the variables 1 From the menus click on Statistics then Summaries tables amp tests then Tables then Two way tables with measures of association The tabulate2 Two way tables dialog box opens 2 In the Row Variable box select quart In the Column Variable box select district 4 Click on the copy button switch to the do file 78 ios Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation monapo editor paste the command switch back to the dialog box and click on Ok 5 Write a comment in the do file to explain what the commands are doing The number of cases in each cell should be almost the same counts plus or minus a case or two e g district ribaue angoche 28 27 27 27 30 29 30 29 30 29 29 28 109 Examples of the foreach looping command Lg 115 The new variable requires a label 1 Click on Data then Data utilities then Label utilities then Label variable 2 Inthe Variables box select the name of the first variable quart 3 Inthe Attach label to variable up to 80 c
20. also help point out obvious errors e g a variable whose values are missing for all listed cases Decide which of the variables in this file are continuous and which are categorical normally you would refer to the questionnaire to make this decision You need to know this in order to select the right procedure to use for each variable If you mistakenly perform a Tabulate on a continuous variable you will probably get more output than you really want with possibly hundreds of different categories one for each different value found If you perform a Summarize on a categorical variable you will usually get meaningless results since the average value of a variable that consists of categories has no real significance By examining the data you should have found that variable Ca3 age is continuous and the remaining variables are categorical To run descriptives on ca3 do the following 1 From the Statistics menu select Summaries Tables amp Tests then Summary and Descriptive Statistics then Summary Statistics This will open the Summarize Summary Statistics dialog box This command is also available from Data Describe data Summary Statistics 27 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations 2 The cursor should be in the variables box There is a dropdown arrow at the end of the variables box g Click on the drop down arrow to select the variables
21. and Basic Operations for Stata 11 This section introduces the basic concepts of levels the notion of cross sectional analysis and consequently the methods of data organization This section gives a brief description of the file structure of Stata version 10 It is essential that you read through this section before starting the cross sectional session Overview When you open Stata 11 for the first time you will see four different windows within the program the Results window results of a command are displayed in this window the Review window commands submitted to the processor appear in this window the Variables window the list of variable names in the data set that has been opened and the Command window where commands can be typed this is the active window at startup rit Stata IC 11 0 Results File Edit Data Graphics Statistics User Window Help BaS E EAN A ALAE x Command copyright 1984 2009 Statistics Data Analysis Statacorp 4905 Lakeway Drive college Station Texas 77845 USA 800 STATA PC http www stata com 979 696 4600 stataGstata com 979 696 4601 fax Single user Stata perpetual license Serial number 30110536417 Licensed to Margaret Beaver Michigan State University m option or set memory 300 00 MB allocated to data Format New update available type update all lt Percent complete 0 Other windows are available but are not opened at start
22. back to the Stata window 1 From the File menu select Exit A dialog box will open to say that Data have been changed without being saved Do you really want to exit 2 Click on Yes We do not need to save the newly created categorical variable We will not be using it again If you want to look at the log file that we just created open Stata 1 From the File menu select View A dialog box will open Choose File to View 2 Click on the browse button You will see listed a file called session1 smcl 3 Select that file click on Open then click on Ok The Viewer opens and displays the log file that has saved all the commands and output from Section 1 of the tutorial To close the Viewer click on the xX in the upper right hand corner of the Viewer 51 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation STATA 11 SAMPLE SESSION SECTION 2 Restructuring Data Files Table Lookup amp Aggregation Restructuring Data Files For some types of analysis the data files may need to be restructured to a different level The data from the four sections of the questionnaires household member production and sales are in four separate data files because the data are at different levels The household data is at the most general or highest level one case per household The other three files contain more detailed data which is usually thought of as
23. c qia dta This file contains data on household roster characteristics It is at the household member level We need to use the variables ca3 age and ca4 sex in this exercise to compute the number of adult equivalents per household e c q4 dta This file contains data on crops produced by the household The variables we need to calculate the total production of the household are a prod contains the codes for the agricultural crop produced b pia contains the codes for the unit in which the production was measured 100 kg sack 50 kg sack etc c pib contains the number of units produced for the year Note that the unit of production is not a standard unit for each crop For example a 100 kg sack as the term is used in Mozambique weighs 100 kg only when the sack is filled with maize When it is filled with manioc root it weighs much less than 100 kg Thus we need conversion factors to be able to convert each of the units in which production was actually measured to our standard unit which is the kilogram e conver dta This is a table lookup file This file was created specifically to handle the problem of converting non standard units to a standard unit For each product unit combination there is a conversion factor to convert the measurement to equal the weight in kilograms In other words there is a different conversion factor for each product unit combination For example the conversion factor for a 50 kg
24. dialog box check v Customize format and change the contents to read 11 2fc The Help format button shows different formats that can be specified This format says to use a width of 11 with 2 decimals fc means fixed format with a comma Click on Ok 12 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK For each district the first row is the mean the second row is the maximum and the third row is the minimum The three Stata commands are 92 Stata 11 Sample Session Section 3 Tables and other Types of Analysis by district quart sort summarize cprod_ae tabulate district quart summarize cprod_ae nostandard nofreq noobs table district quart contents mean cprod_ae max cprod_ae min cprod_ae row col format 1 1 2fc TABLE 1 Food Production in calories per adult equivalent per day Mean Maximum and Minimum Calorie production quartile district 1 2 3 4 Total monapo 1 248 70 2 539 36 3 997 49 9 150 02 4 206 51 1 972 67 3717578 5 066 72 28 465 75 28 465 75 294 10 1 984 11 3 229895 5 10712 294 10 ribaue 1 502 24 2 554 49 4 062 53 7 607 72 3 900 85 2 030 40 3 141 39 4 983 72 13 123 99 L31239 429 29 2 082 42 3 190 41 5715159 429 29 angoche 1 297 97 2 465 51 3 698 81 8 495 49 3 950 26 2 023 65 2 996 37 4 691 52 20 485 10 20 485 10 353 88 2 037 20 3 009 46 57021 T5 353 88 Total 17 3924 95 2 519 74 3 919 46 8 399 38 4 014
25. estimation Let s run the same analysis with only the weight specified to see the difference 5 Click on the tab labeled SE Cluster then click on the button labeled Survey settings 6 Click on the button labeled Clear settings 7 Click on the Weights tab 8 Click on the radio button next to Sampling Weight Variable 9 Click on the drop down arrow for the Sampling weight variable box and select hhwgt 10 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 11 Click on the task svy total on the Windows task bar 12 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok Note we have gotten the same point estimate as the design based estimate but the standard errors are much smaller The second table does not account for the sampling design svyset _n pweight hhwgt vce linearized singleunit missing pweight VCE Single uni hhwgt linearized missing Strata 1 SU es FPC 1 svy linearized total lt one gt lt observations gt lt zero gt maizea ricea milleta sunfa running total on estimation sample Survey Total estimation Number of strata Number of PSUs Number of obs 6601 Population size 807414 Design df 6600 Total Linearized Std Err 95 Conf Interval maizea ricea milleta sunfa 649230 9 1
26. f beans l yes 2 no H64G g sorghum l yes 2 no H64H h cashew nuts l yes 2 no H65 65 Compared with five years ago has the marketing of these products been more difficult or easier 1 more difficult gt question 66 2 easier gt question 67 H66 66 If more difficult why 1 fewer buyers 2 transportation problems 3 security problems 4 low prices 5 lack of consumer goods 6 other 118 Stata 11 Sample Session Annex I I Survey Instrument H67 67 If easier why 1 more buyers 2 better transportation 3 better security 4 attractive prices 5 more consumer goods 6 other H83 83 Does your family usually receive traditional gifts or participate in exchange relations l yes 2 no H84 84 If yes how often 1 only when there is a lack of food 2 only during feasts and rituals 3 frequently XI TYPICAL CONSUMPTION PATTERNS H86 86 How many meals did these people have yesterday Number of meals H89 89 Do you consider these meals adequate to maintain the health of all the household members l yes 2 no We would also like to ask you about your diet during the hungry period January to May H91 91 How meals do you customarily prepare daily during hungry period H92 92 In general are these hungry period meals adequate to maintain the health of all household members l yes 2 no H96 96 During the hungry period was there always food available to purchase from the market or from your neighbors l yes 2 no 119 Sta
27. increased or decreased the amount of land in food crops 1 increased 2 decreased 3 no change H31 31 During a normal year is your farm production sufficient to feed your entire family l yes 2 no 117 Stata 11 Sample Session Annex I I Survey Instrument We would like to ask you about the cash crops you grow on your farm H34 34 Do your grow any crops that are principally destined for the market l yes 2 no 35 Which crops are grow principally to be sold List the three most important H35A 1 cotton 4 sunflower H35B 2 peanuts 5 rice H35C 3 sesame 6 other H36 36 Over the last five years have you changed the area grown in these cash crops 1 increased 2 decreased 3 no change H39 39 Do you normally grow cotton l yes 2 no H52 52 Since your involvement with the cotton companies have you reduced your area dedicated to food crops such as maize and manioc l yes 2 no IV PRODUCTION H56 56 Do you have cashew trees l yes 2 no H57 57 How many trees do you presently have number H57A 57A Of these trees from how many did you harvest during the last year number vV AGRICULTURAL SALES We would like to ask about the marketing of your agricultural products since August of 1990 64 Over the last five years have you increased the quantities marketed of the following crops H64A a maize l yes 2 no H64B b manioc l yes 2 no H64C c rice l yes 2 no H64D d cotton l yes 2 no H64E e peanuts l yes 2 no H64F
28. integer If the fweight associated with an observation is 5 it means the observation represents 5 identical observations pweight Sampling weights inverse of the probability that this observation is included in the sample due to the sampling design A pweight of 100 indicates that this observation represents 100 subjects in the population There are qualifications to this weight when used with survey analysis commands aweight or cellsize analytic weights inversely proportional to the variance of an observation The observations typically represent averages and the weights are the number of elements that produced the average iweight Importance weights relative importance of the observation This weight is generally used by programmers who want to produce a specific computation To read more about weights look at the User manual weights If you use the generic weight sub command Stata will tell you which weight it assumes you want to use Not all commands will allow a weight to be included The format is 98 Stata 11 Sample Session Section 3 Tables and other Types of Analysis Indicator variables type_of_weight variable_in_file Let s use one of the Stata s sample data files to explore this sub command 1 Click on File then Open A dialog box should open telling you that the data have changed and do you want to continue and lose unsaved data We don t want to save any changes to the d
29. look at the Annex Table IA House hold Member questionnaire Variable ca2 relation to head is a categorical variable because its values are limited to 6 categories 3 An indicator variable is a special type of 26 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Indicator variable Descriptive statististics using one variable Descriptives The summarize command categorical variable This type of variable denotes whether something is true e g yes no questions or whether a person is male or female This type of variable contains only 2 categories i e it divides the data into 2 groups Start by examining the data in the file Use the Data Editor window to scroll through your data file To do this perform the following steps 1 Click on the Data Editor button on the Tool Bar or in the Command Window type edit and press lt Enter gt You could instead click on the Browse button since we only want to look at the data 2 Scroll through the data A period in a field indicates a missing value or system missing value In Stata you can specify up to 27 different missing values e g a or b These are called extended missing values Extended missing values are used to identify specific reasons why there are no data e g person refused to answer or a question was not asked Scrolling through the data will give you a feel for what is in file It might
30. number of times it takes on the values 0 1 9 the number of times it is missing and the number of times it is equal to some other value String variables are not tabulated but are identified at the end of the displayed table To download this ado file connect to the Stata website 1 Click on Help then SJ and User Written Programs 2 Inthe Viewer click on STB 3 Scroll down to stb25 and click 4 Click on 936 Tabulating the counts of multiple categorical variables In this screen click on click here to install 5 The program will be installed in the directory C ado plus t 6 To use the program in the Command window type tabw h35a h35b h35c 97 Stata 11 Sample Session Section 3 Tables and other Types of Analysis The output is Variable h35a h35b h35c Using this type of analysis you could state the following Cotton was the most frequent primary cash crop 90 households grew this crop peanuts and rice were the next most often grown for cash You can also use the tab1 command tab1 h35 Other Types of Analyses Stata provides for a method to analyze data using different types of weights The type of weight that is to be used with a set of Weights data will depend on the type of sampling that has been used See the table below for an explanation of the available weight types fweight or frequency frequency weights Number of replicated observations this value is always an
31. of association 48 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The Tabulate2 two way tables dialog box opens 2 Use age_gp for Row variable and Ca2 relation to head for Column variable 3 Check the proper selections in the Cell content choices for we want both Row and Column percentages 4 Click on the copy button switch to the do file editor paste the command and switch back Click on OK to run the command The Stata command is tabulate age_gp ca2 column row From the table you can see that 11 95 of heads of households are 61 years of age or older Also of the people 61 years or older 83 67 are heads of households Apply what you have learned about data transformations and descriptive statistics in the following exercise Exercise 1 2 Using the Household Data and Questionnaire available in the annex find out the number of households in each district that have 1 4 5 7 and more than 7 persons per household One way to find out this information is to create the following table Hints a Use the file c hh dta b Recode h1 into hhsize using the following groups 1 thru 4 5 thru 7 8 thru Highest Add a variable label and value labels d Runa two way table Tabulate on this variable by district a Looking at the results you can see 34 76 ofall 1 to 4 member households are found within Monapo and that 60 75 of all households in Monapo have
32. processor The process to copy output to a word processor is basically the same for Graphics such as pie charts and histograms but there is more flexibility in the ways to save the file along with more difficulties in getting just the look you want As an example we will look at the distribution of cashew tree ownership across households in the Mozambique data using a histogram Open a new do file and place the requisite information at the top e g 104 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation capture log close log using session4 append session 4 copying Tables and Graphs to a word processor tasks Your name date version 10 set memory if you need to clear all macro drop _all set the directory where you will work cd C Documents and Settings aec_user My Documents StataTraining Save this do file under the name session4 do We are now ready to open the household file that contains the tree ownership variable c hh dta 1 Click on File Open 2 Select c hh dta and click on Open 3 Paste the command from the Results window to the do file editor Remove the directory reference Create the Histogram chart using the variable H57 number of trees owned 4 Select Graphics then Histogram 5 Inthe Variable box select H57 Number of cashew trees Note you can specify whether the variable is continuous or discrete 6 Under Bins
33. results etc b Increasing the amount of memory in the middle of a Stata session One megabyte can be used up fairly quickly so it is recommended that you set the memory at the beginning of the session to a larger size e g set memory 30m If you wish to have the memory already set when you start the program you can set the memory permanently set memory 30m perm If you want to increase the amount of memory in the middle of your session you will not be able to do so unless you close the data file using the command Stata 11 Sample Session The drop _ all command Types of files used by Stata and their extension names 1 Data files 2 Log files The log using command Section 0 File structure and Basic Operations for Stata 11 drop _all Another option is to just close the Stata program and set the memory using the set memory command after you open the program and before you open a data file files containing data Extension dta Data files have an extension of dta From the Stata 11 window you can open a data file From the Menu Select File then Open If you are not in the directory where your files are change to the appropriate directory Only files with an extension name of dta will be listed From the Command window if you are working in the correct directory you can type use name of file clear commands and output Extension SMCL Stata markup and con
34. the commands You must save your data separately as described in the following section We suggest that you use the default extension of do when naming command files Examples of file names are Rep7 do dem all do and section1 do By storing your commands to a do file you can retrieve look at or modify sets of commands and rerun them To retrieve a do file into the editor open the Do File Editor pull down the File menu and select Open or you can click on the yellow file folder hn the tool bar in the editor Select the file you wish to open and click on Open Once you have opened a specific file you can use the 11 Stata 11 Sample Session B The Data Editor Window Open Data Editor window The edit command Section 0 File structure and Basic Operations for Stata 11 commands from the file without having to recreate or type them again If you make changes to the command file that you wish to keep make sure you save them to disk again Caution From Windows Explorer if you double click on a do file the Stata program will open and run all the commands in the do file immediately The do file will not be opened To opena do file from Windows Explorer right click on the file name and choose edit The application STATA will open and the do file will be opened in the Do File Editor When you have opened a do file in this manner STATA automatically executes the command doedit
35. the browse command to look at the data or click on the browse icon In Zambia for surveys conducted in the 1990s and early 2000 a Stratified random sampling method was used This method divided the districts into census supervisory areas CSA Within the CSA Standard Enumerator Areas SEA were defined The primary sampling unit PSU for this sample is the SEA To identify each SEA as being unique the three variables district CSA and SEA must be combined into one variable District has 3 numbers CSA has 3 numbers and SEA has 2 numbers To create a new variable with these variables one must multiply the district variable by 100 000 add CSA multiplied by 100 and add SEA The Stata command is gen float cluster1 dist 100000 CSA 100 SEA We want to change the format of this variable so that we can easily read it to verify the variable has been created correctly Use the format command format cluster 9 0f Clusters may further be sampled in groups which are called strata The Zambia example uses province district as the 109 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation strata Strata are considered to be statistically independent and can be analyzed as such A weight has already been calculated for each household The variable which contains this value is called hhwgft We need to compute the cluster variable We can use dist for the strata variable since it already contains the pro
36. then Rename variable The rename Rename variables dialog box will come up In the Existing Variables box select pla In the New variable name box type unit Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK The Stata command is rename pia unit The files are now ready to be merged We are doing a File Table merge where the second file is our Lookup Table We want to keep all records in the master file or the file in memory and keep only those records in the using file that match From the Data menu select Combine datasets then select Form all pairwise combinations within groups The joinby form all pairwise combinations within groups dialog box will open To fill in the box labeled Filename of dataset on disk click on the Browse button Select the filename conver dta and click on Open In the box labeled Join observations by groups formed from specific variables select prod unit Click on the Options tab Under Unmatched Observations select Include from data in memory This option will keep cases in the master data set in memory that do not have a match in the lookup data set Click on the copy button switch to the do file editor paste the command delete the directory reference switch back to the dialog box and click on Ok The Stata command is joinby prod unit using conver dta
37. to examine variables using the methods of Tabulate and tab1 describe describing the variables in the file Within the Do file Editor you can submit several commands at once In the Command window only one command at a time can be submitted for execution You cannot send commands directly from the Stata Command window to the Do file Editor The command must be copied There are 2 ways to open the Do File editor 1 From the Button Bar you can click on the Do file Editor button 47 Another window opens which is the Do file editor 2 From the Command window you can type It is important to recognize the significance of the different types of files and to understand the various commands you use to create and access the files 10 Stata 11 Sample Session A The Do file Editor Section 0 File structure and Basic Operations for Stata 11 The Do file Editor is the window where commands can be typed before they are submitted to the STATA processor Commands can be typed directly into the Do file Editor or you can copy the commands from the Results window and paste the commands into the Editor There are four main uses of the Do file Editor To type commands directly into the Do file Editor to be processed later by STATA To send these commands to Stata 11 for processing To save these commands to a file to be run again in the future and To retrieve files of commands that you have saved prev
38. variables The respondent must record a different value in each of the 4 variables These types of variables are called multiple response variables Question 35 in the household questionnaire is an example of a multiple response question It asks about crops grown principally to be sold Each household is asked to specify up to three main crops which are coded into variables h35a h35b and h35c Codes are provided for five of the most common crops The question is left open ended however since a code of 6 is allowed for a crop not on the list The name of the crop is written down on the questionnaire and later assigned a code Because the question was open ended more categories were added to these variables than what appears in the annex After the data are collected the researcher assigns a code to each of the crops specified for 6 other this procedure is called post coding Codes and value labels are entered into the data file and the data changed from the value of 6 to the appropriate code As you will see using the tab1 command eleven different crops were specified for question 35 Stata does not have an official command that will tabulate data collected in this format We can do frequencies of each variable or develop commands to pull out specific information There is a user written ado command called tabw Peter Sasieni STB 25 Stata 3 1 For each variable in a list of 2 variables this command will tabulate the
39. you want Highlight ca3 and click to select it To close the drop down box click in another area of the dialog box In the options section below the variable box note that Standard Display is the default selection for output Don t forget to click on the icon to copy the command to the clipboard switch to the do file editor paste the command and switch back to the dialog box 3 Click on the button to run the command The dialog box will remain open The output appears in the Stata Results window You will see that the mean for age ca3 is 21 33602 years The Stata command is summarize ca3 The Results window displays Variable Obs Mean Std Dev ca3 1524 21 33602 17 69252 Return to the dialog box On the task bar you can see summarize Summary click on this task Click on the in the lower left corner to see more detail about the summarize command In this help window you can see that the first two letters of the command are underlined e g summarize In Stata only the letters that are underlined are absolutely required for the command to be recognized The following command also works su ca3 Scroll down through the Viewer to see the options available with this command and examples To close the Viewer click on the x in the upper right hand corner of the dialog box If we wanted to see more summary statistics on this variable we can ask for detail Switch back to the
40. 002384 r sum r min 5 r max 81 r pl 1 r p5 1 r p10 3 p25 F r p50 16 r p75 32 r p90 48 r p95 57 r p99 69 You can use these values stored in memory to perform calculations For example to subtract the mean of ca3 from cas generate ca3_mean ca3 r mean Using the information in memory eliminates the need to type specific numbers and will give you more accurate values Since the variables ca1 work on a farm or not ca2 relation to head ca4 sex cad level of schooling and ca6 marital status are categorical we will run a Tabulate on them To run a tabulation do the following 1 From the menus click on Statistics then Summaries Tables amp Tests then Tables then Multiple one way tables The Tab1 One way Tables dialog box opens 2 Click on the drop down arrow to select the variables cal ca2 ca4 cad ca6 3 Click on the copy button switch to the Do File Editor and paste the command then switch back to the dialog box and click on the Submit button 4 The command will be executed You will see in the Stata Results window that for ca1 70 67 of the household members work on a farm There are 1524 cases for this tabulation The results for ca4 show that 51 53 are males and 48 47 are females How many cases have been included Only 1508 cases 30 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transforma
41. 1 to 4 members in a household 49 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Key frequency row percentage column percentage Household size monapo district ribaue angoche 1 4 members 65 34 76 60 75 48 74 s67 39 57 34 64 35 5 7 members 39 56 36 s19 48 06 lt 90 8 12 members 15 5 We have completed Section 1 Before we close down the session we need to close the log file that has been recording the commands and output The command to close the log file is We can type this command in the Command window and run it and then copy and paste the command in the do file Before exiting Stata save the do file The file contains all of the commands It is useful to keep this file so you can rerun the commands if you want review the commands and the output that is produced If you have not yet saved the file follow these instructions Otherwise click on the al l Save icon on the tool bar 1 Ifyou have not saved the do file make the Do file Editor the active window using its icon on the Windows taskbar 2 From the File menu select Save As 3 Enter the filename session1 The do extension will be added to the name automatically 4 Click on Save 50 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations To exit Stata switch
42. 28 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Summarize Summary Statistics dialog box you can see the icon on the task bar 4 Click on the radio button next to Display additional statistics Click on the to copy the command to the clipboard 5 Click on the button to run the command The dialog box will close The results are age Percentiles Smallest 1 HI 1 s6 3 1 Obs 1524 7 1 Sum of Wgt 1524 16 Mean 21 33602 Largest Std Dev 17 69252 32 T5 48 76 Variance 31370252 57 78 Skewness 29152221 69 81 Kurtosis 3 00135 The median age is 16 50 Percentile The Stata command is summarize ca3 detail Switch to the Do File Editor and paste the command Insert comments to explain the commands you have pasted Information returned by Stata When you run a command Stata sends the information to commands the Results window as well as saves the information in memory To see what has been saved you can use the return list command In the Command window type return list The information that is returned from the summarize command is displayed 29 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations TABULATE Frequencies r skewness r kurtosis 1524 1524 21 33602362206289 313 0251689442948 17 69251731507687 9152220664756392 3 001349748747086 32516 10000
43. 4 prod 47 check to see that there are only 7 crops listed tabulate prod need to sum all calories produced by the household Using the collapse command collapse sum cprod_tt by district vil hh label variable cprod_tt Calories produced in staple foods describe verify you have the right average calories produced over whole sample summarize cprod_tt save the file 83 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation save hh file1 dta replace kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk Step 2 kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk calculating adult equivalents based on age and gender use c q1a dta clear generate byte ae 1 if ca4 1 amp ca3 gt 10 replace ae 84 if ca4 2 amp ca3 gt 10 amp ca3 lt 19 replace ae 72 if ca4 2 amp ca3 gt 20 replace ae 6 if ca3 lt 10 label variable ae Adult equivalents check the variable tabulate ae missing calculate mean to determine average ae across the whole population To fill in the missing values summarize ae replace all system missing with the value of 79 replace ae 79 if ae tabulate ae missing need to sum the adult equivalents for each household collapse sum ae by district vil hh label variable ae Adult equivalents per household summarize ae save file for later use save hh file2 dta replace KKK KKK K
44. 4 3134 742 12674 86 87 Stata 11 Sample Session Section 3 Tables and other Types of Analysis Tables STATA 11 SAMPLE SESSION SECTION 3 Tables and Other Types of Analysis Using the Table command you can calculate various statistics and present them in a variety of ways that are completely under your control Table allows you to choose how you want to assemble variables and statistics for display in rows columns and super columns or super rows A super column or super row has a variable nested below it Variables can be stacked or nested Nested means that all of the values for one variable are displayed below the individual values of another variable You can manipulate table structure content and presentation format With this command there a few limitations a up to 4 variables can be specified in the by b up to 5 statistics can be displayed in each cell c the sum of the number of rows columns super columns and super rows is called the number of margins A table may contain up to 3000 margins e g a one way table may contain 3000 rows a two way table may contain 2998 rows and 2 columns or 2997 rows and 3 columns and so forth Commands that produce similar results are tabstat displays summary statistics for a series of numeric variables in a single table tabsum produces one and two way tables of means and standard deviations this command is faster but the table command is more flexi
45. 4472 95 61770 91 24319 15 14013 13 621760 6 676701 2 1327 559 11870 5 17075 39 3942 684 54041 97 69499 84 1907 919 20579 01 28059 29 112 Stata 11 Sample Session Annex I I Survey Instrument Annexes ANNEX I Stata Commands This annex provides a brief reference guide and to explain the various functions of the Stata commands most commonly used This annex was developed by Ellen Payongayong The commands in the table below do not contain the full Stata syntax Note that commands can be abbreviated In the Help Syntax Viewer the syntax explanation will show how much of the command must be typed e g Summarize can be shortened to su or sum In this Help viewer the letters that are required for the command are underlined rr Pe a save filenameZ saves current file in memory into filename if filename already ee a exists stata will not let you overwrite it save filename2 replace saves current file in memory into filename2 overwriting any file E Prviter Even cect tt gts ated inane save replace saves current file in memory into filename of that which is browse brings up the same data editor as in edit but will not allow you to change data describe gives a description of the data file number of observations number of variables list of variables variable type and width variable labels if any summarize gives basic summary statistics number of valid observations mean standard devi
46. 96 532 34 30 24 tT SPOODFNONDDCOCODNINDOOFRONOOASEA Total Summarize Variable 00 Mean Std Dev plb p2b p3b p5b p7b Descriptive Statistics using two or more variables Two way Tables with Categorical Variables Cross tabulation 26 35286 163 4359 22 81508 159 5101 26923121 4 574581 15 61243 86 10356 4 938435 6 875536 We wish to produce a table that shows the distribution of cases according to their values using two or more categorical variables Look at the household member questionnaire in the annex section Annex Table IA One thing you might be interested to know is how the gender of the respondents varied by their relationship to the head of household 36 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The tabulate command This would tell you for example how many females are heads of households The Tabulate command will produce this type of summary Make the household member file c q1a dta the working data file 1 Click on the yellow open folder tool at the top left of the Toolbar 2 Select the file c qia dta 3 Click on Open to open the file 4 Copy the command for opening the file which appears in the Results window into the Do file Editor window Reminder You should add comments to your do file so that you can remember what and why you were doing specific commands when you developed t
47. Because we created hh file1 dta by collapsing the file it is already sorted by district vil and hh hh file2 dta was also created by collapsing the file so it is also sorted by district vil and hh We are ready to merge the two files 1 Select Data then Combine datasets then Merge Two datasets The Merge Merge dataset in memory with dataset on disk dialog box will appear The default type of merge is one to one on key variables this is the merge we want to do 2 We are doing a one to one on key variables type of merge the default selection For the Filename of dataset on disk box click on the Browse button Select the file hh file1 dta and click on Open 3 In the Key variables match variables box select district vil hh These are the Key Variables 4 Click on the Options tab Under this tab you see the box labeled Specify new name of variable to mark result of merge The default name is _merge This variable received a code of 1 or 2 or 3 to describe what type of merge occurred The code definition is 73 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Calculate the total calories produced per adult equivalent per household for the year 1 observation is from file in memory 2 observation is from file on disk 3 observations are from both files It is very important to look at the values in this variable after you have run the merge 5 Click on t
48. Documents data log using log_session2 append We are now ready to open c q4 dta the production file Select File Open Select the file name c q4 dta Click on Open to run the command Copy the command to open this file from the Results window switch to the Do File Editor lt Ctrl 8 gt or click on the button on the task bar and paste the command into the do file Delete the reference to the directory 5 Save the do file to the name session2 do We must convert all production of the crops into kilograms To find the conversion factor appropriate for Se 56 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Rename any key variables in both files to the same name each case in the production file c q4 dta we need to look up the product and unit in the conver dta file We will merge the information from this file into the file in memory the production file The variable with the conversion factor will then be available to calculate the total kgs produced In Stata we want to use the joinby command for this merge It can be found through the menus with the following choice Data Combine datasets Form all pairwise combinations within groups The input files for a merge must be sorted by the key variable s key variables are those variables you are using to match by between the two files Since there is a unique conversion factor for each product unit co
49. KK KK kk k kk k k k kk kk kk kkk kk kkk kkk Step 3 kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk use hh file2 dta clear now combine both the hh file1 with hh file2 both files are already sorted by key variables match files by district vil hh merge 1 1 district vil hh using hh file1 dta check to see which file the variables are coming from tab _merge 3 variables came from both files drop _merge calculate the calories per adult equivalent per day generate double cprod_ae cprod_tt ae 365 label variable cprod_ae Calories per adult equivalent per day sum cprod_ae rank the new variable by district into quartiles 84 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation check for number of districts tabulate district there are 3 districts so we want to loop 3 times first method for z in num 1 3 xtile quartz cprod_ae if district z nq 4 initialize variable gen quart replace values with information from temporary variable for z in num 1 3 replace quart quariz if district for z in num 1 3 drop quartz check results should see equal number of cases in each category tabulate quart district second method drop quart second method to create the quartile variables the following stores those numbers into local variables levelsof district local levels foreach z of local levels xtile quartile z
50. Quite a bit of editing is required to make the above table presentable 103 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation Using Excel to create columns from the table Relation to head Exercise 4 1 Graphs 0 to 10 Another method is to copy the table from the Results window into Excel Then use the method to convert text to columns that is provided in Excel The left most cells in each column will contain the text for the entire row Click on Data then Text to Columns The Text to Columns Wizard will start Follow the instructions in the wizard to divide the text into columns Upon completion of the wizard block the columns copy and paste into your word processor The table will be now ina Table in the word processor where you can easily manipulate the widths and other formatting as required Below is an example of output from the Results window converted to columns in Excel and pasted into Word 11t0o19 20to60 61 and older Total 6 296 41 343 1 75 86 3 11 95 100 2 22 47 13 83 67 22 57 25 280 5 310 8 06 90 32 1 61 100 9 26 44 59 10 2 20 39 184 31 0 718 25 63 4 32 0 100 68 15 4 94 0 47 24 0 5 1 6 0 83 33 16 67 100 0 0 8 2 04 0 39 55 16 2 143 38 46 11 19 1 4 100 20 37 2 55 4 08 9 41 270 628 49 1 520 17 76 41 32 3 22 100 100 100 100 100 Select another table from your Session3 SMCL file Use all three methods to copy another table from your log file into a word
51. STATA 11 SAMPLE SESSION Cross Sectional Analysis Short Course Training Materials Designing Policy Relevant Research and Data Processing and Analysis with STATA 11 1st Edition Department of Agricultural Economics Michigan State University East Lansing Michigan March 2010 Stata 11 Sample Session Section 0 File structure and Basic Operations for Stata 11 Components of the Cross Sectional Training Materials Section 0 Introduction to the Window structures for STATA 11 Stata Results Review Variables and Stata Command Windows as well as the Do File Editor This section must be read before starting the sample session Section 1 Basic functions Section 2 Table Lookup amp Aggregation Section 3 Tables amp Multiple Response Questions and Other Useful Commands Section 4 Graphs tables publications and presentations how to bring them into word processor and use of Survey commands Annexes I Frequently used Stata commands II Several pages from the socio economic survey of the smallholder survey in the Province of Nampula Mozambique NDAE Working Paper 3 1992 References to papers discussions levels of data On the Food Security Group web site at MSU there are several survey research training materials which you might find helpful The website is http www aec msu edu agecon fs2 survey index htm Two papers discuss levels of data 1 Computer analysis of survey data File organization for mult
52. Suppose we want to know how the age of the member varied by his her relationship to the head of household If we did this with Tabulate we would get a table with dozens of cells for the different ages represented The table would not be usable Instead we will use Summarize with the by key word 1 From the Statistics menu select Summaries Tables amp Tests Summary Statistics Summary Statistics The Summarize Summary Statistics dialog box opens 2 Select ca3 from the drop down box for Variables 3 Under Options in this tab select Standard Display 4 Click on the by if in tab 5 Click in the box Repeat command for groups defined by 6 In the box below this option select ca2 fe Click on the copy button switch to the do file editor and paste Switch back to click the Ok button The command will be executed This command calculates the means of the variable ca3 age separately for each different value of the variable Ca2 relation to head including the system missing value 39 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The Stata command is by ca2 sort summarize ca3 Note that the command begins with by This command is first sorting the data by Ca2 before it runs the summarize command You could also sort the file by Ca2 first and then just use the by key word e g sort ca2 by ca2 summarize ca3 From this output you f
53. The first argument is the variable name that you want to categorize The rest of the arguments are used to determine how to code the new variable 1 Select Create or change variables from the Data menu 2 Select Create new variable 3 Click on the reset button in the lower left hand corner of the dialog box KR if you need to remove any information that appears in the box 4 Under the Main tab type the name of the new 46 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations 10 11 variable in the Generate Variable box agecat Click on the Generate variable as type drop down box and change to byte For the Contents box click on the Create button In the Expression builder box under the Category section select Programming A list of available functions is displayed Scroll down to the recode function and highlight that function You will see a description of the function at the bottom of the dialog box Double click on this function The function will be pasted in the window at the top of the dialog box so that you see recode x x1 x2 xn The first x is highlighted Replace the first x with the variable name Ca3 so that the expression now looks like recode ca3 x1 x2 xn Replace the x1 with the value of the highest age that you want to recoded for the first group e g recode ca3 10 x2 xn Continue replacing the values with the next gro
54. V check the box next to width of bin and type 20 7 Under Y Axis click on the radio button next to Frequency 8 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK The Stata command is histogram h57 frequency width 20 Another window will open and you should see a histogram chart that looks like the one below 105 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation 40 Frequency 20 m 0 100 200 300 400 number of cashew trees To copy this graph to your word processor 1 Click on Edit Copy graph You could also right click on the graph itself and select Copy Graph 2 Open your word processor and click on Edit Paste You will not be able to edit this graph other than the size placement wrapping of text and other basic aspects allowed by your word processor You can also save a Stata chart to a file 1 Click on File then Save A dialog box opens where you can type the name of the file The default extension is gph which is the format that Stata recognizes as a graph file If you save the file with a gph extension you can then open the graph again within Stata In the filename box type Cashew_trees and click on Save 3 Copy the command from the results window switch to the do file editor and paste the command You can also save the graph into several different formats such a
55. _merge 3 obs from both master and using data merge varlist using filename nokeep causes merge to ignore observations in the using data nokeep that have no corresponding observation in the master 114 Stata 11 Sample Session Annex I I Survey Instrument assert assert verifies that an expression is true if it is the command produces no output if it is not assert informs you that the assertion is false append using append appends a STATA format dataset stored on disk to the end of the dataset in memory mvencode varlist mv changes all occurrences of missing to in the variable listing override specified override specifies the protection provided by mvencode is to be overridden without this option mvencode refuses to make the requested change if is already used in the data mvdecode varlist mv changes all occurrences of to missing in the variable list a I y arguments in varlist constructs categorical dummy variables for variables omitting the first category this command can do is determined by the previous command probit probit estimates maximum likelihood probit models search searches the keyword database Use search when you are not certain of the command e g search string shows all commands associated with strings calculates and displays tables of statistics converts data from wide to long form and vice versa wide and long refer to how data are organized
56. a2 ca3 in 1 5 gsort ca2 ca3 list district vil hh mem cai ca2 ca3 in 5 1 Reminder after any command built we will copy the command into the Do file Editor window and switch back to the menu box and run the command Apply what you ve just learned about descriptive statistics by doing the following exercise Exercise 1 1 Run descriptive statistics on another sample file Use the production questionnaire Table IV whose data are in file C Q4 DTA Hints a make C Q4 DTA your working data file b Use the Summarize command for continuous variables and tab1 for categorical variables Prod is a categorical variable d Quantities p1b p2b are continuous variables z 35 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Tabulate product e Units pla p2a are categorical variables f p4 month in which stocks ran out last year amp p6 month in which stocks will run out this year are categorical variables A small sampling of what you should find from running these frequencies and descriptive statistics follows Percent cotton peanuts rough rice bananas sweet potato cashew liquor sugar cane liquor dried cashew sugar cane cashew nut coconut beans manteiga beans sunflower oranges cashew fruit manioc sorghum maize ossura tobacco tomato 90 51 16 195 71 42 65 vie 77 68 66 48 41 30 Sl 60
57. ables box select the variable age_gp from the drop down box 3 Click on the copy button switch and paste in the do file editor switch back and click on the Ok button The Stata command is tabulate age_gp There should be 4 codes in the frequency table 1 2 3 and 4 We can use the Data Browser to check to see what changes were made Click on the Data Browser button Close the window when you are finished The values do not have any value labels to define what the values of 1 2 3 and 4 mean We want to add both a variable label and give labels to the values in this variable To assign a variable label 43 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The label define command 1 Click on Data then Data utilities then Label utilities then Label variable 2 In the Variable box select the name of the variable age_gp 3 In the New variable label box type Age Group Note Label may be up to 80 characters 4 Click on the copy button switch and paste in the do file editor switch back and click on the Ok button The Stata command is label variable age_gp Age group To assign value labels to a variable we first have to define a label and assign value labels to the values in that label 1 Click on Data then Data utilities then Label utilities then Manage value labels Remember Stata assigns a name to a group of value labels 2 Click on t
58. above will be placed in the window lt PageDn gt moves back down through the commands that appear in the Review window lt Esc gt clears the contents of the Command window The Viewer in Stata is used to view help files and log files and to print these files To enter the Viewer click on File View The Choose File to View dialog window opens You can type the name of the file or click on the Browse button By default the file type extension name is SMCL Files smel Select the file you want and click on 16 Stata 11 Sample Session G Stata Graph window Summary of the Basic File Types Do file files Section 0 File structure and Basic Operations for Stata 11 The file name is pasted into the dialog box where you can then click on If you decide to use Help from the menus the Help files are opened in the Viewer A graph is opened in its own window and is not stored in the Results window If you wish to keep a graph you can copy the graph to a word processing document or you can save the graph to a file Right click on the graph to see these options A graph file has the extension gph Do file files or command files contain commands saved in the Do file Editor They do not contain output or data only commands Do files are made accessible to Stata if you open the Do file editor Within the Do file editor you can open a do file Log files contain statistical output data information and pres
59. ae missing Now we need to calculate the number of adult equivalents for each household The current file is at the member level but we need values at the household level Again we use Collapse to go from the member level to the household level The new variable ae will be calculated by summing ae across all members of a household Reminder The Grouping variable s specify the variables to be used for combining cases in the collapsed file Any cases from the original file that have identical values for all of the grouping variables will be combined into a single case in the collapsed file We want the collapsed file to have one case per household so we use the variables that identify a household in our survey district vil and hh 70 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation 1 From the Data menu select Create or change variables then Other variable transformation commands then Make dataset of means medians etc The collapse Make dataset of means medians etc dialog box will appear 2 On the Main tab in the Statistics box for 1 change mean to sum by clicking on the drop down arrow In the Variables box select ae 3 Click on the Options tab and in the Grouping variables box select district vil hh in that order because those variables represent the identification of an individual household The Grouping variable s is used to specify the v
60. aining data When a data file is opened it is loaded from the disk into memory the computer s RAM making it the working file This means that the data from this file is now available for you to use Let s start with the data for Table IA Household Member Characteristics The data file that 19 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Opening a data file The use command l 2 corresponds to this table is c qla dta To open this file perform the following steps From the File menu select Open This will open the Open File dialog box If you have run the cd command you should see a list of data files to be used with this tutorial Select the file c qia dta Click on the button to open the file The command appears in the Review window In the Review Window you will see the text use c qla dta clear oe 29 will be replaced with whatever the name of the directory is where you are working We want to create a do file to save our commands The command that was just executed appears in the Results window and the Review window Press lt PageUp gt The command which was just run now appears in the Command window Block the command and press lt Ctrl C gt to copy it Click on the button in the Tool Bar to open the Do File Editor E and paste the command into this file lt Ctrl V gt We want to copy the cd command as wel
61. allysis cccccccecesessesseeseeeeseesssceseeceneeseecenecnsceeeecnecnesesesaeeeeseees MNS table COMMA NG seid cece evesessycatieagseveta E E E O oe LCST oho ee ANE AE is RS Comparison of the commands summarize tabulate and table Print a table from the Viewen in 63 5 ccds cdi ok sinatie Seasetet ee Busi os vee GA ev aee OA a a oi a aa Multiple Response QUESTIONS 32 6 205 2 22 00sccee ces revienec nes eae acee sealed dela deceit R GA E eee 1 Multiple dichotomy yes no questions 0 0 cece essesesesesneseseeseeceneeeeneseeeeseeeeneeteneaeeeeseeteneeees The count commandi aenor annn sists he OS eee oes BEA A Te NO The recode command cccceeeeteseeees The egen command 2 ececeeeseseeteeeeees The tabstat command eee 2 Multiple response srnec arn et eo re Get E E A ss Ap as ee eet Other Types of Analyses Weights cunna aa Indicator variables Converting continuous variables to indicator variables oo cece eeeeeeseseeseseeeeseeeeseseeneneeeeseeeeneeeeneateneass Converting categorical variables to indicator variables eee ec eeseeeeeseeeeseeeeseseeeeneesenes SECTION 4 Table and Graphs how to bring them into a word processor and How to move Stata results into other applications ccc cscs eseseseeeeeeeseseeesteteteneseseeeeees Stata 11 Sample Session Section 0 File Structure and Basic Operations for Stata 11 Stata 11 SAMPLE SESSION SECTION 0 File structure
62. an also look at this variable by doing a descriptives Use the Summarize command to run a mean on the new variable cprod_tt You should find that the average number of calories produced per household per year is 4 483 965 Save this data file using the Save As command Use Save As from the File menu Name the file hh file1 Click on Save Copy the command from the Results window and paste it into the do file editor delete the reference to the directory and add a comment to explain what you have done Poe Remember to save your do file regularly You must be in the Do file editor to save the do file Step 2 Generate a household The data needed to calculate adult equivalents per level file containing the household is in the member file C Q1A DTA number of adult equivalents per household 1 Click on the Open Folder button on the Stata Taskbar 2 Select the file name c q1la dta and open the file 3 Copy the command and paste it into the do file editor delete the directory reference and add a comment to explain what you have done The adult equivalent value says that on average a female 10 to 19 years old needs only 84 as many calories as a male 10 years or older and that children under 10 need only 60 as many calories as the typical male 10 years and older Thus for example a child male or female under age 10 is counted as 60 adult equivalents For each person observation in the member fil
63. an now look at the variable properties within the data 13 Stata 11 Sample Session Section 0 File structure and Basic Operations for Stata 11 editor Select a variable lt right click gt and choose Variable Properties A small dialog box opens to show you the name of the variable the label for the variable the type of variable the format of the variable and the name of the label attached to the variable We will discuss these items later Click on the red X in the upper right hand corner ka to close the dialog box Look at the icons available from the data editor You can open a data file directly within the data editor rather than opening it first in Stata If you open a file while in the data editor the command will appear in the Results window as well as in the Review window The data editor can remain open while you are working with the data a change from the previous versions Within the data editor you can change from being able to edit the data to only browsing the data To switch to browse click on the icon You can also open the data file from the main Stata window in browse mode rather than edit mode All of the data manipulations that are available from the main window are also available within the data editor If you want to sort the file in a specific way you can click on Data Sort A dialog box opens where you can specify the variables you want to use to sort the data Below is a snapshot of th
64. and click on OK 7 Add a comment to explain what you have done The Stata command is by district quart sort summarize cprod_ae You should note that the mean for the 2nd quartile in Monapo is 2 539 364 The output from the summarize command gives you the numbers necessary for the table However the output is difficult to read There is another command table which can also be used to produce the final table We will discuss this command in Section 3 Before you save the file you should sort the file by the key variables and then save this file as hh file3 dta 1 Sort the file by the key variables Type in the Command window Sort district vil hh 2 We no longer need the variable merge so it should be dropped Type in the Command 80 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Document the do file with comments window drop _merge Click on File Save As Filename is hh file3 Click on Save Copy the three commands and paste them into the do file 7 Close the log file Type in the Command window log close 8 Copy this command into the do file DN ee oe Remember to save the contents of the Do file Editor to a permanent file so you can use it another time 1 The Do File Editor should be the active window 2 Click on File Save As 3 Use the filename session2 The do extension will be added automatically This file now contains all the co
65. ariables to be used for combining cases in the collapsed file Any cases from the original file that have identical values for all 3 of the grouping variables will be combined into a single case in the collapsed file We want the collapsed file to have one case per household so we use the variables that identify a household in our survey district vil and hh 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 5 Add a comment in the do file to explain what you have done The Stata command is collapse sum ae by district vil hh Collapse creates a new working file The new working data file is at the household level with one case per household The variable ae is the total adult equivalents for that household Look at the resulting file click on the Data Browser tool You should see four variables with only one case per household You can also look at the variable definitions using the describe command The computed variable ae does not have a very descriptive label any more so we need to change the label to reflect what the variable is 1 Click on Data then Data utilities then Label utilities then Label variable 2 Inthe Variables box select the name of the first variable ae 3 Inthe Attach label to variable up to 80 characters box type Adult equivalents per household 4 Click on the copy button switch to the do file editor paste the co
66. ariables will be created with names of quartilel quartile2 quartile3 4 We want all the information in just one variable so we will create another variable and fill it with the information from the variables created above If you created a variable called quart following the instructions above you will need to drop it 77 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation before proceeding the command is drop quart The next step you are familiar with We create a new variable and fill it with system missing generate quart 5 We now replace the data in quart with the data in the temporary variables Remember we must rerun the levelsof command as well since the data are temporarily stored in memory Type the following lines block and run them replace values in quart with information from the 3 quartile variable created above levelsof district local levels foreach z of local levels replace quart quartile z if district z This commands cycle through the values for z and replaces the contents of quart with the contents of quart if district is equal to 1 in the first loop then replaces the contents of quart with the contents of quart2 if district is equal to 2 in the second loop then replaces the contents of quart with the contents of quart3 if district is equal to 3 for the final loop 4 The next step is to delete the temporary variables
67. ata file click on Yes 2 Select census dta 3 Click on Ok Remember to copy the command and paste it into the do file editor Use the Browse button to look at the data There is one observation for each state The variable called pop is the total population for the state The variable called medage is the median age of the population First let s get the population weighted mean 1 From the Statistics menu select Summaries tables amp tests then Summary and descriptive statistics then Summary statistics The Summarize Summary Statistics dialog box opens 2 Select medage in the variables box 3 Click on the weights tab Note that only 3 types of weights are available to choose from There is also a help button on weights 4 Select Analytic weights 5 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Submit Look at the output The sum of the weight is 225 907 472 This is the population of the U S in the 1980 census The weighted mean is 30 11 Now return to the dialog box 6 Click on None under the Weight tab 7 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok The unweighted mean is 29 54 The Stata commands are use census dta clear summarize medage aweight pop summarize medage Survey weights are discussed in the next section An indicator var
68. ation minimum value and maximum value list lists observations keep retains in memory only those variables or cases specified drop discards from memory all variables or cases specified tabulate generates one and two way frequency tables tab1 generates one way table for each variable specified after the command log using filename saves all commands and related output into specified file the default format is SMCL for Stata Markup and Control Language file is given extension smcl log using filename text saves all commands and related output into an ASCII file with extension txt log off on close off temporarily suspends the log file switches it off on switches the log on and close closes the log file log using filename append adds subsequent commands to an existing log 113 Stata 11 Sample Session Annex I I Survey Instrument Command Description log using filename replace saves all commands and related output into the specified file overwriting said file if it already exists By opening a log file with cmdlog instead of log you record only what you type in the command window results are suppressed The same basic syntax applies for both cmdlog and log You can open both an smcl file and a log file help command accesses help feature of Stata sorts observations in ascending order according to the specified variable note 1 allows you to enter notes about the dataset note varna
69. be in the box under File name above the Save as type Stata data DTA drop down box Since dta in the File name area is blocked you can immediately start typing the new file name 2 Type qla age The DTA extension will be added automatically 3 Click on Save to run the command The Stata command is save qla age dta Copy this command from the Results window to the do file editor We do not want to include the specific directory so delete the part of the command that references the specific directory If we want to share the do file with another colleague that person will only have to change the initial cd change directory command at the beginning of the do file to be able to run the do file To be able to rerun the do file successfully an error will be generated when the processor reaches this Save command Stata will not save a file if the file already exists on disk To make sure the command will run you must add further instructions i e replace save qla age dta replace Now each time the data file Q1A AGE DTA is opened the age_gp variable as well as the other two age_gp1 and agecat will be included You might want to analyze this new categorical variable using the tabulate command to determine how many people in each age group are heads of households spouses or children 1 From the menus click on Statistics Summaries Tables amp Tests Tables Two way tables with measures
70. being at a lower level there are multiple cases per household If you are not familiar with the concept of levels of data read Computer Analysis of Survey Data File Organization for Multi Level Data by Chris Wolf before continuing on with this section This paper is available at http www aec msu edu fs2 survey index htm The analysis we did in Section 1 was done at each level separately using just the variables in a single file at a time However other types of analysis require combining data from more than one file Let s look at an example Suppose we want to create a table of calories per adult equivalent produced per day from the principal food crops Furthermore we want to see how this varies by district and calorie production quartile TABLE 1 Food Production in calories per adult equivalent per day Districts Monapo Ribaue Angoche Calorie Production Quartile 2 3 4 The data in their current form cannot produce this table Many transformations are required to restructure the data to be able to provide the results for this table The above table is an example of the complications you will encounter in real world data analysis This entire section will be devoted toward the goal of creating this table To begin let s look at the files we have and at the variables we need to use from each of these 52 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation e
71. ber of adult equivalents per household 66 Create a variable with the adult equivalent for each person The generat cif command morarenta oas e n E E ELER E E AER The replaces E COM Ma e a a a a T AE nA Replace missing values with a mean value oo cece cseseseseeeseeceseseseseseeceueseseseeesteneeneaeseeeeees Calculate the adult equivalents for the household 0 ec cece csesesesceseseseseeeseeneseseseseseeeesenens The collapse command 42 202 c0 shay anton biloba asad a a aa aa aa ara iae Step 3 Merge the two files created in steps 1 amp 2 to compute calories produced per adult equivalent 72 The merge COMMAN errero AEE E TEA RN EEEE R EN E AKRE Calculate the total calories produced per adult equivalent per household for the year Computing q artilesS scsnireacin tenean i ea nent an ee The Xile commMandsUsing theca See chat eet i eras tb ete ietace ttre ee eg eens aetncebeteiaracie etnies adimeaietes ates eae The for z in num 1 3 looping command 00 0 cee ecessssesesceeesseesesesesseseneseseecessseseseecaesseseseecaesceeeseneseess The foreach looping command 0 ceccccccscsesesssseseseseesssesesescecssesesesesesueseseseseseecesesesesesesesseseseseaesceeaeneaeess The llevelsof commandsiscic0 onara capa dee ee oie dvs else eddie ea ME neg Re ees Examples of the foreach looping COMMANG 0 0 ce ceccsssesesessesesesescseseeseseecsescessseseseseseeseseseseecstetesenereseeeeees SECTION 3 Tables and Other Types of An
72. ble tabulate one and two way tables of frequencies tabl produces one way tabulation for each variable tab2 produces two way tabulations of all combinations of the variables Let s compare the tabulate command with the table command to create two way tables Create a do file with all the proper commands at the beginning of the do file Refer to the do files you have already created You can copy several of the commands that you need and comments Remember to start the log file for this session Open the member file we created from Section 1 that contains the age variable q1a age dta 88 Stata 11 Sample Session Section 3 Tables and other Types of Analysis frequency row percentage column percentage head Age group 11 to 19 20 to 6 Tara 86 Zee 47 25 8 06 90 9 26 44 184 25 63 68 15 4 0 0 00 83 0 00 55 38 46 11 20 37 2 270 py ae 41 100 00 100 ROND File Open Select q1a age dta Click on Open Copy the command paste it into the new do file and add comments First do a simple two way table using the tabulate 1 ies From the menus click on Statistics then Summaries tables amp tests then Tables then Two way tables with measures of association The tabulate2 Two way tables dialog box opens In the Row Variable box select ca2 In the Column Variable box select age_gp Under Cell Contents click in the box next to Within column relative frequen
73. ble The recode Recode categorical variable dialog box opens 3 Under the Main tab click in the Variables box and select all the variables that start with h64 e g h64a h64b h64c h64d h64e h64f h64g h64h 4 Inthe box for Required type 2 0 3 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK The Stata command is recode h64a h64b h64c h64d h 4e h64f h64g h64h 2 0 The values in the h64x variables are either 0 no or 1 yes We are ready to count the number of crops that increased in sales 1 Select Create or change variables from the Data menu 2 Select Create new variable extended The egen Extensions to generate dialog box opens 3 Under the Main tab type the name of the new variable in the Generate Variable box ncrops 95 Stata 11 Sample Session Section 3 Tables and other Types of Analysis The tabstat command 2 Multiple response 4 For the egen function box scroll down and highlight Row total 5 Inthe box for Generate variable as type select integer 6 Click on the egen function argument Variables box type h64 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK The Stata command is egen ncrops rowtotal h64 Now you can do a frequencies on the new variable tabulate ncrops Two households had increased sales on all crops
74. bles Stata 11 Sample Session i Data Editor Edit c hh dta File Edit Data Tools 5 id Sa 3 AS tats district 1 1 adistrict vil monapo netia monapo netia sjousdeus 1e monapo netia monapo netia monapo netia monapo netia monapo netia wannuw amp WN monapo netia 2 3 4 5 6 7 8 9 monapo netia H monapo netia H H monapo netia H N monapo netia H w monapo netia H D monapo netia H in monapo netia H o mananan natia a a y nnwWwW te A FW OW WwW DN amp VN Section 0 File structure and Basic Operations for Stata 11 DER lack of 1 land lost security security Vars 41 Obs 343 Filter Off Mode Edit Note that some of the data are words and others are numbers Those variables with words are showing the value labels for the values If you want to see the values rather than the labels place your cursor at the top of one of the columns and lt right click gt Copy Ctrl C Select All Ctrl A Variable Properties Replace Contents of Variable Sort Hide Selected Variables Show Only Selected Variables Keep Only Selected Data Drop Selected Data Value Labels Preferences Font Choose Value Labels Hide all Value Labels If you want to hide variables or see only certain variables there are options to Hide Selected Variables or Show Only Selected Variables You c
75. cies to puta v Click in the box Y next to Within row relative frequencies Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK The Stata command is tabulate ca2 age_gp column row Below is the output 60 61 and ol Total 296 41 343 30 11 95 00 00 13 83 67 22 e941 280 5 310 32 1 61 00 00 59 10 20 20 39 31 0 718 32 0 00 00 00 94 0 00 47 24 5 il 6 33 16 67 00 00 80 2 04 0 39 16 2 143 19 1 40 00 00 55 4 08 9 41 628 49 1020 32 3 22 00 00 00 100 00 00 00 89 Stata 11 Sample Session Section 3 Tables and other Types of Analysis The table command Let s use table to produce a similar table However with the table command we cannot ask for row or column percentages the table command is generally used for summary statistics Frequency and Totals are possible to select from this command 1 From the menus click on Statistics then Summaries tables amp tests then Tables then Table of summary statistics table 2 Under the Main tab select ca2 in the Row variable box 3 Click in the box Y next to Column variable and select age_gp in the box below 4 Inthe Statistics section 1 select Frequencies from the drop down box 5 Under the Options tab tick Add row totals and also tick Y Add column totals 6 Click on the copy button switch to the do file editor paste the command switch back to the dialog box a
76. command is This command creates a file that records only the commands In the Stata cmdlog using session1 append Command window type A file is opened which is named session1 txt and information will be appended to anything that already exists in this file To close the log in the Command window type log close Reminder The log file that is written in SMCL format can only be opened in Stata It is a specific format as mentioned earlier If you want to share your commands and results from the log files with another person who might not have Stata you should save your log files in the TEXT format with the extension of log Any editor or word processor can open this file However in the word processor the font must be set to a fixed font such as Courier New Otherwise the output will be difficult to read Stata commands Extension do A do file contains commands that Stata can execute The do file is created in the Do file Editor The user can type commands or paste commands into the editor Other ways to create a do file are a You can create a log file that contains only the commands using the cmdlog command see above b You can select the Review window click the right mouse button and select Save Review contents The extension do will be automatically added to the file name you enter into the File name box c You can copy the command to the clipboard using the optio
77. ction 0 please do so now to clarify the concept of the Command Window the Review Window the Results Window the Do file Editor and the Viewer Data from questionnaires that has been entered into Stata are stored in what are called data files If we want to work with a set of data we must open the corresponding data file so that it is available to the program The working directory is the directory where your data files are stored You can use the cd command to change to the directory where you have placed the data files you want to use In the Command Window type cd name of working directory Changing to the directory where the files are located eliminates the need to include the directory name in the do file that we will be creating If the directory you are changing to has spaces the directory must be enclosed in quotes Example of a directory name with spaces C Documents and Settings My Documents data You can also set the working directory through the menus From the menus select File Change Working Directory A dialog box opens in Browse mode Find the directory that you want to work in and then click on Ok In the Results window the cd command has been executed We can copy that command to place it in our do file so that if we share our do file with another person the command can be modified to fit the directory structure that person is using Example cd C Documents and Settings My Documents StataTr
78. d created a new variable cprod_tt which we calculated by summing cprod_tt total calories produced across all cases all the different food crops for each household The only variables which are contained in a collapsed file are the grouping variables and any new collapsed computed variables created e g cprod_tt Remember to close the browser before you continue Stata automatically added a variable label which is the function and variable used to create the resulting new variable You can look at the variable definitions using the describe command The computed variable cprod_tt does not have a very descriptive label any more so we need to change the label to reflect what the variable is 1 Click on Data then Data utilities then Label utilities then Label variable 2 Inthe Variable box cprod_tt should be selected 3 In the New variable label may be up to 80 characters box type Calories Produced in Staple Foods 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on the Ok button 65 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation 5 Run the describe command again The Stata commands are describe label variable cprod_tt Calories produced in staple foods describe The new working data file now contains what we need total number of calories from staple foods produced per household We c
79. d to see if a 20 liter can filled with maize grain has a conversion value of 18 kilograms prod 47 unit 8 The Stata command is list prod unit conver if prod 47 amp unit Note Two equal signs are required The two equal signs distinguish relational equality from the exp assignment phrase For example if you want to create a variable where you will be assigning values to that variable you will use an expression exp and need only 1 equal sign example gen newvar oldvar 2 5 In the above example prod already has values and we want to see only records where prod has the value of 47 Therefore it is a relational equality and we must use 2 equal signs e g show me only records where prod 47 and unit 8 We should also run a tabulate on the _merge variable as well to look at how the merge was done tabi _merge 59 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Compute total kilograms produced The generate command From the output you should see there are the same number of records in the file as there was before the merge i e 1 693 Note that there are 27 cases where there was not a match for the prod unit combination in the look up file How would you specify the list command to look at these 27 cases You would want to investigate further to see if the records without a look up value are crops that you want to have included
80. data file produced by the merge now contains the needed calorie variable calories but check to make sure Maize grain PROD 47 should have 3590 calories per kilogram in the calories variable We can browse the data and or we can use the list command again The Stata command is list prod calories if prod 47 Also check the _merge variable to see how the merge was 61 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Calculate the total calories produced Assign variable labels done tab1 _merge Note that there are 87 cases with no value in the calorie variable How would you check to see which products have no calorie value We can now compute total calories produced 1 Select Create or change variables from the Data menu 2 Select Create new variable The Generate Create a new variable dialog box opens We have used this dialog box earlier To clear the contents click on the Reset icon in the lower left corner of the dialog box 3 Under the Main tab change the Variable type to double 4 Type the name of the new variable in the Variable name box cprod_tt 5 For the Contents of new variable box type in qprod_tt calories 6 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 7 Add a comment in the do file to explain what you have done The Stata command is
81. e Next let s look at the values and labels for the variable district label list district To make 3 indicator variables we can type tabulate district generate district Now run the describe command again describe Three new variables have been created called district district2 and district3 We can examine the variables using the tab1 command tab1 district The variables district1 district2 and district3 can now be used for regression analysis as dummy variables They contain either a 0 ora 1 101 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation Stata 11 SAMPLE SESSION SECTION 4 Table and Graphs how to bring them into a word processor and survey estimation accounting for design effects How to move Stata results The objective of this section is to give you the tools necessary to prepare reports i e to learn how to move Stata results into into other appl ications other applications The method is simple once a graph or a table has been produced it can be printed or incorporated into reports prepared using word processors or publishing programs Incorporating tables from Stata can be done using the copy and paste procedure You should save the log file as well in case you need other tables that were created Find the table in the session3 scml file that showed the count of the Tables relation of head to age group cross tabulation 1 Click on File th
82. e 1502 242 2554 488 4062 3014 7607 719 3900 7966 angoche 1297 9691 2465 509 3698 807 8495 49 3950 2608 Total 1352 5022 2519 7353 3919 3795 8399 3828 4014 5181 91 Stata 11 Sample Session Section 3 Tables and other Types of Analysis Notice that the number of decimals is not uniform We can fix that with the tabstat command 1 From the menus click on Statistics then Summaries tables amp tests then Tables then Table of summary statistics The table Tables of summary statistics dialog box opens 2 Press the Reset button R to clear the boxes Under the Main tab select district in the Row variable box 3 Click in the box Y next to Column variable and select quart in the box below 4 Inthe Statistics section 1 select Mean from the drop down box 5 In the box to the right specify the variable to use for the Mean statistic cprod_ae We would also like to see the minimum and maximum values 7 Click on the drop down box next to 2 and scroll down to Maximum and select that statistic For the variable select cprod_ae 8 Click on the drop down box next to 3 and scroll down to Minimum and select that statistic For the variable select cprod_ae 9 Under the Options tab check Add row totals and also check Add column totals 10 To format the numbers check Y next Override display format for numbers in cells Click on the Create button to the right of this box 11 Inthe Create a display format
83. e do file editor paste the command switch back to the dialog box and click on Ok 5 Add a comment to explain what you have done The Stata command is keep if prod 5 prod 6 prod 30 prod 31 prod 41 prod 44 prod 47 Only cases with these product codes will now be used for analysis Note that 464 observations were dropped You can use the tabulate command to verify that you now have only 7 crops in the file In the Command window you can easily type There should be 1 239 cases remaining for the 7 crops that are considered staple foods Now we need to know how many calories were produced per household for all these 7 staple food products combined To do this we need to sum for each household the values of cprod_tt for all of the food crops the household produced In other words we need to create a new household level file from the current household product level file where there is only one case per household Stata uses the command collapse to aggregate the number of cases at one level to a new level We will sum all the cases for each household to create just one case for household Create a new file which is a To create the new household level file we use the household level file rather than a command collapse Stata always uses the working data household product level file file as the file to be collapsed The collapse command 1 From the Data menu select Create or change variables then selec
84. e options available to manipulate the data within the data editor from the Data menu These options will be discussed later in the tutorial Edit Tools i 23 Describe data gt Create or change data gt Variables Manager 1 Data utilities gt 2 Sort gt 3 Combine datasets b i Matrices Mata language 6 Matrices ado language gt Z Other utilities gt If you want to change a value directly within the data editor a dialog box opens to warn you that you are changing a value and asks if you want to continue 14 Stata 11 Sample Session Exercise Exiting the Data Editor Saving the Stata Data File The save replace command Section 0 File structure and Basic Operations for Stata 11 Exercise Change the value to 1 in the hh column where hh 2 Clicking on Yes will change the value No will cancel out If you switch back to the Stata window you will see a command that has been run that replaces the value Now switch back to the data editor By default the value labels are displayed To look at the values lt right click gt on the table and choose Hide all Value Labels Another option available with the lt right click gt is to Manage Value Labels where a value label can be defined dropped or added We will discuss later value labels when we create new variables A description of all the variables is also available within the Data editor Click on Data Variable Manager Or click
85. e using cd C Documents and Settings aec_user My Documents StataTraining data turn off more so the whole file will run set more off kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk Step 1 KKK KKK KKK KKK ERE RK KERR K RRR KR RREREKE REE 82 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation open production data file use c q4 dta clear sort variables to match by to merge in the conversion value to convert to kgs sort prod pla tab1 prod pla rename the pta variable to unit to match the conver data file rename pia unit joinby prod unit using conver dta unmatched master _merge _merge check to be sure merge done correctly tab1 _merge check to see if got what was expected using list command list prod unit conver if prod 47 amp unit 8 calculate kgs produced generate double qprod_tt p1b conver merge in the lookup conversion value for calories and calculate total calories drop _merge joinby prod using calories dta unmatched master __merge _merge check to be sure merge done correctly tab1 _merge compute total calories produced generate double cprod_tt qprod_tt calories add variable labels label variable qprod_tt Total production in kgs label variable cprod_tt Total calories produced select only staple crops keep if prod 5 prod 6 prod 30 prod 31 prod 41 prod 4
86. e we need to look at the variables sex ca4 and age Ca3 to calculate adult equivalents The rules we will use for calculating adult equivalents for 66 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Create a variable with the adult equivalent for each person The generate if command The replace if command this survey are Males 10 years and older 1 0 Females 10 to 19 years old 0 84 Females 20 years and older 0 72 Children under 10 years old 0 60 We will use the Generate If command to compute the adult equivalents for each member We will name the adult equivalent variable that we create as ae l 2 Soy Su Select Create or change variables from the Data menu Select Create new variable The generate Generate a new variable dialog box opens Under the Main tab type the name of the new variable in the Generate Variable box ae For the Contents box type the value of 1 Click on the if in tab Type the statement ca4 1 amp ca3 gt 10 Click on OK Now that the new variable has been created another command is used to assign the codes for the other adult equivalent groups that have not yet received a value We use the replace command 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Select Create or change variables from Data Select Change contents of variable The replace Replace conten
87. eber at webermi msu edu Four portions of the questionnaire are referenced each of which has a corresponding Stata data file Two other Stata data files are required for conversion of units of measure Questionnaire Section Stata Data File Main Household Section c hh dta Table IA Household Member Characteristics c qla dta Table IV Characteristics of Production c q4 dta Table V Sales of Farm Products c q5 dta Conversion factors for computing kilograms conver dta Conversion factors for computing calories calories dta This training consists of four sections each of which should take approximately two hours We recommend that you complete each section in a single sitting These tutorial materials make the following assumptions e You know how to use Windows with a mouse e The six data files listed above should be stored in a directory of your choosing on your hard disk Important Always remember to SAVE the changes to the data after each exercise and section using a new file name Also you may want to save Review window contents to a do file if you have not been copying commands to a do file already You may also want to save your log file created during each session 18 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Data files and the working file Working Directory The cd command Open your Stata software If you have not read or completed Se
88. ee ENE Mle AEE AAE Someta tel The doedit command Discussion of the Windows used in STATA The Do tile Editori nane nenea a a aa a a a h eee The Data Editor Window 0 0 The edit command eee Saving the Stata Data File The save replace comman4d The Brower Window 0 0 0 000 The browse commandes 24 20 uaa ins NERS eek Bree ito PR Ea eg EEO I AI Aa Ss The Stata Results Window rinra Me oh Sa aati Ra a a a a a R S The Command Window The Viewer 9scce8 sears to heb eiaicetee oes Sesh ea Rede Dyin aie Re Maa net eda Weide fanalvlede was desea bs cheer ANA Stata Graph WiINdOW o eini ete ee Rete odes aber ate ate ate eee aie anaes Summary of the Basic File Types oo cee esessessseseeseeteseeesnceesesseseescecsucessuceeseeseacsnsessuceeseeeeeeaneaeanees SECTION 1 Basic functions Stata files Descriptives and Data Transformations MAE OGU CTI ON eied a aaan na ae a bane ts eect gt ale ee sae Data files and the working file Working Directory oo eecccceseeseeteseseeeenens Hasta Woh ofa aa T al eee peer ee Pet et eee et O E eee er Openinga data Tei eco M25 i caste heeeesh sees ehisbaateeoes assert teas a aia diebe bibs Maen trades debe hd The use command Describing the contents of a data file The describe COMMA ccs civcsscvedeaz bh Mess Mesh db adie a aa a Mies Mae iaai iaa Data storage types eens Display format cc eeeeeeeteeeeteeees Labels ioe te i e ea Doc
89. eeds to be the product variable The data file has already been sorted by product 60 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The drop command see the previous merge so we don t need to sort it again Stata will reuse the _merge variable again with the next join we do so we should drop this variable first since we no longer need it The command to delete a variable is called drop The Stata command is Now we are ready for the next join 1 From the Data menu select Combine datasets then select Form all pairwise combinations within groups The joinby form all pairwise combinations within groups dialog box opens 2 To fill in the box labeled Filename of dataset on disk click on the Browse button Select the filename calories dta and click on Open 3 In the box labeled Join observations by groups formed from specific variables select prod only 4 Click on the Options tab 5 Under Unmatched Observations select Include from data in memory This option will keep cases in the original data set that do not have a match in the lookup data set 6 Click on the copy button switch to the do file editor paste the command delete the directory reference switch back to the dialog box and click on Ok 7 Add comments to the do file The Stata command is joinby prod using calories dta unmatched master _merge _merge The new working
90. en Log then View In the Choose File to View dialog box click on the Browse button 2 Select session3 smcl Click on Open Then click on Ok 4 Locate a table that you want to copy to your word processor Use your mouse to block the table 5 Press Ctrl C copy This key sequence copies what you have blocked 6 Now open your word processor software if it is not already open 7 Place your cursor where you want the table to appear Press Ctrl V paste to paste the table 8 In your word processor block the text that you just pasted Now change the font to a fixed font e g Courier New or Letter Gothic Click on Format Font and select the font The size of the font may need to be adjusted depending on the margins of your paper The default will be 12 and you may want to select 10 or 9 or 8 iv Below is an example of a table copied into a word processor before the font is changed to a fixed pitch relation to age gp head Otol0 11to1l9 20to 60 61 and older Total beh thre et ia BSA a ah tS Ut Bat sal At ha A aS tel a a head 6 296 4 343 wife husband 25 280 5 310 son daugher 503 184 31 718 mother father 5 1 6 other relative 70 55 16 2 143 Total 573 270 628 49 1 520 102 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation Below is the same table after the font is changed to a fixed pitch and the font size is adjusted so that the table will fit on the page
91. entation generated by the Stata processor They do not contain data Log files are made accessible to Stata with a File View command The extension is smcl Data files contain data including original survey variables plus any new variables created through various Stata commands such as the generate command Data files are made accessible to Stata using a File Open command from the menus or typing the command in the Command window Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Stata 11 SAMPLE SESSION SECTION 1 Basic functions Stata files Descriptives and Data Transformations Introduction This is a self paced training aid designed to introduce the commands needed for some typical statistical survey analyses using Stata 11 This tutorial is intended to be a stand alone training tool To use it most effectively you should ask a knowledgeable STATA user to help you get started and to answer questions as you work independently through the session It can also be used as a guide for classroom training A copy of the questionnaire on which the data is based can be found in the Mozambique project 1992 NDAE Working Paper 3 A Socio economic survey of the smallholder survey in the province of Nampula Research Methods copies of the three tables which were made available and can be found at the end of the manual in the annex section for further information please contact Dr Michael W
92. er 4 Select Create or change variables from the Data menu 5 Select Create new variable The generate Generate a new variable dialog box opens 6 Under the Main tab type the name of the new variable in the Generate Variable box age18p 7 For the Contents box type in ca3 gt 18 8 Click on the Generate variable as type drop down box and change to byte 9 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 10 Runa tabulate to look at the results Note if there had been a missing value for an observation that observation would have been assigned a value of 1 It would have been better to put a qualifier on the command to assign the values to cases where ca3 was not missing e g ca3 lt 100 Stata 11 Sample Session Section 3 Tables and other Types of Analysis Converting categorical variables to indicator variables generate byte age18p ca3 gt 18 if ca3 lt tab1 age18p Then any missing values in Ca3 would also be missing in the new variable age18p Suppose that you want to do regression analysis and control for effects of the different geographic regions We have a variable called district which has 3 categories We want to create indicator variables for the three districts These types of variables are also called dummy variables First let s run the describe command to look at the contents of the file describ
93. from a blank file and build all the commands necessary to produce the calories retained or you can copy the commands used to generate the table from section 2 and adjust the commands as necessary to calculate the calories retained Changes must be made for file names and variables Computing the calories sold involves the same basic steps as computing the calories produced Step 1 Average calories sold should be 1 407 493 Merge this newly created file the file containing calories sold with the file containing calories produced hh file3 dta Check the merge variable tab merge and explain why you see a value of 2 and a value of 3 Keep in mind that only 257 households sold products but all 343 households produced and retained calories If the calories sold variable is missing it means the household did not sell food so it should be recoded to zero Compute calories retained calories produced calories sold The average calories retained per adult equivalent for the whole population should be 3044 261 Rank into quartiles Use the Tabulate command to show calories retained by district and quartile Save the data file to the name hh file4 dta Save the contents of the do file editor to a new name reflecting the name of the exercise Below is an example of the output you should produce 86 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation
94. generate double cprod_tt qprod_tt calories Note that missing values were generated for 131 cases The two new variables do not yet have variable labels To assign a variable label 1 Click on Data then Data utilities then Label utilities then Label variable 2 Inthe Variable box select the name of the first variable qprod_tt 3 In the New variable label may be up to 80 characters box type Total production in kgs 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on the Submit button Clicking on the submit button leaves the dialog box open so we can then define the label for the cprod_tt variable without having 62 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Select only staple food products The keep if command to select it again from the menus 5 In the Variable box select the name of the second variable cprod_tt 6 In the New variable label map be up to 80 characters box type Total calories produced 7 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on the Ok button 8 Add a comment to the do file to explain what you have done The Stata commands are label variable qprod_tt Total production in kgs label variable cprod_tt Total calories produced This gives us a new working data file with t
95. ges to assign values to the other categories In the If expression box change the criteria to ca3 gt 19 amp ca3 lt 60 Click on the Main tab and type 3 in the New Contents box Click on the copy button switch and paste in the do file editor switch back and click on Submit The dialog box remains open and the command is run 629 real changes made Type 4 in the New Contents box Click on the If In tab In the If expression box change the criteria to 42 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The label variable command ca3 gt 60 20 Click on the copy button switch and paste in the do file editor switch back and click on Ok The Stata commands created and run are generate byte age_gp 1 if ca3 gt 0 amp ca3 lt 10 949 missing values generated replace age_gp 2 if ca8 gt 10 amp ca3 lt 19 271 real changes made replace age_gp 3 if ca8 gt 19 amp ca3 lt 60 629 real changes made replace age_gp 4 if ca3 gt 60 49 real changes made Note that the Results window also shows how many observations were modified after each command was run The next step is to verify that the changes were made correctly Run the Tabulate command on the new variable 1 From the menus click on Statistics Summaries Tables amp Tests Tables One way tables The Tabulate1 One way Tables dialog box opens 2 For the Categorical Vari
96. haracters box type Calorie production quartile 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok The Stata command is label variable quart Calorie production quartile Examples of the use of the foreach command are Computing new variables foreach var of varlist inc1 inc12 generate tax var var 10 Collapsing across variables 79 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation foreach qtr of numlist 1 4 local m3 qtr 3 local m2 qtr 3 1 local m1 qtr 3 2 generate incgtr qtr inc m1 inc m2 inc m3 This command computes the quarterly income variables incqtr1 incqtr4 using the foreach command Display the final output table We can now display a table showing the average caloric production in quartiles for each of the districts 1 From the menus click on Statistics then Summaries tables amp tests then Summary and Descriptive Statistics then Summary statistics The summarize Summary statistics dialog box opens 2 Inthe Variable s box select cprod_ae Click on the by if in tab 4 Click in the box Repeat command for groups defined by 0S 5 Inthe box below this option select district quart 6 Click on the copy button switch to the do file editor paste the command switch back to the dialog box
97. hat each household has only one line of information and the three animal types appear as three different variables Such a file would be the wide form of the data The file as it is organized now is the long form of the data The following reshape command converts the file from long to wide form such that each animal code is now a variable and the file becomes a household level file reshape wide num i hh j animcode list nol nod noo hh num330 num331 num335 206 70 F 217 65 8 221 1200 200 When followed by this next command the file is re converted from wide to long But note that the file has become rectangularized that is the three animal codes now appear for each household reshape long num i hh j animcode list nol nod noo hh animcode num 206 330 206 331 70 206 335 217 330 217 331 65 217 335 8 221 330 1200 221 331 200 221 335 The command fillin would have also generated the same rectangularized file as in the preceding example Do file suggested commands to place at the beginning of a do file to set the parameters before starting to work 1 Commands in a do file may be delimited by a carriage return or a semi colon To set the semi colon as the delimiter the command is delimit This command will only work in a do file The delimiter cannot be changed from the console If you wish to revert back to the carriage return as the delimiter the command is delimit cr 2 The next c
98. he copy button switch to the do file editor paste the command delete the directory reference switch back to the dialog box and click on OK 6 In the do file insert comments to remind you what you ve done The Stata command is Merge 1 1 district vil hh using hh file1 dta In the results window you see a summary of the number of observations not matched and matched Now that you have run the merge run a tabulate on the _merge variable You can abbreviate the name to _m e g tabulate _m You should see only the value of 3 for 343 observations That means that there was an observation for each district vil hh combination in each of the two files Merge Files created a new working data file The two variables you need to compute calories produced per adult equivalent are now in the working file Total calories produced cprod_tt per household for the year divided by total adult equivalents per household ae divided by 365 days per year gives us calories produced per adult equivalent per day cprod_ae 1 Select Data then Create or change variables then from Create new variable The generate Create a new variable dialog box opens 2 Ifyou see information in the dialog box click on the Reset icon to clear the contents 3 Under the Main tab type the name of the new variable in the Generate Variable box cprod_ae 4 Change the Variable type to double 5 For the Contents of new variab
99. he do file Several days or weeks from now you may not remember Comments in a do file start with slash and then an asterisk and end with an asterisk and a slash this is a comment which runs to two lines Stata will not run multiple lines as a command if it begins with these symbols To create a two way table do the following 1 From the menus click on Statistics Summaries Tables amp Tests Tables Two way tables with measures of association The Tabulate2 two way tables dialog box opens 2 In the Row Variable box choose ca2 from the drop down choices 3 In the Column Variable box choose ca4 from the drop down choices We would like to see row percentages and column percentages 4 Under Cell Contents click in the box next to Within column relative frequencies to put a v 5 Click in the box V next to Within row relative frequencies 6 Click on the copy button switch to the Do File Editor and paste the command Write a comment and then switch back to the dialog box to click on the Submit button The command will be 37 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Key frequency row percentage column percentage relation to head executed The Stata command is tabulate ca2 ca4 column row The Key box in the Review window specifies which statistics appears on each row in the cells head wife husband
100. he first button the right Create value label Another dialog box opens 3 In the Label name box where there is a prompt lt Enter new label name here gt type age_gp 4 In the Value box type 1 In the Label box type 0 to 10 Click on the Add button below the Label box The dialog box remains open 5 Continue defining the labels for the values Type 2 in the Value box and in the Label box type 11 to 19 and click on the Add button Type 3 in the Value box and in the Label box type 20 to 60 and click on the Add button Type 4 in the Value box and in the Label box type 61 and older and click on the Add button 6 All the values have been assigned a label To close the dialog box click on the Ok button to close the Create label dialog box 7 Click on the Close button to close the Manage value labels dialog box As you can see in the Results window the Stata command is label define age_gp 1 O to 10 2 11 to 19 3 20 to 60 4 61 and older Copy this command to your do file The command creates a label name age_gp and defines the labels for the four values 44 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations The label values commands Second method 8 1 BUP 10 11 Now that the label has been defined we can assign this label to the variable we created with the 4 categories Click on Data then Data utilities then Label utilities then
101. ht Plot 2 and click on Edit 7 Change the type of plot to quadratic prediction plot w Cl Click on the Accept button 8 Click on the Submit button to view the graphic What are these graphs telling you 9 Ifwe want to see the distribution by district click on the By tab In the Variables box select district 10 Click on the OK button to view the graphic What are these graphs telling you The Stata commands are twoway scatter ae cprod_tt twoway scatter ae cprod_tt by district twoway scatter ae cprod_tt Ifit cprod_tt twoway scatter ae cprod_tt Ifit ae cprod_tt by district twoway scatter ae cprod_tt qfitci ae cprod_tt twoway scatter ae cprod_tt qfitci ae cprod_tt by district Stata provides statistical commands that have been developed specifically for survey analyses The Stata User s Guide discusses these commands as well as the manual called Survey Data Most of these commands begin with the letters svy There are a few of the survey commands that do not begin with these letters Survey data generally have three importance characteristics 1 The weights applied to survey data are sampling weights also called probability weights 2 The sample is clustered 3 Stratification is used in selecting the sample If data meets any one of the above characteristics the survey commands can be used for analysis Briefly sampling weights are used in analysis to give estimators that are appr
102. i level data by Chris Wolf MSU Department of Agricultural Economics This document can be downloaded as a separate document in English or French 2 Data Preparation and Analysis by Margaret Beaver and Rick Bernsten June 2009 CDIE reference number pending Acknowledgments Funding for this research was provided by the Food Security III Cooperative Agreement between the Department of Agriculture Economics at Michigan State University and the United States Agency for International Development Global Bureau Office of Agriculture and Food Security Stata 11 Sample Session Section 0 File structure and Basic Operations for Stata 11 SECTION 0 File structure and Basic Operations for Stata 11 How Stata uses MEMOry 2 2 80dccs used nsn tue ta ibe oan teas di eaia eas maniac a a The set memory commanda anran e deh ea ci arvvetaadevisnedens darn A A Increasing the amount of memory in the middle of a Stata session The drop AML COMMA sassen ena Ea T N A E aT AAE dalek eta etuae Deeddtaa nde Types of files used by Stata and their extension names oes cesses eeseseeteseeeeseeeenenteneneesenss Dat ifil s ninni na aen a a a a aa a aeaa LOG FINES EN E E E A tubes ladda eh dune A E E ote eeaittaa ado bate The log using command Sonates teno aon Eee ions cadet EE beatae Rene a EAE GREER aM dee aes Lhe cmadlog Using commandi asenada haaa a aaa ae a a RS The log clos command sessar o e aE E R A A E AA ATE E EEE E er E DO filesinin ainoa a EEEE
103. iable is a special case of a categorical variable An indicator variable has two groups only whereas other categorical variables can have more than two groups Usually 99 Stata 11 Sample Session Section 3 Tables and other Types of Analysis Converting continuous variables to indicator variables the values in indicator variables are 0 and 1 or no yes Examples of indicator variables are Is a person a citizen of the U S no yes Does a farmer use fertilizer no yes Stata can convert continuous variables to categorical and indicator variables and it can also convert categorical variables to indicator variables Suppose we want to create a new variable that indicates whether a person is 18 years old or older You could have generated a new variable and assigned it a value of 1 if ca3 gt 18 Then you would need a second step to recode the system missing to 0 There is another way to create this variable We will use the file c_q1a dta Open the file and then create a new variable using the generate command following the steps below 1 Click on File then Open 2 Select c_qla dta and click on Open Copy the command to the do file editor Delete the reference to the directory 3 Check to see if there are any missing values in the age variable ca3 Use the list command list if ca3 gt We are checking to see if there are missing values because Stata considers missing values to be greater than any numb
104. in the analysis you are doing and if they are correct the lookup file and or the production file and run your procedure again We can now calculate total kilograms by multiplying the number of units p1b by the conversion factor 1 Select Create or change variables from the Data menu 2 Select Create new variable The Generate Create a new variable dialog box opens 3 Under the Main tab change the Variable type to double 4 Type the name of the new variable in the Variable name box qprod_tt 5 For the Contents of new variable box type in p1b conver 6 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 7 Add a comment in the do file to explain what you have done The Stata command is generate double qprod_tt p1b conver Note that there were 49 cases where no value was generated We had 27 cases with no conversion value Why do the rest of the cases 22 cases not have values Now that the kilograms have been calculated we need to look up the value of a kilogram in calories for each product This information is in the table lookup file called calories dta This file has two variables product and number of calories per kilogram The key variable is product prod In order to add the calorie conversion variable to the working data file we need to do another merge with keyed table lookup joinby This time the key variable only n
105. ind that the average age of heads of households is 41 5277 years while the average age of their spouses is 33 1871 years Four observations have no value for ca2 summarize ca3 by ca2 sort ca2 head Variable Mean Std Dev ca3 41 5277 14 12719 ca2 wife husb Variable Mean Std Dev ca3 33 1871 11 80466 ca2 son daugh Variable Mean Std Dev ca3 8 133844 5 797507 ca2 mother fa Variable Mean Std Dev ca3 48 16667 22 09449 ca2 other rel Variable Mean Std Dev ca3 12 55245 10 06785 ca2 Variable Std Dev ca3 40 12 24745 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Data Transformations Converting continuous variables to categorical variables The generate command The replace command The label variable command The label define command First method The generate command After examining the results of the descriptive statistics you will often want to do data transformations A data transformation is an operation that takes an existing variable and either changes the values in a systematic way or uses the values to calculate a new variable The following example shows a common data transformation the conversion of a continuous variable to a categorical variable The information we received from the summarize command
106. iously so that you can run them again without the need to rebuild the commands It is important to understand that the commands you put in the Do file Editor will not be executed no output will be produced until you send the commands to the processor The Do file Editor is simply an area that helps you prepare the commands To send the commands to the processor you use the Execute Do icon s in the Do file Editor window toolbar This command runs the commands in the current do file and shows the output in the Results window Another icon to the left of this icon called Execute quietly Run also executes the commands in the current do file but does not show any output in the Results window Choosing either one sends all the command s to the processor which reads the commands written in the Do file Editor and executes them To send only specific commands block the commands you want to send and select the Do icon When you have successfully completed each step in your analysis or when you are ready to end a STATA 11 session even if it was not completely successful you should save the commands to a file for future use To save the commands make the Do file Editor active and select Save from the File menu or click on the diskette symbol on the Tool Bar A file created from the Do file Editor is called the command file It is a file containing only commands it never contains any of the data you may be analyzing with
107. is interesting but it might also be useful to see the actual distribution of the ages into groups or categories so we can tell for example how many heads of household are older than 60 Since the age variable ca3 is continuous we cannot do this directly first we have to transform it Let s suppose we re interested in four categories 0 10 years old 11 19 years 20 60 years and over 60 years of age To categorize a variable we can use the generate command Categorizing a continuous variable makes detailed information more general To keep the detailed information as well as the new general information you must recode the variable into a new variable If you recode into the same variable the original values will be lost There are several methods that can be used to recode a continuous variable First method If you wish to see the category values of 1 2 3 and 4 where 1 0 10 2 11 19 3 20 60 and 4 over 60 you can do the following 1 From the Data menu select Create or change variables Create new variable The Generate Create a new variable dialog box opens 2 Under the Main tab type the name of the new variable in the Variable name box age_gp 3 For the Contents of new variable box type in 1 This is the value that you want the new variable to have 4 Inthe drop down box for the Variable type select byte 5 Click on the If In tab 41 Stata 11 Sample Session Section I Basic functions Fi
108. itch back to the dialog box and click on Submit You can now view the graphic 8 Close the graph and return to the dialog box We want to see the distribution by district Click on the By tab V check the box next to Draw subgraphs for unique values of variables 9 Inthe Variables box select district 10 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok button to view the graphic What are these graphs telling you Close the graph Note you could have also added a title and other options as well Graphs can also be overlaid 1 Select Graphics Two way graphs scatterplot lines etc You will see Plot 1 already defined 2 Click on the Create button to define a second plot Click on the radio button next to Fit plots Highlight Linear prediction in the box labeled Fit plots select one 3 For the Y axis select the variable ae For the X axis select the variable cprod_tt 4 Click on the Accept button You now see that Plot 2 has been defined and is highlighted 5 Click on the copy button switch to the do file editor 107 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation Survey Estimation Accounting for Design Effects paste the command switch back to the dialog box and click on Submit to view the graphic What are these graphs telling you 6 Close the graph Return to the dialog box highlig
109. itor and type levelsof district local levels 3 We can now create variables containing the rank of the household within each district We must type these commands into the do file because the command is multiple lines You are already in the do file editor Type foreach z of local levels xtile quartile z cprod_ae if district z nq 4 z is a local macro name which is set to each value in the variable levels The values we know are 1 2 and 3 In the first loop of this programming command z is equal to 1 in the second loop z is equal to 2 etc quartile z refers to a variable name where the contents of z is appended to the name quart e g quart quart2 quart3 etc district z means that for the first loop district is equal to 1 for the second loop district is equal to 2 etc Very important note The macro name z must be surrounded by a left single quote found in the upper left hand corner or the keyboard to the left of the key with the number 1 and a right single quote found on the key to the left of the lt Enter gt key If you do not use the left single quote you will see an error message that says in red invalid name Be sure that you end the first line with a left curly brace e g and that you place on a line by itself after all commands that you want to be included in the loop a right curly brace e g Since we have 3 districts 3 new v
110. l Use the lt PageUp gt key until the cd command is in the Command window Copy and paste it just above the use command We also want to add comments to define what the purpose of the do file Above the commands that you just pasted insert some lines You can type session basic functions descriptives your name here the current date here example beaver 5 Jan 2009 member level file Other commands that are important and should be included are command to close any log file that may be open clear the memory work space and drop all macro variables An example of the commands that should be added to the do file are change to directory where files are stored cd C Documents and Settings aec_user My Documents StataTraining data 20 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Describing the contents of a data file The describe command define the log file to capture output name of log file is log_sessionl capture log close log using log_sessionl replace Purpose of do file Author and date Tasks to be done in this do file program setup version 11 clear all macro drop _all 6 Save the do file From the File menu in Do File Editor select Save As 7 Enter the filename session1 The do extension will be added to the name automatically 8 Click
111. le box type in cprod_tt ae 365 6 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK 7 Add a comment in the do file to explain what you have done 74 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The new variable does not yet have a variable label To assign a variable label 1 Click on Data then Data utilities then Label utilities then Label variable 2 Inthe Variables box select cprod_ae 3 Inthe Attach label to variable up to 80 characters box type Calories produced per adult equivalent per day 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on OK 5 Adda comment to explain what you have done The Stata commands are generate double cprod_ae cprod_tt ae 365 label variable cprod_ae Calories produced per adult equivalent per day Computing quartiles Before we can produce the table we want we have to l create one more variable denoting which calorie The xtile command using if production quartile each household falls into within each district The Stata command to use is called xtile This command is not available through the menus To look at the structure of the command we can use the Help menu 1 Click on Help Stata command 2 Inthe Command box type xtile and click on OK Under the Descriptio
112. les Descriptives Data Transformations The replace command 6 8 10 11 12 13 14 15 16 17 18 19 In the If expression box type in ca3 gt 0 amp ca3 lt 10 Note you must use the ampersand symbol amp not the word and Click on the copy button switch to the do file editor paste and switch back to the dialog box and click on Ok The Stata command is generate byte age_gp 1 if ca3 gt 0 amp ca3 lt 10 Stata will indicate in the Results window how many missing cases were generated 949 missing values generated That means that from the total of 1524 cases 949 were not assigned a value Now that the new variable has been created another command is used to assign the codes for the cases with no values for this new variable The command is the Replace command From the Data menu select Create or change variables Change contents of variable The Replace Replace contents of existing variable dialog box opens In the Variable box select the name of the variable that was just created age_gp Type 2 in the New Contents box Click on the If In tab In the If expression box type in ca3 gt 10 amp ca3 lt 19 Click on the copy button switch and paste in the do file editor switch back and click on Submit The dialog box remains open and the command is run The Results window indicates how many changes were made 271 real changes made Now make the chan
113. mands you ve blocked but in the Results window you do not see the commands or any output from analysis The second icon Do Selected Lines runs the commands yov ve blocked and in the Results window you can see which commands were run as well as the output from any analysis Your SESSION2 DO should look similar to lines below your documentation comments may not match exactly what has been included in this listing Comments start with at the beginning of each comment and ending each comment with a You can also just use an if the command is one line open log file capture log close log using log_session2 append STATA do file section 2 Cross sectional Stata Tutorial Purpose Calculate food production in calories per adult equivalent per day M Beaver January 2009 Tasks 1 Compute total kgs produced compute value of production in calories for specific food crops and aggregate to the household level to obtain total food calories produced 2 Compute adult equivalents and aggregate to the household level 3 Merge the two files and calculate food production in calories per adult equivalent per day 4 Produce a table showing average food production in calories per adult equivalent in quartiles for each district Stata recommends you include the version that the do file was written in version 11 clear all macro drop _all modify next command to match the directory you ar
114. mbination both our product variable and our unit variable are the key variables The CONVER DTA file is already sorted by prod and unit We must sort the current working file that is in memory the same way while taking account of the fact that the unit variable is named p1a and not unit To sort the cases 1 From the Data menu select Sort Ascending data The Sort Sort data dialog box will open 2 Inthe Variables box select prod and pla 3 Click on the copy icon and then click on OK 4 Switch to the do file editor and paste the command The Stata command is sort prod pla Let s look at the two variables using the tabl command We can type in the Command window tab1 prod pla There are 1 693 cases We have many products For the tabulation of pla we see 2 values that have no labels 0 and 1 and note that there are only 1670 cases that contain a value for pla There are possible data problems We would expect to see a value in pla for every crop that was harvested How would you determine if there are missing data in the pla variable If it were possible corrections should be made before proceeding further We cannot merge the two files unless the variables that 57 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The joinby command l 1 we want to merge by have the same names We will rename pla to unit From the Data menu select Data utilities
115. me cee 2 allows you to enter notes about variable varname notes 3 calls up all notes in memory Notes are saved in the dataset label define lIblname 1 assigns labels to integers and stores these in the value label1 label2 label bIname 2 label values varnamel 1biname 2 associates the value label Iblname to the variable varname1 e g label define gender 1 female 2 male label values sexhead gender label list lists all value labels modifies the value of a variable using rules specified set memory changes the amount of memory allocated to the data area Stata suggests setting the memory to at least one and half times the size of the file you want to load in the memory of the computer when used with if it counts the number of observations that meet the specified condition otherwise it counts the number of observations in the dataset collapse converts the data file in memory into another data set of means medians etc merge varlist using filename merge joins corresponding observations from the dataset currently in memory called the master dataset with those from the Stata format dataset stored as filename called the using dataset into single observations performs a match merge on varlist when these are specified the variable merge which gives information on the results of the merge command is added to the file _merge 1 obs from master data _merge 2 obs from using data
116. mmand switch back to the dialog box and click on Ok 71 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Step 3 Merge the two files created in steps 1 amp 2 to compute calories produced per adult equivalent The merge command 5 Run the describe command again To verify that this variable was created summarize the variable ae 1 Statistics then Summaries tables and tests then Summary and Descriptive Statistics then Summary Statistics 2 Variable is ae 3 Don t forget to copy the command into the do file editor then click on the Ok button You should find that the average adult equivalent over all households is 3 49 The Stata commands are label variable ae Adult equivalents per household summarize ae This completes step 2 Save this file as HH FILE2 DTA Click on File Save As Filename is hh file2 Click on the Save button Copy the command from the Results window and paste it into the do file editor delete the reference to the directory and add a comment to explain what you have done EPD IT The Stata command is save hh file2 dta If you run the syntax again and try to save the hh file2 dta you will get an error message To save to a file that already exists on the hard disk an additional subcommand must be added replace save hh file2 dta replace We have created two files hh file1 dta which contain
117. mmands that you pasted either from the Command window or from the Review window or from dialog boxes Note Whenever you do any substantial amount of work you should always copy the commands to a do file and save the file so that you have documentation on what analysis you have done and so you can repeat the analysis without building all the commands again Documenting the do file with comments can save you much time trying to remember what analysis you did and why Let s see how you would retrieve the do file you just created To exit Stata 1 Click on File then Exit Stata will prompt you if you have not saved the data file and will give you an opportunity to return to the program to save the data file If you do not want to save your data file click on yes to exit Start Stata again To open our do file 1 Click on Window then Do file editor then New Do file or press lt CTRL 8 gt or you can click on the Do file editor icon on the tool bar 81 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The Do file editor window will open 2 Click on the yellow file folder tool and select the file session2 do 3 Click on Open You can then re execute these same commands or edit them as you wish There are 2 icons on the tool bar and P You can block lines in the do file and click on either of these icons The first icon Run Selected LInes P runs the com
118. n 9 Stata 11 Sample Session Adding comments to document commands The doedit command Discussion of the Windows used in STATA Section 0 File structure and Basic Operations for Stata 11 E provided in the dialog box where commands are built and then switch to the Do file Editor to paste the command d You can copy commands from the Results windows into the Do file Editor using lt Ctrl C gt to copy what you have blocked in the Results window and then switching to the Do file Editor and pressing lt Ctrl V gt to paste the command that was copied from the Results window e You can select the command from the Review window which places it back into the Command window where you can block the command press lt Ctrl X gt cuts the command from the Command window switch to the Do file Editor and press lt Ctrl V gt to paste the command Option c may become your preferred method to build the do file Option e is also useful Comments can be placed in the do file as you copy and paste commands Comments in a do file can start with an asterisk if the comment is one line If the comment covers several lines use before the comment and end with so that STATA will not think the comments are commands You can also use a double slash This option is useful if you want to add a comment after a command Example of the various styles of comments are your name here and the date the file was created do file
119. n heading the definition of xtile is that it is a command that categorizes a variable into the specified quantiles and places the information into a new variable Examples can be found under the Examples heading Since we want to divide the data into quartiles within each district we can use the if subcommand e g quart is the new variable that is created cprod_ae is the variable used to rank the data district is the controlling variable nq 4 is short for nquantiles number which specifies the number of quantiles to use 1 Click on Statistics then Summaries tables and tests then Summary and Descriptive Statistics then Create variable of quantiles 75 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The for z in num 1 3 looping command The foreach looping command The levelsof command 2 The dialog box opens In the New Variable box type quart In the Expression box type cprod_ae In the Options section select 4 quintiles 5 Click on the If In tab and in the if expression type district 1 RIA This command would have to be repeated for the other two districts so that 3 variables are created where the observations for that district are divided equally into 4 groups Using the if expression works where you have only a few codes within the variable We have 3 districts so it would not be a problem to use the if expression
120. n value of ae for all individuals is 79 with a standard deviation of only 17 We will assume that the nine individuals with missing age or sex 69 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Calculate the adult equivalents for the household The collapse command codes are all average individuals and assign them the adult equivalent value of 79 Warning be very cautious about filling in missing data this way Careless use of this technique can give you misleading results We are using this example to illustrate the use of Stata commands and not recommending that you do this routinely to compensate for missing data We will use the Replace command to change the system missing values in the ae variable to 79 1 Data then Create or change variables then Change contents of variable The replace Replace into same variable dialog box will appear 2 Under the Main tab select ae in the Variable box In the New Contents box type 79 4 Under the if in tab in the Restrict to observations if box type ae The period represents system missing 5 Don t forget to copy the command into the do file editor then click on the Ok button 6 Check the results of your replace command by rerunning the tabulate command 0S You should see 9 cases in the frequency with a value of 79 The Stata commands are replace ae 79 if ae tabulate
121. nd click on Submit 7 Write a comment in the do file to explain what the command does The Stata command is table ca2 age_gp contents freq row col Note the word contents can be abbreviated to c e g c freq The results are relation to Age group head 0 to 10 TI 29 20 to 60 61 and older Total head 6 296 41 343 wife husband 25 280 5 310 son daughter 503 184 31 718 mother father 5 1 6 other relative 70 55 16 2 143 Total 573 270 628 49 1 520 Comparison of the commands summarize tabulate and table We want to put something in place of the blanks for the cells with no data 1 Go back to the dialog box and under the Options tab tick Show missing statistics with period 2 Click on the Main tab and tick Superrow variables Using the drop down arrow select district The table now shows a period full stop where there are no data and there are subtables for each district The following is a comparison of computing averages using summarize tabulate and table based on an example from section 2 90 Stata 11 Sample Session Section 3 Tables and other Types of Analysis AB WN l Next 1 Click on File then Open Select hh file3 dta Click on Open Copy the command and paste it into the do file editor First we will use the Summarize command From the Statistics menu select Summaries tables amp tests then Summary and descriptive statistics then Summary
122. o copy the commands from the Results window and paste them into the Do file editor Another method is to copy commands from the Command window and paste them into the Do file editor A data file must be loaded into memory before any analysis can be done Stata SE uses 10 megabytes of memory for data Intercooled Stata uses 1 megabyte of memory and Small Stata uses 300 kilobytes of memory for data You cannot change the amount of memory used for Small Stata For the other versions the amount of memory can be temporarily changed or permanently changed The command to change the memory is set memory amount of memory example set memory To check to see how much memory is being used and how much is remaining use the following command memory Before loading a file into memory the result of this command in Intercooled Stata is Stata 11 Sample Session Section 0 File structure and Basic Operations for Stata 11 Details of set memory usage overhead pointers data data overhead free Total allocated 0 00 0 00 0 00 1 048 568 100 00 100 00 Other memory usage system overhead set matsize usage programs saved results etc Grand total use c qla dta clear memory Details of set memory usage overhead pointers data data overhead free Total allocated 67 056 T3192 1 048 568 Other memory usage system overhead set matsize usage programs saved
123. o missing cases You can see we have nine missing cases This tells us that our data file is missing either the age or the sex for nine people This problem should have been identified during the cleaning process At this point it would be ideal for the researcher to go back to the original questionnaires to determine the reason why these data are missing Since we can t do this we will use an alternative method If we leave these values missing the total adult equivalents of those households will appear to be slightly smaller which may distort the results We could avoid this problem by eliminating the households with missing information from our analysis but then we can t use the information about the food production from those households Instead we will try to make a reasonable assumption about those nine missing members We know that the adult equivalent values range from a low of 6 for children to a high of 1 0 for adult males which is not a very wide range We can determine the mean adult equivalent value for the whole sample and use that value to fill in the missing data To find out the average adult equivalent value for our sample 1 Statistics then Summaries tables and tests then Summary and Descriptive Statistics then Summary Statistics 2 Select the variable ae 3 Don t forget to copy the command into the do file editor then click on the Ok button The Stata command is summarize ae We can see that the mea
124. oes not always display a frequency table for a categorical variable After examining the variables we will begin to examine the data by running descriptive statistics e g counts averages maximum minimum and standard deviations for all variables This type of analysis helps you to find data entry errors It also gives you a feel for what kind of data are in the file to see that missing values have been defined correctly etc It may be tempting to skip this step for some data sets or for some variables but this is an important step that will almost always save time later and improve analysis For example finding out the average age of all respondents may not be something you are interested in knowing but if the average age turns out to be 91 3 years you would be alerted that that something is probably wrong with the data Basic descriptive statistics can be obtained from two commands Summarize and Tabulate Summarize is used for continuous variables while Tabulate is used for categorical variables There are three types of variables 1 A continuous variable is a variable that does not have a fixed number of values It measures something e g age weight population The variable ca3 age is a continuous variable because age can take on many different values 2 A categorical variable is a variable that has a limited number of values that form categories or groups e g geographic location relation to head For example
125. of the new variable 45 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Variation on the second method The recode function age _gp1 9 We can also specify a name for the value labels Click in the box next to Specify a name for the value label defined by the transformation rules 10 In the box type age_label 11 Click on the copy button switch to the do file editor paste the command switch back and click on Ok The Stata command is recode ca3 0 10 1 0 to 10 Ill 10 001 19 2 11 to 19 19 001 60 3 20 to 60 60 001 max 4 61 and older generate age_gp1 label age_label Note to continue a command on another line the end of the line should have to tell Stata that the command continues to the next line There is a limit to the line length that Stata will read so if the command is long you will need to place the last part of the command on a new line and use the continuation symbols Let s add a variable label to the new variable The Stata command is label variable age_gp1 Age group second method Now compare the age_gp variable with the age_gp1 variable Use a cross tabulation tabulate2 command The counts should be identical The same results can be achieved by using one command the recode function in conjunction with the Generate command The recode function takes three or more arguments
126. ommand will clear the memory clear all 3 There are several set commands that are useful to put at the beginning of the do file as well set memory 70000 sets the size of memory set matsize 100 limits number of variables that can be specified in an estimation command 116 Stata 11 Sample Session Annex I I Survey Instrument ANNEX II Questionnaire Socio Economic Survey of Family Sector Farms in the Province of Nampula Angoche Monapo e Ribate July August 1991 Departamento de Pre os e Mercados Food Security Project Name of Household Head Household Number HH Aldeia VIL Distrito DIST Subset of questions from original questionnaire Filename c hh dta I HOUSEHOLD CHARACTER STICS H1 1 How many persons are in this household H4 4 Has your family always lived in this village l yes 2 no H8 8 Is your family registered as deslocada l yes 2 no H19 19 Do you presently have lands in fallow l yes 2 no H21 21 What is the total area of these fallowed parcels hectares H24 24 Do you have lands that you have completely abandoned l yes gt question 25 2 no gt question 27 H25 25 What is the total area of these abandoned lands hectares H26 26 What was the principal motive for abandoning these lands 1 no security 2 lands lost fertility 3 lack of labor 4 insect attacks 5 other We would like to ask you about the food crops you grow H29 29 Over the last five years have you
127. on 2 Restructuring Data Files Table Lookup amp Aggregation Step 1 Generate a household In executing this step we must keep three things firmly in level file containing the mind number of calories produced per household First all production is currently measured in non standard units Each unit can have a different weight for each of the products Thus we must first convert all production into kilograms Second we want to know many calories are produced by each household not kilograms Thus after converting all production to kilograms we must convert kilograms to calories Third an examination of the file shows that we have data for each product produced by the household But we want to know the total calories produced by the household for specific food products not the total calories from each separate product After we convert all production to calories we must sum the calories within each household to arrive at the household total Let s begin by creating a new do file Open the Do File Editor Start by including comments about the purpose of the do file your name as the creator of the do file and the date Other items to include are the Stata version the set memory command if you have not changed the startup memory the cd command to switch to the directory where you want to work the log command to record the session Example version 11 set memory 30m cd C Documents and Settings aec_user My
128. on Save The do file is now saved to disk We need to run all the commands from the do file to start the log and record the commands in the log file Block all b the commands and click on the Do button _ We have opened the household member data file which is now the current file in memory Stata only allows one file to be open at a time A key piece of information we need to know about a data file is what variables it contains We can find this out along with other information by using the Describe data command on the Data menu 1 From the Data menu select Describe data 2 There are several choices under this option Select Describe data in memory A dialog box opens There are several options in this dialog box 21 Stata 11 Sample Session Section l Basic functions Files Descriptives Data Transformations E describe Describe data in memory Variables leave empty for all variables Examples yr all variables starting with yr xyz abc all variables between xyz and abc Options C Display only variable names C Display only general information C Display additional details C Do not abbreviate variable names C Display variable number along with name C Replace data in memory with describe report At the bottom of the dialog box there are three icons on the left The first a question mark opens the Help screen to explain the options in the dialog box The second an
129. ormat is the third column which describes how the data are to be displayed Stata will make an assumption with new variables so it is not always necessary to specify the format Format information always begins with a percent sign to indicate the start of the format information Refer to the PDF documentation Section 12 5 for more details In this example the 9 describes the width of the variable After the decimal the 0 indicates no fixed number of decimals will be displayed If you wished to see only 2 decimals the example would be 9 2g The letter following indicates what type of format e scientific notation e g 1 00e 03 f fixed format e g 1000 03 g general format c optional along with either e f or g will display a comma e g 1 000 03 Variable label Label describing the variable Value label If the variable has value labels the name of the label appears in this column Stata assigns a name to the label which contains the values and labels The label is then applied to the variable More will be said about value labels later There are several ways to view the labels and values for variables If you wish to see what labels have been defined for specific values for the variables that have value labels as indicated above you can run the command to create a codebook of the labels From the menus 1 From the Data menu select Data utilities Label utilities 2 Select Produce codebook of value labels
130. otal calories produced per product for each household The final output table asks only for information about the staple food crops These are defined as peanuts prod 5 rice prod 6 nhemba bean prod 30 manteiga bean prod 31 manioc prod 41 sorghum prod 44 maize prod 47 We can find the product code by looking at prod in the questionnaire Since we are only interested in those products we need to exclude the rest of the cases about other crops Stata uses the keep command Once you run this command you will no longer have the complete data set available You must remember that you should never save a file to the same name after you have selected out a set of data You will overwrite the original data and no longer have the complete set To select just a subset of cases 1 Click on Data then Create or change variables then Keep or drop observations You should see the drop keep or drop observations dialog box 2 Under the Main tab select the round button next to Keep Observations 3 In the Observations to keep if box type prod 5 prod prod 30 prod 31 prod 41 prod 44 prod 47 63 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation The is a symbol for the word OR We are telling Stata to select all cases with prod equal to 47 or prod equa 30 or prod equal 31 and soon 4 Click on the copy button switch to th
131. oximately unbiased for whatever is being estimated for the whole population i e one observation represents many elements in the population from which the sample is drawn Clustering by districts or villages is used in almost all survey sampling rather than selecting an independent sample Further 108 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation sub sampling may occur within a district or a village as well Units at the first level of sampling are called the primary sampling unit or PSU or cluster To summarize weights are used to obtain the correct point estimates Clustering and stratification are used to get the correct standard errors The svy commands also calculate the design effects of deff and deft Deff is equal to the design based variance estimate divided by an estimate of the variance that would have been obtained if the survey was carried out using simple random sampling Deft is approximately equal to the square root of deff Further explanation of these two terms can be found in the Survey Data manual under the command Svymean We will use a data set from Zambia from the Post harvest survey of the 2001 2002 agricultural season where the area planted for specific types of crops is tested 1 Click on File then Open 2 Select Zambia_PHS0102_crop_area dta and click on Open 3 Paste the command into the do file editor and delete the reference to the directory Use
132. ping in the command window you can pick the variables from the Variable window and the names will be pasted into the Command window Note that to list a subset of observations Stata uses the key word in e g in 1 10 The key word IN restricts the list to a range of observations Examples are 34 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations list in 1 lists first observation list in 1 lists last observation list in 2 4 lists observations 2 through 4 list in 3 2 lists 2 observations starting with the 3 from the last observation To limit the listing to a specific criterion use the if key word Examples are list district vil hh mem ca3 if ca3 gt 70 list district vil hh mem ca2 ca3 if ca3 lt 15 amp ca2 lt 3 If the variables you want to list are in the order in the file that you want to see rather than list each of the names you can type the first variable then a dash then the last variable in the list e g list district ca3 if ca3 lt 15 amp ca2 lt 3 If we want to see the observations with the five lowest values and five highest values we would first sort by that variable and list the first five cases and the last five cases For example if the question is What is the age of the 5 youngest head of households and what is the age of the 5 oldest head of households Stata commands sort ca2 ca3 list district vil hh mem cal c
133. r append one file to the end of another file An example would be that you entered data for harvest in one file for one district and entered data for harvest for another district into another file We want the data to be in the same file To do that we would use the append command Merge datasets Merge combines datasets horizontally matching corresponding observations An example is a survey asking questions about the household in Part 1 and another set of questions about the household in Part 3 Each part of the survey is entered into a different data file To combine Part and Part 3 where both sets of data are at the household level we would use the merge command Joinby datasets This type of merge combines datasets horizontally matching all pairwise combinations possible An example is a set of data on parents and a set of data on children Joinby would match the parents to every observation of the children within that family 54 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation steps 1 The key word unmatched is used and within parentheses the type of join is specified There are four types of joins none all unmatched observations are ignored this is the default i e if there is not a matching observation in both files the observation is dropped from the final dataset both unmatched observations from the master or file that is in memor
134. riable label oe district district vil village household member number does this person work relation to head age sex level of schooling marital status where entered district Float vil Float hh Float mem Float cal Float ca2 Float ca3 Float ca4 Float cad Float ca6 Float univ Float AP o NO ole AP o oO ole oe OOOO OO WO WO WO OO WO oe Sorted by An explanation of each of the columns follows Data storage types Storage type Stata has 6 storage types Float real numbers 8 5 digits of precision width of 8 with 5 decimals default unless another type is specified Double real numbers 16 5 digits of precision width of 16 with 5 decimals byte integer between 127 and 100 int integer between 32 767 and 32 740 long integer between 2 147 483 647 and 2 147 483 620 strX string The X is replaced by the maximum number of characters allowed for the variable For Intercooled Stata maximum size a string variable can be is 244 e g str244 Since Stata stores the data from the file in memory when you define a variable you want to define it with appropriate storage 23 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Display format Labels Documenting variables and labels The labelbook command more type to maximize the amount of data that be opened in the program Display format The display f
135. s windows metafile wmf Click on the drop down arrow next to the box labeled as Save as Type to see the different formats Word processors can import a graph with the extension wmf or tif into a graphic box Once the graph window has been closed you cannot reopen it unless you have saved the graph to a file You can rerun the 106 Stata 11 Sample Session Section 4 Tables and Graphs Survey estimation Scatter plot using by subcommand Overlaid graphs command that created the graph to see the graph again You cannot have more than one graph window open at a time Let s look at another graph We will use the file created in the last session hh file4 dta We can plot adult equivalents per household with total calories produced 1 Click on File then Open 2 Select hh file4 dta and click on Open Copy the command to open the file to the do file editor 3 Select Graphics Two way graphs scatterplot lines etc 4 The dialog box opens for the twoway graph Click on the Create button to define the graph The default type of plot is scatter You could pick Line Connected Area Bar Spike or Dropline 5 For the Y variable select ae from the dropdown arrow for the X variable select cprod_tt from the dropdown arrow 6 Click on the Accept button You now see that Plot 1 has been defined and is highlighted 7 Click on the copy button switch to the do file editor paste the command sw
136. s see appendix Section V Agricultural Sales question 64 have you increased the quantities sold over the last five years All of the variable names associated with this question begin with H64 Open the file Select File then Open Select c hh dta Click on Open Copy the command and paste it in the do file editor Delete the directory reference PEs In this survey 1 yes and 2 no Questions you might ask are A How many respondents increased sales quantities of maize 94 Stata 11 Sample Session Section 3 Tables and other Types of Analysis The count command The recode command The egen command To answer this question you can count the number of times the value of 1 appears in the variable associated with maize To count the number of times a value appears in the variable The command is count count if h64a 1 In the Results window you see the value of 86 You could also run a frequencies tabulate h64a The tabulate shows that 147 did not increase sales of maize as well as 86 households who did Now we change the question B How many crops increased in sales within the household For this question we can sum the number of 1 s in the variables associated with this question using the egen command We need to recode the value of 2 to 0 Recode 1 Select Create or change variables from the Data menu 2 Select Other variable transformation commands then Recode categorical varia
137. s the calorie production data for all households and hh file2 dta which contains the adult equivalent data for all households We need to combine these files case by case matching by district village and household to get both sets of data into one file To do this we use Combine datasets Merge two datasets under the Data menu choice 72 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation We noted earlier that key variables are required for any merge When you re joining two files which are at the same data level as we re about to do it may not seem important to include key variables but it is The key variables determine which observations are to be combined Note You should never use Combine datasets without Key Variables because without them you have no guarantee that the program will combine the cases in the manner that you wish The command will execute without any warnings or error messages but the results may be incorrect At this point if you have not closed Stata hh file2 dta is still the working file A very important point Stata cannot merge two datasets unless they are both sorted in the order of the key variables One way to check to see if Stata knows the file is sorted is to use the Describe command In the Results window you can see at the end of the list of variables the words sorted by and the list of variables that the file is sorted by
138. sack of rough rice is 39 44 for a 50 kg sack of cotton it is 17 5 while a 50 kg sack of peanuts is 41 67 The variables in this file are a prod product crop code b unit unit of measure C conver conversion factor equal to the number of actual kilograms for the combination of prod and unit Below a sample of data from CONVER DTA shows that rice prod 7 measured in a 20 liter can unit 8 weighs 19 kg rice prod 7 measured in a 50 kg bag unit 24 weighs 53 kg beans prod 30 measured in a 20 liter can weighs 17 kg beans prod 30 measured in a 50 kg bag weighs 47 kg 53 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation 4 l 7 30 30 prod unit conver Product unit conversion factor calories dta This also is a table lookup file created for convert kilograms of food into calories of food It contains two variables a prod the product crop b calories number of calories per kilogram of each of the crops To create a data files that will produce the output table described above we need to combine the data from different files There are different methods that can be used to combine files depending on what is desired In Stata we can Append datasets Appending data sets means that the data in different files have the same variables and the desire is to add one data set of observations to the end of another data set o
139. scriptives Data Transformations values place a next to the box Display numeric codes rather than labels values 5 Click on the copy button and then click on Ok to run the command 6 Inthe Results window you see a list of the observations If the information for each observation is wrapping to the next line you can resize the Results windows so that it is wider Place your mouse pointer on the right border of the window and when you see a double arrow click the Left Mouse Button hold it and drag the right side out to make the window wider If you see the More at the bottom of the Results window there are several methods you can use to continue Press lt Enter gt Press any key Click on the More button on the tool bar Click on the more at the bottom of the Results window If you wish to interrupt a Stata command you can click on the Break x button on the Tool bar or press lt Ctrl Break gt or type q the letter q for quit in the Command window To rerun the command you just ran click on the last command in the Review windows You see the command is now in the Command window Press lt Enter gt to run the command Copy the command to the Do file editor and add comments to explain what you have done The Stata command should look like list district vil hh mem ca1 ca2 ca3 ca4 ca5 ca6 in 1 10 clean If you wish to you can type the list command in the Command window If you are ty
140. sleat es Se RI AG es Bates Sha A east So eee Hate ake The label variable command The label define commands ensinaram esettesshee bis a a a aaa The label valuescommandesim iain occas sheet it rn ade necisenateansiernanes EEE sei hae tet ate aden eee Stata 11 Sample Session Section 0 File structure and Basic Operations for Stata 11 Thevrecode fUN CU ONS rec eeihe ied ade iid cid piace eee cerca aden EON E SECTION 2 Restructuring Data Files Table Lookup amp Aggregation Restructuring Data Filesis sie csceraeitenae a asned n a ade bcitexteristt Devan ths eld dais Aa Ghee eben tee aa Step 1 Generate a household level file containing the number of calories produced per household 56 Rename any key variables in both files to the Same name oo cece ccc eseseseeseeeeseeeeneseceeseceeseeeeneneeteseeteneeeenenees Thejoinby COMMA inene octane eluents tenes ENE AA A asec aeTaaNe sea entered nce Hage dees cttabe Compute total kilograms produced The generate command cee The drop COMMANG ce ccecceescseeteeeees Calculate the total calories produced Select only staple food products Th keep if commMand s snacsee eee ei eee Miia ee eon aa nde ei Create a new file which is a household level file rather than a household product level file Mee COllapSe COMMA sooo oie eects sesh aloes E Mots ae Suns vers cca tedhas A E A sameew eS ose Step 2 Generate a household level file containing the num
141. son daughter mother father other relative How many cases are included Only 1 504 cases are included in this table What about cases with missing values 1 Return to the dialog box and place a Y mark in the box next to Treat missing values like other values 2 Click on the copy button switch to the Do File Editor and paste the command Write a comment 38 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations Summary statistics on a continuous variable for each value in a categorical variable The by sort summarize command and then switch back to the dialog box to click on the Ok button The command will be executed tabulate ca2 ca4 column miss row Now we see the missing data included We wanted counts row percentages and column percentages Row percentages sum to 100 across all the cells in a row while column percentages sum to 100 down the cells ina column The table produced by this command tells you that there are 21 female heads of household and that 6 12 of the total heads of households are female row percent Of those who are female 2 87 are head of household column percent For this analysis the same command is used as for general summary statistics with a slight modification This command will show how the mean and other statistics for a continuous variable differ by the values of one or more categorical variables
142. statistics The summarize Summary statistics dialog box opens Select cprod_ae in the variables box Be sure that under Options in this tab Standard Display has been selected Click on the by if in tab Click in the box Repeat command for groups defined by In the box below this option select district quart Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok For each combination of district and quart we see the summary statistics This output is difficult to read we will use the tabulate command From the menus click on Statistics then Summaries tables amp tests then Tables then One two way table of summary statistics The tabsum One two way table of summary statistics dialog box opens Inthe Variable 1 box select district Inthe Variable 2 optional box select quart In the Summarize Variable box select cprod_ae For output we are only interested in the mean so tick the boxes next to Y Suppress standard deviation Y Suppress frequencies Y Suppress number of observations Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok In the Results window we see Means of Calories per adult equivalent per day Calorie production quartile district i 2 3 4 Total monapo 1248 7023 2539 3641 3997 4884 9150 0217 4206 5071 ribau
143. stimation 3 We can specific a variable or just run the command to look at the complete dataset If we were interested to know which strata have only one sampling unit we could put a tick next the box labeled Display only the strata with a single sampling unit 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok Once the survey design has been specified and the file saved it is not longer necessary to specify it again The specification is saved with the data file We can use the svytotal command to look at the total estimates 1 Click on Statistics Survey data analysis 2 Then click on Means proportions ratios totals then Totals 3 Inthe Variables box select maisea ricea milleta sunfa 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Submit svy linearized total maizea ricea milleta sunfa running total on estimation sample Survey Total estimation Number of strata 69 Number of obs 6601 Number of PSUs 394 Population size 807414 Design df 325 Linearized Total Std Err 95 Conf Interval maizea 649230 9 25105 89 599840 3 698621 5 ricea 14472 95 2360 009 9830 125 19115 77 milleta 61770 91 7346 125 47318 95 76222 87 sunfa 24319 15 3418 858 1759326 31045 04 111 Stata 11 Sample Session Section 4 Tables and Graphs Survey
144. t Other variable transformation commands then select Make dataset of means medians etc The Collapse Make dataset of summary statistics dialog box will appear 2 On the Main tab in the Statistics box for 1 change mean to sum by clicking on the drop down arrow In the Variables box select cprod tt 64 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation 3 Click on the Options tab and in the Grouping variables box select district vil hh in that order because those variables represent the identify cation of an individual household The Grouping variable s is used to specify the variables to be used for combining cases in the collapsed file Any cases from the original file that have identical values for all 3 of the grouping variables will be combined into a single case in the collapsed file We want the collapsed file to have one case per household so we use the variables that identify a household in our survey district vil and hh 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok 5 Add a comment to explain what you have done The Stata command is collapse sum cprod_tt by district vil hh In the Variables Window you should see only 4 variables Look at the resulting file click on the Data Browser tool You should see only one case per household The collapse comman
145. ta 11 Sample Session Annex I I Survey Instrument I HOUSEHOLD CHARACTERISTICS Table IA Household Characteristics Filename c qla dta This person Relation to Head Level of Schooling Marital Status works on farm or off 1 head enter the last 1 monogamous farm 2 spouse completed year 2 polygamous 3 child 3 single 4 parent 0 illiterate 4 widowed 5 other kin 12 post high school 5 divorced 6 other 98 no formal 6 emigrant wife schooling but literate husband out longer than six months CA6 Pema CA2 CA3 CA4 gt un 120 Stata 11 Sample Session Annex I I Survey Instrument IV PRODUCTION Product Quantity harvested B cotton S peanuts 6 rice 1 cashew nut 30 beans B 1 manteiga bean 41 dry manioc 47 corn 44 sorghum Table IV Characteristics of Production Filename c q4 dta Quantity Existing stocks Month in Amount to be How long Quantity reserved harvested ina at harvest time which last stored from this will this for seed normal year year s stock year s harvest for year s ran out consumption stocks last Unit enter the i i Qty 1 sack 100 2 sack 50 B kilo 4 liter 5 can 20 121 Stata 11 Sample Session Annex I I Survey Instrument V AGRICULTURAL SALES Table V Sales of Farm Products Filename c q5 dta 3 cotton 5 peanuts 6 rice 21 cashew nut 30 beans 31 manteiga bean 41 dry manioc 47 corn 44 sorghum Quantity sold Units 1 sack
146. the file you opened will be used In the Results window you will see documentation of the Stata commands that were executed while you were in the Data Editor e g replace hh 2 in 1 1 real change made The second method to look at the data is to use the Browse mode You cannot modify the dataset if you use this method This method will prevent you from accidentally modifying the data Click on the browse button In this window you can sort the data and also hide columns if you wish To exit the Browser click on the x in the upper right hand corner of the Browser Stata 11 automatically writes all messages and output to the Results Window from the execution of your commands For example if you run a tabulate command then the frequency table will be written to the Results window If you wish to save the information in the Results window you must remember to turn on a log file See the explanation above on Log files The Command window is used to type commands directly If you use the menus the command is run immediately The command is placed in the Review window If you want to rerun a command that is in the Review window click on the command The command is placed in the Command window To execute the command press lt Enter gt Useful keystrokes within this window lt PageUp gt recalls the last command run and places it in the Command window If you continue to press lt PgUp gt the next command
147. tions The tabl command The histogram command Saving a graph to a file were tabulated Why 1 Go back to the dialog box Tab1 One way Tables 2 There is an option Treat missing values like other values Place a tick mark on this option 3 Copy the command switch to the do file copy the command return to the dialog box 4 Click on Ok Looking at the table for ca4 there are 16 cases that are missing values Stata s default is to not show the missing values The Stata commands are tab1 ca1 ca2 ca4 cad ca6 tab1 cai ca2 ca4 cad ca6 missing Note to produce a tabulation frequency of just one variable you can use the tabulate command However if you want to list several variables in the frequency command you must use the tabl command Below you will see that if you use the tabulate command and list 2 variables a cross tabulation is produced Another useful way to examine a continuous variable is to Graph the variable to view the distribution of the values 1 From the menus select Graphics Histogram 2 Click on the drop down area for the Variables box and select ca3 3 Tick the box VY for Width of bins and type in 5 in the box next to this option The ages will be grouped into 5 year ranges 4 For the Y axis click on the radio button next to Frequency so we will see the number of cases in the age groups 5 Click the copy icon and then click on Ok to run the command The
148. trol language commands and output Extension log ASCII text commands only Extension txt Stata can record a copy of the commands and the output from the commands in a log file If you wish to record this information in a file you must turn on the log There are two types of logs 1 Log One records everything that you submit for execution and all the output resulting from the commands You can specify one of two formats either SMCL or ASCII text log From the Menu Select File then Log then Begin You are prompted for a file name The default extension is SMCL The file is formatted in the Stata markup and control language Type a name for the file and click on OK If you prefer to record the information in ASCII text then you would need to type the file extension of log e g session1 log From the Command window type log using session1 append The above command opens a file to record the session and uses SMCL format This file can only be opened in the Stata Viewer 8 Stata 11 Sample Session The cmdlog using command The log close command 3 Do files Section 0 File structure and Basic Operations for Stata 11 or type log using session1 append text The above command opens a file to record the session and uses ASCII format This file can be opened in any text editor or word processor The other type of log file records only the commands and not the output from the commands The
149. ts of variables dialog box opens In the Variables box select the name of the variable that was just created ae Type 84 in the New Contents box Click on the If In tab In the Restrict to observations if box type in ca4 2 amp ca3 gt 10 amp ca3 lt 19 Click on Submit The dialog box remains open and the command is run In the Restrict to observations if box change the criteria to ca4 2 amp Ca3 gt 20 Click on the Main tab Type 72 in the Contents box Click on Submit The dialog box remains open and the command is run Type 6 in the Contents box Click on the If In tab In the Restrict to observations if box change the criteria to ca3 lt 10 67 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation 22 Click on Ok The statements we need are detailed in the table below Numeric value If statement 1 ca4 1 amp ca3 gt 10 84 ca4 2 amp ca3 gt 10 amp ca3 lt 19 72 ca4 2 amp ca3 gt 20 6 ca3 lt 10 23 Copy the 4 commands from the Results window and paste them into the do file editor and add a comment to explain what you have done The new variable does not yet have a variable label To assign a variable label 1 Click on Data then Data utilities then Label utilities then Label variable 2 In the Variables box select the name of the variable name ae 3 In the Attach label to variable
150. umenting variables and labels The labelbook command 0065 Generating descriptive statistics Descriptive statististics USING ONE variable eee cs eseseseseseseesesescenesesesescsnesesesesesescessesesesesesueeseseseseeteeneaeseeeaees Descriptive 2005 fic26 isa i nE aim nae toner easter EA ETA E E ia E The summarize COMMANG x cissaslese linkin kisi ate Gee anni A AR Information returned by Stata COMMANAS 0 cece cess eesesessesecseseeeseeeeneeceneseeseserecneeeeneneeeseeeeneeees ABULATE Frequencies cc2 857 0i ni ed ee A N RR Th tabi command zee enren E AAE Ge tet dant sd ee E The histogram comman ess es cscs cds heed ches bod i aaa eh Seek Gin aaa a aoaaa araa aait Saving a graphito a fil s Sinna es chatted es r A a a Ws eh oe eS The list command Descriptive Statistics using two or more variables Two way Tables with Categorical Variables Cross tabulation occ esceseseseseeseeeeseeecneseeeeneeeeneeeenenees he tabulate commands oirne E ea EE eis bande ara EEES Summary statistics on a continuous variable for each value in a categorical variable The by sort summarize command Data Transformations oerien linnen ann r RE nected EEEE anand Mas EGE Converting continuous variables to categorical variables 2 2 eee eceeseseeseseeeeseeeseeeeeesneneenees he generate comManCscccic ccc teeetsewaesitin tia peas eaena aerial a ait iat AES Thereplace command css eh
151. up These windows are Viewer used to view help files and log files SMCL markup and control language files and print log and other files This window is not contained in the STATA 11 program window but stands alone and appears on the task bar as another icon Stata 11 Sample Session How Stata uses memory a The set memory command Section 0 File structure and Basic Operations for Stata 11 Data Editor where you can view the data you have loaded into the program s memory Do file Editor text editor where you can build a do file a file that contains commands that Stata can execute This window is not contained in the STATA 11 window but stands alone and appears on the task bar as another icon You can switch between the windows within Stata by using the Window choice from the Menu Note that shortcuts are also listed e g to switch to the Command window you can press lt Ctrl gt 4 to switch to the Variables window press lt Ctrl gt 6 Version 11 of Stata provides menus to help the user However the user can also type all the commands in the Command window Throughout this tutorial if the action desired can be done using the menus directions will be given on how to use the menus The Stata command that will do the same action will also be given so that you become familiar with the commands Stata provides a mechanism to paste commands into a do file that you can then execute You can als
152. up to be coded until all groups are defined e g recode ca3 10 19 60 100 Stata will use the value as the code assigned to all cases that fall within that group The value of 10 will be assigned to all observations with ages between 0 and 10 the value of 19 will be assigned to all observations that fall between ca3 gt 10 and lt 19 and so on Click on OK to exit the expression builder dialog box Click on the copy button switch to the do file editor paste the command and switch back Click on Ok to run the command The Stata command is generate byte agecat recode ca3 10 19 60 100 Run a tabulate on the new variable agecat and compare the number of cases in each category between the new variable and the age_gp variable The numbers will be the same You would need to add a variable label and create a value label with labels associated with the values to complete the variable 47 Stata 11 Sample Session Section I Basic functions Files Descriptives Data Transformations These new variables are not yet part of the data file stored on disk We must save the data file for these variables to be included permanently in the data file It is a good practice to save the file under a different name in case we want to go back to a previous version of a file For this reason we will use the Save As command from the File menu The new file name will be Q1A AGE DTA 1 From the File menu select Save AS The cursor should
153. up to 80 characters box type Adult equivalents 4 Click on the Ok button 5 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on the Ok button Add a comment to explain what you have done in the do file The Stata commands are generate byte ae 1 if ca4 1 amp ca3 gt 10 replace ae 84 if ca4 2 amp ca3 gt 10 amp ca3 lt 19 replace ae 72 if ca4 2 amp ca3 gt 20 replace ae 6 if ca3 lt 10 label variable ae Adult equivalents To verify that the new adult equivalent variable ae has been calculated display a frequency table for it 1 From the menus click on Statistics then Summaries tables amp tests then Tables then One way tables The tabulate1 One way Tables dialog box opens 2 Select the variable name ae in the Categorical Variables box which is found under the tab labeled Main 3 Check the box Y next to Treat missing values like other values 68 Stata 11 Sample Session Section 2 Restructuring Data Files Table Lookup amp Aggregation Replace missing values with a mean value 4 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on the Ok button The Stata command is tabulate ae missing You should see there are 1524 total cases Ideally there should be four values represented in the table 1 72 84 and 60 and n
154. vince value as part of the district code Close the browser and use the gen command to create the variable cluster1 To be able to use the survey commands we must first define the stratified random sampling method that was used to account for weighting clustering and stratification We will use the svyset command to specify the method 1 Click on Statistics then Survey data analysis Then click on Setup amp utilities then Declare survey design for dataset In the Primary sampling unit box select cluster In the Strata box select dist Click on the Weights tab Click on the radio button next to Sampling Weight Variable 7 Click on the drop down arrow for the Sampling weight variable box and select hhwgt 8 Click on the copy button switch to the do file editor paste the command switch back to the dialog box and click on Ok N DNAD The Stata command is svyset cluster1 pweight hhwgt strata dist vce linearized singleunit missing After running the command we see a summary of the command in the Results window pweight hhwgt VCE linearized Single unit missing Strata 1 dist SU 1 cluster1 FPC 1 lt zero gt We can use the syvdesc command to look at the strata and PSU arrangement of the dataset 1 Click on Statistics then Survey data analysis 2 Then click on Setup amp utilities then Describe survey data 110 Stata 11 Sample Session Section 4 Tables and Graphs Survey e
155. y and using file that is not in memory data are included master unmatched observations from the master data are included but not unmatched observations from the using file using unmatched observations from the using data are included but not unmatched observations from the master file Cross datasets In this type of merge the first observation in the first file is joined horizontally with every observation in the second data set The second observation in the first file is then joined with every observation in the second data set and so on This type of file combination is rarely used In this tutorial we will use the merge and the joinby commands With this information in hand we can now think about the specific steps we must take to create the file we need to produce the output we want Logically there are three We need to know how many calories each household produced for the year We can generate a file with this information using data we have stored in three files the production file c q4 dta and two table lookup files conver dta and calories dta We need to know how many adult equivalents are in each household We can generate a file with this information using data from the member file c qia dta We need to combine the results from steps 1 and 2 into one file so we can compute calories produced per adult equivalent per day 55 Stata 11 Sample Session Secti

STATA 11 for Windows SAMPLE SESSION

Contents

Download Pdf Manuals

Related Search

Related Contents