Student Guide to SPSS

1. [Screenshot: Explore: Plots dialog, with options for Stem-and-leaf, Histogram, Normality plots with tests, and Spread vs. Level with Levene Test (None, Power estimation, Transformed, Untransformed)] (Guide to SPSS, Barnard College Biological Sciences) Look at the results in the Output Viewer. The Explore tool produces a full set of descriptive statistics by default; this is an alternative to the Descriptives tool explained above. Note that in the "Yes" category for accuracy, the median value of geyser eruption interval is very close to the mean, so there is little skew in the data. When this is the case, it is usually reasonable to assume that the data are normally distributed, but here we have tested that assumption directly. SPSS calculates two statistics for testing normality: Kolmogorov-Smirnov and Shapiro-Wilk. [Table: Tests of Normality for Interval by Accurate, reporting the Kolmogorov-Smirnov and Shapiro-Wilk statistics; surviving entries: 180, 901, 150, 150. a. Lilliefors Significance Correction. b. There are no valid cases for Interval when Accurate = .000. Statistics cannot be computed for this level.] The Kolmogorov-Smirnov D test is a test of normality for large samples. This test is similar to a chi-square test for goodness of fit, testing to see if the observed data
2. The results show that the mean pairwise difference of 12.21 is significant (p = 0.003). [Table: Paired Samples Test for Pair 1, BEFORE - AFTER, reporting the paired differences (standard deviation, 95% confidence interval of the difference, lower and upper), t, df, and Sig. (2-tailed)] Comparing two groups: Non-parametric. Two independent groups: Mann-Whitney U. When the assumptions of normality are not met for a two-group comparison, there are powerful non-parametric alternatives. For independent (unpaired) groups which are non-normally distributed, the appropriate test is called the Mann-Whitney U test. First, open up the Cloud.xls example file. These data show results of cloud seeding experiments; we want to know if releasing silver nitrate into the atmosphere from a plane increases rainfall. These data are highly skewed; verify this using the Explore procedure. In the Variable View, code the Treatment values as 0 = Unseeded and 1 = Seeded. [Screenshot: Value Labels dialog, with Value 1 labeled Seeded] Then go to Analyze > Nonparametric Tests > Two Independent Samples Tests. Place Treatment as the Grouping Variable and Rainfall as the Test Variable. [Screenshot: Two Independent Samples Tests dialog, with Rainfall in the Test Variable List]
3. Go to Analyze > Compare Means > Independent Samples T Test. Place Month in the Grouping Variable and Photosyn in the Test Variable boxes. Note that Month is followed by "(? ?)". Even though there are only two months of data in this example, July and September, SPSS requires you to manually enter the codes for the two groups when running a t-test. Click Define Groups to do so. [Screenshot: Independent Samples T Test dialog, with Photosyn in the Test Variable(s) box] Here, type in the names of the months to be used as the two independent groups of data. [Screenshot: Define Groups dialog, with Group 1 = july and Group 2 = sept] When examining the results in the Output Viewer, note that SPSS in fact runs two tests whenever conducting a t-test. The first examines whether or not the variance around both means is the same; this is the same homogeneity of variance test encountered in the Descriptive Statistics section. If the variances are the same, we should use a standard t-test ("Equal variances assumed"). If not, we use a corrected test ("Equal variances not assumed"). How do you know which one to use? Look at the fourth column, "Sig.", under Levene's Test for Equality of Variances. If this value is greater
4. [Screenshot: Scatter dialog offering Simple Scatter, Matrix Scatter, 3-D Scatter, and Simple Dot] You will now be asked which continuous variable to put along the x axis and which one to put on the y axis. For a regression, consider which characteristic you set as the independent variable; this should be on the x axis. For a correlation, this choice is arbitrary. For example, in the bird diversity data used for the correlation analysis, species richness (No. Species) could be the y variable and species density (Total density) could be the x variable. To have each point labeled by the site it represents, place the Site variable in the Label Cases by box. [Screenshot: Simple Scatterplot dialog, with Y Axis, X Axis, Set Markers by, Label Cases by (SITE), and Panel by boxes] For these labels to show up on the chart, click the Options button and select Display chart with case labels. Otherwise, the points will be labeled, but only when you select an individual point in the Chart Editor. Options Missing Values Exclude cases listwise Exclude ca
5. day12, day13, day14, day15. [Screenshot: Repeated Measures dialog before assignment, with empty Within-Subjects Variables slots for survival, and Between-Subjects Factor(s) and Covariates boxes] It should now look like this: [Screenshot: Repeated Measures dialog with the day variables assigned to the survival levels and dose as the Between-Subjects Factor] In this example, we only have two doses. If we had more levels in this factor, we would want to examine the differences between each category using the Post Hoc dialog (Tukey). To create a graph of the results, click Plots. Move dose (or whatever between-subjects factor you have) into the Separate Lines box and survival into the Horizontal Axis box. [Screenshot: Repeated Measures: Profile Plots dialog, showing the plot survival*dose] Finally, choose Options and at least cl
6. Place Percentage Low Weight Births in the Test Variable List and Region Code in the Grouping Variable. The "(? ?)" following Region Code indicates that SPSS needs your direction about which group values to use in this test. Click Define Range and place 1 in the minimum and 4 in the maximum value boxes. [Screenshot: Several Independent Samples: Define Range dialog, with Minimum = 1 and Maximum = 4] The output of the Kruskal-Wallis test first shows the table of ranks, which shows that the values in the West region have a much lower mean rank than the others. [Table: Ranks of Percentage Low Weight Births by Region Code (Northeast, Midwest, South, West, Total), with columns N and Mean Rank] This result is highly significant (p < 0.001). Note that there are k - 1 degrees of freedom; with k = 4 groups, there are three degrees of freedom. Test Statistics for Percentage Low Weight Births (Kruskal-Wallis Test; grouping variable: Region Code): Chi-Square = 37.918, df = 3, Asymp. Sig. = .000. Two-Way: Friedman. Nonparametric alternatives to two-factor analyses of variance have been generally described by some statistical authorities as unsatisfactory, particularly since the parametric tests are relatively robust to violations of the assumptions of normality (Zar 1999).
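The degrees-of-freedom rule and the "highly significant" reading in the excerpt above can be checked by hand. The following is a minimal sketch, not SPSS's own procedure: it assumes the reported Chi-Square value is referred to a chi-square distribution with k - 1 = 3 degrees of freedom, and the helper name `chi2_sf_df3` is hypothetical. The closed form used here is valid only for 3 degrees of freedom.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 3 df.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which holds only for df = 3.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


k = 4                      # number of regions compared
df = k - 1                 # degrees of freedom for the Kruskal-Wallis test
chi_square = 37.918        # value reported in the Test Statistics table
p_value = chi2_sf_df3(chi_square)

print(f"df = {df}, p < 0.001: {p_value < 0.001}")
```

As a sanity check, the same helper gives roughly 0.05 at the familiar critical value 7.815 for 3 degrees of freedom.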
7. [Table, continued: multivariate test statistics; each row lists the values in the order they appear in the SPSS output]
   ... .034, .979
   Hotelling's Trace: 46.139, 11.535 (a), 12.00, 3.00, .034, .979
   Roy's Largest Root: 46.139, 11.535 (a), 12.00, 3.00, .034, .979
   Pillai's Trace: .737, .699 (a), 12.00, 3.00, .717, .737
   Wilks' Lambda: .263, .699 (a), 12.00, 3.00, .717, .737
   Hotelling's Trace: 2.798, .699 (a), 12.00, 3.00, .717, .737
   Roy's Largest Root: 2.798, .699 (a), 12.00, 3.00, .717, .737
   a. Exact statistic. b. Design: Intercept + dose. Within Subjects Design: survival.
   Next come the results for the repeated measures ANOVA. This requires that the covariance matrix of the data have sphericity, as explained above. These data definitely do not: the covariances differ at different points in the experiment. [Table: Mauchly's Test of Sphericity (b), Measure: day, for the within-subjects effect survival, with columns Mauchly's W, Approx. Chi-Square, and Epsilon (a) (Greenhouse-Geisser, Huynh-Feldt); surviving entries: 233, 195. Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table. b. Design: Intercept + dose. Within Subjects Design: survival.] Therefore, when we look below at the within-subject effects, we will look at them in the following order: 1. Examine the results when sphericity
8. [Screenshot: Two Independent Samples Tests dialog, with Test Type options Mann-Whitney U (checked), Moses extreme reactions, Kolmogorov-Smirnov Z, and Wald-Wolfowitz runs] As with the t-test, you have to manually assign the groups for the test. Here they are simply 0 and 1, for unseeded and seeded. [Screenshot: Two Independent Samples: Define Groups dialog, with Group 1 = 0 and Group 2 = 1] Recall that nearly all nonparametric tests are based on ranking the data and then examining how the sums of the ranks differ between groups. The first table shows the ranks for these cloud seeding data. [Table: Ranks of Rainfall for Unseeded, Seeded, and Total, with columns N and Mean Rank] The second table is a summary of three different test statistics; here, focus on the first and last rows: the Mann-Whitney U statistic and the significance. Cloud seeding increases rainfall (p = 0.013). Test Statistics for Rainfall (grouping variable: Treatment): Mann-Whitney U = 203.000, Wilcoxon W = 554.000, |Z| = 2.471, Asymp. Sig. (2-tailed) = .013. Paired groups: Wilcoxon Signed Rank Test. Similarly, for paired research designs, there also exist powerful nonparametric tests. Return to the fluoride example. If you used the Explore tool to assess how these data are distributed, you would find that the rates of tooth decay before fluoride treatment are non-normally distributed (Shap
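To make the "rank the data, then compare rank sums" idea in the excerpt above concrete, here is a minimal sketch of how a Mann-Whitney U statistic relates to a rank sum. The rainfall values are hypothetical, the function names are illustrative only, and the group size of 26 in the last lines is an assumption inferred from the reported U and W rather than a figure stated in the text.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, rank sum of group_a, rank sum of group_b), U being the smaller U."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    w_a = sum(ranks[:n_a])
    w_b = sum(ranks[n_a:])
    u_a = w_a - n_a * (n_a + 1) / 2
    u_b = w_b - n_b * (n_b + 1) / 2
    return min(u_a, u_b), w_a, w_b


# Hypothetical rainfall values, for illustration only.
unseeded = [1.0, 4.9, 4.9, 11.5, 17.3]
seeded = [4.1, 7.7, 17.5, 31.4, 32.7]
u, w_unseeded, w_seeded = mann_whitney_u(unseeded, seeded)
print(u, w_unseeded, w_seeded)

# Consistency check on the reported table, assuming 26 cases in the group
# whose rank sum is reported as Wilcoxon W.
assumed_n = 26
reported_u = 554.0 - assumed_n * (assumed_n + 1) / 2
print(reported_u)
```

The last two lines show that the reported Wilcoxon W of 554 and Mann-Whitney U of 203 agree with each other under that assumed group size.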
9. two-way design with one repeated measure, in the terminology of Portney & Watkins, with two treatment levels applied to different blocks of subjects and many measurements in time for each subject. Go to Analyze > General Linear Model > Repeated Measures. The first dialog requires you to define factors. Here we need to name two new objects. The first is the Within-Subject Factor Name, which you can name by what is actually being assessed at each measure. In this case, it is the number of termites surviving. There are 13 measures in our data set (they skipped days 3 and 9). Second, you need to type in the Measure Name. This should just be the time units for the repeated measures, which in this case is day. Type in each, making sure to click Add, then choose Define. [Screenshot: Repeated Measures Define Factor(s) dialog, with Within-Subject Factor Name = survival, Number of Levels = 13, and Measure Name = day] The next dialog shows all 13 levels of the survival factor, named as day. We want to match these up with the 13 columns of measurements we have. Select day1 to day15 and click the arrow to move them into the Within-Subjects box. Then move dose into the Between-Subjects Factors box. Variables listed in the dialog: dish, dose, day1, day2, day4, day5, day6, day7, day8, day10, day11
10. In the Properties window which comes up, choose Linear for the fit method. This will draw a straight line through the data. The line will show visually what the regression analysis calculated. [Screenshot: SPSS 13 Chart Editor with the Properties window open on the Fit Line tab; Fit Method options: Mean of Y, Linear, Quadratic, Cubic, Loess (% of points to fit: 50; Kernel: Epanechnikov); Confidence Intervals: None, Mean, Individual (%: 95); x axis: Profile Area] In the output for this example, in the Coefficients box, the intercept was calculated to be about 3 and the slope is 0.23. Looking at the line, this looks right: mentally extend the line out to the left until height is zero to imagine where the intercept is. In the Lines tab of the Properties window for this line, you can change the color to red, make it dashed in the Style drop-down window, and increase the thickness to make it stand out more. Also, since you will likely shrink this chart down to display it, you may want to increase the font size for the axis labels and the R² box. An appropriate caption for thi
11. because it shows you the distribution of the frequency of occurrence for the values. In addition, we want to know the summary statistics, like means and standard deviations. These are the essential steps in single-sample estimation. Frequency distributions. Histograms (bar plots of the data grouped by frequency of observation) are excellent first summaries of the data. These figures show us immediately what the most frequent values are, and also which values are least frequent; this is what is meant by a frequency distribution. In addition, you can get a sense of where the center of the data is (the mean) and how much variance there is around that center. In statistical terms, these are called measures of central tendency and dispersion, or "location and spread." Also, we can easily see if there are any truly bizarre numbers, as sometimes happens when a measurement error is made; outliers can then be examined closely to see if they are real values or just mistakes. You can produce histograms for any continuous variable. A continuous variable is a value like body length or number of individuals, which might vary continuously from zero to very large, or go negative, or be fractional. A variable like sex is considered categorical, since you would use only one of two categories, female or male, to describe a given case. Note: What SPSS calls Scale variables can be either ratio or interval variables, in the terminology of Portney & Watkins 200
12. go to the SPSS Viewer and select File > Export. The window below will pop up and ask you to choose where to save it (Browse). Make sure to remember where you save it; the default location is unfortunately buried in the hard drive, so choose a readily accessible location, like the desktop or the Documents folder. You also need to tell SPSS what type of file to save it as. You will usually want to select All Visible Objects and export it as a Word/RTF (.doc) file. This is the easiest way to save all your work in a useful format. RTF is Rich Text Format, which can be read in nearly any text application on any platform. [Screenshot: Export Output dialog, with Export: Output Document; Export What: All Objects, All Visible Objects, Selected Objects; File Type: HTML file (*.htm), Text file (*.txt), Excel file (*.xls), Word/RTF file (*.doc), PowerPoint file (*.ppt)] You can also get to this dialog by clicking the Export button on the menu bar. Describing data. The first task when beginning any analysis is to simply look at the data. One of the best ways to do this is to create a histogram, which is a graph that shows the measured value on the x axis and how many observations of each value on the y axis. This is also known as a frequency distribution
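The histogram described in the excerpt above is simply a count of observations per bin. The following is a minimal sketch of that idea outside SPSS; the interval values are hypothetical and the function name `frequency_distribution` is illustrative only.

```python
from collections import Counter


def frequency_distribution(values, bin_width):
    """Count observations per bin, where each bin covers [start, start + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical eruption intervals in minutes, for illustration only.
intervals = [52, 58, 61, 74, 77, 78, 81, 83, 85, 91]
histogram = frequency_distribution(intervals, 10)

# Print a text histogram: measured value on one axis, count on the other.
for start, count in histogram.items():
    print(f"{start:>3}-{start + 9:<3} {'*' * count}")
```

Each row of stars plays the role of one bar: the longest rows mark the most frequent values, and an isolated row far from the rest would flag a possible outlier.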
13. you must adjust the analysis to acknowledge that you are doing multiple comparisons. This is done by adjusting the α of the analysis, using what is known as a Bonferroni correction. This is a simple procedure. If, for example, you have three within-subject measures, you need to divide your α by 3. So if you normally use α = 0.05, now it becomes 0.05 / 3 = 0.017 = α_FW, the adjusted level that keeps the family-wise error rate at 0.05. In SPSS, this adjustment is made in the paired t-test options dialog. First, using the example data from above, choose Analyze > Compare Means > Paired Samples T Test. Select each pair of comparisons to make, and place them in the Paired Variables box. [Screenshot: Paired Samples T Test dialog, with the Paired Variables Pronation - Neutral, Pronation - Supination, and Neutral - Supination] Here is the trick: you must manually change the error rate, which here is presented as the confidence percent. To change this correctly, enter the value 100 × (1 - α_FW); here, this is 100 × (1 - 0.017) = 98.3. [Screenshot: Paired Samples T Test: Options dialog, with the Confidence Interval percentage field and the Missing Values options Exclude cases analysis by analysis and Exclude cases listwise] The results of the paired samples t-test show that pronation differs highly
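The Bonferroni arithmetic in the excerpt above is easy to get wrong when converting between a proportion and a percentage. This short sketch works it through, assuming the confidence level is expressed as 100 × (1 - α) percent; the function name `bonferroni` is hypothetical.

```python
def bonferroni(alpha, n_comparisons):
    """Return the per-comparison alpha and the matching confidence level in percent."""
    per_comparison_alpha = alpha / n_comparisons
    confidence_percent = 100 * (1 - per_comparison_alpha)
    return per_comparison_alpha, confidence_percent


# Three pairwise comparisons among three within-subject measures.
adjusted_alpha, confidence = bonferroni(0.05, 3)
print(f"adjusted alpha = {adjusted_alpha:.3f}, confidence = {confidence:.1f}%")
```

With three comparisons the per-comparison α is about 0.017, so a p-value must fall below that, not below 0.05, to count as significant.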
14. [ANOVA table, continued] ... 18.668; RegionCode: Sum of Squares 9.241, df 3, Mean Square 3.080, F 13.094; TobaccoUseCode * RegionCode: Mean Square .464, F 1.973; Error: Sum of Squares 19.760, Mean Square .235; Total: 327.351; Corrected Total: 40.568. a. R Squared = .513 (Adjusted R Squared = .449). We also chose to make a profile plot of these results. This shows that the mean percentage of low weight births from mothers of Unknown smoking status in the West is quite low, much lower than the other regions. While there was no significant interaction between the predictor terms, this crossing of lines is exactly what an interaction test aims to reveal. [Figure: profile plot of Estimated Marginal Means of Percentage Low Weight Births by Region Code (Northeast, Midwest, South, West), with separate lines for each Tobacco Use Code, including Unknown] Note: Some investigators in biology will report p-values between 0.1 and 0.05 as "marginally significant." This implies that they view the chance of detecting a result as extreme as the one observed only 1 in 10 times by chance alone as important. The use of "marginally significant" varies between journals, and you might be better off simply reporting the p-values without such commentary. We can refine our interpretation of these results with better graphs. First, make a graph to see the effects of both main factors, using a clustered bar chart. Go to Graphs > Bar and choose Clustered in the dialog box. Bar Charts Simple C
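The R Squared footnote in the table above can be reproduced from the two sums of squares that survive in the excerpt. This is a minimal sketch; the degrees of freedom (84 for error, 95 for the corrected total) are assumptions inferred from the surviving mean square and the reported adjusted value, not figures stated in the text.

```python
ss_error = 19.760            # Error sum of squares from the table
ss_corrected_total = 40.568  # Corrected Total sum of squares from the table

# Share of the variance explained by the whole model.
r_squared = 1 - ss_error / ss_corrected_total

# Assumed degrees of freedom (inferred, not given in the excerpt).
df_error = 84
df_corrected_total = 95

# The adjusted value penalizes the model for the number of terms it uses.
adjusted_r_squared = 1 - (ss_error / df_error) / (ss_corrected_total / df_corrected_total)

print(f"R squared = {r_squared:.3f}, adjusted R squared = {adjusted_r_squared:.3f}")
```

Both rounded values agree with the table footnote, which is some reassurance that the assumed degrees of freedom are the right ones.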
15. [Screenshot: SPSS Data Editor File menu, with entries including Open, Open Database, Read Text Data, Save, Mark File Read Only, Display Data File Information, Cache Data, Stop Processor, Page Setup, Print, Print Preview, and Recently Used Data] First, select the desired location on disk using the Look in option. Next, select Excel from the Files of type drop-down menu. If you don't do this, it will only look for files with the .sav extension, which is the SPSS format. The file you saved should now appear in the main box in the Open File dialog box. [Screenshot: Open File dialog with Enable set to Excel (*.xls), listing example files in Biol_3386_Data such as Birds.xls, Bodyfat.xls, CoffeeProduction.xls, Dowjones 1900-1993.xls, IQ Brain Size.xls, Kidney.xls, OldFaithful.xls, and Seaslug.xls] You will see one more dialog box
16. a P-value. The last column of the ANOVA table is the significance value; if it is below 0.05, then we say we have rejected the null hypothesis of no difference between the group means, and that there is a significant difference. [Table: ANOVA for Percentage Low Weight Births, with rows Between Groups, Within Groups, and Total, and columns including Sum of Squares and Mean Square] Surprisingly, there is a highly significant effect of region on the rate of low-birth-weight births, accounting for nearly 25% of the variance in these births. Look at the sums of squares to understand the variance: the between-groups variance is compared to the within-groups variance to calculate the significance of the test, hence "analysis of variance." To look a bit deeper, create a plot of the values using Graphs > Bar > Simple, with Region Code as the Category Axis and the mean of the percentage of low birth weights as the bar height. [Figure: bar chart of Mean Percentage Low Weight Births by Region Code (Northeast, Midwest, South, West); error bars: +/- 1.00 SE] It seems that the rate of low weight births in the West is much lower than the other regions; now we will test this hunch specifically. In order to do this, we go through the steps to re-do the ANOVA. Only this time, in the One-way ANOVA window, select the Post Hoc button. Post hoc means "after this" in Latin, and refers to tests we do after the fa
17. differed significantly between these months; there was much more leaf gas exchange happening in July than September: atmospheric carbon is fixed into carbohydrates almost twice as fast in July as in September, according to these data. [Figure: bar chart of Mean Photosyn by Month; error bars: +/- 1.00 SE] We can modify this figure by double-clicking it, bringing up the Chart Editor. First, click the axis title and edit it to add the units. [Figure: Chart Editor with the y-axis title edited to "Photosynthesis rate (micromoles CO2/m2/sec)"; x axis: Month (july, sept); error bars: +/- 1.00 SE] Next, click one of the two bars to select both bars, and then double-click to bring up the Properties window. In the Fill & Border tab, we can change the fill color for the bars, and in the Bar Options tab, we can slide the Width slider down to 50%, making the bars thinner. [Screenshot: Properties window with tabs Chart Size, Fill & Border, Categories, and Bar Options, showing the fill, border, and pattern settings]
18. good; that's why we also look at the R². Here, the p-value for Profile Area is 0.008, which is significant. Furthermore, the slope is 0.23, indicating that for every unit increase in vegetation density, bird population density increases by 0.23. The middle table, labeled ANOVA, presents another view of how good this model is at explaining the data. If we had tried more than one model, the ANOVA procedure would let us pick out the best model. Also note here that the ratio of the sums of squares of the model to the total sums of squares is the calculation for R²: 102.17 / 454.88 = 0.225. Once you do the regression, you will also want to make a graph to see what the relationship looks like, and to make sure that the assumptions of normal distributions hold up. See Scatter plots, below, for how to add the regression line. [Figure: scatter plot of No. Species against Profile Area with a fit line; R Sq Linear = 0.145] Note: There is such a thing as nonparametric regression, which is available in SPSS through the Curve Estimation tool. This is appropriate when you are specifically testing a particular nonlinear relationship, or know that you cannot assume that the variables have normally distributed error terms. Comparing Multiple Groups: Parametric. One-Way Analysis of Variance (ANOVA). Additional Topics: Post-hoc tests (Multiple comparison test). The third major t
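The R² calculation quoted in the excerpt above can be repeated directly from the two sums of squares. This sketch only re-does that arithmetic; the variable names are illustrative, and the residual sum of squares is derived here rather than read from the output.

```python
ss_model = 102.17   # sum of squares explained by the regression
ss_total = 454.88   # total sum of squares

r_squared = ss_model / ss_total
ss_residual = ss_total - ss_model   # variation the line does not explain

print(f"R squared = {r_squared:.3f}")
print(f"residual sum of squares = {ss_residual:.2f}")
```

Reading the result as a percentage gives the share of variance explained: about 22.5% here, leaving the remainder in the residual term.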
19. homogeneity of variance before proceeding. [Screenshot: Variable View for the natality data, listing the numeric variables Tobacco Use Code, Year, Region Code, Total Births, Low Weight Births, and Percentage Low Weight Births, with the Value Labels dialog open for Region Code: 1 = Northeast, 2 = Midwest, 3 = South, 4 = West] Select the variable you want to look at and put it in the Dependent List. You can choose more than one; for example, if we also measured leaf thickness for these plants, we could place that in the Dependent List as well. The Factor is your independent variable, which will define the groups to compare. [Screenshot: One-Way ANOVA dialog, with Percentage Low Weight Births in the Dependent List and Region Code as the Factor; buttons: Contrasts, Post Hoc, Options] Note: There is only one variable allowed in the Factor box; this is what is meant by a "one-way" ANOVA, since we are looking at how a single categorical variable explains the variance in a continuous variable. The SPSS output for ANOVA is fairly concise. Again, there is really one value which answers our question, and again it is
20. is assumed. 2. Since we know that the data have failed the sphericity test, look at the results after the Greenhouse-Geisser correction has been applied. 3. If these agree, we are done. If they disagree, and the G-G results show no effect but the sphericity-assumed results do show an effect, look at the Huynh-Feldt corrected results. This will be our final answer.
   Tests of Within-Subjects Effects (Measure: day). Values per row: Type III Sum of Squares, df, Mean Square, F, Sig., Partial Eta Squared.
   survival, Sphericity Assumed: 7130.356, 12, 594.196, 112.924, .000, .890
   survival, Greenhouse-Geisser: 7130.356, 2.059, 3463.254, 112.924, .000, .890
   survival, Huynh-Feldt: 7130.356, 2.591, 2751.769, 112.924, .000, .890
   survival, Lower-bound: 7130.356, 1.000, 7130.356, 112.924, .000, .890
   survival * dose, Sphericity Assumed: 561.952, 12, 46.829, 8.900, .000, .389
   survival * dose, Greenhouse-Geisser: 561.952, 2.059, 272.943, 8.900, .001, .389
   survival * dose, Huynh-Feldt: 561.952, 2.591, 216.870, 8.900, .000, .389
   survival * dose, Lower-bound: 561.952, 1.000, 561.952, 8.900, .010, .389
   Error(survival), Sphericity Assumed: 884.000, 168, 5.262
   Error(survival), Greenhouse-Geisser: 884.000, [df illegible], 30.669
   Error(survival), Huynh-Feldt: 884.000, [df illegible], 24.368
   Error(survival), Lower-bound: 884.000, [df illegible], 63.143
   Notice that, within each of the two effects, the F ratios are the same for all of the tests. Even though the sphericity assumption has not been supported, the corrections applied do not change the final story. In particular, both survival (the day
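The table above can be checked with a few lines of arithmetic, which also shows why the F ratios do not change under the corrections. This is a sketch under one stated assumption: that the Greenhouse-Geisser epsilon equals the corrected df divided by the uncorrected df (2.059 / 12), since the epsilon itself is not legible in the excerpt.

```python
ss_survival = 7130.356     # Type III sum of squares, survival
ss_interaction = 561.952   # Type III sum of squares, survival * dose
ss_error = 884.000         # Type III sum of squares, Error(survival)
df_effect = 12             # uncorrected df for both within-subject effects
df_error = 168             # uncorrected df for the error term

# Mean square = sum of squares / df; F = effect mean square / error mean square.
ms_survival = ss_survival / df_effect
ms_error = ss_error / df_error
f_survival = ms_survival / ms_error
f_interaction = (ss_interaction / df_effect) / ms_error

# Assumed epsilon: corrected df over uncorrected df.
epsilon_gg = 2.059 / df_effect
gg_df_error = epsilon_gg * df_error
gg_ms_error = ss_error / gg_df_error
gg_f_survival = (ss_survival / 2.059) / gg_ms_error

print(f"F(survival) = {f_survival:.3f}, F(survival * dose) = {f_interaction:.3f}")
print(f"G-G error df = {gg_df_error:.2f}, G-G error mean square = {gg_ms_error:.3f}")
print(f"G-G F(survival) = {gg_f_survival:.3f}")
```

Because the correction multiplies the effect df and the error df by the same epsilon, it cancels in the F ratio; only the degrees of freedom used to look up the significance change, which is why the Sig. column can differ between rows while F does not.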
21. lines show how much of the variation in strength measurements within subjects is not due to the treatments (Error(forearm)) and how much is just due to differences between subjects (the Error term in the Between-Subjects table). [Table: Tests of Within-Subjects Effects, Measure: strength, with rows forearm and Error(forearm) under Sphericity Assumed, Greenhouse-Geisser, Huynh-Feldt, and Lower-bound, and columns Type III Sum of Squares, df, Mean Square, and Sig.; legible entries include an Error(forearm) sum of squares of 117.111, mean squares of 368.444, 492.065, 417.463, 7.319, 8.293, and 14.639, and Sig. values of .000] [Table: Tests of Between-Subjects Effects, Measure: strength, Transformed Variable: Average, with columns Type III Sum of Squares, df, and Mean Square; legible entry: 50.338] Post-hoc tests for repeated measures ANOVA. After discovering that a within-subject factor makes a significant difference in explaining the variation in the data, you likely want to know where that difference is exactly. One of the measures may account for all of the variation, perhaps. This requires a post-hoc or multiple comparison test. In SPSS, it is possible to analyze the differences between measurements with a paired t-test. Since you want to compare multiple groups using the same data
22. section. Many additional details are listed in the Graphing and Finer Points sections. Details about all of the real data sets used to illustrate the capacities of SPSS are in the Data Appendix. Basics. This section describes the essentials of how to start using SPSS to manage and explore your data effectively. If you have previously used a spreadsheet program like Microsoft Excel, many features of SPSS will be familiar. However, even if you have never used any quantitative program before, the essential features of SPSS are easy to learn with a little patience. Starting SPSS. Go to the Applications folder and select SPSS from the list of programs (or Start > Programs > SPSS on a PC). A window will appear, asking you what to do. There are several options, but you will often want to import data from Excel. In that case, you would go to Open another type of file, select More files, and navigate to the Excel file you want to use. To just open it up for the first time, click Type in data and select OK. [Screenshot: SPSS 13.0 for Mac OS X startup dialog, "What would you like to do?", with options Run the tutorial, Type in data, Run an existing query, Create new query using Database Wizard, Open an existing data source, and Open another type of file, plus a "Don't show this dialog in the future" checkbox] Navigating. SPSS uses several w
23. than 0.10, then you can assume that the variances are equal. [Table: Independent Samples Test for Photosyn, with Levene's Test for Equality of Variances and the t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference) for the rows Equal variances assumed and Equal variances not assumed; legible entries include a Levene F of 93.772 and, for equal variances not assumed, t = 5.639 with df = 228.852] Since in this case the variances are clearly not equal (p < 0.001), we want to use the version of the t-test which does not assume equal variances. In this case, there is a highly significant difference between leaf gas exchange in July and September (p < 0.001). This table can be modified (see Working with Tables for details) to make it easier to read. Making graphs of the two groups helps to convey these results quickly to your reader, as well as helping you interpret the results; see Bar charts and Box plots in the Graphing section. Paired T-tests. When the investigator takes two or more measurements under different conditions with the same subjects, and then wishes to perform a t-test to understand the effects of different conditions, the correct test to use is a paired t-test. In this classic example, rates of tooth decay were measured in 16 cities before and after the fluoridation of the municipal water supply. The alternative hypothesis being tested here is that fluoridati
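The choice between the two rows of the independent-samples output, described at the start of the excerpt above, can be sketched in code. The following is an illustration only, not SPSS's implementation: the photosynthesis values are hypothetical, the function names are made up for this example, and the unequal-variances statistic is computed with the standard Welch formula and Welch-Satterthwaite degrees of freedom.

```python
from statistics import mean, variance


def t_test_row(levene_sig, threshold=0.10):
    """Pick the output row to read, using the guide's 0.10 rule for Levene's test."""
    if levene_sig > threshold:
        return "Equal variances assumed"
    return "Equal variances not assumed"


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test that does not assume equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a   # squared standard error of mean a
    se2_b = variance(sample_b) / n_b   # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical photosynthesis rates, for illustration only.
july = [9.1, 10.4, 8.7, 11.2, 9.8, 10.9]
sept = [5.0, 5.3, 4.8, 5.1, 5.2]

print(t_test_row(0.0005))
t, df = welch_t(july, sept)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The fractional degrees of freedom are the visible signature of the corrected test, which is why the "not assumed" row in SPSS shows a df such as 228.852 rather than a whole number.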
24. this means that there are too many digits for SPSS to display. You can double-click the chart to make it editable, and then drag columns wider to make the values visible. Analysis of Variance with Multiple Factors. Additional Topics: Fixed vs. Random effects, clustered bar graphs. The example above describes how to conduct an ANOVA in SPSS that looks at the influence of only one factor (one-way ANOVA). But what about when you are interested in the effects of two or more factors on the response variable, as in this example? A two-factor analysis of variance does not merely run two one-way ANOVAs, but can test how the two factors interact: how does the change in one of the predictors determine the change in the dependent variable, given the change in the other predictor? We will return to the birth weight data set, and now ask three questions of the data: 1. Between 1995 and 2002 in the US, did the number of children born at low birth weights differ between regions? 2. Did the cigarette smoking status of the mother significantly affect the proportion of low birth weight births? 3. Is there any interaction between these two predictor variables? First, re-open Natality.sav, with the region codes and tobacco use codes. In SPSS, two-way ANOVAs are considered just one version of what is known as a General Linear Model (see Portney & Watkins p. 450). Selec
25. xls (or, if you saved it from the correlation example, Birds.sav) and go to Analyze > Regression > Linear. Choose which variable will be your predictor (Independent) and which will be the predicted (Dependent). Note: you should only use continuous variables for this analysis. To be able to identify individual points easily, place SITE in the Case Labels box. [Screenshot: Linear Regression dialog, with Profile Area as the Independent variable, Method: Enter, and SITE in Case Labels; buttons: WLS Weight, Statistics, Plots, Save, Options] SPSS produces more output than necessary to report when writing your results, but it is all useful. There are two values that you want to look at and make sure to put in your lab report. The first is the R², written "R Square" in the output. This is the coefficient of determination (the square of the correlation coefficient), otherwise known as the goodness of fit for your statistical model. Unlike for P-values, there is no critical value for R²; you just have to report it and let the reader decide. Here, the R² is 0.225, meaning 22.5% of the variance in bird population density is explained by the
26. 0. These are Continuous variables, since data values represent quantitative differences. Categorical variables simply place the data in different categories; these should be coded as Nominal in SPSS. Other ways of viewing frequency distributions include frequency polygons and stem-and-leaf plots. Frequency polygons are essentially line plot representations of histograms, while stem-and-leaf plots are numerical representations showing which integer values fall within the larger bins of numbers. In SPSS: Begin by opening the file OldFaithful.xls in SPSS. These data show the date and time of every eruption of the geyser Old Faithful in Yellowstone National Park for one month. For each eruption, several variables were recorded, including the interval since the last eruption and the duration of the eruption (see the Data Appendix for more information). View your data in the Data Editor, in the Data View. Note that the duration values have many decimal places; we can clean this up. Change the view to the Variable View and reduce the number of decimals shown for the Duration variable. [Screenshot: OldFaithful.sav in the SPSS Data Editor, Variable View]
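To make the idea of a stem-and-leaf plot concrete, here is a minimal sketch in Python. It assumes whole-number data with the tens digit as the stem; the interval values are hypothetical and are not taken from OldFaithful.xls.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integer values by stem (tens) and collect their leaves (units)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical eruption intervals in minutes (illustration only).
intervals = [52, 58, 61, 67, 67, 74, 79, 83]
plot = stem_and_leaf(intervals)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one bin of the distribution, so the row lengths give the same shape a histogram would.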
27. 1. Summary statistics for heights of female and male residents of Morningside Heights. Columns: N, Range, Minimum, Maximum, Mean, Standard Deviation. First row: 11, 31.42, 154.59, 186.0, 167.1185, 7.7612. Second row: 15, 25.93, 152.90, 178.83, 163.1684, 7.4585. T-tests: A t-test compares two groups and looks for significant differences in the mean of some variable. In the example, the leaf gas exchange rates of trees were measured in July and September. Compare the table below with what SPSS produces: you need substantially less information to get your point across. This table shows the absolute minimum: the t statistic, degrees of freedom, and P value. Each table should be accompanied by a legend which completely describes the results. Table 2. Summary of t-test results for leaf gas exchange rates for three Northeastern tree species. Columns: t, df, P. Photosynthetic rate (µmol m⁻² sec⁻¹): 5.639, 229, <0.001. Working with Tables: In the leaf gas exchange example, the t-test output first shows the results of the test for homogeneity of variances. When reporting our results, we only want to show one version of this table (for equal variances not assumed), since the Levene test shows that the variances are not equal. Double-click the Independent Samples Test table. Now you can edit it freely. In the menu bar, go to Pivot > Pivoting Trays. Each of the arrow icons represents one aspect of the results table. Hover your mouse over the second one on the bottom and you will see the t
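The t statistic and degrees of freedom in a table like Table 2 can be sketched by hand. The example below assumes the "equal variances not assumed" form, computed here as Welch's t with the Welch-Satterthwaite degrees of freedom. The function name and the two samples are hypothetical, not the guide's leaf gas exchange data.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se_sq_a = statistics.variance(sample_a) / n_a
    se_sq_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical measurements for two groups (illustration only).
july = [1, 2, 3, 4, 5]
september = [2, 4, 6, 8, 10]
t, df = welch_t(july, september)
print(f"t = {t:.3f}, df = {df:.3f}")
```

The fractional df explains why SPSS can report a value such as 228.852, which is then rounded when the result is written up.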
28. 1 for the first value and Male for the first value label; do the same for females. [Screenshot: Value Labels dialog with 1 = Male and 2 = Female] Similarly, the numerical values in the DiseaseType variable represent different diseases; enter the value labels accordingly. [Screenshot: Value Labels dialog with 0 = Glomerulonephritis, 1 = Acute nephritis, 2 = Polycystic kidney disease, 3 = Other] Save this file in an appropriate location as Kidney.sav; these codes will be saved for future use. Now we can test the degree of association between these two categorical variables: is the frequency of these kidney diseases significantly associated with sex? We will use the Crosstabulation method in SPSS for this example. Go to Analyze > Descriptive Statistics > Crosstabs. In the Crosstabs window, select DiseaseType as the row variable and Sex as the column. [Screenshot: Crosstabs dialog with DiseaseType in Row(s) and Sex in Column(s)]
29. 25 point. Leave all boxes checked in the next window and choose Continue. [Screenshot: Save Chart Template dialog listing the settings that can be saved, including Layout, Styles, Axes, and Data value labels] Name your template something sensible and save it in a place where you will find it easily. [Screenshot: Save Template File dialog, saving the scatter template with File Format set to Templates (.sgt)] The template is a file with the extension .sgt, which you will be able to use only for SPSS chart objects. It might be helpful to have a folder for template
30. Nominal. Return to the Data View and select Analyze > Descriptive Statistics > Frequencies, as in the image below. [Screenshot: SPSS 13 Analyze menu with Descriptive Statistics > Frequencies selected] Next, select the measurement that you want to analyze. Note that different variable types will have different icons identifying them. In the example below, the variable Interval has already been double-clicked. [Screenshot: Frequencies dialog with Interval in the Variable(s) box] To produce the histogram, click Charts and then select His
31. Repeated measures ANOVA; Scatter plots; Adding a regression line; Working with cases; Model output; ANOVA; Examples from Portney & Watkins; Repeated Measures ANOVA; Post hoc tests for repeated measures ANOVA; References. Introduction: Why SPSS? After the experiment is run and the data are collected, you, the biologist, face the task of converting numbers into assertions; you must find a way to choose among your hypotheses the one closest to the truth. Statistical tests are the prefe
32. [Screenshot: Opening Excel Data Source dialog for IQ Brain Size.xls, with "Read variable names from the first row of data" checked] This dialog box allows you to select a worksheet from within the Excel Workbook. You can only select one sheet from this menu; if you want both sheets, you need to import the second sheet into another Data Editor window. This box also gives you the option of reading variable names from the Excel Workbook directly into SPSS. Click on the Read variable names box to read in the first row of your spreadsheet as the variable names. It is good practice to put your variable names in the first row of your spreadsheet. SPSS might change them slightly to put them in a format it likes, but they will be basically what you entered in your Excel file. You should now see data in the Data Editor window. Check to make sure that all variables and cases were read correctly; the Data Editor should look exactly like your Excel file. Manually entering data: If you only have a few data points, or simply like typing lots of numbers, you can manually enter data into the Data Editor window. Open a blank Data Editor as explained above and enter the data in columns as necessary. To name your variables, which are always in co
33. Student Guide to SPSS. Barnard College, Department of Biological Sciences. Dan Flynn. Table of Contents: Introduction; Getting your data in; Opening an Excel file; Manually entering data; Opening an existing SPSS file; Saving your work; Cutting and pasting; Exporting; Describing data; Frequency distributions; Parametric vs. Non-parametric statistics; Normality; Homogeneity of Variance; In SPSS; Data Analysis; Analyzing Frequencies: Chi-square; Comparing two groups
34. a MANOVA by default for a repeated measures ANOVA, with the results in the Multivariate Tests table. There is rarely any major difference between them in terms of significance values, but if it is necessary to choose the appropriate test, consult a specialized text on multivariate statistics (e.g. Manly 2005). SPSS performs two tests related to sphericity: Box's Test for Equality of Covariance Matrices and Mauchly's Test of Sphericity. Portney & Watkins provide a succinct description of Mauchly's test (p. 447). If the result of the Mauchly test is significant (p < 0.05), there is a significant violation of the assumption of sphericity. Therefore, we should correct the degrees of freedom when performing the ANOVA. SPSS does this automatically and notes it in a footnote beneath the Mauchly test table. The correction is called epsilon. SPSS reports all possible significance values using the different epsilon corrections. Here are the meanings of each of these:
• Sphericity Assumed: Original degrees of freedom, assuming that the covariance matrix shows equal covariance between the independent factors.
• Greenhouse-Geisser: Degrees of freedom adjusted conservatively. If the uncorrected effect is significant but the Greenhouse-Geisser corrected effect is not, check the next line (Huynh-Feldt).
• Huynh-Feldt: Degrees of freedom adjusted. If the G-G corrected effect is not significant
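The epsilon correction described above amounts to multiplying both degrees of freedom of the F test by epsilon. The sketch below illustrates this under two assumptions that are not stated in the guide: the design has a single within-subject factor with no between-subject factor, and the lower-bound epsilon is 1/(k - 1) for k repeated measurements. The function names are hypothetical.

```python
def lower_bound_epsilon(levels):
    """Most conservative epsilon for a within-subject factor with `levels` levels."""
    return 1.0 / (levels - 1)

def corrected_df(levels, subjects, epsilon):
    """Scale the effect and error degrees of freedom by epsilon."""
    df_effect = levels - 1
    df_error = (levels - 1) * (subjects - 1)
    return epsilon * df_effect, epsilon * df_error

# Example: 3 repeated measurements on 9 subjects.
print(corrected_df(3, 9, 1.0))                     # sphericity assumed
print(corrected_df(3, 9, lower_bound_epsilon(3)))  # lower-bound correction
```

The Greenhouse-Geisser and Huynh-Feldt epsilons fall between these two extremes, so their corrected degrees of freedom do too.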
35. a seriously violate these requirements, then it is safer to use non-parametric statistics, which are tests that make fewer assumptions about the data. Such tests also have reduced ability to detect significant differences, so they should be used only when necessary. Throughout this guide, we will present the non-parametric alternatives to the standard parametric tests. Normality: The concept of normality is central to statistics. For data to be normal, they must have the form of a bell curve, or Gaussian distribution, with values dropping off in a particular fashion as they increase or decrease from the mean. Specifically, a normal distribution contains 68.26% of the data within ±1 standard deviation of the mean. Homogeneity of Variance: For parametric statistics to work optimally, the variance of the data must be the same throughout the data set. This is known as homogeneity of variance, and the opposite condition is known as heteroscedasticity. In SPSS: Both normality and homogeneity of variance can be assessed through the Explore tool in SPSS (Analyze > Descriptive Statistics > Explore). Select the Interval variable as the dependent and Accurate as the factor. See the Data Appendix for a full description of these data. In the Plots options window, select Histogram, Normality plots with tests, and Untransformed. These are explained below. [Screenshot: Explore and Explore: Plots dialogs for OldFaithful.sav]
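The share of a normal distribution that lies within a given number of standard deviations can be checked directly with the error function. This is a small sketch using only Python's standard library; the helper name is hypothetical.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```

For one standard deviation this gives roughly 68.27%, in line with the figure of about 68% quoted above.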
36. [Screenshot: SPSS Viewer showing a Descriptive Statistics table for Elevation and No. Species, with Frequencies and Descriptives listed in the outline pane] The left frame of the SPSS Viewer lists the objects contained in the window. In the window above, two kinds of descriptive statistics summaries were done, and these are labeled Frequencies and Descriptives. Everything under each header, for example Descriptives, refers to objects associated with it. The Title object refers to the bold title Descriptives in the output, while the highlighted icon labeled Descriptive Statistics refers to the table containing descriptive statistics like the range, mean, standard deviation, and other useful values. The Notes icon would take you to any notes that appeared between the title and the table; this is where warnings would appear if SPSS felt that something had gone wrong in the analysis. This outline is most useful for navigating around when you have large amounts of output, as can easily happen when you try new tricks with SPSS. By clicking on an icon, you can move to the location of the output represented by that icon in the SPSS Viewer; a red arrow appears on both sides of the frame to tell you exactly what you are looking at. Getting your data in: Opening an Excel file. Importing data into SPSS from Microsoft Excel and other applications is relatively pa
37. actor to group the subjects, only one within-subjects factor, the elbow flexor strength. To see how this example looks in SPSS, load the data ElbowFlexor from the workbook Portney_Watkins.xls. Then go to Analyze > General Linear Model > Repeated Measures. Recall that the key step in running a repeated measures ANOVA in SPSS is correctly defining the within-subject factor. Here, name the factor forearm and set it to three levels. Click Add, and then name the measure strength. Click Add and then Define. [Screenshot: Repeated Measures Define Factor(s) dialog with Within-Subject Factor Name forearm, Number of Levels 3, and Measure Name strength] In the Repeated Measures dialog, select all three measurement variables (Pronation, Neutral, and Supination) and click the right-pointing arrow to move them into the Within-Subjects box. There is no between-subjects variable in this example. [Screenshot: Repeated Measures dialog with the three variables assigned to the forearm factor] Choose Plots and add a forearm plot. [Screenshot: Repeated Measures: Profile Plots dialog with forearm on the Horizontal Axis] Run the analysis and examine the results. The first major difference between the SPSS outpu
38. and neither is the Huynh-Feldt, then you cannot reject the null hypothesis.
• Lower-bound: Degrees of freedom adjusted very conservatively. Only use this in cases when it would be extremely risky to make a Type I error (incorrectly reject the null hypothesis). However, because it is so conservative, it is likely to lead to Type II errors (incorrectly fail to reject the null hypothesis).
Returning to the model results, we first see the multivariate analysis of variance tests. These test the effect of the within-subject factor, survival, as if each measurement were a different variable; that is what makes this a multivariate test. The different flavors of MANOVA are all identical here, showing a significant effect of the day measured. This is not interesting or surprising, since we expect that termites will start dying off in the petri dishes quite naturally. However, the next set of values, survival x dose, shows no effect. This indicates that the survival of termites did not differ depending on the concentration of tree bark extract, which suggests that the tree bark extract would not be useful as an anti-termite treatment. But this result should be treated very cautiously, since the multivariate test is less powerful than a repeated measures ANOVA. Multivariate Tests (Effect: survival). Pillai's Trace: Value .979, F 11.535, Hypothesis df 12.00, Error df 3.00, Sig. .034, Partial Eta Squared .979. Wilks' Lambda: Value .02, F 11.535, Hypothesis df 12.00, Error df 3.00
39. [Screenshot: Visual Bander for the Age variable, creating the banded variable AgeBand (Age Banded); Minimum 10, Maximum 69, 76 cases scanned] After clicking OK, a message window will appear letting you know that one new variable (AgeBand) will be created. Now we will assess the independence of age and disease type using a Chi-square test. To quickly return to the Crosstabs menu, click the icon and select Crosstabs. Replace the Sex variable with our new AgeBand variable in columns; all the other options will be the same as we specified before. Click OK and look at the results in the Output Viewer. There is a highly significant association between age and disease type (p < 0.001). Looking at the standardized residuals (highlighted below), this appears to be largely driven by the Other category, which was much more frequent in youn
40. Then click the Statistics button and select Chi-square. Note that many other statistics of association are available, most of which are described in Portney & Watkins (2000). [Screenshot: Crosstabs: Statistics dialog with Chi-square checked] Finally, click the Cells button. In the following window, add the Counts: Expected, Percentages: Row, and Residuals: Standardized options; add the Column and Total percentages to make the resulting table directly comparable with that in Portney & Watkins. [Screenshot: Crosstabs: Cell Display dialog with Observed, Expected, Row, and Standardized checked] The results in the Output Viewer break down the observed and expected frequencies (Count) for each sex and disease type. We can look at the frequency values within sex to visually estimate how much the ob
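The expected counts and standardized residuals requested in the Cells dialog can be sketched by hand for a small table. The example below assumes the usual Pearson definitions: expected = row total x column total / grand total, and standardized residual = (observed - expected) / sqrt(expected). The 2 x 2 counts are hypothetical, not the kidney data.

```python
import math

def chi_square_table(observed):
    """Return expected counts, standardized residuals, chi-square, and df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    # The chi-square statistic is the sum of the squared standardized residuals.
    chi2 = sum(res ** 2 for row in residuals for res in row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, residuals, chi2, df

# Hypothetical counts: rows are disease types, columns are sexes.
counts = [[10, 20], [20, 10]]
expected, residuals, chi2, df = chi_square_table(counts)
print(expected, round(chi2, 3), df)
```

Cells with large positive or negative standardized residuals are the ones that contribute most to a significant result.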
41. change in vegetation density. For an ecological study, this would be considered an important effect. Model Summary: R Square = .225, Adjusted R Square = .197, Std. Error of the Estimate = 3.549 (Predictors: (Constant), Profile Area; Dependent Variable: Total density). ANOVA: Regression, Sum of Squares 102.174, df 1, Mean Square 102.174, F 8.111; Residual, Sum of Squares 352.707, Mean Square 12.597; Total, Sum of Squares 454.881. Coefficients (unstandardized): (Constant), B 3.022, Std. Error 2.883; Profile Area, B .230, Std. Error .081. The other value to look at is, again, a P value: that of the predictor. Here we want to look at the P value for the slope of the regression line. The equation for a straight line is y = a + bx. The independent variable is x, the dependent variable is y, a is the intercept, and b is the slope. Regression analysis figures out what the best values of a and b are, and reports these as coefficients. It then tests whether the coefficient b, the slope, is different from zero. A slope of zero means that the dependent variable does not change systematically as the independent variable changes. However, just because the slope is different from zero doesn't mean that the relationship is necessarily any
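The "best values of a and b" in y = a + bx are the ordinary least squares estimates. The sketch below shows how they can be computed from first principles; the function name and the data points are hypothetical and are not the bird survey values.

```python
import statistics

def fit_line(x, y):
    """Return (a, b), the least squares intercept and slope of y = a + b*x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x (illustration only).
area = [1, 2, 3, 4]
density = [3, 5, 7, 9]
a, b = fit_line(area, density)
print(f"y = {a:.3f} + {b:.3f}x")
```

In the guide's output, the corresponding estimates are the Constant (a = 3.022) and the Profile Area coefficient (b = .230).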
42. ct to see how, knowing that the main effect is significant, each treatment level relates to the others. If the main effect is not significant, post hoc tests are not useful. In the post hoc window, select Tukey, for Tukey's honestly significant difference (HSD), and click Continue and then OK. Notice that there are lots of different tests we could choose from, and they may give you different answers. Portney & Watkins consider LSD and Duncan too liberal, for example. Zar (1999) also promotes the Tukey test for multiple comparisons, both for parametric and nonparametric ANOVAs. Portney & Watkins also discuss the merits of the Scheffé post hoc test. [Screenshot: One-Way ANOVA: Post Hoc Multiple Comparisons dialog with Tukey checked and Significance level .05] The resulting table takes some time to interpret. First, notice that the first column is one category, and then the second column has all the other categories corresponding to it. This means we first start with one treatment No
43. [Screenshot: Chart Editor Properties dialog, Bar Options tab, with the error bar style set to Bars only] After applying these options, your chart should look like this: [Figure: bar chart of photosynthesis rate (micromoles CO2 per m2 per sec) by Month; error bars: +/- 1 SE] A graph like this should have a caption to the effect of "Canopy leaf photosynthesis rates sharply decline over the course of a season (mean ± 1 s.e.)". The parenthetical comment indicates that the bars represent group means and the error bars represent one standard error above and below each mean. Scatter plots. Additional Topics: Saving formats as a template, fitting a regression line. Whenever your hypothesis asks a question about how two characteristics might relate to one another over a range of values, you should use a scatter plot to represent the data. These plots are appropriate for both correlation and regression analyses. To make a scatter plot, go to Graphs > Scatter/Dot. Choose Simple Scatter from the menu that appears, and then click Define. [Screenshot: Scatter/Dot dialog]
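The "mean ± 1 s.e." in a caption like this refers to the group mean and its standard error, which is the sample standard deviation divided by the square root of the sample size. A minimal sketch, with hypothetical measurements and a hypothetical function name:

```python
import math
import statistics

def mean_and_se(values):
    """Return the mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se

# Hypothetical photosynthesis rates for one month (illustration only).
rates = [2, 4, 4, 4, 5, 5, 7, 9]
mean, se = mean_and_se(rates)
print(f"bar height = {mean:.2f}, error bar from {mean - se:.2f} to {mean + se:.2f}")
```

Because the standard error shrinks as the sample grows, error bars drawn this way are narrower than bars showing one standard deviation.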
44. [Screenshot: Visual Bander dialog before any cutpoints are defined; 76 cases scanned] There are several possible ways to divide the data with cutpoints. An easy way to make four categories which contain equal numbers of cases is to choose Equal Percentiles Based on Scanned Cases. Choose 3 cutpoints and select Apply. [Screenshot: Make Cutpoints dialog with Equal Percentiles Based on Scanned Cases selected, Number of Cutpoints 3, Width 25.00%; the dialog notes that N cutpoints produce N+1 intervals] Finally, choose Make Labels back in the Visual Bander. This will create value labels similar to what we did manually for sex and disease type. [Screenshot: Visual Bander with Current Variable Age and Banded Variable AgeBand]
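The banding step can be sketched as follows: three quartile cutpoints split the data into four bands, and each value is assigned to the band whose upper endpoint includes it. The ages are hypothetical, and Python's default percentile method may place cutpoints slightly differently from SPSS, so treat this only as an illustration of the idea.

```python
import bisect
import statistics

# Hypothetical ages between 10 and 69 (illustration only).
ages = [10, 17, 22, 26, 30, 34, 41, 46, 46, 51, 53, 58, 60, 69]

# Three cutpoints divide the data into four roughly equal-sized bands.
cutpoints = statistics.quantiles(ages, n=4)

def band(value, cutpoints):
    """Return the 1-based band number; upper endpoints are included (<=)."""
    return bisect.bisect_left(cutpoints, value) + 1

age_band = [band(age, cutpoints) for age in ages]
print(cutpoints)
print(age_band)
```

A value exactly equal to a cutpoint falls in the lower band, matching the "Upper Endpoints: Included" setting.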
45. dition, you can specify the Values of the data. This is most useful for nominal data (usually string type, meaning words instead of numbers), where you have a few categories and want to label them in helpful ways. The Values column is a way of using a simple code in the actual data, like 0 and 1 or M and F, but showing a descriptive term, like control and treatment or Male and Female. To do this, double-click a variable name in the Data View. Then click the corresponding box in the Values column. Enter the value (like 0 or 1) and what its label is (like control or treatment). [Screenshot: Value Labels dialog with F = Female and M = Male] Go back to the Data View. Select View > Value Labels. Now, instead of your codes, the full names for each value are automatically displayed. Also, when you click on a box in that column, you will be presented with a pull-down menu giving you a choice of the value names you entered. This feature is particularly useful for sharing data with colleagues and making sure only the allowed values are entered. Lastly, you can format how the data are displayed using the Alignment, Decimals, Missing, and other characteristics of the variables. Making your data easier to read will make it easier for others to quickly understand what data you have and what you want to do with them. Working with cases: SPSS al
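Value labels behave like a lookup table from stored codes to display names. The sketch below mimics that idea in Python with a dictionary; the variable and function names are hypothetical, and this is an analogy rather than a description of how SPSS stores labels internally.

```python
# Codes stored in the data, mapped to the labels shown to the reader.
sex_labels = {"F": "Female", "M": "Male"}

raw_codes = ["M", "F", "F", "M"]
displayed = [sex_labels[code] for code in raw_codes]
print(displayed)

def invalid_codes(codes, labels):
    """Return the codes that have no label, i.e. values that are not allowed."""
    return [code for code in codes if code not in labels]

print(invalid_codes(["M", "X", "F"], sex_labels))
```

Checking entries against the label table is one way to catch typing errors before analysis.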
46. [Screenshot: Simple Scatterplot dialog with Total density on the Y Axis, Profile Area on the X Axis, SITE in Label Cases by, and "Use chart specifications from" checked under Template] The chart will have identical formatting to the correlation scatterplots produced previously. Here we're going to add a regression line, so it will be clearer if we hide the labels for each site. Go to Elements > Hide Data Labels. [Screenshot: SPSS 13 Chart Editor, Elements menu with Hide Data Labels selected, over a scatterplot with Profile Area on the x axis] Now right-click the plot and select Add fit line at total, or click the corresponding icon in the Chart Editor menu bar.
47. ely not in biology. Also, because the P value is below the critical value of 0.05, you should highlight it in bold. This way, if you have many ANOVA results, the reader can quickly refer to the significant ones. Also note the way the table lines are drawn; this is standard format for publication. Table 3. Summary of analysis of variance results for the effects of elevated atmospheric CO2 concentration on plant growth. Columns: SS, df, MS, F, P. Between Groups: 3.089, 2, 1.545, 45.237, <.001. Within Groups: 2.458, 72, .034. Total: 5.547, 74. In addition, you should produce a bar chart with error bars, just as you would do for a t-test analysis. An example bar chart from these data is below. [Figure: bar chart of relative growth rate (cm/day) by CO2 concentration (ppm): Ambient, Ambient + 150, Ambient + 350] Figure 5. Atmospheric CO2 concentration significantly affects relative growth rate of greenhouse plant seedlings (bars are mean ± 1 SD). Examples from Portney & Watkins. Repeated Measures ANOVA. Additional Topics: Multiple comparisons for repeated measures ANOVA. Portney & Watkins give an example of a simple repeated measures analysis of variance in which nine subjects had their forearm strengths measured in three different positions (Table 20.3, p. 444). In this example there is no between-subjects f
48. ences significantly (p < 0.001) from both the neutral and supination postures, but the latter two do not differ significantly from each other (p = 0.127). [Table: Paired Samples Test for the pairs Pronation - Neutral, Pronation - Supination, and Neutral - Supination] References. Devore, J. L. Probability and Statistics for Engineering and the Sciences, 6th edn. Belmont, CA: Thomson Learning, 2004. Manly, B. F. J. Multivariate Statistical Methods: A Primer, 3rd edn. Boca Raton, FL: Chapman & Hall/CRC Press, 2005. Portney, L. G., Watkins, M. P. Foundations of Clinical Research: Applications to Practice, 2nd edn. Upper Saddle River, NJ: Prentice Hall, 2000. Zar, J. H. Biostatistical Analysis, 4th edn. Upper Saddle River, NJ: Prentice Hall, 1999.
49. est for homogeneity is superior, particularly when the underlying distribution can be assumed to be near-normal, but SPSS has no packaged Bartlett test. There are several statistics reported here; the most conservative one is the Based on Median statistic. Since the Levene's Test is highly significant (the value under Sig. is less than 0.05), the two variances are significantly different, and this provides a strong warning against using a parametric test. Note: Because parametric tests are fairly robust to violations of homoscedasticity, it is generally recommended to use parametric tests unless the above tests for normality and homogeneity show strong departures. If your data are all nominal or ordinal, you can only use non-parametric tests. In order to focus on only the assumption of normality, ignoring the homogeneity of variances assumption, repeat this procedure without a factor variable. Data Analysis: Investigating the patterns and trends in the data is the core feature of SPSS. This section describes four groups of tasks that you will have to be able to complete over the course of this lab. Only the fundamental concepts and steps are presented here; for more detail on the statistics or program details, refer to your text or ask an instructor. Analyzing Frequencies: Chi-square. Additional Topics: Transforming continuous variables to categorical. In order
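To show what the "Based on Median" statistic measures, here is a sketch of the median-based Levene statistic (often called the Brown-Forsythe variant). It computes only the test statistic W, not its P value, and the two groups are hypothetical rather than the Old Faithful intervals. Whether SPSS uses exactly this formula for its Based on Median row is an assumption here.

```python
import statistics

def levene_median(groups):
    """Levene statistic W using absolute deviations from each group's median."""
    deviations = [
        [abs(value - statistics.median(group)) for value in group]
        for group in groups
    ]
    all_values = [v for d in deviations for v in d]
    n_total, k = len(all_values), len(deviations)
    grand_mean = statistics.fmean(all_values)
    # Spread of the group mean deviations around the grand mean deviation.
    between = sum(len(d) * (statistics.fmean(d) - grand_mean) ** 2 for d in deviations)
    # Spread of the individual deviations within each group.
    within = sum((v - statistics.fmean(d)) ** 2 for d in deviations for v in d)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups with clearly different spread (illustration only).
accurate = [1, 2, 3, 4, 5]
not_accurate = [2, 4, 6, 8, 10]
w = levene_median([accurate, not_accurate])
print(f"W = {w:.4f}")
```

A larger W means the groups differ more in spread; the Sig. column in SPSS comes from comparing W against an F distribution.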
50. ext Assumptions. Drag this icon to the left-hand side of the Pivoting Trays window, which is called Layers. [Screenshot: Pivoting Trays window with Layers and Columns trays] Now the results table has a drop-down menu called Assumptions, and you can toggle back and forth between the versions where equal variances are assumed or not assumed. Independent Samples Test (Equal variances not assumed), t-test for Equality of Means, Photosyn: t = 5.639, df = 228.852, Sig. (2-tailed) = .000, Mean Difference = 2.57474, Std. Error Difference = .45657, 95% Confidence Interval of the Difference = 1.67511 to 3.47436. ANOVA: See the Analysis of variance section for a complete description of how to conduct an ANOVA. This example shows the results of an analysis of how rising CO2 might affect plant growth rates. SPSS produces a table which is very useful and shows all the major components of the ANOVA test: Sums of Squares (SS), degrees of freedom (df), mean squares (MS), the F statistic, and the P value. While you may not know what these all mean, it is useful to report them so that the reader can see exactly how you got your results. There are other formats for presenting ANOVA results, but this is a standard one. Note how the P value is reported not as .000, which is what SPSS returned, but rather as <.001. This is a more accurate representation; a probability can never really be zero, definit
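The components of an ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below checks this against the sums of squares and degrees of freedom from the guide's CO2 example (3.089 with 2 df between groups, 2.458 with 72 df within groups) and adds a small helper for reporting P values; both function names are hypothetical.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) from sums of squares and df."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

def format_p(p, floor=0.001):
    """Report very small P values as '< 0.001' rather than as zero."""
    if p < floor:
        return f"< {floor}"
    return f"{p:.3f}"

ms_between, ms_within, f_stat = anova_f(3.089, 2, 2.458, 72)
print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}, F = {f_stat:.2f}")
print(format_p(0.0), format_p(0.127))
```

The small difference from the F of 45.237 that SPSS reports comes from rounding in the published sums of squares.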
51. f the file name, like .doc for Word files or .xls for Excel, are different for these two file types: Data Editor files are saved as .sav, while output files from the SPSS Viewer are saved as .spo. When you are sharing your work with your partner, make sure to give him or her both files. Remember that SPSS produces more output than you really need to present for almost every analysis. It is worthwhile to spend a little time trimming unnecessary information from the output when preparing a lab report or paper. This will make it easier for the reader to understand what you want to communicate with your table or graph. You can read more about how to trim down the output in the Model Output section. Cutting and pasting. Output in the SPSS Viewer can also be cut and pasted into Word or Excel files with all the formatting preserved. This is useful when you want to prepare a lab report or paper and want to insert a graph or table. Simply right-click an object, select Copy, and then paste into your report. You can also right-click the name of an object in the left-hand pane of the SPSS Viewer (or even several objects) and do the same. Sometimes when pasting a graph, SPSS crops the image in unexpected ways. If this happens to you, try exporting the output instead; the next section tells you how to do this. Exporting. If you want to save all the graphs and tables in one file
52. fit a normal distribution. If the results are significant, then the null hypothesis of no difference between the observed data distribution and a normal distribution is rejected. Simply put, a value less than 0.05 indicates that the data are non-normal. The Shapiro-Wilk W test is considered by some authors to be the best test of normality (Zar 1999). Shapiro-Wilk W is limited to small data sets, up to n = 2000. As with the Kolmogorov-Smirnov test, a significant result indicates non-normal data. Both of these tests indicate that, for both categories of results (those for which the predicted eruption time was accurate and those for which it was not), the sample data are not normally distributed. On this basis alone, it may be more appropriate to choose non-parametric tests of the hypotheses. In addition to the normality tests, we chose to test the homogeneity of variance in this sample. You can only do this when you have groups to compare; this requires some categorical variable. [Table: Test of Homogeneity of Variance for Interval, reporting the Levene statistic based on the mean, based on the median, based on the median with adjusted df, and based on the trimmed mean. Table note: There are no valid cases for Interval when Accurate = .000; statistics cannot be computed for this level.] There are several tests for homogeneity of variance; SPSS uses the Levene Test. Some statisticians (Zar 1999) propose that Bartlett's t
53. ge that elevation and latitude are not independent: the sites sampled higher up the coast (further north, at higher latitude) were generally at lower elevations than those sampled further down the coast. [Table: Correlations. Spearman's rho between Elevation and Latitude, reporting the correlation coefficient, Sig. (2-tailed), and N for each variable. Correlation is significant at the 0.01 level (2-tailed).] Regression. When our question centers on whether one variable predicts another, regression is the key statistical tool, not correlation. For example, does ozone concentration in a given city predict how many children develop asthma, or does height predict average annual income in corporate America? We address such questions with linear regressions, which test for the presence of straight-line relationships between the predictor variable and the response variable. Other shapes of relationships are possible, and in fact common in biology, but we will start with linear regression. In the following example, we examine whether vegetation density predicts the density of breeding bird populations in California forests. A significant positive relationship would indicate that birds seek out dense vegetation for breeding, while a negative relationship would indicate that less dense vegetation is preferred, perhaps because of easier access to food resources. Open Birds
54. ger patients than expected by chance, and much less frequent in older patients. [Table: DiseaseType * Age (Banded) crosstabulation, giving the count, expected count, % within DiseaseType, and standardized residual for each disease type (glomerulonephritis, acute nephritis, polycystic kidney disease, other) in the age bands <34, 34-45, 46-53, and 54+. Column totals: 18 (23.7%), 20 (26.3%), 18 (23.7%), and 20 (26.3%); N = 76.] Comparing two groups: T-tests. We use t-tests to compare the means of two groups. A t-test looks at the two distributions, as we did above, and determines whether or not their means are significantly different. The null hypothesis in a t-test is that there is no significant difference between the two means. For this test, we will answer the question: for common trees in the Northeast, are leaf photosynthesis rates different over the course of the year? Open the file Leafgas.xls. In the Variable View, code the species names by double-clicking the corresponding Values cell and entering the full names. [Dialog: Value Labels, with YB = Yellow birch, OAK = Red oak, and RM = Red maple.]
55. he results show that there is a strong, positive, and significant relationship between the number of bird species in a community and the total number of breeding pairs (r = 0.507, p < 0.01). This is partially because there must be more individuals to have more species, but it suggests that there may be an interesting story behind what causes population density and number of species to change in sync. [Table: Correlations. Pearson correlation between No. Species and Total density = .507, Sig. (2-tailed) = .001, N = 40. Correlation is significant at the 0.01 level (2-tailed).] When we graph these data, the strong positive association is clear. Graphic representations of data make your job of convincing the reader much easier by showing how the two variables change together. [Figure: scatterplot of No. Species against Total density, with each point labeled by site name.] This chart was created and modified using these steps. Nonparametric: Spearman's rho. In cases where the distribution of
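To make the Pearson coefficient r less of a black box, here is a minimal sketch of the calculation in plain Python. The function name `pearson_r` and the five data pairs are hypothetical, invented purely for illustration; they are not the Birds data, and SPSS remains the tool this guide uses for the real analysis.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values, NOT taken from Birds.xls.
    species = [10, 12, 15, 18, 20]
    density = [20, 25, 27, 35, 41]
    print(round(pearson_r(species, density), 3))
```

The result always lies between -1 and 1; the sign gives the direction of the association and the magnitude gives its strength, which is what the SPSS Correlations table reports.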
56. ick the Estimates of effect size and Homogeneity tests boxes. [Dialog: Repeated Measures: Options, showing the Estimated Marginal Means settings and the Display checkboxes (Descriptive statistics, Estimates of effect size, Homogeneity tests, Observed power, and others); significance level .05, confidence intervals 95%.] Choose Continue and then OK to run the test. Before looking at the results, it is necessary to digress briefly to discuss the concept of sphericity. Sphericity. In other parametric tests, we have been concerned with the normal distribution of data and homogeneity of variances. In a repeated measures design, we are also concerned with equal correlations between the data at different time points; this is known in statistics as sphericity. This assumption considers the covariance between measurements. If the sphericity assumption is violated, the chance of a Type I error (incorrectly rejecting the null hypothesis of no difference between groups) increases. This is a troubling outcome and unfortunately difficult to resolve. Alternatives include multivariate analyses of variance (MANOVA), which do not require sphericity. SPSS runs
57. indows to manage data, output, graphs, and advanced programming. You will use two windows for everything you need in this class: the Data Editor and the SPSS Viewer. Data Editor. The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format that contains variables in columns and cases in rows. There are two sheets in the window. The Data View is the sheet that is visible when you first open the Data Editor and contains the data. This is where most of your work will be done. Unlike most spreadsheets, the Data Editor can only have one dataset open at a time. However, you can open multiple Data Editors at one time, each of which contains a separate dataset. Datasets that are currently open are called working datasets, and all data manipulations, statistical functions, and other SPSS procedures operate on these datasets. The Data Editor contains several menu items that are useful for performing various operations on your data. Here is the Data Editor containing an example dataset: [Figure: Data Editor window for Birds.sav, with columns including Site, Elevation, Profile area, Height, Half-height, Latitude, and Longitude.] Notice that there are t
58. inless. We will start with an Excel workbook which has data we later use for several of our example analyses. These data are the IQ and brain size of several pairs of twins, with additional variables for body size and related measures. There are 10 pairs of twins, five male and five female. [Figure: Data Editor window showing the IQ Brain.sav dataset.] It is important that each variable is in only one column. It might seem to make sense to divide the data into male and female and have separate columns for each. However, working with SPSS will be much easier if you get used to this format: one row, one individual. First, go to the Data Appendix and download the file IQ Brain Size.xls (Relationship between IQ and Brain Size). This will be the first step for all the examples in this Guide. Open SPSS and select "Type in data". To open an Excel file, select File > Open > Data from the menu in the Data Editor window. [Figure: SPSS 13 menu bar.]
59. iro-Wilk p = 0.029. Therefore, a more conservative approach would be to use the Wilcoxon Signed Rank Test, the nonparametric alternative to a paired t-test. Go to Analyze > Nonparametric Tests > 2 Related Samples, select both Before and After, and move them into the test pairs list. [Dialog: Two-Related-Samples Tests, with the pair BEFORE - AFTER in the Test Pair(s) List and Wilcoxon selected under Test Type.] Note that in the results, SPSS organizes the variables alphabetically, so it calculates the difference from After to Before. Therefore, the Positive Ranks have a much greater sum than the negative ones. Here, positive means that the rates of tooth decay were higher before treatment than after. [Table: Ranks for AFTER - BEFORE, giving N, Mean Rank, and Sum of Ranks for Negative Ranks, Positive Ranks, Ties, and Total.] Looking at the test statistic summary, we see that this difference is significant (p = 0.006). [Table: Test Statistics for AFTER - BEFORE (Wilcoxon Signed Ranks Test). Z = -2.767 (based on negative ranks), Asymp. Sig. (2-tailed) = .006.] Testing associations between continuous variables: Correlation. To what extent are tw
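The Ranks table that SPSS prints for the Wilcoxon Signed Rank Test can be reproduced by hand. The sketch below shows one way to do it in plain Python: it ranks the absolute After - Before differences, averages the ranks of tied values, sets zero differences aside as ties, and sums the ranks by sign. The function name `signed_rank_sums` and the five data pairs are hypothetical and are not taken from Fluoride.xls.

```python
def signed_rank_sums(before, after):
    """Return (negative rank sum, positive rank sum, number of ties).

    Differences are computed as after - before. Zero differences are
    counted as ties and excluded from the ranking.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)

    # Give each absolute difference the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    negative = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    positive = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    return negative, positive, len(diffs) - len(nonzero)


if __name__ == "__main__":
    # Hypothetical decay rates for five cities, for illustration only.
    before = [10, 12, 9, 15, 11]
    after = [8, 12, 10, 11, 9]
    print(signed_rank_sums(before, after))
```

The test statistic is based on the smaller of the two rank sums; when one sum is much larger than the other, as in the fluoride example, the difference between the paired measurements is unlikely to be due to chance.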
60. lows you to manage your data in the Data Editor in several ways. The tools in the Data menu of the Data Editor allow you to change how your data are structured (Restructure), sort your whole data set according to one column (Sort Cases), or even just choose certain cases to use for a particular analysis (Select Cases). Much more detail is available in the SPSS Tutorial (Help > Tutorial > Using the Data Editor). Model Output. SPSS generally produces more output than you need. For lab reports, and even for manuscripts for publication, the reader only needs to see a small fraction of the output. The goal of producing tables and graphs is to summarize your data and analyses in a clear, concise fashion. Therefore, you should never simply cut and paste output from the SPSS Viewer into your report without cleaning it up. This section demonstrates exemplary output for the four major types of analyses you will do: descriptive statistics, t-tests, regression, and ANOVA. All of the analyses used to produce the following output are described step by step in the relevant sections of this guide. Descriptive Statistics. Descriptive statistics are primarily used for exploratory tasks and can provide substantial information on central tendency, the degree of variability, and normality of the dataset, among other uses. For these purposes, histograms are an efficient means of communication. Fo
61. lumns in the Data View, double-click the grey heading square at the top of each column, which will be named "var" until you change it. When you do this, the Data Editor will switch to the Variable View; now each variable is in one row, not one column. Enter the name in the first column. You can also add a label to each variable, giving a longer explanation of what the data are (see Fine-tuning the data for more on this). [Figure: Variable View of IQ Brain.sav, showing a Numeric variable labeled "Full scale IQ".] Opening an existing SPSS file. If you have already saved your work (see below) or are sharing a file with a partner, you can open the existing file in two ways: either choose the file when first opening SPSS by choosing "Open an existing data source", or, while already in SPSS, go to File > Open > Data and choose the appropriate file. Saving your work. As mentioned above, SPSS works with different windows for different tasks: you will use the Data Editor to manage your data, and the SPSS Viewer to examine the results of analyses and create graphs (much more on this below). So you also need to save each window separately. This will be clear when you go to File > Save in either window; the first time you save each window, you will be asked to name the file and choose where to save it. The file extension, the letters at the end o
62. [Dialog: Bar Charts, with chart types including Clustered and Stacked, and Data in Chart Are options Summaries for groups of cases, Summaries of separate variables, and Values of individual cases.] Place mean percentage low weight births in the Bars Represent box, and tobacco use and region in either of the Category Axis and Define Clusters By boxes. [Dialog: Define Clustered Bar: Summaries for Groups of Cases, with Bars Represent set to Other statistic (MEAN of Percentage Low Weight Births), Category Axis set to Tobacco Use Code, and Define Clusters by set to Region Code.] In the Options dialog, choose error bars representing 1 standard error of the mean. The resulting graph shows the data in an alternative format. This shows that the variability in the Unknown category is quite large within each region, except for the West, where the mean and the variability are low. We now might wonder why the data are so different for the western region, and might begin to suspect that this represents a systematic difference in the way the data were collected, not necessarily a truly differe
63. nly one dependent variable measured in subjects at only two time points. Graphing. Looking at tables of means, P values, and the like may become interesting as you learn more about statistics, but most people want to see a graph of the data first. Graphs can be an information-rich and instantly salient way to present your results to the reader; they can also be deceptive or confusing when poorly done. This section briefly introduces three common graph types. Bar charts. Additional Topics: Formatting Chart objects. Whenever you compare groups of cases by some single continuous variable, bar charts are preferred. This is true for cases where you did a t-test to compare two groups, or where you did an ANOVA to compare three or more groups. Below we will use the t-test example of leaf gas exchange in two months. However, you can follow the exact same procedure for a multiple-group comparison (ANOVA). First, go to Graphs > Bar. In the Bar Charts window that comes up, choose Simple, then click Define. Leave the Data in Chart Are setting as Summaries for groups of cases. [Dialog: Bar Charts, with chart types Simple, Clustered, and Stacked, and Data in Chart Are options Summaries for groups of cases, Summaries of separate variables, and Values of individual cases.] In the Define Simple Bar window, click the "Other statistic (e.g., mean)" button and move Photosyn, the continuous
64. nt pattern of tobacco use in the western states. [Figure: clustered bar chart of mean Percentage Low Weight Births by Tobacco Use Code (Yes, No, Unknown), clustered by Region Code (Northeast, Midwest, South, West). Error bars: +/- 1.00 SE.] Finally, we can show these data in panels using the Interactive graphing feature, described in detail in the Graphing section. [Figure: paneled bar charts of Low Weight Births by Region Code (Northeast, Midwest, South, West), paneled by tobacco use.] For another example of repeated measures ANOVA, following Portney & Watkins, see below. Comparing multiple groups: Nonparametric. One-Way: Kruskal-Wallis. As before, when the assumptions of normality are not met by the data, the typical parametric tests lose power. In such situations, non-parametric tests are not only more justifiable on theoretical grounds, but are more likely to identify the underlying factors structuring the data. In comparing the means of multiple groups, the Kruskal-Wallis test is the analog of a one-way ANOVA. It is also called a distribution-free ANOVA, since it is free of any assumptions about how the data are distributed (Devore 2004). This test is a variation of the Mann-Whitney U test for two groups, where all the da
65. o variables related? We examined this question for categorical variables using chi-square tests previously, and now address continuous variables. These tests examine how two variables change together, without addressing questions of causality. Parametric: Pearson correlation coefficient. Also known as the Pearson product-moment correlation, or r, this statistic is the standard measure of association between two independent, normally distributed variables. We will look at how to use this test using data on bird diversity surveys in oak forests in California. Open Birds.xls from the Data Appendix and go to Analyze > Correlate > Bivariate. Here the tool is called Bivariate, but in fact it is possible to put in more than two variables. Place the species richness and population density variables in the Variables box. Here we will look at the strength of association between these two measures of bird communities, without asking whether one causes the other. Leave Pearson checked and click OK. [Dialog: Bivariate Correlations, with No. Species and Total density in the Variables box, Pearson checked under Correlation Coefficients, Two-tailed selected under Test of Significance, and Flag significant correlations checked.]
66. of measurement and the interaction between day and dose are highly significant explanatory factors of the termite numbers. This differs from the MANOVA results, and since this is a more powerful test, we should focus just on the repeated measures. The tree bark extract does have an effective anti-termite compound. Finally, examine the profile plot. This immediately explains the results: the higher dose of tree bark extract led to significantly lower termite survival. [Figure: profile plot, Estimated Marginal Means of day, with Day of study on the horizontal axis and separate lines for Dose (mg).] Other options. In the Repeated Measures dialog box, if you have multiple explanatory factors, you can choose which interactions to include in the model using the Model option. This dialog also gives you the option to choose which type of sums of squares to use. This is a complex topic, but essentially, if the cell frequencies in the between-subject factors are unbalanced (i.e., the values between the different treatments are unequal), Type IV sums of squares is recommended. Additionally, there are other procedures which can accomplish appropriate analysis. Linear Mixed Models: when you have only one dependent variable, this procedure has more options for modeling the within-subject effects. Paired t-test: when o
67. on causes changes in the rates of tooth decay. Open the data file Fluoride.xls in SPSS to see what this looks like. Note that what requires the investigator to use a paired t-test, and not a typical independent-samples t-test, is that the same subjects were used more than once. For example, a given city may have had particularly low tooth decay rates to start with, so it is important to look at the changes for that particular city, not the before and after groups as a whole. Using a paired t-test allows the investigator to identify the effects of the treatments in spite of effects unique to certain individuals. To begin, place the two measurements (here, before and after) in separate columns; each row must have both measurements for a single test subject. Because you have two columns that are different measurements of one dependent variable, this is rather different from a typical t-test. For a typical t-test, the dependent variable is placed in its own column, and the groups or treatments (here, before and after) would be specified in a categorical column titled Treatment. To conduct this test, go to Analyze > Compare Means > Paired-Samples T Test. In the dialog box, select both Before and After, then click the arrow to move them over to the right side, as shown below. Then click OK. [Dialog: Paired-Samples T Test, with the pair BEFORE - AFTER in the Paired Variables list.]
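The arithmetic behind the paired t-test is short enough to sketch by hand: take the difference for each subject, then compare the mean difference with its standard error. The Python below is an illustrative sketch only; the function name `paired_t` and the four data pairs are hypothetical and are not taken from Fluoride.xls.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom).

    Differences are computed as before - after, one per subject, so each
    row of the data contributes a single value to the test.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need paired measurements for at least two subjects")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t, n - 1


if __name__ == "__main__":
    # Hypothetical before/after values for four subjects, for illustration only.
    before = [5, 7, 9, 6]
    after = [3, 6, 6, 5]
    mean_diff, t, df = paired_t(before, after)
    print(f"mean difference = {mean_diff:.2f}, t = {t:.3f}, df = {df}")
```

Because only the within-subject differences enter the calculation, a subject that starts unusually low or high does not inflate the variability, which is the advantage of the paired design described above.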
68. [Dialog: Post Hoc Multiple Comparisons, listing the available tests, including Bonferroni, Tukey, Sidak, Scheffe, Dunnett, Duncan, Hochberg's GT2, and Gabriel, and, under Equal Variances Not Assumed, Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C.] Then, in the Plots dialog, choose a RegionCode x TobaccoUseCode profile plot, as below. Click Add after adding the variables to the axis and lines boxes. [Dialog: Univariate: Profile Plots, with TobaccoUseCode as the Horizontal Axis and RegionCode as Separate Lines.] The output lists each factor and then reports the interactions between the factors. An interaction is labeled with an asterisk (*) between the factors whose interaction is being tested. Here, both of the main factors are by themselves highly significant, but the interaction is not (although see the note below). This means that tobacco use does not affect birth weight differently in one region than in the others. Tests of Between-Subjects Effects (dependent variable: Percentage Low Weight Births). Source: Corrected Model, sum of squares 20.808, mean square 1.892, F 8.041; Intercept, sum of squares 286.783, df 1, mean square 286.783, F 1219.117; TobaccoUseCode, sum of squares 8.783, mean square 4.391
69. r any graph or table, including histograms, there are several intuitive guidelines to follow. Number each figure and table, and refer to it in the text by its number (e.g., "Figure 1 demonstrates that female heights approximate a normal distribution"). Axes and headings should be labeled clearly and large enough to read easily, including units. Figures should always be accompanied by a caption beneath. Avoid unnecessary information: leaving the SPSS default mean and standard deviation on these graphs would be of minimal assistance, as the graph illustrates these points. Don't state; illustrate. If an aspect of a graph is notable, make it obvious. Again, note that to achieve such clear output for your assignments, it may be necessary to eliminate most of the output that SPSS produces. Most of the same guidelines apply to tables used to summarize descriptive statistics. One difference is that tables should have captions above rather than below. SPSS provides numerous table formats that make attractive tables easy to make. Usually, however, they need to be edited in Word after export from SPSS, or customized in SPSS, as they tend to contain extraneous information. To adjust the settings for a table, first double-click the table. To format, point to Format > Table Properties. To use a pre-packaged SPSS format, point to Format > TableLooks. Table
70. rred way to do this, and software programs like SPSS make performing these tests much easier. SPSS is a powerful program which provides many ways to rapidly examine data and test scientific hunches. SPSS can produce basic descriptive statistics, such as averages and frequencies, as well as advanced tests such as time-series analysis and multivariate analysis. The program is also capable of producing high-quality graphs and tables. Knowing how to make the program work for you now will make future work in independent research projects and beyond much easier and more sophisticated. What this guide is. This document is a quick reference to SPSS for biology students at Barnard College. The focus is on using the program, as well as laying the foundation for the statistical concepts which will be addressed. How to use this guide. Much of the information in this guide is contained in the help files and tutorial, which are in the SPSS program. We strongly recommend that you at least glance at the tutorial, which shows you how to do all the essential tasks in SPSS. You can find it in the Help menu, under Tutorial. Throughout this document, we will simply write, for example, Help > Tutorial to tell you where to find a certain action or file; the first name will always be a selection from the menu bar at the top of the screen. The core content for how to do a given statistical test is given in each
71. rtheast and then compare its mean to the means of the three other regions. Notice, for example, that the difference between Northeast and Midwest is the same as the difference between the Midwest and Northeast, only with the sign reversed.

Multiple Comparisons. Dependent variable: Percentage Low Weight Births. Tukey HSD.

(I) Region   (J) Region   Mean Diff. (I-J)   Std. Error   Sig.    95% CI Lower   95% CI Upper
Northeast    Midwest       0.227             0.168        0.534   -0.213          0.668
Northeast    South         0.050             0.168        0.991   -0.391          0.491
Northeast    West          0.781*            0.168        0.000    0.341          1.223
Midwest      Northeast    -0.227             0.168        0.534   -0.668          0.213
Midwest      South        -0.178             0.168        0.718   -0.618          0.263
Midwest      West          0.554*            0.168        0.008    0.114          0.995
South        Northeast    -0.050             0.168        0.991   -0.491          0.391
South        Midwest       0.178             0.168        0.718   -0.263          0.618
South        West          0.731*            0.168        0.000    0.291          1.173
West         Northeast    -0.781*            0.168        0.000   -1.223         -0.341
West         Midwest      -0.554*            0.168        0.008   -0.995         -0.114
West         South        -0.731*            0.168        0.000   -1.173         -0.291
* The mean difference is significant at the .05 level.

Pay attention to the asterisk next to the mean difference. If it is there, then we know that this difference is significant. Here, the West has significantly lower rates of low weight births than all three other regions, and those three regions are not significantly different from each other. This quantifies our hunch from the bar graph. Why this difference comes about would require further study. Note: If your table has cells filled with
72. s graph would be: "Vegetation density determines density of breeding bird populations in California woodlands (R = 0.225, p = 0.008)." Finer Points. SPSS is a powerful program with many features. The best way to explore the capabilities of the program is to take advantage of the tutorial and help files in the program itself, and learn about new features as you find a use for them. Below are a few selected features that you may find useful as you move beyond the initial use of SPSS. Fine-tuning the data. Data presentation. When sharing your data with others, it can be helpful to make your variable names easier to understand. SPSS has two important ways of doing this. First, in the Variable View of the Data Editor, you can specify not only the variable name but also a Label. A label can be as long as 255 characters, so you can input the units used for these data, as well as comments about how they were collected and what the names mean. These labels appear when you hover the mouse over the variable name in the Data View. [Figure: Variable View of Exampledata in the SPSS Data Editor, showing the Name, Type, Label, and Values columns, with a value label F = Female and a variable label "Height in cm".] In ad
73. s where you store your SPSS files and data. To save this graph as an image file, exit the Chart Editor. Right-click (or Ctrl-click on a Mac) the chart and choose Export. [Figure: context menu for the SPSS chart object (What's This?, Cut, Copy, Copy objects, Paste After, and others), shown over the scatterplot of site data against Total density.] Choose Charts Only in the Export menu, name your chart, and choose a file type (JPEG is a safe format for any operating system and is reasonably good quality). [Dialog: Export Output, with the Export File prefix field, Export What options (All Visible Charts, Selected Charts), and Export Format set to JPEG File.] Adding a regression line. In the regression example, we looked at how the density of vegetation (Profile Area) predicts breeding bird population density. Open up the Simple Scatterplot dialog again, placing the appropriate variables in. This time, check the box "Use chart specifications from" and click the File button to navigate to the template you stored before for the correlations
74. served differs from the expected, and then examine the result of the chi-square test. This shows that there is no significant association between kidney disease type and sex (p = 0.255, highlighted in blue below).

DiseaseType * Sex Crosstabulation

DiseaseType                 Statistic               Male    Female   Total
Glomerulonephritis          Count                   6       12       18
                            Expected Count          4.7     13.3     18.0
                            % within DiseaseType    33.3    66.7     100.0
                            Std. Residual           .6      -.3
Acute nephritis             Count                   4       20       24
                            Expected Count          6.3     17.7     24.0
                            % within DiseaseType    16.7    83.3     100.0
                            Std. Residual           -.9     .6
Polycystic kidney disease   Count                   4       4        8
                            Expected Count          2.1     5.9      8.0
                            % within DiseaseType    50.0    50.0     100.0
                            Std. Residual           1.3     -.8
Other                       Count                   6       20       26
                            Expected Count          6.8     19.2     26.0
                            % within DiseaseType    23.1    76.9     100.0
                            Std. Residual           -.3     .2
Total                       Count                   20      56       76
                            Expected Count          20.0    56.0     76.0
                            % within DiseaseType    26.3    73.7     100.0

[Table: Chi-Square Tests. Pearson Chi-Square: Asymp. Sig. (2-sided) = .255. Linear-by-Linear Association: value .093, df 1, Asymp. Sig. .818. N of Valid Cases: 76. Table note: 2 cells (25.0%) have expected count less than 5. The minimum expected count is 2.11.]

Note: Chi-square tests are also found in Analyze > Nonparametric Tests > Chi-square. This method is easier to use for simpler tests, such as testing observed data against a uniform distribution. An additional question we could ask is whether patient age and disease type are as
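The expected counts and the chi-square statistic in this output can be checked by hand from the observed counts in the DiseaseType by Sex crosstabulation. The Python below is a sketch of that check, not the way SPSS computes it internally. The function names are made up for this example, and the P value helper uses a closed-form expression that is valid only for exactly 3 degrees of freedom, which is what this 4 x 2 table has.

```python
import math


def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_square_p_df3(statistic):
    """Upper-tail probability for a chi-square statistic with exactly 3 df.

    Assumption: this closed form applies ONLY to 3 degrees of freedom;
    other table sizes need a general chi-square distribution function.
    """
    return math.erfc(math.sqrt(statistic / 2)) + math.sqrt(
        2 * statistic / math.pi
    ) * math.exp(-statistic / 2)


# Observed counts from the crosstabulation (columns: Male, Female).
observed = [[6, 12], [4, 20], [4, 4], [6, 20]]
expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
p_value = chi_square_p_df3(statistic)  # df = (4 - 1) * (2 - 1) = 3

print([[round(e, 1) for e in row] for row in expected])
print(round(statistic, 3), round(p_value, 3))
```

Working through the numbers this way also shows where the table note comes from: the smallest expected count belongs to the male polycystic kidney disease cell, the row with the fewest patients.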
75. [Screenshot: Options dialog, with "Display chart with case labels" checked]

Double-click the resulting chart to open the Chart Editor. Select the background, double-click to open the Properties window, and change it to white. Then select the points and change them to dark blue. Finally, select the text and change the font size from Automatic to 10 point.

If you want to make more charts that look like this one, as you might if you are preparing a manuscript or a poster, you can save this set of formats as a template. After you have made all the formatting changes you want to save, go to File > Save Chart Template.

[Screenshot: SPSS 13 Chart Editor with the File menu open (Save Chart Template, Apply Chart Template, Export Chart XML), above a scatterplot of Total density with each point labeled by site name]
76. [Contents, partly illegible]
  [illegible entry]
    [illegible entry]
    Paired [illegible]
  Comparing two groups: Non-parametric
    Two independent groups: Mann-Whitney U
    Paired groups: Wilcoxon Signed Rank Test
  Testing associations between continuous variables
    [illegible entry]
    Parametric: Pearson correlation coefficient
    Nonparametric: Spearman's rho
    [illegible entry]
  Comparing Multiple Groups: Parametric
    One-Way Analysis of Variance (ANOVA)
    Additional Topics: Post-hoc tests (Multiple comparison test)
  Comparing multiple groups: Nonparametric
    One-Way: Kruskal-Wallis
    [entry truncated]
77. sociated. Currently, age is a continuous variable, and we could analyze it as such. But a simpler approach would be to convert ("Transform" in SPSS) this variable to a categorical one and take advantage of the robust and easy-to-interpret chi-square test. In the menu bar, choose Transform > Visual Bander; in the resulting window, choose Age as the variable to band. Here, "banding" refers to dividing a continuous variable into categories.

[Screenshot: Visual Bander variable selection dialog, listing Patient, Age, and Time, with a Variables to Band box]

In the Visual Bander window, click Age and name the new variable to be created AgeBand. Select Excluded for Upper Endpoints and choose Make Cutpoints.

[Screenshot: Visual Bander window; Current Variable: Age, Banded Variable: AgeBand, Minimum 10, Maximum 69]

Enter interval cutpoints or click Make Cutpoints for automatic intervals. A cutpoint value of 10, for example, defines an interval starting above the previous interval an
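The banding step described above can be sketched outside SPSS as follows. This is only an illustration of the idea, assuming that "Excluded" upper endpoints mean each band runs from its lower cutpoint up to, but not including, the next cutpoint. The cutpoints, ages, and function name are hypothetical and are not the values SPSS generates for the Kidney data.

```python
from bisect import bisect_right


def band_index(value, cutpoints):
    """Return the 1-based band for value, with upper endpoints excluded.

    Band 1 holds everything below the first cutpoint. A value equal to a
    cutpoint falls into the band that starts at that cutpoint.
    cutpoints must be sorted in increasing order.
    """
    return bisect_right(cutpoints, value) + 1


# Hypothetical cutpoints and ages, chosen only for illustration.
cutpoints = [20, 30, 40, 50, 60]
ages = [10, 19, 20, 45, 60, 69]

age_band = [band_index(age, cutpoints) for age in ages]
print(age_band)
```

Note how an age of exactly 20 lands in the second band rather than the first; that is the practical effect of excluding the upper endpoint.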
78. t and the example output in Portney & Watkins is that SPSS by default runs a multivariate analysis of variance (MANOVA). We can ignore this for now. Mauchly's Test of Sphericity tells us which version of the ANOVA we should use. These data do not violate the assumption of sphericity (p = 0.239), so we can focus on the "Sphericity Assumed" results. This table looks very similar to the table on p. 445.

[Table: Mauchly's Test of Sphericity (b). Measure: strength. Within Subjects Effect: forearm. Columns include Mauchly's W, Approx. Chi-Square, and Epsilon (a) (Greenhouse-Geisser, Huynh-Feldt); values not recoverable.]
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept. Within Subjects Design: forearm.

Now we can see some large differences in the presentation of results. What Portney & Watkins present in four lines of one table, SPSS presents over two tables in a total of 10 lines. The story that we are looking for is in the first line of the first table: the within-subjects effect of forearm position on elbow flexor strength. There is a highly significant relationship here (p < 0.001). The other two
79. t to Analyze > General Linear Model > Univariate. As demonstrated below, put your dependent variable in the Dependent Variable space, in this case Percentage Low Weight Births. Your independent variables are called slightly different names from what we are used to. Here, a Fixed Factor is an independent variable that is set by the experimenter in some way, like a drug concentration or species grouping. You should only consider a predictor a fixed factor if all of the possible values of that variable are represented in the data. A Random Factor is a predictor variable that was not set by the experimenter and whose values represent a sample from a larger population. Both geographic region and tobacco use can be considered fixed factors for this analysis.

[Screenshot: Univariate dialog, with Percentage Low Weight Births as the Dependent Variable and Tobacco Use Code and Region Code as Fixed Factor(s)]

Before continuing, set several of the options. First, select Post Hoc and choose Tukey for both variables.

[Screenshot: Univariate: Post Hoc Multiple Comparisons for Observed Means dialog, with TobaccoUseCode and RegionCode listed under Post Hoc Tests for]
80. ta are ranked, and then the distribution of ranks is compared against a uniform distribution using a chi-square test.

Return to the low birth weight data set (Natality.xls, or Natality.sav if you saved it as an SPSS file). In this example, we examined how regions of the US differ with respect to the percentage of children born at low birth weights, using data from the CDC. We proceeded with a one-way ANOVA, but if we test the assumptions of normality using the Explore tool, we find that the distributions are non-normal and the variances are unequal (not shown). Thus, a non-parametric test is the conservative option.

[Table: Tests of Normality for Percentage Low Weight Births by Region Code (Northeast, Midwest, South, West), with Kolmogorov-Smirnov (a) and Shapiro-Wilk statistics; values not recoverable. a. Lilliefors Significance Correction]

Go to Analyze > Nonparametric Tests > K Independent Samples. Here, "K" refers to the number of groups, a naming convention in statistics for the number of categories in a factor variable.

[Screenshot: Tests for Several Independent Samples dialog, with RegionCode as the Grouping Variable and Kruskal-Wallis H checked under Test Type]
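To make the ranking idea behind the Kruskal-Wallis test concrete, here is a minimal sketch in plain Python. It pools the groups, assigns average ranks to ties, and computes the H statistic with the usual tie correction; H is then compared against a chi-square distribution with K - 1 degrees of freedom. The function names and the toy data are hypothetical and are not taken from the Natality file.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of numbers.

    Assumes at least two distinct values overall; if every value is tied,
    the correction factor is zero and H is undefined.
    """
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Tie correction: count how many observations share each rank.
    tie_sizes = {}
    for r in ranks:
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    return h / (1 - tie_term / (n ** 3 - n))


# Hypothetical data: three groups with no overlap at all.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_statistic = kruskal_wallis_h(toy_groups)
degrees_of_freedom = len(toy_groups) - 1
print(h_statistic, degrees_of_freedom)
```

Because the test works on ranks only, stretching or compressing the raw values without changing their order leaves H unchanged, which is what makes it robust to skewed data.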
81. the data are highly skewed, violating the assumption of normality, you should not use the Pearson correlation coefficient. In this same data set, two environmental variables are highly non-normally distributed: Elevation and Latitude. You can check this using the Explore tool.

Here we are not asking any question about the biology of this system, but simply whether the data collection process tended to choose sites where elevation and latitude are correlated. In a well-designed study, these should be independent. Open the Bivariate Correlations dialog box again (remember, you can do it quickly with the Dialog Recall icon) and place these two in the Variables box. Choose Spearman and deselect Pearson.

[Screenshot: Bivariate Correlations dialog, with Elevation and Latitude in the Variables box, Spearman checked under Correlation Coefficients, Two-tailed selected under Test of Significance, and Flag significant correlations checked]

Surprisingly, and unfortunately for these researchers, there is a strong, significant negative relationship between elevation and latitude (rs = -0.615, p < 0.001). This means that any general conclusions drawn from this study need to be tempered by the knowled
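Spearman's rho is simply the Pearson correlation computed on ranks rather than on the raw values, which is why it tolerates skewed data. The sketch below illustrates this in plain Python; the function names and the elevation and latitude values are hypothetical and are not the values from the bird data set.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    sorted_values = sorted(values)
    ranks = []
    for v in values:
        first = sorted_values.index(v) + 1          # lowest rank held by v
        count = sorted_values.count(v)              # number of ties at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical site data: higher sites tend to lie at lower latitudes.
elevation = [120, 450, 900, 1500, 2100]
latitude = [40.5, 41.2, 37.9, 38.4, 33.1]
rho = spearman_rho(elevation, latitude)
print(round(rho, 3))
```

A perfectly monotonic decreasing relationship gives rho = -1 no matter how curved it is, whereas Pearson's r would be pulled toward zero by the curvature.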
82. Repeated measures ANOVA
Additional Topic: Sphericity

Also known as within-subjects design, these tests are used when each subject is measured multiple times. Different treatments may be applied to each subject over time, or to groups of subjects in a uniform way. Similar to paired t-tests, these tests increase the power of the analysis by accounting for the idiosyncratic differences between subjects.

The following conditions make a study appropriate for repeated measures ANOVA:
• Several measurements taken on each subject over time
• Distinct treatments applied either to each subject at different times, or to groups of subjects at a single time or throughout the study
• More than two time points
• One or more continuous response variables

Questions which might be suitable for this type of analysis include:
• Does an experimental diet lead to better test performance of two groups of study animals?
• Which medium leads to the most proliferation in several cell lines over time?
• Do subjects improve their balance over time when given a sequence of experimental treatments?

Here we will use a real data set to ask whether different concentrations of a tree bark extract lead to different survival rates of termites. These data can be used to see if the tree bark compound would be suitable for development as an anti-termite treatment. Open Termites.xls (see the Data Appendix). This study has a mixed design, or
83. to assess the relationship between two categorical variables, use a chi-square (χ²) test. A chi-square test is a widely used non-parametric test which examines whether the frequency distribution of the observed data matches that of either the expected data or another known distribution. A typical question for this type of test is whether there is an association between two categorical variables.

Open up the file Kidney.xls in SPSS. By default, when this file is read in, all variables are assumed to be "scale" (continuous) data. In fact, several of them are categorical variables, and you must manually change them in the Variable View tab of the Data Editor. See the Data Appendix for details. The following process is an example of how to manipulate data variables.

[Screenshot: Variable View of the SPSS Data Editor listing the numeric variables Patient, Time, Status, Age, Sex, DiseaseType, and Frailty, with their Values, Missing, Columns, Align, and Measure settings]

First, change the Measure of the Sex variable to Nominal. Then click on the Values cell for this variable and enter
84. tograms in the window that pops up. Check "With normal curve".

[Screenshot: Frequencies: Charts dialog, with Chart Type set to Histograms and "With normal curve" checked]

Select Continue and OK, and then examine the results in the SPSS Viewer.

[Figure: histogram of Interval (x-axis) against Frequency (y-axis) with a normal curve overlaid; Mean = 77.82, Std. Dev. = 16.5272]

Again, notice that the red arrow in the left pane indicates where in the output you are looking. Note that the black line, representing a normal distribution, does not represent the data well at all. This has important consequences for how we choose to proceed.

Parametric vs. Non-parametric statistics

Statistical tests are used to analyze some aspect of a sample. In practice, we want the results of the test to be generalizable to the population from which that sample was drawn; in other words, we want the sample to represent the parameters of the population. When we know that the sample meets this requirement, we can use parametric statistics. These are the first choice for a researcher. The use of parametric statistics requires that the sample data:
• Be normally distributed
• Have homogeneity of variance
• Be continuous

These assumptions are explained below. If the sample dat
85. variable into the Variable box. This will make the bars represent mean values for each group. Select Month, the categorical variable, and place it in the Category Axis box.

[Screenshot: Define Simple Bar: Summaries for Groups of Cases dialog, with Bars Represent set to Other statistic (e.g., mean), Variable: MEAN(Photosyn), and Category Axis: Month]

Finally, click Options and select Display error bars. Error bars show how much variation there is around the mean and are essential to report; you should always be suspicious of bar charts without error bars. Choose Standard error with a multiplier of 1. Click Continue and OK to produce the graph.

[Screenshot: Options dialog, with Display error bars checked and Error Bars Represent set to Standard error, Multiplier 1]

The resulting graph reveals why we found that the gas exchange rates
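For readers who want to see what the bar heights and error bars represent numerically, the sketch below computes a group mean and its standard error (sample standard deviation divided by the square root of n) with a multiplier of 1. The function name and the measurements are hypothetical; they are not the gas exchange values from the guide's data file.

```python
import math
import statistics


def mean_and_standard_error(values, multiplier=1):
    """Return (mean, error bar half-length) for one group of measurements.

    The standard error is the sample standard deviation (n - 1 in the
    denominator) divided by sqrt(n), scaled by the chosen multiplier.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, multiplier * standard_error


# Hypothetical photosynthesis measurements grouped by month.
photosyn_by_month = {
    "June": [2, 4, 4, 4, 5, 5, 7, 9],
    "July": [6, 7, 7, 8, 9, 9, 10, 12],
}

for month, values in photosyn_by_month.items():
    mean, half_length = mean_and_standard_error(values)
    print(f"{month}: bar height {mean:.2f}, error bar +/- {half_length:.2f}")
```

Each bar is drawn at the mean, and the error bar extends one standard error above and below it. Choosing a multiplier of 2 would give roughly a 95% interval for large samples.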
86. wo tabs on the bottom, Data View and Variable View. Data View is typically the working view, and shows the data just as an Excel worksheet does.

[Screenshot: Birds.sav open in the SPSS Data Editor]

For example, in the above window SITE is defined to be what SPSS calls a "string", or simply a set of characters with no numerical value. All the others are defined to be continuous numerical variables, with two decimal places shown. Strings are called categorical variables, in contrast to continuous numeric variables (more on this in "Fine-tuning the data"). It is not essential to use the Variable View, and we will mostly ignore it for now.

SPSS Viewer

All output from statistical analyses and graphs is printed to the SPSS Viewer window. This window is useful because it is a single place to find all the work that you have done, so if you try something new and it doesn't work out, you can easily go back and see what your previous work was.

[Screenshot: Output1 in the SPSS Viewer, with an outline pane listing Descriptives (Title, Notes, Descriptive Statistics) and Frequencies (Title, Notes, Statistics)]
87. ype of analysis you will want to know how to do is Analysis of Variance, or just ANOVA. Just as t-tests are useful for asking whether the means of two groups are different, ANOVA can answer the question of whether the means of many groups differ from each other. Biologists find these useful because we often design experiments with many treatments (like different drugs), and then want to know whether some variable (like proliferation of cancer cells) is different between the groups.

In this example, we will consider a data set of low birth weight births from the Centers for Disease Control, which are categorized by region and the tobacco use status of the mother. This is clearly not a manipulative experiment, but we can still apply statistical tools using the observed data. Open Natality.xls, and add the region names and tobacco use code names in the Values boxes in the Variable View. Save this as Natality.sav.

To begin the ANOVA, go to Analyze > Compare Means > One-way ANOVA.

Note: the explanatory variable (Region in this case) has to be in numeric, not string, format for SPSS to run an ANOVA. This means you may need to go into the Variable View, as described below, and make sure that the variable type is numeric. Use values like 1, 2, 3 for the different groups. You can then create labels in the Values box to make the results easier to interpret. Also note that the Explore tool should be used to examine the assumptions of normality and
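To show what a one-way ANOVA actually computes, here is a minimal sketch in plain Python of the F statistic: the variation between group means divided by the variation within groups, each scaled by its degrees of freedom. The function name and the toy groups are hypothetical and are not drawn from the Natality data; SPSS additionally reports the p-value for F, which this sketch does not compute.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: how far each group mean is from the
    # grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of observations around their
    # own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical data: three groups coded 1, 2, 3, as the note above suggests.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_statistic, df_between, df_within = one_way_anova_f(toy_groups)
print(f_statistic, df_between, df_within)
```

A large F means the group means are far apart relative to the scatter inside each group, which is the pattern ANOVA is designed to detect.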
