STATPOINT, LLC

Contents

1. 2.2.2 Reading Data from an Excel, ASCII, XML or Other External Data File; 2.2.3 Transferring Data Using Copy and Paste; 2.2.4 Querying an ODBC Database; 2.3 Manipulating Data; 2.3.1 Copying and Pasting Data; 2.3.2 Creating New Variables from Existing Columns; 2.3.3 Transforming Data; 2.3.4 [title illegible]; 2.3.5 [title illegible]; 2.3.6 Combining Multiple Columns; 2.4 Generating Data; 2.4.1 Generating Patterned Data; 2.4.2 Generating Random Numbers; 2.5 DataBook Properties; 3 Running Statistical Analyses; 3.1 [title illegible]; 3.2 Analysis [remainder illegible]; 3.2.1 Input Dialog Button; 3.2.2 Tables Button; 3.2.3 Graphs Button; 3.2.4 Save Results Button
2. Figure 4-25: Scatterplot Smoothing Dialog Box

Smoothing a scatterplot is done by selecting a set of locations along the X axis and, at each location, plotting a weighted average of the specified fraction of the points that are closest to that location. One of the best methods for smoothing is called LOWESS (LOcally WEighted Scatterplot Smoothing), usually with a smoothing fraction between 40% and 60%. The result of smoothing the Matrix Plot of the automobile data is shown below.

Figure 4-26: Smoothed Matrix Plot using Lowess with a 50% Smoothing Fraction

The smooth helps illustrate the type of relationships between the variables.

4.5 Identifying Points

To display the row number and coordinates corresponding to any point on a graph, hold the mouse button down on the point. A small box will be displayed in the upper right corner of the plot showing the row number and coordinates of the point.

[Figure: Simple Regression - MPG City vs. Weight; Plot of Fitted Model, MPG City = exp(2.1328 + 2799.07/Weight); X axis Weight, Y axis MPG City]

Figure 4-27: Displaying Information about a Selected Point

At the same time, the row number of the point will be placed into the Row field on the analysis toolbar.

[Figure: analysis toolbar with Label and Row fields; Row shows 42]

Figure 4-28: Analy
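The smoothing idea described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the program's own algorithm: it assumes tricube weights and a weighted straight-line fit at each X location, and it omits the robustness iterations of full LOWESS. The function name `lowess_sketch` and the sample data are hypothetical.

```python
def lowess_sketch(xs, ys, fraction=0.5):
    """Locally weighted linear smooth of ys against xs (no robustness passes)."""
    n = len(xs)
    k = max(2, int(round(fraction * n)))  # points used in each local fit
    smoothed = []
    for x0 in xs:
        # Indices of the k points closest to the current location.
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        h = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights (an assumption): 1 at x0, falling to 0 at distance h.
        w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical data lying exactly on a line; the smooth should reproduce it.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
fitted = lowess_sketch(xs, ys, fraction=0.5)
```

A fraction of 0.5 corresponds to the 50% smoothing fraction used for the matrix plot above; smaller fractions follow the data more closely, larger ones give a smoother curve.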
3. In the latest edition of Statistics for Experimenters by Box, Hunter and Hunter (John Wiley and Sons, 2005), the authors present a new display designed to show the results of an ANOVA in graphical format. Their Graphical ANOVA is displayed by default in the lower right pane.

[Figure: Graphical ANOVA, with Groups (P = 0.0000) plotted above Residuals]

Figure 12-8: Graphical ANOVA

Along the bottom of the plot is a dot diagram of the model residuals. In a oneway ANOVA, the residuals are equal to the difference between each observation and the mean of all observations in its group. In the current example, the observed variability in the residuals is indicative of the natural variability amongst widgets made of the same material. Plotted above the central line are scaled deviations of the group means from the overall mean of all n = 48 observations. These group deviations are scaled so that their variability can be compared to that of the residuals. Any groups whose points are too far apart to have easily come from a distribution with spread similar to that of the residuals likely correspond to different populations. In Figure 12-8, group A appears to be well separated from the other groups. Separation of the other three means is less clear. A more formal comparison of the four sample means is described in the next section.

12.3 Comparing Means

If the P-Value in the ANOVA table is small, then the sample means
4. Figure 2-23: Rowwise Statistics Analysis Window

The Total line in the Summary Statistics pane shows statistics for the combined data. If you now press the Save Results button on the analysis toolbar, you can save the combined sample back into a single column of a datasheet.

[Figure: Save Results Options dialog box. Save / Target Variables: Data Column (DATACOL), Code Column (CODECOL), Counts (COUNT), Means (MEAN), Standard Deviations (SIGMA), Minimums (MINIMUM), Maximums (MAXIMUM), Ranges (RANGE); checkboxes Autosave and Save comments]

Figure 2-24: Rowwise Statistics Save Results Dialog Box

Each result that you check will be saved in a column with name equal to the corresponding Target Variable. Saving both the Data Column and Code Column creates the following data structure:

[Figure: untitled datasheet with a DATACOL column of 12 values and a code column containing 1 1 1 1 2 2 2 2 3 3 3 3]

Figure 2-25: New Columns Created by Rowwise Statistics

The 12 data values are now in a single column for use in other statistical procedures.

2.4 Generating Data

STATGRAPHICS Centurion has the ability to generate data and place it in columns of a datasheet. This section describes two important examples:

1. Generating data with
5. 10.1 Running the One Variable Analysis Procedure

To analyze the body temperature data, first load the bodytemp.sf3 file into a datasheet. To accomplish this:

1. Select File - Open - Open Data Source from the main menu.
2. On the Open Data Source dialog box, indicate that you wish to open a STATGRAPHICS Data File.
3. Select bodytemp.sf3 from the list of files on the Open Data File dialog box.

The data should appear as shown below:

[Figure: bodytemp.sf3 datasheet with a Temperature column, a Male/Female column, and a Heart Rate (beats per minute) column]

Figure 10-1: Datasheet with Body Temperature Data

The body temperatures are in the leftmost column, measured in degrees Fahrenheit. The One Variable Analysis procedure can be invoked from the main menu as follows:

1. If using the Classic menu, select Describe - Numeric Data - One Variable Analysis.
2. If using the Six Sigma menu, select Analyze - Variable Data - One Variable Analysis.

On the data input dialog box, indicate the column to be analyzed:

[Figure: One Variable Analysis dialog box with Data set to Temperature, an empty Select field, and buttons OK, Cancel, Delete, Transform, Help]

Figure 10-2: One Variable Analysis Data Input Dialog Box

Leave the Select field blank to analyze all 130 rows. Press OK. An analysis window will appear with four panes:
6. [Figure: X-Axis tab of the Graphics Options dialog box, with Title set to "Weight in pounds", scaling fields From (1500.0), To, By and Skip, checkboxes Rotate Axis Labels, No Power, Log and Hold, and buttons Title Fonts and Tickmark Fonts]

Figure 4-12: X-Axis Tab on Graphics Options Dialog Box

There are several important fields on this dialog box:

1. Title: title plotted along the axis.
2. From, To, By, and Skip: sets the tickmark scaling. The value in Skip is used to prevent displaying certain tickmarks if they run into each other. For example, a value of 1 in the Skip field would skip showing every other tickmark.
3. Rotate X-axis Labels: changes the tickmark labels to vertical.
4. No Power: suppresses the display of large and small numbers using labels such as (X 1000).
5. Log: draws the axis using a base 10 logarithmic scale.
6. Hold: freezes the axis scaling and prevents it from changing. Normally, axes rescale themselves whenever the data change.
7. Fonts: press these buttons to change the color, size, or style of the title and tickmarks.

The output generated from the above dialog box changes is shown below:

[Figure: Fitted S-Curve Model from 93cars File, MPG City = exp(2.1328 + 2799.07/Weight); X axis "Weight in pounds" scaled from 1500 to 4500]

Figure 4-13: Plot after Modifying the Axis Titles and Scaling

4.1.7 Fill Options

Some plots su
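The effect of the From, To, By and Skip fields can be illustrated with a short Python sketch. It is only a model of the behaviour described in the list above; the function name `tickmarks` is hypothetical, and the To = 4500 and By = 500 values are assumptions read off the axis of Figure 4-13 rather than from the dialog box itself.

```python
def tickmarks(start, stop, step, skip=0):
    """Return the tickmark positions that would be displayed.

    skip=0 shows every tickmark; skip=1 shows every other one, and so on.
    """
    count = int(round((stop - start) / step)) + 1
    ticks = [start + i * step for i in range(count)]
    return ticks[:: skip + 1]


all_ticks = tickmarks(1500, 4500, 500)             # every tickmark
every_other = tickmarks(1500, 4500, 500, skip=1)   # Skip = 1
```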
7. Figure 16-26: Optimization Output

The above table estimates that the maximum yield obtainable within the experimental region is about 88.7 grams, at the factor settings shown in the rightmost column. If maximization was not the goal, other goals could be selected using Pane Options:

[Figure: Optimize Response Options dialog box. Factor table with Low, High and Start columns for temperature (150, 180, 165), flow rate, concentration, agitation rate and catalyst. Type of Optimization: Maximize, Minimize, Maintain at. Additional Starting Points: Best Design Point, All Design Points, Best Vertex, All Vertices]

Figure 16-27: Optimization Pane Options

You may elect to maximize the response, minimize it, or maintain it at a specified value. The Low and High fields define the region over which the optimization will be performed. You may also define multiple starting points from which to start the search for the optimal conditions. For complicated response functions, searching from multiple starting points can help ensure that the global optimum is found.

16.6 Further Experimentation

If further experimentation is desirable, STATGRAPHICS Centurion can help in two ways:

1. If you select Augment Design from the main menu, you can have additional runs added to the current experiment at new levels of the factors. This w
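Why multiple starting points help can be seen with a small Python sketch. Everything here is illustrative: the two-peaked response function is invented, the search is a simple compass (pattern) search rather than whatever optimizer the program uses, and the second factor's range of 10 to 12 is an assumption. Only the temperature range of 150 to 180 comes from the dialog box above.

```python
import math


def compass_search(f, start, low, high, tol=1e-6):
    """Maximize f inside the box [low, high] by axis-aligned trial steps."""
    x = list(start)
    steps = [(h - l) / 4 for l, h in zip(low, high)]
    best = f(x)
    while max(steps) > tol:
        improved = False
        for i in range(len(x)):
            for direction in (1, -1):
                trial = list(x)
                # Clip each trial point to the Low/High region.
                trial[i] = min(high[i], max(low[i], x[i] + direction * steps[i]))
                value = f(trial)
                if value > best:
                    x, best, improved = trial, value, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the search once it stalls
    return x, best


def response(p):
    """Hypothetical response with a local peak (0.6) and a global peak (1.0)."""
    t, r = p
    local_peak = 0.6 * math.exp(-((t - 155) / 5) ** 2 - (r - 10.5) ** 2)
    global_peak = 1.0 * math.exp(-((t - 175) / 5) ** 2 - (r - 11.5) ** 2)
    return local_peak + global_peak


low, high = [150.0, 10.0], [180.0, 12.0]
starts = [(150.0, 10.0), (180.0, 12.0), (165.0, 11.0)]  # two vertices and the centre
results = [compass_search(response, s, low, high) for s in starts]
best_point, best_value = max(results, key=lambda item: item[1])
```

A search started at the (150, 10) vertex stalls on the lower peak, while the best of the three searches reaches the higher one, which is the situation the Additional Starting Points options are meant to guard against.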
8. [Figure: One Variable Analysis - Temperature window with four panes: Analysis Summary (data variable Temperature (degrees), 130 values ranging from 96.3 to 100.8, followed by StatAdvisor text), Scatterplot, Summary Statistics, and Box-and-Whisker Plot]

Figure 10-3: One Variable Analysis Window

The top left pane indicates that the sample has 130 values ranging between 96.3 and 100.8 degrees. The top right pane shows a scatterplot of the data, with the points randomly scattered in the vertical direction. Note that the points are densest between 98 and 99 degrees, thinning out at either end. This type of behavior is typical of data that are sampled from a population whose distribution has a well-defined central peak. The bottom panes show summary statistics and a box-and-whisker plot, described in the following sections.

10.2 Summary Statistics

The table in the bottom left pane displays several sample statistics. Additional statistics can be added by maximizing that pane (double-click on it with your mouse) and selecting Pane Options:

[Figure: Summary Statistics Options dialog box listing Average, Standard Deviation, Maximum, Intersextile Range, Median, Coeff. of Variation, Range, Skewness, Mode, Standard Er
9. 1. To print all of the tables and graphs within the analysis window, press the Print button on the analysis toolbar or select Print from the File menu.
2. To print a single table or graph, click within its pane with the alternate mouse button and select Print from the popup menu that is displayed.

When printing the entire analysis, the following dialog box will be displayed:

[Figure: Print Analysis dialog box. Printer: System Printer. Print Range: All Panes, Visible Panes, All Text Panes, All Graphics Panes. Print Quality: 600 dpi. Copies. Checkboxes Print to File and All Analyses. Buttons OK, Cancel, Setup, Help]

Figure 3-16: Dialog Box for Printing an Analysis

Under Print Range, specify the panes to be printed. You may simultaneously print the output in other analysis windows by checking All Analyses. Additional options used when printing are contained on the dialog box accessible by selecting Page Setup from the File menu:

[Figure: Page Setup dialog box. Margins: Top, Bottom, Left, Right (in inches). Header: Text, Date and Time, File Names. Output Graphics Scaling for 1 Pane/Page and Multiple Panes/Page (Horizontal, Vertical). Checkboxes Black and White, Print Background, Wide Lines]

Figure 3-17: Page Setup Dialog Box

On this dialog box, you can:

1. Specify margins for th
10. [Figure: Surface Plot Options dialog box. Horizontal Divisions, Vertical Divisions (10). Type: Wire Frame, Solid, Contoured. Contours: From, To, By; Resolution (101); Lines, Painted Regions, Continuous, Continuous with grid; Contours Below]

Figure 13-19: Surface Plot Pane Options

In the dialog box above, Type has been set to Contoured and the Contour field to Continuous. The final plot is shown below:

[Figure: Fitted Model surface plot with a contour-level legend (0.02, 0.03, 0.04, ...) and a Horsepower axis]

Figure 13-20: Plot of Fitted Model

The cars that use the most fuel are in the back right corner of the plot: big cars with big engines.

Chapter 14

Tutorial #5: Analyzing Attribute Data

Frequency tabulation, contingency tables, and a Pareto analysis.

Each of the first four tutorials deals with variable data, where observations are represented as numbers along a continuous scale. This tutorial examines a set of attribute data, in which each observation represents a category into which an attribute has been classified rather than a measurement. As an example, consider the data contained in the file defects.sf6. A portion of that file is shown below:

Defect          Facility
Misaligned      Virginia
Contaminated    Texas
Contaminated    Virginia
Contaminated    Texas
Missing parts   Texas
Misaligned      Virgin
11. [Figure: Crosstabulation Pane Options dialog box; the visible options end with Deviations, Chi-Squared Values, and Adjusted Residuals]

Figure 14-9: Pane Options Dialog Box for Crosstabulation

An interesting choice for the current data is to display Row Percentages rather than Table Percentages:

[Table: Frequency Table for Defect by Facility. Columns Texas, Virginia, Row Total. Rows include Contaminated, Damaged, Misaligned, Misshapen, Missing parts, Poor color, and Wrong size. Cell contents: observed frequency and percentage of row. Legible rows: Damaged 10 (62.50%), 6 (37.50%), row total 16 (13.33%); Misaligned 8 (28.57%), 20 (71.43%), row total 28 (23.33%). Column totals: 55.83% Texas, 44.17% Virginia, 100.00% overall]

Figure 14-10: Two-Way Table with Row Percentages

The tabled percentage now indicates the percentage that each cell represents of its row. For example, 67.92% of all contaminated items were produced in Texas, while 71.43% of all misaligned items were produced in Virginia. This suggests that some defect types may occur more frequently in one facility than another, a hypothesis that will be tested formally in the following section.

Various graphical displays are also helpful. For example, the barchart shows
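The row percentages in Figure 14-10 are simply each cell count divided by its row total. A minimal Python sketch, using the two rows of the table whose counts are legible in this excerpt (Damaged and Misaligned); the function name `row_percentages` is hypothetical:

```python
def row_percentages(table):
    """Convert a two-way table of counts into percentages of each row total."""
    result = {}
    for row_label, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            continue  # an empty row has no defined percentages
        result[row_label] = {col: 100.0 * n / total for col, n in counts.items()}
    return result


counts = {
    "Damaged": {"Texas": 10, "Virginia": 6},
    "Misaligned": {"Texas": 8, "Virginia": 20},
}
percentages = row_percentages(counts)
misaligned_virginia = round(percentages["Misaligned"]["Virginia"], 2)
```

The Misaligned row reproduces the 71.43% figure quoted in the text (20 of 28 items).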
12. Figure 1-27: One Variable Analysis Window with Added Frequency Histogram

Note that the bars in the histogram extend a little farther above the peak than below it, characteristic of positively skewed data. If you double-click on the histogram to maximize it and then press the Pane Options button, a dialog box is displayed with options specific to the histogram:

[Figure: Frequency Plot Options dialog box with fields Number of Classes, Lower Limit, Upper Limit (30000) and Hold, a Counts group (Relative, Cumulative), and a Plot Type group (Histogram, Polygon)]

Figure 1-28: Frequency Histogram Pane Options Dialog Box

Using this box, the number of bars in the histogram can be changed, as well as the range that they cover. If Number of Classes is set to 15 and the OK button is pressed, the histogram will change to reflect the new selection:

[Figure: One Variable Analysis - Per Capita Income; histogram of frequency against Per Capita Income (X 1000)]

Figure 1-29: Frequency Histogram After Changing the Number of Classes

You may also change the fill pattern and/or color of the bars in the histogram by pressing the Graphics Options button. This displays a tabbed dialog box that allows you to change most features of the graph. If you click on the Fills tab, the following will be displayed:

[Figure: Graphics Options dialog box with tabs Layout, Grid, Fills, Top Title, X-Axis, Y-Axis, Prof
13. Figure 15-4: Process Capability Analysis Window

When a capability analysis is first run, a normal distribution is fit to the data. The Capability Plot shows a histogram of the data together with the best fitting normal distribution:

[Figure: Process Capability for Strength. LSL = 190.0, Nominal = 210.0, USL = 230.0. Normal distribution with Mean = 202.809 and Std. Dev. = 6.23781. Cp = 1.16, Pp = 1.07, Cpk = 0.74, Ppk = 0.68, K = -0.36. Histogram of frequency against Strength]

Figure 15-5: Capability Plot with Normal Distribution

The tall vertical lines in the plot show the location of the specification limits and the nominal value. The shorter vertical lines are located at the sample mean plus and minus 3 standard deviations. Particularly notable in the plot above are:

1. The fitted normal distribution does not match the data very well. Although the normal bell-shaped curve has the same mean and standard deviation as the data, the skewness in the data causes the curve to do a poor job of matching the bars of the histogram.
2. The sample mean is located at 202.8, which is considerably less than the nominal value of 210.
3. Although none of the observations are less than the lower specification limit, a fair amount of the lower tail of the normal distribution is below that limit.
4. The lines at
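Three of the indices on the capability plot can be checked by hand from the specification limits, mean and standard deviation. The Python sketch below uses the standard textbook definitions of Pp, Ppk and K, which is an assumption about what the program computes, although the results agree with the plotted values. Cp and Cpk are left out because they rest on a separate sigma estimate that this excerpt does not show.

```python
def capability_indices(mean, sigma, lsl, usl, nominal):
    """Return (Pp, Ppk, K) using the standard long-term definitions."""
    pp = (usl - lsl) / (6 * sigma)                    # spec width vs. 6-sigma spread
    ppk = min(usl - mean, mean - lsl) / (3 * sigma)   # distance to the nearer limit
    k = (mean - nominal) / ((usl - lsl) / 2)          # off-centre fraction, signed
    return pp, ppk, k


pp, ppk, k = capability_indices(
    mean=202.809, sigma=6.23781, lsl=190.0, usl=230.0, nominal=210.0
)
```

Rounded to two decimals these give Pp = 1.07, Ppk = 0.68 and K = -0.36. K is negative because the sample mean sits below the nominal value, and Ppk is smaller than Pp for the same reason: the process is off-centre toward the lower specification limit.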
14. [Figure: Modify Column dialog box; the visible type options include Time (HH:MM), Date-Time (HH:MM), Time (HH:MM:SS), Date-Time (HH:MM:SS), Fixed Decimal, and Formula]

Figure 2-2: Dialog Box Used to Modify Column Properties

You may specify:

1. Name: from 1 to 32 characters. When performing statistical analyses, columns are identified using these names. Each column in a datasheet must have a unique name, though columns in different datasheets may have the same name. Names may include any character except the following 19 symbols: [symbols illegible]. The restricted characters are those that need to be parsed when used in algebraic expressions such as 100*MPG City/MPG Highway. In addition, names may not begin with a numeric digit. Spaces are allowed in variable names. Variable names are not case sensitive.
2. Comment: from 0 to 64 characters providing additional information about the contents of the column.
3. Type: the type of data permitted in the column. The following types may be specified:

Type                 Contents                            Example
Numeric              Any valid number                    3.14
Character            An alphanumeric string              Chevrolet
Integer              An integer number                   105
Date                 Month, day and year                 4/30/05
Month                Month and year                      4/05
Quarter              Quarter and year                    Q2/05
Time (HH:MM)         Hour and minute                     3:15
Time (HH:MM:SS)      Hour, minute and second             3:15:53
Date-Time (HH:MM)    Month, day, year, hour and minute   4/30/05 3:15
Date-Time            Month, day
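The naming rules listed in this excerpt translate directly into a small validation routine. The Python sketch below is illustrative only. Because the list of 19 restricted symbols is not legible here, `ASSUMED_RESTRICTED` is a hypothetical subset made up of common arithmetic symbols, and the function name is likewise invented.

```python
# Hypothetical subset: the actual 19 restricted symbols are not legible in the source.
ASSUMED_RESTRICTED = set("+-*/^()")


def is_valid_column_name(name, existing_names=()):
    """Check a proposed column name against the rules described above."""
    if not 1 <= len(name) <= 32:
        return False                      # must be 1 to 32 characters
    if name[0].isdigit():
        return False                      # may not begin with a numeric digit
    if any(ch in ASSUMED_RESTRICTED for ch in name):
        return False                      # symbols parsed in algebraic expressions
    # Names are not case sensitive, so uniqueness is checked case-insensitively.
    return name.lower() not in {other.lower() for other in existing_names}


ok_with_space = is_valid_column_name("MPG City")
duplicate = is_valid_column_name("mpg city", existing_names=["MPG City"])
```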
15. [Figure: column definition dialog box; the visible type options end with Time (HH:MM:SS), Date-Time (HH:MM:SS), Fixed Decimal, and Formula]

Figure 1-13: Dialog Box Used to Define Columns

Each column in a STATGRAPHICS Centurion datasheet has a name, comment, and type associated with it:

Name: Give each column a unique name containing from 1 to 32 characters. These names are used by the program to identify the variables to be analyzed when a statistical procedure is selected. They also serve as default labels on most graphs. Names may contain any characters except those used to indicate arithmetic operations. Names may not, however, begin with a numeric digit. Names are not case sensitive. Spaces are permitted. The program will display an error message if you try to specify an invalid name.

Comment: Enter a comment identifying the data in the column. Comments may have up to 64 characters and are optional.

Type: Specify the type of data to be entered in the column. In this case, the first column, containing state names, must be set to Character. The other columns may be left as Numeric, or set to Integer or Fixed Decimal if you want to restrict the type of data that may be entered. For detailed information on column types, see Chapter 2.

After defining each column, press OK. When all five columns have been defined, press Cancel. An empty datasheet will be displayed showing the columns you have created:

[Figure: empty datasheet whose column headings include Population and Percent Female]

Figure 1-1
16. [Figure: StatGallery window with navigation buttons Next Page, Prev Page, First Page, and Last Page]

Figure 6-1: The StatGallery Window

The buttons along the top of the window permit you to navigate to other pages in the gallery. If you want to change the number of graphs displayed on a page, press the alternate mouse button and select Arrange Panes. Arrangements containing up to 9 graphs may be selected for a single page.

[Figure: StatGallery Options dialog box. Arrangement: Two by Two, Top and Bottom, Left and Right, Two Left One Right, One Left Two Right, Three by Three, One Position Only, and By Columns (with a Rows entry for each of columns 1, 2 and 3)]

Figure 6-2: Alternative StatGallery Page Configurations

The seven arrangements on the left each correspond to rectangular sets of rows and columns. The By Columns option allows you to create an arrangement with different numbers of rows in each of 3 columns. You may also use the slider bars in the StatGallery window to move the panes into any arrangement you wish.

6.2 Copying Graphs to the StatGallery

To place a graph in the StatGallery, you must first copy it to the Windows clipboard while in the analysis window where it was created. For example, suppose you wanted to display contour plots created in the DOE Analyze Design procedure at two different levels of a selected experimental factor. The steps to be followed are:

1. Configure a selected page of th
17. [Figure: Simple Regression analysis window, including a table of unusual residuals with StatAdvisor text and a plot of Studentized residual against Weight]

Figure 13-6: Simple Regression Analysis Window

The Analysis Summary in the top left pane summarizes the fit:

Simple Regression - MPG City vs. Weight
Dependent variable: MPG City (miles per gallon in city driving)
Independent variable: Weight (pounds)
Linear model: Y = a + b*X

Coefficients
Parameter    Least Squares Estimate   Standard Error   T Statistic   P-Value
Intercept    47.0484                  1.67991          28.0064       0.0000
Slope        -0.00803239              0.000536985      -14.9583      0.0000

Analysis of Variance (the Model row is not legible)
Source          Sum of Squares   Df   Mean Square
Residual        840.051          91   9.23135
Total (Corr.)   2905.57          92

Correlation Coefficient = -0.843139
R-squared = 71.0883 percent
R-squared (adjusted for d.f.) = 70.7705 percent
Standard Error of Est. = 3.03831
Mean absolute error = 1.99274
Durbin-Watson statistic = 1.64586 (P = 0.0405)
Lag 1 residual autocorrelation = 0.176433

Figure 13-7: Simple Regression Analysis Summary

Of the many statistics in the above table, the following are the most important:

1. Coefficients: the estimated model coefficients. The fitted model that would be used
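Several entries in the Analysis Summary can be cross-checked from one another, which is a useful habit when reading regression output. The Python sketch below recomputes the T statistics as estimate divided by standard error and R-squared as the squared correlation coefficient. It assumes the slope and correlation are negative (MPG falls as weight rises); the function names are hypothetical.

```python
# Values taken from the Analysis Summary; negative signs are an assumption
# based on the direction of the relationship.
INTERCEPT, SE_INTERCEPT = 47.0484, 1.67991
SLOPE, SE_SLOPE = -0.00803239, 0.000536985
CORRELATION = -0.843139


def t_statistic(estimate, standard_error):
    """T statistic for testing whether a coefficient equals zero."""
    return estimate / standard_error


def predict_mpg_city(weight_pounds):
    """Fitted linear model: MPG City = a + b * Weight."""
    return INTERCEPT + SLOPE * weight_pounds


t_intercept = t_statistic(INTERCEPT, SE_INTERCEPT)
t_slope = t_statistic(SLOPE, SE_SLOPE)
r_squared_percent = 100 * CORRELATION ** 2
mpg_at_3000 = predict_mpg_city(3000)
```

The recomputed values agree with the table to the precision shown (about 28.006, -14.958 and 71.088 percent), and the fitted line predicts roughly 23 miles per gallon for a 3000-pound car.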
18. select Improve - Experimental Design Analysis - Analyze Design.

The data input dialog box displays the two response variables:

[Figure: Analyze Design dialog box listing strength and yield in the Data field, with an empty Select field and buttons OK, Cancel, Delete, Transform, Help]

Figure 16-12: Analyze Design Data Input Dialog Box

Separate models would be built for each. The analysis window for yield initially shows the following output:

[Figure: Analyze Experiment - yield window. Panes: Analysis Summary (file name tutorial.sfx, comment "Tutorial 7", table of estimated effects for yield), Standardized Pareto Chart for yield, and Main Effects Plot for yield with factors temperature, flow rate, concentration, agitation rate and catalyst]

Figure 16-13: Analyze Design Analysis Window

The window displays four panes:

1. Analysis Summary: lists the estimated main effects and interactions.
2. ANOVA Table: contains P-Values that can be used to test the statistical significance of each effect.
3. Standardized Pareto Chart: displays the effects in decreasing order of significance wi
19. Type                  Contents                                      Example
Date-Time (HH:MM:SS)  Month, day, year, hour, minutes and second    4/30/05 3:15:53
Fixed Decimal         Number with 1 to 9 places                     34.10
Formula               Calculated from other columns                 MPG City/MPG Highway

Figure 2-3: Column Types

When entering data into a datasheet, the data must conform to the type of column in which it is entered. For example, attempting to type a name into a numeric column will result in it being rejected.

When entering data, the format of the data must also match your current Windows settings. In particular, STATGRAPHICS Centurion honors the current Windows settings for:

1. Decimal separator for numeric values.
2. Time format and time separator for times.
3. Short date format and date separator for dates.

To check the settings of your computer, access the Windows Control Panel. When entering a date, you must use the format specified on the Edit - Preferences dialog box, either 4-digit years as in 4/30/2005 or 2-digit years as in 4/30/05. If a 2-digit year is used, it is assumed to fall within the years 1950 through 2049.

More information about formula columns may be found in a later section of this chapter titled Manipulating Data.

2.2 Accessing Data

Chapter 1 showed how data can be entered into a datasheet by hand. More often, users will access data that already exists in another file or application. There are 3 basic ways of putting existing data into a STATGRAPHICS Centurion datasheet:

1. Read an existing data file. If the data has previous
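The 2-digit year rule stated above (years are assumed to fall within 1950 through 2049) amounts to a fixed pivot at 50. A minimal Python sketch of that rule, with a hypothetical function name:

```python
def expand_two_digit_year(yy):
    """Map a 2-digit year to the 1950-2049 window described above."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year between 0 and 99")
    # 50..99 belong to the 1900s, 00..49 to the 2000s.
    return 1900 + yy if yy >= 50 else 2000 + yy


year_from_example = expand_two_digit_year(5)   # the "05" in 4/30/05
```

Under this rule the date 4/30/05 is read as 4/30/2005, while a year entered as 50 is read as 1950 rather than 2050.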
20. 13.1 Correlation Analysis; 13.2 [title illegible]; 13.3 Fitting a Nonlinear Model; 13.4 Examining the Residuals; 13.5 Multiple Regression; Tutorial #5: Analyzing Attribute Data; 14.1 [first word illegible] Attribute Data; 14.2 [title illegible]; 14.3 Crosstabulation; 14.4 Comparing Two or More Samples; 14.5 [title illegible]; Tutorial #6: Process Capability Analysis; 15.1 Plotting the Data; 15.2 Capability Analysis Procedure; 15.3 Dealing with Non-Normal Data; 15.4 Capability [remainder illegible]; 15.5 Six Sigma Calculator; Tutorial #7: Design of Experiments; 16.1 Selecting a Screening [remainder illegible]; 16.2 Creating the Design; 16.3 Analyzing the [remainder illegible]; 16.4 Plotting the Fitted Model; 16.5 Optimizing the Response
21. [Figure: datasheet containing the final design, with a block column, factor columns showing settings such as 150.0, 165.0 and 180.0 and 10.0, 11.0 and 12.0, and empty response columns]

Figure 16-10: Final Design

The datasheet contains a column with block numbers, 5 columns with the settings of the experimental factors, and 2 columns for entering the responses once the experimental runs have been performed. After creating the design, save it by selecting File - Save - Save Design File from the main menu.

Before performing the experiment, it is helpful to select Alias Structure from the analysis toolbar in the Screening Designs Attributes window, which displays the following:

[Figure: Alias Structure pane. The StatAdvisor: "The alias structure shows which main effects and interactions are confounded with each other. Since this design is resolution IV, the main effects will be clear of the two-factor interactions. However, at least one two-factor interaction will be confounded with another two-factor interaction or a block effect. You will not be able to estimate these interactions. Check the table to determine which interactions are confounded."]

Figure 16-11: Alias Structure of Selected Design

Each line of the above table indicates a quantity th
22. 2. By creating a new column in any of the 10 datasheets in the DataBook.

For example, suppose information was desired about the ratios of miles per gallon in city driving versus miles per gallon in highway driving for each automobile in the 93cars data file. That file contains 2 separate columns, one named MPG City and one named MPG Highway. To summarize the distribution of the ratios, you could select the One Variable Analysis procedure and specify the ratio directly in the Data field of the data input dialog box:

[Figure: One Variable Analysis dialog box with the expression 100*MPG City/MPG Highway entered in the Data field; the column list includes Domestic, Horsepower and Length]

Figure 2-10: Creating a Transformation "On-The-Fly"

When OK is pressed, an analysis will be generated for 100 times the ratio without ever changing the data in the datasheet:

[Figure: One Variable Analysis - 100*MPG City/MPG Highway window. Data variable: 100*MPG City/MPG Highway; 93 values ranging from 64.0 to 93.9394; panes include the StatAdvisor text, a scatterplot, and Summary Statistics for 100
23. -2 and +2. Even removing the largest data value only reduces the standardized skewness to 2.81.

A frequency histogram may also be displayed by pressing the Graphs button on the analysis toolbar and selecting Frequency Histogram on the Graphs dialog box:

[Figure: Histogram of frequency against Strength]

Figure 15-2: Frequency Histogram

The data appear quite clearly to be positively skewed, extending farther to the right of the peak than to the left.

Non-normal data such as that shown above are commonplace. One typical approach to dealing with such data, unfortunately, is to simply ignore the non-normality and calculate indices such as Cpk using formulas designed for data from a normal distribution. As will be seen in this tutorial, ignoring non-normality can lead to incorrect results, often significantly overestimating or underestimating the percent of the product that is beyond the specification limits.

15.2 Capability Analysis Procedure

STATGRAPHICS contains procedures for performing a capability analysis on data collected either one at a time (individuals data) or in subgroups (such as 5 observations every hour). Assuming the sample data are individuals, a process capability analysis may be conducted by:

1. If using the Classic menu, selecting SPC - Capability Analysis - Variables - Individuals.
2. If using the Six Sigma menu, selecting Analyze - Variabl
24. [Figure: capability plot of Strength, scaled from 180 to 240, with the fitted largest extreme value distribution]

Figure 15-10: Fitted Largest Extreme Value Distribution

Notice that the distribution is skewed to the right, matching the observed data much better than the normal distribution. The short vertical lines have been positioned at equivalent 3-sigma limits, i.e., limits within which the same 99.73% of the fitted distribution is located as is the case for the mean plus and minus 3 sigma for a normal distribution. Note that these limits are not symmetrically spaced about the peak of the distribution due to its positive skewness.

The Analysis Summary shows a dramatic difference in the estimated percent of the product that is likely to be out of spec compared to the earlier fitted normal distribution:

Process Capability Analysis (Individuals) - Strength
Data variable: Strength (specs are 190-230)
Transformation: none
Distribution: Largest Extreme Value
  sample size = 100
  mode = 200.036
  scale = 4.80179
  (mean = 202.808)
  (sigma = 6.15853)

Equivalent 6.0 Sigma Limits
  99.865 percentile = 231.761
  median = 201.796
  0.134996 percentile = 190.969

Specifications    Observed Beyond Spec.   Estimated Beyond Spec.   Defects Per Million
USL = 230.0       0.000000%               0.194758%                1947.58
Nominal = 210.0
LSL = 190.0       0.000000%               0.030805%                308.05
Total             0.000000%               0.225563%                2255.63

Figure 15-11: Analysis Summary after Fitting Largest Extreme Value Distribut
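The estimated percentages and percentiles in the Analysis Summary follow from the cumulative distribution function of the largest extreme value (Gumbel) distribution, F(x) = exp(-exp(-(x - mode)/scale)). The Python sketch below assumes that standard parameterization; the function names are hypothetical, and small differences in the last digits come from the rounded mode and scale.

```python
import math

MODE, SCALE = 200.036, 4.80179   # fitted parameters from the Analysis Summary
LSL, USL = 190.0, 230.0          # specification limits


def lev_cdf(x, mode=MODE, scale=SCALE):
    """P(X <= x) for the largest extreme value distribution."""
    return math.exp(-math.exp(-(x - mode) / scale))


def lev_quantile(p, mode=MODE, scale=SCALE):
    """Inverse of lev_cdf: the value below which a fraction p of the distribution lies."""
    return mode - scale * math.log(-math.log(p))


pct_above_usl = 100 * (1 - lev_cdf(USL))
pct_below_lsl = 100 * lev_cdf(LSL)
upper_limit = lev_quantile(0.99865)      # upper equivalent 3-sigma limit
median = lev_quantile(0.5)
lower_limit = lev_quantile(0.00134996)   # lower equivalent 3-sigma limit
```

The results come out at roughly 0.195% above the USL and 0.031% below the LSL, with equivalent limits near 190.97 and 231.76. The limits sit about 9 units below the mode but almost 32 units above it, which is the asymmetry noted in the text.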
25. [Figure: Open Data File dialog box with a list of data files and Cancel and Help buttons]

Figure 1-18: Open Data File Dialog Box

The sample file is located in the default data directory, usually c:\Program Files\Statgraphics\STATGRAPHICS Centurion XV\Data. Opening the file loads the full 51 rows of data into the datasheet:

[Figure: census2000.sf6 datasheet with one row per state (California, Colorado, Connecticut, Delaware, D.C., Florida, Georgia, Hawaii, ...) and columns for Population, Median Age, Percent Female and Per Capita income]

Figure 1-19: Datasheet Showing Contents of Census2000.sf6 File

1.5 Analyzing the Data

Once the data have been loaded into the STATGRAPHICS Centurion DataBook, any of the more than 150 statistical procedures may be applied to it, in any of several ways:

1. By selecting the desired procedure from the main menu.
2. By pressing one of the shortcut buttons on the toolbar.
3. By invoking the StatWizard by pressing the button on the toolbar displaying a wizard's cap.

Let's begin by summarizing the variability in per capita income amongst the states. The best procedure for summarizing a single column of numeric da
26. [Figure 10-17: Frequency Tabulation Table, reporting Mean = 98.2295 and Standard deviation = 0.70038] Note that observations are counted as falling within an interval if they are greater than the lower limit of the interval and less than or equal to the upper limit. The column on the far right
27. If the data in the data sources change between the time a StatFolio is saved and the time it is reloaded, the analyses will change to reflect the new values. This provides a simple method for rerunning analyses that need to be repeated on a periodic basis, without having to recreate them.
NOTE 2: The data and the StatFolio are stored in different files. If you need to move a StatFolio from one computer to another, be sure to move the data file(s) as well.
Chapter 2: Data Management
Accessing data from files and databases, transforming data values, generating patterned data.
In order to analyze data in STATGRAPHICS Centurion, the data must first be placed in the STATGRAPHICS Centurion DataBook. The DataBook is a tabbed window consisting of 10 datasheets. A datasheet is a rectangular array of rows and columns. Each column in a datasheet represents a variable. Each row represents a case or observation. For example, the datasheet below contains information on a number of different makes and models of automobiles:
[Datasheet figure: 93cars.sf6, with columns such as Model, Min Price, and Mid Price]
28. [Figure 5-6: StatPublish Dialog Box for Creating HTML Output] The fields on this dialog box are used to specify:
• HTML file on local directory: This is the name of the HTML file that will hold the Table of Contents for the StatFolio. It will list the contents of the StatFolio and provide links to other HTML files corresponding to each window in the StatFolio. By default, it is placed in the same directory as the StatFolio itself, with the same name as the StatFolio but with an HTML extension rather than .sgp. To view a published StatFolio, a browser would normally be directed to open this file.
• FTP site URL: All published output is first placed in the local directory indicated above. This includes HTML files, image files containing the graphs, and other support files. If an entry is made in the FTP site URL field, all of the files will also be uploaded to the location referred to by the URL. This would commonly be a directory on a server. Note that you must have FTP write access to the indicated URL, which may have to be set up by the network administrator.
• FTP Username: user name for FTP access to the indicated URL.
• FTP Password: password for FTP access to the indicated URL.
• Include: Check all StatFolio windows that are to be published.
• Graph Width and Height in Pixels: th
29. [Figure 2-17: X-Y Plot Procedure Using Transformed Values of Weight (MPG City vs. LOG(Weight))] STATGRAPHICS Centurion operators may also be used when creating formula columns, similar to the illustration in the preceding section.
2.3.4 Sorting Data
The contents of a datasheet may be sorted by highlighting the column or columns to be used to define the sort order and then selecting Sort Data from the Edit menu. For example, to sort the data in the 93cars file according to miles per gallon, highlight the columns named MPG City and MPG Highway and then select Sort Data. The following dialog box will be displayed:
[Figure 2-18: Sort Options Dialog Box; Order: Ascending, Descending, or Random; Apply to: Entire File or Selected Range Only; Primary Column: MPG City; Secondary Column: MPG Highway]
You may specify either one or two columns on which to base the sort, as well as the sort order. Sorting by MPG City and then MPG Highway sorts first by miles per gallon in city driving and then, for automobiles with the same value of MPG City, by miles per gallon in highway driving.
[Figure 2-19: 93cars.sf6 File after Sorting]
NOTE: T
30. [Figure 10-25: Dialog Box for Statistical Tolerance Limits] The resulting output is shown below:
Statistical Tolerance Limits
Sample size = 129
Sample mean = 98.2295
Sample standard deviation = 0.70038
95.0% tolerance interval for 99.0% of the population
    Xbar ± 2.88436 sigma
    Upper: 100.25
    Lower: 96.2093
The StatAdvisor
Assuming that the data comes from a normal distribution, the tolerance limits state that we can be 95.0% confident that 99.0% of the distribution lies between 96.2093 and 100.25. This interval is computed by taking the mean of the data ± 2.88436 times the standard deviation.
Figure 10-26: Analysis Summary for Statistical Tolerance Limits
The interpretation of the StatAdvisor summarizes the results succinctly. The confidence level and the percentage of the population that is bound may be changed using Pane Options. Also created by the Statistical Tolerance Limits procedure is a Tolerance Plot, which displays the tolerance limits:
[Figure 10-27: Tolerance Plot; Normal Tolerance Limits, n = 129, mean = 98.2295, sigma = 0.70038, UTL = 100.25, Conf. level = 95.0%, Pop. prop. = 99.0%]
No more than one individual out of every 100 is likely to lie outside the calculated limits.
Tutorial 2: Comparing Two Samples Gr
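The arithmetic behind the tolerance limits is simple enough to verify directly. The sketch below takes the multiplier 2.88436 from the output above as given (it does not attempt to derive that factor), and the function name is illustrative only.

```python
def tolerance_limits(mean, sd, k):
    """Two-sided normal tolerance limits of the form mean +/- k * sd."""
    half_width = k * sd
    return mean - half_width, mean + half_width

# Values reported in the Statistical Tolerance Limits output
sample_mean = 98.2295
sample_sd = 0.70038
k_factor = 2.88436  # multiplier for 95% confidence, 99% of population, n = 129

lower, upper = tolerance_limits(sample_mean, sample_sd, k_factor)
print(f"Lower: {lower:.4f}  Upper: {upper:.4f}")
```

The result reproduces the reported limits to within the rounding of the inputs.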
31. (Weight-MIN(Weight))/(MAX(Weight)-MIN(Weight))
The parentheses are necessary to ensure that the subtractions are done before the division. Expressions are not case sensitive, nor is the inclusion of blank spaces relevant.
Every data input dialog box includes a button labeled Transform, as in Figure 2-14. This button may be used to help create STATGRAPHICS Centurion expressions if you do not remember which operators to use. If you place the cursor in a data field and then press Transform, a dialog box similar to that shown below will be displayed:
[Figure 2-16: Dialog Box Displayed by the Transform Button]
Along the right is a list of all STATGRAPHICS Centurion operators, with an indication of the number of arguments that must be supplied. Clicking on an operator name places it in the Expression field. After you replace the question marks with column names or numbers, you may press the Display button to see the first several values generated by the expression, or press the OK button to have the expression entered into the data input dialog box.
NOTE: You do not need to use the Transform button if you would rather type the expression yourself on the data input dialog box.
Once a transformation has been specified on the data input dialog box, as in Figure 2-14, that transformation will be used when the procedure is run.
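For readers who want to see what the rescaling expression (Weight-MIN(Weight))/(MAX(Weight)-MIN(Weight)) computes, here is a minimal sketch outside STATGRAPHICS. The weights are hypothetical values chosen for illustration, not rows from the 93cars file, and the function name is an assumption.

```python
def min_max_fraction(values):
    """Rescale each value to its fractional position between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the fraction is undefined")
    # Parentheses force both subtractions to happen before the division
    return [(v - lo) / (hi - lo) for v in values]

weights = [1600, 2100, 3100, 4600]  # hypothetical vehicle weights in pounds
fractions = min_max_fraction(weights)
print(fractions)

# Without parentheses, division binds tighter than subtraction, so the
# expression v - lo / hi - lo gives a different (and unintended) result
v, lo, hi = 3100, 1600, 4600
print((v - lo) / (hi - lo), v - lo / hi - lo)
```

The smallest value maps to 0, the largest to 1, and everything else falls in between, which is the behavior the parentheses are protecting.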
32. apply to all tables and graphs in the current analysis.
Pane options: Selects options that apply only to the currently maximized table or graph.
Graphics options: Allows you to change the titles, scaling, and other features of the currently maximized graph.
Figure 1-25: Important Buttons on the Analysis Toolbar
Additional buttons to the right allow other actions when a graph is maximized, as explained in Chapter 5. For example, if the Graphs button is pressed, a dialog box will be displayed listing other graphs available in the One-Variable Analysis procedure:
[Figure 1-26: List of Available Graphs; Scatterplot, Box-and-Whisker Plot, Frequency Histogram, Quantile Plot, Normal Probability Plot, Density Trace, Symmetry Plot]
Checking the box next to Frequency Histogram and pressing OK adds a third pane to the right-hand side of the analysis window:
[Analysis window: One-Variable Analysis - Temperature; Data variable: Temperature (degrees); 130 values ranging from 96.3 to 100.8]
33. (assumes normality) Test statistic = 2.75487, P-Value = 0.676064
Figure 10-11: Outlier Identification Output after Removing Row 15
The most extreme value among the remaining observations is row 95. Since the P-value for Grubbs' test is well above 0.05, all of the remaining observations appear to have come from the same population. Ideally, one would go back to the original study and attempt to find an assignable cause for the abnormal value for individual 15. Since that is impossible to do now, we will accept the results of Grubbs' test and remove row 15 from all subsequent calculations. Modifying the data input dialog box for the One-Variable Analysis in the same manner as in Figure 10-10, the modified summary statistics are shown below:
[Figure 10-12: Summary Statistics after Removing Row 15; Summary Statistics for Temperature, including Interquartile range = 0.9]
10.5 Histogram
Another common graphical display that illustrates a sample of measurement data is the frequency histogram. Returning to the One-Variable Analysis procedure, a histogram may be created by pressing the Graphs button on the analysis toolbar and selecting Frequency Histogram. The default histogram is shown below:
[Figure 10-13: Frequency Histogram with Default Classes; horizontal axis: Temperature, 98 to 101]
The height of each bar in the histogram re
34. [Figure 14-2: Tabulation Analysis Window, showing a frequency table, barchart, and piechart for Defect]
The upper left pane shows that 9 unique values were found in the n = 120 rows. The barchart and piechart on the right illustrate the observed frequency of each type of defect, which is also tabulated in the bottom left pane. The most common type of defect is Contaminated, which represents about 44% of all defects.
14.2 Pareto Analysis
The Frequency Tabulation procedure orders the types of defects in alphabetical order. To order the types from most frequent to least frequent, use the Pareto Analysis procedure instead. The Pareto analysis is accessed by:
1. If using the Classic menu, select SPC - Quality Assessment - Pareto Analysis.
2. If using the Six Sigma menu, select Analyze - Attribute Data - One Factor - Pareto Analysis.
The data input dialog box should be completed as shown below:
[Pareto Analysis data input dialog box; Data: Untabulated Observations (selected) or Tabulated Counts; Defect entered as the observations column]
35. [Figure 2-14: Transforming Data on a Data Input Dialog Box]
Instead of typing the name of a column in a data field, you may type a STATGRAPHICS Centurion expression. STATGRAPHICS Centurion expressions are formulas that operate on data using algebraic symbols and special operators. A wide variety of operators are available, as described in the PDF document titled STATGRAPHICS Operators. The table below shows commonly used operators:
Operator       Use                       Example
+              Addition                  X+100
-              Subtraction               X-100
/              Division                  X/100
*              Multiplication            X*100
^              Exponentiation            X^2
ABS            Absolute value            ABS(X)
AVG            Average                   AVG(X)
DIFF           Backward differencing     DIFF(X)
EXP            Exponential function      EXP(10)
LAG            Lag by k periods          LAG(X,k)
LOG            Natural logarithm         LOG(X)
LOG10          Log base 10               LOG10(X)
MAX            Maximum                   MAX(X)
MIN            Minimum                   MIN(X)
SD             Standard deviation        SD(X)
SQRT           Square root               SQRT(X)
STANDARDIZE    Conversion to Z-scores    STANDARDIZE(X)
Figure 2-15: Commonly Used STATGRAPHICS Operators
When constructing a STATGRAPHICS Centurion expression, multiple operators may be combined using normal algebraic precedence rules. For example, the following expression converts each value in the column named Weight to a fraction giving its position between the minimum and maximum values amongst all of the automobiles: (Weight-MIN
36. data and are not, in general, very useful. STATGRAPHICS converts everything back to the original units automatically for you.
To compare the two approaches, the Probability Plot may be selected from the Graphs dialog box for each approach and pasted side by side in the StatGallery:
[Figure 15-14: Probability Plots in the StatGallery; one plot of Strength against the Largest Extreme Value Distribution and one plot of transformed Strength against the Normal Distribution]
If the assumed distribution is correct, the points should fall along a diagonal line when displayed on this plot. Both methods appear to have handled the non-normality well, making it difficult to choose between them.
Whichever method is used, it is important to establish a protocol for how to handle a particular variable such as Strength and apply that same protocol every time such data is analyzed. It would be a mistake to do the type of exploratory data analysis described in this chapter every time a set of similar data was collected. Instead, this type of analysis should be done once to determine how a selected variable needs to be handled, and then the selected approach should be applied
37. diamonds creates the following plot:
[Figure 4-9: Plot after Modifying the Point Type; Plot of Fitted Model, MPG City = exp(2.1328 + 2799.07/Weight); X axis: Weight, 1600 to 4600]
4.1.5 Top Title Options
The Top Title tab is used to specify the text and font type for the information displayed above a graph:
[Figure 4-10: Top Title Tab on Graphics Options Dialog Box]
Graphs have up to 2 title lines. An entry such as 3 in a title field indicates that the text is automatically generated by the analysis procedure, usually containing variable names or calculated statistics. You may change any title, including those that are automatically created. You may also drag the title to a new location with your mouse.
[Figure 4-11: Plot after Modifying the Top Title; titles: "Fitted S-Curve Model from 93cars File" and "MPG City = exp(2.1328 + 2799.07/Weight)"]
4.1.6 Axis Scaling Options
The Graphics Options dialog box also contains tabs that allow you to modify the axis titles and scaling:
[Graphics Options dialog box; tabs include Layout, Grid Lines, Points, and Top Title]
38. event. "Small enough" is usually defined as less than 0.05, which is called the significance level or alpha risk of the test procedure. If there is less than a 5% chance that the sample would have arisen given that the null hypothesis was true, then the null hypothesis is rejected.
In this example, the test statistic equals the largest absolute Studentized Value Without Deletion, 3.479. It has a P-value equal to 0.0484. Since the P-value is less than 0.05, we would reject the null hypothesis, thereby concluding that row 15 is an outlier compared to the rest of the data sample.
You can remove row 15 by pressing the Data Input button on the analysis toolbar and entering an expression in the Select field such as that shown below:
[Figure 10-10: Outlier Identification Dialog Box with Entry Removing Outlier; Data: Temperature; Select: Temperature<100]
Since row 15 is the only observation that exceeds 100 degrees, the Select field entry above will select only the other n = 129 rows. The modified Outlier Identification output is shown below:
[Outlier Identification output: Sorted Values, Studentized Values, and Grubbs' Test results]
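To make the test statistic concrete, the following sketch computes the largest absolute studentized value (without deletion) for a small sample and then applies a selection equivalent to Temperature<100. The temperature readings are hypothetical, not the manual's data, and the function name is an assumption; the P-value is not computed here because it requires quantiles of the t distribution.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (index, statistic) for the most extreme studentized value.

    Each value is studentized using the mean and standard deviation of the
    full sample, i.e. without deleting the point being examined.
    """
    m, s = mean(values), stdev(values)
    studentized = [(v - m) / s for v in values]
    idx = max(range(len(values)), key=lambda i: abs(studentized[i]))
    return idx, abs(studentized[idx])

# Hypothetical body temperatures; the sixth reading is suspiciously high
temps = [98.2, 97.9, 98.6, 98.4, 98.0, 100.8, 98.3, 98.1]

row, g = grubbs_statistic(temps)
print(f"most extreme observation: index {row}, statistic {g:.3f}")

# Equivalent of entering Temperature<100 in the Select field
selected = [t for t in temps if t < 100]
print(f"{len(selected)} of {len(temps)} rows selected")
```

Rerunning the same function on the selected rows mirrors what the procedure does after the Select entry is applied: the next most extreme value is identified and tested in turn.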
39. [Figure 16-1: Initial Screening Design Selection Dialog Box; fields include design-type checkboxes (such as Plackett-Burman resolution V, IV, and III), Maximum runs per block, Minimum centerpoints per block, and Experimental error sigma]
The input required is:
Number of factors: the number of experimental factors X to be varied during the experiment. In the example, the engineer wished to study 5 factors.
Designs to Consider: the type of designs to be evaluated. STATGRAPHICS Centurion will attempt to find the best design of each specified type that meets the requirements. The available designs are:
1. Factorials: runs are made at all combinations of a high and low level of the factors.
2. Fractional factorials: runs are made at a subset of the runs in the full factorial, where the subset equals one-half, one-fourth, one-eighth, and so on.
3. Irregular fractions: runs are made at a subset of the runs in the full factorial, but the fraction is irregular, such as three-eighths of the runs.
4. Mixed level factorials: one factor is run at 3 levels while all the others are run at 2 levels.
5. Plackett-Burman designs: two-level designs where the number of runs is not a power of 2.
Designs are classified according to their resolution:
o Resolution V designs can estimate all main effects and two-factor interactions.
o Resolution IV designs can estimate all m
40. locate the median or 50th percentile, which is the value of temperature at which the proportion displayed on the vertical axis equals 0.5.
A table of percentiles may also be created by selecting Percentiles from the Tables list:
[Figure 10-19: Percentiles Table; Percentiles for Temperature with Lower Limit and Upper Limit columns; output includes 95.0% normal confidence limits]
The p-th percentile estimates the value of temperature below which p% of the population lies. Pane Options has been used to add 95% confidence limits to those percentiles, based on the assumption that the sample comes from a normal distribution. For example, the 90th percentile is the value of temperature exceeded by only 10% of the individuals in the population. The best estimate of that percentile based on the sample of data is 99.1 degrees. However, given the limited size of the sample, the 90th percentile could lie anywhere between 98.98 and 99.31 degrees with 95% confidence.
10.7 Confidence Intervals
Having removed the outlier from the sample, we can proceed to establish final estimates for the parameters of the distribution from which the data came. Selecting Confidence Intervals from the Tables dialog box displays:
Confidence Intervals for Temperature
95.0% confidence interval for mean: 98.2295 ± 0.122015 [98.1074, 98.3515]
95.0% confidence interval for standard devia
41. [Figure 12-3: Multiple Sample Comparison Analysis Window]
The top left pane summarizes the size of each sample and its range. The top right pane shows a scatterplot of the data, enlarged below:
[Figure 12-4: Scatterplot of Strength versus Material]
Note that many of the observations plot on top of one another. To alleviate this problem, double-click on the graphics pane to maximize it, then press the Jitter button on the analysis toolbar and add a small amount of horizontal jitter by moving the top slider slightly to the right:
[Figure 12-5: Jittering Dialog Box; Horizontal and Vertical sliders ranging from Less to More]
This randomly offsets each point a small amount in the horizontal direction, making the individual points easier to see:
[Figure 12-6: Scatterplot after Jittering]
Jittering affects only the display, not the data or any calculations made from it.
12.2 Analysis of Variance
The first step when comparing multiple samples is usually to perform a one-way analysis of variance (ANOVA). The ANOVA is used
42. only by STATGRAPHICS. When saving the file, you may change the setting in the Save as type field to a different file format that other programs can read. Note that data saved in other file types may take longer to read into STATGRAPHICS than data saved in an SF6 file.
1.4 Reading a Saved Data File
Once the data have been entered into the datasheet, they are ready for analysis. To make the example more interesting, however, let's retrieve the census data for all 50 states and the District of Columbia, which is shipped with STATGRAPHICS Centurion in a file named census2000.sf6. To open that data file, select File - Open - Open Data Source from the top menu. You will first be asked to specify the location of the data you wish to access:
[Figure 1-17: Open Data Source Dialog Box; Data Source options: STATGRAPHICS Data File, External Data File, ODBC Query, Clipboard]
The default selection is correct in this case. Next, select the name of the file containing the data:
[Open Data File dialog box listing the sample data files, with File name set to census2000 and Files of type set to STATGRAPHICS Files]
43. path of steepest ascent 279 percentiles 150 164 piechart 219 Plackett Burman designs 255 power 256 Preferences 106 139 Capability tab 249 EDA tab 158 Stats tab 151 Print Setup 142 printing analyses 72 background 74 header 74 margins 74 wide lines 74 process capability analysis 235 P values 156 quantile plot 163 180 quantile quantile plot 182 quartiles 150 RANDOM 60 random numbers 53 randomization 262 286 Index Recode Data 46 references 281 regression analysis 197 regression coefficients 272 REP 52 RESHAPE 53 residual plots 194 207 residuals 194 207 resolution 255 response surface plots 274 RNORMAL 54 ROWS 60 Rowwise Statistics 47 R squared 204 206 Save Results 65 Screening Designs Attributes 264 screening experiments 254 SD 42 searching for tests and statistics 135 select fields 60 selecting analyses 130 serial number 2 setup exe 1 Shapiro Wilks test 241 Sigma Quality Level 250 signed rank test 166 significant digits setting default 140 Simple Regression 58 202 Six Sigma 235 Six Sigma Calculator 251 Six Sigma menu 9 140 skewness 150 skychart 228 smoothing a scatterplot 96 Sort Data 44 sorting variable names 141 SQRT 42 square plots 274 standard deviation 150 STANDARDIZE 42 standardized Pareto chart 267 StatAdvisor defaults 141 StatFolios publishing 109 saving 27 103 start up scripts 104 108 142 StatGallery 24
44. plus and minus 3 sigma are tight enough to fit within the specs. However, they are shifted to the left. The Analysis Summary in the upper left pane quantifies the fit:
Process Capability Analysis - Individuals - Strength
Data variable: Strength (specs are 190 to 230)
Transformation: none
Distribution: Normal
    sample size = 100
    mean = 202.809
    std. dev. = 6.23781
6.0 Sigma Limits
    +3.0 sigma = 221.522
    mean = 202.809
    -3.0 sigma = 184.096
Specifications     Observed Beyond Spec   Estimated Beyond Spec   Defects Per Million
USL = 230.0        0.000000%              0.000654%               6.54
Nominal = 210.0
LSL = 190.0        0.000000%              2.001465%               20014.65
Total              0.000000%              2.002119%               20021.19
Figure 15-6: Capability Analysis Summary
Of primary interest is the lower table, which estimates the percent of the product that is likely to be out of spec. Based on the fitted normal distribution, the estimated percent of the product beyond the specification limits is about 2%, equal to 20,021 defects per million (DPM).
15.3 Dealing with Non-Normal Data
The estimated DPM calculated above relies heavily on the assumption that the data come from a normal distribution. A formal check of that hypothesis may be conducted by selecting Tests for Normality from the Tables dialog box:
Tests for Normality for Strength
Shapiro-Wilks W: statistic = 0.931784, P-value = 0.0000321356
Figure 15-7: Tests for Normality
Depending on your system preferences, one o
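The estimated out-of-spec percentages in the Capability Analysis Summary follow directly from the fitted normal distribution. Here is a minimal sketch of that calculation using Python's standard library; the variable names are illustrative, and the mean, standard deviation, and specification limits are the ones reported in the summary.

```python
from statistics import NormalDist

# Fitted normal distribution and specification limits from the summary
fit = NormalDist(mu=202.809, sigma=6.23781)
LSL, USL = 190.0, 230.0

# 6.0 sigma limits: mean plus and minus 3 standard deviations
lower_3s = fit.mean - 3.0 * fit.stdev
upper_3s = fit.mean + 3.0 * fit.stdev

# Estimated percent of product beyond each specification limit
pct_below_lsl = 100.0 * fit.cdf(LSL)
pct_above_usl = 100.0 * (1.0 - fit.cdf(USL))

# One percent corresponds to 10,000 defects per million
dpm_total = 1e4 * (pct_below_lsl + pct_above_usl)

print(f"3-sigma limits: {lower_3s:.3f} to {upper_3s:.3f}")
print(f"below LSL: {pct_below_lsl:.6f}%  above USL: {pct_above_usl:.6f}%")
print(f"total DPM: {dpm_total:.2f}")
```

Nearly all of the estimated defects fall below the lower specification limit, which is consistent with the observation that the fitted limits are shifted to the left. Because the estimate depends entirely on the tail of the assumed distribution, it is only as trustworthy as the normality assumption itself.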
45. retrieval.
In this chapter, a typical analysis is described in detail. The goal of the analysis is to construct a statistical model relating the miles per gallon achieved in city driving for the n = 93 automobiles in the 93cars.sf6 data file to their weight. A scatterplot of the data is shown below:
[Figure 3-1: X-Y Plot of Miles per Gallon in City Driving versus Weight in Pounds; X axis: Weight, 1600 to 4600]
As might be expected, miles per gallon is negatively correlated with vehicle weight. Some nonlinearity is evident in the relationship, and at least one point appears to be a potential outlier.
The primary procedure in STATGRAPHICS Centurion for fitting a statistical model relating two variables is the Simple Regression procedure. That procedure fits both linear and nonlinear models. The simplest model relating one dependent variable Y to one independent variable X is a straight line of the form
Y = a + bX
where b equals the slope of the line and a equals the Y-intercept. Curvilinear models, such as the exponential model
Y = exp(a + bX)
may be used if the relationship is not linear.
3.1 Data Input Dialog Boxes
The Simple Regression procedure is located on the main menu:
1. If using the Classic menu, under Relate - One Factor.
2. If using the Six Sigma menu, under Improve - Regression Anal
46. s
2. Contains no more X variables than is necessary to generate a good prediction.
The latter consideration is sometimes referred to as parsimony. Typically, models involving a small set of well-selected predictors perform best in practice.
This chapter considers several types of regression models. As an example, the miles per gallon in city driving for the automobiles in the 93cars.sf6 file will serve as the response variable Y. The goal is to build a model from the other columns in that file that can successfully predict the miles per gallon of an automobile.
13.1 Correlation Analysis
A useful place to start when beginning to build a regression model is with the Multiple Variable Analysis procedure. This analysis may be found on the main menu:
1. If using the Classic menu, select Describe - Numeric Data - Multiple-Variable Analysis.
2. If using the Six Sigma menu, select Analyze - Variable Data - Multivariate Methods - Multiple-Variable Analysis.
The analysis begins by displaying the following data input dialog box:
[Figure 13-1: Multiple-Variable Analysis Data Input Dialog Box]
Six possible predictor variables have been selected, in addition to MPG City. The potential predictors are:
• Engine Size (liters)
• Horsepower (maximum)
• Le
47. separate entry is provided for saving numerical results back to the datasheet.
• Sort Variable Names: whether to list column names in alphabetic order on data input dialog boxes. Otherwise, column names will be listed in the same order as in the datasheets.
• 4-Digit Years: whether dates should be displayed with 4-digit years rather than 2-digit years. By default, 2-digit years such as 2/1/05 are assumed to represent dates between the years 1950 and 2049. Changes to this option will not take effect until the program is restarted.
• Autosave Enabled: whether to save the current StatFolio and data files automatically in the background, and the duration of time between saves. If enabled and there is a computer or program malfunction, you will be given the chance to restore the state of the StatFolio and datasheets when the program is next restarted.
• Update Links on Each Value: whether to recalculate all statistics whenever a data value changes in one of the datasheets. Normally, statistics are not recalculated until an analysis receives the focus, is printed or published, or the StatFolio is saved.
Graphics options that apply to all graphs:
• Maintain 1:1 Aspect Ratio: whether to display all graphs with equal-length horizontal and vertical axes. Normally, the horizontal axis will be longer than the vertical axis.
• Always Black and White: whether to display graphs in black and white, overriding any other color settings.
• Suppress Tickmark Gap: w
48. should be examined to determine which means are significantly different from which others. A useful plot for this purpose is the Means Plot, available from the Graphs dialog box:
[Figure 12-9: Means Plot; Means and 95.0 Percent Tukey HSD Intervals]
The means plot shows each sample mean together with an uncertainty interval surrounding it. Interpretation of the intervals depends upon the type of interval plotted, which may be changed using Pane Options. The two most commonly used intervals are:
1. Fisher's LSD (Least Significant Difference) Intervals: These intervals are scaled in such a way that one can select a single pair of samples and declare their means to be significantly different if the intervals do not overlap in the vertical direction. While the chance of incorrectly declaring two samples to be different with this method is fixed at 5%, making comparisons amongst many pairs of means may result in an error on at least one pair with a considerably higher probability.
2. Tukey's HSD (Honestly Significant Difference) Intervals: These intervals are scaled to control the experiment-wide error rate at 5%. Using Tukey's method, you will not incorrectly declare any pair of means to be significantly different when they're really not in more than 5% of the analyses you do.
The intervals in Figure 12-9 use Tukey's method. Since the interval for s
49. simple patterns.
2. Generating random numbers.
2.4.1 Generating Patterned Data
Several procedures in STATGRAPHICS Centurion, particularly those that perform an analysis of variance, expect the data to be analyzed to be placed into a single column of the datasheet, together with one or more code columns identifying the explanatory factors. For example, consider the data in the following two-way table:
Blend    Treatment 1    Treatment 2    Treatment 3
1        75             82             91
2        78             85             93
3        77             84             92
4        75             85             96
To analyze this data using the Multifactor ANOVA procedure, it needs to be placed into a datasheet in the following format:
[Figure 2-26: Desired Data Structure]
The first two columns indicate the levels of the factors corresponding to each data value. The third column contains all of the observations. To create such a file, the easiest solution is often to type in the first two columns. However, since the columns follow simple patterns, you could generate them instead using special STATGRAPHICS Centurion operators. For example, the blend numbers can be generated by clicking on the column 1 header and then selecting Generate Data from the Edit menu. This displays the following dialog box, into which an expression has been entered:
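The restructuring described above, from a two-way table to one observation per row with code columns, can be sketched outside STATGRAPHICS as follows. This is only an illustration of the target layout: the variable names are assumptions, and the row order shown (all treatments for blend 1, then blend 2, and so on) is one reasonable choice rather than necessarily the order used in Figure 2-26.

```python
# Two-way table from the text: one row per blend, one value per treatment
table = {
    1: [75, 82, 91],
    2: [78, 85, 93],
    3: [77, 84, 92],
    4: [75, 85, 96],
}

# Long format: (blend code, treatment code, observation) for every cell
rows = [
    (blend, treatment, value)
    for blend, values in table.items()
    for treatment, value in enumerate(values, start=1)
]

# The two code columns follow simple repeating patterns
blend_codes = [r[0] for r in rows]      # 1,1,1,2,2,2,3,3,3,4,4,4
treatment_codes = [r[1] for r in rows]  # 1,2,3,1,2,3,1,2,3,1,2,3

print("Blend  Treatment  Value")
for blend, treatment, value in rows:
    print(f"{blend:>5}  {treatment:>9}  {value:>5}")
```

The two code lists make the "simple patterns" explicit: one column repeats each level several times in a row, while the other cycles through its levels, which is exactly the kind of sequence a pattern-generating operator is designed to produce.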
50. text and fonts.
4.1.9 Adding New Text
Additional text may also be added to any graph by pressing the Add Text button on the analysis toolbar. A dialog box will be generated in which to enter the new text:
[Figure 4-16: Dialog Box for Adding New Text; Text: outlier; Direction: Horizontal or Vertical]
The text string will be initially positioned under the top title, but may be dragged to any location with the mouse:
[Figure 4-17: Plot after Adding New Text String; titles: "Fitted S-Curve Model from 93cars File" and "MPG City = exp(2.1328 + 2799.07/Weight)"; X axis: Weight in pounds, 1500 to 4500]
After text is added, click on it and then press the Graphics Options button if changes need to be made.
4.2 Jittering a Scatterplot
When one or both of the variables in a scatterplot are discrete, the chance of points being in exactly the same location and obscuring each other can be large. The analysis toolbar has a Jitter button that overcomes this problem by randomly offsetting points in the horizontal and/or vertical direction. For example, consider the following plot of the data in the 93cars.sf6 file:
[Figure 4-18: Scatterplot of Miles per Gallon versus Cylinders; Plot of MPG City vs Cylinders]
Although there are 93 rows in the datasheet, there are many le
51. the StatGallery. 6.4.1 Adding Items. To add an item to a graph: 1. Double-click on the desired graph to maximize its pane. 2. Press the alternate mouse button and select Add Item from the popup menu. The following floating dialog box will appear: [Figure 6.5: Add Item Dialog Box] 3. Select the type of item that you want to add to the plot. The first 5 buttons on the dialog box in Figure 6.5 work by holding the mouse button down and stretching the line or figure until it fills the desired area. The last button activates text mode, so that a text entry dialog box is displayed the next time you click on the graph. The added text may then be dragged to the desired location. 6.4.2 Modifying Items. To modify an item in the StatGallery: 1. Double-click on the desired graph to maximize its pane. 2. Click on the item to be changed with the mouse to mark it. Small rectangular blocks will be placed around an item that has been marked. 3. Press the alternate mouse button and select Modify Item from the popup menu. A dialog box corresponding to the type of item marked will be displayed, on which desired changes may be indicated. 6.4.3 Deleting Items. To delete an item in the StatGallery: 1. Double-click on the desired graph to maximize its pane. 2. Click on the item to be deleted with the mouse to mark it. 3. Press the alternate mouse butto
52. the data by both defect and facility. [Figure 14.11: Clustered Barchart (Barchart for Defect by Facility; bars for Texas and Virginia by defect type; axis: frequency)] The difference between the two facilities is quite apparent. A related plot, called a Mosaic Plot, is also quite informative. [Figure 14.12: Mosaic Plot (Mosaic Chart for Defect by Facility)] In this chart, the height of each bar is proportional to the total number of defects of each type. The width of the bars is proportional to the relative percentage of each defect type at each location. Consequently, the overall area of each rectangle is proportional to the frequency of the corresponding cell in the two-way table. If desired, the cell frequencies may also be displayed in three dimensions by selecting Skychart from the Graphs dialog box. [Figure 14.13: Three-Dimensional Skychart (Skychart for Defect by Facility)] In a Skychart, the height of each bar represents the frequency of a cell in the contingency table. 14.4 Comparing Two or More Samples. To determine whether or not the apparent diff
53. the specified value will be converted to empty cells when placed in the STATGRAPHICS Centurion datasheet. When OK is pressed, the data from the Excel file will be read into STATGRAPHICS Centurion. Each column will be scanned and an appropriate column type assigned to it. If any invalid column names are encountered, reserved symbols will be converted to underscores. The data is then ready to be analyzed. 2.2.3 Transferring Data Using Copy and Paste. The easiest way to transfer data from another application to STATGRAPHICS Centurion is often via the Windows clipboard. For example, if data resides in an Excel file, Excel may be started and the data copied to the clipboard by selecting the desired data within Excel and then choosing Copy from the Excel Edit menu. Upon returning to STATGRAPHICS, the data may be pasted directly into a STATGRAPHICS Centurion datasheet by selecting Paste from the STATGRAPHICS Edit menu. When data is pasted into a column of a datasheet, STATGRAPHICS Centurion automatically scans the data and selects an appropriate type for the column. When copying and pasting data, column names and comments may also be transferred. Include the column names and comments in Excel when copying the data to the clipboard. On the STATGRAPHICS Centurion side, click in the header row of the datasheet before selecting Paste. The information at the top of the clipboard will then be pasted into the head
54. to make predictions is MPG City = 47.0484 - 0.00803239*Weight. 2. R-squared: the percentage of the variability in Y that has been explained by the model. In this case, a linear regression against Weight explains about 71.1% of the variability in MPG City. 3. Model P-Value: tests the null hypothesis that the fitted model is no better than a model that does not include Weight. A P-Value below 0.05, as in the current example, indicates that Weight is a useful predictor of MPG City. The plot in the top right pane displays the fitted model. [Figure 13.8: Plot of Fitted Linear Model (MPG City vs. Weight)] The plot shows the least squares regression line and two sets of limits. The inner limits provide 95% confidence intervals for the mean value of Y at any selected X. This indicates how well the location of the line has been estimated, given that the relationship is linear. The larger the sample, the tighter the limits. The outer lines are 95% prediction limits for new observations. It is estimated that 95% of additional observations similar to those in the sample would fall within those bounds. It is worthy of note that 3 observations at low values of Weight fall fairly far beyond the 95% prediction limits. This may be indicative either of outliers or of the failure of the model to account for the non
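As a quick check on how the fitted line is used for prediction, the equation can be evaluated directly. The sketch below is illustrative Python, not part of the program; it assumes the slope is negative, which matches the negative correlation between Weight and MPG City reported elsewhere in the manual, and the function name and the 3000-pound input are hypothetical.

```python
# Coefficients of the fitted linear model quoted in the text.
# The negative sign on the slope is an assumption supported by the
# negative correlation between Weight and MPG City.
INTERCEPT = 47.0484
SLOPE = -0.00803239


def predict_mpg_city(weight_lb):
    """Predicted miles per gallon in city driving for a vehicle weight in pounds."""
    return INTERCEPT + SLOPE * weight_lb


# Hypothetical example: a 3000-pound vehicle.
print(round(predict_mpg_city(3000), 2))  # 22.95
```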
55. to that variable whenever it is analyzed. 15.4 Capability Indices. The essence of a capability analysis lies in estimating the percentage of the product that falls outside the specification limits or, equivalently, DPM (the defects per million). To summarize process capability, practitioners have also derived various capability indices. The most widely calculated index is $C_{pk}$, defined as $C_{pk} = \min\left(\frac{\hat{\mu} - LSL}{3\hat{\sigma}}, \frac{USL - \hat{\mu}}{3\hat{\sigma}}\right)$. Put simply, $C_{pk}$ is the distance from the estimated process mean to the nearer specification limit, divided by 3 times the estimated process sigma. The Process Capability Analysis procedure in STATGRAPHICS displays capability indices on the Capability Plot and also on the Capability Indices table. If a normal distribution is assumed, both long-term and short-term indices are calculated. [Figure 15.15: Table of Capability Indices (Capability Indices for Strength; specifications USL = 230.0, Nom = 210.0, LSL = 190.0; columns for short-term capability and long-term performance; DPM based on 6-sigma limits; short-term sigma estimated from average moving range; the Sigma Quality Level includes a 1.5-sigma drift in the mean; 95.0% confidence intervals are listed for the indices)] The short term indices which are calculate
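The verbal definition of the index (distance from the mean to the nearer specification limit, divided by 3 sigma) translates into a few lines of code. This is a minimal Python sketch rather than the program's own calculation; the specification limits come from the example above, while the mean and sigma values passed in are hypothetical.

```python
def cpk(mean, sigma, lsl, usl):
    """Capability index: distance from the mean to the nearer
    specification limit, divided by 3 times sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return min(usl - mean, mean - lsl) / (3.0 * sigma)


# Specification limits from the Strength example; the mean and sigma
# used here are hypothetical illustration values, not estimates from the data.
print(cpk(mean=205.0, sigma=5.0, lsl=190.0, usl=230.0))  # 1.0
```

A process centered at the nominal value gives the largest index for a given sigma; shifting the mean toward either limit lowers it.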
56. you to analysis procedures that calculate them. It can help in defining data transformations or in selecting subsets of the data. It can repeat desired analyses for each unique value in a data column. The StatWizard appears whenever you load STATGRAPHICS Centurion, unless you elect to suppress it. The wizard can also be invoked at any time by pressing the StatWizard button on the main toolbar. 8.1 Accessing Data or Creating a New Study. If the DataBook is empty when the StatWizard is activated, it displays a dialog box inquiring about your data needs. [Figure 8.1: StatWizard Data Input Dialog Box, offering three tasks: Enter New Data or Import It from an External Source; Design a New Experiment, Gage Study, Control Chart or Sampling Plan; Perform an Analysis that Does Not Require Data] There are 3 choices: 1. You wish to load new data into the STATGRAPHICS Centurion DataBook. The wizard will then take you through a sequence of additional dialog boxes in order to define the columns of a datasheet or select a data source, as described in earlier chapters of this manual. 2. You wish to design a new study before you collect data. In this case the
57. [Figure 13.4: Correlation Matrix (each cell shows the correlation, the sample size, and the P-Value)] The table shows the correlation coefficient for each pair of variables, the number of observations used to obtain the estimate, and a P-value. A correlation coefficient r is a number between -1 and +1 which measures the strength of the linear relationship between two variables. The closer the correlation is to -1 or +1, the stronger the relationship. The sign of the correlation indicates the direction of the relationship. A positive value means that Y goes up as X goes up. A negative value means that Y goes down as X goes down. To determine whether or not two variables are significantly related to each other, a P-Value is calculated for each correlation coefficient. Any pair of variables for which the P-Value is less than 0.05 exhibits a statistically significant linear correlation at the 5% significance level. The top row shows the correlations between MPG City and the 6 predictors. The strongest correlation is with Weight, at -0.8431. The negative sign implies that as Weight increases, MPG City decreases, which is not at all surprising. 13.2 Simple Regression. The first statistical model that will be fit is a straight line of the form $\text{MPG City} = \beta_0 + \beta_1 \cdot \text{Weight}$
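For readers who want to see what the correlation coefficient measures, the following is a minimal Python sketch of the standard Pearson formula. It is not taken from the program; the function name and the small data lists are hypothetical, chosen so that the result is exactly -1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y falls by 2 for every unit increase in x,
# so the relationship is perfectly linear and negative.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0

# For a straight-line fit with one predictor, R-squared equals r squared:
print(round((-0.8431) ** 2, 3))  # 0.711
```

The second line ties the reported correlation of -0.8431 to the roughly 71.1% of variability that the manual attributes to the linear regression on Weight.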
58. [Figure 2.13: Appearance of a Formula Column in a Datasheet] If the values in the MPG City or MPG Highway columns change, MPG Ratio will be automatically recalculated to reflect those changes. NOTE: recalculation of formula columns does not normally occur until the data in those columns is needed for a calculation or is saved or printed. You can force a recalculation to occur immediately by selecting Update Formulas from the Edit menu. 2.3.3 Transforming Data. STATGRAPHICS Centurion also contains a large number of mathematical functions that may be used to transform existing data. As when creating new variables, transformations may be done either directly within fields of a data input dialog box or by creating new columns in a datasheet. For example, suppose it was desired to plot the miles per gallon that an automobile obtained versus the natural logarithm of vehicle weight. Selecting the X-Y Plot procedure from the main menu displays the following data input dialog box: [X-Y Plot data input dialog box]
59. 11.0 214.5 201.5 200.9 206.8 205.8 200.3 196.1 205.9 195.1 203.9 192.9 199.0 195.5 203.1 197.4 194.8 201.0 202.5 199.0 200.7 197.6 198.5 205.3 197.1 202.8 201.6 197.4 200.9 203.3 209.4 201.4 199.5 207.8 204.9 205.5 203.0 208.1 200.2 218.2 202.0 209.3 201.2 200.4 201.0 195.7 229.5 199.9 208.1 210.3 202.0 202.6 213.6 198.0 197.8 196.7 216.0 211.6 208.7 199.4 200.8 201.1 195.3 206.8 211.3 201.5 200.0 211.8 195.6 201.9 199.0 200.3 197.8 200.8 194.8 199.5 195.5 201.0 206.0 215.3 202.6 199.9 200.6 197.6 207.4. This chapter describes how to conduct a typical capability analysis for this type of variable data. 15.1 Plotting the Data. The first step in examining any new set of data is to plot it. For a set of data such as that shown above, the One-Variable Analysis described in Chapter 10 provides several useful tools. To analyze this data: 1. Open the file called ctems sf6. 2. Execute the One-Variable Analysis procedure using the column named Strength. The initial analysis window is shown below: [One-Variable Analysis window for Strength: data variable Strength (specs are 190 to 230); 100 values ranging from 191.3 to 229.5]
60. 15, which is highlighted in red. It has a Studentized Value Without Deletion of 3.479. Studentized values are calculated from $z_i = (x_i - \bar{x})/s$. A value of 3.479 indicates that an observation is 3.479 sample standard deviations above the sample mean when the observation is included in the calculation of $\bar{x}$ and $s$. The Studentized Values With Deletion indicate how many standard deviations each observation lies from the sample mean when that observation is not used in the calculations. If not included in the calculation, row 15 is 3.67 standard deviations out. Observations more than 3 standard deviations from the mean are unusual, unless the sample size n is very large or the distribution is not normal. A formal test may be made of the following hypotheses: Null hypothesis: The most extreme value comes from the same normal distribution as the other observations. Alternative hypothesis: The most extreme value does not come from the same normal distribution as the other observations. A widely used test of these hypotheses is Grubbs' test, also called the Extreme Studentized Deviate test. STATGRAPHICS Centurion conducts this test and displays a P-Value. In general, a P-value quantifies the probability of obtaining a statistic as unusual or more unusual than that observed in the sample, if the null hypothesis were true. If the P-Value is small enough, the null hypothesis can be rejected, since the sample would have been an extremely rare
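The difference between Studentized values with and without deletion is easy to see in a small calculation. The following Python sketch follows the verbal description above (mean and standard deviation computed with or without the observation in question); the function names and the five-point data set are hypothetical and this is not the program's own code.

```python
import statistics


def studentized_without_deletion(data, i):
    """(x_i - mean) / s, with x_i included in the mean and s."""
    return (data[i] - statistics.mean(data)) / statistics.stdev(data)


def studentized_with_deletion(data, i):
    """(x_i - mean) / s, with x_i left out of the mean and s."""
    rest = data[:i] + data[i + 1:]
    return (data[i] - statistics.mean(rest)) / statistics.stdev(rest)


# Hypothetical sample; the last value is the most extreme one.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(studentized_without_deletion(sample, 4), 4))  # 1.2649
print(round(studentized_with_deletion(sample, 4), 4))     # 1.9365
```

Leaving the extreme value out moves the mean away from it and shrinks the standard deviation, so the value with deletion is larger in magnitude, as with row 15 in the text (3.67 versus 3.479).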
61. 2. Pressing the Graphics options button on the analysis toolbar, clicking on the X-Axis tab, and checking the Rotate Axis Labels box. 3. After exiting the Graphics options dialog box, the labels may not fit completely on the screen. If not, you can hold your mouse button down within the main part of the graph and drag it higher, or you can drag the X axis up to reduce the size of the vertical axis. When finished, the Pareto chart should look like that shown below: [Figure 14.5: Enlarged Pareto Chart (Pareto Chart for Defect)] The vertical bars in the Pareto chart are drawn with height proportional to the number of times each defect type occurred. The line above the bars is a cumulative count from left to right. Shown above each bar is the percentage of defects occurring in a particular class or classes farther to the left. The basic Pareto principle states that a large majority of defects are usually due to a small number of possible causes. In this case, the 3 most frequent defect types account for over 80% of all the defects. 14.3 Crosstabulation. The defects.sf6 data file also contains an identification of which facility produced each defective item. To summarize the data by both defect
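The cumulative percentages shown above the bars of a Pareto chart come from sorting the classes by frequency and accumulating the counts. The following is a minimal Python sketch of that arithmetic; the function name and the counts are hypothetical and are not the values in the defects file.

```python
def pareto_table(counts):
    """Return (label, count, cumulative percent) rows, most frequent first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for label, count in ordered:
        running += count
        rows.append((label, count, round(100.0 * running / total, 2)))
    return rows


# Hypothetical defect counts, used only to illustrate the calculation.
example_counts = {"B": 30, "D": 5, "A": 50, "C": 15}
for row in pareto_table(example_counts):
    print(row)
```

With these counts the two most frequent classes already account for 80% of the total, which is the kind of concentration the Pareto principle describes.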
62. 3.3 62.4 59.8 67.0 61.6 60.3 58.3 64.9 61.0 60.6 56.4 63.7 63.8 60.0 61.6 61.8 60.9 60.3 59.5 64.3 65.1 62.4 62.0 64.3 61.5 61.9 61.4 65.9 60.0 63.1 58.6 63.6 62.9 60.2 59.5 64.6 60.6 58.6 60.0. It is of considerable interest to determine which of the materials produces the strongest widgets, as well as which materials are significantly different from which others. There are two ways to enter data for multiple samples into a datasheet: 1. Use a separate column for each sample. 2. Use a single column for all of the data and create a second column to hold codes identifying which sample each observation comes from. For this example, the first approach has been selected. The data for the widgets have been placed in four columns of a file called widgets.sf6, which you can open by selecting Open - Open Data Source from the File menu. 12.1 Running the Multiple Sample Comparison Procedure. The Multiple Sample Comparison procedure is available on the main menu: 1. If using the Classic menu, select Compare - Multiple Sample Comparisons - Multiple Sample Comparison. 2. If using the Six Sigma menu, select Analyze - Variable Data - Multiple Sample Comparisons - Multiple Sample Comparison. The initial dialog box is used to indicate how the data have been structured: [Multiple Sample Comparison dialog box, with input options including Multiple Data Columns and Data and Code Columns]
63. 4 STATGRAPHICS Centurion Data Sheet with Column Names. Now enter the data as you would in any spreadsheet, using the arrow keys to move from cell to cell. DO NOT enter commas when entering large numbers. When done, the datasheet should have the following appearance:
            Population  Percent Female  Per Capita
Alabama     4447100     51.7            18819
Alaska      626932      48.3            22660
Arizona     5130632     50.1            20275
Arkansas    2613400     51.2            16904
California  33811648    50.2            22711
Colorado    4301261     49.6            24049
Figure 1.15: STATGRAPHICS Centurion Data Sheet after Entering 6 Rows of Data. Finally, you need to save the data file. Choose File - Save - Save Data File from the main menu. Select a file name in which to save the data. [Figure 1.16: Save Data File Selection Dialog Box] It is good practice to assign a meaningful name to each data file. Data files in STATGRAPHICS Centurion are saved on disk by default with an extension of sf6 and are readable
64. 6 configuring 113 copying graphs to 115 modifying graphs 117 overlaying graphs 116 printing 119 Statistical Tolerance Limits 168 Statistics for Experimenters 189 StatLink 54 108 StatPublish 109 StatReporter 121 copying output to 122 modifying 123 287 Index saving 123 StatWizard 9 12 125 stepwise regression 211 Studentized residuals 208 Studentized values 155 Sturges rule 159 Summary Statistics 21 148 173 237 Surface and Contour Plots 213 surface plots 274 t test 166 178 Tables 63 Tabulation 218 tolerance limits 168 tolerance plot 169 transformations 134 Two Sample Comparison 171 two way tables 225 Update Formulas 41 updating links 141 XML files 34 Z scotes 250
65. 63073 0 908321 0 3662 0 000223526 0 00028967 0 771658 0 4424 Analysis of Variance 0 00705967 6 0 00117661 6764 0 0000 Residual 0 001496 86 0 0000173054 IEEE Total Corr 0 00855567 d R squared 82 5145 percent R squared adjusted for d f 81 2946 percent Standard Error of Est 0 00417077 Mean absolute error 0 00304978 Durbin Watson statistic 1 6264 P 0 0306 Lag 1 residual autocorrelation 0 186005 0 0 0 0 00072849 0 000980504 742974 0 4595 0 0000132632 0 000014911 0 889485 0 3762 Length 0 000101355 0 0000608857 1 66468 0 0996 0 The StatAdvisor The output shows the results of fitting a multiple linear regression model to describe the relationship between 1 MPG City and 6 independent variables The equation of the fitted model is 1 MPG City 0 0155897 0 00072849 Engine Size 0 0000132632 Horsepower 0 000101355 Length 0 0000149727 Weight 0 000148 122 Wheelbase 0 000223526 Width Since the P value in the ANOVA table is less than 0 05 there is a statistically significant relationship between the variables at the 95 confidence level Figure 13 14 Multiple Regression Analysis Summary with 6 Predictor Variables Notice that the R squared statistic has risen to 82 5 However the model is unnecessarily complicated Near the top of the output is a column of P Values These P Values test the 210 Regression Analysis hypothesis
66. 65 3 2 5 Analysis Options Citi ais sonesessassnnsnsncdesincabasscssvahassnasaneca asst saian anian nata ALBUM UR 66 326 Pane Options DUELOR sauepeceia eit tpi snt bn eti iie ae arais iian Sae EMI e esida a TEREE 68 32 Graphics BUMONS et dt eiie ui e a Siei ee ei anie ou eerie ln du 70 NC Bo TOU POL OD einen dpud rade e en ene Sea esen ieee aser t bd ica adu 71 3 Printime the Results MENTRE RAE 72 3 4 Publishing the cc ENIM NU NND NUN P INT IR E SU EUN 74 GPA PICS ice e HERR RAN MER U THERME FRAN RUE QUERN AENEQAR FRU NA FEMA E UM EI UU PARRA E FECE UA QVE 75 UB DEC 76 XEM I SS UU ain TI AA Gid ONG HO IS 79 LENSES Edu cic euina e EEEE EEE ETERS 81 ALA Points HO ieina e eee eae nae 83 xU Top Title Option Pm 85 4 16 Axis Sealing CODO vind bani enean an moana a ADI AE E UNDE 87 LANES M Pere 89 4 1 8 Text Labels and Legends Options anssiecpiaiktat kai ban escas vb arid redu MUR ERE 90 41 9 Adding New Text assisia aeniea Edi at P tad a d D D Er 90 7 2 ARCHIE a SCARCE ONO e caasa reas was rada a RERUM exi Sk a pa RA BRA PEU EE Fn iu A 91 43 Brushing a Seat berplabsososate epa iaon UNSER DU UNS a a a a 93 4 4 Smoothing Mr TH 95 QUE div cui TTE 97 4 6 Copying Graphs to Other Ap Pl Cat Otis sioe bp oebrtrs kt iiia etri eb Il three ia REN Ow Pn tK KR 100 t Saving Graphs in Image Filesi it riii cra UFU ri xc
67. [Figure 14.7: Crosstabulation Analysis Window] The table in the bottom left pane tabulates the data by both defect type and facility. [Figure 14.8: Two-Way Table with Table Percentages (Frequency Table for Defect by Facility; cell contents: observed frequency and percentage of table; column totals 55.83% and 44.17%)] As initially displayed, each cell of the table displays the number of rows in the data file corresponding to a particular row-column combination. It also indicates the percentage of the entire table represented by that cell. For example, there were 36 contaminated items produced in the Texas facility, representing 30 percent of all defective items in the sample. Pane Options allows you to select other items to display in each cell: [Frequency Table Options dialog box, with checkboxes for Table Percentages, Row Percentages, Column Percentages, and Expected Frequencies]
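The cell counts and table percentages in a two-way table can be reproduced by counting (defect, facility) pairs. The sketch below is illustrative Python, not the program's implementation; the 36 contaminated Texas items and the implied total of 120 rows come from the text, while the remaining 84 rows are lumped into a single hypothetical placeholder cell.

```python
from collections import Counter


def table_percentages(pairs):
    """Map each (row, column) cell to (count, percent of the whole table)."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: (n, round(100.0 * n / total, 2)) for cell, n in counts.items()}


# 36 of the defective items are (Contaminated, Texas). The other 84 rows
# are a hypothetical placeholder so that the total matches 120.
records = [("Contaminated", "Texas")] * 36 + [("Other", "Virginia")] * 84
cells = table_percentages(records)
print(cells[("Contaminated", "Texas")])  # (36, 30.0)
```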
68. 16.6 Further Experimentation. Suggested Reading. Preface. This book is designed to introduce users of STATGRAPHICS Centurion XV to the basic operation of the program and its use in analyzing data. It provides a comprehensive overview of the system, including installation, data management, creating statistical analyses, and printing and publishing results. Since the book is intended to get users up to speed quickly, it concentrates on the most important features of the program rather than trying to cover every small detail. The Help menu within STATGRAPHICS Centurion XV gives access to an extensive amount of additional information, including a separate PDF file for each of the approximately 150 statistical procedures. The first nine chapters cover basic use of the program. While you could probably figure out much of this material on your own while using the program, thorough reading of those chapters will help you get up to speed quickly and ensure that you don't miss any important features. The last seven chapters include tutorials intended to: 1. Introduce you to some of the more commonly used statistical analyses. 2. Illustrate how the unique features of STATGRAPHICS Centurion facilitate the data analysis process. It is recom
69. 8 comparing several medians 192 comparing several standard deviations 194 comparing standard deviations 177 correlation coefficient 201 mean 166 median 166 normality 241 outliers 156 regression 204 two way table 229 installation 1 interaction plot 269 jittering a scatterplot 91 187 Ej 285 Index K 250 Kolmogorov Smirnov test 181 242 Kruskal Wallis test 192 kurtosis 150 LAG 42 largest extreme value distribution 242 LAST 60 launching the program 6 Levene s test 194 license agreement 2 license manager 6 linear regression model 205 LOG 42 LOG10 42 LOWESS 200 Lowess smoothing 96 LSD intervals 191 main effects plot 268 Mann Whitney Wilcoxon test 179 matrix plot 99 199 MAX 42 maximum 150 mean 149 means plot 190 median 150 median notch 153 menu systems 9 MIN 42 minimum 150 Modify Column 30 mosaic plot 227 multiple range tests 191 Multiple Regression 209 Multiple Sample Comparison 184 nonlinear regression model 205 nonparametric methods Friedman test 192 Kolmogorov Smirnov test 181 242 Kruskal Wallis test 192 Mann Whitney Wilcoxon test 179 signed rank test 166 normal distribution 150 240 normal probability plot 246 ODBC queries 36 One Variable Analysis 18 146 236 optimization 277 OR 61 outliers 154 195 outside points 153 Page Setup 73 Pane Options 24 68 panes 61 Pareto Analysis 219 parsimony 197
70. ATGRAPHICS Centurion in the following way: 1. Return to the datasheet and click on the header of the Defects column to select it. 2. Press the alternate mouse button and select Recode Data from the popup menu. 3. Complete the Recode Data dialog box as shown below to combine the less common defect types into a single class labeled Other. [Figure 14.15: Recoding Least Frequent Defect Types (Recode Data dialog box with Lower Limit, Upper Limit, and New Value fields, limit conditions, and options for unmatched values)] The entries on the Recode Data dialog box instruct the program to search for values in the Defects column falling within each defined interval. Any label falling alphabetically between the limits shown in a given row is recoded to the value specified in the New Value column. After performing the recode operation, return to the Crosstabulation analysis window. In response to the change in the datasheet, the analysis will have automatically been updated. The new Other class now has a reasonably high frequency, as shown in the revised Mosaic Plot: [Mosaic Chart for Recoded Defect by Facility]
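The recoding rule described above (any label falling alphabetically between two limits is replaced by a new value) can be sketched in a few lines. This Python example is only an illustration of the idea; the function name is hypothetical, the bounds are assumed to be inclusive and compared as case-sensitive strings, and the labels and limits are sample values rather than a transcription of the dialog box.

```python
def recode(values, lower, upper, new_value):
    """Replace every label that sorts between lower and upper (inclusive,
    an assumption) with new_value; leave all other labels as they are."""
    return [new_value if lower <= v <= upper else v for v in values]


# Sample labels; the limits below are illustrative.
labels = ["Contaminated", "Misshapen", "Poor color", "Rusted",
          "Wrong size", "Damaged", "Misaligned"]
recoded = recode(labels, lower="Misshapen", upper="Wrong size", new_value="Other")
print(recoded)
```

Labels that sort before the lower limit, such as "Misaligned", are left unchanged, which corresponds to leaving unmatched values as they are.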
71. Chapter: Graphics. Modifying graphs, saving graphics profiles, interacting with graphs, saving graphs in image files, and copying graphs to other applications. Together, the 150 statistical procedures in STATGRAPHICS Centurion create hundreds of different types of graphs. To facilitate the data analysis process, default titles, scaling, and other attributes are selected whenever a new graph is created. For analysis purposes, the defaults often suffice. But when it comes time to publish the final results, creating a publication-quality graph is important. This chapter describes everything you need to know to work with graphs in STATGRAPHICS Centurion. It shows you how to dress them up for publication. It shows you how to copy them to applications such as Microsoft Word and PowerPoint. It also shows you how to interact with graphs. For example, you might see an interesting point and wish to know more about it. Or you might want to spin a 3D plot around to get a sense of any relationship that might be present between the variables portrayed on the X, Y, and Z axes. As an example, we will consider again the data in the 93cars.sf6 file. To begin, the fitted model plot relating miles per gallon in city driving and vehicle weight will serve to illustrate some of the important graphics operations. 4.1 Modifying Graphs. The Simple Regression procedure is commonly used to fit curves relating a response variable Y and a second explanatory varia
72. City and 6 independent variables. The equation of the fitted model is 1/MPG City = 0.0034427 + 0.0000260839*Horsepower + 0.0000129513*Weight. Since the P-value in the ANOVA table is less than 0.05, there is a statistically significant relationship between the variables at the 95% confidence level. Figure 13.16: Multiple Regression Analysis Summary after Backward Selection. Only two variables remain in the model: Horsepower and Weight. Both variables have P-values below 0.05. Once a mathematical equation has been found, it is informative to plot that equation. When the model contains 2 predictor variables, the equation represents a surface in 3 dimensions, usually referred to as a response surface. In this case, the fitted equation corresponds to a plane, since Horsepower and Weight enter the model in a linear manner. To plot the model, you can either: 1. Use the General Linear Models procedure, which will automatically plot a regression model with respect to two predictor variables. Note: the GLM procedure is only available if you have the Professional Edition of STATGRAPHICS Centurion. 2. Use the Surface and Contour Plots procedure. That procedure is available in all editions, although it requires that you copy in the function to be plotted and define your own titles and scaling. Taking the latter approach: 1. If using the Classic menu, select Plot - Surface and Contour Plots. 2. If using the Six
73. Dialog Box. The fields on this dialog box are:
- Data or Response Variables (Y): one or more response variables containing the values to be analyzed. If only one column contains data to be analyzed, it must be entered here.
- Type: the type of data contained in the response variable(s). The analyses displayed in subsequent dialog boxes depend on this choice.
- Quantitative Explanatory Factors (X): any quantitative factors that are to be used to predict the response variables. In a regression, the independent variables go here.
- Categorical Explanatory Factors (X): any non-quantitative factors that are to be used to predict the response variables. In an ANOVA, the explanatory factors go here.
- Case Labels: a column containing labels for each of the observations (rows).
The procedures offered in subsequent dialog boxes depend on the data entries made in Figure 8.7. The next dialog box asks you which rows of the file to analyze: [Figure 8.8: StatWizard Row Selection Dialog Box (options include All Rows, First Rows, Last Rows, a range of rows, Random Rows, rows satisfying a condition, and each unique value of a column)] The first s
74. EA rinadi nni 100 Stat POOS si dA RI RUIN UNSER AEQ ER RII NDAANIU IN NNI MM FAI SRI SE NERA SAI RAEE EEA EI NI M UE 103 SN sn fonds i oian 103 S2 StAtP Ol SS NES MM 104 5 3 Polins Data SOURCES oae tei oi Rab DIVI skis Gea ea mn M RR IE 108 54 Pablisbuo Data in HIME Porttidlacieeesioewtredvitinqueho aho vgl irvipotety eder ven veris 109 Using thie StatGallety scien isiasisisscniscvsnsatscovsasiscsnsate ston save sein PAR Ibl IER PROS UYM INN PUR ER FI EY PEE d PEE 113 6 1 Configuring a StatGallery Papesasodsnteiquenedosqorquas and ipe GERE QUA edu dH 113 6 2 Copying Graphs tothe Sea ayes eaa esie tachoncictirseck iinan eaae e iE S 115 6 3 Overlaying CHEXDDS au nach onines aeii iadaaa Li DR NUM ub DAD oU ema D EE 116 6 4 Modifying a Graph in the StatGallety ica eoctaescs choc icabofoupst baa Lol cn resa o ELEM NEM RE 117 64 1 Adding Mirco TTD A 117 6 4 2 M difyinp MENS eritreai eri eiee i e dae a osia ia md 118 043 Deletino Treng ee ee cn PE oP ne nee OP ea 118 iv Table of Contents 6 5 PUNISH M 119 Using the Stat Beportel acs ccavatavacaradavacsvecatassisn civacsinatavntavadanansivatavacavatsnatavacavaGadatayatavagsnetanedas 121 TA The StatRepottet WiIndOW sssini eioan bedia a eaaa a D U SR OR cU gU UR ER Ud 121 Ge Copying Output tothe StSEREDOPGIE stan vacasacsvectinniscenscbaretisnanscensieaenctiaseaseisdsnacalannnnsau
75. Graphs in Image Files. Individual graphs may also be saved in image files by maximizing a graph and then selecting Save Graph from the File menu. A dialog box will be displayed on which to specify a file name and image format. [Figure 4.33: File Selection Dialog Box for Saving Graph in Image File] For saving graphs that are to be read into Word or PowerPoint, saving the graph as a Windows metafile gives the most flexibility. If the graph is to be displayed on a web page, saving it as a JPEG file is recommended. Chapter: StatFolios. Saving your session, publishing results in HTML format, and automating analyses using start-up scripts. Each time you select a statistical analysis from the STATGRAPHICS Centurion menu, a new analysis window is created. You may save all of the analysis windows at any time by creating a StatFolio. A StatFolio is a file containing the definition of all statistical analyses that have been created, with pointers to the data used by them. By saving a StatFolio and later reopening it, you effectively save and retrieve your current STATGRAPHICS Centurion session. When a session is saved in a StatFolio, it is the definition of the analyses that is saved, not the output. When reopening a StatFolio, the data in the associat
76. In the above equation, P is the slope of the line in units of miles per gallon per pound, while B is the Y-intercept. To fit this model:
1. If using the Classic menu, select Relate - One Factor - Simple Regression.
2. If using the Six Sigma menu, select Improve - Regression Analysis - One Factor - Simple Regression.
The data input dialog box should be completed as follows:
[Figure 13-5: Simple Regression Data Input Dialog Box, with MPG City entered as Y and Weight entered as X]
The initial analysis window has four panes providing information about the fitted model and the residuals:
[Analysis window: Simple Regression - MPG City vs. Weight. Dependent variable: MPG City (miles per gallon in city driving). Independent variable: Weight (pounds). Linear model: Y = a + b*X. Plot of Fitted Model: MPG City = 47.0484 - 0.00803239*Weight. Other panes: Analysis of Variance, Unusual Residuals, Residual Plot.]
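The linear model Y = a + b*X fitted above can be reproduced outside the program with ordinary least squares. The following is a minimal Python sketch, not STATGRAPHICS code; the function name `fit_line` and the five data points are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical weights (pounds) and city mileage (miles per gallon)
weight = [2000, 2500, 3000, 3500, 4000]
mpg_city = [31, 27, 23, 19, 15]

a, b = fit_line(weight, mpg_city)
print(f"MPG City = {a:.4f} + ({b:.6f})*Weight")
```

A negative slope, as in the fitted model shown in the analysis window, means that predicted mileage drops as weight increases.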
77. [Figure 2-11: One-Variable Analysis of Transformed Data. Box-and-whisker plot of 100*MPG City/MPG Highway; minimum 64.0, maximum 93.9394]
The average ratio is approximately 76.3%, ranging from a low of 64.0% to a high of 93.9%. The ability to do analyses without modifying the datasheets is very important in facilitating the exploration of data. If desired, a new column could be created in a datasheet containing the transformed values. For example, you could return to the window containing the 93cars data and double-click on the column header labeled Col_27. The Modify Column dialog box could then be used to define a new variable of type Formula with the desired transformation:
[Figure 2-12: Creating a Formula Column. Name: MPG Ratio; Type: Formula; Formula: 100*MPG City/MPG Highway]
This will create a new column whose values are calculated from the original two columns containing the miles per gallon data. Formula columns are displayed in the datasheet using a gray scale, since they are automatically calculated from other columns.
[Datasheet excerpt showing the new formula column]
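The formula column described above simply applies one expression to every row. As a rough illustration in Python (not STATGRAPHICS syntax), the same transformation can be written as a list comprehension; the three rows of data below are hypothetical.

```python
# Hypothetical values for three cars (miles per gallon)
mpg_city = [25, 18, 20]
mpg_highway = [31, 25, 26]

# Equivalent of the formula column 100*MPG City/MPG Highway
mpg_ratio = [100 * city / highway for city, highway in zip(mpg_city, mpg_highway)]

for city, highway, ratio in zip(mpg_city, mpg_highway, mpg_ratio):
    print(f"{city:>3} {highway:>3} {ratio:8.4f}")
```

Because the ratio is recomputed from the two source lists, changing either of them and rerunning the comprehension updates the result, which mirrors how a formula column follows its source columns.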
78. [Figure 15-16: System Preferences for Capability Indices]
The left-hand side of the dialog box lists the indices that may be calculated. In addition to $C_{pk}$, the available indices include:
1. $C_p$: a two-sided capability index calculated from
$$C_p = \frac{USL - LSL}{6\hat{\sigma}}$$
This index measures the distance between the specification limits relative to the distance covered by six standard deviations. $C_p$ is always greater than or equal to $C_{pk}$. A substantial difference between the two indices indicates that the process is not well centered.
2. K: a measure of how far off-center the process is. K is calculated from
$$K = \frac{\hat{\mu} - NOM}{(USL - LSL)/2}$$
where NOM is the nominal or target value. A value of K close to 0 is indicative of a well-centered process.
3. Sigma Quality Level: an index used in Six Sigma to indicate the level of quality associated with a process. A Sigma Quality Level of 6 is usually associated with a defect rate of 3.4 defects per million.
The Preferences dialog box
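To make the relationship between these indices concrete, here is a small Python sketch that computes Cp, Cpk, and K from a process mean and standard deviation. It uses the common textbook definition Cpk = min(USL - mean, mean - LSL)/(3*sigma), which is an assumption here since the manual's own Cpk formula does not appear in this passage. The function name and all numeric values are hypothetical.

```python
def capability_indices(mean, sigma, lsl, usl, nominal):
    """Return (Cp, Cpk, K) for a process with the given mean and sigma."""
    cp = (usl - lsl) / (6 * sigma)
    # Textbook definition of Cpk (assumed, see lead-in)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    k = (mean - nominal) / ((usl - lsl) / 2)
    return cp, cpk, k

# Hypothetical process: target 10.0, specification limits 9.4 to 10.6
cp, cpk, k = capability_indices(mean=10.2, sigma=0.1, lsl=9.4, usl=10.6, nominal=10.0)
print(f"Cp = {cp:.3f}, Cpk = {cpk:.3f}, K = {k:.3f}")
```

In this example the process mean sits above the target, so K is well away from 0 and Cpk is noticeably smaller than Cp, which is the pattern the text describes for a process that is not well centered.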
79. Regression Save Results Dialog Box
To save information, check the items of interest in the Save field. For each item to be saved, assign a column name under Target Variables and indicate the desired Datasheet. If you wish to save a comment along with the data, check Save comments. The Autosave box is used to automatically resave the selected item if and when the analysis is rerun. This is useful if you intend to save the analysis in a StatFolio, since analyses are rerun whenever StatFolios are loaded. By checking the Autosave box, you can set up a StatFolio to automatically calculate and save desired statistics. When combined with the scripting capability described in Chapter 5, this enables you to automate many tasks.
3.2.5 Analysis Options Button
Most analyses have multiple options. When first run, default values are selected for these options, which are often sufficient. However, pressing the Analysis Options button within any procedure will allow these basic settings to be changed. For Simple Regression, the Analysis Options dialog box specifies the type of model to be fit and the method for estimating the unknown model coefficients.
[Simple Regression Options dialog box. Type of Model choices include Linear, Square Root-Y, Exponential, Reciprocal-Y, and Squared-Y, among others]
80. STATPOINT, Inc.
STATGRAPHICS Centurion XV User Manual
2005 by StatPoint, Inc. www.statgraphics.com
All rights reserved. No portion of this document may be reproduced, in any form or by any means, without the express written consent of StatPoint, Inc.
Reference as: STATGRAPHICS Centurion XV User Manual
STATGRAPHICS is a registered trademark. STATGRAPHICS Centurion XV, StatPoint, StatFolio, StatGallery, StatReporter, StatPublish, StatWizard, StatLink, and SnapStats are trademarks. All products or services mentioned in this book are the trademarks or service marks of their respective owners.
Printed in the United States of America.
Table of Contents
Table of Contents iii
[...] vii
Getting Started 1
1.1 [...] 1
1.2 [...] 6
1.3 [...] 10
1.4 [...] 16
1.5 Analyzing the Data 18
1.6 Using the Analysis Toolbar 22
1.7 [...] 27
1.8 Saving Your Work 27
Data Management 29
2.1 The DataBook 30
2.2 Accessing Data 32
2.2.1 Reading Data from a STATGRAPHICS Centurion Data File
81. Sigma menu, select Tools - Surface and Contour Plots.
On the data input dialog box, enter the model, expressing the two predictor variables as X and Y. The easiest way to do this is to paste in the equation generated by the Multiple Regression procedure, changing Horsepower to X and Weight to Y.
[Figure 13-17: Response Surface and Contour Plot Data Input Dialog Box. The Function field holds the fitted model, with coefficients 0.0034427 (constant), 0.0000260839 (X), and 0.0000129513 (Y)]
The scaling of X and Y should also be changed to be representative of the data used to fit the model. When you press OK, a surface plot will be generated. The initial plot takes the form of a wire frame surface.
[Figure 13-18: Surface Plot with Default Labels and Scaling]
You can improve the plot greatly by:
- Selecting Graphics Options from the analysis toolbar and changing the labels and scaling on the Top Title, X-Axis, Y-Axis, and Z-Axis tabs. In particular:
  - Change the X-axis title to Horsepower.
  - Change the Y-axis title to Weight.
  - Change the Y-axis scaling to run from 1500 to 4500 by 1000.
  - Change the Z-axis title to 1/MPG City.
- Selecting Pane Options and changing the type of plot displayed
82. [Figure 8-3: StatWizard Gage Study Setup Dialog Box]
In the dialog box, enter the number of operators who will be involved in the study, the number of parts that will be measured, and the number of times each operator will measure each part. You may also specify a header for the study. A final dialog box requests names for the operators, appraisers, or labs that will be making the measurements.
[Figure 8-4: Dialog Box for Specifying Operator Names]
The StatWizard then creates the desired study and places it into a datasheet in the DataBook.
[Figure 8-5: Gage Study Created by the StatWizard]
The study would then be performed and measurements entered in the datasheet. The StatWizard could then be invoked again to select an analysis procedure, or you could go directly to the relevant analyses on the main menu.
8.2 Selecting Analyses for Your Data
If data has already been loaded into the DataBook, clicking on the Stat
83. Wizard button displays a dialog box from which to select one or more analyses to perform.
[Figure 8-6: StatWizard Dialog Box for Selecting Analyses. The dialog explains that you may select analyses based on the type of data, choose an analysis by name, select a SnapStat or Quick Pick, or search for analyses that calculate a statistic of interest]
There are five options:
1. Select Analysis Based on Type of Data: Displays additional dialog boxes requesting information about the data to be analyzed, after which a list of relevant procedures is presented.
2. Select Analysis by Name: Displays all analyses in alphabetical order. Selecting an analysis by name and pressing OK takes you directly to the data input dialog box for that analysis, bypassing the usual menus.
3. Select a SnapStat: Allows you to select a SnapStat. SnapStats are streamlined analyses that produce a single page of preformatted output. They have fewer options than other analyses but are very easy to cre
84. X's. A homogeneous group is a group within which there are no significant differences. In this case, sample A is in a group by itself, since it is significantly different from all of the others. Sample C falls in two groups, one with B and one with D. More data would be required to distinguish which group sample C actually belongs to.
12.4 Comparing Medians
If it is suspected that outliers may be present, a nonparametric procedure may be used as an alternative to the standard analysis of variance by selecting Kruskal-Wallis and Friedman Tests from the Tables dialog box. These tests compare the sample medians rather than the means:
Null hypothesis: the medians are all equal.
Alternative hypothesis: the medians are not all equal.
The type of test may be selected using Pane Options. Two types of tests are provided:
1. Kruskal-Wallis test: appropriate when each column contains a random sample from its population. In such a case, the rows have no intrinsic meaning.
2. Friedman test: appropriate when each row represents a block, i.e., the level of some other variable. Typical blocking variables are day of the week, shift, or manufacturing location.
In the example, row has no meaning, so the Kruskal-Wallis test is appropriate.
[Figure 12-11: Multiple Range Tests. Kruskal-Wallis Test output: sample size 12 per sample; average ranks 40.7917 (A), 25.7917 (B), 12.1667 (D); test statistic = 27.3735; P-value = 0.00000491592]
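The Kruskal-Wallis statistic reported above is built from the average rank of each sample after all observations are pooled and ranked. The sketch below shows that calculation in plain Python. It omits the correction for ties that statistical packages normally apply, so it is an approximation of what a full implementation would report; the function names and the three small samples are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Return (average rank per group, H statistic without tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    mean_ranks, h, start = [], 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        start += len(g)
        mean_rank = sum(group_ranks) / len(g)
        mean_ranks.append(mean_rank)
        h += len(g) * (mean_rank - (n_total + 1) / 2) ** 2
    h *= 12 / (n_total * (n_total + 1))
    return mean_ranks, h

# Hypothetical samples, one list per column
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mean_ranks, h_statistic = kruskal_wallis(samples)
print(mean_ranks, round(h_statistic, 4))
```

Under the null hypothesis, H is compared with a chi-squared distribution with (number of samples - 1) degrees of freedom; large values of H, as in the manual's output, lead to a small P-value.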
85. [Figure 5-5: DataBook Properties Dialog Box for Polling Data Sources]
To query the data sources repeatedly:
1. Place a checkmark in the Poll box for each data source to be reread.
2. Set the radio buttons in the Polling field to On.
3. Specify the frequency for requerying each data source.
4. Check Run Script if you wish to run the StatFolio start-up script each time the data is read.
By including a Publish step in the start-up script, you can have STATGRAPHICS Centurion automatically upload the output to a network server.
5.4 Publishing Data in HTML Format
The output of a StatFolio may be published in a format that is viewable using only a standard web browser by selecting StatPublish from the File menu. A dialog box is displayed to specify which output to publish and where it should be placed.
[Publish StatFolio dialog box, with fields for the HTML file on the local directory, the FTP site URL, the FTP username and password, the items to include (analyses, comments, StatGallery), the graph width and height in pixels, and the image format]
86. ain effects, but some or all of the two-factor interactions may be confounded (mixed in) with other interactions or block effects.
  o Resolution III designs can estimate only main effects, requiring that no interactions be present for proper interpretation.
- Maximum Runs per Block: In doing the experiment, the engineer realized that she could do no more than 10 runs from a single batch of raw material. Since batches might be different from each other, the experimental runs need to be grouped into blocks of no more than 10 runs each.
- Minimum Centerpoints per Block: specifies the smallest number of centerpoints that are desired in each block. Centerpoints are experimental runs in the center of the experimental region and are often used to create replicates from which to estimate experimental error. In this case, the engineer decided to let the program determine the necessary number of centerpoints.
- Experimental Error Sigma: the standard deviation of the experimental process. This is the standard deviation that would be observed from repeated runs at the same set of experimental conditions. From previous studies, it was thought that this value would be around 0.5 for yield, which was considered to be the most important response.
When OK is pressed, the program will display a second dialog box:
[Screening Design Selection Options dialog box]
87. ain menu under:
1. If using the Classic menu, select Relate - Multiple Factors - Multiple Regression.
2. If using the Six Sigma menu, select Improve - Regression Analysis - Multiple Factors - Multiple Regression.
The data input dialog box takes the following form:
[Figure 13-13: Multiple Regression Data Input Dialog Box. Dependent Variable: 1/MPG City. Independent Variables: Engine Size, Horsepower, Length, Weight, Wheelbase, Width]
To begin, all 6 predictors considered in the Multiple Variable Analysis procedure discussed earlier will be entered as independent variables. The dependent variable is the reciprocal of MPG City, which equates to gallons per mile. The resulting analysis summary is shown below:
[Multiple Regression - 1/MPG City. Dependent variable: 1/MPG City. Independent variables: Engine Size (liters), Horsepower (maximum), Length (inches), Weight (pounds), Wheelbase (inches), Width (inches). A table of estimated coefficients, standard errors, T statistics, and P-values follows]
88. also affects which indices are displayed on the Capability Plot and how they are labeled. A detailed discussion of the various indices may be found in the PDF document titled Capability Analysis (Variable Data).
In addition to the capability indices, the table in Figure 15-15 includes confidence intervals that show the margin of error in estimating those indices. For example, the above table shows a $C_{pk}$ of 0.74. The 95% confidence interval extends from 0.62 to 0.86. This indicates that the true $C_{pk}$ in the process from which the data were sampled may be anywhere between 0.62 and 0.86.
When the data do not follow a normal distribution, the capability indices need to be modified. The default option on the Preferences dialog box calculates non-normal indices by first computing equivalent Z-scores for the fitted non-normal distribution. For a normal distribution, the Z-score measures the number of standard deviations from the process mean to a specification limit and is directly related to the probability that an observation is beyond that limit. For a non-normal distribution, an equivalent Z-score is calculated by first determining the probability of exceeding the limit and then finding the Z-score that equates to that probability. After calculating equivalent Z-scores for both the upper and lower specification limits, $C_{pk}$ may be calculated from
$$C_{pk} = \frac{\min(Z_l, Z_u)}{3}$$
Note: Although the Preferences dialog box giv
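The equivalent Z-score idea described above can be sketched in a few lines of Python: take the probability of falling beyond each specification limit (obtained from whatever distribution was fitted), convert each probability to the standard normal Z-score with the same tail area, and divide the smaller Z by 3. This is an illustrative sketch, not the program's implementation; the function names are hypothetical, and the tail probabilities are constructed so that they correspond to 4 and 3 standard deviations.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def equivalent_z(tail_probability):
    """Z-score whose normal tail area equals the given probability."""
    return -STANDARD_NORMAL.inv_cdf(tail_probability)

def cpk_from_tail_probabilities(p_below_lsl, p_above_usl):
    """Cpk = min(Z_l, Z_u) / 3, using equivalent Z-scores."""
    z_lower = equivalent_z(p_below_lsl)
    z_upper = equivalent_z(p_above_usl)
    return min(z_lower, z_upper) / 3

# Hypothetical tail probabilities from a fitted distribution
p_below_lsl = STANDARD_NORMAL.cdf(-4.0)  # same tail area as 4 sigma
p_above_usl = STANDARD_NORMAL.cdf(-3.0)  # same tail area as 3 sigma

cpk = cpk_from_tail_probabilities(p_below_lsl, p_above_usl)
print(f"Equivalent Cpk = {cpk:.4f}")
```

In practice the two probabilities would come from the fitted non-normal distribution (for example, a lognormal or Weibull tail area), not from the normal distribution itself as in this toy example.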
89. am
The frequency histogram provides a back-to-back comparison of the two samples. Using Pane Options to rescale the class intervals so that there are 25 intervals between 96 and 101 degrees generates the following plot:
[Figure 11-4: Dual Frequency Histogram]
The histogram for the females is displayed above the horizontal line. The histogram for the males is inverted and displayed below the line. The shapes of the distributions are similar, with a possible shift of the females' distribution to the right of the males'.
11.4 Dual Box-and-Whisker Plot
The analysis window also displays box-and-whisker plots for the two samples. As explained in Chapter 10, the central boxes cover the middle half of each sample. The whiskers extend to the largest and smallest data values in each sample, except for any points that are unusually far from the boxes. A vertical line is drawn within each box at the sample median, while small plus signs indicate the locations of the sample means. In this case, it is particularly useful to add median notches by accessing Pane Options. The resulting plot is shown below:
[Figure 11-5: Dual Box-and-Whisker Plot with Median Notches. Samples: Female and Male; X-axis: Temperature]
Evident in this plot are:
1. An apparent offset of the center of the females' distribution to th
90. ample A does not overlap any of the other intervals, the mean of sample A is significantly different from that of all the other 3 samples. Sample B is also significantly different from sample D, since their intervals do not overlap. Sample C, however, is not significantly different from either B or D. The same analysis can be displayed in tabular form by selecting Multiple Range Tests from the Tables dialog box:
[Figure 12-10: Multiple Range Tests. Method: 95.0 percent Tukey HSD. The top section lists the count, mean, and homogeneous groups for each sample (sample D: count 12, mean 59.8417). The bottom section lists each contrast with its difference and limits of +/-1.65755: A - B, 2.79167 (*); A - D, 4.85833 (*); B - C, 1.05833; B - D, 2.06667 (*); C - D, 1.00833. An asterisk denotes a statistically significant difference]
The bottom section of the output shows each pair of means. The Difference column displays the sample mean of the first group minus that of the second. The Limits column shows an uncertainty interval for the difference. Any pair for which the absolute value of the difference exceeds the limit is statistically significant at the selected significance level and is indicated by an asterisk in the Sig. column. In the current example, four of the six pairs of means show significant differences. The top section of the display arranges the samples into homogeneous groups, shown as columns of
91. an temperature of the population from which the sample was taken. In this case, the sampling error is about 0.15 degrees in either direction. A larger sample would result in a smaller margin of error.
10.4 Testing for Outliers
Before estimating any additional statistics, it is worth taking a moment to investigate whether row 15 should be considered a true outlier and potentially removed from the data set. STATGRAPHICS Centurion includes a procedure that performs a formal test to determine whether an observation could reasonably have come from a normal distribution. The test is available on the main menu by selecting:
1. If using the Classic menu, select Describe - Numeric Data - Outlier Identification.
2. If using the Six Sigma menu, select Analyze - Variable Data - Outlier Identification.
Specifying Temperature in the Data field generates a large table of statistics, displayed in the lower half of the left pane. Of particular interest is the table showing the 5 smallest values in the sample and the 5 largest values:
[Figure 10-9: Selected Output from Outlier Identification Procedure. Sorted values with their studentized values; Grubbs' Test (assumes normality): test statistic = 3.47903, P-value = 0.0484379]
The most unusual value is row
92. aphical comparisons and hypothesis tests
Often data to be analyzed consists of two samples, possibly from different populations. In such cases, it is useful to:
1. Display the data in such a way that visual comparisons are possible.
2. Test hypotheses to determine whether or not there are statistically significant differences between the samples.
Tutorial 1 in the last chapter analyzed a set of body temperatures taken from 130 subjects. Of those subjects, 65 were female and 65 were male. In this tutorial, we will compare the data of the women to that of the men. To analyze the body temperatures, open the bodytemp.sf3 data file using Open Data Source on the File - Open menu.
11.1 Running the Two Sample Comparison Procedure
The main procedure for comparing data from two samples is the Two Sample Comparison procedure, invoked from the main menu as follows:
1. If using the Classic menu, select Compare - Two Samples - Independent Samples.
2. If using the Six Sigma menu, select Analyze - Variable Data - Two Sample Comparisons - Independent Samples.
The data input dialog box for that procedure is shown below:
[Two Sample Comparison dialog box. Input: Data and Code Columns; Data: Temperature; Sample Code: Gender; Select: Temperature < 100]
Figure 11-1: Two Sample Co
93. [Figure 14-16: Mosaic Plot for Recoded Data, showing the defect categories Contaminated, Damaged, Misaligned, and Other for the Texas and Virginia facilities]
After recoding, the chi-squared test still shows a statistically significant difference between the Texas and Virginia facilities:
Tests of Independence
Test: Chi-Squared; Statistic: 11.874; Df: 3; P-Value: 0.0078
The StatAdvisor
This table shows the results of a hypothesis test run to determine whether or not to reject the idea that the row and column classifications are independent. Since the P-value is less than 0.05, we can reject the hypothesis that rows and columns are independent at the 95% confidence level. Therefore, the observed value of Recoded Defect for a particular case is related to its value for Facility.
Figure 14-17: Chi-Squared Test after Recoding Data
It thus appears that defect type is indeed related to the facility in which an item was produced. It should be noted that the above test compares the distribution of defect types between the two facilities. It does not compare the numbers or percentages of defective items at each location. Such a comparison requires a different test, as explained in the next section.
14.5 Contingency Tables
To determine whether one facility produces more defective items than another, we need to know the total production at each facility. Suppose the following describes one month's production:
Facility Number
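As a check on the chi-squared output quoted in this section, the P-value can be recovered from the test statistic and its degrees of freedom. For exactly 3 degrees of freedom, the upper-tail probability has a closed form involving the complementary error function, which the Python sketch below uses. The function name is hypothetical, and the closed form is valid only for 3 degrees of freedom.

```python
import math

def chi_squared_upper_tail_df3(x):
    """P(X > x) for a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Statistic and degrees of freedom as reported in the Tests of Independence table
p_value = chi_squared_upper_tail_df3(11.874)
print(f"P-value = {p_value:.4f}")
```

The result should come out close to the 0.0078 shown in the table, which is below 0.05 and therefore consistent with the StatAdvisor's conclusion.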
94. at which other factors will be held.
[Figure 16-23: Response Plot Factor Options Dialog Box]
To create the plot below, the Contours field has been set to Painted, the Surface to Solid with Contours Below, and the contours have been scaled to range from 81 to 86 by 1.
[Figure 16-24: Response Surface Plot with Contours Below. Estimated Response Surface; concentration=6.5, agitation rate=137.5, catalyst=1.25]
The same plot can be displayed as a contour plot rather than a surface plot:
[Figure 16-25: Contour Plot of Response Surface. Contours of Estimated Response Surface; concentration=6.5, agitation rate=137.5, catalyst=1.25; yield contours from 81.0 to 86.0; temperature axis from 155 to 180]
High values of yield are obtained in the upper right corner.
16.5 Optimizing the Response
To determine values of the experimental factors at which best yields are obtained, select Optimization from the Tables dialog box. This will display the following output:
Optimize Response
Goal: maximize yield
Optimum value = 88.6736
95. at can be estimated from the selected experiment. An entry such as A indicates that the main effect of factor A can be estimated clear of any other effects. AB refers to the interaction between factors A and B, which is also clear of other effects. The only contrast that shows confounding of two effects is number 13, in which the CD interaction appears together with the block effect. This implies that the design can estimate the combination of the CD interaction plus any difference between blocks 1 and 2, but it cannot separate those two effects. Note that the design has arbitrarily sacrificed the ability to estimate the interaction between factors C and D, which are concentration and agitation rate. If this is an interaction that the engineer believes may be important, she should change the order of the variables so that C and D correspond to two variables that are not likely to interact.
16.3 Analyzing the Results
After designing the experiment, the engineer performed the indicated 20 runs. She then restarted the program and entered the measured values of yield and strength into the experiment datasheet. To replicate her analysis, you may load the w oria 7 5fx file in the same manner as you would any STATGRAPHICS data file, by selecting Open Data Source from the File menu. After loading the data:
1. If using the Classic menu, select DOE - Design Analysis - Analyze Design.
2. If using the Six Sigma menu
96. ata sample. The central box covers the middle half of the data, extending from the lower quartile to the upper quartile. The lines extending above and below the box (the whiskers) show the location of the smallest and largest data values. The median of the data is indicated by the vertical line within the box, while the plus sign shows the location of the sample mean. The fact that the upper whisker is slightly longer than the lower, while the mean is somewhat greater than the median, is a sign of positive skewness in the data.
1.6 Using the Analysis Toolbar
When an analysis window such as the One Variable Analysis is first displayed, only some of the available tables and graphs are included. To display additional output, you must push the appropriate button on the Analysis Toolbar, which is displayed immediately above the analysis title.
[Figure 1-24: The Analysis Toolbar]
The buttons on the analysis toolbar are very important. The actions of the 7 leftmost buttons are summarized below:
Name: Function
Input dialog: Displays the data input dialog box so that the selected data column(s) may be changed.
Tables: Displays a list of other tables that may be created.
Graphs: Displays a list of other graphs that may be created.
Save results: Allows calculated statistics to be saved to columns of a datasheet.
Analysis options: Selects options that
97. ate
4. Search: Displays a pulldown list of statistics, tests, graphs, and other output that may be created in STATGRAPHICS Centurion. Selecting an item from the list changes the display in the Select Analysis by Name field to list only those analyses that calculate the desired item.
5. Select from the Following Quick Picks: Lists some of the more commonly used analyses. Selecting an analysis and pressing OK takes you directly to the data input dialog box for that analysis.
If you elect option 1, the StatWizard will next display a dialog box in which to indicate the data to be analyzed. For example, if the 93cars.sf6 file is loaded into the DataBook, the dialog box takes the following form:
[StatWizard Data Selection dialog box. Instructions on the dialog: select the columns you want to analyze by highlighting a variable name and pressing an arrow button to enter it into the applicable text box; you must select at least one Y variable. In the example, MPG City is entered under Data or Response Variables (Y) and Weight under Quantitative Explanatory Factors (X)]
Figure 8-7: StatWizard Data Selection
98. ate button on the analysis toolbar. Of most interest is the top row of plots, which show MPG City plotted versus each of the 6 potential predictor variables. All of the variables are clearly correlated with miles per gallon, some in a nonlinear manner. There is also a great deal of multicollinearity present (correlation amongst the predictor variables), which suggests that many different combinations of variables may be equally good at predicting Y. The table at the bottom left shows a matrix of estimated correlation coefficients for every pair of variables in the analysis:
[Correlations table: for each pair among MPG City, Engine Size, Horsepower, Length, Weight, Wheelbase, and Width, the estimated correlation coefficient, the sample size (93), and the P-value]
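Each entry in a correlation matrix like the one described above is a Pearson correlation coefficient between two columns. A minimal Python sketch of that calculation follows; the function name `pearson_r` and the data are hypothetical, and the data were chosen to lie exactly on a falling straight line so that the result is easy to verify.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: mileage falls linearly as weight rises
weight = [2000, 2500, 3000, 3500, 4000]
mpg_city = [31, 27, 23, 19, 15]

r = pearson_r(weight, mpg_city)
print(f"r = {r:.4f}")
```

Values near -1 or +1 indicate a strong linear relationship; strong correlations among the predictors themselves are what the text refers to as multicollinearity.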
99. atistics Help
Figure 12-1: Initial Multiple Sample Comparison Dialog Box
In this case, the data have been placed in multiple columns of the datasheet. The second dialog box requests the names of the columns containing the data:
[Figure 12-2: Multiple Sample Comparison Data Input Dialog Box]
In the sample data file, the observations have been placed in four columns named A, B, C, and D. When the analysis window opens, it will have four panes:
[Analysis window: Multiple Sample Comparison. Panes: analysis summary, Scatterplot by Sample, ANOVA Table, and Graphical ANOVA. Summary: Sample 1 (A): 12 values ranging from 61.8 to 67.0; Sample 2 (B): 12 values ranging from 60.0 to 65.1; Sample 3 (C): 12 values ranging from 58.3 to 63.1; Sample 4 (D): 12 values ranging from 56.4 to 62.0. The StatAdvisor text for the ANOVA table reports an F-ratio of 22.76]
100. ble X. As illustrated in the last chapter, an S-Curve model provides a good fit to the relationship between the MPG City column and the Weight column in the 93cars.sf6 file. When first created, a plot of the fitted S-Curve model is displayed as shown below:

[Plot of Fitted Model: MPG City = exp(2.1328 + 2799.07/Weight). X axis: Weight. Y axis: MPG City.]

Figure 4-1: Fitted Model Plot with Default Titles and Scaling

The titles, scaling, point and line types, colors, and other graphics attributes were automatically generated.

4.1.1 Layout Options

To modify a graph once it has been created, first double-click on it so that it fills the analysis window. Then click on the Graphics Options button located on the analysis toolbar. A tabbed dialog box will be displayed, with tabs corresponding to different graphics elements. The Layout tab of the Graphics Options dialog box is used to change some of the basic features of the graph.

Figure 4-2: Layout Tab on Graphics Options Dialog Box

This includes the orientation of the axis tick marks, the thickness of the axes and the col
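As an aside to the fitted model quoted in the plot title of Figure 4-1, the following Python sketch evaluates the S-Curve at a few weights. It is not part of STATGRAPHICS Centurion. The function name `s_curve_mpg` is hypothetical, and the plus sign between the two coefficients is an assumption: the operator is not legible in the extracted text, and a plus sign is the choice that makes predicted mileage fall as weight rises.

```python
import math


def s_curve_mpg(weight, intercept=2.1328, slope=2799.07):
    """Evaluate exp(intercept + slope / weight).

    The coefficients are those printed in the plot title; the '+' sign
    is an assumption, as explained in the lead-in.
    """
    return math.exp(intercept + slope / weight)


if __name__ == "__main__":
    # Weights spanning the X axis range shown on the plot.
    for weight in (1600, 3000, 4600):
        print(weight, round(s_curve_mpg(weight), 1))
```

Evaluating the curve this way is a quick sanity check that the predictions stay within the range of MPG City values seen on the plot.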
101. bservations taken from a single population. For example, consider the following body temperatures taken from n = 130 individuals:

98.4  98.4  98.2  97.8  98    97.9  99    98.5  98.8  98
97.4  98.8  99.5  98    100.8 97.1  98    98.7  98.9  99
98.6  [illegible]  96.7  98.8  98.2  97.5  97.2  97.4  97.1  96.7
99.2  97.9  98.8  97.6  98.6  98.8  98.5  98.7  97.5  97.9
97.1  98.4  97.4  98.6  97.8  98.2  98    98    98.3  98.6
98.8  98.7  98.8  98.1  96.4  98.8  98.7  97.9  98.6  99.2
98.6  98    99.1  97.8  97.2  98.2  98.7  98.4  98.2  97.7
98.3  98.7  96.8  98    97.2  97.9  96.9  98.3  97.8  97
98.6  98.4  98.2  98    98    98.2  97.8  99    98.1  97.7
97.4  98.8  99.3  98.9  96.3  97.8  99.9  98.4  99.4  98.7
98.4  98.2  99.3  98.5  98.3  99    99.2  97.6  99.1  97.6
98.4  97.6  98.4  98    98.8  97.3  98.7  98.6  99.4  100
98.6  98.3  98.6  97.4  98.1  97.8  98.2  99    99.1  98.2

The data were obtained from the Journal of Statistical Education Data Archive (www.amstat.org/publications/jse/jse_data_archive.html) and are used by permission. They have been placed in a file named bodytemp.sf3 in a column called Temperature that contains 130 rows, one row for each person in the study.

The primary procedure in STATGRAPHICS Centurion for summarizing a sample taken from a population is the One Variable Analysis procedure. The One Variable Analysis procedure summarizes the data in both numerical and graphical form and tests hypotheses about the population mean, median, and standard deviation.
102. ch as histograms, contain solid areas. The Fills tab on the Graphics Options dialog box controls the color and fill type of bars, polygons, and pie slices.

Figure 4-14: Fills Tab on Graphics Options Dialog Box

Radio button 1 controls the first fill type on a graph. In a histogram, all of the bars use the first fill type. On some graphs, such as piecharts, more than one fill type is used. In those cases, radio buttons 2 through 20 control the other fill types. For plots such as histograms, setting a non-solid fill type is often a good idea when printing the results in black and white.

[Histogram drawn with a patterned fill. X axis: Weight.]

Figure 4-15: Frequency Histogram with Modified Fill Type

4.1.8 Text, Labels and Legends Options

For graphs containing additional legends or labels, tabs are included on the Graphics Options dialog box that allow you to change the
103. Figure 1-4: Setup Type Dialog Box

Select one of the following:

Typical: installs the program, help files, documentation, and sample data files. This requires a little more than 50MB of space on your hard disk.

Minimal: installs only the program and help files. This requires about 25MB of space on your hard disk.

Custom: installs only the components you select.

You can save hard disk space by performing a minimal install, but you won't have access to the on-line documentation and accompanying sample data files.

Step 7: Follow the remaining instructions to complete the installation. When the installation is complete, a final dialog box will be displayed:

Figure 1-5: Final Installation Dialog Box

Click on Finish to complete the installation.

1.2 Running the Program

As part of the installation process, a shortcut to STATGRAPHICS Centurion will be added to the Windows Start menu and also to y
104. ctangular array of rows and columns.

Figure 1-10: The STATGRAPHICS DataBook

In a typical datasheet, each row contains information about an individual sample, case, or observation, while each column represents a variable. For example, suppose you wished to use STATGRAPHICS Centurion to analyze data from the 2000 United States Census. A small section of the results of that census is shown below:

State        Population    Median Age   Female (%)   Per Capita Income
Alabama      4,447,100     35.8         51.7         18,819
Alaska       626,932       32.4         48.3         22,660
Arizona      5,130,632     34.2         50.1         20,275
Arkansas     2,673,400     36.0         51.2         16,904
California   33,871,648    33.3         50.2         22,711
Colorado     4,301,261     34.3         49.6         24,049

Figure 1-11: Data from the 2000 U.S. Census

When entering this data into a STATGRAPHICS Centurion datasheet, the information about each state would be placed into a different row. Five columns would be created to hold the names of the states and the census data. To enter data such as that shown above into STATGRAPHICS Centurion
105. ctions are:

1. Excel files (xls): reads a selected sheet from a Microsoft Excel workbook.
2. Text files (txt, csv, dat): reads an ASCII text file containing either delimited data or data arranged into uniform columns.
3. XML (xml): reads data from a tagged-format XML file.

After the file name is selected, a final dialog box will be displayed to retrieve additional information about the data in the file. If the selected file is an Excel workbook, the dialog box will be that shown below:

Figure 2-8: Options Dialog Box for an Excel Data File

Specify:

1. Column Header: information contained in the first 2 rows of the specified range. The two rows immediately above the data to be read may contain column names and/or comments. If names are not contained in the Excel worksheet, then default names will be generated.
2. Sheet number: number of the worksheet within the Excel workbook that will be read. Only one sheet may be read at a time.
3. Start and End row: the range of rows within the worksheet that will be read. This range includes the variable names and comments, if present.
4. Missing value: any special symbol used in the Excel spreadsheet to indicate missing data, such as NA. Cells containing
106. cution of scripts by selecting Disable Start-Up Scripts on the General tab of the Preferences dialog box, accessible from the Edit menu.

Figure 5-4: Disabling Start-Up Scripts

5.3 Polling Data Sources

Once a StatFolio has been created containing several analyses, the data in the data sources can be reread at fixed intervals of time and all of the analyses updated. This is accomplished using the DataBook Properties dialog box on the Edit menu or by selecting StatLink from the File menu.

[DataBook Properties dialog box]
107. d using an estimate of sigma obtained from observations close together in time, describe what the process is capable of doing if the mean were held constant. The long-term indices, which are calculated using an estimate of sigma obtained from the total variability amongst the observations throughout the sampling period, describe how the process has actually performed. An out-of-control process, in which the mean has moved significantly over the course of the data collection, may show considerably worse performance than it is capable of if it can be brought under control.

By default, STATGRAPHICS Centurion labels capability indices using the letter C and performance indices using the letter P. The Capability tab of the Preferences dialog box, accessible under Edit on the main STATGRAPHICS menu, specifies the indices to be calculated by default, as well as other important options.

[Preferences dialog box, Capability tab: options for the indices to include (such as Cp/Pp and Cpk/Ppk), the type of confidence limits, whether long-term and short-term indices are displayed, and the method used to estimate short-term sigma for grouped data and for individuals.]
108. does not assume normality and is less sensitive to outliers than the t-test. In both cases, the P-value is well below 0.05, soundly rejecting the hypothesis that the sample comes from a population with a mean of 98.6 degrees.

NOTE: the notation E-8 after a number means that the number is to be multiplied by 10 raised to the power -8. The P-Value shown as 1.81264E-8 therefore equals 0.0000000181264.

It should be noted that the confidence interval for the mean given in Section 10.8 did not include the value 98.6. Any values not within the confidence interval would have been rejected by the t-test considered here. You can thus think of the confidence interval as containing all possible values for the population mean that are supportable by the data sample.

10.9 Tolerance Limits

One additional analysis is useful for the body temperature data. It creates normal tolerance limits, which are limits within which a selected percentage of the population is estimated to fall with a given confidence level. Tolerance limits are available on the main menu by selecting:

1. If using the Classic menu, select Describe - Numeric Data - Statistical Tolerance Limits.
2. If using the Six Sigma menu, select Analyze - Variable Data - Statistical Tolerance Limits.

The procedure begins by displaying a dialog box in which you enter the sample size n and the sample mean and standard deviation. Using the results in Figure 10-12, the proper entries are: Statistical
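The E notation described in the NOTE above can be checked outside the program. The following Python sketch is only an illustration and is not part of STATGRAPHICS Centurion; it converts the P-value string quoted in the text to fixed-point form using the standard library `decimal` module. The variable names are hypothetical.

```python
from decimal import Decimal

# P-value exactly as it is displayed in the text, in E notation.
p_value_text = "1.81264E-8"

# float() understands E notation directly.
p_value = float(p_value_text)

# Decimal keeps the digits exactly, so the fixed-point form can be printed
# without binary floating-point noise.
fixed_point = format(Decimal(p_value_text), "f")

print(fixed_point)     # prints 0.0000000181264
print(p_value < 0.05)  # prints True
```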
109. e. Otherwise, the program will temporarily cease functioning. To obtain an activation code, press the Get Code button.

[Registration dialog box: You must enter an Activation Code within 30 days of first using STATGRAPHICS or the program will stop functioning. To obtain an Activation Code, provide the requested information by one of the following methods: 1. E-mail activate@statpoint.com (use the E-Mail button). 2. Call 1-800-232-7828 or 1-540-364-0420. 3. Print the information and fax it to 1-540-364-0421. The Activation Code is returned as soon as the required information has been verified.]

Figure 1-7: Registration Dialog Box

Enter the required information and then contact StatPoint in one of the following ways:

1. Press the Submit via e-mail button to send the information via the Internet.
2. Press the Print for faxing button and fax the printed information.
3. Call the indicated tel
110. e Data - Capability Analysis - Individuals. The data input dialog box requests the name of a single column containing the data. The sample data may be found in a column called Strength in the file named ctems sfo.

Figure 15-3: Process Capability Analysis Dialog Box

Upper and lower specification limits have also been indicated, as has a nominal or target value. The initial analysis window shows a summary of the data, a table of capability indices, and a capability plot:

[Analysis window: Process Capability Analysis - Individuals - Strength. Data variable: Strength. Distribution: Normal. Sample size = 100, mean = 202.809, std. dev. = 6.23781. Specifications: LSL = 190.0, Nominal = 210.0, USL = 230.0. 6.0 sigma limits: +3.0 sigma = 221.522, mean = 202.809. Capability indices for Strength: Cp = 1.16, Pp = 1.07, Cpk = 0.74, Ppk = 0.68, K = -0.36. Based on 6 sigma limits; short-term sigma estimated from the average moving range. Capability plot: frequency versus Strength.]
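The long-term indices shown in the analysis window above can be reproduced from the summary statistics. The Python sketch below is not STATGRAPHICS code. It assumes the commonly used definitions Pp = (USL - LSL) / (6 sigma) and Ppk = min(USL - mean, mean - LSL) / (3 sigma), with sigma taken to be the sample standard deviation. The function name `performance_indices` is hypothetical.

```python
def performance_indices(mean, sigma, lsl, usl):
    """Return (Pp, Ppk) from the mean, long-term sigma and specification limits.

    Uses the textbook definitions, assumed here rather than taken from
    the program's documentation.
    """
    pp = (usl - lsl) / (6.0 * sigma)
    ppk = min(usl - mean, mean - lsl) / (3.0 * sigma)
    return pp, ppk


# Summary values displayed in the analysis window for the Strength data.
pp, ppk = performance_indices(mean=202.809, sigma=6.23781, lsl=190.0, usl=230.0)
print(round(pp, 2), round(ppk, 2))  # prints 1.07 0.68
```

The results agree with the displayed Pp and Ppk. The short-term indices Cp and Cpk cannot be reproduced the same way, because the moving-range estimate of sigma is not shown in the text.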
111. e P-value associated with an F-test of the hypotheses stated above. A P-Value less than 0.05 would indicate a statistically significant difference between the variance of the females and the variance of the males at the 5% significance level. Since P is well above 0.05, there is no evidence upon which to reject the hypothesis of equal variances, and thus equal standard deviations. There is thus no significant evidence upon which to conclude that the variability of the female body temperatures is different from the variability of the male body temperatures.

It should be noted that this test is quite sensitive to the assumption that the samples come from normally distributed populations, an assumption that was shown to be reasonable based on the standardized skewness and standardized kurtosis values.

11.6 Comparing Means

The second comparison between the two samples tests the hypothesis that the means (μ) of the two populations are equal:

Null hypothesis: μ1 = μ2
Alternative hypothesis: μ1 ≠ μ2

To perform this test, press the Tables button again and select Comparison of Means. The results are:

Comparison of Means for Temperature
95.0% confidence interval for mean of Gender=Female: 98.3562 +/- 0.170924 [98.1853, 98.5272]
95.0% confidence interval for mean of Gender=Male: 98.1046 +/- 0.173144 [97.9315, 98.2778]
95.0% confidence interval for the difference between the means assuming equal variances: 0.251635 +/- 0.240998 [0.0106371, 0.492632]
t
112. e StatGallery to show plots in a Left and Right format.
2. Generate a contour plot within Analyze Design for one level of the experimental factor and copy it to the Windows clipboard.
3. Activate the StatGallery window. Click on the leftmost pane with the alternate mouse button and select Paste from the popup menu to put the contour plot in the StatGallery.
4. Return to the Analyze Design window and generate a second contour plot at a different level of the experimental factor. Copy it to the Windows clipboard.
5. Return to the StatGallery window. Click on the rightmost pane with the alternate mouse button and select Paste from the popup menu. This will place the second contour plot in the StatGallery alongside the first.

The resulting display is similar to that shown below:

[StatGallery window showing two plots side by side, each titled Contours of Estimated Response Surface, one for polyethylene = 1.0 and one for polyethylene = 2.0. Contours: strength. X axis: sealing temperature. Y axis: cooling bar temperature.]

Figure 6-3: Side-by-Side Graphs in the StatGallery

In the plot above, the progres
113. e males.

11.7 Comparing Medians

If it is suspected that the data may contain outliers, a nonparametric test can be performed to compare the medians rather than the means. Nonparametric tests do not assume that data come from normal distributions and tend to be less affected by outliers, if any are present.

Selecting Comparison of Medians from the Tables dialog box generates a Mann-Whitney (Wilcoxon) W test. In this test, the two samples are first combined. The combined data are then ranked from 1 to n1 + n2 (the combined sample size), and the original data values are replaced by their respective ranks. A test statistic W is then constructed, comparing the average ranks of the observations in the two samples.

Comparison of Medians for Temperature
Median of sample 1: 98.4
Median of sample 2: 98.1
Mann-Whitney (Wilcoxon) W test to compare medians
Null hypothesis: median1 = median2
Alt. hypothesis: median1 NE median2
Average rank of sample 1: 71.9219
Average rank of sample 2: 58.1846
W = 443.0   P-value = 0.0368312
Reject the null hypothesis for alpha = 0.05.

Figure 11-8: Two Sample Comparison of Medians

Interpretation of the Mann-Whitney (Wilcoxon) test parallels that of the t-test described in the last section, with a small P-value leading to the conclusion that the medians of the two populations are significantly different.

11.8 Quantile Plot

To illustrate the difference between the two distributions, side-by-side quantile
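The ranking step of the Mann-Whitney (Wilcoxon) test described in Section 11.7 can be sketched in a few lines of Python. This is a minimal illustration, not the program's implementation. The function name `average_ranks` and the two small samples are hypothetical, and tied values are given the average of the ranks they span, which is the usual convention and is assumed here.

```python
def average_ranks(sample1, sample2):
    """Combine two samples, rank them, and return each sample's average rank.

    Tied values receive the average of the ranks they occupy.
    """
    # Tag each value with the index of the sample it came from.
    combined = [(value, 0) for value in sample1] + [(value, 1) for value in sample2]
    combined.sort(key=lambda pair: pair[0])

    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        tied_rank = ((i + 1) + (j + 1)) / 2.0  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = tied_rank
        i = j + 1

    rank_sums = [0.0, 0.0]
    for (_, group), rank in zip(combined, ranks):
        rank_sums[group] += rank
    return rank_sums[0] / len(sample1), rank_sums[1] / len(sample2)


# Hypothetical temperatures, chosen only to show one tie (98.4).
first, second = average_ranks([98.4, 98.8, 99.0], [97.9, 98.1, 98.4])
print(first, second)
```

A large gap between the two average ranks, as in Figure 11-8, is what drives the test statistic away from its expected value under the null hypothesis.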
114. e printed pages.
2. Indicate header information to be printed at the top of each page.
3. Indicate whether each pane (table or graph) should be displayed on a separate page or whether multiple panes should be placed on a page if they will fit.
4. Specify the relative size of graphs as a percentage of the page dimensions.
5. Elect to plot the output in black and white, even if your printer has color capabilities.
6. Print the color background, if any, of your graphs.
7. Plot wide lines using 2 pixels instead of 1. This last option can make graphs appear much bolder on a high-resolution printer.

Other options, such as whether to print the output in portrait or landscape mode, are set by selecting Print Options from the File menu, which accesses the dialog box supplied with your printer driver.

3.4 Publishing the Results

The output from a statistical analysis may be published in HTML format for viewing from within a web browser by selecting StatPublish from the File menu. This enables you to make the output available to everyone in your organization, whether or not they have STATGRAPHICS Centurion on their computers. Publishing is described in Chapter 5.

You may also copy the analysis to the StatReporter, which allows you to annotate the output and save it in an RTF (rich text format) file, which may be read directly into programs such as Microsoft Word. Use of the StatReporter is described in Chapter 6.
115. e right of the males' distribution. Both the sample means and medians show a similar difference.
2. The range covered by the females is wider than the range covered by the males, but only if you include the lowest outside point.
3. The median notch for the females overlaps that of the males slightly. The notches are drawn in such a way that, if the two notches did not overlap, one could declare the two medians to be significantly different at the default system significance level, which is currently 5%. A more formal comparison is described in a later section.

Based upon this plot, there appears to be a difference in the center of the two samples, though the statistical significance of that difference remains in doubt.

11.5 Comparing Standard Deviations

The first formal comparison between the two samples is to test the hypothesis that the standard deviations (σ) of the populations from which the data came are equal, versus the hypothesis that they are different:

Null hypothesis: σ1 = σ2
Alternative hypothesis: σ1 ≠ σ2

This will allow us to determine whether the apparent difference between the variability of the males and females is statistically significant or whether it is within the range of normal random variability for samples of the current size.

To perform the test, press the Tables button on the analysis toolbar and select Comparison of Standard Deviations. The result is shown below:

Comparis
116. e size of the graphs when imbedded in the HTML files.

Image Format: Graphs may be imbedded in the HTML files in one of three formats:

1. JPEG: static images saved in JPEG format. Files are created with names such as pubexample analysisl graphl jpg.
2. PNG: static images saved in PNG format. Files are created with names such as pubexample analysisl graphl png.
3. Java applets: dynamic output that can be updated while being viewed by the browser. While in the browser, the graph will be updated at the specified increment by reading an auxiliary file with a name such as pubexample analysis graph1 sgz. This option is designed to be used in conjunction with real-time polling of data using the StatLink feature, as described in the PDF document titled Dynamic Data Processing and Analysis. Note: not all graphs will publish properly using this option. If one or more graphs do not display correctly in the published output, select a different option.

Add interactivity to applets: For graphs published as applets, selecting this feature allows the viewer to display information about data values by clicking on a point with the mouse while in the web browser.

After completing the input fields, press OK to publish the StatFolio.

To view a published StatFolio, start any web browser and use its File menu item to open the file specified in the top field in Figure 5-6. You can also view the output by selecting View Published Res
117. ed data sources is reread and all analyses recalculated. StatFolios thus provide a simple method for repeating analyses at a later time on different data. You may also create a script that is executed whenever a StatFolio is loaded. Details of this and other StatFolio features are described in this chapter.

5.1 Saving Your Session

To save the current status of your STATGRAPHICS Centurion session, select File - Save - Save StatFolio from the main menu. Enter a name for the StatFolio in the dialog box shown below:

Figure 5-1: File Selection Dialog Box for Saving StatFolio

StatFolios are saved in files with the extension .sgp. They contain:

1. A definition of all analyses that have been created, including the input variables, the tables and graphs, settings of all options, changes made to graphs, etc. When a StatFolio is reopened, the analyses are recalculated and all tables and graphs updated.
2. Links to the data sources contained in t
118. elect Copy Analysis to StatReporter. The StatReporter, described in Chapter 7, can be saved as an RTF file for import into programs such as Microsoft Word.

Save a graph in an image file: Maximize the graph to be saved. Then select Save Graph from the File menu.

Figure 1-31: Methods for Disseminating Analysis Results

Each of these operations is described in later chapters.

1.8 Saving Your Work

You can save the current STATGRAPHICS Centurion session at any time by selecting Save StatFolio from the File menu and entering a file name.

Figure 1-32: Dialog Box for Saving StatFolio

A StatFolio consists of instructions on how to create each of the analyses in your current session, with pointers to the files or databases containing your data. If you reload the StatFolio at a later date, it will automatically reread the data and recreate the analyses. Any options you have selected for the analyses will be retained.

NOTE 1
119. Figure 14-3: Pareto Analysis Data Input Dialog Box

The Pareto Analysis accepts data in two formats:

1. Untabulated data that need to be counted, as in the current example.
2. Counts for data that have already been grouped by defect type. This is applicable if you have two columns, one identifying the types of defects and a second containing the number of times each defect type occurred.

The analysis window displays both a summary table and a Pareto chart:

[Analysis window: Pareto Analysis - Defect. Data variable: Defect. Total counts: 120.0. Number of classes: 9. Panes: the analysis summary with StatAdvisor text, and a Pareto Chart with Cumulative Frequencies whose bar labels overlap.]

Figure 14-4: Pareto Analysis Window

Of particular interest is the Pareto chart on the right, which plots the frequencies of each type of defect, from most common to least common. Initially, the bar labels overlap badly due to their number and length. This may be fixed by:

1. Double-clicking on the graph with your mouse to maximize the pane within the analysis window.
120. ephone number. Be prepared to supply both the Serial number and the Product Key shown on the registration dialog box.

Whichever method you use, StatPoint will verify the information you provide and return an activation code to you. The next time you run the program, enter that code into the Activation code field on the License Manager dialog box and press the Upgrade button. From then on, the License Manager dialog box will not be displayed.

Step 3: The first time you run the program, you will also be asked which menu system you wish to use.

Figure 1-8: Menu Selection Window

You have a choice of the classic STATGRAPHICS menu, which organizes the statistical procedures into the headings Plot, Describe, Compare, Relate, Forecast, SPC, and DOE, or the Six Sigma menu, which organizes the procedures into the headings Define, Measure, Analyze, Improve, Control, and Forecast. Both menus include the same procedures. Only the organization is different. You may change your initial choice at a later time by selecting Preferences from the Edit menu within the program, after which you must exit the program for the menu change to take effect.

Step 4: The main STATGRAPHICS window will then be created. The first time you run the program, an additional dialog box will be d
121. Figure 16-2: Second Screening Design Selection Dialog Box

This box is used to specify the required power of the experiment. Power is the probability that a factor with an effect of a specified magnitude will be declared to be statistically significant once the experiment has been completed and the data analyzed. More specifically, it is the probability of obtaining a significant P-value in the initial ANOVA table when the true effect equals that specified in the Effect to Detect field, operating at the significance level inferred from the Confidence Level entry.

In this case, the engineer specified that she wished to have a 90% chance of detecting an effect equal to 3 times the experimental error sigma. Effects smaller than that were considered to be too small to be of practical interest. Since the confidence level is set to 95%, a significant P-Value will be one that is less than 0.05.

Pressing OK once more opens an analysis window listing the smallest experimental designs of each type that meet the specified requirements:

[Analysis window: Screening Design Selection, listing the input values and the selected designs, followed by StatAdvisor text.]
122. er row(s).

Note: if the Excel file contains column names but not comments, select Edit - DataBook Properties from the STATGRAPHICS Centurion menu and turn off the Display variable comments option before pasting the data.

2.2.4 Querying an ODBC Database

STATGRAPHICS Centurion also allows you to read data from an Oracle, Access, or other database using ODBC. To access data from a database, first select File - Open - Open Data Source. Then select Query Database from the initial dialog box:

Figure 2-9: Open Data Source Dialog Box

A sequence of additional dialog boxes will be displayed, on which you:

1. Select the name of the database to be read.
2. Select the fields to be transferred.
3. Specify a filter to limit the records that are retrieved.
4. Specify a sort order for the results.

A SQL query is then constructed and the results placed in the active STATGRAPHICS datasheet. Detailed information on constructing ODBC queries may be found in the PDF document titled Data Files and StatLink.

2.3 Manipulating Data

Once data has been placed into a STATGRAPHICS Centurion datasheet, it can be manipulated in several important ways:

1. The data may be copied and pasted into other locations.
2. Additional columns may be created from existing columns.
3. Da
123. erature           150    180    degrees C     yes
B  flow rate        10     12     liters/min    yes
C  concentration    5      8                    yes
D  agitation rate   125    150    rpm           yes
E  catalyst         1      15                   yes

These limits were set to cover a reasonable operating range for the process. The next dialog box defines the response variables:

Figure 16-6: Definition of Response Variables

The entries for the two responses are:

Name        Units
yield       grams
strength    psi

The fourth dialog box is used to select the design:

Figure 16-7: Design Selection

To see the list of screening designs available for five factors, click on the arrow to pull down the list. The list shows:

1. Name: the name of each available design.
2. Runs: the number of runs in the base design, before any centerpoints or replicate runs are added.
3. Resolution: the resolution of the design.
4. Error d.f.: the number of degrees of freedom available to estimate the experimental error. The power of the statistical tests is related to the number of degrees of freedom, as well as the total number of runs in the experiment. Normally, at least 3 degrees of freedom should be available, although more is preferable.
5. Block size
124. erences between the Texas and Virginia facilities are statistically significant, select Tests of Independence from the Tables dialog box. For a table of this size, the procedure displays the results of a chi-squared test:
    Tests of Independence
    Test: Chi-Squared   Statistic: 18.438   Df: 8   P-Value: 0.0182
    Warning: some cell counts < 5.
    Figure 14.14: Chi-Squared Test of Independence
    The chi-squared test is used to decide between two hypotheses:
    Null hypothesis: row and column classifications are independent.
    Alternative hypothesis: row and column classifications are not independent.
    Independence would imply that the type of defect found in an item had nothing to do with the facility in which that item was manufactured. For the chi-squared test, a small P-value indicates that the row and column classifications are not independent. In this case, the P-value is less than 0.05, indicating at the 5% level of significance that the distribution of defect types in the Texas facility differs from that in the Virginia facility.
    A warning is also displayed, however, since some cell counts in the two-way table are less than 5. Technically, the warning occurs if the expected count in any cell is less than 5, assuming that the null hypothesis is true. With small cell counts, the P-value may be unreliable. One solution to this problem is to group all infrequent defect types into a single class and rerun the test. This is easily done in ST
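As a rough cross-check of the chi-squared output in Figure 14.14, the P-value can be recomputed from the reported statistic and degrees of freedom. The sketch below is not STATGRAPHICS code: the function names are hypothetical, the closed-form tail formula applies only when the degrees of freedom are even (as with 8 here), and the small table passed to `expected_counts` is made-up illustration data, not the defect table from the text.

```python
import math

def chi2_upper_tail_even_df(stat, df):
    """Upper-tail P-value of a chi-squared statistic (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2.0
    # For df = 2k: P(X > stat) = exp(-stat/2) * sum_{i<k} (stat/2)^i / i!
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Statistic and degrees of freedom as reported in Figure 14.14.
p_value = chi2_upper_tail_even_df(18.438, 8)

# Hypothetical 2 x 3 table, used only to show the small-expected-count check.
toy_table = [[12, 3, 5], [10, 6, 4]]
small_cells = [e for row in expected_counts(toy_table) for e in row if e < 5]
```

Here `p_value` should agree with the reported 0.0182 to the displayed precision, and a non-empty `small_cells` list corresponds to the situation in which the warning about cell counts below 5 would apply.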
125. ervals for Temperature
    95.0% confidence interval for mean: 98.2295 +/- 0.122015 [98.1074, 98.3515]
    95.0% confidence interval for standard deviation: [0.624081, 0.798114]
    Bootstrap Intervals
    Mean: [98.1132, 98.3519]
    Standard deviation: [0.621373, 0.785949]
    Median: [98.1, 98.4]
    Figure 10.22: Bootstrap 95% Confidence Intervals
    The earlier intervals, calculated using Student's t distribution and the chi-squared distribution, are closely matched by the bootstrap intervals. This is not unexpected, since the data do not show significant skewness or kurtosis.
    10.8 Hypothesis Tests
    Formal hypothesis tests may also be performed. For example, it is often asserted that normal human temperature is 98.6 degrees Fahrenheit. To test whether or not the current data come from a normal distribution with such a mean, a hypothesis test may be created to test between:
    Null hypothesis: μ = 98.6 degrees
    Alternative hypothesis: μ ≠ 98.6 degrees
    To run the test within the One Variable Analysis procedure, select Hypothesis Tests from the list of Tables. Before examining the results, select Pane Options and specify the attributes of the desired test:
    [Figure 10.23: Pane Options for Hypothesis Tests]
    The value entered for Mean represents the null hypothesis. Under Alt. Hypothesis, yo
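To make the idea behind the bootstrap intervals in Figure 10.22 concrete, here is a minimal sketch of a percentile bootstrap. It is an illustration under stated assumptions, not the procedure STATGRAPHICS itself uses: the function name, the number of resamples, and the percentile method are all choices made for this example, and the temperatures are randomly generated stand-ins rather than the body temperature data.

```python
import random
import statistics

def bootstrap_interval(data, stat, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(data) (illustrative only)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo_index = int(tail * n_resamples)
    hi_index = int((1.0 - tail) * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]

# Made-up sample: NOT the data analyzed in the text.
gen = random.Random(1)
sample = [round(gen.gauss(98.2, 0.7), 1) for _ in range(130)]

mean_lo, mean_hi = bootstrap_interval(sample, statistics.mean)
median_lo, median_hi = bootstrap_interval(sample, statistics.median)
```

A hypothesized mean such as 98.6 can then be compared informally with `(mean_lo, mean_hi)`; a value falling outside the interval points in the same direction as rejecting the null hypothesis in a two-sided test at the matching significance level.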
126. es the option of calculating capability indices from percentiles rather than equivalent Z-scores, doing so destroys the usual relationship between the capability indices and DPM.
    15.5 Six Sigma Calculator
    As an index, Cpk is a useful summary of process capability. Provided it is calculated properly, it can be related to DPM. The STATGRAPHICS Centurion Tools menu contains a Six Sigma Calculator that will convert between the two, provided that either:
    1. The data come from a normal distribution.
    2. Equivalent Z-scores are used to calculate the indices.
    The Six Sigma Calculator is shown below:
    [Figure 15.17: Six Sigma Calculator. Example results shown in the dialog box: Z-Score 3.99, DPM 33.0518, Defects 0.00330518, Yield 99.9967, Cpk 1.33, Sigma level 3.99]
    To use the calculator:
    1. Select any of the input radio buttons and enter a value for the corresponding statistic.
    2. If you wish to calculate values based on the nearer specification limit only, uncheck the two-sided checkbox.
    3. Indicate the value you wish to assume for the long-term shift in the process mean. In Six Sigma, it is often assumed that the process mean will oscillate around its long-term value by 1.5 sigma.
    4. Pres
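The conversions performed by the Six Sigma Calculator can be sketched with the standard normal distribution. The code below is an assumption-laden illustration rather than the calculator's actual implementation: the function names are hypothetical, only the one-sided case (nearer specification limit) is shown, and the way the sigma shift is subtracted from the Z-score follows the common Six Sigma convention, which may differ from what the program does internally.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def cpk_to_z(cpk):
    """Cpk is the distance to the nearer spec limit in units of 3 sigma."""
    return 3.0 * cpk

def z_to_dpm(z, shift=0.0):
    """Defects per million beyond one spec limit, after an assumed mean shift."""
    return 1_000_000 * normal_upper_tail(z - shift)

z = cpk_to_z(1.33)                          # Cpk from Figure 15.17
dpm = z_to_dpm(z)                           # roughly 33 defects per million
yield_pct = 100.0 * (1.0 - dpm / 1_000_000)

# Common Six Sigma convention (assumed): sigma level 6 with a 1.5 sigma shift.
dpm_six_sigma = z_to_dpm(6.0, shift=1.5)
```

With a Cpk of 1.33 and no shift, the results line up with the Z-score, DPM, and yield values displayed in Figure 15.17, which suggests the figure was produced with the one-sided setting.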
127. esses a different statistical procedure. All procedures, however, work in the same basic way:
    1. When an analysis is selected from the menu, a data input dialog box is displayed. The fields on this dialog box are used to specify the variables to be analyzed.
    2. The specified data is then read and analyzed, and a new analysis window is created with a set of default tables and graphs.
    3. When first run, default values are selected for all options in the analysis. These options can be changed using the Analysis Options button on the analysis toolbar, in response to which all tables and graphs in the analysis window will be updated.
    4. If desired, additional tables and graphs may be requested by pressing the Tables and Graphs buttons on the analysis toolbar.
    5. Individual tables and graphs can be modified by maximizing the corresponding pane and selecting Pane Options from the analysis toolbar.
    6. For graphs, the default title, scaling, point types, fonts, etc. may be changed by double-clicking on the graph to maximize it and then selecting Graphics Options from the analysis toolbar.
    7. Tables and graphs may be printed, published as HTML files, copied to other applications such as Microsoft PowerPoint, or saved in the StatReporter.
    8. Numerical results may be saved to columns of any datasheet using the Save Results button on the analysis toolbar.
    9. The entire analysis may be saved to disk as a StatFolio for later
128. fill the window:
    [Figure 3.5: Simple Regression Analysis Window with Maximized Pane, showing the Plot of Fitted Model, MPG City = 47.0484 - 0.00803239*Weight]
    Double-clicking on the pane a second time restores the multiple pane display.
    When an analysis window has the focus, a second toolbar is activated directly beneath the main STATGRAPHICS Centurion toolbar. Each of the buttons on this analysis toolbar performs an important operation.
    3.2.1 Input Dialog Button
    When pressed, this button displays the data input dialog box originally used to specify the data variables, as shown in Figure 3.2. If you change the data variables and press OK, the analysis will change to reflect the new selections. This enables you to try different combinations of data without having to start a new analysis.
    3.2.2 Tables Button
    This button displays a list of additional tables that may be added to the analysis window. For Simple Regression, the available tables are: Analysis Summary, Lack-of-Fit Test, Forecasts, Comparison of Alternative Models, Unusual Residuals, and Influential Points.
    [Figure 3.6: Simple Regression Tables Dialog Box]
    For example, if you elect to add tables showing alternative models and unusual
129. ggested Reading
    Data Sets
    93cars.sf6: This data was downloaded from the Journal of Statistics Education (JSE) Data Archive. It was compiled by Robin Lock of the Mathematics Department at St. Lawrence University and is used with his permission. An article associated with the dataset appears in the Journal of Statistics Education, Volume 1, Number 1 (July 1995).
    bodytemp.sf3: This data was also downloaded from the Journal of Statistics Education (JSE) Data Archive. It was compiled by Allen Shoemaker of the Psychology Department at Calvin College and is used with his permission. The data were derived from an article in the Journal of the American Medical Association (1992, vol. 268, pp. 1578-1580) entitled "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich" by P. A. Mackowiak, S. S. Wasserman, and M. M. Levine. An article associated with the dataset appears in the Journal of Statistics Education, Volume 4, Number 2 (July 1996).
    Journal of Statistics Education (JSE) Data Archive web site: http://www.amstat.org/publications/jse/jse_data_archive.html
    Index
    ABS; activation code; algebraic operators (addition, division, exponentiation, multiplication, subtraction); alias structure; analysis headers; analysis of means; analysis of variance; Analysis Options; analysis toolbar; anal
130. h level, with all other factors held constant at a value midway between their lows and their highs. Note that the three factors with significant main effects have a bigger impact on the response than the others. For example, the average yield at low temperature is approximately 82, while the average yield at high temperature is approximately 85.4. The difference of 3.4 is called the main effect of temperature.
    To plot the interaction between temperature and flow rate, first select Interaction Plot from the Graphs dialog box. Then use Pane Options to select only those two factors:
    [Figure 16.16: Pane Options Dialog Box for Interaction Plot]
    The resulting plot shows the average yield as temperature is changed, for each level of flow rate:
    [Figure 16.17: Interaction Plot for Flow Rate and Temperature. Yield is plotted against temperature (150.0 to 180.0), with one line for flow rate = 10.0 and one for flow rate = 12.0]
    Notice that at low flow rate, temperature has little if any effect. At high flow rate, temperature is a very important factor.
    Before using the statistical model underlying this analys
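The main effect and interaction described for temperature and flow rate reduce to simple differences of averages. The sketch below illustrates that arithmetic on four invented runs, chosen only so that the averages echo the approximate values quoted in the text (82 and 85.4); the run data, the -1/+1 coding, and the function names are assumptions for illustration and are not taken from the experiment.

```python
from statistics import mean

def main_effect(runs, factor, response="yield"):
    """Average response at the high level minus average at the low level."""
    high = [r[response] for r in runs if r[factor] == +1]
    low = [r[response] for r in runs if r[factor] == -1]
    return mean(high) - mean(low)

def effect_at(runs, factor, other, level, response="yield"):
    """Effect of `factor` using only the runs where `other` sits at `level`."""
    subset = [r for r in runs if r[other] == level]
    return main_effect(subset, factor, response)

# Hypothetical runs: -1 is the low level of a factor, +1 the high level.
runs = [
    {"temperature": -1, "flow_rate": -1, "yield": 82.1},
    {"temperature": +1, "flow_rate": -1, "yield": 82.5},
    {"temperature": -1, "flow_rate": +1, "yield": 81.9},
    {"temperature": +1, "flow_rate": +1, "yield": 88.3},
]

temperature_effect = main_effect(runs, "temperature")               # 85.4 - 82.0
effect_low_flow = effect_at(runs, "temperature", "flow_rate", -1)
effect_high_flow = effect_at(runs, "temperature", "flow_rate", +1)
```

In this toy data the temperature effect is small at the low flow rate and large at the high flow rate, which is the pattern an interaction plot with non-parallel lines displays.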
131. hange automatically whenever the source output changes in the analysis window from which the table or graph was copied.
    7.3 Modifying StatReporter Output
    The StatReporter toolbar allows you to modify output once it has been placed in the window. To change text, select the text to be changed and push any of the buttons on the StatReporter toolbar. You may also insert the current date and time by pressing the Date/Time button.
    7.4 Saving the StatReporter
    To save the StatReporter output, select File - Save - Save StatReporter from the main menu and enter a name for the file to be saved. StatReporter contents are saved in files of type .rtf, which may be read directly into programs such as Microsoft Word.
    Whenever a StatFolio is opened, it automatically loads the StatReporter that was present when the StatFolio was saved. You may also open a StatReporter independently using the File - Open menu.
    Chapter: Using the StatWizard
    Selecting the right statistical analysis, searching for desired statistics and tests, and generating multiple windows by factor levels.
    The StatWizard is a special feature of STATGRAPHICS Centurion designed to assist you in several ways:
    1. It can help you create a new datasheet or read an existing data source.
    2. It can suggest analyses based on the type of data to be analyzed.
    3. It can search for desired statistics or tests and take
132. he DataBook. If the data change between the time the StatFolio is saved and when it is reopened, the analysis windows will reflect the changes.
    3. Links to a StatGallery and StatReporter file, if material has been placed in them before the StatFolio was saved. The program will ask you to supply names for the StatGallery and the StatReporter when the StatFolio is saved.
    5.2 StatFolio Scripts
    When a StatFolio is first loaded, all of the analysis windows are restored to their previous condition. STATGRAPHICS Centurion then looks to see whether a start-up script has been saved with the StatFolio, and executes it if one has. A script may be created by selecting StatFolio Start-up Script from the Edit menu. A dialog box is displayed with fields to define a sequence of actions to be performed:
    [Figure 5.2: A Simple StatFolio Start-Up Script]
    The desired operations are specified in the order in which they should be performed. The available operations are:
    Operation  Argument  Target  De
133. he number of network requests.
    For a description of the options on the other tabs, refer to the PDF document titled Preferences.
    9.2 Printing
    Two selections on the File menu control printed output:
    1. Print Setup: accesses the standard printer options dialog box supplied with your printer driver. This dialog box typically sets paper size and chooses between landscape and portrait mode for the output.
    2. Page Setup: a STATGRAPHICS Centurion-specific dialog box that sets margins, headers, and other options. This dialog box was discussed in Section 3.3.
    9.3 Graphics
    Maximizing a pane containing a graph within any analysis window activates the Graphics Options button on the analysis toolbar. That button displays a tabbed dialog box that allows you to change the appearance of a graph, as described in detail in Chapter 4. Also included on that dialog box is a tab labeled Profile, which enables you to save sets of graphics attributes in user profiles and to change the default profile that is used when a new graph is created.
    [Figure 9.2: Profile Tab on Graphics O
134. he statistical procedures do not require you to sort the data before using them, since they will automatically sort the data if necessary. Also, the data file on disk is not changed when you perform a sort unless you resave the data. Sorting only affects the order in which the rows are displayed in the datasheet.
    2.3.5 Recoding Data
    It is sometimes convenient to recode data, either by grouping it into similar groups or by assigning new labels. To recode a column of data, first click on the header of the column to be recoded. Then select Recode Data from the Edit menu. The following dialog box will be displayed:
    [Figure 2.20: Dialog Box for Recoding Data, with Lower Limit, Upper Limit, and New Value fields, a choice of limit conditions, and options for handling unmatched values (leave as is or set to missing)]
    For example, the column named Domestic in the 93cars file contains a 1 for each car made by a U.S. automaker and a 0 for all other cars. To change all 0's in the column to "Foreign" and all 1's to "U.S.", the dialog box above could be used. Up to 7 ranges of values may be specified at one time for recoding.
    The PDF document titled Edit Menu has a detailed discussion of two recoding examples.
    2.3.6 Combining Multiple Columns
    Many statistical pr
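Returning to the recoding step of Section 2.3.5, the range-based rule behind the Recode Data dialog box can be mimicked in a few lines. This is only a sketch of the idea, not how STATGRAPHICS stores or applies its rules: the `recode` helper, its argument names, the inclusive limit condition, and the sample column are all hypothetical.

```python
def recode(values, rules, unmatched="leave"):
    """Replace each value that falls in a (lower, upper) range by a new value.

    rules is a list of (lower, upper, new_value) tuples; limits are inclusive.
    Unmatched values are left as is, or set to None when unmatched="missing".
    """
    recoded = []
    for value in values:
        for lower, upper, new_value in rules:
            if lower <= value <= upper:
                recoded.append(new_value)
                break
        else:
            # No rule matched this value.
            recoded.append(value if unmatched == "leave" else None)
    return recoded

# Made-up column in the style of Domestic; the 2 is there to show an unmatched value.
domestic = [1, 0, 0, 1, 2]
rules = [(0, 0, "Foreign"), (1, 1, "U.S.")]

labels = recode(domestic, rules)
labels_with_missing = recode(domestic, rules, unmatched="missing")
```

Expressing each rule as a lower limit, an upper limit, and a new value mirrors the three columns of the dialog box, so grouping a numeric column into bands only requires wider ranges.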
135. here are thousands of observations in the sample, far outside points are usually indicative of true outliers or of a non-normal distribution.
    b. Outside points: points more than 1.5 times the interquartile range above or below the limits of the box. Outside points are denoted by a point symbol but no superimposed plus sign. Even when data come from a normal distribution, the chance of observing 1 or 2 outside points in a sample of n = 100 observations is about 50%, so such points do not necessarily indicate the presence of a true outlier. These points should be considered simply worthy of further investigation.
    The box-and-whisker plot in Figure 10.7 is reasonably symmetric. The whiskers are about the same length, and the sample mean and median are similar and close to the center of the box. Three outside points are marked, but no far outside points. Clicking on the rightmost outlier with the mouse indicates that it corresponds to row 7715 in the file.
    If you select Pane Options from the analysis toolbar, you can add a median notch to the plot:
    [Figure 10.8: Box-and-Whisker Plot with a 95% Median Notch]
    This adds a notch to the display covering an approximate confidence interval for the population median at the default system confidence level (usually 95%). It shows the margin of error when attempting to estimate the medi
136. hether to suppress the space normally left between the intersection of the horizontal and vertical axes and the first tickmark. If the gap is suppressed, some point symbols may lie directly on the axes.
      - Suppress Axis Powers: whether to suppress the special notation for display of large and small tickmark values, usually shown using notation such as (X1000).
      - Decimal Places for Labels: default number of decimal places for legends displayed along the right margin of graphs. The default font can also be set.
    - StatAdvisor: sets the default behavior of the StatAdvisor.
      - Add to Text Panes: whether StatAdvisor output should automatically be added to the bottom of text panes. StatAdvisor output is always available by pressing the button on the main toolbar showing the graduation cap.
      - Highlight References in Red: whether to highlight in red the values on text panes that are referred to by the StatAdvisor.
    - Analysis Headers: whether to use a blue font to display the analysis title at the top of the Analysis Summary pane.
    - StatFolios: check Disable Start-Up Scripts to prevent start-up scripts from being run when StatFolios are loaded.
    - Temporary File Directory: if specified, StatFolios, data files, and other files will first be written to this directory before being copied to their final location. Specifying a local drive can greatly reduce the time required to save a file over some networks, since it reduces t
137. ht an analysis.
    3. Press OK. You will be taken directly to the data input dialog box for the selected analysis, bypassing the usual menus.
    Chapter: System Preferences
    Setting preferences for system behavior.
    STATGRAPHICS Centurion contains hundreds of options, each of which has a default value that has been selected to meet most users' needs. If desired, you can set new defaults for many of these options. There are 3 main places in the program to do this:
    1. General system behavior: set on the Preferences dialog box, accessible from the Edit menu.
    2. Printing options: set on the Page Setup dialog box, accessible from the File menu.
    3. Graphs: set by selecting Graphics Options while viewing any graph. The Profile tab of the Graphics Options dialog box allows you to save multiple sets of graphics attributes.
    9.1 General System Behavior
    The default values for general system behavior and selected statistical procedures may be changed by selecting Preferences from the Edit menu. This displays a tabbed dialog box with a General tab for overall system behavior and other tabs for statistical analysis defaults:
    [Preferences dialog box]
138. ia
    Contaminated  Texas
    Leaking       Texas
    Damaged       Virginia
    Contaminated  Texas
    The data consist of n = 120 rows, each corresponding to a defect that was observed in a manufactured item. The file also indicates the type of defect and the facility in which the item was produced.
    14.1 Summarizing Attribute Data
    Ignoring for a moment the facility in which each item was produced, the data on defect type may be summarized as follows:
    1. If using the Classic menu, select Describe - Categorical Data - Tabulation.
    2. If using the Six Sigma menu, select Analyze - Attribute Data - One Factor - Tabulation.
    The data input dialog box expects a single column containing the attribute data:
    [Figure 14.1: Tabulation Data Input Dialog Box]
    The procedure scans the column, identifying each unique value. It then displays an analysis window similar to that shown below:
    [Tabulation analysis window for Defect: 120 observations and 9 unique values, with a barchart of the defect types]
139. ialog box displays the license agreement for the software:
    [Figure 1.1: License Agreement Dialog Box]
    Read the license agreement carefully. If you accept the terms, click on the indicated radio button and press Next to continue. If you do not agree, press Cancel; you may not use the program unless you accept the terms.
    Step 4: The next dialog box requests information about you and the serial number you were given when you purchased the program:
    [Figure 1.2: Customer Information Dialog Box]
    Enter
140. [Figure 9.1: Preferences Dialog Box]
    Some of the most important options that may be set are:
    - System Options: options that apply system-wide.
      - Use Six Sigma Menu: display the menu selections under headings corresponding to the Six Sigma DMAIC arrangement (Define, Measure, Analyze, Improve, Control). The same selections are available as with the classic menu, except that they are arranged under different menu headings.
    - Confidence Level: default percentage used for confidence limits, prediction limits, hypothesis tests, and interpretation of P-values by the StatAdvisor.
    - Significant Digits: number of significant digits used when displaying numerical results. The indicated number of digits will be displayed, except for trailing zeroes, which will be dropped. A
141. igure 2.1: Sample Datasheet
    This chapter describes everything you need to know about data and STATGRAPHICS Centurion, including how to access it, how to manipulate it, and how to use it in statistical analyses.
    2.1 The DataBook
    Each column in the STATGRAPHICS Centurion datasheet represents a different variable. Variables are usually attributes or measurements associated with the items that define the rows of the datasheet. For example, in the 93cars datasheet, there is a column identifying the make of each automobile, a column identifying its type, columns containing the recorded miles per gallon in city and highway driving, columns containing the automobile's length, height, and weight, and similar information. Each column has a name and type associated with it. The name is used to identify the data to use in a statistical analysis. The type affects how it will be analyzed. Also associated with each column is an optional comment, which is used to provide additional information about the contents of a column.
    Note: the data were obtained from the Journal of Statistics Education Data Archive (www.amstat.org/publications/jse/jse_data_archive.html) and are used by permission.
    To display or change the properties of any column in a datasheet, double-click on the column name to display the Modify Column dialog box:
    [Modify Column dialog box, with fields for the column name, comment, and type]
142. [Figure 1.30: Graphics Options Tabbed Dialog Box]
    Clicking on radio button #1 and then selecting a new Fill Type or Color will change the bars in the histogram.
    NOTE: The operations of many of the buttons on the analysis toolbar can also be accessed by clicking the alternate mouse button in the pane containing a table or graph. This displays a popup menu listing the available operations.
    1.7 Disseminating the Results
    Once an analysis has been performed, the results can be disseminated in various ways. These include:
    Action: Method
    - Print the output: Press the printer button on the main toolbar to print all tables and graphs, or click on a single pane with the alternate mouse button and select Print from the popup menu to print a single table or graph.
    - Publish the output for viewing in a web browser: Select Publish as HTML from the File menu. A dialog box will be displayed for you to specify the location of the HTML output.
    - Copy the output to another application: Click on the table or graph to be copied and select Copy from the Edit menu. Then activate the other application and select Edit - Paste.
    - Save the analysis in a report: Press the alternate mouse button and s
143. imated quantile of the larger sample. If the samples come from identical populations, the points should lie close to the diagonal line. A constant shift left or right indicates that there is a significant difference between the centers of the two distributions. Points diverging from the line at a slope different from that of the diagonal line indicate a significant difference in variability. In this case, the difference between the populations may be a little more complicated than a simple shift in the mean, since the points are closer to the line at high and low temperatures than they are at central temperatures. It appears that the distribution of temperatures for the females is more concentrated in the center than the distribution for the males.
    Chapter: Tutorial #3: Comparing More than Two Samples
    Comparing means and standard deviations, one-way ANOVA, ANOM, and graphical methods.
    When data fall into more than two groups, a different set of techniques needs to be employed than in the previous chapter. For example, suppose you wished to compare the strength of widgets made from 4 different materials. In a typical experiment, you might make 12 widgets from each of the four materials in order to compare them. The following data represent the results of such an experiment:
    Material A  Material B  Material C  Material D
    64.7        60.4        58.3        60.8
    64.8        61.8        62.1        60.2
    66.8        6
144. ine at the location of the sample median, which divides the data in half. If the data come from a symmetric distribution, this line should be close to the center of the box.
    3. Drawing a plus sign at the location of the sample mean. Any substantial difference between the median and the mean usually indicates either the presence of an outlier (a data value that does not come from the same population as the rest) or a skewed distribution. In the case of a skewed distribution, the mean will be pulled in the direction of the longer tail.
    4. Whiskers extending from the quartiles to the largest and smallest observations in the sample, unless some values are far enough from the box to be classified as outside points, in which case the whiskers extend to the most extreme points that are not classified as outside.
    STATGRAPHICS Centurion follows Tukey in flagging two types of unusual points:
    a. Far outside points: points more than 3 times the interquartile range above or below the limits of the box. (Note: the interquartile range is the distance between the quartiles, which is equal to the width of the box.) Far outside points are denoted by a point symbol (usually a small square) with a plus sign superimposed on it. If the data come from a normal distribution, the chance that any point will be far enough away from the box to be classified as far outside is only about 1 in 300 in a sample of the current size. Unless t
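Tukey's flagging rule for a box-and-whisker plot is easy to express in code. The sketch below classifies values as far outside (more than 3 interquartile ranges beyond the box) or outside (more than 1.5 interquartile ranges, the conventional multiplier for ordinary outside points). It is an illustration only: the function name and sample values are invented, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may not match the quartile convention STATGRAPHICS uses.

```python
import statistics

def classify_points(data):
    """Return (outside, far_outside) lists using Tukey's 1.5 and 3 IQR rules."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    outside, far_outside = [], []
    for x in data:
        # Distance beyond the nearer edge of the box (negative if inside it).
        distance = max(q1 - x, x - q3)
        if distance > 3.0 * iqr:
            far_outside.append(x)
        elif distance > 1.5 * iqr:
            outside.append(x)
    return outside, far_outside

# Made-up values: quartiles are 3.5 and 8.5, so the interquartile range is 5.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
outside, far_outside = classify_points(values)
```

In this example 20 lies 11.5 units above the box (between 1.5 and 3 interquartile ranges), so it is an outside point, while 40 lies 31.5 units above the box and is far outside.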
145. ing
    Kolmogorov-Smirnov Test for Temperature
    Estimated overall statistic DN = 0.242548
    Two-sided large sample K-S statistic = 1.37737
    Approximate P-value = 0.0449985
    Figure 11.10: Output from Kolmogorov-Smirnov Test
    The maximum vertical distance, denoted by DN, equals approximately 0.24 for the body temperature data. The P-value is used to determine whether or not the distributions are significantly different from each other. A small P-value leads to the conclusion that there is a significant difference. Since the P-value for the sample data is less than 0.05, there is a significant difference between the male and female distributions at the 5% significance level.
    Warning: if data is heavily rounded, this test may not be reliable, since the empirical CDF may jump in large steps. When possible, it is best to rely on a comparison of selected distribution parameters, such as the mean, standard deviation, or median.
    11.10 Quantile-Quantile Plot
    A final plot, available by selecting Quantile-Quantile Plot from the Graphs dialog box, plots the estimated quantiles of one sample versus the quantiles of the other sample:
    [Figure 11.11: Q-Q Plot of Body Temperature Data, with Gender = Female on one axis and Gender = Male on the other]
    There is a point on this graph corresponding to each observation in the smaller of the two samples. Plotted on the other axis is the est
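For the Kolmogorov-Smirnov output in Figure 11.10, the quantities involved can be sketched as follows. The code is an illustration, not the program's implementation: the function names are hypothetical, the P-value uses the standard asymptotic Kolmogorov series (an assumption about how the "approximate" P-value is obtained, though it reproduces the reported figure closely), and the series is only meant for moderate to large statistics.

```python
import math

def two_sample_dn(a, b):
    """Maximum vertical distance between the two empirical CDFs."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    pooled = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in pooled)

def ks_large_sample_statistic(dn, n1, n2):
    """Scale DN by the effective sample size, sqrt(n1*n2/(n1+n2))."""
    return dn * math.sqrt(n1 * n2 / (n1 + n2))

def ks_asymptotic_p_value(ks, terms=100):
    """Two-sided P-value from the asymptotic Kolmogorov distribution."""
    if ks <= 0:
        return 1.0
    # Alternating series: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 ks^2)
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * ks * ks)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))

# Large-sample statistic as reported in Figure 11.10.
p_value = ks_asymptotic_p_value(1.37737)

# Tiny made-up samples with no overlap: the empirical CDFs differ by 1.
dn_example = two_sample_dn([1, 2, 3], [4, 5, 6])
```

Because the empirical CDF only changes at observed values, heavily rounded data produce few, large jumps, which is why the text cautions against relying on this test in that situation.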
146. The important entry in the above table is the P-value. Since the P-value is small (less than 0.05), the hypothesis of equal medians is rejected.
    Pairs of medians can also be compared by selecting Box-and-Whisker Plot from the Graphs dialog box and using Pane Options to add median notches:
    [Figure 12.12: Box-and-Whisker Plots with Median Notches]
    The range covered by each notch shows the uncertainty associated with the estimate of that group's median. The notches are scaled in such a way that any two samples whose notches do not overlap can be declared to have significantly different medians at the default system significance level (usually 5%). In the above plot, the notches for samples B, C, and D all overlap, but the median for sample A is significantly higher than that of the other 3 samples.
    Note: the folding-back behavior observed in Figure 12.12 occurs when a notch extends beyond the edge of the box.
    12.5 Comparing Standard Deviations
    It is also possible to test the hypothesis of equal standard deviations:
    Null hypothesis: σA = σB = σC = σD
    Alternative hypothesis: the standard deviations are not all equal
    This is done by selecting Variance Check from the Tables dialog box:
    Variance Check
    Test: 0.143286   P-Value: 0.933432
    Figure 12.13: Comparison of Sample Variances
    O
147. ion. The estimated percent out of spec is now only 0.23 percent, or 2,256 DPM, one tenth of what it was using the normal distribution. In this case, incorrectly assuming a normal distribution makes the process appear much worse than it really is.
    Note: depending on the specification limits and the true distribution, incorrectly assuming normality may make the process appear significantly worse or significantly better than when using the proper distribution.
    An alternative to selecting a different distribution is to transform the data. The Analysis Options dialog box gives several choices for selecting a Data Transformation:
    [Process Capability Analysis Options dialog box, listing the available distributions together with the Data Transformation choices]
148. ions
[Figure 16.22: Pane Options for Response Plots. The Response Plot Options dialog offers the plot types Surface, Contour, Square and Cube; the surface styles Wire Frame, Solid and Contoured, with a Contours Below checkbox; the contour styles Lines, Painted Regions, Continuous and Continuous with Grid; and a Factors button.]
The types of plots that may be created are:
1. Surface: plots the fitted equation as a 3-D surface with respect to any 2 experimental factors. The surface may be a wire frame, a solid color, or show contour levels for the response. Contours Below includes contours in the bottom face of the plot.
2. Contour: creates a 2-D contour plot with respect to any 2 experimental factors. Contours may be shown as lines (as on a topographical map), as painted regions, or using a continuous color ramp.
3. Square: plots the experimental region for any 2 experimental factors and displays the predicted response at each corner of the square.
4. Cube: plots the experimental region for any 3 experimental factors and displays the predicted response at each corner of the cube. To create this plot, you must first press the Factors button and select a third factor.
The Factors button is used to select the factors that define the axes of the plots and the values
149. is Temperature
[Figure 1.21: One Variable Analysis Window. The screenshot shows a One Variable Analysis of Temperature (130 values ranging from 96.3 to 100.8) with a StatAdvisor pane, a scatterplot, a summary statistics table and a box-and-whisker plot.]
The window contains 4 panes, divided by movable splitter bars. The two panes on the left display tabular output, while the two panes on the right display graphical output. If you double-click in the bottom left pane, the table of summary statistics will be maximized.
[Screenshot: One Variable Analysis window with the summary statistics pane maximized]
The StatAdvisor: This table shows summary statistics for Per Capita Income. It includes measures of central tendency, measures of variability, and measures of shape. Of particular interest here are the standardized skewness and standardized kurtosis, which can be used to determine whether the sample comes from a normal distribution. Values of these statistics outside the range of -2 to +2 indicate significant departures from normality, which would tend to invalidate any statistical test regarding the standard deviation. In this case, the standardized skewness value is within the range expec
150. is, it is important to remove insignificant effects. To remove effects:
1. Press the Analysis Options button on the analysis toolbar.
2. Press the Exclude button on the Analysis Options dialog box.
3. On the Exclude Effects Options dialog box, double-click on any effect you wish to exclude, which moves it from the Include column to the Exclude column.
[Figure 16.18: Dialog Box for Excluding Effects]
The rule to follow in excluding effects is:
1. Exclude any insignificant two-factor interactions.
2. Exclude any insignificant main effects that are not involved in significant interactions.
In this case, that means removing everything that was not significant on the Pareto chart except for the main effect of B. That main effect is retained because it is involved in a significant interaction with factor A. Once the effects are removed, the Pareto chart should appear as shown below.
[Figure 16.19: Standardized Pareto Chart after Removing Effects. Standardized Pareto Chart for yield; effects shown: A:temperature, AB, C:concentration, E:catalyst, B:flow rate; horizontal axis: standardized effect]
Except for the main effect of factor B, all of the remaining effects are statistically significant. The final model may be viewed by selecting Regression Coefficien
151. is also of considerable interest, since it shows the cumulative probability that an individual will fall in a selected class or earlier classes. For example, 89.92% of all data values are equal to or less than 99.0 degrees.
10.6 Quantile Plot and Percentiles
Another way to display cumulative probabilities is by selecting Quantile Plot from the list of Graphs in the One Variable Analysis procedure.
[Figure 10.18: Quantile Plot. One Variable Analysis of Temperature (Temperature <= 100); horizontal axis: Temperature]
In this plot, the data are first sorted from smallest to largest. The j-th largest data value (that is, the j-th value in the sorted order) is then plotted at $Y = (j - 0.5)/n$. This estimates the proportion of the population at or below the observed temperature. Like the rightmost column in the frequency table, the curve represents the cumulative probability of an individual having a temperature less than or equal to that shown on the horizontal axis. Since the temperature data were only measured to the nearest 0.1 degrees, there are vertical jumps in the above display.
Figure 10.18 also shows a set of crosshair cursors. These are created by pressing the alternate mouse button while viewing the graph and selecting Locate from the popup menu. You can then use your mouse to drag the crosshairs to any location. The small numbers near the crosshairs indicate their position. In the above plot, the crosshairs have been used to
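The plotting positions used by the quantile plot can be sketched in a few lines of Python. This is an illustrative sketch only, assuming the positions are (j - 0.5)/n as read from the text above; the function name and the sample values are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with its estimated cumulative proportion.

    The j-th value in ascending order (j = 1..n) is paired with
    (j - 0.5) / n, as described in the text.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (j - 0.5) / n) for j, x in enumerate(ordered, start=1)]


# Hypothetical temperatures, not taken from the manual's data file.
points = quantile_plot_points([98.6, 97.1, 99.0, 98.2])
```

With four observations the positions are 0.125, 0.375, 0.625 and 0.875, so no point is ever plotted at exactly 0 or 1. Repeated values in the data share the same X coordinate, which is what produces vertical jumps in the plot.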
152. isplayed with information from the StatWizard.
[Figure 1.9: Initial StatWizard Dialog Box. The dialog reads "Welcome to the StatWizard. The StatWizard can help you select the appropriate STATGRAPHICS analysis for collecting and analyzing your data. What task do you want to perform?" The options include: Enter New Data or Import It from an External Source; Design a New Experiment, Gage Study, Control Chart or Sampling Plan; Perform an Analysis that Does Not Require Data. A checkbox is labeled Show the StatWizard at Startup.]
The StatWizard is designed to help new users quickly create a data file and begin analyzing its contents. You may follow the instructions of the StatWizard or click on Cancel to suppress the StatWizard. If you don't want the StatWizard to appear each time you start STATGRAPHICS Centurion, uncheck Show the StatWizard at Startup before you leave this dialog box. The sections that follow use the StatWizard to create a data file containing data from the 2000 United States Census.
1.3 Entering Data
In order to analyze data in STATGRAPHICS Centurion, it must be placed into the STATGRAPHICS DataBook. The DataBook consists of 10 datasheets, indicated by the letters A through J, each containing a re
153. ity from Grouped Data; Analyze Gage Measurement Errors; Forecast Time Series Data Automatically.
Figure 8.11: Using the StatWizard Search Option
If you select an item from the list, all analyses that calculate the selected item will be displayed in the Select Analysis by Name field.
[StatWizard dialog (screenshot). The dialog reads "To analyze your data, you may select analyses based on the type of data, choose an analysis by name, select a SnapStat or Quick Pick, or search for analyses that calculate a statistic of interest." The Search list shows entries such as 2D control chart, 3RSR smoother, Accelerated life tests, Acceptance chart and ACF plot. The Quick Picks shown include Compare Several Data Columns, Fit a Line Relating Y to X, Compare 2 or More Regression Lines, Calculate Correlations and Display a Matrix Plot, Fit a Multiple Regression Model, Fit a Regression Model for Counts, Construct Control Charts for Individual Measurements, Construct Control Charts for Subgroup Data, Perform a Process Capability Analysis for Variables, Perform a Process Capability Analysis for Attributes, Summarize a Time Series, Forecast a Time Series, Set up a Gage R&R Study and Design an Experiment.]
Select A
154. ix options assume that you wish to create only a single analysis. The last option will create multiple analysis windows, one for each unique value contained in the indicated column. This is an easy way to specify a "BY" variable for a set of analyses.
You will next be asked whether you wish to transform any of the indicated variables. If you reply affirmatively, the following dialog box will be displayed.
[Figure 8.9: StatWizard Variable Transformation Dialog Box. The dialog reads "It is sometimes necessary to transform data to achieve approximate normality. Specify the transformation you would like to apply to each dependent variable." For MPG City, the transformation choices include None, Square Root, Natural Log, Base 10 Log, Reciprocal, Power and Arc Sine Square Root.]
You may select a transformation for one or more variables. If a transformation is requested, the appropriate expression will be created. For example, requesting a square root for MPG City would create the expression SQRT(MPG City) for use by the analysis procedures.
A final dialog box will then be displayed, listing all analyses appropriate for the type of data you have specified.
[StatWizard Analysis Selection dialog (screenshot)] You have selected a single numeric dependent variable. Now select the analyses you want to perform
155. linearity of the actual relationship between MPG City and Weight.
13.3 Fitting a Nonlinear Model
The Simple Regression procedure includes the ability to fit a wide variety of nonlinear models. To assess the relative improvement that various models could make, select Comparison of Alternative Models from the Tables dialog box. This will fit all of the possible models and list them in decreasing order of R-squared.
[Figure 13.9: Alternative Nonlinear Models. The Comparison of Alternative Models table lists each model with its R-squared. The recoverable R-squared values are 80.65%, 80.44%, 79.54%, 79.14%, 79.00%, 78.83%, 78.35%, 78.03%, 77.16%, 75.78%, 75.14%, 74.15% and 73.56%; the Logistic and Log probit models show "no fit".]
The models at the top of the list explain the largest percentage of the variation in the response variable. R-squared is only one criterion that can be used to help pick a model. Models with somewhat lower R-squared values than the model at the top of the list may be preferable if they make more sense in the context of the data.
In the current example, an attractive model near the top of the list is the Reciprocal-Y model. This model takes the form
$$\frac{1}{\text{MPG City}} = \beta_0 + \beta_1 \, \text{Weight}$$
In it, the reciprocal of miles per gallon (gallons per mile) is expressed as a linear function of weight. It is not uncommon that transformations of Y, X, or both may lead to better models. To fit a Reciprocal-Y model, press the Analysis Options b
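To make the Reciprocal-Y model concrete, the following Python sketch fits 1/Y = b0 + b1·X by ordinary least squares on the transformed response. It is a minimal illustration under stated assumptions, not the STATGRAPHICS procedure: the function names are hypothetical, and any data passed to it would be your own.

```python
def fit_reciprocal_y(x, y):
    """Least-squares fit of 1/y = b0 + b1 * x. Returns (b0, b1)."""
    z = [1.0 / v for v in y]  # transformed response, e.g. gallons per mile
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    b0 = mean_z - b1 * mean_x
    return b0, b1


def predict_reciprocal_y(b0, b1, x):
    """Back-transform the fitted line to the original Y scale."""
    return 1.0 / (b0 + b1 * x)
```

Because the fit is done on the 1/Y scale, the least-squares criterion minimizes errors in gallons per mile rather than in miles per gallon, which is one reason R-squared values from different transformations should be compared with some care.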
156. loodpressure, breaking, checksheet, aircraft, boards, bspline, chemical, arima, charts, Bodyfat, candidates, circuits, Arrhenius, bodytemp, cans, cities, baseball, bottles, capacitors, cloth [file list from the Open Data File dialog; Files of type: STATGRAPHICS Files]
Figure 2.5: Selecting a STATGRAPHICS Data File
You can read data files from STATGRAPHICS Centurion or any previous version of STATGRAPHICS, including STATGRAPHICS Plus. The data in the file will replace the contents of the currently selected datasheet.
2.2.2 Reading Data from an Excel, ASCII, XML or Other External Data File
To read data that has been saved in a data file created by another application, select any of the 10 datasheets in the DataBook by clicking on its tab. Then select File - Open - Open Data Source and specify External Data File on the dialog box shown below.
[Figure 2.6: Open Data Source Dialog Box. Data Source choices: STATGRAPHICS Data File, External Data File, ODBC Query, Clipboard]
After pressing OK, select the desired file.
[Figure 2.7: Selecting an Excel Data File. File name: Process data; Files of type: Excel Files (xls)]
Use the Files of type field to specify the type of file to be read. The most common sele
157. ltiplicative [Simple Regression Options dialog (screenshot). The list of model types includes Reciprocal-Y, S-Curve, Double Reciprocal, Double Squared, Logistic and Log Probit, among others. Alternative Fit choices: None (least squares only), Minimize absolute deviations, Use medians of 3 groups.]
Figure 3.11: Simple Regression Analysis Options Dialog Box
If you examine the output in Figure 3.7, it may be noted from the table of alternative models that several curvilinear models give a higher R-squared value than the linear model. At the top of the list is the S-Curve model. If this model is selected on the Analysis Options dialog box and the OK button is pressed, the entire analysis will change to reflect the new model. As may be seen by examining the plot of the fitted model, an S-Curve captures the curvature in the data quite well.
[Figure 3.12: Fitted S-Curve Model. Plot of Fitted Model: MPG City = exp(2.1328 + 2799.07/Weight); horizontal axis: Weight]
3.2.6 Pane Options Button
In addition to options that apply to the entire analysis window, many individual tables and graphs have options that apply only to them. These options are accessible by first maximizing the selected table or graph and then pressing Pane Options. For a Fitted Model Plot, the pane options are:
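The S-Curve model has the form Y = exp(a + b/X), so it becomes a straight line when ln(Y) is regressed on 1/X. The Python sketch below illustrates that linearization. It is an assumption-laden illustration rather than the program's own algorithm; the function names are hypothetical, and the coefficients in the usage note are the ones read from the fitted-model plot title, with the plus sign inferred.

```python
import math


def s_curve(x, a, b):
    """Evaluate the S-Curve model y = exp(a + b / x)."""
    return math.exp(a + b / x)


def fit_s_curve(x, y):
    """Fit y = exp(a + b/x) by least squares on ln(y) versus 1/x."""
    u = [1.0 / xi for xi in x]      # transformed predictor
    v = [math.log(yi) for yi in y]  # transformed response
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    b = suv / suu
    a = mean_v - b * mean_u
    return a, b
```

For instance, `s_curve(3000, 2.1328, 2799.07)` evaluates the curve shown in the figure at a weight of 3,000 pounds. As X grows, b/X shrinks toward zero, so the curve flattens toward exp(a) instead of continuing downward the way a straight line would.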
158. ly been entered into a file, you can read it into the datasheet by selecting File - Open - Open Data Source. This allows you to read data stored in various file formats, including Excel (XLS) files, delimited ASCII text files, XML files, and STATGRAPHICS files.
2. Copy and paste using the Windows clipboard. If you have the data loaded into a program such as Excel, you can easily copy it to the Windows clipboard and then paste it into STATGRAPHICS Centurion by selecting Edit - Paste.
3. Issue a SQL query to retrieve it from a database. If the data resides in an ODBC-compatible database such as Oracle or Microsoft Access, it can be retrieved by selecting File - Open - Open Data Source and then selecting ODBC Query.
2.2.1 Reading Data from a STATGRAPHICS Centurion Data File
To read data that has already been saved in a STATGRAPHICS Centurion data file, select any of the 10 datasheets in the DataBook by clicking on its tab. Then select File - Open - Open Data Source and specify STATGRAPHICS Data File on the dialog box shown below.
[Figure 2.4: Open Data Source Dialog Box. Data Source choices: STATGRAPHICS Data File, External Data File, ODBC Query, Clipboard]
After pressing OK, select the desired STATGRAPHICS file.
Open Data File dialog (screenshot), listing sample files: 93cars, beetles, breadwrapper, census2000, absorbers, b
159. mended that you explore the tutorials, since they will give you a good idea of how STATGRAPHICS Centurion is best used when analyzing actual data.
NOTE: a copy of this manual in PDF format is included with the program and may be accessed from the Help menu. In the PDF document, all of the graphs are in color. The data files and StatFolios referenced in the manual are also shipped with the program.
StatPoint, Inc., July 2005
Chapter 1: Getting Started
Installing STATGRAPHICS Centurion XV, launching the program, and creating a simple data file.
1.1 Installation
STATGRAPHICS Centurion is distributed in two ways: over the Internet in a single file that is downloaded to your computer, and as a set of files on a CD-ROM. To run the program, it must first be installed on your hard disk. As with most Windows programs, installation is extremely simple.
Step 1: If you received the program on a CD, insert the CD into your CD-ROM drive. After a few moments, the setup program should begin automatically. If it does not, open Windows Explorer and execute the file setup.exe in the root directory on the CD-ROM. If you downloaded the program over the Internet, locate the file that you downloaded and double-click on it to begin the installation process.
Step 2: A number of dialog boxes will then be displayed. The first dialog box welcomes you to STATGRAPHICS Centurion. Just press the Next button.
Step 3: The second d
160. mparison Dialog Box
The Input field indicates how the data for the two samples have been entered:
1. Two Data Columns: the data for each sample is in a different column.
2. Data and Code Columns: the data for both samples is in the same column, and a second column contains codes that differentiate between the two samples.
The bodytemp file has the second type of structure, with all n = 130 observations in one column named Temperature, while a second column named Gender contains the label "Female" or "Male". In the Select field, an entry has been made to select only rows for which Temperature is less than or equal to 100. This will exclude row 15 from the analysis, which was determined in Chapter 10 to be an outlier.
The initial analysis window contains four panes with a summary of the data, a dual histogram, summary statistics by group, and a dual box-and-whisker plot.
[Two Sample Comparison analysis window (screenshot): Temperature & Gender, with selection Temperature <= 100. Sample 1: Gender=Female, 64 values ranging from 96.4 to 100.0. Sample 2: Gender=Male, 65 values ranging from 96.3 to 99.5. The StatAdvisor text describing the procedure is clipped in the screenshot.]
161. n Analysis Options Dialog Box
Two stepwise options are provided:
1. Forward Selection: starts with a model containing only a constant and brings variables in one at a time if they improve the fit significantly.
2. Backward Selection: starts with a model containing all of the variables and removes them one at a time until all remaining variables are statistically significant.
In both methods, removed variables may be reentered at a later step if they later appear to be useful predictors, and variables entered early may later be removed if they are no longer significant.
Performing a backward selection results in the following model:
Multiple Regression - 1/MPG City
Dependent variable: 1/MPG City
Parameter    Estimate       Standard Error   T Statistic   P-Value
CONSTANT     0.0034427      0.00243602       1.41325       0.1610
Horsepower   0.0000260839   0.0000124356     2.09752       0.0388
Weight       0.0000129513   0.0000011041     11.7302       0.0000
Analysis of Variance
Source          Sum of Squares   Df   Mean Square
Residual        0.00159524       90   0.0000177249
Total (Corr.)   0.00855567       92
R-squared = 81.3546 percent
R-squared (adjusted for d.f.) = 80.9403 percent
Standard Error of Est. = 0.00421009
Mean absolute error = 0.00313061
Durbin-Watson statistic = 1.62892 (P=0.0338)
Lag 1 residual autocorrelation = 0.184113
The StatAdvisor: The output shows the results of fitting a multiple linear regression model to describe the relationship between 1/MPG
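The summary statistics in the output above are related by simple formulas, which the following Python sketch reproduces from the printed sums of squares. It is a consistency check written for illustration, not code from STATGRAPHICS; the function names are hypothetical, and n = 93 and p = 2 are inferred from the degrees of freedom shown (92 total, 90 residual).

```python
import math


def r_squared(ss_residual, ss_total):
    """Proportion of variation explained by the model."""
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(r2, n, p):
    """R-squared adjusted for p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


ss_residual, ss_total = 0.00159524, 0.00855567
r2 = r_squared(ss_residual, ss_total)
adj = adjusted_r_squared(r2, n=93, p=2)
std_error_of_est = math.sqrt(ss_residual / 90)  # sqrt of residual mean square

# Each t statistic is the estimate divided by its standard error.
t_horsepower = 0.0000260839 / 0.0000124356
t_weight = 0.0000129513 / 0.0000011041
```

Run as written, these expressions agree with the printed R-squared, adjusted R-squared, standard error of estimate and t statistics to within the rounding of the inputs.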
162. n and select Delete Item from the popup menu.
6.5 Printing the StatGallery
To print the items in the StatGallery:
1. Activate the StatGallery window by clicking on it with your mouse.
2. Press the Print icon on the main toolbar, or press the alternate mouse button and select Print from the popup menu.
You may print all of the pages or a selected set of pages.
Chapter 7: Using the StatReporter
Copying analyses to the StatReporter, annotating the output, and saving the results in an RTF file for import into Microsoft Word.
The StatReporter is a window in which output from different statistical procedures can be integrated into a formal report. It is a standalone version of WordPad running within STATGRAPHICS Centurion. The StatReporter allows you to:
1. Create a complete report within STATGRAPHICS without the necessity of using another application. This can be very useful where resources are limited, as on a production floor.
2. Save the contents of the StatReporter in an RTF (Rich Text Format) file, which can be read directly into programs such as Microsoft Word.
7.1 The StatReporter Window
The StatReporter consists of a separate window within STATGRAPHICS Centurion, created automatically when the program is loaded. It consists of a single rich edit control together with a toolbar.
StatReporter window (screenshot): Times Ne
163. n red any points with values equal to that entered in the Locate field; used in conjunction with the Identify button.
Locate by row: highlights in red any points corresponding to the row number entered in the Row field.
Each of these buttons is described in detail in Chapter 4.
3.2.8 Exclude Button
Some statistical procedures allow you to interactively remove suspected outliers from an analysis by maximizing a graph, clicking on the suspect point, and pressing this button. For example, the plot in Figure 3.14 shows one point that is well outside the prediction limits. Clicking on that point and pressing the Exclude button causes the model to be refit without the point. The fitted model plot shows the new model, indicating which point or points have been removed with an X.
[Figure 3.15: Fitted S-Curve Model after Excluding a Suspected Outlier. Simple Regression - MPG City vs. Weight; Plot of Fitted Model: MPG City = exp(2.15435 + 2721.98/Weight); horizontal axis: Weight, from 1600 to 4600]
All of the other tables and graphs in the analysis window will also change to reflect the new model. Multiple points may be excluded from a model by clicking on them one at a time and pressing the Exclude button. Clicking on a point that has been removed will put it back into the model.
3.3 Printing the Results
To print the results of a statistical analysis, two options are available
164. nalysis Based on Type of Data
[StatWizard dialog (screenshot). The Search field contains "Kruskal-Wallis test", and the Select Analysis by Name list shows Analyze Experiment, Multiple-Sample Comparison and One-Way ANOVA. The rest of the dialog lists the Quick Picks and SnapStats.]
Figure 8.12: List of All Analyses Matching the Search Option
To run a selected analysis:
1. Click on the Select Analysis by Name radio button.
2. Highlig
165. nce of Boolean 0's and 1's, where 0 represents FALSE and 1 represents TRUE. When used in the Select field of a data input dialog box, the result is the selection of all rows for which the condition is TRUE and the exclusion of all rows for which the condition is FALSE.
3.2 Analysis Windows
Once the data have been specified, a new analysis window is created.
[Figure 3.4: Simple Regression Analysis Window. Simple Regression - MPG City vs. Weight. Dependent variable: MPG City (miles per gallon in city driving). Independent variable: Weight (pounds). Linear model: Y = a + b*X. Fitted model: MPG City = 47.0484 - 0.00803239*Weight. Correlation Coefficient = -0.843139; R-squared = 71.0883 percent; R-squared (adjusted for d.f.) = 70.7705 percent; Standard Error of Est. = 3.03831; Mean absolute error = 1.99274; Durbin-Watson statistic = 1.64586 (P=0.0405); Lag 1 residual autocorrelation = 0.176433.]
The window is a "splitter window", with multiple panes divided by movable splitter bars. Tables are located along the left side of the window, while graphs are located along the right.
You can maximize the table or graph in any pane by double-clicking on it, in which case it will
166. nd inclusive. Example: ROWS(21,70)
RANDOM(k): Selects a random set of k rows. Example: RANDOM(50)
column < value: Selects only rows for which column is less than value. Example: Passengers<5
column <= value: Selects only rows for which column is less than or equal to value. Example: Passengers<=5
column > value: Selects only rows for which column is greater than value. Example: Passengers>5
column >= value: Selects only rows for which column is greater than or equal to value. Example: Passengers>=5
column = value: Selects only rows for which column equals value. Example: Cylinders=6
column <> value: Selects only rows for which column does not equal value. Example: Cylinders<>4
condition1 & condition2: Selects only rows that meet both conditions. Example: Cylinders=6 & Make="Ford"
condition1 | condition2: Selects only rows that meet at least one of the conditions. Example: Cylinders=6 | Make="Ford"
binarycolumn: Selects only rows for which the value in binarycolumn does not equal 0. Example: Domestic
Figure 3.3: Allowable Entries for the Select field
When specifying a condition involving a non-numeric variable, value must be enclosed in double quotes and is case sensitive. Multiple conditions may be combined using the conditional AND (&) and OR (|) symbols.
Each of the allowable entries in the Select field actually generates a seque
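The way a Select entry turns into a sequence of 0's and 1's can be mimicked in Python. The sketch below only illustrates the idea: the rows, column values and function names are hypothetical, and the STATGRAPHICS expression syntax is represented here by ordinary Python functions.

```python
# Hypothetical rows standing in for a datasheet.
rows = [
    {"Make": "Ford", "Cylinders": 6, "Passengers": 5},
    {"Make": "Honda", "Cylinders": 4, "Passengers": 5},
    {"Make": "Ford", "Cylinders": 4, "Passengers": 2},
    {"Make": "ford", "Cylinders": 6, "Passengers": 7},
]


def select_mask(rows, condition):
    """Evaluate a condition on every row, giving 1 for TRUE and 0 for FALSE."""
    return [1 if condition(row) else 0 for row in rows]


def apply_mask(rows, mask):
    """Keep only the rows whose mask entry is nonzero."""
    return [row for row, flag in zip(rows, mask) if flag != 0]


# Counterpart of Cylinders=6 & Make="Ford" (string comparison is case sensitive).
both = select_mask(rows, lambda r: r["Cylinders"] == 6 and r["Make"] == "Ford")
# Counterpart of Cylinders=6 | Make="Ford".
either = select_mask(rows, lambda r: r["Cylinders"] == 6 or r["Make"] == "Ford")
selected = apply_mask(rows, both)
```

Here `both` is `[1, 0, 0, 0]` and `either` is `[1, 0, 1, 1]`. The last row fails the AND condition because "ford" does not match "Ford", which mirrors the case sensitivity described above.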
167. ne of four tests will be displayed, depending on the settings for Pane Options. Three of the available tests, including Levene's test, display P-Values. A P-Value less than 0.05 leads to rejection of the hypothesis of equal sigmas at the 5% significance level. In this case, the standard deviations are not significantly different from one another, since the P-Value is well above 0.05.
In summary, it appears that the mean strength is different for different materials. However, the variability amongst widgets made of the same material is about the same across all four materials.
12.6 Residual Plots
Whenever a statistical model is fit to data, it is important to examine the residuals from the fitted model. In this analysis, there is a residual corresponding to each of the n = 48 widgets, defined as the difference between a widget's strength and the average strength of all widgets made of that same material.
The Graphs dialog box contains an entry for automatically generating plots of the residuals. Depending on the selection in Pane Options, you may plot the residuals by group, versus predicted values, or in row order as found in the datasheet. The plot below shows the residuals plotted versus predicted strength.
[Figure 12.14: Plot of Residuals Versus Predicted Strength. Residual Plot; axes: residual versus predicted value]
In these types of plots, y
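The residual definition given above (an observation minus the average of its own group) can be written out directly. The Python sketch below uses hypothetical strengths and material labels, not the manual's widget data, and the function name is an illustration only.

```python
from statistics import mean


def group_residuals(values, groups):
    """Return each value minus the mean of the group it belongs to."""
    group_means = {
        g: mean(v for v, label in zip(values, groups) if label == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical strengths for two materials.
strength = [60, 62, 64, 58, 59, 60]
material = ["A", "A", "A", "B", "B", "B"]
residuals = group_residuals(strength, material)
```

For these values the residuals are -2, 0, 2, -1, 0 and 1. In this kind of model the predicted value for each observation is simply its group mean, so a plot of residuals versus predicted values shows one vertical column of points per group.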
168. ngth (inches), X4 = Weight (pounds), X5 = Wheelbase (inches), X6 = Width (inches). Pressing OK displays the analysis window.
[Figure 13.2: Multiple Variable Analysis Window. Data variables: MPG City (miles per gallon in city driving), Engine Size (liters), Horsepower (maximum), Length (inches), Weight (pounds), Wheelbase (inches), Width (inches)]
The upper left pane lists the input variables, while the center left pane displays summary statistics. There are a total of 93 rows in the data file that have complete information on all of the variables to be analyzed. The matrix plot on the right displays X-Y plots for each pair of variables.
[Figure 13.3: Matrix Plot with Added Smooth]
To interpret the plot, find a variable's label, such as MPG City. The indicated variable is displayed on the vertical axis of every plot in that row and on the horizontal axis of every plot in that column. Each pair of variables is thus shown twice, once above the diagonal and once below it. Robust LOWESS smoothers have been added in the above figure by maximizing the pane and selecting the Smooth Rot
169. nt
[Generate Data dialog. Expression: REP(COUNT(1,4,1),3)]
Figure 2.27: Generating Blend Numbers
The Generate Data option evaluates a STATGRAPHICS Centurion expression and places the result into the selected column. In the expression shown above, two important operators are used:
COUNT(from,to,by): generates values beginning at from and ending at to, at intervals equal to by. COUNT(1,4,1) thus generates the integers 1, 2, 3 and 4.
REP(X,repetitions): repeats each value in X repetitions times, in groups. In this case, each integer between 1 and 4 is repeated 3 times.
The treatment numbers can be generated in a similar manner by clicking on the column 2 header, selecting Generate Data from the Edit menu, and entering the following:
[Generate Data dialog. Expression: RESHAPE(COUNT(1,3,1),12)]
Figure 2.28: Generating Treatment Numbers
This expression uses an additional operator:
RESHAPE(X,size): repeats the values in X in a circular fashion until size values have been generated. In this case, the sequence 1, 2, 3 is repeated 4 times.
These pattern generators can be helpful when the data file to be created is large.
2.4.2 Generating Random Numbers
Random numbers may be generated in STATGRAPHICS Centurion in two ways:
1. If the numbers come from an e
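For readers who want to check a pattern before typing it into Generate Data, the three operators can be imitated in Python. These functions are hypothetical stand-ins written from the descriptions above; they are not STATGRAPHICS code, and they use lowercase names to make that clear.

```python
def count(start, stop, step):
    """Imitates COUNT(from,to,by): values from start to stop in steps of step."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value += step
    return values


def rep(x, repetitions):
    """Imitates REP(X,repetitions): each value repeated in consecutive groups."""
    return [value for value in x for _ in range(repetitions)]


def reshape(x, size):
    """Imitates RESHAPE(X,size): cycle through x until size values exist."""
    return [x[i % len(x)] for i in range(size)]


blend = rep(count(1, 4, 1), 3)           # 1,1,1,2,2,2,3,3,3,4,4,4
treatment = reshape(count(1, 3, 1), 12)  # 1,2,3,1,2,3,1,2,3,1,2,3
```

Pairing `blend` with `treatment` row by row gives every combination of 4 blends and 3 treatments exactly once, which is the layout the two expressions in the text are building.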
170. o samples
[Summary statistics pane of the analysis window. Recoverable values: Median 98.4 (Gender=Female) and 98.1 (Gender=Male); Maximum 100.0 (Gender=Female) and 99.5 (Gender=Male)]
Figure 11.2: Two Sample Comparison Analysis Window
After removing the outlier, there are n = 64 observations for females, ranging from 96.4 to 100.0 degrees, and n = 65 observations for males, ranging from 96.3 degrees to 99.5 degrees.
11.2 Summary Statistics
The Summary Statistics table shows statistics calculated for each sample.
[Figure 11.3: Summary Statistics by Sample. Summary Statistics for Temperature, with columns Gender=Female and Gender=Male]
Several facts are of particular interest:
1. The average temperature of the females is about 0.25 degrees higher than that of the males. The difference between the medians is 0.30 degrees.
2. The standard deviation of the females is slightly less than that of the males, indicating that the body temperatures of the females may be less variable than those of the males.
3. Both samples have standardized skewness and standardized kurtosis values within the range of -2 to +2. As explained in Chapter 10, values within that range are consistent with the hypothesis that the data come from normal distributions.
Whether or not the apparent difference between the females and the males is statistically significant remains to be determined.
11.3 Dual Histogr
171. o summarize a single sample of data.
[One Variable Analysis window for Strength (screenshot). Recoverable values: Minimum = 191.3, Maximum = 229.5. The panes show the StatAdvisor, the summary statistics for Strength, and a box-and-whisker plot.]
Figure 15.1: One Variable Analysis Window
Several interesting factors are immediately evident:
1. The data are all within the specification limits, but just barely, ranging from 191.3 to 229.5.
2. The box-and-whisker plot shows a far outside point (a small square with a red plus sign drawn through it). Such points are often considered to be outliers, if the rest of the data appear to come from a normal distribution. In this case, however, even discounting that apparent outlier, the shape of the box is not very symmetric. The upper whisker is longer than the lower whisker, and the box extends farther above the median (the vertical line within the box) than it does below.
3. If you expand the Summary Statistics pane, you will see that the standardized skewness equals 4.94. If the data came from a normal distribution, this value should lie between
172. ocedures in STATGRAPHICS Centurion expect the data to be analyzed to be in a single column. Sometimes data is not arranged in such a format. As a simple example, suppose you have a sample of 12 observations arranged into 4 columns as follows:

Figure 2.21: Sample Data in Multiple Columns

To place this data in a single column, multiple copy and paste operations could be performed. A simpler solution is to use the Rowwise Statistics procedure, found under Describe if you are using the classic menu and under Analyze if you are using the Six Sigma menu. This procedure first presents a data input dialog box requesting the names of the columns containing the data.

Figure 2.22: Data Input Dialog Box for Rowwise Statistics

It then takes the data and displays statistics for each row.

[Rowwise Statistics output for data columns Col_1, Col_2, Col_3, Col_4 (3 rows): a table giving the count, average, standard deviation, minimum, and maximum for each row, plus a total line for all 12 observations, followed by the StatAdvisor pane]
173. Plant      Number of Defects   Items Produced
Texas      67                  6,237
Virginia   53                  7,343

Let $\theta_1$ be the proportion of defective items produced in Texas. Let $\theta_2$ be the proportion of defective items produced in Virginia. The estimated proportions are given by

$\hat{\theta}_1 = 67/6237 = 0.0107$ and $\hat{\theta}_2 = 53/7343 = 0.0072$

Based on this data, it appears that the percentage of defective items produced in Texas may be greater than the percentage of defective items produced in Virginia. To determine whether this apparent difference is statistically significant, create a datasheet as shown below:

Row   Attribute       Texas   Virginia
1     Defective       67      53
2     Not defective   6170    7290

Figure 14.18: Datasheet for Comparing Two Proportions

The rows hold counts of defective and non-defective items. Then select Contingency Tables from the same menu as Crosstabulation.

Figure 14.19: Contingency Tables Data Input Dialog Box

The analysis will display a chi-squared test of the 2-by-2 table:

Tests of Independence
Test          Statistic   P-Value
Chi-Squared   4.783       0.0287

Figure 14.20: Chi-Squared Test of the 2-by-2 Table

Recall that the chi squared test determines whether or not ro
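As a cross-check on the figures above, the following is a minimal Python sketch of the chi-squared test for a 2-by-2 table. It assumes the Pearson statistic without a continuity correction; the function name `chi_squared_2x2` is illustrative and is not part of STATGRAPHICS Centurion.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and P-value for the table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts from the datasheet: defective and non-defective items per plant.
p_texas = 67 / (67 + 6170)
p_virginia = 53 / (53 + 7290)
chi2, p_value = chi_squared_2x2(67, 53, 6170, 7290)

print(f"Texas: {p_texas:.4f}, Virginia: {p_virginia:.4f}")
print(f"Chi-squared = {chi2:.3f}, P-value = {p_value:.4f}")
```

Under these assumptions the sketch should agree closely with the chi-squared value and P-value reported in the Tests of Independence table.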
174. Comparison of Standard Deviations for Temperature

                     Gender=Female   Gender=Male
Standard deviation   0.684262        0.698756
Variance             0.468214        0.48826

Ratio of Variances = 0.958945

95.0% Confidence Intervals
Standard deviation of Gender=Female: [0.582853, 0.828723]
Standard deviation of Gender=Male: [0.595887, 0.844885]
Ratio of Variances: [0.584028, 1.57609]

F-test to Compare Standard Deviations
Null hypothesis: sigma1 = sigma2
Alt. hypothesis: sigma1 NE sigma2
F = 0.958945   P-value = 0.8684
Do not reject the null hypothesis for alpha = 0.05.

Figure 11.6: Two Sample Comparison of Standard Deviations

The most important output in this table is highlighted in red:

1. Ratio of Variances: displays a 95% confidence interval for the ratio of the variance of the population of females, $\sigma_1^2$, divided by the variance of the population of males, $\sigma_2^2$. Variance is a measure of variability calculated by squaring the standard deviation. (Note: comparisons of variability amongst more than one sample are usually based on variances rather than standard deviations, since they have more attractive mathematical properties.) The interval for $\sigma_1^2/\sigma_2^2$ ranges from 0.58 to 1.58. This indicates that the variance of the females may well be anywhere between approximately 58% and 158% of the variance of the males. This lack of precision is typical when trying to compare the variability of relatively small samples.

2. Th
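The arithmetic behind the ratio of variances can be sketched in a few lines of Python. The sketch only reproduces the point estimate and checks it against the interval quoted in the table; computing the interval itself requires F-distribution quantiles, which are not attempted here. Variable names are illustrative.

```python
# Sample standard deviations as reported in the table above.
sd_female = 0.684262
sd_male = 0.698756

# Variance is the square of the standard deviation.
var_female = sd_female ** 2
var_male = sd_male ** 2

# Point estimate of the ratio of variances (females over males).
ratio = var_female / var_male

# 95% confidence limits for the ratio, copied from the table.
lower, upper = 0.584028, 1.57609

# If the interval contains 1, equal variances remain plausible, which is
# consistent with the F-test not rejecting the null hypothesis.
contains_one = lower <= 1.0 <= upper

print(f"ratio of variances = {ratio:.6f}, interval contains 1: {contains_one}")
```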
175. on that data.

Figure 8.10: StatWizard Analysis Selection Dialog Box

Select one or more analyses from the list. When you press OK, an analysis window will be created for each selected analysis.

8.3 Searching for Desired Statistics or Tests

If you wish to calculate a particular statistic or test and are unsure which of the analyses calculates it, you may enter your data into a datasheet and then press the StatWizard button on the main toolbar. On the initial StatWizard dialog box, select Search and pull down the list. A list of all statistics, tests, and other calculations performed by STATGRAPHICS Centurion will be displayed.

[StatWizard dialog box offering the choices Select Analysis Based on Type of Data, Select Analysis by Name, and Select a SnapStat]
176. on that dialog box allows you to change the statistics calculated by default when the One-Variable Analysis is run, as well as several other procedures that display summary statistics.

Figure 10.6: Preferences Dialog Box Used to Select Default Statistics

10.3 Box-and-Whisker Plot

A useful graphical display for summarizing data, invented by John Tukey, is the box-and-whisker plot, displayed in the lower right corner of Figure 10.3 and enlarged below.

Figure 10.7: Box-and-Whisker Plot for Body Temperatures

The box-and-whisker plot is constructed by:

1. Drawing a box extending from the lower quartile to the upper quartile. The middle 50% of the data values are thus covered by the box.

2. Drawing a vertical l
177. ons.

Figure 10.15: Pane Options Dialog Box for Frequency Histogram

In setting the classes, the number of significant digits in the data should be considered. For example, body temperatures were measured only to the nearest 0.1 of a degree. The width of the intervals covered by the bars should thus be an integer multiple of 0.1. That way, each bar will cover the same number of possible measurements. The plot below shows 25 intervals between 96 and 101 degrees, each covering an interval of 0.2 degrees.

Figure 10.16: Frequency Histogram with Redefined Classes

With the greater number of classes, more detail is apparent. The general shape of the distribution is similar to that of a bell-shaped normal curve.

The data displayed in the histogram may be shown in tabular form by pressing the Tables button on the analysis toolbar and selecting Frequency Tabulation.

[Frequency Tabulation for Temperature: a table of lower and upper class limits with frequency, relative frequency, cumulative frequency, and cumulative relative frequency; the first class is "at or below 96.0"]
178. or of the graph's background and border. For example, changing the Background color to yellow and adding 3D Effects modifies the plot as shown below.

Figure 4.3: Plot after Modifying Background Color and Selecting 3D Effects

4.1.2 Grid Options

The Grid tab is used to add a grid to the plot.

Figure 4.4: Grid Tab on Graphics Options Dialog Box

Adding a gray dashed-line grid in Both directions produces the following graph.

Figure 4.5: Plot after Adding a Grid

4.1.3 Lines Options

The Lines tab is used to specify the type, color, and thickness of lines on a graph.
179. ou should look for:

1. Outliers: isolated residuals far away from all of the others. Such points would need further investigation to determine whether an assignable cause exists that explains their unusual behavior.

2. Heteroscedasticity: a systematic change in the variance as the predicted values increase or decrease. This condition typically results in a funnel-like appearance in the plot and might necessitate transforming the original observations by taking logarithms of the data before performing the analysis. Procedures such as the Multiple Range Tests will not work properly when the within-group variability differs significantly amongst the groups.

If desired, the residuals may be saved to a column of any datasheet by pressing the Save Results button on the analysis toolbar.

12.7 Analysis of Means Plot (ANOM)

A somewhat different way to compare several means is by using an Analysis of Means Plot, also available on the Graphs dialog box.

Figure 12.15: Analysis of Means Plot with 95% Decision Limits (UDL = 62.80, CL = 61.83, LDL = 60.85; samples A, B, C, and D)

Designed to be similar to a control chart, this plot displays each sample mean together with a vertical line drawn to the grand mean of all the observations. Decision limits are included above and below the grand mean. Any sample means that fall out
180. ould allow you to better refine the estimate of the response surface by adding second-order terms in factors such as temperature and flow rate.

2. You could generate points along the path of steepest ascent in an attempt to move quickly to regions of higher yield.

The path of steepest ascent is the path that begins at the center of the experimental region and moves in the direction of greatest change in the estimated response for the smallest changes in the experimental factors. Following that path can be very effective in obtaining dramatic improvements very quickly.

Points along the path of steepest ascent may be generated by selecting Path of Steepest Ascent from the Tables menu. The Pane Options dialog box controls the location at which points are generated. On the dialog box, select a factor to step in even increments and the amount to step it by.

Figure 16.28: Path of Steepest Ascent Dialog Box (Factor to Step: temperature, flow rate, concentration, agitation rate, or catalyst; Step By; Number of Steps; Resolution)

In the dialog box above, temperature has been selected and steps of 5 degrees each have been specified. STATGRAPHICS Centurion then determines the values of the other factors that follow the path of steepest ascent, as well as predicting what the yield will be.
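The idea of stepping one factor and letting the others follow can be sketched in Python. This is only an illustration under stated assumptions: it assumes a first-order model fitted in coded units (-1 to +1), in which case the path moves each factor in proportion to its coefficient. The coefficients, centers, and half-ranges below are hypothetical, and the program's own calculation may differ, for example when the fitted model has second-order terms.

```python
def steepest_ascent_path(coefs, centers, half_ranges, step_factor, step_size, n_steps):
    """Points along the path of steepest ascent for a first-order model.

    coefs:       model coefficients per factor, in coded units (assumed).
    centers:     center of the experimental region per factor, in real units.
    half_ranges: half the low-to-high range per factor, in real units.
    step_factor: name of the factor stepped in even increments.
    step_size:   increment for that factor, in real units.
    """
    b_step = coefs[step_factor]
    coded_step = step_size / half_ranges[step_factor]
    path = []
    for i in range(n_steps + 1):
        t = i * coded_step  # coded value of the stepped factor
        point = {}
        for name, b in coefs.items():
            # Each factor moves in proportion to its coefficient.
            coded = (b / b_step) * t
            point[name] = centers[name] + coded * half_ranges[name]
        path.append(point)
    return path


# Hypothetical two-factor illustration (values are not from the manual).
coefs = {"temperature": 2.0, "flow rate": 1.0}
centers = {"temperature": 165.0, "flow rate": 11.0}
half_ranges = {"temperature": 15.0, "flow rate": 1.0}

path = steepest_ascent_path(coefs, centers, half_ranges, "temperature", 5.0, 3)
for point in path:
    print(point)
```

Note that the sketch moves uphill only when the stepped factor's coefficient and the step size have the same sign; otherwise the same proportions trace the path of steepest descent.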
181. our desktop.

To launch the program:

Step 1: Click on the shortcut that was added to your desktop, or press the Windows Start button in the bottom left corner of your screen and click on the Statgraphics icon. You may also select Program Files, Statgraphics, STATGRAPHICS Centurion XV using Windows Explorer and click on the sgwin application icon to execute the program.

Step 2: When STATGRAPHICS Centurion loads, it will open up a new window. The first time you launch the program, the License Manager dialog box will be displayed.

Figure 1.6: License Manager Dialog Box (shows the license summary, including user name, organization, license type, expiration date, and product key; to activate the license, enter your Activation Code and press the UPGRADE button, or press the Get Code button if you do not have an Activation Code, and be prepared to supply the Product Key)

Within 30 days after receiving your serial number, you must contact StatPoint, Inc. to register your copy and obtain an activation cod
182. ove, Experimental Design Creation, Create New Design.

A series of dialog boxes will be presented through which you can configure the design. The first dialog box requests the type of design, the number of response variables, and the number of experimental factors.

Figure 16.4: Initial Design Creation Dialog Box (Design Class: Screening, Response Surface, Mixture, Multilevel Factorial, Inner/Outer Arrays, Single Factor Categorical, Multi-Factor Categorical, Variance Components (hierarchical); No. of Response Variables: 2; No. of Experimental Factors: 5)

Continuing with the example of the last section, the experiment is to involve 2 response variables and 5 experimental factors. The second dialog box is used to specify the experimental factors and the range over which they will be varied.

Figure 16.5: Definition of Experimental Factors

To specify the information for the 5 factors, click on the radio buttons labeled A through E, one at a time. Enter the following information for the five factors in the current experiment:

Factor   Name   Low   High   Units   Continuous
A        temp
183. ox will be displayed.

Figure 4.22: Dialog Box for Selecting Brushing Variable

Select a quantitative variable to use to code the points. After selecting the variable to brush with, a floating dialog box will appear.

Figure 4.23: Floating Dialog Box for Selecting Brushing Interval

The two slider bars are used to specify lower and upper limits for the variable. All points in the plot are colored light blue if they fall within the specified interval. For example, in the plot below, all automobiles with horsepower between 55.0 and 121.15 are colored light blue.

Figure 4.24: Matrix Plot after Brushing Points

It is evident from the above plot that Horsepower is strongly correlated with the other variables.

4.4 Smoothing a Scatterplot

To help visualize the relationships between the variables in a scatterplot, a smoother may be added. To smooth a scatterplot, press the Smooth/Rotate button on the analysis toolbar. This will display the following dialog box.

[Scatterplot Smoothing Options dialog box: Running Means, Running Lines, Locally Weighted Regression, Robust Lowess; Smoothing Fraction]
184. plots of each sample can be displayed by selecting Quantile Plot from the Graphs dialog box.

Figure 11.9: Side-by-Side Quantile Plots (proportion versus Temperature, by Gender: Female, Male)

The quantile plot illustrates the proportion of data in each sample that is below a given value of X, as a function of X. If the samples come from the same population, the quantile plots should be close together. Any offset of one plot to the right or left of the other indicates a difference between their means. A difference in the slope of the curves indicates a difference between the standard deviations. In the above plot, it is quite evident that the distribution of the females is shifted to the right of the males. The overall slopes, however, are similar.

11.9 Two Sample Kolmogorov-Smirnov Test

One additional nonparametric test that may be performed if the assumption of normal distributions is not tenable is the two-sample Kolmogorov-Smirnov test. This test is based on calculating the maximum vertical distance between the cumulative distribution functions of the two samples, which is approximately the maximum distance between the two quantile plots in Figure 11.9. If the maximum distance is large enough, the two samples may be declared to come from significantly different populations.

Selecting Kolmogorov-Smirnov Test from the Tables dialog box displays the follow
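The statistic described above, the maximum vertical distance between the two empirical cumulative distribution functions, can be sketched in Python as follows. The function name and the sample values are illustrative only, and the sketch computes the distance alone, not the associated P-value.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample1, sample2):
    """Maximum vertical distance between two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d_max = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point of the pooled sample.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / n1  # proportion of sample 1 at or below x
        f2 = bisect_right(s2, x) / n2  # proportion of sample 2 at or below x
        d_max = max(d_max, abs(f1 - f2))
    return d_max


# Hypothetical temperatures, for illustration only.
group_a = [97.8, 98.2, 98.4, 98.8, 99.1]
group_b = [97.1, 97.6, 98.0, 98.2, 98.6]
print(ks_two_sample_statistic(group_a, group_b))
```

A distance near 0 means the two quantile plots nearly coincide, while a distance of 1 means the samples do not overlap at all.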
185. presents the number of observations that fall in the interval of Temperature covered by the bar. The number of bars and their range is set by default based on the sample size n, using whatever rule has been selected on the EDA (Exploratory Data Analysis) tab of the Edit Preferences dialog box.

Figure 10.14: EDA Tab of the Preferences Dialog Box (histogram rules for Number of Classes: Sturges' rule, 10 log10(n), Scott's rule, Freedman-Diaconis rule, fixed number)

Using Sturges' rule, the number of bars is set to the smallest integer that is not less than $1 + 3.322 \log_{10}(n)$. Other rules, such as the $10 \log_{10}(n)$ rule, tend to produce more bars by default and may be preferable if you tend to work with large data sets.

A temporary override for a histogram once it has been created is available by double-clicking on the histogram to maximize its pane and then selecting Pane Opti
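The two class-count rules mentioned above can be compared with a short Python sketch. Sturges' rule follows the definition in the text; rounding the 10 log10(n) rule up to the next integer is an assumption made for illustration, since the text does not say how that rule is rounded.

```python
import math


def sturges_classes(n):
    """Smallest integer not less than 1 + 3.322 * log10(n)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def ten_log10_classes(n):
    """10 * log10(n), rounded up (the rounding is an assumption)."""
    return math.ceil(10 * math.log10(n))


for n in (30, 130, 1000):
    print(n, sturges_classes(n), ten_log10_classes(n))
```

For a sample of 130 observations, for example, the sketch gives 9 classes under Sturges' rule and 22 under the 10 log10(n) rule, which illustrates why the latter shows more detail for larger data sets.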
186. ptions Dialog Box

To change the system defaults:

1. Modify the features of a graph in any analysis window. Set the colors, fonts, and other options that you want future graphs to reflect.

2. Select Graphics Options from the analysis toolbar and go to the Profile tab.

3. Check Make Default.

4. Select any of the 12 user profiles and press the Save as button (the system profiles are read-only).

5. Enter a name for the profile to be saved.

Figure 9.3: Save Profile Dialog Box

6. Press OK to save the current set of graphics attributes (colors, fonts, point and line styles, etc.) in a new profile.

The next graph created will use the newly saved profile. You can also apply other saved profiles to a new graph by creating the graph with default settings and then:

1. Select Graphics Options from the analysis toolbar and go to the Profile tab.

2. Select any of the 15 profiles and press the Load button.

The current graph will be immediately updated to reflect the settings in the selected profile.

Chapter: Tutorial #1, Analyzing a Single Sample

Summary statistics, histogram, box-and-whisker plot, confidence intervals, and hypothesis tests.

A very common problem in statistics is that of analyzing a sample of n o
187. r. The table shows 2 experimental designs which have at least a 90% chance of detecting an effect of magnitude 1.5. None of the designs have more than 10 runs in each block.

Figure 16.3: Selected Screening Designs

Two designs have been suggested:

1. A full $2^5$ factorial design, consisting of all combinations of 2 levels of the 5 experimental factors. This is a relatively large design, with 8 runs in each of 4 blocks. It has much greater power than requested.

2. A half fraction in 2 blocks of 10 runs each. Each block consists of 8 factorial or "corner" points and 2 centerpoints. The design is resolution IV and thus can estimate all of the main effects and some of the two-factor interactions.

A quick calculation reveals that, given 5 factors, the effects of possible interest are:

1 grand mean
5 main effects
10 two-factor interactions
1 block effect

Without the block effect, the design would be resolution V, since 16 factorial runs is enough to estimate the mean and the 15 other effects. If this design is chosen, only 1 two-factor interaction will thus need to be sacrificed to the block effects. Since the second design is much smaller than the first, the engineer selected it.

16.2 Creating the Design

Once a design has been selected, you can return to the main menu and:

1. If using the Classic menu, select DOE, Design Creation, Create New Design.

2. If using the Six Sigma menu, select Impr
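The "quick calculation" of estimable effects for the half fraction can be written out as a short Python sketch. It simply counts effects against the 16 factorial runs; variable names are illustrative.

```python
from math import comb

k = 5  # number of experimental factors

# A half fraction of a 2^5 design has 2^(5-1) factorial (corner) runs.
factorial_runs = 2 ** (k - 1)

main_effects = k
two_factor_interactions = comb(k, 2)  # number of factor pairs

# Effects of possible interest without and with the block effect.
effects_without_block = 1 + main_effects + two_factor_interactions
effects_with_block = effects_without_block + 1

# Each estimated effect needs one run, so any excess must be given up.
sacrificed = effects_with_block - factorial_runs

print(factorial_runs, two_factor_interactions, effects_without_block, sacrificed)
```

The count shows why the unblocked design can estimate the mean and all 15 other effects, and why adding one block effect costs exactly one two-factor interaction.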
188. r model: Y = a + b*X

Coefficients
Parameter   Least Squares Estimate   Standard Error
Intercept   47.0484                  1.67991
Slope       -0.00803239              0.000536985

Analysis of Variance
Source          Sum of Squares   Df   Mean Square
Model           2065.52          1    2065.52
Residual        840.051               9.23133
Total (Corr.)   2905.57

Correlation Coefficient = -0.843139
R-squared = 71.0823 percent
R-squared (adjusted for d.f.) = 70.7705 percent
Standard Error of Est. = 3.03831
Mean absolute error = 1.99274
Durbin-Watson statistic = 1.64586 (P = 0.0405)
Lag 1 residual autocorrelation = 0.176433

Fitted model: MPG City = 47.0484 - 0.00803239*Weight

Figure 3.9: Simple Regression Analysis Window with Added Graph

3.2.4 Save Results Button

This button allows you to save numerical results calculated by the statistical analysis back to columns of a datasheet. For Simple Regression, it displays the following choices:

Save                               Target Variable
Predicted Values                   PREDICTED
Lower Limits for Predictions       LOWERPLIMS
Upper Limits for Predictions       UPPERPLIMS
Lower Limits for Forecast Means    LOWERCLIMS
Upper Limits for Forecast Means    UPPERCLIMS
Residuals                          RESIDUALS
Studentized Residuals              SRESIDUALS
Leverages                          LEVERAGES

Figure 3.10: Simple
189. r more tests for normality will be displayed. Each of the available tests is based on the following set of hypotheses:

Null hypothesis: the data come from a normal distribution.
Alternative hypothesis: the data do not come from a normal distribution.

A P-Value below 0.05 leads to rejection of the hypothesis of normality at the 5% significance level. In the table above, the Shapiro-Wilk test soundly rejects the hypothesis that the data come from a normal distribution. Therefore, any estimated DPM values or capability indices based on the assumption of normality are not valid.

When the data are non-normal, one of two possible approaches may be followed:

1. Select a distribution other than the normal on which to base the analysis.

2. Transform the data so that it follows a normal distribution in the transformed metric.

To assist in selecting a different distribution, STATGRAPHICS Centurion provides an option called Comparison of Alternative Distributions on the Tables dialog box. This option fits several other distributions and lists them in order of their goodness-of-fit. Using the default selection of distributions yields the following output:

Figure 15.8: Fitted Distributions in Order of Goodness-of-Fit

The distributions have been listed according to the value of the Kolmogorov-Smirnov goodness-of-fit statistic
190. Figure 4.6: Lines Tab on Graphics Options Dialog Box

A plot such as that of the fitted model has three line sets: the line of best fit, the inner confidence limits, and the outer prediction limits. To change any of these types, click on radio button 1, 2, or 3 and then select the desired attributes. Increasing the thickness of the center line and changing the other line types results in:

Figure 4.7: Plot after Modifying the Line Types

Note: you can only change the thickness of solid lines.

4.1.4 Points Options

The Points tab is used to specify the type, color, and size of points on a graph.

Figure 4.8: Points Tab on Graphics Options Dialog Box

Radio button 1 controls the attributes of the first set of points on a graph. In the current example, there is only one set. Changing the points to solid
191. residuals, new text panes will be added to the analysis window.

Figure 3.7: Simple Regression Analysis Window with Added Tables

3.2.3 Graphs Button

Clicking on this button displays a list of additional graphs that may be created.

Figure 3.8: Simple Regression Graphs Dialog Box (Plot of Fitted Model, Observed versus Predicted, Residuals versus X, Residuals versus Predicted, Residuals versus Row Number)

Adding a residual plot places an additional graph in the analysis window.
192. Path of Steepest Ascent for yield

The first column is temperature (degrees C), the next four columns hold the other experimental factors, and the last column is the predicted yield. Entries lost from the first row are marked [illegible].

165.0   [illegible]   [illegible]   137.5   [illegible]   83.7405
170.0   11.0775       6.73825       137.5   1.28119       84.5739
175.0   11.2385       6.95299       137.5   1.3093        85.57
180.0   11.4566       7.13861       137.5   1.3336        86.8115

Figure 16.29: Predicted Values on the Path of Steepest Ascent

Of course, no one knows what will actually happen when you move outside the experimental region, but the path of steepest ascent suggests the best direction in which to look.

Suggested Reading

The following books are excellent, readable sources of information about the statistical techniques described in this guide:

Basic statistics: Applied Statistics and Probability for Engineers, 3rd edition, by Douglas C. Montgomery and George C. Runger (2003). John Wiley and Sons, New York.

Analysis of variance: Applied Linear Statistical Models, 5th edition, by Michael H. Kutner, Christopher J. Nachtsheim, and John Neter (2004). McGraw-Hill.

Regression analysis: Applied Linear Regression, 3rd edition, by Sanford Weisberg (2005). John Wiley and Sons, New York.

Statistical process control: Introduction to Statistical Quality Control, 5th edition, by Douglas C. Montgomery (2005). John Wiley and Sons, New York.

Design of experiments: Statistics for Experimenters: Design, Innovation and Discovery, 2nd edition, by George E. P. Box, William G. Hunter, and J. Stuart Hunter (2005). John Wiley and Sons, New York.
193. rion. In particular, it has shown how to read data from files and databases and how to manipulate that data once it has been placed in a STATGRAPHICS Centurion datasheet. At any given time, the status of the datasheets may be displayed by activating the DataBook window and selecting DataBook Properties from the Edit menu, or by selecting StatLink from the File menu.

Figure 2.30: DataBook Properties Dialog Box

This dialog box shows the current source of the data within each datasheet. If desired, datasheets may be made read-only so that data in them cannot be changed inadvertently. It is also possible to poll the data source (reread it at regular intervals) and have the statistical procedures update automatically. These important features are described in Chapter 5.

Chapter: Running Statistical Analyses

Generating an analysis, selecting additional tables and graphs, selecting options, changing the input data, and saving the results.

There are over 150 statistical selections on the main STATGRAPHICS Centurion menu. Each selection acc
194. Figure 10.4: Summary Statistics Options Dialog Box

Including the sample median, quartiles, and the interquartile range results in:

Figure 10.5: Summary Statistics Table

A common assumption for measurement data is that it comes from a normal or Gaussian distribution, i.e., from a bell-shaped curve. Data from a normal distribution are fully described by two statistics:

1. The sample mean or average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 98.25$, which estimates the center of the distribution.

2. The sample standard deviation, $s = 0.733$, which is related to the spread of the distribution. For a normal distribution, approximately 68% of all values will lie within one standard deviation of the population mean, approximately 95% within two standard deviations, and approximately 99.73% within three standard deviations.

The sample mean and standard deviation fully describe the sample only if it comes from a normal distribution. Two statistics that may be used to check this assumption are the standardized skewness and standardized kurtosis. These statistic
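The coverage percentages quoted above for one, two, and three standard deviations follow from the normal cumulative distribution function. A minimal Python check, using only the standard library's error function, might look like this (the function name is illustrative):

```python
import math


def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations
    of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * normal_coverage(k):.2f}%")
```

The sketch gives roughly 68.27%, 95.45%, and 99.73%, in line with the approximate figures in the text.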
195. s measure shape:

1. Skewness measures symmetry or lack thereof. A symmetric distribution, such as the normal, has zero skewness. Distributions in which values tend to lie farther above the peak than below have positive skewness. Distributions in which values tend to lie farther below the peak than above have negative skewness.

2. Kurtosis measures the shape of a symmetric distribution. A normal or bell-shaped curve has zero kurtosis. A distribution that is more peaked than the normal has positive kurtosis. A distribution that is flatter than the normal has negative kurtosis.

If the data come from a normal distribution, both the standardized skewness and standardized kurtosis should be within the range of -2 to +2. In this case, the normal distribution appears to be a reasonable model for the data.

Another useful summary of the data is provided by John Tukey's five-number summary:

Minimum (smallest data value): 96.3
Lower quartile (25th percentile): 97.8
Median (50th percentile): 98.3
Upper quartile (75th percentile): 98.7
Maximum (largest data value): 100.8

These five numbers divide the sample into quarters and form the basis of his box-and-whisker plot, described in the next section.

Note: Selecting additional summary statistics using Pane Options changes the selection for the current analysis only. To change the default statistics for future analyses, go to the Edit menu and select Preferences. The Stats tab
196. s the Calculate button to display the associated values of the other statistics in the Results field. Assuming that the process mean does not shift, a Cpk of 1.33 equates to about 33 defects per million beyond the nearer spec.

Chapter: Tutorial #7, Design of Experiments

Designing an experiment to help improve a process.

All data are not created equal. Often, a small but properly planned study provides more information than a large, badly designed study. This final tutorial examines some of the capabilities of STATGRAPHICS Centurion for creating and analyzing designed experiments.

Consider the case of an engineer who wishes to determine which of many process variables have the greatest impact on the final product. She intends to investigate the impact of changing 5 factors: input temperature, flow rate, concentration, agitation rate, and percent of catalyst. In practice, this problem could be approached in several ways, including:

1. Trial and error: arbitrarily selecting a different combination of the factors each time she runs an experiment. Such an approach rarely yields useful information.

2. One-factor-at-a-time experimentation: holding all but one factor constant to determine the effect of that factor. This approach is extremely inefficient and can be misleading if any of the factors interact.

3. Using a statistically designed experiment: setting out a sequence of experiments to perform that will
197. scription

Execute (analysis title): Updates the indicated analysis.
Assign (STATGRAPHICS Centurion expression, column name): Evaluates the expression and assigns it to the indicated column.
Print (window(s) to print): Prints the contents of the indicated windows.
Publish: Runs StatPublish to publish the contents of the StatFolio in HTML format.
Shell (Windows command to execute, command argument): Causes Windows to execute a command.
Delay (number of seconds): Pauses for the specified time.
Load (name of StatFolio): Specifies the StatFolio to load after the script is run. This allows StatFolios to be executed in a chain.
Exit: Exits STATGRAPHICS Centurion.

Figure 5-3: Start-Up Script Operators

In the example shown in Figure 5-2, a Simple Regression is performed. Within that analysis, it is assumed that Save Results has been set to automatically save the residuals from the fitted model in a column called RESIDUALS. The residuals are then divided by the original data values and multiplied by 100 to create percentage errors, which are assigned to a new variable called PERROR. The values in PERROR are then summarized using the One Variable Analysis procedure, after which the results of both analyses are printed.

Note that StatFolios can be chained together, using the LOAD operator in one script to load and start the script in another StatFolio. You can also automatically exit STATGRAPHICS Centurion using the EXIT operator.

NOTE: You can suppress exe
198. side the limits may be declared to be significantly different than the grand mean. In this case, the interpretation is that widgets from sample A are significantly stronger than average, while widgets from samples C and D are significantly weaker than average. This type of interpretation can sometimes be quite useful.

Tutorial 4: Regression Analysis

Fitting linear and nonlinear models, selecting the best model, plotting residuals, and displaying results.

One of the most heavily used sections of STATGRAPHICS Centurion is the set of procedures that fit statistical regression models. In a regression model, a response variable Y is expressed as a function of one or more predictor variables X, plus noise. In many (but not all) cases, the functional form is linear in the unknown coefficients, so that the model can be expressed as

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_k X_{k,i} + \varepsilon_i$$

where the subscript $i$ represents the $i$-th observation in the data sample, the $\beta$'s are unknown model coefficients, and $\varepsilon_i$ is a random deviation, usually assumed to come from a normal distribution with mean 0 and standard deviation $\sigma$.

Given a set of data with a response variable Y and one or more possible predictor variables, the goal of regression analysis is to construct a model that:

1. Describes the relationships that exist between the variables in a manner that permits Y to be predicted well given known values of the X
199. sion of the colors from one plot to the next shows a decrease in strength with increasing polyethylene.

When pasting a graph into the StatGallery, you may select Paste Link from the alternate mouse button popup menu rather than Paste. With Paste Link, the graph in the gallery is hot-linked back to the analysis window in which it was originally created and will change in the StatGallery whenever it changes in the original analysis window.

6.3 Overlaying Graphs

When a graph is pasted into a pane in the StatGallery that already contains a graph, you are given the choice of replacing the graph already there or overlaying the new graph on top of the old. Overlaying one graph on another can be useful, as when fitting two different statistical models.

Figure 6-4: Overlaid Graphs in the StatGallery

When a graph is overlaid on top of another that is already in the StatGallery, only the contents of the second graph inside the axes are added to the display. Text from the second graph is not included.

Note: If the scaling of the second graph is different than the first, the second plot will be adjusted so that it matches the first.

6.4 Modifying a Graph in the StatGallery

Certain aspects of a graph may be changed after it is pasted into
200. sis Toolbar Showing Row Number of Selected Point

Additional information about the point may be obtained by pressing the Identify button and selecting a column from the DataBook.

Figure 4-29: Point Identification Dialog Box

After selecting a variable, clicking on any point will add the value of that variable to the Label field on the analysis toolbar.

Figure 4-30: Analysis Toolbar Showing Make of Selected Point

The binoculars buttons to the right of the Label and Row fields may be used to locate points on a graph. If you enter a value into either edit field and then press the corresponding Locate button, all points in the graph matching the entered value will be highlighted. For example, the plot below colors the points for all Hondas light blue.

Figure 4-31: Plot Highlighting All Hondas

This technique is also quite effective on a Matrix Plot. In the following display, all points corresponding to row 42 have been highlighted.

Figure 4-32: Matrix Plot Highlighting Row 42

Locating a point in a Matrix Plot can help identify whether it is an ou
201. ss points than that on the plot. If you press the Jitter button, a dialog box will appear allowing you to add a little jitter (random offset) to the points.

Figure 4-19: Jittering Dialog Box

In this case, adding a small amount of horizontal jitter gives a much better picture of the location of the points.

Figure 4-20: Scatterplot after Horizontal Jitter

Each point has been offset a small random amount along the horizontal axis. Jittering a plot affects only the display. It has no effect on the data in the datasheet or on any calculations made with it.

4.3 Brushing a Scatterplot

An interesting method of visualizing relationships between variables is to color the points of a scatterplot according to the value of another variable. For example, consider the following Matrix Plot for selected variables from the 93cars.sf6 file.

Figure 4-21: Matrix Plot for Data from the 93cars File

The scatterplot in each cell of the matrix plots the values of variables corresponding to its row and column identifiers. Suppose you wished to visualize how the horsepower of the automobiles was related to the 5 plotted variables. If you press the Brush button on the analysis toolbar, the following dialog b
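The idea behind jittering, as described in this chunk, is that a small random offset is applied to the plotted positions only, while the underlying data stay unchanged. A minimal sketch of that idea follows; the function name, the uniform offset, and the sample values are assumptions for illustration and do not describe how STATGRAPHICS Centurion generates its offsets.

```python
import random


def jitter(values, amount, rng=None):
    """Return a display copy of values, each shifted by a uniform random
    offset in [-amount, +amount]. The input list is left untouched."""
    rng = rng or random.Random()
    return [v + rng.uniform(-amount, amount) for v in values]


cylinders = [4, 4, 6, 6, 6, 8]  # hypothetical column values
plotted_x = jitter(cylinders, 0.1, random.Random(1))
print(cylinders)   # original data are unchanged
print(plotted_x)   # positions to use for plotting only
```

Only `plotted_x` would be passed to a plotting routine; any statistics would still be computed from `cylinders`.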
202. 11.1 Running the Two Sample Comparison Procedure 171
11.2 Summary Statistics 173
11.3 Dual Histogram 174
11.4 Dual Box-and-Whisker Plot 175
11.5 Comparing Standard Deviations 177
11.6 Comparing Means 178
11.7 Comparing Medians 179
11.8 Quantile Plot 180
11.9 Two-Sample Kolmogorov-Smirnov Test 181
11.10 Quantile-Quantile Plot 182
Tutorial 3 Comparing More than Two Samples 183
12.1 Running the Multiple Sample Comparison Procedure 184
12.2 Analysis of Variance 188
12.3 ... 190
12.4 Comparing Medians 192
12.5 Comparing Standard Deviations 194
12.6 Residual Plots 194
12.7 Analysis of Means Plot (ANOM) 196
Tutorial 4 Regression Analysis 197
15
203. Figure 3-13: Pane Options Dialog Box for the Fitted Model Plot

For example, removing the check mark alongside Confidence Limits and pressing OK will replot the graph without the inner limits.

Figure 3-14: Fitted Model Plot Without Confidence Limits

3.2.7 Graphics Buttons

Whenever a graph is maximized within the analysis window, several additional buttons are enabled. These buttons include:

Graphics options: displays a dialog box used to change colors, labels, axis scaling, and other similar features.
Add text: used to add additional text to the graph.
Jitter: used to offset points randomly in the horizontal or vertical direction to prevent their overplotting each other.
Brush: colors points on a scatterplot according to the value of a selected variable.
Smooth/Rotate: smooths a 2-dimensional plot or rotates a 3-dimensional plot.
Identify: displays a label identifying a point when clicked on with the mouse.
Locate by name: highlights i
204. Figure 15-12: Analysis Options Dialog Box for Selecting Transformation

Choices include a natural logarithm, raising each value to a specified power, or selecting a transformation according to the methods of Box and Cox. The latter approach considers a variety of transformations of the form $Y^p$ and selects an optimal value for $p$. If a transformation is selected, a normal distribution is fit to the transformed data. The plot below shows the results of taking the Box-Cox approach.

Figure 15-13: Capability Plot after Box-Cox Transformation (Process Capability for Strength; horizontal axis: Strength)

For the plot, an inverse transformation has been applied to show the fit in the original metric. The transformation has had a similar effect on the shape of the distribution, although not as strong as assuming a largest extreme value distribution. The estimated DPM is 4,353, which is about twice as large as when using the largest extreme value distribution but still much smaller than when a normal distribution is assumed.

Note: the mean and standard deviation displayed on the plot correspond to the transformed
205. ta is the One Variable Analysis procedure. This procedure calculates summary statistics such as the sample mean and standard deviation. It also creates several plots, including a histogram and box-and-whisker plot. The location of the One Variable Analysis procedure depends on the menu you are using:

1. Classic menu: Select Describe - Variable Data - One Variable Analysis.
2. Six Sigma menu: Select Analyze - Variable Data - One Variable Analysis.

Like all statistical procedures, the One Variable Analysis begins by displaying a data input dialog box.

Figure 1-20: One Variable Analysis Data Input Dialog Box

The list box at the left displays the names of all columns in the datasheets that contain data. To analyze the data in the Per Capita Income column, click on its name and then click on the button with the black arrow alongside the Data field. This places the name of the column containing the income data into the Data field. Leave the Select field blank; it is used only when you want to analyze a subset of the rows in the datasheet instead of all the rows. When OK is pressed, a new analysis window will be created.
206. ta may be transformed using an algebraic expression or mathematical function.
4. The datasheet may be sorted according to one or more columns.
5. Data values may be recoded to form groups or for other reasons.
6. Data extending over multiple columns can be rearranged into a single column if required by a statistical procedure.

These important operations are described below.

2.3.1 Copying and Pasting Data

The STATGRAPHICS Centurion datasheet supports many normal spreadsheet operations, including cut, copy, paste, insert, and delete. The one important fact to remember when using these operations is that every column has a specified type. If you inadvertently paste character data into a numeric column, STATGRAPHICS Centurion will change the type of that column to accommodate the new data. If you ever have any doubt about a column's type, click on the column header to display the Modify Column dialog box. You can change the type of the column using that dialog box.

2.3.2 Creating New Variables from Existing Columns

STATGRAPHICS Centurion has a wide array of operators to assist in performing mathematical calculations. One of the most important uses of these operators in data analysis is to create new variables based on existing columns. In STATGRAPHICS Centurion, new variables may be created:

1. On the fly, directly within the data fields on data input dialog boxes, without saving the variable in the datasheet
207. te test 156
F test 177
factorial designs 255
file directory, temporary 142
FIRST 60
formulas: absolute value 42, average 42, backward differencing 42, conversion to Z scores 42, exponential function 42, lag by k periods 42, log base 10 42, maximum 42, minimum 42, natural logarithm 42, square root 42, standard deviation 42
fractional factorial designs 255
frequency histogram 24, 158, 174, 237
Frequency Tabulation 161
Friedman test 192
FTP 110
gage R&R studies 127
Generate Data 43, 51
goodness of fit 242
graphical ANOVA 189
Graphics Options 26: axes 87, fills tab 89, grid tab 79, layout 77, lines tab 81, points tab 83, profiles 142, text labels and legends 90, top title tab 85
graphs: 3D effects 77, adding text 90, axis powers 141, axis scaling 87, axis titles 87, background 77, black and white 141, changing default appearance 142, copying to other applications 100, excluding points 71, fonts 88, freezing scaling changes 88, identifying points 97, labels 141, log scaling 88, maintain aspect ratio 141, modifying 76, rotating 95, rotating axis labels 87, saving in image files 100, tickmark gap 141, toolbar buttons 70
Graphs 64
Grubbs' test 156
heteroscedasticity 195
HSD intervals 191
HTML files 109
hypothesis tests: comparing distributions 181, comparing means 178, comparing medians 179, comparing proportions 234, comparing several means 18
208. ted for data from a normal distribution. The standardized kurtosis value is within the range expected for data from a normal distribution.

Figure 1-22: Maximized Summary Statistics Pane

Several interesting statistics are given in the table. Of the n = 51 states (plus D.C.), per capita income ranges between 15,853 and 28,766. The average per capita income is 20,934.50. Beneath the table is the output of the StatAdvisor, which gives a short interpretation of the results. In this case, the StatAdvisor concentrates on the two statistics displayed in red, which measure the skewness and kurtosis in the data. As explained by the StatAdvisor, data that come from a normal or Gaussian distribution should yield standardized skewness and standardized kurtosis values between -2 and +2. In this case, both statistics are within that range, indicating that a bell-shaped normal curve is a reasonable model for the observations, although the skewness is very close to being statistically significant.

Double-clicking on the summary statistics table again will restore the original split display. Double-clicking on the bottom right pane then maximizes the box-and-whisker plot.

Figure 1-23: Maximized Box-and-Whisker Plot Pane

The box-and-whisker plot, invented by John Tukey, provides a 5-number summary of a d
209. test to compare means
Null hypothesis: mean1 = mean2
Alt. hypothesis: mean1 NE mean2
assuming equal variances: t = 2.06616, P-value = 0.040846
Reject the null hypothesis for alpha = 0.05.

Figure 11-7: Two-Sample Comparison of Means

The most important output in this table is again highlighted in red:

1. Difference between the Means (assuming equal variances): displays a 95% confidence interval for the mean of the population of females minus the mean of the population of males. The interval for μ1 - μ2 ranges from 0.01 to 0.49, indicating that the mean female body temperature is somewhere between 0.01 degrees and 0.49 degrees higher than the mean body temperature of the males.
2. The P-value associated with a t-test of the hypotheses stated above. Since P is less than 0.05, there is sufficient evidence upon which to reject the hypothesis of equal means and thus declare the two population means to be significantly different at the 5% significance level.

Note that this test was made assuming that the variances of the two populations are equal, which was validated by the F-test in the previous section. Had the variances been shown to be significantly different, an approximate t-test could have been requested by accessing Pane Options and removing the checkmark from the checkbox labeled Assume Equal Sigmas.

It thus appears that the females come from a population with a higher mean temperature than that of th
210. th a line by which to determine which effects are statistically significant.
4. Main Effects Plot: plots the estimated change in the response when each of the factors is moved from its low level to its high level.

The standardized Pareto chart in the upper right corner can be used to quickly determine which effects are most important.

Figure 16-14: Standardized Pareto Chart

The length of each bar is proportional to the value of a t-statistic calculated for the corresponding effect. Any bars beyond the vertical line are statistically significant at the selected significance level, set by default at 5%. In this case, there are 3 significant main effects: temperature, concentration, and catalyst. There is also a significant interaction between temperature and flow rate.

The Main Effects Plot in the bottom right pane shows how each factor affects yield.

Figure 16-15: Main Effects Plot

The lines indicate the estimated change in yield as each factor is moved from its low level to its hig
211. that the coefficient corresponding to a selected variable equals 0, given that all of the other variables remain in the model. P-values greater than 0.05 indicate that a variable does not contribute significantly to the fit, in the presence of all of the other variables. Except for Weight, all predictors have P-values above 0.05. This implies that at least one of those predictor variables could be removed without hurting the model significantly.

Note: It would be wrong at this point to assume that all 5 predictor variables with P-values above 0.05 could be removed. Due to the high multicollinearity in the data, all P-values may change dramatically if even one variable is removed from the model.

A useful method for simplifying the model is to perform a stepwise regression. In a stepwise regression, variables are added or removed from a regression model one at a time, with the goal of obtaining a model that contains only significant predictors but does not exclude any useful variables. Stepwise regression is available as an option on the Analysis Options dialog box.

Figure 13-15: Multiple Regressio
212. the number of runs in the largest block. In this case, the engineer selected a half fraction in two blocks of 8 runs each. The final dialog box is used to add centerpoints or replicate runs.

Figure 16-8: Design Options

The input fields specify:

1. Centerpoints: the number of runs to be performed in the center of the experimental region. Adding centerpoints is a good way to add degrees of freedom for the experimental error.
2. Placement: the placement of the centerpoints. The most common choices are Random, which spreads the centerpoints randomly throughout the other runs, and Spaced, which spaces the centerpoints evenly throughout the design.
3. Replicate design: the number of additional times each set of experimental conditions is to be run. Replicating the entire design this way can increase the number of runs to be done very quickly.
4. Randomize: whether the runs should be listed in random order. Randomization should be done whenever possible to prevent external lurking variables, such as changes in the process over time, from biasing the results.

For the current experiment, four centerpoints have been requested, bringing the final design
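The run count for the design discussed in this chunk follows from simple arithmetic: a half fraction of a 5-factor two-level design has 16 runs, and two centerpoints in each of the two blocks add four more. The sketch below only illustrates that arithmetic. The function name and parameters are hypothetical, and the assumption that replication repeats the centerpoints along with the factorial runs is mine rather than something the text states.

```python
def total_runs(factors, fraction_exponent, blocks, centerpoints_per_block,
               replicates=0):
    """Number of runs in a two-level fractional factorial with centerpoints.

    fraction_exponent is p in 2**(k - p): 1 for a half fraction.
    replicates is the number of additional times the design is repeated
    (assumed here to repeat the centerpoints as well).
    """
    base = 2 ** (factors - fraction_exponent)
    one_pass = base + blocks * centerpoints_per_block
    return one_pass * (1 + replicates)


print(total_runs(factors=5, fraction_exponent=1, blocks=2,
                 centerpoints_per_block=2))
```

Setting `replicates=1` doubles the total under the stated assumption, which shows how quickly replication increases the size of the experiment.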
213. the requested information. If you have not yet purchased the program, leave the serial number field blank. The program will then run in evaluation mode for 30 days following the first time you install it on your computer. After 30 days, you must purchase a license to continue to use the program. Once the evaluation license expires, only the license manager will display.

Step 5: The next dialog box indicates the directory in which the program will be installed.

Figure 1-3: Destination Folder Dialog Box

By default, STATGRAPHICS Centurion is installed in a subdirectory of Program Files named STATGRAPHICS Centurion XV. If you are installing the program on a network server, install it in any location where all potential users have read access. Write access by users is not required. Consult the Support page at www.statgraphics.com for full instructions on network installation.

Step 6: The next dialog box allows you to specify the type of installation to be performed.
214. timated standard errors. A Studentized residual thus indicates how many standard errors the data value is from the fitted model. STATGRAPHICS Centurion actually calculates Studentized deleted residuals. Deleted residuals are calculated by withholding one observation at a time, refitting the model, and determining the number of standard errors that the withheld observation lies from the newly fitted model. This keeps an outlier from having a large impact on the model when its residual is calculated.

The Unusual Residuals selection on the Tables dialog box lists all Studentized residuals that are greater than 2 in absolute value.

Figure 13-12: Table of Unusual Residuals

Studentized residuals greater than 3, such as row 57, are potential outliers that do not appear to belong with the rest of the data. Row 57 corresponds to a Mazda RX-7, which is recorded as achieving only 17 miles per gallon in city driving, although the model predicts 22.5 mpg. Since the next section adds additional variables to the model, which may help improve its predictive ability for such sports cars, row 57 will not be excluded from the fit, although careful attention should be paid to it.

13.5 Multiple Regression

To improve the model, other predictor variables need to be added. This is most easily accomplished using the Multiple Regression analysis, which may be found on the m
215. Figure 10-20: 95% Confidence Intervals for the Mean and Standard Deviation

The confidence intervals provide a bound on the potential error in estimating the mean and standard deviation of the population. Given the remaining n = 129 observations, we can declare with 95% confidence that the mean population temperature is somewhere between 98.11 degrees and 98.35 degrees. Likewise, the standard deviation of the population is somewhere between 0.624 degrees and 0.798 degrees.

Selecting Pane Options, additional confidence intervals can be requested using the bootstrap method.

Figure 10-21: Confidence Intervals Options Dialog Box

Bootstrap intervals, unlike the intervals in Figure 10-20, do not rely on the assumption that the population follows a normal distribution. Instead, random samples of n = 129 observations are taken from the data, sampling with replacement (the same observations may be selected more than once). This is repeated 500 times, sample statistics are calculated, and the most central 95% of the results are used to calculate the confidence intervals. The table below shows bootstrap intervals for the population mean, standard deviation, and median.

Confidence Int
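The bootstrap procedure outlined in this chunk (resample n observations with replacement, repeat 500 times, keep the most central 95% of the resulting statistics) can be sketched as a simple percentile bootstrap. The function name, the way the percentile cut-offs are chosen, and the sample values below are assumptions for illustration; the exact rule STATGRAPHICS Centurion applies may differ.

```python
import random
import statistics


def bootstrap_interval(data, statistic, subsamples=500, level=0.95, seed=None):
    """Percentile bootstrap interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic(rng.choices(data, k=n))  # resample n values with replacement
        for _ in range(subsamples)
    )
    # Discard an equal number of estimates from each tail, keeping the
    # most central `level` fraction of the results.
    cut = int(round(subsamples * (1 - level) / 2))
    return estimates[cut], estimates[subsamples - 1 - cut]


# Hypothetical temperatures, not the data set used in the manual.
temps = [97.1, 97.8, 98.0, 98.2, 98.3, 98.4, 98.6, 98.7, 99.0, 99.4]
low, high = bootstrap_interval(temps, statistics.fmean, seed=0)
print(low, high)
```

Passing `statistics.stdev` or `statistics.median` as the `statistic` argument gives the corresponding intervals for the standard deviation and the median.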
216. tlier with respect to more than one variable.

Note: the color used to highlight the points is specified on the Graphics tab of the Preferences dialog box, accessible from the Edit menu.

4.6 Copying Graphs to Other Applications

Once a graph has been created in STATGRAPHICS Centurion, it can be easily copied to other programs such as Microsoft Word or PowerPoint by:

1. Maximizing the pane containing the graph.
2. Selecting Copy from the STATGRAPHICS Centurion Edit menu.
3. Selecting Paste while in the other application.

By default, graphs are pasted in Picture format, which corresponds to a Windows metafile. In rare cases when you wish to paste the graph in some other format, you can select Paste Special instead of a simple Paste.

To copy an entire analysis to another application, including all tables and graphs, first copy the analysis to the StatReporter using the alternate mouse button popup menu and then copy the StatReporter to the other application. This technique is illustrated in Chapter 7.

To copy both the graph and its enclosing window, as in Figure 4-31 above, a third-party screen capture tool is recommended. In producing this manual, a program called SnagIt has been used, which is available for purchase at www.techsmith.com. If you use SnagIt, we recommend that you set the Input option to Window and the Output option to Clipboard. You can then paste images directly into any document.

4.7 Saving
217. 7.3 Modifying StatReporter Output 123
7.4 Saving ... 123
Using the StatWizard 125
8.1 Accessing Data or Creating a New Study 126
8.2 Selecting Analyses for Your Data 130
8.3 Searching for Desired Statistics or Tests 135
System Preferences 139
9.1 ... 139
9.2 ... 142
9.3 ... 142
Tutorial 1 Analyzing a Single Sample 145
10.1 Running the One Variable Analysis Procedure 146
10.2 Summary Statistics 148
10.3 Box-and-Whisker Plot 151
10.4 Testing for Outliers 154
10.5 ... 158
10.6 Quantile Plot and Percentiles 163
10.7 Confidence Intervals 164
10.8 Hypothesis Tests 166
10.9 Tolerance Limits 168
Tutorial 2 Comparing Two Samples
218. to test the hypothesis of equal population means by choosing between the following two hypotheses:

Null hypothesis: $\mu_A = \mu_B = \mu_C = \mu_D$
Alternative hypothesis: the means are not all equal

where $\mu_j$ represents the mean of the population from which sample $j$ was taken. Rejection of the null hypothesis indicates that the samples come from populations whose means are not all identical.

The output of the ANOVA is contained in the ANOVA table, which is initially displayed in the bottom left pane of the analysis window.

ANOVA Table
Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Between groups   157.882           3   52.6272       22.76     0.0000
Within groups    101.728          44   2.31201
Total (Corr.)    259.61           47

Figure 12-7: Analysis of Variance Table

The analysis of variance decomposes the variability of the observed data into two components: a between-group component, quantifying differences between widgets made of different materials, and a within-group component, quantifying differences between widgets made of the same material. If the estimated variability between groups is significantly larger than the estimated variability within groups, it is evidence that the group means are not all the same.

The key quantity in Figure 12-7 is the P-Value. Small P-Values (less than 0.05 if operating at the 5% significance level) lead to a rejection of the hypothesis of equal means. In the current example, there is little doubt that the means are significantly different
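The F-ratio in the ANOVA table is the between-group mean square divided by the within-group mean square, where each mean square is a sum of squares divided by its degrees of freedom. The sketch below reproduces that arithmetic from the sums of squares quoted in the table. The degrees of freedom (3 and 44) are inferred here by dividing each sum of squares by its reported mean square, and the function name is hypothetical.

```python
def anova_f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F-ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares from the ANOVA table; degrees of freedom are inferred.
ms_b, ms_w, f_ratio = anova_f_ratio(157.882, 3, 101.728, 44)
print(round(ms_b, 4), round(ms_w, 5), round(f_ratio, 2))
```

A large F-ratio such as this one means the variability between groups is much greater than the variability within groups, which is why the associated P-Value is so small.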
219. ts from the Tables dialog box.

Figure 16-20: Fitted Regression Model

The StatAdvisor pane displays the regression equation which has been fitted to the data. The fitted model for yield contains the following terms (coefficient magnitudes as displayed):

constant: 250.074
temperature: 1.0595
flow rate: 17.4475
concentration: 0.555417
catalyst: 2.6175
temperature*flow rate: 0.106625

Note that the underlying model takes the form of a multiple linear regression model. Each retained main effect is included in the model by itself, while the two-factor interaction is represented by a crossproduct of temperature and flow rate.

16.4 Plotting the Fitted Model

To fully understand the fitted model, it is best to plot it. Several types of plots may be created by selecting Response Plots from the Graphs dialog box. By default, a wire frame surface plot is displayed.

Figure 16-21: Response Surface Plot (Estimated Response Surface with concentration = 6.5, agitation rate = 137.5, catalyst = 1.25; axes: temperature, flow rate, yield)

In this plot, the height of the surface represents the predicted value of yield over the space of temperature and flow rate, with the other three factors held constant at their middle values. Highest yields are obtained at high temperature and high flow rate. The type of plot and the factors over which the response is plotted can be changed using Pane Opt
220. type and facility:

1. If using the Classic menu, select Describe - Categorical Data - Crosstabulation.
2. If using the Six Sigma menu, select Analyze - Attribute Data - Multiple Factors - Crosstabulation.

The data input dialog box expects two columns, one defining the rows of a two-way frequency or contingency table and the other defining the columns.

Figure 14-6: Crosstabulation Data Input Dialog Box

Entering the data as shown above creates the following analysis window. The window (Crosstabulation - Defect by Facility) shows an analysis summary with row variable Defect, column variable Facility, 120 observations, 9 rows, and 2 columns, together with a frequency table, a barchart, and a mosaic chart for Defect by Facility.
221. u may select any of three alternative hypotheses:

1. Not equal: μ ≠ 98.6
2. Less than: μ < 98.6
3. Greater than: μ > 98.6

Even though the sample suggests a lower mean temperature, a two-sided alternative has been selected. Creating a one-sided test with an alternative hypothesis of μ < 98.6 degrees would be considered "data snooping" at this point, since we would be formulating the hypothesis after having looked at the data. The results of the test are shown below:

Hypothesis Tests for Temperature
Sample mean = 98.2295
Sample median = 98.3

t-test
Null hypothesis: mean = 98.6
Alternative: not equal
Computed t statistic = -6.00896
P-Value = 1.81264E-8
Reject the null hypothesis for alpha = 0.05.

signed rank test
Null hypothesis: median = 98.6
Alternative: not equal
Average rank of values below hypothesized median: 67.7099
Average rank of values above hypothesized median: 43.5658
Large sample test statistic = 5.07771 (continuity correction applied)
P-Value = 3.82663E-7
Reject the null hypothesis for alpha = 0.05.

Figure 10.24: Hypothesis Tests Results

The results of two tests are shown:

1. A standard t-test, which assumes that the data come from a normal distribution, although it is not overly sensitive to departures from this assumption.
2. A nonparametric signed rank test based on the ranks of the distance of each observation from the hypothesized median. This test
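The t statistic reported above is the sample mean minus the hypothesized mean, divided by the standard error of the mean. A minimal Python sketch follows; the function names and the five sample temperatures are hypothetical, and the P-value helper uses a large-sample normal approximation rather than the exact Student's t distribution that the program would use.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)            # sample standard deviation (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


def two_sided_p_normal_approx(t):
    """Approximate two-sided P-value, treating t as standard normal.

    Assumption: reasonable only for large samples; an exact P-value
    requires the t distribution with n - 1 degrees of freedom.
    """
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical temperatures, not the tutorial data.
sample = [98.0, 98.4, 98.2, 98.6, 97.8]
t_stat, df = one_sample_t(sample, mu0=98.6)
```

A negative t statistic indicates a sample mean below the hypothesized value, as in the tutorial, where the sample mean of 98.2295 falls below 98.6.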
222. ults from the STATGRAPHICS Centurion File menu.

NOTE: Tables and graphs are embedded in the HTML output files with names that are automatically generated by StatPublish. While in a web browser, you can view the HTML source code and easily determine the file names. These files can then be embedded in your own web pages if you prefer.

Chapter 6: Using the StatGallery
Displaying graphs side by side and overlaying graphs.

The StatGallery is a special window within STATGRAPHICS Centurion where graphs created in other procedures may be pasted side by side or on top of one another. Side-by-side comparisons provide a powerful tool for comparing two sets of data, two statistical models, or two levels of a contour plot. Overlaying graphs creates unique displays not producible elsewhere in the system.

StatGallery output is saved in files with the extension .sgg. If you place output in the StatGallery, a pointer to the StatGallery file will be saved in the current StatFolio. When the StatFolio is reopened later, it will automatically load the associated StatGallery.

6.1 Configuring a StatGallery Page

The StatGallery is contained in a separate window that is created when STATGRAPHICS Centurion is first loaded. It consists of one or more pages, each capable of displaying up to 9 graphs. By default, each page of the gallery is configured to display 4 graphs, as shown below:

[StatGallery window]
223. up to 20 runs. It has also been requested that the design be done in random order, which means that the order of the 10 runs within each block will be randomly generated. After the final dialog box, a design attributes window is opened:

[Figure 16.9: Design Attributes Window. Screening Design Attributes: Design class: Screening; Design name: Half fraction in 2 blocks, 2^5-1; Comment: Tutorial 7; Number of experimental factors: 5; Number of blocks: 2; Number of responses: 2; Number of runs: 20, including 2 centerpoints per block; Error degrees of freedom: 4; Randomized: Yes. The window also lists the low and high levels of each factor and the two responses (yield and strength in psi). The StatAdvisor: "You have created a Half fraction in 2 blocks design which will study the effects of 5 factors in 20 runs. The design is to be run in 2 blocks."]

This output is used to verify that the design was created properly. At the same time, the design has been loaded into tab A of the STATGRAPHICS Centurion DataBook:

[DataBook window for the Tutorial 7 design, with columns BLOCK, temperature (degrees C), flow rate (liters/min), concentration, agitation rate, catalyst, and yield]
224. utton and select Reciprocal-Y on the dialog box. The resulting fit is shown below:

[Figure 13.10: Fitted Reciprocal-Y Model. Plot of the fitted model MPG City = 1/(0.00193667 + 0.0000146623*Weight), MPG City versus Weight]

While linear in the reciprocal of MPG City, the model is nonlinear in the original metric. Note also that the prediction limits for Weight become larger as the predicted values become larger. This makes sense in the context of the data, since it implies that there is more variability amongst the lighter cars than amongst the heavier cars.

13.4 Examining the Residuals

Once a reasonable model has been fit, the residuals from the fit should be examined. In general, a residual may be thought of as the difference between the observed value of Y and the value predicted by the model:

residual = observed Y - predicted Y

The Simple Regression analysis automatically plots residuals versus the X variable:

[Figure 13.11: Plot of Studentized Residuals. Residual plot for MPG City = 1/(0.00193667 + 0.0000146623*Weight), Studentized residual versus Weight]

Using Pane Options, you may elect to plot either simple residuals or Studentized residuals. Studentized residuals reexpress the ordinary residuals defined above by dividing them by their es
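The reciprocal-Y model and the residual definition can be sketched in Python as follows. The coefficient values are the digits that appear in the figure captions; treating both as positive is an assumption (a positive slope is needed for mileage to fall as weight rises). The function names and the example car are hypothetical.

```python
# Coefficients read from the figure caption; signs assumed positive.
INTERCEPT = 0.00193667
SLOPE = 0.0000146623


def predicted_mpg(weight):
    """Reciprocal-Y model: 1/MPG is linear in Weight."""
    return 1.0 / (INTERCEPT + SLOPE * weight)


def residual(observed_mpg, weight):
    """Ordinary residual: observed Y minus predicted Y."""
    return observed_mpg - predicted_mpg(weight)


# Hypothetical car: 3000 lb, observed at 25 miles per gallon in the city.
example_residual = residual(25.0, 3000)
```

A positive residual means the car achieved better mileage than the model predicts for its weight.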
225. [Figure 7.1: The StatReporter Window]

You may type text in the window or paste output created elsewhere within STATGRAPHICS.

7.2 Copying Output to the StatReporter

STATGRAPHICS Centurion provides three methods for copying output to the StatReporter:

1. To copy a single table or graph to the StatReporter, first copy it to the Windows clipboard by maximizing its pane and selecting Copy from the Edit menu. Then move to the StatReporter window, put the cursor at the desired location, and select Edit - Paste.
2. Alternatively, maximize the pane containing the table or graph to be moved by double-clicking on it. Then press the alternate mouse button and select Copy Pane to StatReporter from the popup menu. This automatically pastes the table or graph into the StatReporter wherever the cursor is currently located.
3. To copy all of the output in an analysis window, press the alternate mouse button and select Copy Analysis to StatReporter from the popup menu. All tables and graphs in the analysis window will be pasted into the StatReporter.

Each of the above operations does a static paste: the output in the StatReporter will never change. You can link a table or graph to its source using method 1 above but selecting Paste Link instead of Paste. The pasted table or graph in the StatReporter will then be "hot" in the sense that it will c
226. w and column classifications are independent. In this case, independence would imply that whether an item was defective or not had nothing to do with the facility in which it was produced. Since the P-value in the above table is less than 0.05, the hypothesis of independence is rejected at the 5% significance level. We can thus conclude that the proportions of defectives at the two facilities are significantly different.

Tutorial 6: Process Capability Analysis
Determining the DPM or percent beyond the specification limits.

STATGRAPHICS Centurion is widely used by individuals whose job it is to ensure that the products and services they provide are of the highest quality. A common task in such a job is to collect data from the process and compare it to established specification limits. The output from this type of capability analysis is an estimate of how capable the process is of meeting those specifications. Six Sigma, which is a widely practiced methodology for achieving world class quality, targets a defect rate of 3.4 defects per million opportunities.

As an example, consider a product whose strength is required to fall between 190 and 230 psi. Suppose that n = 100 samples are taken from the manufacturing process and their strength measured, as shown in the following table:

213.5 203.3 191.3 197.1 205.7 215.6 193.7 201.7 201.5 207.1 207.0 200.4 197.2 202.4 205.2 2
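As a rough illustration of what "DPM beyond the specification limits" means, the Python sketch below converts a fitted distribution into defects per million. It assumes a normal distribution purely for simplicity; the function name and the process mean and standard deviation used in the example are hypothetical, not estimates from the tutorial data.

```python
from statistics import NormalDist


def dpm_beyond_limits(mean, sigma, lsl, usl):
    """Defects per million outside [lsl, usl] for a normal process."""
    dist = NormalDist(mean, sigma)
    fraction_out = dist.cdf(lsl) + (1.0 - dist.cdf(usl))
    return fraction_out * 1_000_000


# Hypothetical process centered at 210 psi with sigma = 5 psi,
# checked against the 190 to 230 psi specification from the example.
dpm = dpm_beyond_limits(mean=210.0, sigma=5.0, lsl=190.0, usl=230.0)

# The 3.4 DPM target matches a one-sided normal tail at 4.5 standard deviations.
six_sigma_dpm = NormalDist().cdf(-4.5) * 1_000_000
```

With limits four standard deviations from the mean on each side, the hypothetical process above yields roughly 63 DPM, well short of the Six Sigma target.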
227. which measures the maximum distance between the cumulative distribution of the data and that of the fitted distribution. In this case, the best-fitting distribution is the largest extreme value distribution. You can switch to the largest extreme value distribution by accessing Analysis Options:

[Figure 15.9: Process Capability Analysis Options Dialog Box. Distribution choices include Birnbaum-Saunders, Cauchy, Exponential, Exponential (2-parameter), Exponential Power, Folded Normal, Gamma, Gamma (3-parameter), Generalized Gamma, Generalized Logistic, Half Normal (2-parameter), Inverse Gaussian, Laplace, Largest Extreme Value, Logistic, Loglogistic, Loglogistic (3-parameter), Lognormal, Lognormal (3-parameter), Maxwell (2-parameter), Normal, Pareto, Pareto (2-parameter), Rayleigh (2-parameter), Smallest Extreme Value, Weibull, and Weibull (3-parameter). The dialog box also contains Data Transformation (None, Logarithm, Power, Box-Cox optimized), Lower Threshold, and Sigma Limits settings.]

The resulting fit is shown below:

[Process Capability for Strength: LSL = 190.0, Nominal = 210.0, USL = 230.0; Largest Extreme Value fit with Mode = 200.036 and Scale = 4.80179; Pp = 1.05, Ppk = 0.96]
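The "maximum distance" statistic described above (the Kolmogorov-Smirnov distance) can be sketched in Python. This is an illustration only: the function names are hypothetical, the largest extreme value distribution is taken to be the standard Gumbel (maximum) form, and the sample is limited to the fifteen strength values legible in the tutorial table, so the result will not match the program's output for all 100 observations.

```python
import math


def largest_extreme_value_cdf(x, mode, scale):
    """CDF of the largest extreme value (Gumbel maximum) distribution."""
    return math.exp(-math.exp(-(x - mode) / scale))


def ks_statistic(data, cdf):
    """Maximum distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


strength = [213.5, 203.3, 191.3, 197.1, 205.7, 215.6, 193.7, 201.7,
            201.5, 207.1, 207.0, 200.4, 197.2, 202.4, 205.2]
distance = ks_statistic(
    strength, lambda x: largest_extreme_value_cdf(x, 200.036, 4.80179))
```

Smaller distances indicate a closer match between the data and the candidate distribution, which is how competing distributions can be ranked.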
228. wizard will ask you to specify the type of study to be created and step through a sequence of dialog boxes in which you define the study to be created.

3. You wish to perform an analysis that does not require data. In this case, the wizard will list all such analyses, ask you to select one, and then take you immediately to that analysis.

For example, suppose you want to set up a new gage study in order to estimate the repeatability and reproducibility of a measurement process. Selecting the second radio button in Figure 8.1 and pressing OK displays the options shown below:

[Figure 8.2: StatWizard Study Definition Dialog Box. "You can design various types of statistical studies. Select the type of study you want to create." Options: Determine an Adequate Sample Size to Characterize a Population; Determine an Adequate Sample Size to Compare Two or More Populations; Design an Experiment; Select a Screening Design Experiment; Set Up a Gage R&R Study; Design a Control Chart; Develop an Acceptance Sampling Plan for Variable Data; Develop an Acceptance Sampling Plan for Attribute Data.]

Select Set Up a Gage R&R Study and press OK to display a third dialog box requesting information about the study:

[Gage Study Setup dialog box, beginning with Number of Operators/Appraisers/Labs]
229. xponential, gamma, lognormal, normal, uniform, or Weibull distribution, they may be generated within a datasheet by clicking on a column header, selecting Generate Data from the Edit menu, and entering the appropriate STATGRAPHICS Centurion expression.
2. For other distributions, the random numbers must be generated from within the Probability Distributions procedure.

As an example, suppose 100 random numbers are desired from a normal distribution with a mean of 20 and a standard deviation equal to 2. Click on the header of an empty column in any datasheet to select that column. Then select Generate Data from the Edit menu and complete the dialog box as shown below:

[Figure 2.29: Generating Random Numbers from a Normal Distribution. Generate Data dialog box with Expression: RNORMAL(100,20,2)]

The syntax of the RNORMAL operator is:

RNORMAL(n,mu,sigma): generates n pseudo-random numbers from a normal distribution with mean mu and standard deviation sigma.

Press OK to generate the random numbers and place them into the selected column. The syntax of the other random number generators is contained in the PDF document titled STATGRAPHICS Centurion Operators.

2.5 DataBook Properties

This chapter has described many important aspects of data handling within STATGRAPHICS Centu
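For readers who want to reproduce this kind of column outside the program, the following Python sketch mirrors the argument order of RNORMAL(n,mu,sigma). The function `rnormal` and its `seed` parameter are hypothetical conveniences; it uses Python's own pseudo-random generator, so the numbers will differ from those STATGRAPHICS Centurion produces.

```python
import random


def rnormal(n, mu, sigma, seed=None):
    """Return n pseudo-random draws from a normal(mu, sigma) distribution."""
    rng = random.Random(seed)  # a private generator keeps results reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# 100 values with mean 20 and standard deviation 2, as in the example above.
values = rnormal(100, 20, 2, seed=1)
sample_mean = sum(values) / len(values)
```

Fixing the seed makes the column repeatable from run to run; omitting it gives a fresh set of numbers each time.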
230. yield the most information about the factors and their interactions in as few experiments as possible. This tutorial will describe how an experimental design could be constructed using the third approach and how the resulting data would be analyzed.

16.1 Selecting a Screening Experiment

The goal of a screening experiment is to find out, with the minimum number of experimental runs, which process variables have the biggest impact on the final product. In STATGRAPHICS Centurion, the first step when designing a screening experiment is to determine what type of experimental design to run and how many runs are needed. The DOE section contains a procedure that helps in this regard:

1. If using the Classic menu, select DOE - Design Creation - Screening Design Selection.
2. If using the Six Sigma menu, select Improve - Experimental Design Creation - Screening Design Selection.

The first dialog box displayed collects basic information about the experiment:

[Screening Design Selection dialog box. Number of Factors: 5; Designs to Consider: a checklist of full 2-level factorials, fractional factorials, irregular fractions, and mixed-level fractions at various resolutions]
231. you have two choices:

1. Type the data directly into the STATGRAPHICS Centurion DataBook.
2. Enter the data into another program such as Excel and then read or copy it into STATGRAPHICS Centurion.

In this section, we'll take the first approach, using the StatWizard to set up the datasheet for us. When the StatWizard dialog box shown in Figure 1.6 appears, accept the default selection, Enter New Data or Import It from an External Source, and press OK.

Note: If you exited the StatWizard, you can start it again by pressing the button with the wizard's hat on the main toolbar.

On the second dialog box, indicate that you intend to type the data using the keyboard:

[Figure 1.12: StatWizard Dialog Box for Specifying Location of Data. "The data to be analyzed can come from various sources. Where is your data?" Options: I Want to Type It In; In an Existing Data File, Database, or on the Windows Clipboard; In a Data Source Linked to an Existing StatFolio; Already Loaded in the STATGRAPHICS Datasheet.]

You will then be presented with a series of dialog boxes used to identify the information to be entered into each column of the datasheet:

[Modify Column dialog box. Name: State; Comment: state name; Type options include Numeric, Character, Integer, Date, Month, Quarter, Time (HH:MM), and Date-Time (HH:MM)]
232. ysis - One Factor. It begins by displaying a typical data input dialog box:

[Figure 3.2: Simple Regression Data Input Dialog Box]

The first two input fields are required:

Y: The dependent or response variable.
X: The independent or predictor variable.

In data entry fields, you can enter either the name of a column, such as MPG City, or a STATGRAPHICS Centurion expression, such as LOG(MPG City). If more than one datasheet contains a column with the indicated name, you must precede the name with an indication of the desired datasheet. For example, if both datasheets A and B contained a column named Weight, and you wanted to use the column in datasheet A, you would have to enter the name as A.Weight.

The Select field may be used to select a subset of the rows in the datasheet. For example, if you enter a statement such as FIRST(50) in that field, only the first 50 rows in the datasheet will be used. Typical entries in the Select field are:

Entry              Use                             Example
FIRST(k)           Selects the first k rows.       FIRST(50)
LAST(k)            Selects the last k rows.        LAST(60)
ROWS(start,end)    Selects rows between start and e
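The effect of these Select entries can be mimicked on an ordinary Python list, as sketched below. The function names are hypothetical, and treating ROWS as 1-based and inclusive of both endpoints is an assumption, since the description of that entry is cut off above.

```python
def first(rows, k):
    """Keep the first k rows, like FIRST(k)."""
    return rows[:k]


def last(rows, k):
    """Keep the last k rows, like LAST(k)."""
    return rows[-k:] if k > 0 else []


def rows_between(rows, start, end):
    """Keep rows start..end, assuming 1-based, inclusive row numbers."""
    return rows[start - 1:end]


# A hypothetical 100-row column whose values equal their row numbers.
column = list(range(1, 101))
subset = first(column, 50)
```

Each helper returns a new list and leaves the original column untouched, much as a Select entry restricts an analysis without altering the datasheet.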
233. ysis window, 20; Analyze Design, 266; AND, 61; ANOM, 196; ANOVA, 188; ANOVA table, 267; ASCII files, 34; attribute data, 217; Augment Design, 278; Autosave, 66, 141; average, 149; AVG, 42; barchart, 219, 227; blocking, 255; Boolean expressions, 61; bootstrap intervals, 165; box-and-whisker plot, 22, 151, 175, 193; Box-Cox transformation, 245; brushing a scatterplot, 93; BY variables, 133; Capability Analysis, 238; capability indices, 248; capability plot, 239, 248; centerpoints, 262; chi-squared test, 229, 234; confidence intervals (mean, 164; median, 165; standard deviation, 164); confidence level, setting default, 140; confounding, 265; contingency tables, 222, 233; contour plots, 274; correlation analysis, 198; correlation matrix, 201; COUNT, 52; Cp, 250; Cpk, 248; Crosstabulation, 222; cube plots, 274; cumulative distribution, 162; data (access, 32; combining columns, 47; copy, 37; cut, 37; datasheet, 11; delete, 37; entry, 10; files, 16; generating, 50; insert, 37; new variables, 38; paste, 37; patterned, 50; recoding, 46, 230; sorting, 44; transformations, 41); data column (comment, 14, 31; name, 14, 31; type, 14, 31); data files (polling, 55; reading, 33; read only, 55); data input dialog box, 59, 63; data sources, polling, 108; DataBook, 10, 29; DataBook Properties, 54; dates, 141; design of experiments, 253; DIFF, 42; DPM, 244, 248; evaluation mode, 3; Excel files, 34, 36; Exclude, 71; excluding effects, 271; EXP, 42; extreme Studentized devia
