Introductory Guide

1. Credits and Acknowledgments

Origin

JMP was developed by SAS Institute Inc., Cary, NC. JMP is not a part of the SAS System, though portions of JMP were adapted from routines in the SAS System, particularly for linear algebra and probability calculations. Version 1 of JMP went into production in October 1989.

Credits

JMP was conceived and started by John Sall. Design and development were done by John Sall, Chung-Wei Ng, Michael Hecht, Richard Potter, Brian Corcoran, Annie Dudley Zangi, Bradley Jones, Craige Hales, Chris Gotwalt, Paul Nelson, Xan Gregg, Jianfeng Ding, Eric Hill, John Schroedl, Laura Lancaster, Scott McQuiggan, and Peng Liu.

In the SAS Institute Technical Support division, Wendy Murphrey and Toby Trott provide technical support and conduct test site administration. Statistical technical support is provided by Craig DeVault, Duane Hayes, Elizabeth Edwards, Kathleen Kiernan, and Tonya Mauldin.

Nicole Jones, Jim Borek, Kyoko Keener, Hui Di, Joseph Morgan, Wenju
2. Contents (excerpt)

Examining a Polynomial Fit (Linear Regression)

A Factorial Analysis
  Look Before You Leap
  Open a Data Table
  What Questions Can Be Answered?
  Graphical Displays: Leverage Plots
  Quantify Results: Statistical Reports
  Summary Reports for the Whole Model
  Summary Reports for Effects

Exploring Data: Finding Exceptions
  One-Dimensional Views
  Three-Dimensional Views
  Principal Components and Biplots
  Multivariate Distance

Multiple Regression: Examining Multiple Explanations
3. - Select Analyze > Fit Model. Fill the window with Oxy as Y, and add Age, Weight, Runtime, RunPulse, and MaxPulse as model effects.
- Click Run Model to run the model.

In this case the prediction formula is

Oxy = 101.3 - 0.2123 Age - 0.0732 Weight - 2.688 Runtime - 0.3703 RunPulse + 0.3055 MaxPulse

Look at the significance of each regressor with t ratios in the Parameter Estimates table or F ratios in the Effect Tests table (see Figure 10.9). Because each effect has only one parameter, the F ratios are the squares of the t ratios and have the same significance probabilities. The Age variable seems significant, but Weight does not. The Runtime variable seems highly significant. Both RunPulse and MaxPulse also seem significant, but MaxPulse is less significant than RunPulse.

Figure 10.9 Statistical Tables for Multiple Regression
[Plot: Whole Model, Actual by Predicted Plot]

Summary of Fit
RSquare                      0.846846
RSquare Adj                  0.816217
Root Mean Square Error       2.283778
Mean of Response             47.37581
Observations (or Sum Wgts)   31

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model       5        720.99043       144.198   27.6472
Error      25        130.39112         5.216   Prob > F
C. Total   30        851.38154                 <.0001

Parameter Estimates
Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   101.34768    11.56665       8.54   <.0001
Age          -0.212322    0.094368     -2.25   0.0335
Weight       -0.073205    0.053611     -1.37   0.1843
Runtime      -2.688436    0.34207      -7.86   <.0001
RunPulse     -0.37
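The relationship between t ratios and F ratios for one-parameter effects can be checked by hand. The sketch below, in Python rather than JMP, squares the t ratio magnitudes reported for Age and Weight in the Parameter Estimates table. The function name `f_ratio_from_t` is hypothetical and used only for illustration; the results are plain arithmetic, not values copied from an Effect Tests table.

```python
def f_ratio_from_t(t_ratio: float) -> float:
    """For an effect with a single parameter, the F ratio is the squared t ratio."""
    return t_ratio ** 2


# t ratio magnitudes read from the Parameter Estimates table.
# The sign of t does not matter once the value is squared.
t_ratios = {"Age": 2.25, "Weight": 1.37}

f_ratios = {term: f_ratio_from_t(t) for term, t in t_ratios.items()}

for term, f in f_ratios.items():
    print(f"{term}: t = {t_ratios[term]:.2f}, F = {f:.4f}")
```

Because F is a monotone function of |t| here, both statistics rank the regressors the same way and lead to the same significance probabilities.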
4. [Formula editor: column selector list (sex, marital status, age, country, size, type, age group) and function selector list (Row, Numeric, Transcendental, Trigonometric, Character, Comparison, Conditional, Probability, Statistical)]

Suppose 0 represents ages greater than the median (30), and 1 represents ages less than or equal to the median. To create a formula that divides the sample into two groups, follow these steps:
- Click Conditional in the function selector list and select the If function.
- Highlight the expression term, denoted expr.
- Choose a < b from the Comparison functions.
- Double-click the right side of the comparison clause to obtain a text entry bo
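The grouping rule described above (0 for ages greater than the median of 30, 1 otherwise) can be sketched outside the formula editor as well. The following Python function is only an illustration of the logic; the names `age_group` and `MEDIAN_AGE` are hypothetical and do not come from JMP.

```python
MEDIAN_AGE = 30  # the median age stated in the text


def age_group(age: float, median: float = MEDIAN_AGE) -> int:
    """Return 0 for ages greater than the median, 1 for ages at or below it."""
    return 0 if age > median else 1


# Illustrative ages only; these are not rows from the survey table.
for age in (18, 30, 31, 45):
    print(age, age_group(age))
```

An age exactly equal to the median falls in group 1, which matches the "less than or equal to" wording of the rule.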
5. Chapter 8: A Factorial Analysis

Look Before You Leap

The popcorn yield data are the result of a designed experiment. The same amounts of different types of corn were methodically popped under different conditions. First, look at the data to review the results of the popcorn experiment.

Open a Data Table

- When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Popcorn.jmp. Open Popcorn.jmp.

The Popcorn data table displays in spreadsheet form, as shown here.
[Data table: Popcorn.jmp, with columns popcorn (plain, gourmet), oil amt (little, lots), batch (large, small), and yield]

For the experiment, the corn was popped under controlled conditions. Plain popcorn and specially treated gourmet popcorn were each popped in large or small amounts of oil
6. Display Options

To enhance the default graphical displays that show your results, JMP provides options that you can add to them. These options are found by clicking the red triangle icon beside a report name. For example, the red triangle icon next to the histogram name lists available report options (Figure 1.16). For practice, try selecting different combinations of these options and watch the effect they have on the displays and reports.
- Select a column in Select Columns.
- Select Y, Columns in Cast Selected Columns into Roles.
- Select variables for Weight, Freq, and By.
- Click OK.

Figure 1.16 Options for Nominal or Ordinal Variable in a Distribution Analysis
[Menu: Display Options; Histogram Options (Histogram, Vertical, Std Error Bars, Count Axis, Prob Axis, Density Axis, Show Percents, Show Counts); Mosaic Plot; Test Probabilities; Confidence Interval; Save]

Statistical Tables and Text

In addition to the graphs, charts, plots, and other graphical displays in a report, JMP can also include text tables in a report. The types of tables given depend on whether a variable is continuous or categorical (ordinal or nominal).

When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Typing Data.jmp. Using the Distribution command to produce a distribution report creates text tables for
7. Examining a Polynomial Fit

Now examine a polynomial fit for comparison to the linear fit. A linear regression is simply a polynomial of degree 1.
- Click the Bivariate Fit of ratio By age report window to ensure it is active.
- To remove the row exclusions and markers, select Rows > Clear Row States.
- Click the red triangle icon in the title bar and select Fit Polynomial > 2,quadratic, which allows the fit to have curvature.
- Remove the line of fit that excluded the lower age groups by clicking the red triangle icon of the second Linear Fit (the modified regression line) and selecting Remove Fit.
- Remove the Fit Mean results so that only the polynomial fit and the line fit to all the data points remain. Select Remove Fit from the Fit Mean red triangle icon.
- Click the red triangle icon in the title bar and select Fit Polynomial > 3,cubic to overlay a polynomial curve of degree 3 on the scatterplot.
- Again, select the Edit > Journal command to append these results to the existing journal.

Figure 7.3 Comparison of Linear Fit and Polynomial Fits of Degree 2 and 3
[Plot: Bivariate Fit of ratio By age, showing Linear Fit, Polynomial Fit Degree=2, and Polynomial Fit Degree=3]
Linear Fit: ratio = 0.6656231 + 0.0052759 age
Summary of Fit: RSquare 0.822535, RSquare Adj 0.6199
8. In this chapter, the difference in mean typing scores for three brands of keyboard was summarized using the Fit Y by X command in the Analyze menu. This command was also used to:
• Plot the typing scores for the three brands of keyboard.
• Overlay a means diamond on each group of typing scores to compare the means of each group.
• Overlay a quantile box plot on each group of typing scores to compare the shape of the distribution of scores in each group.
• Produce comparison circles to visualize the difference in mean typing scores.
• Compute and display a one-way analysis of variance table, which confirmed that at least one pair of means is statistically different.
• Display a table of the group means and standard errors.
• Display a table showing the multiple comparison statistical test results for group means.

Using the selection tool from the Tools menu, the graphs or tables can be copied and prepared in a report for presentation. The analysis concludes that in this typing trial, the SPEEDYTYPE keyboard produced significantly higher scores than either of the other two brands.

See the chapter "Oneway Layout" of the JMP Statistics and Graphics Guide for a complete discussion of one-way analysis of variance.

Chapter 6: Analyzing Categorical Data
Comparing Proportions

Survey data are frequently categorical data rather than measurement data. Analysis of categorical data begins by simply counting
9. • Remove the labels from METHYLACETATE and ACETONE by selecting Rows > Label/Unlabel.
- Select Analyze > Multivariate Methods > Principal Components. Add only two highly correlated variables, in this case Ether and 1-Octanol.
- Highlight Ether and 1-Octanol, select Y, Columns, and click OK.
- Click the red triangle icon in the Principal Components title bar and select Spin Principal Components.
[Menu: Eigenvectors, Scree Plot, Loading Plot, Score Plot, Save Principal Components, Spin Principal Components, Factor Rotation]

The results are shown in Figure 9.5. Note that because the data are highly correlated, the scatter in the points runs in a narrow ellipse whose principal axis is oriented in the direction marked Prin1.

Figure 9.5 Two Correlated Variables with Principal Components
[Spinning plot: axes Prin1 and Prin2]

To see the greatest variation of the data in one dimension:
- Rotate the axis so that the first principal component (Prin1) is horizontal.

Principal components capture the most variation possible in the smallest number of dimensions. Use principal components to explore all six dimensions with the following steps:
- Select Analyze > Multivariate Methods > Principal Components.
- Add all six continuous variables to the Y, Columns list.
- Click OK.
10. Release 8
Introductory Guide, Second Edition

"The real voyage of discovery consists not in seeking new landscapes, but in having new eyes." Marcel Proust

JMP, A Business Unit of SAS
SAS Campus Drive
Cary, NC 27513
8.0.2

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. JMP 8 Introductory Guide, Second Edition. Cary, NC: SAS Institute Inc.

JMP 8 Introductory Guide, Second Edition
Copyright 2009, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-60764-299-2
All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, December 2009

JMP, SAS, and all other SAS Institute Inc. product or service
11. The Fit Model Window

- Select Analyze > Fit Model.

The Fit Model command lets you specify and analyze complex models like the factorial design in this experiment. The Fit Model command displays the Fit Model window shown in Figure 8.1. This window is used to define the type of model, the model response variable, and the model effects. To specify the factorial model:
- Select yield from the Select Columns list.
- Click the Y button.
- Select popcorn, oil amt, and batch from the Select Columns list.
- Click the Macros button and select Full Factorial. This adds all main effects and interactions (crossed effects) to the Construct Model Effects list (Figure 8.1).
- Further tailor the model by adding effects or removing unwanted effects with the Add and Remove buttons. In this case, remove the three-way interaction term by highlighting popcorn*oil amt*batch and clicking Remove.

Figure 8.1 Fit Model Window
[Fit Model window: Select Columns list (popcorn, oil amt, batch, yield); Pick Role Variables (Y, Weight, Freq, By); Personality: Standard Least Squares; Construct Model Effects (Add, Cross, Nest, Macros); Macros menu (Full Factorial, Factorial to degree, Factorial sorted, Response Surface, Mixture Response Surface, Polynomial to Degree, Scheffe Cubic); Run Model]
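To make the effect list produced by a full factorial specification concrete, the sketch below enumerates every main effect and crossed effect for the three factors and then drops the three-way interaction, as the steps above describe. It is written in Python, not JSL, and the function name `full_factorial_effects` is hypothetical; it mimics the bookkeeping only and does not call JMP.

```python
from itertools import combinations


def full_factorial_effects(factors):
    """List all main effects and crossed effects, lowest order first."""
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects


factors = ["popcorn", "oil amt", "batch"]
effects = full_factorial_effects(factors)

# Remove the three-way interaction, mirroring the Remove step in the window.
model_effects = [e for e in effects if e != "popcorn*oil amt*batch"]

print(effects)
print(model_effects)
```

With three factors the full factorial has seven effects (three main effects, three two-way interactions, one three-way interaction), so six remain after the removal.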
12. for the table.

Figure 3.4 Summary Table for Type of Hot Dog

     Type      N Rows
1    Beef      20
2    Meat      17
3    Poultry   17

Creating Statistics for Groups

Next, expand the summary table with columns of statistics. Summary tables have an additional command in the columns panel called Add Statistics Column. Use this command to add statistical summary columns to the table at any time.
[Summary table: Hot Dogs By (Type), with the columns panel menu showing Add Statistics Column, New Column, Add Multiple Columns, and Go to]

To follow along with this example, do these steps:
- Click the red triangle icon and select the Add Statistics Column command from the columns panel on the left side of the screen.
- Select Calories, Sodium, and Protein/Fat in the column selector list of the window, as shown in Figure 3.5.
- Click the Statistics button and select Mean. You should now see new column names in the Statistics list.
- Click OK.

Figure 3.5 Summary Window
[Add Summary Statistics window: "Add summary statistics to current summary table"; column list (Product Name, Type, Taste, $/oz, $/lb Protein, Calories, Sodium, Protein/Fat); Statistics list (Mean(Calories), Mean(Sodium), Mean(Protein/Fat)); Group: Type; Include marginal statistics; For quantile statistics, enter value (%); statistics column name format: stat(column); Subgr
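The idea behind a summary table with an added Mean statistic can be sketched as a group-and-aggregate step. The Python example below is a rough illustration under stated assumptions: the rows are hypothetical stand-ins (not values from Hot Dogs.jmp), and `summarize_by_group` is an invented helper, not a JMP function.

```python
from collections import defaultdict


def summarize_by_group(rows, group_key, value_key):
    """Return the row count and the mean of value_key for each group."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for row in rows:
        entry = totals[row[group_key]]
        entry[0] += 1
        entry[1] += row[value_key]
    return {
        group: {"N Rows": n, f"Mean({value_key})": total / n}
        for group, (n, total) in totals.items()
    }


# Hypothetical rows for illustration only.
rows = [
    {"Type": "Beef", "Calories": 180},
    {"Type": "Beef", "Calories": 160},
    {"Type": "Meat", "Calories": 150},
    {"Type": "Poultry", "Calories": 100},
    {"Type": "Poultry", "Calories": 120},
]

summary = summarize_by_group(rows, "Type", "Calories")
for group, stats in summary.items():
    print(group, stats)
```

Each group becomes one row of the summary, with the count playing the role of N Rows and the mean playing the role of a Mean(column) statistics column.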
13. key on the formula editor keypad.
- With the empty denominator term highlighted, select height from the list of column names.
[Formula editor for the ratio column, showing the Table Columns list and the grouped Functions list]
- When the formula is complete, click Apply or OK on the formula editor, or just close its window.

The new column, called ratio, is now in the Big Class data table, as shown here. Its values are the computed weight-to-height ratio for each student.

      name      age   height   weight   ratio
 1    KATIE     12    59        95      1.61
 2    LOUISE    12    61       123      2.02
 3    JANE      12    55        74      1.35
 4    JACLYN    12    66       145      2.20
 5    LILLIE    12    52        64      1.23
 6    TIM       12    60        84      1.40
 7    JAMES     12    61       128      2.10
 8    ROBERT    12    51        79      1.55
 9    BARBARA   13    60       112      1.87
10    ALICE     13    61       107      1.75
11    SUSAN     13    56        67      1.20
12    JOHN      13    65        98      1.51
13    JOE       13    63       105      1.67

Now look at the distribution of the ratio variable.
- Select Analyze > Distribution and assign the new column ratio to the Y, Columns role.
- Click OK.
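The computed column is a simple row-wise division of weight by height. The short Python sketch below reproduces that calculation for three students; the height and weight values are as read from the Big Class listing (treat them as assumptions where the listing is hard to read), and `weight_height_ratio` is a hypothetical helper name.

```python
def weight_height_ratio(weight: float, height: float) -> float:
    """Compute the weight-to-height ratio for one row."""
    return weight / height


# (name, height, weight) as read from the Big Class listing.
students = [("KATIE", 59, 95), ("LOUISE", 61, 123), ("JANE", 55, 74)]

ratios = {
    name: round(weight_height_ratio(weight, height), 2)
    for name, height, weight in students
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Rounding to two decimals matches the display format of the ratio column in the data table.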
14. - Click the red triangle icon in the Principal Components / Factor Analysis title bar and select Eigenvectors.

The result is the Principal Components table in Figure 9.6. The cumulative percent row (Cum Percent) shows that the first three principal components account for 97.8% of the six-dimensional variation.

Figure 9.6 Principal Components Report

Principal Components: on Correlations
Number   Eigenvalue   Percent
1        4.7850       79.750
2        0.9452       15.754
3        0.1399        2.331
4        0.0611        1.018
5        0.0471        0.785
6        0.0217        0.362

[Eigenvectors table for 1-Octanol, Ether, Chloroform, Benzene, Carbon Tetrachloride, and Hexane]

Multivariate Distance

The basic concept of distance in several dimensions relates to the correlation of the variables. For example, in a Multivariate scatterplot cell for Benzene by Chloroform (Figure 9.3), HYDROQUINONE is located away from the point cloud. This compound is not particularly unusual in either the x or y direction alone, but it is a two-dimensional outlier because of its unusual distance from the strong linear relationship between the two variables. The ellipse is a 95% density contour for a bivariate n
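The Percent and cumulative percent figures follow directly from the eigenvalues: each eigenvalue is divided by their total (6 for six standardized variables). A minimal Python sketch of that arithmetic is shown below; the helper name `percent_and_cumulative` is hypothetical, and the eigenvalues are the ones listed in the Principal Components report.

```python
def percent_and_cumulative(eigenvalues):
    """Return each eigenvalue's share of the total and the running total, in percent."""
    total = sum(eigenvalues)
    percents = [100.0 * value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percents:
        running += pct
        cumulative.append(running)
    return percents, cumulative


eigenvalues = [4.7850, 0.9452, 0.1399, 0.0611, 0.0471, 0.0217]
percents, cumulative = percent_and_cumulative(eigenvalues)

for number, (pct, cum) in enumerate(zip(percents, cumulative), start=1):
    print(f"{number}: {pct:7.3f}  {cum:7.3f}")
```

The third cumulative value comes out near 97.8, which is the figure quoted for the first three principal components. Small differences in the last decimal place relative to the report are expected because the listed eigenvalues are already rounded.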
15. and in large or small batches. Two trials were done for both types of corn under all popping conditions. This experimental design is called a factorial design. The experiment has three factors, usually called main effects, which are:
• Type of popcorn (plain or gourmet)
• Amount of cooking oil (little or lots)
• Cooking batch size (large or small)

What Questions Can Be Answered?

The appropriate statistical analysis for a factorial design addresses the following questions about the main effects:
• Is there an overall difference in yield between plain and gourmet popcorn?
• Is there an overall difference in yield between cooking in lots of oil instead of a small amount of oil?
• What is the difference in yield between cooking several small batches instead of one large batch?

Analysis of a factorial experiment also provides information about the interaction between the main effects, as addressed by the following questions:
• Does the amount of cooking oil have the same effect on both types of popcorn? In other words, is there an interaction effect between popcorn type and amount of cooking oil used?
• Is there an interaction effect between batch size and type of popcorn?
• Is there an interaction effect between batch size and amount of oil used?
• Are there interaction effects among the three main effects?
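The structure of this factorial design can be sketched by enumerating every combination of factor levels and repeating each combination for the two trials. The Python example below is an illustration only: the level names follow the text, while `TRIALS`, `levels`, and `runs` are hypothetical names, and no yield values are generated.

```python
from itertools import product

# Factor levels as described in the text.
levels = {
    "popcorn": ["plain", "gourmet"],
    "oil amt": ["little", "lots"],
    "batch": ["large", "small"],
}
TRIALS = 2  # two trials under every popping condition

# One run per combination of levels per trial.
runs = [
    dict(zip(levels, combo), trial=trial)
    for combo in product(*levels.values())
    for trial in range(1, TRIALS + 1)
]

print(len(runs))
print(runs[0])
```

Three two-level factors give eight distinct conditions, and two trials per condition give sixteen runs, which is what makes it possible to estimate all main effects and interactions.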
16. 90 to make a density ellipse visible.
- Repeat to complete a similar Fit Y by X analysis with Calories as y and Sodium as x.

These commands produce the $/lb Protein by $/oz, the $/lb Protein by Protein/Fat, and the Calories by Sodium scatterplots shown in Figure 3.8 and Figure 3.9. The 0.90 ellipses in the scatterplots show the shape of the bivariate response for each type of hot dog. The special markers identify the taste and type of each point.

Figure 3.8 Scatterplots Comparing Cost, Taste, and Nutritional Factors
[Plots: Bivariate Fit of $/lb Protein By $/oz; Bivariate Fit of $/lb Protein By Protein/Fat]

To further identify and highlight points of interest:
- Select the brush tool from the Tools menu.
- Press the Alt key (Alt-Shift on Linux, Option on Macintosh) and drag the brush in the lower left quadrant of the Calories by Sodium scatterplot, as shown in Figure 3.9. These points represent brands with both low sodium and low calories. The highlighted points of these healthiest brands also highlight in the other scatterplots.

Figure 3.9 Select Low Sodium and Low Calorie Brands
[Plot: Bivariate Fit of Calories By Sodium]

What Has Been Discovered

The costs of meat and beef brands range from low
17. Andy Mauromoustakos, Al Best, Stan Young, Robert Muenchen, Lenore Herzenberg, Ramon Leon, Tom Lange, Homer Hegedus, Skip Weed, Michael Emptage, Pat Spagan, Paul Wenz, Mike Bowen, Lori Gates, Georgia Morgan, David Tanaka, Zoe Jewell, Sky Alibhai, David Coleman, Linda Blazek, Michael Friendly, Joe Hockman, Frank Shen, J.H. Goodman, David Iklé, Barry Hembree, Dan Obermiller, Jeff Sweeney, Lynn Vanatta, and Kris Ghosh.

Also, we thank Dick DeVeaux, Gray McQuarrie, Robert Stine, George Fraction, Avigdor Cahaner, José Ramirez, Gudmunder Axelsson, Al Fulmer, Cary Tuckfield, Ron Thisted, Nancy McDermott, Veronica Czitrom, Tom Johnson, Cy Wegman, Paul Dwyer, DaRon Huffaker, Kevin Norwood, Mike Thompson, Jack Reese, Francois Mainville, and John Wass.

We also thank the following individuals for expert advice in their statistical specialties: R. Hocking and P. Spector for advice on effective hypotheses; Robert Mee for screening design generators; Roselinde Kessels for advice on choice experiments; Greg Piepel, Peter Goos, J. Stuart Hunter, Dennis Lin, Doug Montgomery, and Chris Nachtsheim for advice on design of experiments; Jason Hsu for advice on multiple comparisons methods (not all of which we were able to incorporate in JMP); Ralph O'Brien for advice on homogeneity of variance tests; Ralph O'Brien and S. Paul Wright for advice on statistical power; Keith Muller for advice in multivariate methods; Harry Martz, Wayne Nelson, Ramon Leon, Dave Trindade
18. [Menu: Display Options (Box Plots, Mean Diamonds, Mean Lines, Mean CI Lines, Mean Error Bars, Grand Mean, Std Dev Lines, Connect Means, Mean of Means, X Axis proportional, Points Spread, Points Jittered, Histograms)]

Figure 5.3 Example of Means Diamonds
[Plot: Oneway Analysis of speed By brand, with groups REGAL, SPEEDYTYPE, and WORD-O-MATIC]

Figure 5.4 illustrates the means diamond.
• The means diamond has a line drawn at the mean (average) value of words per minute for each brand of keyboard.
• The upper and lower points of the means diamond span a 95% confidence interval computed from the sample values for each machine.
• The width of each diamond spans the distance on the horizontal axis proportional to the group size.
• Overlap lines within each diamond are drawn at (√2/2) × CI/2 above and below the group mean. For groups with equal sample sizes, overlap marks that appear not to overlap indicate that the two group means could be significantly different at the 95% confidence level.

Figure 5.4 Means Diamond with X-Axis Proportional Option Turned On (Left) and Off (Right)
[Diagram labels: overlap marks, 95% confidence interval, group mean, total response sample mean; groups REGAL and SPEEDYTYPE]

The mean scores of the REGAL and WORD-O-MATIC keyboards appear to be nearly the same, but note that the SPEEDYTYPE mean is much hi
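The position of the overlap marks can be sketched numerically. The Python example below reads the overlap-mark rule as "group mean plus or minus (√2/2) times half the width of the confidence interval"; that reading of the rule is an assumption, the function name `overlap_marks` is hypothetical, and the mean and interval limits are made-up numbers rather than values from the typing data.

```python
import math


def overlap_marks(mean: float, ci_lower: float, ci_upper: float):
    """Return (lower, upper) overlap marks at mean -/+ (sqrt(2)/2) * (CI width / 2)."""
    half_width = (ci_upper - ci_lower) / 2.0
    offset = (math.sqrt(2.0) / 2.0) * half_width
    return mean - offset, mean + offset


# Hypothetical group: mean of 70 with a 95% confidence interval from 66 to 74.
lower_mark, upper_mark = overlap_marks(70.0, 66.0, 74.0)
print(f"{lower_mark:.3f} {upper_mark:.3f}")
```

Because √2/2 is about 0.707, the overlap marks always sit inside the diamond, roughly 71% of the way from the mean to each tip.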
19. Figure 10.5 Observed Points using Age, Oxy, and Runtime with the Predicted Plane of Fit
[3-D plot: observed Oxy values and the plane of predicted Oxy values]

Fit Planes to Test Effects

The example in the previous section showed a plane fit to the whole model. You can also look at hypothesis tests for each regressor to test whether the regressor's parameter is significantly different from zero. One way to view this test is to evaluate the difference between the current fit and the fit that occurs if the regressor variable is removed from the model. For example, remove the Runtime variable from the model by following these steps:
- First, make sure Fitness.jmp is the active data table.
- Select Analyze
20. Chapter 3: Summarizing Data

Look Before You Leap

When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Hot Dogs.jmp. Open the Hot Dogs.jmp file to see the data shown in Figure 3.1.

Figure 3.1 Hot Dogs Data Table (first 10 rows; the Calories, Sodium, and Protein/Fat columns are not reproduced here)

     Product Name                    Type   Taste    $/oz   $/lb Protein
 1   Happy Hill Supers               Beef   Bland    0.11   14.23
 2   Georgies Skinless Beef          Beef   Bland    0.17   21.70
 3   Special Market's Premium Beef   Beef   Bland    0.11   14.49
 4   Spike's Beef                    Beef   Medium   0.15   20.49
 5   Hungry Hugh's Jumbo Beef        Beef   Medium   0.10   14.47
 6   Great Dinner Beef               Beef   Medium   0.11   15.45
 7   RJB Kosher Beef                 Beef   Medium   0.21   25.25
 8   Wonder Kosher Skinless Beef     Beef   Medium   0.20   24.02
 9   Happy Fats Jumbo Beef           Beef   Medium   0.14   18.85
10   Midwest Beef                    Beef   Medium   0.14   18.85

The Hot Dogs.jmp table has the following information:
• The columns called Type, Calories, Sodium, and Protein/Fat (an index ratio of protein to fat) give information about nutrition. The Type column has values Meat, Poultry, and Beef.
• Cost information is in columns $/oz (dollars per ounce of hot dog) and $/lb Protein (dollars per pound of hot dog protein).
• Three categories of taste are coded Bland, Medium, and Scrumptious in the Taste column. T
21. JMP sample data files have the same name but show without an extension.
• References to names of JMP files, data tables, variable names, and items in reports appear in Helvetica to help distinguish them from surrounding text.
• Special information, warnings, and limitations are noted in sentences beginning with the bold word Note.
• References to menu names (File menu) or menu items (Save command) appear in Helvetica bold font. The notation to select a command from a menu is sometimes written as File > New, meaning select the New command from the File menu.
• Words or phrases that are important or have definitions specific to JMP are in italics the first time they appear.

Step 1: Start JMP

Start a JMP session by double-clicking the JMP application icon. Your initial view of JMP is a menu bar, a toolbar, the Tip of the Day window, and the JMP Starter window (Figure 1.6).

Figure 1.6 First View of JMP
[Screenshot: JMP menu bar (File, Edit, Tables, DOE, Analyze, Graph, Tools, View, Window, Help), the Tip of the Day window, and the JMP Starter window]

JMP Triangles

You will encounter two types of triangles in JMP's interface. Both are clickable.
• Red triangles are hot spots that you click to reveal a menu. The menu is contextual, so it shows only relevant commands.
[Screenshot: Big Class, Fit Y by X of weight by height, Bivariate Fit of weight By height report]
22. Index (excerpt)

JSL functions 4
JSL Functions menu item 4

L
Label/Unlabel 50, 111
lambda 91
large cross cursor 10
Launch button 4
Least Squares Means table 105, 106
Level 52, 66
leverage plot 102, 126
Likelihood Ratio 76
linked table 35
local error 92
logistic regression, see Fit Y by X, Fit Model
Longley.jmp 129

M
magnifier tool 29
Mahalanobis distance 116
main effect 99, 100
Markers 88, 112
Mean 66
Mean of Response 65, 86, 105
Mean Square 66, 87, 104
Means Diamonds 50, 61, 62
Means for Oneway Anova table 66
Means/Anova/t Test 61
median 50
menus tips 5
modeling type 9, 60
modify data table 71, 74
moments, see Distribution
mosaic plot, see Distribution
multiple comparison, see Compare Means
multiple regression 119, 121, 130
Multivariate 112

N
New Column 52
  computing values 53
New Data Table 7
nominal 9, 14, 60, 75, 83
normal distribution 49, 51
notation used in manuals 6
Number 66
numeric column 9

O
object scripting index 4
Observations 65, 86, 105
Open Data Table 7
ordinal 9, 14, 60, 83
outlier 50, 88, 109, 116
outlier box plot, see Distribution

P
Parameter Estimates table 87
partitioning 103
pattern in data 109
Pearson Chi Square 76
percentile, see quantile
plane fit 124
platforms 12
pointer cursor 11
Polynomial fit 90
post hoc, see Compare Means
Prediction Formula 106
Prob 52
Prob>|t| 87
Prob > F 66, 87, 104

Q
Quantile Box Plot 50, 63
Quantile Box Plot see Distri
23. JMP Quick Reference Guide to learn more advanced commands in JMP. View this document by selecting Help > Books > JMP Quick Reference Card.

Using This Book in Combination with Other Included Books

The book you are reading now is the JMP Introductory Guide. See the following manuals for further documentation of JMP:
• The JMP User Guide has complete documentation of all JMP menus, an explanation of data table manipulation, and a description of the formula editor. There are chapters that show how to do common tasks such as manipulating files, transforming data table columns, and cutting and pasting JMP data, statistical text reports, and graphical displays.
• The JMP Statistics and Graphics Guide gives documentation of the Analyze and Graph menus. It documents analyses, discusses statistical methods, and describes all report windows and options.
• The JMP Design of Experiments covers the DOE menu, the experimental design facility in JMP.
• The JMP Scripting Guide is a reference guide to the JMP scripting language (JSL) that lets you automate action sequences.

If you did not receive printed copies of these books, view the PDF files by selecting Help > Books.

Conventions Used in this Book

Conventions used in this manual were devised to help relate written material to information that appears on screen.
• The .jmp extension follows filenames on the PC. (When you installed JMP, a folder named Sample Data was also installed.) On the Macintosh
24. - Change Month to a character variable, as shown in Figure 2.3, by clicking the box beside Data Type.

The Column Info window is also used to change other column characteristics and to access the JMP formula editor for computing column values.

Figure 2.3 Change Data Type
[Column Info window for Month in table Untitled: Data Type menu, Initial Data Values, N Rows, Column Properties]

Add Rows

Now add new rows to the table.
- Choose Rows > Add Rows.
- Specify six new rows. Alternatively, double-click anywhere in the body of the table to automatically fill it with new rows up through the position of the cursor.
- Select File > Save to name the table BP Study.jmp and save it.

The data table is now ready to hold data values. To summarize the table evolution so far, you:
• Began with a new untitled table
• Added enough rows and columns to accommodate the raw data
• Tailored the characteristics of the table by giving the table and columns descriptive names
• Changed the data type of the Month column to accept character values

Entering Data

To enter data into the data table, type values into their appropriate table cells.
- Type the values from the study journal (Figure 2.2) into the BP Study.jmp table as shown here.

When entering data into the data table:
• Edit the cell value by moving the cursor into a data cel
25. 5 Quantile Box Plot and Quantiles Table
[Quantile box plot of height with the means diamond marked, shown beside this table]

Quantiles
100.0%  maximum   70.000
 99.5%            70.000
 97.5%            69.975
 90.0%            66.000
 75.0%  quartile  65.000
 50.0%  median    63.000
 25.0%  quartile  60.250
 10.0%            56.200
  2.5%            51.025
  0.5%            51.000
  0.0%  minimum   51.000

Learning About Report Tables

JMP produces tables of statistical summaries along with graphs. The tables that JMP produces depend on whether a variable is continuous, ordinal, or nominal. Click the red triangle icon and select Display Options to reveal tables that are not shown by default. The diamond-shaped disclosure button at the top left of each report (its appearance differs between Windows and Linux and the Macintosh) opens and closes it.

Reports for Continuous Variables

JMP gives the following tables along with histograms for the continuous variable height:
• The Quantiles table displays the maximum value, minimum value, and other values for selected quantiles.
• The Moments table displays the mean, standard deviation, and other summary statistics.

Note: You can also obtain more moments by clicking the red triangle icon on the variable's title bar and choosing Display Options > More Moments.

[Figure: Quantiles and Moments tables for height. The Moments table lists Mean, Std Dev, Std Err Mean, upper 95% Mean, lower 95% Mean, and N.]
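The first rows of a Moments table can be reproduced by hand. The following is a minimal sketch in Python, not JMP's own computation: the function name `moments` and the list of heights are hypothetical, and it assumes the usual sample standard deviation (n - 1 denominator) with the standard error taken as the standard deviation divided by the square root of N.

```python
import math
import statistics


def moments(values):
    """Return Mean, Std Dev, Std Err Mean, and N for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    std_dev = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    std_err = std_dev / math.sqrt(n)     # standard error of the mean
    return {"Mean": mean, "Std Dev": std_dev, "Std Err Mean": std_err, "N": n}


# Hypothetical heights for illustration only; these are not the values in the book's table.
heights = [59, 61, 55, 66, 52, 60, 61, 51, 60, 61]
summary = moments(heights)

for label, value in summary.items():
    print(f"{label:<14}{value}")
```

The upper and lower 95% limits on the mean are left out because they need a t quantile, which the Python standard library does not provide.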
26. In that folder is a file named Fitness.jmp.
• Open Fitness.jmp.

The data are shown in Figure 10.1. For purposes of illustration, certain values of MaxPulse and RunPulse have been changed from data reported by Rawlings (1988, p. 124).

Figure 10.1 Partial Listing of the Fitness.jmp Data File
[Data table Fitness with columns Age, Weight, Oxy, Runtime, RunPulse, RstPulse, and MaxPulse]

Investigate Age and Runtime as predictors of oxygen uptake using the Fit Model platform. To examine a multiple regression model with two effects:
• Highlight Oxy in the Select Columns list and click Y.
• Highlight both Age and Runtime.
• Click Add to specify them as model effects.

You should now see the completed window shown in Figure 10.2.
27. Mean on Windows and Linux and > on the Macintosh. This displays a report that shows:
• The sample mean (arithmetic average) of the response variable, ratio
• The standard deviation of the response variable
• The standard error of the response mean
• The error sum of squares for the simple mean model

Fit Mean
Mean               0.855556
Std Dev [RMSE]     0.121747
Std Error          0.014346
SSE                1.052378

Fitting a Line

To fit a simple regression line through the data points:
• Click the red triangle icon in the title bar and select Fit Line.

The regression line minimizes the sum of squared vertical distances from each point to the line of fit. Because of this property, it is sometimes referred to as the line of best fit.

[Figure: Bivariate Fit of ratio By age, showing the Fit Mean and Linear Fit lines and the red triangle icon menu for a fit: Line of Fit, Confid Curves Fit, Confid Curves Indiv, Line Color, Line Style, Line Width, Report, Save Predicteds, Save Residuals, Plot Residuals, Set α Level, Confid Shaded Fit, Confid Shaded Indiv, Remove Fit]

Each time a fit is selected from the red triangle icon, the regression equation and another red triangle icon for that fit appear beneath the scatterplot, as shown here. Click the red triangle icon to reveal
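The least squares property described under Fitting a Line can be checked numerically. The sketch below uses plain Python and ordinary least squares; the function names `fit_line` and `sse` and the five data points are hypothetical illustrations, not the growth data used in the chapter.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical ages and ratios for illustration only.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
ratios = [0.72, 0.77, 0.83, 0.87, 0.93]

intercept, slope = fit_line(ages, ratios)
best = sse(ages, ratios, intercept, slope)

# Any other line, such as one with a shifted intercept or the flat mean line, has a larger SSE.
nudged = sse(ages, ratios, intercept + 0.01, slope)
mean_only = sse(ages, ratios, sum(ratios) / len(ratios), 0.0)
```

Comparing `best` with `mean_only` mirrors the comparison between the Linear Fit and the Fit Mean reports: the fitted line always has an error sum of squares no larger than that of the simple mean model.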
28. Mean Estimates and Statistical Comparisons

Analyzing Categorical Data: Comparing Proportions
Look Before You Leap
Contingency Table Reports
Contingency Table Mosaic Plots

Regression and Curve Fitting: Visualizing Relationships
Choose Variable Roles
Fitting Models to Continuous Data
Understanding the Summary of Fit Table
Understanding the Analysis of Variance Table
Understanding the Parameter Estimates Table
29. Select Effect Leverage from the box beside Emphasis at the top right of the Fit Model window.
• Click Run Model to estimate the model parameters and view the results.

Graphical Display: Leverage Plots

The Fit Model command graphically displays the whole model and each model effect as the leverage plots shown in Figure 8.2 through Figure 8.5. It is possible to tell at a glance whether the factorial model explains the popcorn data and which factors are most influential.

The Whole Model plot, to the left in Figure 8.2, shows actual yield by predicted yield values with a regression line and 95% confidence curves. The regression line and the 95% confidence curves cross the sample mean (the horizontal line), which shows that the whole factorial model (all effects together) explains a significant proportion of the variation in popcorn yield. There is also a significant difference in yield between the two types of popcorn, as shown in the right-hand leverage plot for the popcorn main effect. The small p-values beneath the plots quantify the significant model fit and popcorn effect.

Figure 8.2 Leverage Plots of Actual by Predicted and of Popcorn Effect
[Response yield: Whole Model Actual by Predicted Plot (left) and popcorn Leverage Plot (right)]
30. The Whole Model

Other tables in the Fit Model report provide statistical summaries. The Summary of Fit table shows the numeric summaries of the response for the factorial model:
• Rsquare (R²) of 0.809 tells the scientist that the two-factor model explains nearly 81% of the variation in the data.
• Rsquare Adj adjusts R² to make it more comparable over models with different numbers of parameters.
• Root Mean Square Error (sometimes called the RMSE) is a measure of the variation in the yield scores that can be attributed to random error rather than differences in the model's factors.
• Mean of Response is the mean (average) of the yield scores.
• Observations is the total number of recorded scores.

[Figure: Whole Model Actual by Predicted Plot, annotated RSq=0.81, RMSE=1.5411]

Summary of Fit
RSquare Adj                   0.761738
Root Mean Square Error        1.541104
Mean of Response              10.75
Observations (or Sum Wgts)    16

The F-test probabilities in the Effect Tests table tell the scientist that all model effects explain a significant proportion of the total variation. JMP indicates a significant F value by placing an asterisk beside it. There is also a table that gives the parameter estimates for the model.

Effect Tests
Source          Nparm   DF   Sum of Squares   F Ratio   Prob > F
popcorn             1    1        34.810000   14.6568     0.0024
batch               1    1        49.000000   20.6316     0.0007
popcorn*batch       1    1        37.210000   15.6
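The F Ratio column of the Effect Tests table can be related to the other numbers in the report. The sketch below assumes the standard definition, an effect's mean square divided by the error mean square, with the error mean square taken as the square of the RMSE shown on the whole-model plot (1.5411). The function name `f_ratio` is a hypothetical helper, not a JMP function.

```python
def f_ratio(sum_of_squares, df, rmse):
    """F Ratio for an effect: its mean square divided by the error mean square."""
    mean_square_effect = sum_of_squares / df
    mean_square_error = rmse ** 2        # RMSE is the square root of the error mean square
    return mean_square_effect / mean_square_error


RMSE = 1.5411   # from the whole-model plot annotation

f_batch = f_ratio(49.0, 1, RMSE)          # batch row: Sum of Squares 49.0, DF 1
f_interaction = f_ratio(37.21, 1, RMSE)   # popcorn*batch row: Sum of Squares 37.21, DF 1

print(round(f_batch, 2), round(f_interaction, 2))
```

Under that assumption the batch value agrees with the tabled 20.6316 to about three decimal places; the small difference comes from using the rounded RMSE.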
31. add them, and which type of columns to add.

[Add Multiple Columns window for data table Untitled: Column prefix; How many columns to add; Add as a group of columns; data type (Numeric); Add Where (Before first Column, After last Column, After Selected Column); Initial Data Values (Missing/Empty); Number of rows]

• Specify five new columns.

The default column names are Column 1, Column 2, and so on. These can be changed at any time in the columns panel or at the top of the column in the data table. So the next step is to type in meaningful names. Use the names from the data journal: Month, Control, Placebo, 300 mg, and 450 mg.
• Edit a column name by clicking it in the data grid or in the columns panel.
• Begin typing once the name is highlighted.

[Figure: Untitled data table with default columns Column 1 through Column 5. Click once to select, then begin typing.]

Set Column Characteristics

Columns can have different characteristics. By default, they contain numeric data. However, the values for month in this example are character values. To change the Month column from numeric to character:
• Highlight the column by clicking either in the area at the top of the column or the area beside its name in the columns panel.
• Select Cols > Column Info to display the window in Figure 2
32. any unusual subjects, such as students who have extreme height or weight values. A good indicator of extreme values is the ratio of weight to height. To examine the ratio of weight to height, create a new column called ratio, computed as weight divided by height. To do this:
• Click the Big Class.jmp data table to make it the active window.
• Select Cols > New Column.

To create the new column of weight-to-height ratios, complete the New Column window as in Figure 4.6:
• Type the new name, ratio, in the Column Name area. The default data type is Numeric and is correct as is. The modeling type is Continuous and is correct as is.
• Click the drop-down menu beside Format and set the format for ratio in the data grid to Fixed Dec with two decimal places.
• Click the Column Properties button and select Formula, as shown in Figure 4.6, to compute values for the new column.

Figure 4.6 New Column Window
[New Column window showing the Data Type, Modeling Type, Initial Data Values, and Column Properties controls]

Construct the formula that calculates values for the ratio column as follows:
• Highlight the empty term in the formula and select weight from the list of column names in the upper left corner of the formula editor.
• Press the divide
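The computed column amounts to one division per row. The following Python sketch is only an analogy for what the formula editor produces: the three student rows are hypothetical, not rows from Big Class.jmp, and the two-decimal formatting stands in for the Fixed Dec display format.

```python
# Hypothetical rows for illustration; not the Big Class.jmp data.
students = [
    {"name": "A", "height": 59, "weight": 95},
    {"name": "B", "height": 61, "weight": 123},
    {"name": "C", "height": 66, "weight": 145},
]

for row in students:
    # The formula for the new column: weight divided by height.
    row["ratio"] = row["weight"] / row["height"]

# A fixed format with two decimal places changes only how values are displayed;
# the stored ratio keeps its full precision.
display = [f"{row['ratio']:.2f}" for row in students]
print(display)
```

An unusually large or small ratio flags a student whose weight is out of line with height, which is the purpose of the column in this lesson.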
33. below, see the JMP User Guide.

Figure 1.18 A Journal
[A: the report window, Bivariate Fit of weight By height for Big Class. B: the journal window, Journal: Untitled, containing a copy of the same report: the Linear Fit equation and the Summary of Fit, Lack Of Fit, Analysis of Variance, and Parameter Estimates tables.]
34. commands that show confidence curves and give the ability to save predicted and residual values as new data table columns. The Save Predicteds command saves the prediction equation for the fit with the new column of predicted values. The fit can be removed from the scatterplot at any time with the Remove Fit command.

Understanding the Summary of Fit Table

Clicking the red triangle icon in the title bar and selecting Fit Line produces a Summary of Fit table, which summarizes the linear fit.

Linear Fit
ratio = 0.6656231 + 0.052758 age

Summary of Fit
RSquare                       0.822535
RSquare Adj                   0.819999
Root Mean Square Error        0.051653
Mean of Response              0.855556
Observations (or Sum Wgts)    72

• Rsquare (R²) quantifies the proportion of total variation in the growth ratios accounted for by fitting the regression line.
• Rsquare Adj adjusts R² to make it more comparable over models with different numbers of parameters.
• Root Mean Square Error (RMSE) is a measure of the variation in the ratio values that is attributable to different people rather than to different ages.
• Mean of Response is the arithmetic mean (average) of the ratio values.
• Observations is the total number of nonmissing values.

Note: The first line of the report is the regression equation, which is editable.

Understanding the Analysis of Variance Table

In addition to producing a Summary of Fit t
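The adjustment behind Rsquare Adj can be written out. The sketch below assumes the usual formula, which rescales the unexplained proportion by (n - 1) / (n - p), where p counts every estimated parameter including the intercept. The function name `rsquare_adj` is hypothetical, and the inputs are the RSquare value and the 72 observations reported for the linear fit of ratio by age.

```python
def rsquare_adj(rsquare, n_obs, n_params):
    """Adjusted R-square; n_params counts the intercept as well as the slopes."""
    return 1 - (1 - rsquare) * (n_obs - 1) / (n_obs - n_params)


# RSquare 0.822535 with 72 observations; a simple line has 2 parameters
# (intercept and slope).
adj = rsquare_adj(0.822535, 72, 2)
print(round(adj, 4))
```

With one regressor and 72 rows the penalty is small, so the adjusted value comes out at about 0.82, only slightly below RSquare. The gap widens as more parameters are added to a model fit to the same number of rows.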
35. compute values as needed
• a facility for grouping data and computing summary statistics
• special plots, charts, and communication capability for quality improvement techniques
• tools for printing and for moving analysis results between applications
• a scripting language for saving and creating frequently used routines

This introductory chapter gives basic information about using JMP.

Contents
Learning About Statistical and JSL Terms
Using the Context Sensitive Help
Using This Book in Combination with Other Included Books
Conventions Used in this Book
Step 2: Open a JMP Data Table
Step 3: Learn About the Data
Data Table Cursor Forms
Casting Columns Into Roles
Step 5: View the Output Report
Step 6: Save the JMP Output Report
36. country By marital status

Tests
N      DF    -LogLike     RSquare (U)
303    2     2.5700381    0.0086

Test                ChiSquare    Prob>ChiSq
Likelihood Ratio        5.140        0.0765
Pearson                 5.081        0.0788

These statistical results reveal that American auto manufacturers might want to direct advertising plans toward married couples.
• Scroll the report to see the relationship between size of car and each x variable: sex, marital status, and age.

The three mosaic plots indicate no relationship between car size and gender, marital status, or age group. This is seen numerically by looking at the Contingency Tables and the Tests tables beneath each of the mosaic plots. See Figure 6.6. Note that by default Col % and Row % also appear in the Contingency Tables. Right-click (hold the CONTROL key and click on the Macintosh) the table to access the Columns menu to turn columns on and off.

The Chi-squared values support the hypothesis that the purchase of large, medium, and small cars is not significantly different across the sex, marital status, and age group factor levels. The Chi-squared probabilities range from 0.06 to 0.30, so if there were no relationship you would expect Chi-squared values at least this large to occur 6 to 30 times in 100 similar surveys. It probably makes no difference what size cars appear in advertisements.

Figure 6.6 Tables for Relationships with Size of Car
[Contingency Table for size]
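The probabilities in the Tests table for country by marital status can be checked by hand because that test has 2 degrees of freedom. For exactly 2 degrees of freedom, the upper-tail probability of a Chi-squared statistic x has the closed form exp(-x / 2). The sketch below relies on that special case only; the function name `chisq_pvalue_2df` is hypothetical and it is not valid for other degrees of freedom.

```python
import math


def chisq_pvalue_2df(chi_square):
    """Upper-tail probability of a Chi-squared statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi_square / 2)


p_likelihood_ratio = chisq_pvalue_2df(5.140)   # Likelihood Ratio ChiSquare from the Tests table
p_pearson = chisq_pvalue_2df(5.081)            # Pearson ChiSquare from the Tests table

print(round(p_likelihood_ratio, 4), round(p_pearson, 4))
```

Both results agree with the tabled probabilities to the precision shown. The Likelihood Ratio ChiSquare itself is twice the -LogLike value: 2 × 2.5700381 is about 5.140.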
37. from the data table that scrumptious-tasting hot dogs have the lowest average calories and sodium content. However, medium-tasting hot dogs have the highest protein-to-fat ratio, and they compare well with respect to the other nutritional factors.

Charting Statistics for Two Groups

Next, it is useful to know the frequency of the three taste responses for each type of hot dog.
• Make sure the Hot Dogs.jmp table is active, use Tables > Summary again, and select both Type and Taste as grouping variables.
• Click OK.

This produces the table shown here. There is one row for each taste response within each type of hot dog. The N Rows column lists the frequency in the source table of each type-taste combination.

     Type      Taste          N Rows
1    Beef      Bland               3
2    Beef      Medium             16
3    Beef      Scrumptious         1
4    Meat      Bland               6
5    Meat      Medium              5
6    Meat      Scrumptious         3
7    Poultry   Bland               1
8    Poultry   Medium             15
9    Poultry   Scrumptious         1

• Select the Graph > Chart command with both grouping variables as Categories, X and the N Rows column with the Y role.
• Select Statistics > Data.
• Click OK.

This produces the chart shown here. In this example, there are side-by-side charts that show the frequency for each taste within each type of hot dog.

Note: Graph > Chart can also be used to directly
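Grouping by two variables and counting rows, as the summary table with its N Rows column does, can be sketched in a few lines of Python. This is an analogy rather than a description of how JMP builds the table, and the six (Type, Taste) rows are hypothetical stand-ins, not the Hot Dogs.jmp data.

```python
from collections import Counter

# Hypothetical (Type, Taste) pairs for illustration only.
rows = [
    ("Beef", "Bland"), ("Beef", "Medium"), ("Beef", "Medium"),
    ("Meat", "Medium"), ("Poultry", "Medium"), ("Poultry", "Scrumptious"),
]

# One entry per Type-Taste combination, valued by how often it occurs in the source rows.
n_rows = Counter(rows)

for (hot_dog_type, taste), count in sorted(n_rows.items()):
    print(f"{hot_dog_type:<8} {taste:<12} {count}")
```

Only combinations that occur in the data get an entry, and the counts always add up to the number of source rows.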
38. > Fit Model.
• In the Fit Model window, select Oxy in the list of variables and click Y. Select Age in the list of variables and click Add to add Age as a model effect. Then click Run Model.
• Click the red triangle icon in the Response Oxy report and select Save Columns > Prediction Formula.

The new predicted column, labeled Pred Formula Oxy 2, is calculated using the formula

62.4229492 - 0.3156031 * Age

To compare this fitted line with the plane in the previous example:
• Select Graph > Surface Plot. Add Oxy, Pred Formula Oxy, and Pred Formula Oxy 2 as Columns and click OK.
• In the Point Response column drop-down menu, select Oxy for both Pred Formula Oxy and Pred Formula Oxy 2.
• In the Style drop-down menu, select Needles for both Pred Formula Oxy and Pred Formula Oxy 2.
• In the Surface drop-down menu, select Both Sides for Pred Formula Oxy and Pred Formula Oxy 2.

Both this grid and the one in Figure 10.5 represent least squares regression planes, but this plane has a slope of zero in the orientation of the Runtime axis. Figure 10.6 shows the plot from an angle.

Figure 10.6 Three Dimensional Plot with Regression Planes
[Surface plot showing the Oxy by Age and Runtime regression plane, the Oxy by Age regression plane, and the observed values]

• Click and drag to rotate the plot. A rotated view is s
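The saved prediction formula for Pred Formula Oxy 2 can be evaluated directly. In the sketch below the function name and the sample ages are hypothetical. The minus sign on the Age coefficient is an assumption, chosen because predictions then fall in the range of the observed Oxy values (roughly 45 to 60), whereas a plus sign would put them well above it.

```python
def pred_formula_oxy_2(age, runtime=None):
    """Predicted Oxy from Age alone.

    The minus sign is an assumption (see the note above). The runtime argument
    is accepted but ignored, because this fit has no Runtime effect.
    """
    return 62.4229492 - 0.3156031 * age


# Hypothetical ages chosen to span the range shown in the Fitness listing.
predictions = {age: round(pred_formula_oxy_2(age), 2) for age in (38, 47, 57)}

# The prediction does not change with Runtime, so viewed as a surface over
# Age and Runtime it is a plane with zero slope along the Runtime axis.
flat_along_runtime = pred_formula_oxy_2(47, runtime=8.0) == pred_formula_oxy_2(47, runtime=12.0)
```

That flatness along Runtime is what distinguishes this plane from the two-effect plane when both are drawn in the same surface plot.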
39. in yield for the popcorn trials. The interaction between the two main effects was significant. The Least Squares Means table for the interaction showed how the two types of popcorn behaved under different popping conditions. The new, more expensive gourmet popcorn had better yield than the plain everyday type only if popped in small batches.

Chapter 9
Exploring Data
Finding Exceptions

Exploration is the search to find something new, the endeavor to make some discovery. For data analysis, exploratory study is often the most fruitful part of the analytical process because it is the most open to serendipity. Something noticed in a data set can be the seed of an important advance.

There are two important aspects of exploration:
• What is the pattern or shape of the data?
• Are there points unusually far away from the bulk of the data (outliers)?

When exploring data composed of many variables, the great challenge is dealing with this high dimensionality. There can be many variables that have interesting relationships, but it's hard to visualize the relationship of more than a few variables at a time.

Objectives
• Use graphical techniques to search for outliers in one, two, three, and higher dimensions.
• Perform a principal components analysis and examine it graphically.
• Examine outliers graphically using Mahalanobis distance.

Contents
One Dimensio
40. list on the left side of the window and click Y, Columns, as shown in Figure 1.20.

Figure 1.20 Scatterplot 3D Column Selection Window
[Scatterplot 3D launch window: Select Columns; Cast Selected Columns into Roles (Y, Columns; Weight; Freq); Action (OK, Cancel, Remove, Recall, Help)]

These column names now appear in the list on the right side of the window.
• Click OK.

The Scatterplot 3D appears. See Figure 1.21.

Figure 1.21 The Cowboy Hat

Spin the Cowboy Hat
• Right-click on the plot and select Settings.

To get a better view of the cowboy hat:
• Deselect Walls, Grids, Axes, and Box.
• Move the mouse over the plot, press the mouse button, and move the cursor about. The cowboy hat moves in three dimensions.

Figure 1.22 Moving the cowboy hat

To have the plot spin itself:
• Press the Shift key and give the plot a push with the cursor.
• To stop the spinning, click again in the Scatterplot 3D frame.

Chapter 2
Creating a JMP Data Table
Entering and Plotting Data

This lesson evaluates a new d
41. moments, quantiles, and proportions

The students in a local school are participating in a health study. This lesson summarizes basic information about the students for the school system's health care specialists. The data collected include age, sex, weight, and height. To document the sample of participating students and identify any students with unusual characteristics who might need special attention, we need to view summaries of the data. This lesson produces reports with graphs and short, straightforward explanations.

Objectives
• Use the distribution analysis to explore several variables at once.
• Produce reports of moments, quantiles, frequencies, and proportions.
• Use the formula editor to compute a column's values.
• Create a subset of a data table.

Contents
Displaying Distributions
Understanding Histograms of Nominal and Ordinal Variables
Understanding Histograms of Continuous Values
Learning About Report Tables
Reports for Continuous Variables
Frequency Table for Ordinal or Nominal Variables
Adding a Computed Column
42. open a new Fit Model window.

Use the following method to specify the two-factor model:
• From the full factorial model, select unwanted effects listed in the Construct Model Effects box and click Remove.
• Click Run Model.

[Figure: Construct Model Effects box listing popcorn, batch, and popcorn*batch]

Analysis of Variance

The whole-model leverage plot in Figure 8.6 shows that the two-factor model describes the popcorn experiment well. Examine the tables that accompany the whole-model leverage plot.

The Analysis of Variance table in Figure 8.6 quantifies the analysis results. It lists the partitioning of the total variation of the sample into components. The ratio of the Mean Square components forms an F statistic that evaluates the effectiveness of the model fit. If the probability associated with the F ratio is small, then the analysis of variance model fits better statistically than the simple model that contains only the overall response mean.

Figure 8.6 Analysis of Variance for the Two Factor Whole Model
[Response yield, Whole Model: Actual by Predicted Plot, Summary of Fit, and Analysis of Variance]
43. professor, Korea Advanced Institute of Science and Technology, and Duk Hyun Ko, SAS Korea, for reviewing the Korean translation; Bertram Schäfer and David Meintrup, consultants, StatCon, for reviewing the German translation; Patrizia Omodei, Maria Scaccabarozzi, and Letizia Bazzani, SAS Italy, for reviewing the Italian translation. Finally, thanks to all the members of our outstanding translation teams.

Past Support

Many people were important in the evolution of JMP. Special thanks to David DeLong, Mary Cole, Kristin Nauta, Aaron Walker, Ike Walker, Eric Gjertsen, Dave Tilley, Ruth Lee, Annette Sanders, Tim Christensen, Jeff Polzin, Eric Wasserman, Charles Soper, Wenjie Bao, and Junji Kishimoto. Thanks to SAS Institute quality assurance by Jeanne Martin, Fouad Younan, and Frank Lassiter. Additional testing for Versions 3 and 4 was done by Li Yang, Brenda Sun, Katrina Hauser, and Andrea Ritter. Also thanks to Jenny Kendall, John Hansen, Eddie Routten, David Schlotzhauer, and James Mulherin. Thanks to Steve Shack, Greg Weier, and Maura Stokes for testing JMP Version 1.

Thanks for support from Charles Shipp, Harold Gugel (d), Jim Winters, Matthew Lay, Tim Rey, Rubin Gabriel, Brian Ruff, William Lisowski, David Morganstein, Tom Esposito, Susan West, Chris Fehily, Dan Chilko, Jim Shook, Ken Bodner, Rick Blahunka, Dana C. Aultman, and William Fehlner.

Technology License Notices

The ImageMan DLL is used with permission of Data Techniques Inc
44. the speed and brand variables (Figure 1.17). In this example:
• For a continuous variable, JMP displays a Quantiles table and a Moments table.
• For nominal and ordinal variables, JMP displays a Frequency table showing the total sample frequency, category frequencies, and associated probabilities.

JMP also gives you the ability to change the appearance of these tables. For details, see the JMP User Guide.

Figure 1.17 Statistical Reports

Distributions: brand
Frequencies
Level           Count      Prob
REGAL               8   0.47059
SPEEDYTYPE          5   0.29412
WORD-O-MATIC        4   0.23529
Total              17   1.00000
3 Levels

Distributions: speed
[Quantiles table (100.0% maximum through 0.0% minimum) and Moments table (Mean, Std Dev, Std Err Mean, upper 95% Mean, lower 95% Mean, N)]

Step 6: Save the JMP Output Report

To save a report just as it appears in the report window, select File > Save As.

Another way to save is to duplicate the report in a separate window titled Journal. Then you can append other reports to it or manipulate it in the journal window. To do this, select Edit > Journal. Then select File > Save As. For more detail than is presented
45. to high. However, it is not surprising to see the tight, low-cost cluster of poultry brands (X-marked) at the lower left of the $/lb Protein by $/oz scatterplot. The highlighted points include poultry brands, one meat brand, and one beef brand. The selected beef point (Z-marked) is in the upper right corner of the plot, which places it in the most expensive category. The single meat point (Y-marked) is more costly than the poultry brands but less than the beef brands.

A bigger surprise appears in the $/lb Protein by Protein/Fat scatterplot. As the protein-to-fat ratio increases, the cost per pound of protein stays about the same. Further, the poultry brands not only cost the least but also contain the most protein. Most of the selected points are in the three highest protein categories.

The density ellipses on the Calories by Sodium scatterplot show clearly that the poultry brands have about the same range of sodium content as the meat and beef brands, but many poultry brands have fewer calories.

Finding the Best Points

Now there is sufficient information to identify several hot dog brands as possible cafeteria menu items.
• Click the Hot Dogs.jmp table to make it active. Note that the icon beside the Product Name column in the Columns panel means that it has been designated as a Label column.

The poultry (X-marked) brands are acceptably economical, and some of them have high protein content. Few meat or beef brands compared well.
• Sel
46. [Figure 8.2 plot annotations: yield Predicted P=0.0017, RSq=0.87, RMSE=1.4937 (whole model); popcorn Leverage, P=0.0034]

In Figure 8.3, the confidence curves for oil amt and the popcorn*oil amt interaction do not cross the horizontal mean line; rather, they encompass the mean line. This shows that neither of these factors significantly affected popcorn yield.

Figure 8.3 Leverage Plots for the Oil Amt and Its Interaction with Popcorn
[oil amt Leverage Plot, P=0.1933; popcorn*oil amt Leverage Plot, P=0.2134]

The leverage plots in Figure 8.4 show that the batch size effect (batch) and the interaction between popcorn type and batch size (popcorn*batch) are significant effects. This means that the size of the batch makes a difference in the popcorn yield. Furthermore, the significant interaction means that batch size affects each type of popcorn differently.

Figure 8.4 Leverage Plots for Batch and Its Interaction with Popcorn
[batch Leverage Plot; popcorn*batch Leverage Plot]
47. Starting a JMP Session
• Open the JMP application icon to begin a JMP session.
• Use the New command in the File menu to create an empty data table like the one shown here. Click File > New > Data Table, or open JMP Starter and select New Data Table.

[Figure: empty data table with one column, Column 1]

The data values for this project are blood pressure statistics collected over six months and recorded in a notebook page, as shown in Figure 2.2.

Figure 2.2 Notebook of Raw Study Data

Blood Pressure Study
Month     Control   Placebo   300mg   450mg
March         165       163     166     168
April         162       159     165     163
May           164       158     161     153
June          162       161     158     151
July          166       158     160     148
August        163       158     157     150

Creating Rows and Columns in a JMP Data Table

JMP data tables have rows and columns, sometimes called observations and variables in statistical terms. The raw data in Figure 2.2 are arranged as five columns (treatment groups) and six rows (months March through August). The first line in the notebook names each column of values. These names can be used as column names in the new JMP table.

Add Columns

First, create the number of rows and columns that are needed.
• Select Cols > Add Multiple Columns, which prompts for the number of columns to add, where to
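The shape of the table being built can be pictured as a set of named columns, each holding one value per row. The following Python sketch is only an analogy for the data table, not JMP code: the dictionary name `bp_study` is hypothetical, and the values are the ones recorded in the notebook of Figure 2.2.

```python
# Column-oriented sketch of the notebook in Figure 2.2: five columns, six rows.
bp_study = {
    "Month":   ["March", "April", "May", "June", "July", "August"],  # character values
    "Control": [165, 162, 164, 162, 166, 163],                       # numeric values
    "Placebo": [163, 159, 158, 161, 158, 158],
    "300mg":   [166, 165, 161, 158, 160, 157],
    "450mg":   [168, 163, 153, 151, 148, 150],
}

n_columns = len(bp_study)
n_rows = len(bp_study["Month"])

# A rectangular table needs exactly one value per row in every column.
assert all(len(values) == n_rows for values in bp_study.values())

# Mean reading for each numeric column over the six months.
column_means = {
    name: sum(values) / n_rows
    for name, values in bp_study.items()
    if name != "Month"
}
```

Keeping Month as text while the other four columns stay numeric is the same distinction the lesson makes when it changes the data type of the Month column to character.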
48. [Parameter Estimates and Effect Tests tables, with rows for Age, Weight, Runtime, RunPulse, and MaxPulse. The Effect Tests table lists Prob > F of 0.0335 for Age and <.0001 for Runtime.]

Interpreting Leverage Plots

The leverage plots for this example multiple regression model allow visualization of the contribution of each effect. First, look at the whole-model leverage plot of observed versus predicted values, shown in Figure 10.8. This plot illustrates the test for the whole set of regressors. The Analysis of Variance table in Figure 10.9 shows a highly significant F corresponding to this plot. The confidence curves show the strong relationship because they cross the horizontal line.

Now examine the leverage plots for the regressors. Each plot illustrates the residuals as they are and as they would be if that regressor were removed from the model. The confidence curves in the leverage plot for Age in Figure 10.10 show that Age is borderline significant because the curves barely cross the horizontal line of the mean. Note that the significance of the Age effect is 0.03 in the text reports (Figure 10.9), which is only slightly different from the 0.05 level used for the confidence curves drawn by JMP.

The leverage plot for Weight shows that the e
49. [Figure residue: leverage plots for batch (Leverage P = 0.0011) and popcorn*batch (Leverage P = 0.0027)]

The two leverage plots shown in Figure 8.5 show that there is no significant interaction between amount of oil and batch size.

Figure 8.5 Leverage Plots for Other Interaction Effects
[Two panels: popcorn*batch Leverage Plot (Leverage P = 0.0027) and oil amt*batch Leverage Plot (Leverage P = 0.9481); y axis: yield Leverage Residuals]

For more information about interpretation of leverage plots, see the chapters "Understanding JMP Analyses" and "Standard Least Squares: Introduction," and the appendix "Statistical Details," of the JMP Statistics and Graphics Guide.

Quantify Results: Statistical Reports

Because oil amt and its interactions with other effects are not significant, fit the popcorn data again without these effects. The new model should have the significant factors (type of popcorn, batch size, and their interaction term). This approach condenses the statistical reports that show estimates of yield under the different conditions of interest. Use the same Fit Model command as before.

Click the Fit Model window to make it the active window.
If the window is closed, click the red triangle icon on the report and select Script > Redo Analysis to
50. [Data table residue: Solubility.jmp, with one row per compound (ETHANOL, PROPANOL, BUTANOL, PENTANOL, HEXANOL, HEPTANOL, ACETIC_ACID, PROPIONICACID, BUTYRICACID, HEXANOICACID, PENTANOICACID, TRICHLOROACETICACID, DICHLOROACETICACID, ...) and six solvent columns: 1-Octanol, Ether, Chloroform, Benzene, Carbon Tetrachloride, and Hexane. Callout: Use this menu to assign Label role to a column.]

There are six solvent variables, but there are no six-dimensional graphics. However, it is possible to look at six one-dimensional graphs, 15 two-dimensional graphs, and 20 three-dimensional plots. Using principal components, a representation of higher dimensions can be displayed.

One-Dimensional Views

The Distribution command helps you summarize data one column at a time. It does not show any relationships between variables, but the shape of the individual distributions helps identify the one-dimensional outliers.

To begin exploring
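The counts of views quoted for the six solvent variables (6, 15, and 20) are the numbers of ways to choose one, two, or three variables out of six. A minimal sketch in Python, not part of JMP, that reproduces them; the variable names are illustrative only:

```python
from math import comb

# Number of solvent variables in the Solubility data (stated in the text).
n_vars = 6

# Distinct views obtained by choosing 1, 2, or 3 variables at a time.
views = {dim: comb(n_vars, dim) for dim in (1, 2, 3)}

for dim, count in views.items():
    print(f"{dim}-dimensional views: {count}")
```

The same counting explains why higher-dimensional exploration quickly needs a summary such as principal components: the number of low-dimensional views grows fast as variables are added.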
51. Chapter 6: Analyzing Categorical Data

Look Before You Leap

The first step is to become familiar with the data. Begin by reviewing the data to determine the best way to proceed with the market analysis.

Open a Data Table

When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Car Poll.jmp.
Open Car Poll.jmp.

The Car Poll data were collected from a random sample of people in a specific geographic area. The columns panel shows that age is a numeric variable and is assigned the continuous modeling type. The other five columns are character variables with nominal modeling types.

[Data table residue: Car Poll.jmp, with columns sex, marital status, age, country (American, European, Japanese), size (Large, Medium, Small), and type (Family, Sporty, ...)]

Address the Research Question
52. [Report residue: tail of the Quantiles and Moments tables for a continuous variable]

Frequency Table for Ordinal or Nominal Variables

The report for nominal and ordinal variables has a different table from those produced for continuous variables. Along with histograms for the nominal or ordinal (categorical) variables sex and age, JMP gives Frequencies tables.

Frequencies
Level  Count  Prob     StdErr Prob  Cum Prob
12     8      0.20000  0.06325      0.20000
13     7      0.17500  0.06008      0.37500
14     12     0.30000  0.07246      0.67500
15     7      0.17500  0.06008      0.85000
16     3      0.07500  0.04165      0.92500
17     3      0.07500  0.04165      1.00000
Total  40     1.00000  0.00000      1.00000
N Missing 0
6 Levels

• Level lists each value of the response variable.
• Count lists the number of rows found for each level of a response variable.
• Prob lists the probability of occurrence for each level of a response variable. The probability is computed as the count divided by the total frequency of the variable, shown at the bottom of the table.

The following two statistics are not displayed by default. Right-click (hold the CONTROL key and click on the Macintosh) the frequency table and select the Columns menu to reveal them.

• StdErr Prob lists the standard error of the probabilities.
• Cum Prob contains the cumulative sum of the column of probabilities.

Adding a Computed Column

Be on the alert for
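The column definitions above can be checked by hand. The sketch below, in Python rather than JMP, recomputes Prob, StdErr Prob, and Cum Prob from the counts. Two things are assumptions: the counts for levels 14 and 17 are inferred from the Cum Prob column because their rows are damaged in the extracted text, and the standard error is taken to be the binomial form sqrt(p(1 - p)/n), which agrees with the printed values but is not stated in the guide.

```python
from math import sqrt

# Counts per age level. Levels 14 and 17 are inferred from the
# cumulative probabilities, not read directly from the table.
counts = {12: 8, 13: 7, 14: 12, 15: 7, 16: 3, 17: 3}


def frequency_table(counts):
    """Return (level, count, prob, std_err_prob, cum_prob) rows."""
    total = sum(counts.values())
    rows, cum = [], 0.0
    for level, count in counts.items():
        prob = count / total                       # count / total frequency
        std_err = sqrt(prob * (1 - prob) / total)  # assumed binomial form
        cum += prob                                # running sum of Prob
        rows.append((level, count, prob, std_err, cum))
    return rows


rows = frequency_table(counts)
for level, count, prob, std_err, cum in rows:
    print(f"{level:>5} {count:>5} {prob:.5f} {std_err:.5f} {cum:.5f}")
```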
53. Summary Reports for Effects

Now look at the summary tables for each effect in the model. The tables for the main effects are shown here. The Least Squares Means table lists the least squares means and standard errors for each level of the model factors, without considering the interaction between them. In this balanced example, the least squares means are simply the sample means of each factor level.

popcorn: Least Squares Means Table
Level    Least Sq Mean  Std Error   Mean
gourmet  12.225000      0.54485237  12.2250
plain     9.275000      0.54485237   9.2750

batch: Least Squares Means Table
Level    Least Sq Mean  Std Error   Mean
large     9.000000      0.54485237   9.0000
small    12.500000      0.54485237  12.5000

The nature of the interaction is important in the interpretation of the popcorn experiment. To examine the significant popcorn*batch interaction:

Click the red triangle icon from the Response yield title bar and select Factor Profiling > Interaction Plots.

This command plots the least squares means for each combination of effect levels, as shown in Figure 8.7.

Figure 8.7 Interaction Plots
[Figure: Response yield menu with Factor Profiling > Interaction Plots selected, and the resulting interaction plots]
54. [Report residue: end of the Parameter Estimates table (Runtime row)]

Effect Tests
Source   Nparm  DF  Sum of Squares  F Ratio  Prob > F
Age      1      1    18.21015        2.5460  0.1218
Runtime  1      1   568.36071       79.4627  <.0001

Clicking the red triangle icon and selecting Save Columns displays a list of save commands. To save predicted values and the prediction equation for this model:

Click the red triangle icon and select Save Columns > Prediction Formula.

This command creates a new column in the Fitness data table called Pred Formula Oxy. Its values are the calculated predicted values for the model. To see the column's formula:

Right-click the Pred Formula Oxy column name and select Formula. The Formula window opens and displays the formula:

88.4356809 - 0.1509571 * Age - 3.1987736 * Runtime

This formula defines a plane of fit for Oxy as a function of Age and Runtime.

Click Cancel to close the window and return to the data table window.

Fitting Plane

JMP can show relationships between Oxy, Runtime, and Age in three dimensions with a surface plot.

Select Graph > Surface Plot.
Add Oxy and Pred Formula Oxy as Columns and click OK.
From the Style menu for Pred Formula Oxy, select Needles. See Figure 10.4.

Figure 10.4 Initial View of the Surface Plot of Oxy, Age, and Runtime
[Figure: surface plot]

Click and drag to rotate the plot so it looks like that in Figure 10.5.
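The saved prediction formula is an ordinary linear function, so it can be evaluated outside JMP as well. The following Python sketch is illustrative only. It assumes both slopes are negative: the signs are missing from the extracted text, and negative slopes are the only choice consistent with a mean response near 47.4. The input values in the usage line are hypothetical, not rows from the Fitness table.

```python
def pred_formula_oxy(age, runtime):
    """Evaluate the fitted plane for Oxy.

    Coefficients are copied from the formula window text; the minus
    signs are an assumption (see the lead-in).
    """
    return 88.4356809 - 0.1509571 * age - 3.1987736 * runtime


# Hypothetical subject: 45 years old with a runtime of 10 minutes.
print(round(pred_formula_oxy(45, 10), 4))
```

Because the function is a plane, holding Age fixed and increasing Runtime by one unit always lowers the prediction by the same amount (about 3.2), which is what the surface plot shows as a flat, tilted sheet.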
55. Figure 10.2 Completed Fit Model Window for Multiple Regression with Two Effects
[Figure: Fit Model window. Select Columns lists Name, Sex, Age, Weight, Oxy, Runtime, RunPulse, RstPulse, and MaxPulse. Pick Role Variables: Y, Weight, Freq, By. Construct Model Effects: Age, Runtime. Personality: Standard Least Squares. Emphasis: Effect Leverage.]

Click Run Model.

You should now see the tables shown in Figure 10.3. These statistical reports are appropriate for a response variable and factor variables that have continuous values.

Figure 10.3 Statistical Text Reports

Response Oxy
Whole Model
Actual by Predicted Plot

Summary of Fit
RSquare                     0.764769
RSquare Adj                 0.747967
Root Mean Square Error      2.674424
Mean of Response            47.37581
Observations (or Sum Wgts)  31

Analysis of Variance
Source    DF  Sum of Squares  Mean Square  F Ratio
Model      2  651.11025       325.555      45.5160
Error     28  200.27129         7.153      Prob > F
C. Total  30  851.38154                    <.0001

Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  88.435681  5.321348   16.62    <.0001
Age        -0.150957  0.094608   -1.60    0.121
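The Summary of Fit values in Figure 10.3 follow from the Analysis of Variance table. A short Python sketch, offered as a cross-check rather than as JMP's own computation, using the standard definitions of R-square, adjusted R-square, and root mean square error; the only inputs are the Model and Error rows printed in the report:

```python
from math import sqrt

# Model and Error rows of the Analysis of Variance table.
ss_model, df_model = 651.11025, 2
ss_error, df_error = 200.27129, 28

ss_total = ss_model + ss_error      # C. Total sum of squares
n = df_model + df_error + 1         # number of observations

r_square = ss_model / ss_total
r_square_adj = 1 - (1 - r_square) * (n - 1) / df_error
rmse = sqrt(ss_error / df_error)    # square root of the Error mean square
f_ratio = (ss_model / df_model) / (ss_error / df_error)

print(f"n={n} RSquare={r_square:.6f} RSquare Adj={r_square_adj:.6f}")
print(f"RMSE={rmse:.6f} F Ratio={f_ratio:.4f}")
```

Recomputing these values is also a convenient way to confirm digits that are hard to read in a scanned report.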
56. [Figure 7.3 residue: reports for the linear fit and the second-degree polynomial fit]

Linear Fit
Summary of Fit
Root Mean Square Error      0.051653
Mean of Response            0.855556
Observations (or Sum Wgts)  72

Analysis of Variance
Source    DF  Sum of Squares  Mean Square  F Ratio
Model      1  0.8656172       0.865617     324.4433
Error     70  0.1867605       0.002668     Prob > F
C. Total  71  1.0523778                    <.0001

Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  0.6656231  0.012176   54.67    <.0001
age        0.0052759  0.000293   18.01    <.0001

Polynomial Fit Degree=2
Summary of Fit
RSquare                     0.87747
RSquare Adj                 0.873918
Root Mean Square Error      0.04323
Mean of Response            0.855556
Observations (or Sum Wgts)  72

The tables show the RSquare value from the Summary of Fit tables for the linear fit, the second-degree polynomial fit, and the third-degree fit. As polynomial terms are added to the model, the regression curve appears to fit the data better. See Figure 7.3 for the graph and some of the tables.

Fitting a Spline

Even the polynomial fit of degree 3 does not quite reach the outlying points of the very young subjects. A free-form function that smooths the data, such as a smoothing spline, might be better.

Use the Remove Fit command on both polynomial fits so that only the first linear regression line shows on the scatterplot.
Click the red triangle icon on the title bar and select Fit Spline three times
57. Characteristics 39
What Has Been Discovered
Looking at Distributions: Histograms, moments, quantiles, and proportions 45
Look Before You Leap 47
Displaying Distributions 47
Understanding Histograms of Nominal and Ordinal Variables 48
Understanding Histograms of Continuous Values 49
Learning About Report Tables 51
Reports for Continuous Variables 51
Frequency Table for Ordinal or Nominal Variables 52
Adding a Computed Column 52
Comparing Group Means: Testing Differences 57
Look Before You Leap 59
Graphical Display of Grouped Data 59
Choose Variable Roles 60
Comparison Circles 64
58. Each of the typing test scores is plotted for each brand of keyboard. Note that the distance between tick marks on the brand axis is proportional to the sample size of each group. The mean typing score for the total sample is shown as a horizontal line across the plot.

Figure 5.2 Oneway Analysis for speed by brand
[Figure: Oneway Analysis of speed By brand; x axis: brand (REGAL, SPEEDYTYPE, WORD-O-MATIC); y axis: speed]

It is easy to see at a glance that most participants who used the SPEEDYTYPE machines typed faster than the others.

Fit Means Option

Now look at more graphical information about the distribution of typing scores.

Click the red triangle icon on the title bar and select Means/Anova.

This produces the appropriate analysis of variance reports. It automatically turns on mean diamonds, which draws a 95% means diamond for each group, as shown in Figure 5.3. You can also turn the Mean Diamonds option on and off from the Display Options submenu: click the red triangle icon on the title bar and select the Display Options submenu. Mean Diamonds has a check mark beside it if it is turned on.
59. JMP, a folder named Sample Data was also installed. In that folder is a file named Typing Data.jmp.

Open the file Typing Data.jmp.

The Typing Data table appears in the form of a data grid, as shown here.
[Data table residue: Typing Data.jmp, with columns brand (REGAL, SPEEDYTYPE, WORD-O-MATIC) and speed]

The data table has columns named brand and speed. The modeling type for each column shows to the left of each column name in the columns panel. The character variable brand has nominal values and the numeric variable speed has continuous values.

There are 17 rows that represent typing scores for 17 employees. However, the number of participants in the groups differs because some of the scheduled participants did not show up for the study. Perhaps other statistics for the groups differ also. In particular:

• Is the mean (average) typing speed the same for each brand?
• Does any one of the three brands of keyboard stand out from the others?
• Does it make a difference which brand the employees use?

Graphical Display of Grouped Data

Comparing the mean typing scores of each keyboard brand involves analyzing two variables, so use the Fit Y by X command from the Analyze menu. Selecting Fit Y by X allows you to perform:

• Categorical analysis when b
60. Paul Tobias and William Q. Meeker for advice on reliability plots; Lijian Yang and J.S. Marron for bivariate smoothing design; George Milliken and Yurii Bulavski for development of mixed models; Will Potts and Cathy Maahs-Fladung for data mining; Clay Thompson for advice on contour plotting algorithms; Tom Little, Damon Stoddard, Blanton Godfrey, Tim Clapp, and Joe Ficalora for advice in the area of Six Sigma; and Josef Schmee and Alan Bowman for advice on simulation and tolerance design.

For sample data, thanks to Patrice Strahle for Pareto examples, the Texas air control board for the pollution data, and David Coleman for the pollen (eureka) data.

Translations

Erin Vang, Trish O'Grady, Elly Sato, and Kyoko Keener coordinate localization. Special thanks to Noriki Inoue, Kyoko Takenaka, Masakazu Okada, Naohiro Masukawa, and Yusuke Ono (SAS Japan), and Professor Toshiro Haga (retired, Tokyo University of Science) and Professor Hirohiko Asano (Tokyo Metropolitan University) for reviewing our Japanese translation; Professors Fengshan Bai, Xuan Lu, and Jianguo Li at Tsinghua University in Beijing, and their assistants Rui Guo, Shan Jiang, Zhicheng Wan, and Qiang Zhao, and William Zhou (SAS China) and Zhongguo Zheng, professor at Peking University, for reviewing the Simplified Chinese translation; Jacques Goupy (consultant, ReConFor) and Olivier Nuñez (professor, Universidad Carlos III de Madrid) for reviewing the French translation; Dr. Byung Chun Kim
61. Release the button and enter the text for the title: Comparison of Treatment Groups.
Click outside the annotation to quit editing the text.
Repeat to enter the footnote: XYZ Blood Pressure Study 2007.

Note: Double-click any report title bar to edit the text on the bar.

Figure 2.7 Bar Chart with Modified Y-Axis, Titles, and Footnotes
[Figure: bar chart of blood pressure by Month (March through August) for the treatment groups Control, Placebo, 300mg, and 450mg]

Chapter Summary

A study was done to evaluate the effect of a new drug on blood pressure. To complete this analysis, you:

• Used the New Data Table command in the File menu to create a new JMP table.
• Created the appropriate number of rows and columns for the data.
• Typed the data into the empty data grid.
• Used the Chart command in the Graph menu to request a bar chart of blood pressure measures over time.
• Ordered the values in chronological order so they would appear properly in the chart.
• Tailored the chart with a specific axis scale and axis name, and added a plot title and footnote with the annotate tool.

Chapter 3: Summarizing Data

Look Closely at the Data

The hot dog is a questionable item on a school cafeteria menu because of its reputation as an unhealthy food, possibly classified in the junk food category. Many st
62. SAS Institute (1987), SAS/STAT Guide for Personal Computers, Version 6 Edition, Cary, NC: SAS Institute Inc.
Snedecor, G.W. and Cochran, W.G. (1967), Statistical Methods, Ames, Iowa: Iowa State University Press.
Winer, B.J. (1971), Statistical Principles in Experimental Design, 2nd Edition, New York: McGraw-Hill, Inc.

Index
JMP Introductory Guide

Symbols
? tool 3 4

A
Add Columns 24
Add Rows 25
Add Statistics Column 35
All Pairs, Tukey's HSD 64
analysis methods see platforms
Analysis of Variance see Fit Y by X, Fit Model
Analysis of Variance table 65 87 104
analysis role 12
analysis type see modeling type 9
analyze categorical data 69 79
Analyze menu 12
Annotate cursor 29
arrow cursor 10
assign role 26

B
bar chart 27
beginner's tutorial 3 7
box plot 50
BP Study.jmp 25

C
C Total 65
calculator example 53
Car Poll.jmp 71
categorical analysis see Fit Y by X, Fit Model
categorical data 69 79
categorical type 9
character column 9
Charts 26 36
Chi Square 77
classification variable 13
collinear 129
Column Info 24
column name 24
columns 8
Compare Means 57 64 66 67
Comparison Circles 64
confidence curves 101 128
confidence interval 50 62
construct formula 52 53
Construct Model Effects 100 103
continuous 9 14 60 83
Count 52
Cowboy Hat.jmp 16
create subset 52 54
crossed effects 100
Cum Prob 52
cursors in data table 9
curve fitting 83

D
data
63. Scintilla is Copyright 1998-2003 by Neil Hodgson <neilh@scintilla.org>. NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEIL HODGSON BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

XRender is Copyright 2002 Keith Packard. KEITH PACKARD DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL KEITH PACKARD BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Chapter 1: Introducing JMP

Your First Look

JMP uses an extraordinary graphical interface to display and analyze data. JMP is software for interactive statistical graphics and includes:

• a data table window for editing, entering, and manipulating data
• a broad range of graphical and statistical methods for data analysis
• an extensive design of experiments module
• options to highlight and display subsets of data
• a formula editor for each table column to
64. Select Edit > Journal.

The Journal command appends the scatterplot with spline fits and text reports to the open journal file. After journaling the final analyses, the following draft notes about the spline fitting technique can be added at the bottom of the journal window.

Select the annotate tool from the toolbar.
Click and drag a large box at the bottom of the report.
Add the following text to the box.

[Journal residue: three Smoothing Spline Fit reports]
Smoothing Spline Fit  R-Square  Sum of Squares Error
lambda=10             0.970061  0.031508
lambda=1000           0.952519  0.049966
lambda=100000         0.871034  0.135721

This fitting technique applies a cubic polynomial to the interval between points. The polynomials are joined such that the curve meets at the same point with the same slope, forming a continuous and smooth curve. A small enough lambda could make such a curve go through every point, which would model the error, not the mean. A moderate lambda value forces the curve to be smoother (in other words, less curved). This is accomplished by adding a curvature penalty to the optimization that minimizes the sum of squares error.

By comparing various regression fits, notice that both the polynomial fits and the spline fit with moderate flexibility best describe the data. These models show that infants grow most rapidly
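The three spline reports illustrate the trade-off the notes describe: as lambda grows, the curvature penalty dominates, the error sum of squares rises, and R-square falls. A minimal Python sketch of that relationship, using the standard identity R-square = 1 - SSE / (total SS). The total sum of squares of 1.0523778 is an assumption: it is the C. Total value from the linear-fit Analysis of Variance table for the same ratio-by-age data, not a number printed in the spline reports.

```python
# Assumed corrected total sum of squares for the growth ratio values
# (C. Total from the linear-fit Analysis of Variance table).
ss_total = 1.0523778

# Sum of squares error reported for each smoothing spline fit.
sse_by_lambda = {10: 0.031508, 1000: 0.049966, 100000: 0.135721}

# R-square = 1 - SSE / total SS; a stiffer spline (larger lambda)
# leaves more unexplained variation.
r_square = {lam: 1 - sse / ss_total for lam, sse in sse_by_lambda.items()}

for lam, r2 in r_square.items():
    print(f"lambda={lam:<7} R-Square={r2:.6f}")
```

Under this assumption the computed values agree with the R-Square entries in the reports to about five decimal places.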
65. Solubility.jmp data table to clear the row selection.
Select Graph > Scatterplot 3D, which opens the Scatterplot 3D window.
Add all six continuous variables to the Y, Columns list.
Click OK.
After the plot appears, change the drop-down lists below the plot to any combination of the three variables.

The goal is to look for points away from the point cloud for each combination of three variables. To aid in this search:

Rotate and examine each three-dimensional plot by dragging the plot with the mouse.
Hover over points to identify outliers.

Figure 9.4 shows two three-dimensional outlying points in the view of Ether by 1-Octanol by Benzene that hadn't been apparent before. To label them:

Shift-click these points.
Select Rows > Label/Unlabel.

Their labels, METHYLACETATE and ACETONE, appear on the plot.

Figure 9.4 Spotting Outliers in a Three-Dimensional View
[Figure: Scatterplot 3D with the points METHYLACETATE and ACETONE labeled]

Principal Components and Biplots

Because many of the variables in the Solubility.jmp table are highly correlated, there is not a lot of scatter in six dimensions. The scatter is oriented in some directions but is flattened in other directions. To illustrate this
66. The employees participated in a study to help decide what type of keyboards to buy. The company selected three different brands of keyboards to test. These keyboards were randomly assigned to three groups of employees with comparable typing skills. The employees completed typing tests and recorded their words-per-minute scores. This lesson finds out if the typing scores are significantly better on any one brand of keyboard than on the others.

Objectives

• Use the Fit Y by X command to produce plots and analyses appropriate for a one-way analysis of variance.
• Use interactive tools to examine differences among groups.
• Produce text reports to display differences among groups.

Contents

Look Before You Leap 59
Graphical Display of Grouped Data 59
Choose Variable Roles 60
Mean Estimates and Statistical Comparisons 66

Chapter 5: Comparing Group Means

Look Before You Leap

The first step is to become familiar with the data. The typing test scores are in a JMP file so that they can be reviewed and the type of analysis determined.

When you installed
67. To assign a different modeling type to a variable:

1. Click the icon next to the variable name.
2. Select the appropriate modeling type.

Figure 1.9 Changing a column's modeling type
[Figure: columns panel with the modeling type menu (Continuous, Ordinal, Nominal) open for the brand column]

For details, see the JMP User Guide.

Data Table Cursor Forms

As you move the cursor around the data table, it changes forms. Its shape gives you information about performable actions. The following sections describe the different cursor forms.

Arrow Cursor

The cursor is a standard arrow when it is anywhere in the table panels to the left of the data grid, except when it is on a red triangle icon or diamond-shaped disclosure button (the button's appearance differs between Windows and the Macintosh), or when it is in the upper left corner of the data grid (the area where rows and columns are deselected). See Figure 1.10.

Figure 1.10
A. Click to select a column. Double-click to edit a column name.
B. Click in this area to deselect all rows.
C. Click in this area to deselect all columns.
[Figure: Typing Data table with callouts A, B, and C]

I-Beam Cursor

The cursor is an I-beam when it is over text in the data grid or highlighted column names in the data grid or column pa
68. Xs in each scatterplot.

Note in the scatterplot matrix that many of the variables appear to be correlated, as evidenced by the diagonal flattening of the normal bivariate density ellipses. There appear to be two groups of variables that correlate among themselves but are not very correlated with variables in the other group.

Figure 9.3 Two-Dimensional View: Scatterplot Matrix
[Figure: scatterplot matrix of the six solvent variables with bivariate density ellipses]

The variables Ether and 1-Octanol appear to make up one group, and the other group consists of the remaining four variables. These two groups are outlined on the scatterplot matrix shown in Figure 9.3.

Scan these plots looking for outliers of a two-dimensional nature (points that fall outside the bivariate ellipses) and identify them with square markers using the following steps:

Double-click on Selected in the rows panel in the Solubility.jmp data table to clear your current selection.
Shift-click each outlier.
Select Rows > Markers and select the square marker from the palette.

Now both one- and two-dimensional outliers are identified.

Three-Dimensional Views

To see points in three dimensions:

Double-click on Selected in the rows panel in the
69. a 31 43
survey data 69 79
typing study 57 67
tutorials, learning JMP 3
Typing Data.jmp 59
typing study 57 67

W-Z
Weight role 13
weight-height ratio example 83
whiskers 50
Whole model plot 101
X, Factor role 13
Y, Response role 13
70. able, clicking the red triangle icon and selecting Fit Line produces an Analysis of Variance table.

Analysis of Variance
Source    DF  Sum of Squares  Mean Square  F Ratio
Model      1  0.8656172       0.865617     324.4433
Error     70  0.1867605       0.002668     Prob > F
C. Total  71  1.0523778                    <.0001

The elements of the table give an indication of how well the straight line fits the data points.

• Source identifies the sources of variation in the growth ratio values (Model, Error, and C. Total).
• DF records the associated degrees of freedom for each source of variation.
• Sum of Squares (SS for short) quantifies the variation associated with each variation source. The C. Total SS is the corrected total SS computed from all the ratio values. It divides (partitions) into the SS for Model and the SS for Error. The Model SS is the amount of the total variation in the ratio scores explained by fitting a straight line to the data. The Error SS is the remaining, or unexplained, variation.
• Mean Square lists the Sum of Squares divided by its associated degrees of freedom (DF) for Model and Error.
• F Ratio is the regression (Model) mean square divided by the Error mean square.
• Prob > F is the probability of a greater F value occurring if the ratio values differed only because of different subjects rather than because the subjects are different ages.

In this example, the significance of the F value is 0.0001, which strongly indicates that the linear fit to the weight/height growth pa
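The definitions in the list above can be traced through the numbers in the table. The Python sketch below is a hand check, not JMP output: it takes the C. Total and Model sums of squares, treating the Model SS as 0.8656172 (consistent with the Mean Square of 0.865617 on 1 degree of freedom), partitions out the Error SS, and forms the mean squares and the F ratio.

```python
# Sums of squares and degrees of freedom from the table.
ss_total, df_total = 1.0523778, 71   # C. Total
ss_model, df_model = 0.8656172, 1    # Model

# C. Total partitions into Model and Error.
ss_error = ss_total - ss_model
df_error = df_total - df_model

# Mean Square = Sum of Squares / DF.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F Ratio = Model mean square / Error mean square.
f_ratio = ms_model / ms_error

print(f"Error SS={ss_error:.7f} Error MS={ms_error:.6f} F={f_ratio:.4f}")
```

A large F ratio means the variation explained per model degree of freedom is much larger than the unexplained variation per error degree of freedom, which is why the table reports such a small Prob > F.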
71. [Report residue: Parameter Estimates table with terms Intercept and height]

A Practice Tutorial

Before you begin the tutorials in the following chapters of this book, complete this brief practice tutorial, a short guided tour through a JMP analysis. Follow the steps to see a three-dimensional scatterplot.

Open a Data Table

Open the file called Cowboy Hat.jmp to begin a JMP session.

When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Cowboy Hat.jmp. The data table shown in Figure 1.19 appears.

Figure 1.19 Cowboy Hat Data Table
[Data table residue: Cowboy Hat.jmp, with columns x, y, z, hue, and shade]

This data table has three numeric columns and two row state columns. Columns x and y are x- and y-coordinates, and z is created using the function

z = sin(sqrt(x^2 + y^2))

Select an Analysis

To plot the three columns of information from the Cowboy Hat data table:

Choose the Scatterplot 3D command from the Graph menu.
Select the x, y, and z columns from the column selector
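For readers who want to see how the z column relates to x and y, here is a small Python sketch. It assumes the function is z = sin(sqrt(x^2 + y^2)); the formula is badly damaged in the extracted text, so treat that form as a reconstruction. The half-unit grid from -5 to 5 is also an assumption made for illustration and may not match the actual table.

```python
from math import sin, sqrt


def cowboy_hat(x, y):
    """Assumed Cowboy Hat surface: sine of the distance from the origin."""
    return sin(sqrt(x * x + y * y))


# Hypothetical grid of x and y coordinates in half-unit steps.
coords = [i / 2 for i in range(-10, 11)]
table = [(x, y, cowboy_hat(x, y)) for x in coords for y in coords]

# First row of the grid, at x = y = -5.
print(table[0])
```

Because z depends only on the distance from the origin, the surface is rotationally symmetric, which produces the hat-like shape seen when the 3D scatterplot is rotated.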
72. Learning About Statistical and JSL Terms

The Help > Indexes command displays the following sources for your reference:

• Statistics Index: Accesses references that give definitions of statistical terms. Once you are in the Statistics Index window:
  Click Topic Help to go to the place in the online Help that describes the highlighted topic.
  Click Example to run the script associated with the highlighted topic.
  Click Launch to run the script that corresponds to the item you have highlighted in the list.

Figure 1.1 The Statistics Index
A. List of topics
B. Description
C. Example script
[Figure: Statistics Index window listing topics such as accelerated failure time models, analysis of covariance, analysis of variance, ARIMA, autocorrelation, bar chart, Bartlett's test, biplot, and box plot, with a definition pane and an example script]

• JSL Functions Index: Presents a list of JSL operators such as Sin, Cos, Sq
73. bution
question mark tool 3 4
Quick Reference Guide 6

R
R 91
references 131
regression analysis 83 155
regression example 83
regression line 101
regression see Fit Y by X, Fit Model
Remove Fit 86 91
report options 14
rescale axis 28
resize plot 89
Response role 13
role 12 26 75
Root Mean Square Error (RMSE) 65 86 105
rows 8
Rsquare 65 86 105
Rsquare Adj 65 86 105

S
Save As 89
Save Predicteds 86
Save Prediction Formula 122
saving a JMP session 15
Scatterplot 3D example 16 19
Scatterplot 3D 113
scatterplot matrix 112
scatterplot see Fit Y by X, Multivariate, Overlay Plot
select rows and columns 11
selection tool 67
Shift-Tab 25
shortest half 50
Show Points 85
smoothing 91
solubility study 109 117
Solubility.jmp 111
Source 65 87 104
Spin Principal Components 115
spline 91
start JMP 23
statistical index 4
statistical summaries see Distribution
Std Error 66 87
StdErrProb 52
Subset 52 55 112
Sum of Squares 65 87 104
summarizing data 31 43
Summary 34
Summary of Fit table 65 86 91 105
survey data 69 79

T
Tab 25
tension 91
Term 87
three-dimensional plots 119 130
tick mark 60
Tip of the day 6
Tools 29
Topic Help button 4
t ratio 87
t test 87
tutorial 3 7
tutorial examples
  data table 21 30
  drug experiment 21 30
  exploratory study 109 117
  multiple regression 119 130
  popcorn experiment 97 107
  regression analysis 83
  summarizing dat
74. chart data grouped by two variables; the data doesn't have to be grouped first by Tables > Summary.

To label each bar with the frequency it represents:
• Right-click the bars and select Label > Show Labels. (Label > Label by Value is selected by default.)

The chart shows that the poultry hot dogs excelled in nutrition factors and that most people find them medium tasting. However, because the sodium content appears slightly high in some poultry brands, more investigation is needed.

[Chart: Taste within Type for Beef, Meat, and Poultry; legend: Bland, Medium, Scrumptious]

Finding a Subgroup with Multiple Characteristics

Continue the search for the ideal hot dog. Add special markers to the summary table, Hot Dogs by Type Taste, that identify each type of hot dog. In the Hot Dogs by Type Taste summary table:
• Shift-click or click and drag over the medium and scrumptious beef rows (2 and 3) to select them.
• Use the Markers command in the Rows menu to assign them the Z marker.
• Deselect those rows.
• Shift-click or drag the medium and scrumptious meat rows (5 and 6) to select them, assign them the Y marker, and deselect them.
• Shift-click or drag the medium and scrumptious poultry rows (8 and 9), assign t
75. click OK.

JMP displays a histogram with an accompanying outlier box plot, Quantiles table, and Moments table. The Quantiles table shown here identifies 30 as the median age.

Quantiles
100.0%  maximum   60.000
 99.5%            56.500
 97.5%            44.400
 90.0%            30.000
 75.0%  quartile  35.000
 50.0%  median    30.000
 25.0%  quartile  26.000
 10.0%            24.000
  2.5%            22.000
  0.5%            19.040
  0.0%  minimum   16.000

The next step is to create a new column whose values identify whether a subject's age is greater than 30 or is less than or equal to 30.
• Select Cols > New Column to display the New Column window, which is used to define column characteristics. Data Type, Modeling Type, and Format options define the new column's characteristics.

Enter characteristics for the new column as follows:
• Type the new name (call it age group) in the Column Name text box.
• Because the new column has grouping values instead of measurements, select Character from the box beside Data Type.
• Click the Column Properties button and select Formula. See Figure 6.1.

Figure 6.1 New Column Window

You are presented with the formula editor window shown in Figure 6.2.

Figure 6.2 Formula Editor Window
76. d Preschool Children in the North Central Region of the United States of America, World Review of Nutrition and Dietetics, 14.

Eubank, R. L. (1988), Spline Smoothing and Nonparametric Regression, New York: Marcel Dekker.

Gabriel, K. R. (1982), Biplot, Encyclopedia of Statistical Sciences, Volume 1, Kotz and Johnson, editors, New York: John Wiley & Sons, Inc.

Hartigan, J. A., and Kleiner, B. (1981), Mosaics for Contingency Tables, Proceedings of the 13th Symposium on the Interface between Computer Science and Statistics, W. E. Eddy, editor, New York: Springer.

Hawkins, D. M. (1974), The Detection of Errors in Multivariate Data Using

Koehler, M. G., Grigorus, S., and Dunn, J. D. (1988), The Relationship Between Chemical Structure and the Logarithm of the Partition Coefficient, Quantitative Structure Activity Relationships, 7.

Leven, J. R., Serlin, R. C., and Webne-Behrman, L. (1989), Analysis of Variance Through Simple Correlation, American Statistician, 43.

Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression, Reading, Mass.: Addison-Wesley.

Rawlings, J. O., Pantula, S. G., and Dickey, D. A. (1998), Applied Regression Analysis: A Research Tool, 2nd ed., New York, NY: Springer-Verlag New York, Inc.

Sall, J. P. (1990), Leverage Plots for General Linear Hypotheses, American Statistician, 308-315.
77. determine whether the ratio values are related to or dependent on the age values.

Select an Analysis

To fit regression curves:
• Select Analyze > Fit Y by X.

The Fit Y by X analysis does four types of analyses, depending on the modeling type of the variable:
• Regression analysis, when both x and y have continuous values, as in this example
• Categorical analysis, when both x and y have nominal or ordinal values
• Analysis of variance, when x is nominal and y has continuous values
• Logistic regression, when x is continuous and y has nominal or ordinal values

Choose Variable Roles

The Fit Y by X command first displays the Fit Y by X window. Y identifies a response or dependent variable, and x identifies a classification or independent variable.

To choose variable roles:
• Highlight ratio and click Y, Response.
• Highlight age and click X, Factor, as shown in Figure 7.1.
• Click OK.

Figure 7.1 The Fit Y By X Window

Now investigate if the ratio of weight to height is a function of age.

Fitting Models to Continuous Data

The scatterplot shown
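The four cases listed under Select an Analysis amount to a lookup on the two modeling types. The Python sketch below is only an illustration of that rule and is not JMP code; the function name fit_y_by_x_analysis is hypothetical, and treating an ordinal x like a nominal x for analysis of variance is an assumption, because the text names only nominal x for that case.

```python
def fit_y_by_x_analysis(x_type, y_type):
    """Return the kind of analysis implied by the modeling types of x and y.

    Hypothetical helper for illustration; the type names follow the text
    ("continuous", "nominal", "ordinal").
    """
    categorical = {"nominal", "ordinal"}
    if x_type == "continuous" and y_type == "continuous":
        return "regression"
    if x_type in categorical and y_type in categorical:
        return "categorical analysis"
    if x_type in categorical and y_type == "continuous":
        # Assumption: ordinal x is handled like nominal x here.
        return "analysis of variance"
    if x_type == "continuous" and y_type in categorical:
        return "logistic regression"
    raise ValueError(f"unknown modeling types: {x_type!r}, {y_type!r}")


# In this example both age (x) and ratio (y) are continuous.
print(fit_y_by_x_analysis("continuous", "continuous"))
```

With age and ratio both continuous, the sketch selects regression, which matches the choice described for the Growth example.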
78. during the first months of life and that growth rate decreases significantly at approximately 12 months.

Fitting By Groups

"Excluding Points" (p. 88 in this chapter) shows how to overlay a linear fit for the whole sample with a linear fit for children over the age of seven months. Carry this idea one step further with overlay fits to compare children under the age of one year with children over one year.
• In the Growth.jmp data table, create a new column called group to act as a grouping variable. Right-click in the new column area of the data table and select New Column from the resulting menu. Write the column name and click OK.
• Right-click in the group column (hold the CONTROL key and click on the Macintosh) and select Formula. Now enter the formula shown in Figure 7.5.
• Click Conditional in the function selector list and select the If function. The expression term, denoted expr, is highlighted.
• Choose a < b from the Comparison functions. The left side of the comparison clause is highlighted.
• Click age in the column selection list.
• Enter 12 for the numeric comparison.
• Double-click the term denoted then clause.
• Enter "Babies" in double quotes, because this column is a character variable.
• Double-click the term denoted else clause.
• Enter "Toddlers" with double quotes.
• Click Apply and then OK. Thi
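The formula assembled in the steps above is a single conditional on age. As a rough Python equivalent (an illustration only, not the JMP formula editor's own notation), the same rule can be written as follows; the function name group_for_age is hypothetical, and the sample ages are arbitrary values within the 0.5 to 71.5 month range of the data.

```python
def group_for_age(age_months):
    """Mirror the If formula: age < 12 gives "Babies", otherwise "Toddlers"."""
    return "Babies" if age_months < 12 else "Toddlers"


# A few arbitrary ages in months, chosen to straddle the 12-month cutoff.
for age in (0.5, 11.5, 12.5, 71.5):
    print(age, group_for_age(age))
```

Because the comparison is strictly less than 12, an age of exactly 12 months would fall in the Toddlers group.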
79. e Big Class red triangle icon and select Tables > Subset.
• Click OK to accept the default choices presented in the window.

This creates a new data table that has only the selected rows and columns from the active data table.

The new data table, shown in Figure 4.8, contains only the students that have extreme weight-to-height ratios. By default, the table is named Subset of Big Class. Change the name by clicking the existing name, Subset of Big Class, in the panel located on the top left side of the window. The table can be saved, exported for use in another application, or printed.

Figure 4.8 Data Table Containing a Selected Subset (Subset of Big Class; columns: name, age, sex, height, weight, ratio)
80. Chapter 4 Looking at Distributions

Look Before You Leap

The first step in this analysis is to become familiar with the data in the Big Class.jmp file. Looking at the information in the JMP data table helps us decide which summary charts and tables to use in the health report.
• Open the Big Class.jmp data table to see the data table shown in Figure 4.1.

Figure 4.1 Big Class.jmp Data Table

The file contains the name, age, sex, height, and weight for each student participating in the health study. The data table is in order by age, and sex is ordered within each age group. Even though there are only five columns of information, these variables address the following questions:
• How many boys and how many girls are there?
• How old are they?
• What is the average height and weight of the students?
• Are there any students drastically younger or older than the average age?
• Are there any s
81. e means diamond together helps show if data are distributed normally within a group. If data are normally distributed (bell shaped), the 50th percentile and the mean are the same, and the other quantiles are arranged symmetrically above and below the median.

Figure 5.6 Quantiles Box Plot (labels: 90th percentile, 75th percentile, sample mean, 50th percentile, 25th percentile, group mean, 10th percentile)

The quantile box plots (Figure 5.5) show a difference in variation of scores across the three groups. The scores in the REGAL group cluster tightly around the mean score, but the WORD-O-MATIC scores show much more variation. However, even with this variation among the groups, the SPEEDYTYPE brand still appears to promote the best performance.

Comparison Circles

To complete the typing data inspection:
• Click the red triangle icon and choose Compare Means > All Pairs, Tukey HSD.

This option produces statistical reports (discussed later) and automatically draws a set of comparison circles to the right of the plot that provides a graphical test of whether the mean typing scores are statistically diffe
82. e years

Using the Journal command to append each of these regression reports and graphs to a journal file

Chapter 8 A Factorial Analysis: Designed Modeling

This lesson examines two treatments of popcorn. The plain, everyday type has been around for years, but researchers claim to have discovered a special treatment of corn kernels. This new process supposedly increases the popcorn yield, as measured by popcorn volume from a given measure of kernels. Is this true? If so, how much is the increase? Are these increases the same for all groups of conditions? The special treatment raises the cost of the popcorn, so the increase in yield must be significant enough to warrant the higher costs.

The popcorn data used in this chapter, and for examples in the JMP User Guide and the JMP Statistics and Graphics Guide, are artificial, but the experiment was inspired by experimental data reported in Box, Hunter, and Hunter (1978).

Objectives
• Learn techniques to analyze a designed factorial experiment using the Fit Model command.
• Evaluate and interpret effects using interactive graphic tools.
• Examine supporting text reports.
• Evaluate the significance of interaction effects using interaction plots.
• Save a model's predicted values for each observation.

Contents
Open a Data Table
What Questions Can Be Answered?
Graphical Display Leverage Plot
83. ect the Arrow tool from the Tools menu.
• Click inside the Calories by Sodium scatterplot to deselect all points.
• Shift-click (hold down Alt-Shift and click on Linux) to highlight the two poultry brands (X marked in the data table) with the least calories and lowest sodium content.
• Shift-click to highlight the lone meat point (Y marked in the data table) that has the least sodium of all brands, is low in calories, has a moderate protein count, and is average in price.
• Select Rows > Label/Unlabel to display the brand names of highlighted points (Thin Jack Veal, Calorie-less Turkey, and Estate Chicken), as shown in Figure 3.10.

Figure 3.10 Labeling Ideal Points (Bivariate Fit of Calories By Sodium)

As a final step, use Analyze > Fit Y by X to look again at the two scatterplots that compare costs:
• Select Analyze > Fit Y by X.
• Assign $/lb Protein as Y.
• Assign both $/oz and Protein/Fat as X.
• Click OK.

The plot in Figure 3.11 shows that the Estate Chicken brand is the most economical of the three labeled brands (showing $/oz as continuous). The plot to the right indicates that the Calorie-less Turkey brand is in the group with the highest proportion of protein (showing Protein/Fat as nomi
84. Chapter 1 Introducing JMP

What You Need to Know

Before you begin using JMP, you should be familiar with:
• Standard operations and terminology, such as click, double-click, Ctrl-click and Alt-click (Command-click and Option-click on the Macintosh), Shift-click, drag, select, copy, and paste.
• How to use menu bars and scroll bars, how to move and resize windows, and how to manipulate files in the desktop. If you are using your computer for the first time, consult the reference guides that came with it for more information.
• Minimal statistics. Even though JMP has many advanced features, you only need a minimal background of formal statistical training. All analyses include graphical displays with options that help you review and interpret the results. Each analysis also includes access to help windows that offer general help and some statistical details.

Learning About JMP

If you are familiar with JMP, you might want to know only what's new. The JMP New Features document gives a summary of general changes and additions. To learn more about JMP, use the recommendations in the following sections.

Using Tutorials

JMP provides three types of tutorials:
• Beginners Tutorial: The beginners tutorial steps you through the JMP interface and explains the basics of how to use JMP. It is accessible through the Tip o
85. egression points occupy a narrow band, showing their linear relationship. When a plane is fit representing collinear regressors, the plane fits the points well in the direction where they are widely scattered. However, in the direction where the scatter is very narrow, the fit is weak and the plane is unstable.

In text reports, this phenomenon translates into high standard errors for the parameter estimates and potentially high values for the parameter estimates themselves. This occurs because a small random error in the narrow direction can have a huge effect on the slope of the corresponding fitting plane. An indication of collinearity in leverage plots is when the points tend to collapse toward the center of the plot in the x direction.

The Longley.jmp example shows collinearity geometrically in the strongly correlated regressors X1 and X2. To examine these regressors, see Figure 10.12, which shows rotated views of the regression planes. They illustrate a regression of Y on X1, of Y on X2, and of Y on both. Most of the points are near the intersection of the three planes. All three planes fit the data well, but their vastly different slopes show that the fit is unstable.

Geometrically, collinearity between two regressors means that the points they represent do not spread out in x space enough to provide stable support for a plane. Instead, the points cluster around the center, causing the plane to be unstable. The regressors act as substitutes for each o
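The link between strongly correlated regressors and high standard errors can be made concrete with a small calculation. For a model with exactly two regressors, a standard result is that the variance of each slope estimate is inflated by the factor 1 / (1 - r²), where r is the correlation between the two regressors. The Python sketch below illustrates that factor; it is not taken from JMP, the values of x1 and x2 are made up (they are not the Longley data), and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def variance_inflation(r):
    """Variance inflation factor for a two-regressor model, 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)


# Made-up regressors that track each other closely (not the Longley data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]

r = pearson_r(x1, x2)
print("correlation:", round(r, 4))
print("variance inflation:", round(variance_inflation(r), 1))
```

A factor near 1 means the regressors support the plane well in every direction. A large factor corresponds to the narrow band of points described above, where a small random error can swing the slope.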
86. Chapter Summary

In this chapter, the demographic and vital data of students participating in a health study were summarized. The profile was completed using the Distribution command and the data management features of the JMP data table. The Distribution command displayed histograms and box plots, or stacked mosaic bar charts, for each variable assigned the role of response variable (y).

Using display and text report options to look more closely at the data, the following actions were completed:
• Adjusted the number of bars and the scale of the histograms.
• Produced supporting statistical reports showing moments and quantiles of numeric variables, and frequencies and proportions of nominal and ordinal variables.
• Created a new column in the data table computed as a function of existing columns.
• Highlighted histogram bars to identify a subset of rows in the data table.
• Created a new data table from a subset of highlighted rows.

Graphs and text reports can be printed directly from JMP. Graphs and reports can be copied to a JMP journal or into other applications to complete a report for the school system health care specialists.

See the chapter "Univariate Analysis" in the JMP Statistics and Graphics Guide for more information about distributions.

Chapter 5 Comparing Group Means: Testing Differences

The company has decided to replace all computer keyboards with the brand that produces the fastest accurate typing
87. es dividing the bars align horizontally, the response proportions are the same. When the lines are far apart, the response rates of the samples might be statistically different.

Figure 6.4 Mosaic Plot Axes (y-axis: response rates, for example the proportion of married people with Japanese cars; x-axis: marital status, with the width of each segment proportional to the number in the sample)

• Scroll to see each x variable as it relates to manufacturing country.

Sex and country do not appear to have any relationship at all. The proportion of automobiles from the three manufacturing countries is about the same for each sex.

The country by age group mosaic plot shows that the proportion of American car owners 30 years or over is only slightly greater than the proportion of American car owners under age 30.

The most significant relationship is seen between marital status and country. The mosaic plot (shown previously in Figure 6.4) and its supporting Tests table (Figure 6.5) suggest that married people are more likely than single people to own American cars. The Likelihood Ratio and Pearson Chi-squared tests evaluate the relationship between an automobile's country of manufacture and the marital status of the owner. If no relationship exists between country and marital status, a Chi-squared value larger than the one computed in this survey would occur only seven times in 100 similar surveys.

Figure 6.5 Table of Statistical Tests for
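To see what the Pearson Chi-squared test measures, it can help to compute the statistic by hand for a small contingency table. The following Python sketch is an illustration under stated assumptions, not JMP's implementation: the function name pearson_chi_square is hypothetical, and the counts are invented for the example rather than taken from the car survey.

```python
def pearson_chi_square(table):
    """Return (chi_square, degrees_of_freedom) for a table of observed counts.

    Each expected count is row total * column total / grand total, which is
    what the counts would be if rows and columns were unrelated.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, dof


# Invented counts: rows are marital status (married, single),
# columns are country (American, European, Japanese).
counts = [[60, 20, 80],
          [30, 25, 88]]

chi_square, dof = pearson_chi_square(counts)
print("Pearson chi-square:", round(chi_square, 3), "with", dof, "DF")
```

The larger the statistic for a given number of degrees of freedom, the less plausible it is that the row and column variables are unrelated. The Prob>ChiSq column in the Tests table reports that probability.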
88. f the Day window, which appears when you start JMP. To start the tutorial from the Tip of the Day window, click Enter Beginner's Tutorial. Or, start the tutorial by selecting Help (View on the Macintosh) > Tutorials > Beginners Tutorial.
• Specific Analysis Tutorials: Tutorials that step you through creating an analysis in JMP are found under Help (View on the Macintosh) > Tutorials. Tutorials describe how to create a chart, compare means, how to design an experiment, and more.
• JMP Introductory Guide: The JMP Introductory Guide is a collection of tutorials designed to help you learn JMP strategies. If you did not receive a printed copy of this book, view the .pdf file by selecting Help > Books > JMP Introductory Guide. By following along with these step-by-step examples, you can quickly become familiar with JMP menus, options, and report windows.

Searching in the Help

You might want help on a specific topic, and you want to search the online Help for that topic. The main menu bar contains a Help menu, which provides the appropriate searching capabilities.

On Windows and Linux, the Help > Contents, Help > Search, and Help > Index commands access the JMP Help system. The Help system provides navigable online JMP documentation.

On the Macintosh, the Help > JMP Help command displays a list of JMP help items with search capabilities and a table of contents.

Le
89. ffect is not significant. The confidence curves do not cross the horizontal line of the mean.

Figure 10.10 Leverage Plots for the Age and Weight Effects (Age Leverage, P=0.0335; Weight Leverage, P=0.1843)

The leverage plot for Runtime shows that Runtime is the most significant of all the regressors. The Runtime leverage line and its confidence curves cross the horizontal mean at a steep angle.

[Runtime Leverage Plot]

The leverage plots for RunPulse and MaxPulse, shown in Figure 10.11, are similar. Each is somewhat shrunken on the x-axis. This indicates that other variables are related in a strong linear fashion to these two regressors, which means the two effects are strongly correlated with each other.

Figure 10.11 Leverage Plots for the RunPulse and MaxPulse Effects (RunPulse Leverage, P=0.0042; MaxPulse Leverage, P=0.0320)

Collinearity

When two or more regressors have a strong correlation, they are said to be collinear. These r
90. g information appears on the right, as shown in Figure 2.5. The list in the window contains values in the order in which they appear in reports.
• Use the Move Up and Move Down buttons to change the order of the months.

Figure 2.5 The Value Ordering Window

• Click OK.

The properties icon now appears next to the column name in the data table's column panel, indicating the column contains a property.
• In the analysis report, click the red triangle and select Script > Redo Analysis.

Rescale the Plot Axis

By default, y-axis scaling begins at zero, and the overlay chart looks like the one shown here. But to present easy-to-read information, the y-axis needs to be rescaled and the chart needs labels.
• Double-click the y-axis area, which accesses the Axis Specification window (Figure 2.6).

This window gives you the ability to:
• Set the minimum and maximum of the axis scale
• Specify the tick mark increment
• Request minor tick marks
• Request grid lines at major or minor tick marks
• Format numeric axes
• Use either a linear or log-based scale
91. g th Analysis of Variance Table
Understanding the Parameter Estimates Table
Examining a Polynomial Fit
Chapter Summary

Chapter 7 Regression and Curve Fitting

Look Before You Leap

The first step is to become familiar with the data. Begin by reviewing the data to determine the best way to proceed with the regression.

Open a JMP File

• When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Growth.jmp. Open Growth.jmp.

A partial listing of the Growth.jmp data table is shown here.

[Partial listing of the Growth.jmp data table: columns ratio and age]

There are 2 columns and 72 rows. The ratio column contains the average weight-to-height ratio for each age group in the study. The age groups range from 0.5 to 71.5 months.

The modeling type for each column is shown to the left of the variable name in the columns panel. Both columns have a continuous modeling type, as needed for a regression analysis in JMP. The purpose of the analysis is to
92. gher (Figure 5.3).

Fit Quantiles

The next logical step is to check the distribution of points within each group. This gives a better idea of the spread of the values and shows the distance of extreme values from the center of the data.
• Click the red triangle icon and select Quantiles.

When you select the Quantiles command, JMP automatically overlays a quantile box plot on each group of typing scores, as shown in Figure 5.5. JMP also displays the report in Figure 5.5, which lists the standard percentiles for each keyboard. The median (50th percentile) is the typing speed that divides the sample in half. This means that 50% of the employees had speeds greater than the median and the other half had lower speeds.

Figure 5.5 Fit Quantiles Option, Typing Data (Oneway Analysis of speed By brand for REGAL, SPEEDYTYPE, and WORD-O-MATIC, with the Quantiles report)

Figure 5.6 illustrates the quantile box plot. The median, or 50th quantile, shows as a line in the body of the box. The top and bottom of the box represent the 75th and 25th quantiles, also called the upper and lower quartiles. The box encompasses the interquartile range of the sample data. The 10th and 90th quantiles show as lines above and below each box. Looking at the quantile box plot and th
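The quantities drawn in a quantile box plot can be checked on a small sample. The Python sketch below is only an illustration: the speeds are made up rather than taken from the typing data, and the "inclusive" interpolation rule of the standard library is an assumption that may not match the rule JMP uses to compute quantiles.

```python
from statistics import median, quantiles

# Made-up typing speeds for one keyboard brand (not the guide's data).
speeds = [50, 55, 60, 65, 70, 75, 80]

middle = median(speeds)

# Lower quartile, median, and upper quartile. The interpolation method
# is an assumption; other methods give slightly different quartiles.
q25, q50, q75 = quantiles(speeds, n=4, method="inclusive")

above = sum(1 for s in speeds if s > middle)
below = sum(1 for s in speeds if s < middle)

print("median:", middle)
print("quartiles:", q25, q75)
print("interquartile range:", q75 - q25)
print("speeds above / below the median:", above, "/", below)
```

The counts above and below the median are equal, which is the sense in which the median divides the sample in half.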
94. [Figure 8.8: prediction formula, a set of nested Match clauses on popcorn and batch]

This command creates a new column in the Popcorn data table called Pred Formula yield that contains the predicted values for each experimental condition. The prediction formula, shown at the bottom of Figure 8.8, becomes part of the column information. To see this formula:
• Highlight the new column name, Pred Formula yield.
• Select Formula from the Cols menu.

The prediction formula can be copied to the clipboard using standard cut and paste techniques.

Results show that popcorn should be packaged:
• in small packages so that the yield is good
• in family-size packages with smaller packets inside
• in family-size packages with popping instructions that clearly state the best batch size for good results

Chapter Summary

In this chapter, a designed experiment evaluated the difference in yield between two types of popcorn. A three-factor factorial experimental design was the basis for popcorn popping trials. The results were analyzed by using the Analyze > Fit Model command. The following results were found:
• The leverage plots for the factorial analysis of three factors showed one main effect and its associated interactions to be insignificant.
• A more compact two-factor analysis with interaction adequately described the variation
95. he plot.
• Move the hand to the left to increase the bar width and combine intervals (see "Graphs and Charts," p. 13). The number of bars decreases as the bar size increases.
• Move the hand to the right to decrease the bar width, showing more bars.
• Move the hand up or down to change the boundaries of the bins. The height of each bar adjusts according to the new number of observations within each bin.

Using Outlier Box Plots

Available by default in histograms with continuous variables, the outlier box plot (see Figure 4.4) is a schematic that shows the sample distribution and allows identification of points with extreme values, sometimes called outliers. You can display and hide an outlier box plot by clicking the red triangle icon in the variable's title bar and selecting Outlier Box Plot.

The ends of the box are the 25th and 75th quantiles, also called the quartiles. The difference between the quartiles is the interquartile range. The line across the middle of the box identifies the median sample value.

The lines extending from each end of the box are sometimes called whiskers. The whiskers extend from the ends of the box to the outermost data points that fall within the distance computed as quartile ± 1.5 × (interquartile range). Points beyond the whiskers indicate extreme values that are possible outliers.

To label a point, click the point t
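The whisker rule described above, quartile ± 1.5 × (interquartile range), can be worked through on a few numbers. The following Python sketch is a minimal illustration and not JMP's implementation: the data values are invented, the function name box_plot_summary is hypothetical, and the quartiles use the standard library's "inclusive" interpolation, which may differ from the rule JMP applies.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Compute quartiles, whisker ends, and possible outliers for a sample."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the outermost points that are still within the limits.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Invented sample with one extreme value.
sample = [51, 55, 56, 60, 61, 62, 63, 65, 66, 90]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(name, value)
```

In this sample only the value 90 lies beyond the upper limit, so it would be drawn as a separate point past the end of the upper whisker.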
96. hem the X marker and deselect them.

The type-taste summary table now looks like the one shown here, and the corresponding rows in the Hot Dogs.jmp table are marked likewise.

Row  Marker  Type     Taste        N Rows
1            Beef     Bland        3
2    Z       Beef     Medium       16
3    Z       Beef     Scrumptious  1
4            Meat     Bland        ?
5    Y       Meat     Medium       5
6    Y       Meat     Scrumptious  3
7            Poultry  Bland        1
8    X       Poultry  Medium       15
9    X       Poultry  Scrumptious  1

Comparative Scatterplots

Now examine the relevant variables with scatterplots to identify specific points (brands). The Fit Y by X command in the Analyze menu produces scatterplots when both the x and y are continuous numeric variables. The following scatterplots graphically show the relationship of cost and the nutritional factors together.
• Click the Hot Dogs.jmp source table to make it active.
• Select Analyze > Fit Y by X.
• Make your selections in the window, giving $/lb Protein the y role and both $/oz and Protein/Fat the x role.
• Click OK. This produces $/lb Protein by $/oz and $/lb Protein by Protein/Fat scatterplots.
• Click the red triangle icon and select Group By.
• Choose Type as the grouping variable from the list of variables in the Grouping window.
• Repeat this action for the $/lb Protein by Protein/Fat scatterplot.
• For both plots, click the red triangle icon and choose Density Ellipse >
97. hown in Figure 10.7. Notice this subset (Age only) model regression showing as a line instead of a plane. The view is edge-on for Runtime, which eliminates it from the visual model.

Figure 10.7 Comparison of Three-Dimensional Views (Runtime effect is edge-on)

Whole Model Tests

The leverage plot in Figure 10.8 shows the joint test of the Age, Weight, Runtime, RunPulse, and MaxPulse effects in the model. This plot compares the full model with the model containing the intercept that fits the overall response mean only. This leverage plot is formed by plotting the actual observed values on the y-axis and the values predicted by the whole model on the x-axis. The residual for the subset model is the distance from a point to the horizontal line drawn at the sample mean.

Figure 10.8 Leverage Plot for the Whole Model, Age and Runtime (Whole Model Actual by Predicted Plot; P<.0001, RSq=0.85, RMSE=2.2635)

More and More Regressors

It's easy to visualize two regressors predicting a response by using fitting planes. But how can this be done with more regressors, when the analysis requires more than three dimensions? In actuality, the fitting, testing, and leverage plot analyses still work for more regressors
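The comparison behind the whole-model actual by predicted plot can be sketched numerically: residuals from the full model are measured against the predicted values, and residuals from the intercept-only model are measured against the sample mean. The Python example below is a hedged illustration using the standard definition RSq = 1 - SSE / SST; the actual and predicted values are invented and are not the Fitness data.

```python
# Invented actual and predicted responses (not the Fitness data).
actual = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = [42.0, 44.0, 50.0, 54.0, 60.0]

mean_actual = sum(actual) / len(actual)

# Residual sum of squares for the intercept-only model: distances from
# each point to the horizontal line drawn at the sample mean.
sst = sum((y - mean_actual) ** 2 for y in actual)

# Residual sum of squares for the whole model: distances from each point
# to its predicted value.
sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))

r_square = 1.0 - sse / sst
print("SST:", sst, "SSE:", sse, "RSq:", round(r_square, 3))
```

An RSq close to 1 means the whole model leaves far less unexplained variation than the mean alone, which is what a steep line through the points in the plot suggests visually.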
98. iables. The mosaic plots in Figure 6.7 show that the type of car varies for levels of marital status and age group. As perhaps expected, many of the cars owned by married people are family automobiles, while the largest proportion of cars owned by single people are sporty cars.
Figure 6.7 Reports for Type of Car and Marital Status and Age Group (mosaic plots with car types Work, Sporty, Family)
Contingency Analysis of type By marital status
  N    DF   -LogLike     RSquare (U)
  303  2    13.382804    0.0441
  Test              ChiSquare   Prob>ChiSq
  Likelihood Ratio  26.766      <.0001
  Pearson           26.963      <.0001
Contingency Analysis of type By age group
  N    DF   -LogLike     RSquare (U)
  303  2    7.6832984    0.0253
  Test              ChiSquare   Prob>ChiSq
  Likelihood Ratio  15.367      0.0005
  Pearson           15.100      0.0005
So American automobile manufacturers might choose to focus advertisements toward married couples buying family-type automobiles. It follows logically that a relationship between age group and type of car also exists, because older people are more likely to be married. The graph to the right in Figure 6.7 shows graphically that the proportion of people over 30 years old who own family cars is much greater than those under 30. The small Prob>ChiSq values support the significant diffe
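The test figures read from the Figure 6.7 reports can be cross-checked with a short calculation. This sketch relies on two facts that the guide does not state and that are assumed here: the likelihood ratio chi-square equals twice the reported -LogLike, and for 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2). The helper name `chi2_upper_tail_df2` is hypothetical.

```python
import math

def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-x / 2)

# (-LogLike, likelihood ratio chi-square, Pearson chi-square) read from each report
reports = {
    "type by marital status": (13.382804, 26.766, 26.963),
    "type by age group": (7.6832984, 15.367, 15.100),
}

for name, (neg_loglike, lr, pearson) in reports.items():
    print(name)
    print("  2 * -LogLike         =", round(2 * neg_loglike, 3))
    print("  Prob>ChiSq (LR)      =", chi2_upper_tail_df2(lr))
    print("  Prob>ChiSq (Pearson) =", chi2_upper_tail_df2(pearson))
```

Large chi-square statistics correspond to small probabilities, which is why the marital status comparison (about 26.8) shows stronger evidence than the age group comparison (about 15.4).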
99. ics and Graphics Guide discusses analyzing categorical data in more detail. For more information about using the formula editor, see the chapter Using the Formula Editor in the JMP User Guide.
Chapter 7 Regression and Curve Fitting: Visualizing Relationships
This lesson demonstrates the interactive regression capabilities of JMP. The data is from Eppright et al. (1972) as reported in Eubank (1988, p. 272). The study subjects are young males. The variables in the data table are age (in months) and the ratio of weight to height. A third variable classifies the subjects into two groups based on age. The goal is to describe and model the growth pattern of subjects for the age range given in the data table.
Objectives
• Use the Fit Y by X command to fit least squares lines to continuous data.
• Fit polynomial curves and cubic splines to the data set and explore their goodness of fit.
• Journal and save analysis results.
• Use the Group By command to fit different lines to certain groups of data.
Contents
Choose Variable Roles
Fitting Models to Continuous Data
Understand the Summary of Fit Table
Understandin
100. idicu ies indie with lambda values of 10, 1,000, and 100,000. Lambda is a tuning factor that determines the flexibility of the spline. The Fit Spline command submenu, shown to the left in Figure 7.4, lists lambda values. The three new fits are overlaid on the scatterplot.
Figure 7.4 Comparison of Spline Fits (Bivariate Fit of ratio By age; legend: Linear Fit; Smoothing Spline Fit, lambda=10; Smoothing Spline Fit, lambda=1000; Smoothing Spline Fit, lambda=100000)
By inspecting the plot, see that the lambda=10 curve is too flexible, and therefore local error has too great an effect on it. The lambda=100,000 curve is too stiff. It is so straight that it does not reach down to model the lower ages closely. However, the lambda=1,000 curve fits well. Its shape is not influenced by local errors, and it appears to fit the data smoothly. If a report of these results is needed, journal these results.
101. ied.
• X, Factor identifies a column as an independent classification or explanatory variable whose values divide the rows into sample groups.
• Weight identifies a numeric column whose values supply weights for each response.
• Freq identifies a numeric column whose values assign a frequency to each row for the analysis.
• By identifies a column that is used to create a report consisting of separate analyses.
Step 5: View the Output Report
After you have cast columns into their roles, JMP provides output reports that include graphics and text. For more detail than is presented below, see the JMP User Guide.
Graphs and Charts
JMP reports are usually filled with graphs, charts, plots, and other graphical displays that show your results. For example, if you select Analyze > Distribution and assign several columns the Y, Response role in the Distribution window that appears, you create a report that contains a graphical display of each column assigned the Y, Response role. For the example shown in Figure 1.15, the Distribution command produces graphical displays that include:
• Histograms of both the brand and speed columns
• An outlier box plot of the continuous variable speed
Figure 1.15 Distribution Histograms and Outlier Box Plot (A: Histogram; B: Outlier Box Plot; brand levels WORD-O-MATIC, SPEEDYTYPE, REGAL)
102. in Figure 7.2 is the result of the Fit Y by X analysis. It is easy to see that the growth pattern is not random. A straight-line regression is a good baseline fit to compare with other regression curves.
Figure 7.2 Scatterplot of ratio by age (Bivariate Fit of ratio By age)
When clicked, the red triangle icon on the scatterplot title bar reveals a variety of fitting commands and additional display options. Options include Show Points, fitting commands, and other features. The Show Points command alternately hides or displays the points in the plot. Fitting options can be as simple as fitting a straight line or as involved as drawing density ellipses. Fitting options can be used repeatedly to overlay different fits on the same scatterplot. Begin with a simple line and try different techniques after inspecting the initial straight-line regression fit.
Fitting the Mean
Click the red triangle icon and select Fit Mean. This is the baseline fit that hypothesizes that there is no relationship between x and y. All other fits compare to this fit. Since the Fit Mean table is closed by default: Click the disclosure button by Fit
103. in the Tables menu groups data and computes summary statistics. The Summary command creates a summary table. This table summarizes columns from the active data table, called its source table. The Hot Dogs.jmp table is the source table in this example. A summary table has a single row for each level (value) of a specified variable. Select Tables > Summary. Select Type and click the Group button to see the window as shown in Figure 3.3. Click OK.
Figure 3.3 Summary Window
The Hot Dogs By Type summary table (Figure 3.4) appears in a new window. The Type column lists hot dog type, and the N Rows column gives the frequency of each type in the source table. A summary table is not independent of its source table. It has these characteristics:
• When rows are highlighted in the summary table, their corresponding rows highlight in the source table.
• The summary table is not saved when closed. Select File > Save As to specify a name and location
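The idea of a summary table with one row per level and an N Rows count can be sketched outside JMP. The example below is an illustration only: the (type, taste, count) cells are assumed values for a 54-brand table, and the function name `summarize` is hypothetical.

```python
from collections import Counter

# (type, taste, number of brands); counts are assumed for illustration
cells = [
    ("Beef", "Bland", 3), ("Beef", "Medium", 16), ("Beef", "Scrumptious", 1),
    ("Meat", "Bland", 6), ("Meat", "Medium", 8), ("Meat", "Scrumptious", 3),
    ("Poultry", "Bland", 1), ("Poultry", "Medium", 15), ("Poultry", "Scrumptious", 1),
]

# Expand the cells into one record per brand, like rows of a source table
rows = [(kind, taste) for kind, taste, n in cells for _ in range(n)]

def summarize(rows, key_index):
    """Return {level: N Rows} for the grouping column at key_index."""
    counts = Counter(row[key_index] for row in rows)
    return dict(sorted(counts.items()))

by_type = summarize(rows, 0)   # one entry per hot dog type
by_taste = summarize(rows, 1)  # one entry per taste rating
print(by_type)
print(by_taste)
```

Each key plays the part of a summary-table row, and each count plays the part of its N Rows value.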
104. l and double-clicking. The cursor becomes a flashing vertical bar. Correct a mistake by dragging the text entry bar across the incorrect entry and typing the correction over it.
• Press the Return key (or the Enter key on the numeric keypad) or the Tab key to move the highlight one cell to the right. Press Shift-Tab or Shift-Return (Enter on the numeric keypad) to move the highlight one cell to the left.
The completed data table (rows 2 through 6 as legible in the figure):
    Month    Control  Placebo  300mg  450mg
2   April    162      159      155    165
3   May      164      158      161    153
4   June     162      161      158    151
5   July     155      158      160    145
6   August   165      158      157    150
Plotting Data
When working with the Analyze and Graph menu commands, you tell JMP which columns to work with and what to do with them. This section shows how to plot the months across the horizontal (x) axis and the columns of blood pressure statistics for each treatment group overlaid on the vertical (y) axis. Select Graph > Chart. The window in Figure 2.4 appears. Assign x and y roles and choose the type of chart. This example specification is for a bar chart with data (as opposed to statistics) as chart points. Assure that the default choice, Vertical, is selected from the chart type drop do
105. The basic research question asks: Is the response probability for country of manufacture, size of car, or type of car a function of the age, sex, or marital status of the owner? Look at the data table to see what specific relationships lend insight into this question. You are interested in the relationships between the following automobile characteristics and demographics:
• manufacturing country by age
• manufacturing country by sex
• manufacturing country by marital status
• size of car by age
• size of car by sex
• size of car by marital status
• type of car by age
• type of car by sex
• type of car by marital status
Modify the Data Table
You can sometimes obtain better summary information from age groups rather than specific ages. In fact, dividing people into two age groups is often the basis for a valuable broad analysis. So let's find the median age, the age that divides the sample into two equal age groups.
Look Before You Leap
The distribution of a variable and its corresponding quantiles provide a good way to form sample groups. Use the distribution of the age column to find a reasonable value of age that divides the sample into two groups. Select Analyze > Distribution. When the Distribution window appears, select age as the analysis column (Y, Column) and
106. n Bao, Fang Chen, Susan Shao, Hugh Crews, Yusuke Ono, and Kelci Miclaus provide ongoing quality assurance. Additional testing and technical support are done by Noriki Inoue, Kyoko Takenaka, and Masakazu Okada from SAS Japan. Bob Hickey is the release engineer. The JMP manuals were written by Ann Lehman, Lee Creighton, John Sall, Bradley Jones, Erin Vang, Melanie Drake, Meredith Blackwelder, Diane Perhac, Jonathan Gatlin, and Susan Conaghan, with contributions from Annie Dudley Zangi and Brian Corcoran. Creative services and production were done by SAS Publications. Melanie Drake implemented the help system. Jon Weisz and Jeff Perkinson provided project management. Also thanks to Lou Valente, Ian Cox, Mark Bailey, and Malcolm Moore for technical advice. Thanks also to Georges Guirguis, Warren Sarle, Gordon Johnston, Duane Hayes, Russell Wolfinger, Randall Tobias, Robert N. Rodriguez, Ying So, Warren Kuhfeld, George MacKensie, Bob Lucas, Mike Leonard, and Padraic Neville for statistical R&D support. Thanks are also due to Doug Melzer, Bryan Wolfe, Vincent DelGobbo, Biff Beers, Russell Gonsalves, Mitchel Soltys, Dave Mackie, and Stephanie Smith, who helped us get started with SAS Foundation Services from JMP.
Acknowledgments
We owe special gratitude to the people who encouraged us to start JMP, to the alpha and beta testers of JMP, and to the reviewers of the documentation. In particular, we thank Michael Benson, Howard Yetter d
107. nal
Figure 3.11 Winning Hot Dog Brands (Bivariate Fit of $/lb Protein By $/oz; Bivariate Fit of $/lb Protein By Protein/Fat; labeled points: Thin Jack Veal, Calorie-less Turkey, Estate Chicken)
Chapter Summary
This lesson examined different hot dog brands for a cafeteria menu. A JMP table has data for 54 brands of hot dog, showing type of hot dog, taste preference, nutritional factors, and cost factors. To find the ideal hot dog, we did the following:
• Created a summary table that groups the data by hot dog type and by taste preference within each hot dog type.
• Used Graph > Chart to chart summary statistics and identify the subset of hot dog brands that are both the most nutritious and the best tasting.
• Assigned different markers to each type of hot dog.
• Used Analyze > Fit Y by X to see scatterplots that compare cost factors and nutritional factors.
• Selected the points representing the lowest cost, most nutritious brands and used the Label/Unlabel command in the Rows menu to identify the Calorie-less Turkey brand as a possible cafeteria hot dog.
See the JMP User Guide for details about the Summary command. For scatterplot and bar chart examples, see the JMP Statistics and Graphics Guide.
Chapter 4 Looking at Distributions Histograms
108. nal. The window similar to the one shown here prompts for a filename and appends .jrn to the filename to identify the file type. Leaving a journal file open causes each subsequent use of the Journal command to append results in the active window at the end of the journal contents. Name the journal Regression Results. On Windows, change the Save as type to RTF Files (RTF) and click Save. On Linux, change the Save as type option to .rtf (Rich Text File) and click Finished. On the Macintosh, select Export from the File menu instead of Save As. Select RTF and click Next. Click Export. Navigate to the file's directory on your system and open the file. The file should open in your default word processor, as shown here. Note that the graphics are saved as graphics and the reports are saved as text tables.
109. nal Views
Two-Dimensional Views
Three-Dimensional Views
Principal Components and Biplots
Chapter 9 Exploring Data
Solubility Data
This lesson examines compounds for those with unusual solubility patterns in various solvents. When you installed JMP, a folder named Sample Data was also installed. In that folder is a file named Solubility.jmp. Data from an experiment by Koehler, Grigorus, and Dunn (1988) are in the Solubility.jmp file. Open Solubility.jmp. There are 72 compounds tested with six solvents in columns called 1-Octanol, Ether, Chloroform, Benzene, Carbon Tetrachloride, and Hexane. The Labels column in the table should serve as a label variable (Figure 9.1), so when you plot them, the compound names instead of row numbers identify points. Although this is already done for you in Solubility.jmp, you should know how to assign the label role to columns. Select the columns. Select Cols > Label/Unlabel, or click the red triangle in the columns panel and ensure Labels is highlighted.
Figure 9.1 Solubility Data Table (columns: Labels, 1-Octanol, Ether, Chloroform, Benzene, Carbon Tetrachloride, Hexane)
110. names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Contents
JMP Introductory Guide
Introducing JMP
What You Need to Know
Learning About Statistical and JSL Terms
Using the Context-Sensitive Help
Learning JMP Tips & Tricks
Using This Book in Combination with Other Included Books
Conventions Used in this Book
Step 3: Learn About the Data Table
Selecting Rows and Columns
111. nels.
To edit text in the data grid:
1. Click the cell that you want to edit. The cell highlights.
2. Click again next to any character to mark an insertion point.
3. The I-beam deposits a vertical blinking bar.
4. Use the keyboard to make changes.
To edit a column name:
1. Click the column name to highlight the column.
2. Press the Enter key to change the I-beam cursor to an insertion point.
3. Use the keyboard to make changes.
Large Cross Cursor
The cursor becomes a large cross when moved into a column or row selection area. When moved over a column name, you can edit the name. To do so, click the column name and begin typing. The cross cursor can also be used to select rows and columns. To select a column, click the area above the column name. See the next section, Selecting Rows and Columns (p. 11), for a detailed explanation of selecting rows and columns.
Double Arrow Cursor
The cursor changes to a double arrow cursor when positioned on a column boundary or on a panel splitter. Dragging the double arrow cursor changes the column width or the panel size.
Figure 1.11 Changing the width of a column (click and drag to change the width of a column)
112. ng Data
Figure 2.6 Axis Specification Window
In this example, the plotted values range from about 145 to 175. Enter these figures into the Axis Specification window for Minimum and Maximum. Change the increment for the tick marks from 50 to 1 by entering a 1 in the Increment box. Click OK.
Tip: The magnifier tool, found in the Tools menu and the cursor toolbar, can also be used to change the scale of graphs. Drag the magnifier diagonally across the points of interest to see the chart automatically adjust. Double-click the plot frame to reset the plot to its original scale.
Click the edge of the graph and drag it to the right to increase its width. Change the name of the axis from Y to Blood Pressure. Place the cursor over Analysis Report until the cursor becomes an I-bar. Click for a text box and enter Blood Pressure.
Document the Report
The chart also needs a title and other documentation to make it easy to interpret. The annotate tool places text on the report. Refer to Figure 2.7 to see where to place the following steps. Select the annotate tool. Click and drag in the report to create a text box.
113. o get a feel for these data, use the Distribution command. Select Analyze > Distribution to see the window in Figure 3.2. Select all the variables except Product Name and click the Y, Columns button. To select more than one item, highlight the first item, hold down the Shift key, and press the down arrow button until all desired items are selected. Click OK.
Figure 3.2 Distribution Command
Examine the resulting report to see the distributions and levels of each variable.
Grouping Data
Of course, health is a primary concern of a school cafeteria. It is interesting to see whether the type of hot dog plays a role in healthfulness. In particular:
• Which type of hot dog has the fewest calories?
• Is the amount of sodium different in the three types of hot dogs?
• Which hot dogs have the highest protein content?
• Which hot dogs taste good and are healthy?
To address these issues, the data need to be grouped into hot dog type and taste preference categories, with summary statistics computed for each group. The Summary command
114. o highlight it and then select Rows > Label/Unlabel. The red bracket along the edge of the box identifies the shortest half, which is the most dense 50% of the observations.
Figure 4.4 Outlier Box Plot (callouts: interquartile range, 25th percentile, 75th percentile, possible outliers, shortest half)
Using Quantile Box Plots
In histograms whose variables are continuous, you display a quantile box plot by clicking the triangle icon in the variable's title bar and selecting Quantile Box Plot. A quantile box plot shows the location of preselected percentiles, sometimes called quantiles, on the response axis. The median shows as a line in the body of the box. The ends of the box locate the 25th and 75th quantiles. The number of other quantile lines depends on the available space. The accompanying text report lists the data values for each of the standard quantiles. The box also contains a means diamond. The two diamond points within the box identify the 95% confidence interval of the mean. The line that passes through the two diamond points spanning the box identifies the sample mean. Looking at the quantile box plot and means diamond together helps you see if data are distributed normally, as shown in Figure 4.5. If data are distributed normally (bell shaped), then the 50th quantile and the mean are the same, and other quantiles show symmetrically above and below them.
Figure 4
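The symmetry check described for the quantile box plot can be tried on small numbers. This is a rough sketch with hypothetical data and a hypothetical helper name, `quantile_summary`; it uses Python's default quantile method, which is an assumption and may not match the method JMP uses for its quantile report.

```python
import statistics

def quantile_summary(values):
    """Return the quartiles and the mean of a list of numbers."""
    # statistics.quantiles with n=4 returns the 25th, 50th, and 75th quantiles
    q25, median, q75 = statistics.quantiles(values, n=4)
    return {"q25": q25, "median": median, "q75": q75,
            "mean": statistics.fmean(values)}

# Hypothetical samples: one symmetric, one with a single high value
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 30]

for label, sample in (("symmetric", symmetric), ("skewed", skewed)):
    s = quantile_summary(sample)
    print(label, s)
    print("  mean minus median:", s["mean"] - s["median"])
```

In the symmetric sample the mean and the median coincide and the quartiles sit at equal distances from them; in the skewed sample a single high value pulls the mean above the median while the quartiles stay put.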
115. of numeric data or summary statistics for values of X columns.
Figure 3.7 window callouts: click Statistics to choose a statistic; click to add highlighted columns as the x variable; remove the Overlay checkmark; change to Horizontal.
It appears that poultry hot dogs have fewer calories on average than the other two hot dog types. Also note that the poultry hot dogs have slightly more sodium. The most visible difference is that the protein-to-fat ratio appears much higher in poultry hot dogs. Repeat the above steps using taste to produce another set of bar charts. Select Graph > Chart. Select Calories, Sodium, and Protein/Fat in the Select Columns list. Click Statistics > Mean. Select Taste from the Select Columns list. Click Categories, X, Levels. Remove the Overlay checkmark. Change to Horizontal. It might not be surprising to see from the bar charts that hot dogs rated as bland tasting have, on average, more calories, more sodium, and a lower protein-to-fat ratio. It can be seen
116. omparisons (p. 66) discusses the multiple comparison tests the comparison circles represent.
Quantify Results
Now examine the report beneath the plot that consists of several tables. The Summary of Fit table, shown in Figure 5.8, summarizes the typing data distribution with these statistics:
• Rsquare (R²) quantifies the proportion of total variation in the typing scores resulting from different keyboards rather than from different people.
• Rsquare Adj adjusts R² to make it more comparable over models with different numbers of parameters.
• Root Mean Square Error (RMSE) is a measure of the variation in the typing scores that can be attributed to different people rather than to different machines.
• Mean of Response is the mean (average) of all the typing scores.
• Observations is the total number of scores recorded.
Figure 5.8 Summary of Fit Report for speed By brand (Oneway Analysis of speed By brand; Rsquare 0.67446; Adj Rsquare 0.627954; Root Mean Square Error 4.27033; Observations (or Sum Wgts) 17)
Analysis of Variance
When you select the Means/Anova command from the red triangle icon in the title bar, JMP gives you a standard analysis of variance table. If there are only two group levels, the report also includes a t-test table. Note that the value of the F probability (Prob > F) for the Analysis of Variance is 0.0004. This implies that differences as great as seen in
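The relationship between Rsquare and Rsquare Adj in the Summary of Fit figures can be reproduced by hand. The sketch below assumes the standard adjusted R² formula, which the guide does not spell out, and assumes three fitted parameters (an intercept plus two effects for three keyboard brands); the function name `adjusted_rsquare` is hypothetical.

```python
def adjusted_rsquare(rsq, n_obs, n_params):
    """Adjusted R-square: penalizes R-square for the number of fitted parameters."""
    return 1 - (1 - rsq) * (n_obs - 1) / (n_obs - n_params)

# Rsquare and Observations as read from the Summary of Fit report;
# n_params=3 is an assumption (three brand groups)
adj = adjusted_rsquare(0.67446, n_obs=17, n_params=3)
print(round(adj, 6))
```

Because the penalty grows with the number of parameters, the adjusted value is always at or below the unadjusted one, which is what makes it more comparable across models of different sizes.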
117. on of each ordinal or nominal level within the sample. It has a section for each level of the variable, where the size of the section is proportional to the corresponding group's size. Think of a mosaic plot as a bar chart with its bars stacked end to end.
Understanding Histograms of Continuous Values
If your variables are continuous, the histogram looks slightly different than if the variables are nominal or ordinal. To see the difference, look at the distribution of the continuous variables height and weight in the sample of students. Click the Big Class.jmp data table to make it the active window. Again choose the Distribution command from the Analyze menu. Designate the height and weight columns as Y, Columns variables. Click OK. Histograms are displayed for height and weight, as shown in Figure 4.3.
Figure 4.3 Histograms of height and weight
Both height and weight appear to have approximately normal (bell-shaped) distributions, but notice the extremely high weight value. It will be examined more closely later. It is important to present data in the best possible form. Sometimes it is worthwhile to experiment with the shape of a histogram by changing the number of bars or altering their arrangement on the axis. To adjust the histogram bars: Select the hand from the graph cursor toolbar. Position the hand on the bars and press the mouse button to grab t
118. The Least Squares Means table for the popcorn*batch effect tells the whole story. Batch size makes no difference for the plain brand popcorn, but popping in small batches increases the yield in the new gourmet brand. Because the factorial model with two factors is a good prediction model, save the prediction formula. Click the red triangle icon in the Response yield title bar and select Save Columns > Prediction Formula, as shown in Figure 8.8.
Figure 8.8 Prediction Formula for Popcorn Yield (Save Columns submenu and the saved Pred Formula column; Least Squares Means Table columns: Level, Least Sq Mean, Std Error)
119. or Context-Sensitive Help
• In some reports, make a small circle with your cursor to reveal information about the item in the area.
Figure 1.4 Making a Circle with the Cursor Displays Help (example tip: Quantiles, the value such that a given percentage of the data points in the sample are less than that value)
• In some menus, hold the cursor on menu items to reveal information about the menu item.
Figure 1.5 Display a Description of Menu Items
Learning JMP Tips & Tricks
When you first start JMP, you see the Tip of the Day window. This window provides tips about using JMP that you might not know. To turn off the Tip of the Day, clear the Show tips at startup check box. To view it again, select Help (View on the Macintosh) > Tip of the Day. Also use the
120. ormal distribution with the means, standard deviations, and correlation estimated from the data. The concept of distance that takes into account the multivariate normal density contours is called Mahalanobis distance. Though only three dimensions can be visualized at a time, the Mahalanobis distance can be calculated for any number of dimensions. To produce a plot of the Mahalanobis distance: Select Outlier Analysis > Mahalanobis Distances from the menu accessed by the red triangle at the top of the multivariate report. Figure 9.7 shows the Mahalanobis distance by the row number for each data point. To label these points: Select the brush tool from the tools palette. While holding down the Shift key, drag the brush over the points labeled in Figure 9.7. These are the five points with the greatest Mahalanobis distances. Select Rows > Label/Unlabel.
Figure 9.7 Mahalanobis Distance Plot to See Multivariate Outliers (Outlier Analysis, Mahalanobis Distances; axes: Distance by Row Number; labeled points: SULFATHIAZOLE, HYDROQUINONE, P-HYDROXYBENZALDEHYDE, 8-QUINOLINOL, CAFFEINE)
Chapter Summary
In this example, commands from the Analyze and Graph menus
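The Mahalanobis distance described above can be sketched for two variables, where the covariance matrix is small enough to invert by hand. This is a minimal illustration, not JMP's implementation: the data points and the function name `mahalanobis_2d` are hypothetical, and the sample covariance (divisor n - 1) is an assumption.

```python
import math

def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample variances and covariance (divisor n - 1)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form (dx, dy) S^-1 (dx, dy)^T with the 2x2 inverse written out
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical data: five points near a line, plus one point off the trend
points = [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8), (3, 5.0)]
distances = mahalanobis_2d(points)
for row, d in enumerate(distances, start=1):
    print(row, round(d, 3))
```

The last point is not far from the mean in either coordinate alone, but it lies off the correlated trend, so it receives the largest distance. That is the sense in which the measure takes the density contours into account.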
121. oth x and y have nominal or ordinal values.
• Analysis of variance when x is nominal or ordinal and y has continuous values, as in the example shown here.
• Logistic regression when x is continuous and y has nominal or ordinal values.
• Regression analysis when both x and y have continuous values.
Choose Variable Roles
To discover if typing speed is related to (dependent on) a brand of keyboard, follow these steps: Choose Analyze > Fit Y by X. Select brand as X, Factor and speed as Y, Response. See Figure 5.1. Click OK. The plot shown in Figure 5.2 appears.
Figure 5.1 The Fit Y by X Launch Window
Selecting Fit Y by X and completing the window produces a statistical analysis appropriate for the variable roles (x and y) and the modeling type (continuous, or nominal or ordinal) of each variable.
• Y, Response identifies a response (dependent) variable.
• X, Factor identifies a classification (independent) variable.
The next step is to choose an analysis that investigates if there is a statistical difference between the group mean values.
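The way the modeling types of x and y determine the analysis can be written as a small lookup. This sketch is illustrative only: the function name `choose_analysis` is hypothetical, and the label used for the case where both variables are nominal or ordinal ("contingency analysis") is an assumption based on the report titles used elsewhere in the guide.

```python
CATEGORICAL = {"nominal", "ordinal"}

def choose_analysis(x_type, y_type):
    """Map the modeling types of x and y to the kind of analysis performed."""
    for modeling_type in (x_type, y_type):
        if modeling_type not in CATEGORICAL | {"continuous"}:
            raise ValueError(f"unknown modeling type: {modeling_type}")
    x_cat = x_type in CATEGORICAL
    y_cat = y_type in CATEGORICAL
    if x_cat and y_cat:
        return "contingency analysis"  # assumed label for this case
    if x_cat:
        return "analysis of variance"
    if y_cat:
        return "logistic regression"
    return "regression analysis"

# brand is nominal and speed is continuous in the typing example
print(choose_analysis("nominal", "continuous"))
```

For the typing example, a nominal x (brand) and a continuous y (speed) select the analysis of variance case.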
122. oup optional The new columns of statistics are displayed in the Hot Dogs By Type table (top table of Figure 3.6).
• Repeat the previous steps to create a second summary table of Hot Dogs by Taste to look at health factors and hot dog tastiness.
The Hot Dogs By Taste summary table shows average calories, sodium, and protein-to-fat ratio for each taste category (bottom table of Figure 3.6).
Figure 3.6 Summary Statistics for Hot Dog Groups
Hot Dogs By Type
Type      N Rows   Mean(Calories)   Mean(Sodium)   Mean(Protein/Fat)
Beef      20       156.85           401.15         1.45
Meat      17       158.705882       415.529412     1.41176471
Poultry   17       115.764706       459            3.23529412
Hot Dogs By Taste
Taste     N Rows   Mean(Calories)   Mean(Sodium)   Mean(Protein/Fat)
Bland     10       172.7            455.5          1.4
Medium    39       139.435897       415.102564     2.20512821
Scrump    5        137.8            384            1.5
Charting Statistics from Grouped Data
The summary tables in Figure 3.6 show the summary statistics in tabular form, but bar charts are better for visual comparison. The Chart command on the Graph menu can also summarize data and then create charts of the summarized data. Make sure the Hot Dogs.jmp table is active.
• Select Graph > Chart.
• Assign variable roles as shown in Figure 3.7. Charts like those below should appear.
Figure 3.7 Charting Data
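The grouping step behind a summary table can be sketched as follows. The rows here are hypothetical, not values from Hot Dogs.jmp, and `group_means` is an illustrative helper rather than anything JMP provides; it returns the row count and mean for each group, like the N Rows and Mean columns in Figure 3.6.

```python
from collections import defaultdict

# Hypothetical (type, calories) rows; not taken from Hot Dogs.jmp.
rows = [("Beef", 150), ("Beef", 160), ("Meat", 170), ("Poultry", 100), ("Poultry", 120)]

def group_means(rows):
    """Map each group label to (number of rows, mean of the values)."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: (len(values), sum(values) / len(values))
            for label, values in groups.items()}

summary = group_means(rows)
for label, (n_rows, mean_value) in summary.items():
    print(label, n_rows, mean_value)
```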
123. Step 3: Learn About the Data Table
Opening or creating a data table creates a data grid and table information panels like the ones shown in Figure 1.8. The counts of table rows and columns appear in the corresponding panels to the left of the data grid. In the data grid, a row number identifies each row, and each column has a column name. Rows and columns are sometimes called observations and variables in an analysis.
Figure 1.8 A Data Table [screenshot: the Typing Data table with columns brand and speed]
124. quares Mean Square F Ratio
Source    DF   Sum of Squares   Mean Square   F Ratio
Model     3    121.02000        40.3400       16.9853
Error     12   28.50000         2.3750        Prob > F
C Total   15   149.52000                      0.0001
The Analysis of Variance table shows these quantities:
• Source identifies the sources of variation in the popcorn yield values (Model, Error, and C Total).
• DF records the degrees of freedom for each source of variation.
• Sum of Squares (SS for short) quantifies the variation in yield. C Total is the corrected total SS. It is divided (partitioned) into the SS for Model and the SS for Error. The SS for Model is the variation in the yield explained by the analysis of variance model, which hypothesizes that the model factors have a significant effect. The SS for Error is the remaining or unexplained variation.
• Mean Square is a sum of squares divided by its associated degrees of freedom (DF).
• F Ratio is the model mean square divided by the error mean square.
• Prob > F is the probability of a greater F ratio occurring if the variation in popcorn yield resulted from chance alone rather than from the model effects.
In this example, the p-value (Prob > F) is 0.0001. JMP indicates a significant p-value by placing an asterisk beside it. The low value of this p-value implies that the difference found in the popcorn yield produced by this experiment is expected only 1 time in 10,000 similar trials if the model factors do not affect the popcorn yield.
Summary Reports For
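The Mean Square and F Ratio definitions above can be checked by hand. The sketch below uses the Model sum of squares, the Model degrees of freedom, and the Error mean square from the table; the variable names are illustrative only.

```python
# Values read from the Analysis of Variance table above.
ss_model, df_model = 121.02, 3
ms_error = 2.3750

# Mean Square = Sum of Squares / DF
ms_model = ss_model / df_model

# F Ratio = model mean square / error mean square
f_ratio = ms_model / ms_error

print(round(ms_model, 4), round(f_ratio, 4))
```

Both results agree with the Mean Square and F Ratio columns of the table to four decimal places.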
125. The JMP data table window is a flexible way to prepare data. Using it, you can accomplish a variety of table management tasks, such as:
• Editing the value in any cell
• Changing a column's width by dragging the column line
• Hiding columns temporarily or deleting columns permanently
• Adding rows or rearranging the order of rows
• Adding columns or rearranging the order of columns
• Selecting a subset of rows for analysis and saving that subset for further use
• Sorting or combining tables
For details, see the JMP User Guide.
Specifying the Values Type
The small icon to the left of the column name in the columns panel can be clicked. Use it to declare the modeling type of the values in the column. JMP uses three modeling types to determine how to analyze the column's values:
• Continuous: values are numeric measurements.
• Ordinal: values are ordered categories, which can have either numeric or character values.
• Nominal: values are numeric or character classifications.
Modeling types are changeable depending on how you want to look at your data. For example, a variable like age should be specified as continuous to find the mean (average) age, but as nominal or ordinal to find frequency counts for each age value. The default modeling type is nominal for character values and continuous for numeric values.
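The age example above amounts to summarizing the same column in two ways. A minimal sketch, using a hypothetical list of ages rather than any JMP sample table:

```python
from collections import Counter
from statistics import mean

# Hypothetical ages; not taken from a JMP sample table.
ages = [12, 12, 13, 13, 13, 14]

# Treated as continuous: a single numeric summary such as the mean.
average_age = mean(ages)

# Treated as nominal or ordinal: a frequency count for each distinct value.
age_counts = Counter(ages)

print(average_age)
print(sorted(age_counts.items()))
```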
126. rence in proportions. The Chi-squared p-values (Prob>ChiSq) of 0.0005 mean that proportions as varied as these are expected to occur only five times in 1,000 similar surveys.
Chapter Summary
This chapter looked at relationships between categorical variables obtained from a survey. The survey recorded age, sex, marital status, and information about the type of automobile owned by a random sample of people in the same geographical area. The auto information included manufacturing country, size, and type of car. Car types were classified as work, sporty, and family. The question "Is the size of car, type of car, or manufacturing country related to the age, gender, or marital status of the owner?" was investigated. The Fit Y by X command produced nine mosaic charts with supporting statistical summaries that show:
• No relationship between either sex or age and manufacturing country
• A significant relationship between marital status and manufacturing country, with married people more likely to own American cars than single people
• No relationship between sex, age, or marital status and size of car
• No relationship between sex and type of car
• Significant relationships between marital status and type of car. As might be expected, married people over 30 years old were more likely to own family-type cars than younger, single people.
The chapter Contingency Tables Analysis in the JMP Statist
127. Comparison circles for the three word processor groups are shown in Figure 5.7. The center of each circle is aligned with the mean of the group it represents. If you select Student's t test instead, the diameter of each circle spans the 95% confidence interval for each group. Whenever two circles intersect, the confidence intervals of the two means overlap, suggesting that the means might not be significantly different. Whenever two circles do not intersect, the group means they represent are significantly different.
• Click the SPEEDYTYPE comparison circle.
This graphically illustrates that the SPEEDYTYPE machine is statistically better than the other machines. The comparison circles highlight to show the statistical magnitude of the difference between typing scores. Circles for groups that are statistically the same have the same color.
Figure 5.7 Comparison Circles [plot: Oneway Analysis of speed By brand, with All Pairs Tukey-Kramer 0.05 comparison circles for REGAL, SPEEDYTYPE, and WORD-O-MATIC]
The comparison circle for the SPEEDYTYPE brand does not intersect with either of the other two. The REGAL and WORD-O-MATIC brands are statistically slower than SPEEDYTYPE but do not appear different from each other. A later section Mean Estimates and Statistical C
128. response y variables; sex, marital status, and age group are independent x variables. This example shows how to complete the Fit Y by X window (Figure 6.3):
• Select the three y variables: country, size, and type.
• Click the Y, Response button.
• Assign the x variables sex, marital status, and age group by selecting them and clicking the X, Factor button.
• Click OK when finished.
Figure 6.3 The Fit Y by X Window [screenshot: country, size, and type assigned to Y, Response; sex, marital status, and age group assigned to X, Factor]
Contingency Table Mosaic Plots
If both x and y have either nominal or ordinal values, JMP displays a mosaic plot with accompanying text reports for each combination of columns assigned x and y modeling roles. A mosaic chart has side-by-side divided bars for each level of its x variable. The bars are divided into segments proportional to each discrete (level) value of the y variable. The mosaic chart in Figure 6.4 shows the relationship of marital status to the manufacturing country. The width of each bar is proportional to the sample size. When the lin
129. Contingency tables of size (Large, Medium, Small) by sex, marital status, and age group. Each cell lists Count, Total %, Col %, and Row %.
sex by size
          Large                     Medium                    Small                     Total
Female    17  5.61  40.48  12.32    63  20.79  50.81  45.65   58  19.14  42.34  42.03   138  45.54
Male      25  8.25  59.52  15.15    61  20.13  49.19  36.97   79  26.07  57.66  47.88   165  54.46
Total     42  13.86                 124  40.92                137  45.21                303
Tests: N 303, DF 2, -LogLike 1.1939943, RSquare (U) 0.0039. Likelihood Ratio ChiSquare 2.388, Prob>ChiSq 0.3030. Pearson ChiSquare 2.388, Prob>ChiSq 0.3030.
marital status by size
          Large                     Medium                    Small                     Total
Married   30  9.90  71.43  15.31    84  27.72  67.74  42.86   82  27.06  59.85  41.84   196  64.69
Single    12  3.96  28.57  11.21    40  13.20  32.26  37.38   55  18.15  40.15  51.40   107  35.31
Total     42  13.86                 124  40.92                137  45.21                303
Tests: N 303, DF 2, -LogLike 1.3763506, RSquare (U) 0.0045. Likelihood Ratio ChiSquare 2.753, Prob>ChiSq 0.2525. Pearson ChiSquare 2.743, Prob>ChiSq 0.2537.
age group by size
          Large                     Medium                    Small                     Total
0         26  8.58  61.90  18.06    61  20.13  49.19  42.36   57  18.81  41.61  39.58   144  47.52
1         16  5.28  38.10  10.06    63  20.79  50.81  39.62   80  26.40  58.39  50.31   159  52.48
Total     42  13.86                 124  40.92                137  45.21                303
Tests: N 303, DF 2, -LogLike 2.7865079, RSquare (U) 0.009. Likelihood Ratio ChiSquare 5.573, Prob>ChiSq 0.0616. Pearson ChiSquare 5.546, Prob>ChiSq 0.0625.
The market survey categorizes cars based on both size and type, where a car's type is work, sporty, or family.
• Scroll to see the plots that show the relationship between type of car and the three x var
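As a check on how the Pearson statistic relates to the cell counts, the sketch below recomputes it for the sex by size table from the counts alone (Female 17, 63, 58; Male 25, 61, 79). The helper name `pearson_chi_square` is illustrative. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); other degrees of freedom would need a statistics library.

```python
import math

# Cell counts for sex by size (columns: Large, Medium, Small).
counts = {"Female": [17, 63, 58], "Male": [25, 61, 79]}

def pearson_chi_square(table):
    """Return (Pearson chi-square, degrees of freedom) for a two-way table of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = pearson_chi_square(counts)
# Closed form valid only when df == 2.
p_value = math.exp(-chi2 / 2) if df == 2 else None
print(round(chi2, 3), df, p_value)
```

A p-value this large gives no evidence that car size depends on the owner's sex.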
130. rt and Abbrev Date that you would use when writing JSL. Highlight an operator name to see a description of the operator appear in the window on the right. Click Topic Help to see more information in the online Help.
• Object Scripting Index: Presents a list of JSL objects. These are scriptable JSL building blocks. Highlight an object name, and the messages the object recognizes appear in the window on the right.
• DisplayBox Scripting Index: Presents a list of the elements that make up a JMP report. These elements are the JSL building blocks with which you build output. Highlight a Display Box, and the available messages for each object appear in the window on the right.
Using the Context-Sensitive Help
To use the online Help system, select one of the following methods:
• Select Help from analysis construction windows (as shown in Figure 1.2) and report windows.
Figure 1.2 Help Is Available [screenshot: Distribution launch window with a Help button]
• Select the help tool from the Tools menu and click a place in a data table or report on which you need assistance (Figure 1.3). Context-sensitive help tells about the items in the area you clicked.
Figure 1.3 Use the Help Tool f
131. rug developed to lower blood pressure. Data were recorded over a six-month period for the following treatment groups:
• 300 mg dose
• 450 mg dose
• placebo
• control
Figure 2.1 shows the mean monthly blood pressure for each group recorded in a journal. This lesson shows how to enter data values into the data table and to create a single neat and informative line plot that shows the study results.
Objectives
• Create rows and columns in a data table one at a time and in groups
• Enter data into JMP
• Create a chart using the Chart command
• Rescale axes in a plot
• Animate a plot
Figure 2.1 Blood Pressure Study
Month    Control   Placebo   300mg   450mg
March    165       163       166     168
April    162       159       165     163
May      164       158       161     153
June     162       161       158     151
July     166       158       160     145
August   163       158       157     150
Contents
Creating Rows and Columns in a JMP Data Table
Add Columns
Set Column Characteristics
132. s There are a variety of analyses available through the Analyze and Graph menus in the main menu. An alternate way to access these analyses is through toolbar buttons and selections in the JMP Starter window. Selecting an analysis in the Analyze or Graph menus produces graphs, charts, plots, and tables. For example, to see a histogram of columns in the data table you have open, select Analyze > Distribution. Then complete the window and click OK.
Casting Columns Into Roles
After you select an analysis from the main menu, a window appears that asks you to cast columns into roles. For example, if you select Analyze > Fit Y by X from the main menu, you see the window in Figure 1.14.
Figure 1.14 Fit Y by X Window [screenshot: Select Columns list with brand and speed; roles Y, Response; X, Factor; Block; Weight; Freq; By]
The JMP analysis methods are like stages or platforms for variables to dramatize their values. Each analysis requires information about which variables play what roles in an analysis. The most typical variable roles are:
• Y, Response: Identifies a column as a response or dependent variable whose distribution is to be stud
133. s assigns the value Babies to each child less than 12 months old and Toddlers to children who are 12 months or older.
Figure 7.5 Computed Age Grouping Variable [screenshot: Growth data table with a new column, group, defined by the formula: if age < 12 then "Babies", else "Toddlers"]
• Click the Bivariate report to make it the active window.
• Clear the Smoothing Spline fits still showing (such as those seen in Figure 7.4) using each fit's Remove Fit command. Click the red triangle icon for all three smoothing spline fits and select Remove Fit for each one.
• Click the red triangle icon and select Group By to display the Select a Grouping Column window.
• Select group, the newly created grouping variable, and click OK.
• Choose the Fit Line command.
With a grouping variable (group) in effect, the overlaid regression lines shown in Figure 7.6 appear automatically. The poin
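The grouping rule in the formula above (age in months below 12 gives Babies, otherwise Toddlers) is a two-way conditional. A minimal sketch, with a hypothetical function name and hypothetical ages:

```python
def age_group(age_months):
    """Label a child by age in months: under 12 is 'Babies', 12 or more is 'Toddlers'."""
    return "Babies" if age_months < 12 else "Toddlers"

# Hypothetical ages in months, including the boundary value 12.
groups = [age_group(age) for age in (0.5, 11.5, 12, 30)]
print(groups)
```

Note that a child of exactly 12 months falls in the Toddlers group, matching the wording "12 months or older".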
134. see the list of means for each group, look at the Means for Oneway Anova table. This table summarizes the scores for each brand and reveals what level of performance to expect. The Means for Oneway Anova table shows the following information:
• Level lists the name of each group.
• Number is the number of scores in each group.
• Mean is the mean of each group.
• Std Error is the standard error of each group mean.
• Lower 95% is the lower 95% confidence limit for each group mean.
• Upper 95% is the upper 95% confidence limit for each group mean.
Means for Oneway Anova
Level          Number   Mean      Std Error   Lower 95%   Upper 95%
REGAL          8        70.2500   1.5098      67.012      73.488
SPEEDYTYPE     5        80.8000   1.9097      76.704      84.896
WORD-O-MATIC   4        66.5000   2.1352      61.921      71.079
Std Error uses a pooled estimate of error variance.
When you select the Compare Means command from the red triangle icon in the title bar, JMP gives several multiple comparison options to statistically compare pairs of groups. This example uses the All Pairs, Tukey HSD option, which performs a statistical means comparison for the three pairs of means using the Tukey-Kramer HSD (honestly significant difference) test (Tukey 1953, Kramer 1956). This means comparison method compares the actual difference between group means with the difference that would be significantly different. The difference needed for statistical significance is called the LSD lea
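The note that Std Error uses a pooled estimate of error variance can be illustrated numerically. The sketch below assumes the Error sum of squares (255.3) and degrees of freedom (14) reported in the Analysis of Variance table for this typing example, and assumes the standard formula: the square root of the error mean square divided by the group size. The variable names are illustrative.

```python
import math

# Error row of the Analysis of Variance table for the typing example (assumed inputs).
ss_error, df_error = 255.3, 14
mse = ss_error / df_error  # pooled estimate of the error variance

group_sizes = {"REGAL": 8, "SPEEDYTYPE": 5, "WORD-O-MATIC": 4}

# Standard error of each group mean from the pooled variance.
std_errors = {level: math.sqrt(mse / n) for level, n in group_sizes.items()}

for level, se in std_errors.items():
    print(level, round(se, 4))
```

Because the variance estimate is shared, the standard errors differ only through the group sizes: the largest group (REGAL) has the smallest standard error.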
135. Cast Columns Into Roles
Step 5: View the Output Report
Statistical Tables and Text
Creating a JMP Data Table: Entering and Plotting Data
Creating Rows and Columns in a JMP Data Table
Set Column Characteristics
Summarizing Data: Look Closely at the Data
Charting Statistics from Grouped Data
Charting Statistics for Two Groups
Finding a Subgroup with Multiple
136. st significant difference. The graphical results show as the comparison circles previously seen in Figure 5.7. The circles' centers represent the actual difference in the group means. The corresponding report is the Means Comparisons table (Figure 5.9), which shows the actual absolute difference between each mean and the LSD. The top half of the report gives information based on a Student's comparison of each pair. The bottom half shows the results of the Tukey-Kramer multiple comparison tests. Pairs with a positive value are significantly different. The Means Comparisons table confirms the visual results in Figure 5.7.
Figure 5.9 Means Comparisons Table for Tukey-Kramer HSD
Comparisons for all pairs using Tukey-Kramer HSD: q* 2.61725, Alpha 0.05
Abs(Dif)-LSD for each pair: SPEEDYTYPE and REGAL 4.178328; SPEEDYTYPE and WORD-O-MATIC 6.802475; REGAL and WORD-O-MATIC -3.09427. Positive values show pairs of means that are significantly different.
Level          Letter   Mean
SPEEDYTYPE     A        80.800000
REGAL          B        70.250000
WORD-O-MATIC   B        66.500000
Levels not connected by same letter are significantly different.
Level        - Level          Difference   Std Err Dif   Lower CL   Upper CL
SPEEDYTYPE   - WORD-O-MATIC   14.30000     2.864624      6.802475   21.79753
SPEEDYTYPE   - REGAL          10.55000     2.434452      4.178328   16.92167
REGAL        - WORD-O-MATIC   3.75000      2.615032      -3.09427   10.59427
Chapter Summary
137. To start the online tutorial, click Enter Beginner's Tutorial. Or, click the Close button to close the window and follow the tutorials in this book.
Step 2: Open a JMP Data Table
There are several ways to open a data table:
• Go to your Sample Data directory in its default location. For example:
In Windows: C Program Files SAS JMP 8 Support Files English Sample Data
In Linux: Opt SAS JMP8 Support Files English Sample Data
In Macintosh: Library Application Support JMP 8 English Sample Data
• Selecting File > New or clicking the New Data Table button on the JMP Starter window creates and displays a data table with an empty data grid. First add rows and columns, then type in or paste in new data. For details, see the JMP User Guide.
• Selecting File > Open or clicking the Open Data Table button on the JMP Starter window presents a file selection window (Figure 1.7) with a list of existing tables. Select a file and click Open. For details, see the JMP User Guide.
Figure 1.7 The Open Data File Window [screenshot: file list of the Sample Data folder]
138. the number of responses in categories and subcategories. Counting is easy, but interpreting the relationship between categories based on counts is more complex. It requires computing probabilities and evaluating the likelihood of these probabilities compared to expectations. For example, an American automobile manufacturer, feeling the pinch of competition from foreign auto sales, needs a market analysis before proceeding with a multimillion-dollar advertising campaign. A random sample of people is surveyed. The auto manufacturer wants to know each participant's age, sex, marital status, and auto information. The auto information consists of the manufacturing country, the car's size, and the car's type (whether it is a family, work, or sporty car). This information might provide the advertising experts with direction for the upcoming advertising campaign. Who buys what?
Objectives
• Use the Fit Y by X command to compare two variables consisting of categorical data
• Use the formula editor to re-code a categorical variable as a numeric variable
• Produce and examine graphs and statistics appropriate for the comparison of proportions, such as Chi-squared tests and mosaic plots
Contents
Modify the Data Table
Cast Variables Into Roles
139. the remaining values. To highlight these outliers and exclude them from the analysis:
• Select the lasso tool from the Tools menu or toolbar.
• Drag the lasso around the points to be excluded.
• Select Rows > Exclude/Unexclude to exclude the selected points.
• Right-click the selected points, select Row Markers, and select X to assign the X marker to the excluded points.
• Click the red triangle icon in the title bar and choose the Fit Line command again to see the results of excluding the low-age points.
The scatterplot shown here has both regression lines. The low-age points still show on the plot but are not included in the second regression line's computation.
[plot: Bivariate Fit of ratio By age, showing the fitted line for all points and the fitted line after excluding points; fits listed: Fit Mean, Linear Fit, Linear Fit]
Journaling JMP Results
After completing this part of the exploratory regression analysis:
• Choose Edit > Journal.
The first time the Journal command is selected during a JMP session, a journal window opens and is filled with the graphs and tables from the report window. The open journal file contains all reports from the active report window. Plots can be resized, opened, or closed, as can outlines. This allows for printing of certain parts of the report.
• Choose Save As from the File menu to save the jour
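Excluding rows and fitting the line again can be sketched outside JMP with an ordinary least-squares fit. Everything here is illustrative: `fit_line` is a hypothetical helper, the (age, ratio) values are made up to resemble the pattern described (low ratios at the youngest ages), and the excluded row indices stand in for the lassoed points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical (age in months, weight-to-height ratio) data; not the Growth table.
ages = [0.5, 1.5, 2.5, 10, 20, 30, 40, 50, 60]
ratios = [0.46, 0.47, 0.56, 0.70, 0.75, 0.80, 0.86, 0.90, 0.96]
excluded = {0, 1, 2}  # row indices of the low-age points to leave out

all_fit = fit_line(ages, ratios)
kept = [(a, r) for i, (a, r) in enumerate(zip(ages, ratios)) if i not in excluded]
refit = fit_line([a for a, _ in kept], [r for _, r in kept])

print("all points:", all_fit)
print("after excluding:", refit)
```

As in the report, the excluded points remain in the data; they are simply left out of the second fit, so the two lines can be compared directly.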
140. the solubility data:
• Choose Analyze > Distribution.
• Select the six solubility columns and click the Y, Columns button.
• Click OK.
Their histograms (resized and trimmed of other output) are shown in Figure 9.2.
• Click any histogram bar. That bar and all other representations of that data are highlighted in all related windows.
To see how outlying values are distributed in the other histograms:
• Shift-click the outlying bars in each histogram. This identifies the outlying rows in each single dimension.
• Use the Rows > Markers palette to assign the X marker to these selected rows. The markers appear in the data table and in subsequent plots.
Figure 9.2 One-Dimensional Views [histograms: 1-Octanol, Ether, Chloroform, Benzene, Carbon Tetrachloride, Hexane]
To create a new data table that contains only the outlying rows:
• Use the Tables > Subset command as shown here.
• Click OK to accept the default settings.
• Scroll through the new subset table to see the compound names of the one-dimensional outliers.
Two-Dimensional Views
• Return to Solubility.jmp.
• Select Analyze > Multivariate Methods > Multivariate.
• Highlight all the continuous columns in the table and click the Y, Columns button.
• Click OK.
This displays a correlation matrix and a scatterplot matrix of all 30 two-dimensional scatterplots (Figure 9.3). The one-dimensional outliers appear as
141. ther to define one direction redundantly. This is cured by dropping one of the collinear regressors from the model. In this case, drop either X1 or X2 from the model, since both measure essentially the same thing.
Figure 10.12 Comparison of RunPulse and MaxPulse Effects
Chapter Summary
Multiple regression uses the same fitting principle as simple regression, but accounting for significance is more subtle. Each regressor opens a new dimension for fitting a hyperplane, and its significance is tested by how much the fit suffers in its absence. When regressors correlate to each other, they are said to be collinear, and they define directions where the fitting hyperplane is not well supported.
References
Becker, R. A., and Cleveland, W. S. (1987), "Brushing Scatterplots," Technometrics, 29, 2.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics, New York: John Wiley & Sons.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John Wiley & Sons, Inc.
Daniel, C., and Wood, F. (1980), Fitting Equations to Data, Revised Edition, New York: John Wiley & Sons, Inc.
Draper, N., and Smith, H. (1981), Applied Regression Analysis, 2nd Edition, New York: John Wiley & Sons, Inc.
Eppright, E. S., Fox, H. M., Fryer, B. A., Lamkin, G. H., Vivian, V. M., and Fuller, E. S. (1972), Nutrition of Infants an
142. there are more opportunities to model the data well, but the process is more complicated. This chapter begins with an example of a two-regressor fit that includes three-dimensional graphics for visualization. The example is then extended to include six regressors, but unfortunately no seven-dimensional graphics to go with it.
Objectives
• Illustrate the concept of a fitting plane using graphical techniques
• Combine data tables using the Concatenate command
• Explore a three-dimensional version of a leverage plot
Contents
Aerobic Fitness Data
Fit Planes to Test Effects
More and More Regressors
Interpreting Leverage Plots
Aerobic Fitness Data
Aerobic fitness can be evaluated using a special test that measures the oxygen uptake of a person while running on a treadmill for a prescribed distance. However, it would be more economical to evaluate fitness with a formula that predicts oxygen uptake with simpler measurements. To identify such an equation, runtime and fitness measurements were taken for 31 participants who ran 1.5 miles. The participants' ages were also recorded.
• When you installed JMP, a folder named Sample Data was also installed
143. this typing trial are expected only four times in 10,000 similar trials if the keyboards did not really promote different typing performances. The Analysis of Variance table has the following information:
• Source lists the sources of variation: brand, Error, and C Total.
• DF is the degrees of freedom associated with the three sources of variation.
• Sum of Squares (SS for short) quantifies the variation in the typing scores. C Total is the corrected total SS. It divides (partitions) into the SS attributable to brand and the SS for Error. The brand SS is the variation in the typing scores explained by the analysis of variance model that hypothesizes the keyboards are different. The Error SS is the remaining or unexplained variation.
• Mean Square is the sum of squares divided by its associated degrees of freedom.
• F Ratio is the model mean square divided by the error mean square.
• Prob > F is the probability of obtaining a greater F value if the mean typing scores for the keyboards differed only because different people were typing on them, rather than because the keyboards promoted different scores in any way.
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Ratio   Prob > F
brand     2    528.93529        264.468       14.5027   0.0004
Error     14   255.30000        18.236
C Total   16   784.23529
Mean Estimates and Statistical Comparisons
To
144. ts that correspond to each regression give a dramatic visualization of the steep growth rate for babies during the first year of life compared to the more moderate growth rate of toddlers and small children, age one to five years.
Figure 7.6 Regression Lines for Levels of a Grouping Variable [plot: Bivariate Fit of ratio By age]
Chapter Summary
To analyze some bivariate data, the Fit Y by X command was used to examine a variety of regression model fits. The task was to model and describe the growth pattern of subjects over a range of ages. You measured growth using the ratio of weight to height and accomplished this task by:
• Fitting a mean to use as a baseline comparison to other regression models, and evaluating the fit using statistical text reports
• Fitting a straight line as a first guess for a model
• Excluding outliers and again fitting a straight line to compare the R² values given by the Summary of Fit tables for both lines
• Fitting second and third degree polynomials to see whether they model the growth pattern more realistically
• Fitting smoothing splines with lambda values of 10, 1,000, and 100,000 and comparing them with each other and with the linear fit
• Clicking the red triangle icon and selecting the grouping facility, Group By, to compare growth rates of babies under the age of one year with toddlers from age one to fiv
145. ttern is significantly better than the horizontal line that fits the sample mean to the data.
Understanding the Parameter Estimates Table
In addition to producing a Line of Fit table and an Analysis of Variance table, clicking the red triangle icon and selecting Fit Line produces a Parameter Estimates table.
Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   0.6656231   0.012176    54.67     <.0001
age         0.0052759   0.000293    18.01     <.0001
• Term lists the parameter terms in the regression model.
• Estimate lists estimates of the coefficients in the regression line equation.
• Std Error lists estimates of the standard error of the parameters.
• t Ratio is the parameter estimate divided by its standard error.
• Prob>|t| is the probability of a greater absolute t value occurring by chance alone if the parameter has no effect in the model.
The significant F ratio in the Analysis of Variance table tells the student that the regression line fits significantly better than the horizontal line at the mean (the simple mean model). However, while the regression line looks like a good fit for age groups above seven months, it does not describe the data well for ages younger than seven months.
Excluding Points
Because the low-age points are the trouble spots for the linear fit, remove them from the analysis and try fitting the model to
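The t Ratio definition above can be verified directly from the Estimate and Std Error columns of the Parameter Estimates table. The dictionary layout is only for illustration.

```python
# (Estimate, Std Error) pairs read from the Parameter Estimates table.
estimates = {
    "Intercept": (0.6656231, 0.012176),
    "age": (0.0052759, 0.000293),
}

# t Ratio = parameter estimate / its standard error
t_ratios = {term: est / se for term, (est, se) in estimates.items()}

for term, t in t_ratios.items():
    print(term, round(t, 2))
```

Rounded to two decimals, both ratios match the t Ratio column of the table.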
146. tudents whose height or weight might signal the need for special attention. Displaying Distributions (Chapter 4, Looking at Distributions): To summarize the data: Select the Distribution command from the Analyze menu. In the window that appears, select the age and sex columns as Y, Columns. [Screenshot: Distribution launch window, with the Select Columns, Cast Selected Columns into Roles, and Action panels.] Click OK. The frequencies table that appears shows that the class of 40 contains 18 girls and 22 boys. Understanding Histograms of Nominal and Ordinal Variables: After selecting Analyze > Distribution and completing the window, you see a window that displays histograms for analysis variables. The histogram for ordinal or nominal variables like age and sex has a bar for each level (value) of the variable. [Figure 4.2: Histogram of the age and sex Variables.] Displaying Mosaic Plots: In histograms of ordinal and nominal variables, you can display a mosaic plot by clicking the red triangle icon in the variable's title bar and selecting Mosaic Plot. A mosaic plot, shown in the figure to the right, visualizes the proporti
147. udents feel this is unpatriotic and are upset. This lesson examines the hot dog as a menu item, but not before looking into the multitude of brands available. The data shows information about cost, nutritional ingredients of concern, and taste preference for 54 hot dog brands. This information is sufficient to provide a summary of hot dog statistics and to identify the brands that are most nutritious, least costly, and best tasting. The taste, cost, and nutritional variables used in this chapter are an enhancement of data from Moore, D. S., and McCabe, G. P. (1989), Introduction to the Practice of Statistics, and Consumer Reports (1986). The brand names were changed to fictional names, and the taste preference labels correspond to a taste preference scale. Objectives: Find and mark subgroups of data. Produce scatterplots using the Fit Y by X command and use them as discovery tools. Label individual points in plots. Produce and plot summary statistics. Contents: [...]; Charting Statistics from Grouped Data; Charting Statistics for Two Groups; Finding a Subgroup with Multiple Characteristics; [...]; What Has Been Discovered; [...]
148. ursor. The cursor changes to a hand when you move the mouse over a red triangle icon or a diamond-shaped disclosure button (drawn differently on Windows and on the Macintosh). Click the red triangle to reveal the menu and select a menu icon. Click the disclosure button to open or close a panel. [Figure 1.12: The hand cursor, shown on the Typing Data table with columns brand and speed.] Selecting Rows and Columns: Select rows and columns in a JMP data grid by highlighting them as explained in Table 1.1 and shown in Figure 1.13. For additional details, see the JMP User Guide. Table 1.1, Ways to Select Rows and Columns: Highlight a row: click the space that contains the row number. Highlight a column: click the background area above the column name, or click the column name in the columns panel to the left of the data grid. Extend a selection of rows or columns: Shift-click the first and last rows or columns of the desired range. Make a discontiguous selection: Ctrl-click (Command-click on the Macintosh) the desired selections. [Chapter 1, Introducing JMP] [Figure 1.13: Select Rows and Columns. A: Selected rows. B: Selected column.] Step 4: Select an Analysi
149. way to identify subjects that have extreme values is to highlight histogram bars for the highest and lowest values. To highlight more than one bar, press the Shift key and click the desired bars. Select the two lowest bars and the one highest bar (Figure 4.7). [Figure 4.7: Histogram of Ratio with Bars Highlighted at Extreme Values.] The highlighted bars in the histogram represent a ratio either greater than or equal to 2.25 or less than 1.5. The corresponding points automatically highlight in the data table and in all other reports generated from the Big Class data table. Creating Subsets: Looking in the Big Class data table allows examination of the selected rows, but scrolling through a large data grid can be tedious. For the final report to the health researchers, include a separate list containing only the highlighted students (those with extreme values). To do this, use Tables menu commands to create new data tables or modify existing tables. [Screenshot: Big Class data table with the columns name, age, sex, height, and weight.] Select Tables > Subset or click th
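The selection rule in excerpt 149 (ratio greater than or equal to 2.25, or less than 1.5) and the subset step amount to a simple filter. The sketch below illustrates that logic in Python; the four rows are hypothetical stand-ins for the Big Class table, and `is_extreme` is a helper name introduced here, not a JMP command.

```python
# Hypothetical rows: (name, height, weight). Not the actual Big Class data.
students = [
    ("A", 60, 84),    # ratio 1.40
    ("B", 64, 112),   # ratio 1.75
    ("C", 62, 145),   # ratio about 2.34
    ("D", 66, 128),   # ratio about 1.94
]

def is_extreme(ratio, low=1.5, high=2.25):
    """True when the ratio falls in the highlighted histogram bars."""
    return ratio < low or ratio >= high

# Build the separate list containing only the students with extreme values.
subset = [
    (name, round(weight / height, 2))
    for name, height, weight in students
    if is_extreme(weight / height)
]

for name, ratio in subset:
    print(name, ratio)
```

Keeping the two cutoffs as parameters makes it easy to try different bar boundaries without touching the filter itself.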
150. were used for data exploration to locate and identify unusual points. The data were first examined in one dimension using the Distribution command and then in two dimensions using the Multivariate command to look for unusual points in histograms and scatterplots. Next, the Principal Components command was used to plot three columns at a time. This technique was used to summarize six dimensions and to plot principal component rays. The Principal Components table showed that the first three principal components accounted for more than 97% of the total variation. Finally, the Outlier Analysis command in the Multivariate report produced the Mahalanobis outlier distance plot, which summarizes the points in six dimensions. The multivariate outliers were highlighted and labeled in this multi-dimensional space. See the chapter "Correlations and Multivariate Techniques" in the JMP Statistics and Graphics Guide for documentation and examples of multivariate analyses. "Three-Dimensional Scatterplots" in the JMP Statistics and Graphics Guide documents the 3D plot. Chapter 10, Multiple Regression: Examining Multiple Explanations. Multiple regression is the technique of fitting or predicting a response by a linear combination of several regressor variables. The fitting principle is like simple linear regression, but the space of the fit is in three or more dimensions, making it more difficult to visualize. With multiple regressors
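The Mahalanobis outlier distance mentioned in excerpt 150 measures how far each row lies from the column means, scaled by the covariance of the columns. The sketch below is a simplified two-column illustration in Python with hypothetical points; the excerpt's analysis uses six columns, and this is not JMP's implementation.

```python
import math

# Hypothetical two-column data; the last point breaks the upward trend.
points = [(1.0, 2.0), (2.0, 2.9), (3.0, 4.1), (4.0, 5.0), (5.0, 1.0)]

def mahalanobis_distances(pts):
    """Distance of each point from the mean, using the sample covariance."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Sample covariance entries (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in pts:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form d' * inverse(S) * d, written out for the 2x2 case.
        d2 = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det
        distances.append(math.sqrt(d2))
    return distances

distances = mahalanobis_distances(points)
for point, d in zip(points, distances):
    print(point, round(d, 3))
```

Rows with unusually large distances are the candidates for multivariate outliers; with more columns the same quadratic form applies, but the covariance matrix must be inverted numerically.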
151. wn list. Select one continuous variable in the list. Press the Shift and down arrow keys to select the other continuous variables. Select the four continuous variables in the Select Columns list. Click the Statistics button and select Data from the drop-down list. Select Month from the Select Columns list. Click Categories. Click OK. [Chapter 2, Creating a JMP Data Table: Entering Data] [Figure 2.4: Creating the Bar Chart. Chart launch window, with the Select Columns, Cast Selected Columns into Roles, and Action panels.] JMP displays an overlaid bar chart of the data. [Chart: overlaid bars for control, Placebo, 300mg, and 450mg; x-axis: Month.] Reorder the Values: By default, the Month values in this chart do not appear in a logical order. To further enhance the report, reorder the months so they are in chronological order rather than alphabetical order. In the data table, highlight the Month column and select Cols > Column Info. Select Value Ordering from the Column Properties drop-down menu. Value orderin
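The difference between alphabetical and chronological month order in excerpt 151 is easy to see with a small sort. This Python sketch only illustrates the idea behind a value ordering; the month list is assumed from the chart's axis labels, and `calendar_order` is a name introduced for the example.

```python
# Months assumed from the chart's axis labels in the excerpt.
months = ["March", "April", "May", "June", "July", "August"]

# Default behavior for character values: alphabetical order.
alphabetical = sorted(months)

# An explicit value ordering: position in the calendar year.
calendar_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
chronological = sorted(months, key=calendar_order.index)

print("alphabetical: ", alphabetical)
print("chronological:", chronological)
```

The explicit list plays the same role as the Value Ordering column property: it tells the sort which level comes first instead of relying on spelling.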
152. x. Enter 30 for the numeric comparison. Double-click the term denoted then clause. Enter "1" in double quotes because this column is a character variable. Double-click the term denoted else clause. Enter "0" with double quotes. The complete equation should look like the one shown here. Click Apply, OK, or the formula editor's close box to fill the new column with calculated values. Tip: Instead of using the buttons in the formula editor, you can double-click the outermost nesting box to create a single text entry box and enter if(age < 30, "1", "0"). Then press Enter or Return, or click outside the text box, and the formula appears in formatted form. Contingency Table Reports (Chapter 6, Analyzing Categorical Data): The nominal age grouping variable shows the relationship of age to the other nominal variables using contingency tables. To look at combinations of two variables: Choose Analyze > Fit Y by X. JMP does the statistical analysis appropriate for the variables' modeling types and role assignments. Cast Variables Into Roles: [Screenshot: Analyze menu listing Distribution, Fit Y by X, Matched Pairs, Fit Model, Multivariate Methods, and Nonlinear.] Assign analysis roles to variables by choosing an analysis from the Analyze menu and making selections in the window that appears. In this investigation, the country, size, and type columns are dependent
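The conditional formula built in excerpt 152 assigns the character value "1" to ages under 30 and "0" otherwise. The same logic, sketched in Python for illustration only (the ages are hypothetical and `age_group` is a name introduced here; the guide itself uses the JMP formula editor):

```python
def age_group(age):
    """Return "1" when age is under 30, otherwise "0".

    The results are strings because the new column is a character variable.
    """
    return "1" if age < 30 else "0"

# Hypothetical ages, including the boundary value 30.
ages = [22, 30, 45, 29]
groups = [age_group(age) for age in ages]

for age, group in zip(ages, groups):
    print(age, group)
```

Note that an age of exactly 30 falls in group "0", because the comparison is strictly less than 30.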
