        SCOUT User's Guide
         Contents
1.     Choices of Contour Ellipses

By pressing the E (e) key, several contour ellipses can be drawn on the various scatter plots available in Scout, including scatter plots of raw data, scatter plots of PCs, and those of discriminant scores. These contours can also be erased by pressing the E (e) key. The simultaneous contour is obtained using the probability statement (7), and the individual contour is obtained using the statement (9), given below in Section 6.0. The five contour options are:

Individual: This option simply draws the desired (classical or one of the three robust) contour ellipse given by the statement (9) on a scatter plot by pressing the E (e) key.
Simultaneous: This option plots the desired (classical or one of the three robust) simultaneous contour ellipse given by the statement (7) by pressing the E (e) key.
Indiv & Simult: This option plots the desired (classical or one of the three robust) individual as well as simultaneous contour ellipses given by the statements (7) and (9) on a scatter plot by pressing the E (e) key.
Indiv / Class: This option plots the chosen robust (HUBER, PROP, or MVT) and the corresponding classical contour ellipses given by the statement (9) by pressing the E (e) key.
Scout User's Guide 14-11. Chapter 14 Statistical Procedures.
Simult / Class: This option plots the chosen robust (HUBER, PROP, or MVT) and the classical simultaneous contour ellipses given by the statement (7) by pressing the E
2.     Note: Typically, small values of α, the right-tail cutoff (such as 0.001 or 0.005), correspond to classical estimates. It is recommended to try a few different values of α on the same data set. Larger values of α (0.15, 0.2, etc.) may be needed to unmask multiple outliers, especially in small data sets of large dimensionality.

Scout Tutorial 11-24. Chapter 11 Tutorial III.

[Plot: "Index Plot for Brownlee's Stack Loss"; horizontal axis: Observation Numbers; limit lines: Maximum (Largest MD) and Warning (Individual MD).]

Figure 11.28: Index plot for STACKLSS.DAT using Prop. influence.

11.6 Generalized Distance

Select IRIS.DAT from the Data subdirectory of the Scout directory. This is a fairly well-behaved four-dimensional data set of size 50. Return to "Robust Method", "Robust Analysis", and within the "Select Graph Type" menu, select "Q-Q Plot (Generalized Dist.)". Set "Statistics Options" as shown in Figure 11.26, with the exception of a right-tail cutoff α of 0.05, using "Huber Influence" to detect outliers. Accept the new settings, and then generate the graph (Figure 11.29). Now exchange "Prop Influence" for "Huber Influence" and regenerate the graph (Figure 11.30); note the differences.

[Plot: "Robust Analysis"; limit lines: Maximum (Largest MD) and Warning (Individual MD).]
3.     and minus (-) keys, select all variables but Count. At this point, your display should match Figure 12.1.

[Figure 12.1, screen capture. Menu bar: File, Data, Classical Method, Robust Method, PCA, Graphics, System. PCA menu: Select Variables, Display Matrices, Eigenvalues, View Components, Transform Data. The variable-selection window lists count, sp length, sp width, pt length, and pt width, and reports 4 variable(s) selected and 50 valid observations.]
4.    (e) key.

Choices for the X-Y Coordinate Scale Factor

The scale factor on both of the axes can be controlled by this option. The default value is 10. This option is really useful when drawing contour plots, especially when parts of the contours are missing. Choosing a bigger number will shrink the graph, so that the entire contours can be seen on the same graph.

14.4 Robust Procedures in Scout

Outliers in Univariate Data Sets

Let $x_1, x_2, \ldots, x_n$ represent a univariate data set of size n obtained from a normal population with mean $\mu$ and sd $\sigma$. The classical estimates of the mean and sd are $\bar{x} = \sum x_i / n$ and $s$, where $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$. The Grubbs test statistic, which is equivalent to the Max (MDs) test for univariate data sets, uses the zero-breakdown-point estimates and therefore suffers from masking effects. Dixon (1953) suggested the use of multiple hypotheses testing to identify upper and lower outliers. Several classical procedures (e.g., Rosner's (1975), Dixon-type test statistics) for finding univariate multiple outliers exist in the literature, as given in Barnett and Lewis (1994). In practice, however, the number of outliers, k, is unknown, and it becomes quite tedious to test for multiple hypotheses (H: k (≥ 1) outliers are present). Also, use of a separate set of critical values is required for each test.

Simple robust statistics such as the sample median, M, and [symbol illegible in source] are sometimes used to esti
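The masking effect described in excerpt 4 can be illustrated with a minimal Python sketch. It is not Scout's implementation: the function names are hypothetical, the classical score is the largest absolute deviation from the mean in sd units (the quantity on which a Grubbs-type test is based), and the MAD is used as the robust scale only as an assumption, because the second robust statistic named in the excerpt is illegible.

```python
from statistics import mean, median, stdev


def max_classical_score(data):
    """Largest |x_i - mean| / sd, using the classical mean and sd (divisor n - 1)."""
    m, s = mean(data), stdev(data)
    return max(abs(x - m) / s for x in data)


def max_robust_score(data):
    """Largest |x_i - median| / MAD (the MAD is an assumed robust scale)."""
    med = median(data)
    mad = median([abs(x - med) for x in data])
    return max(abs(x - med) / mad for x in data)


# Hypothetical data: seven ordinary readings and two large values.
readings = [9, 10, 10, 11, 11, 12, 10, 50, 52]
print(max_classical_score(readings))  # stays small: the two large values inflate the sd
print(max_robust_score(readings))     # much larger: median and MAD are barely affected
```

With two large values in the sample, the classical sd is inflated enough that neither value looks extreme, which is the masking that the robust procedures are meant to avoid.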
5.   [Screen residue: statistics table for the variables and a histogram; values not recoverable.]

Figure 3.2: Transformation functions displayed in the upper right menu, statistics for all variables in the lower window, and the histogram for "sp length" in the upper left window.

3.4.1 Normality Tests

Upon entering the transform module, you are given a choice between two normality tests that can be used. These are the Kolmogorov-Smirnov test and the Anderson-Darling test. The test selected will be used throughout the transform module.

3.4.0 Statistics Window

A window containing statistical information about each variable will appear in the lower portion of the screen. The information displayed includes the number of observations, mean, standard deviation, skewness, test statistic, and critical value for the selected normality test. If an asterisk character appears between the test statistic and critical value, then that variable did not pass the normality test.

Scout User's Guide 3-7. Chapter 3 Managing Data in Scout.

You may scroll through the information in this window by using any of the following keys: <UP ARROW>, <DOWN ARROW>, <PAGE UP>, <PAGE DOWN>, <HOME>, and <END>. Thi
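How the asterisk flag in the statistics window could be derived is sketched below in Python. This is an illustration under stated assumptions, not Scout's code: `ks_statistic_normal` and `normality_flag` are hypothetical names, the normal parameters are estimated from the sample, and the critical value must be supplied by the caller because the excerpt does not give Scout's tables.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(data):
    """Kolmogorov-Smirnov distance between the sample and a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def normality_flag(test_statistic, critical_value):
    """Return '*' when the statistic exceeds the critical value (test not passed)."""
    return "*" if test_statistic > critical_value else " "


# Hypothetical sample and a caller-supplied critical value.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8]
statistic = ks_statistic_normal(sample)
print(f"{statistic:.3f} {normality_flag(statistic, 0.30)} 0.30")
```

Both the Kolmogorov-Smirnov and Anderson-Darling tests reject normality for large values of the statistic, so the same flag function would serve either test once the matching critical value is supplied.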
6.   [Screen residue: statistics and limits window listing DF, T value, lower limit, upper limit, and two-sided limits; values not reliably recoverable.]

Figure 11.24: Statistics and limits for the prediction interval.

[Plot residue: robust prediction interval graph.]

Figure 11.25: Robust prediction interval for 4METHYL.DAT.
7.   Display Graphs For ............................ Q-Q Plot (Indiv. Raw Data)
     Statistics Options ............................ Classical
     Zero Lower Limit .............................. No
     Limit Style ................................... Two Sided
     X Axis Variable ............................... 2
     Y Axis Variable ............................... 3
     Title ......................................... Robust Analysis
     X Axis Title ..................................
     Numbering ..................................... Observations
     Contour Ellipse ............................... Indiv & Simul.
     Erase Output File
     View Weights & Generalized Distances .......... IRIS.WTS
     Generate Graph With Current Options

Each of these headings has various choices, which can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The process is repeated until the desired choices for all of the headings have been selected. For "Robust Analysis", the various choices for each of the headings are listed in a fourth window.

The "Display Graphs For" heading offers the following list of available graphs:

     Q-Q Plot (Indiv. Raw Data)
     Q-Q Plot (Indiv. Standardized)
     Q-Q Plot (Simul. Raw Data)
     Q-Q Plot (Simul. Standardized)
     Scatter Plot (Raw Data)
     Q-Q Plot (PCA)
     Scatter
8.   [Screen capture: on-line User's Guide help menu listing Exit, Introduction, Installation, User's Guide Help, File Management, Data Management, Outlier Testing, Robust Analysis, Principal Components, Graphics, System, and Quitting.]
9.   Press the <ENTER> key to generate the Q-Q plot using the individual setting, identify the bottom two and top two data points, and your display will match Figure 11.5. The difference between Figures 11.4 and 11.5 is how the control limits (horizontal lines) are computed. The horizontal lines in Figure 11.4 are obtained using the first-order Bonferroni inequality, as given by equation (12) in chapter 14, whereas the limits in Figure 11.5 are obtained using the probability statement given by equation (13) of chapter 14.

[Plot: horizontal axis: Theoretical Quantiles (Normal Distribution); limit lines: Maximum UIL, Warning UIL, Warning LIL, Maximum LIL.]

Figure 11.5: Q-Q plot for individual raw data for the sp width variable.

11.2 Q-Q Plots of Principal Component Analysis

Q-Q plots of the principal component analysis (PCA) of the IRIS.DAT data set will be produced in this section. Accordingly, select IRIS.DAT as the data file. The initial action is to establish that your options match those in Figure 11.6. Under "Robust Method", select "Robust Analysis", and then select "Statistical Options". If your options do not match those in Figure 11.6, use the <ENTER> key (repeatedly if necessary) to change the options to one of the other preset choices. When n
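The distinction between individual and simultaneous control limits drawn in excerpt 9 can be sketched as follows. The Python below is an illustration only: it works on a standardized normal scale with two-sided limits and applies the first-order Bonferroni inequality by dividing α among the n observations. Equations (12) and (13) of chapter 14 are not reproduced in this excerpt, so the formulas and the function names are assumptions rather than Scout's definitions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sd 1


def individual_limit(alpha):
    """Two-sided limit for one observation considered on its own."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def bonferroni_limit(alpha, n):
    """Two-sided limit holding for all n observations at once (Bonferroni bound)."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / (2 * n))


# Hypothetical settings: 50 observations, overall alpha of 0.05.
alpha, n = 0.05, 50
print(individual_limit(alpha), bonferroni_limit(alpha, n))
```

The simultaneous limit is always at least as wide as the individual one, which is why the horizontal lines in the two figures sit at different heights for the same data.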
10.   [Screen capture: statistical options window, including Tuning Constant, Control Chart Limits, Trimming Percent, Ignore Population, Print Ignored Population, and Accept New Settings; file: STACKLSS.DAT.]

Figure 11.26: Statistical options for an index plot using Huber influence.

This data set consists of 21 observations with four variables. Several outliers are present in this data set. In order to unmask these outliers, a higher value of α (right-tail cutoff) must be used (≥ 0.15). The Huber procedure cannot unmask these multiple outliers, even with an α of 0.5.

[Plot: "Index Plot for Brownlee's Stack Loss"; horizontal axis: Observation Numbers; limit lines: Maximum (Largest MD) and Warning (Individual MD).]

Figure 11.27: Index plot for STACKLSS.DAT using Huber influence.

The second index plot is generated by exchanging "Prop Influence" for "Huber Influence" in "Statistics Options". Using "Prop Influence", we increase our ability to unmask multiple outliers. Accept the new settings, and then generate the graph (Figure 11.28). All of the outliers (1, 2, 3, 4, and 21) present in this data set are well separated from the rest of the data
11.   minus (-) key on all other checked variables. After IRIS.DAT has been selected and properly modified, while remaining in the "Robust Method" menu, choose "Robust Analysis", press <ENTER>, and the screen should match Figure 11.1.

[Figure 11.1, screen capture: Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition) with the Robust Analysis options window (Display Graphs For, Statistics Options, Zero Lower Limit, Limit Style, X Axis Variable, Y Axis Variable, Title, X Axis Title, Numbering, Contour Ellipse, Erase Output File, View Weights & Generalized Distances).]
12.  [Screen capture: Robust Method menu with the Pattern Recognition options window: Statistics Options (Classical), Numbering (Populations), Contour Ellipse (Individual), Type of Graph (PCA Scores), Graph Title (Pattern Recognition), Save Discriminant Scores, View Eigen Values and Vectors, View Confusion Matrix, View Covariance Matrix and Means, Begin Computations with Current Options.]
13.  5-10. Chapter 5 Robust Statistical Methods.

     Type of Graphs ................................ Discriminant Scores
     Graph Title ................................... Pattern Recognition
     Save Discriminant Scores ...................... No
     View Eigenvalues and Vectors .................. Yes
     View Confusion Matrix ......................... Yes
     View Covariance Matrix and Means .............. Yes

Each of these headings has various choices, which can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The process can be repeated until each of the desired choices for the various headings has been selected.

Statistics Options presents the same menu as described in Section 5.3. Set these options as desired, then return to the third window (as shown above). The remaining headings and corresponding choices in the third window are as follows:

     Headings                             Choices
     Numbering                            Observations, Populations
     Contour Ellipse                      Individual, Simultaneous, Indiv & Simul., Indiv / Class., Simul / Class.
     Type of Graphs                       Discriminant Score, PCA Score, X-Y
     Graph Title                          Can be typed in after using the <ENTER> key
     Save Discriminant Scores             Yes, No
     View Eigenvalues and Eigenvectors    Yes, No
     View Confusion Matrix                Yes, No
     View Covariance Matrix and Means     Yes, No

The Graph titles can be typed in after using
14.  848
     x2      -0.78    1.39    0.91   -0.05   -0.25     1.311
     x3      -1.37    0.91   13.33   -0.3    -0.64    56.716
     x4       0.17   -0.05   -0.3     0.03    0.06     1.583
     Octn    -4.02   -0.25   -0.64    0.06    0.8     91.549

Robust Statistics After Deletion of 8 Outliers

             Covariance Matrix                        Mean Vector
             x1      x2      x3      x4      Octn
     x1      44.35   -0.83   -7.27    0.24   -3.95    62.657
     x2      -0.83    1.24    0.91   -0.06   -0.25     1.294
     x3      -7.27    0.91   12.88   -0.35   -0.63    56.833
     x4       0.24   -0.06   -0.35    0.03    0.06     1.590
     Octn    -3.95   -0.25   -0.63    0.06    0.79    91.568

14.9 Interval Estimation

Computation of several classical and robust interval estimates useful in many applications is incorporated in the robust module of Scout. A good description of these procedures is given in Hahn and Meeker (1991). The following four interval estimates are available in Scout, which can be obtained using one of the robust (HUBER, PROP, and MVT) or classical procedures:

1. Confidence interval for the population mean (μ).
2. Prediction interval for a single future observation (x0).
3. Simultaneous confidence interval for all of the sample observations (x1, x2, ..., xn).
4. Confidence interval for a single observation (xi) in a sample.

These intervals are significantly different from each other, and care must be exercised to use them appropriately. For example, at a polluted site, one of the objectives is to obtain a threshold value estimating the background-level contamination prior to any activity that polluted the site. H
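The difference between the first two intervals listed in excerpt 14 can be sketched in Python for the classical case. This is a hedged illustration, not Scout's implementation: the function names are hypothetical, the formulas are the standard normal-theory ones (mean ± t·s/√n and mean ± t·s·√(1 + 1/n)), the critical t value is supplied by the caller from a table, and the robust (HUBER, PROP, MVT) versions are not shown.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(data, t_crit):
    """Classical two-sided confidence interval for the population mean."""
    n, m, s = len(data), mean(data), stdev(data)
    half_width = t_crit * s / sqrt(n)
    return m - half_width, m + half_width


def prediction_interval(data, t_crit):
    """Classical two-sided prediction interval for one future observation."""
    n, m, s = len(data), mean(data), stdev(data)
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return m - half_width, m + half_width


# Hypothetical sample of size 8; 2.365 is the tabulated t value for
# 7 degrees of freedom at the two-sided 95% level.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_confidence_interval(sample, 2.365))
print(prediction_interval(sample, 2.365))
```

The prediction interval is always wider than the confidence interval for the mean, because it has to cover the variability of a new observation as well as the uncertainty in the estimated mean.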
15.  Figure 11.31: The kurtosis value for IRIS.DAT.

Note: The classical kurtosis, as given in chapter 10, is 25.49, which is distorted by outliers.

11.8 Summary

ASSESSING NORMALITY AND THE IDENTIFICATION OF OUTLIERS

11.1 Q-Q plots. While covering the production of these plots, we also covered: (1) a graphics option (<SHIFT ...>), (2) options for graphics output (<P> and <F>), and (3) the use of <+> and <-> to select and deselect variables.

11.2 Q-Q plots of PCA. While describing the production of these plots, we also covered: (1) using the <ENTER> key in a menu to change preset choices, and highlighting and typing in values for numerical fields, and (2) the use of <Page Down> or <Page Up> to display other graphics when multiple plots are present.

DATA REDUCTION TECHNIQUES AND EXAMINING DATA FOR PATTERNS

11.3 PCA scatterplots. In addition to describing the production of this output, we also described: (1) the use of <N> to identify data points and the use of <E> to draw ellipses, (2) supplying titles for graphical output, (3) use of the "X-Y Coordinates Scale Factor" to rescale graphs to get all output on the screen, (4) viewing the eigenvalues and eigenvectors as part of analysis output, (5) examining discriminant analysis along with the confusion matrix, and (6)
16.  Figure 12.5: An explanation window for the "Transform Data" function, indicating completion of the transformation.

Press <ESC> three times to return to the main menu, select "PCA", and press <ENTER>. Move the cursor to highlight "Display Matrices", and press <ENTER> to generate the variance-covariance matrix for the transformed variables, i.e., the principal components, as shown in Figure 12.6.

Scout Tutorial 12-5. Chapter 12 Tutorial.

[Screen capture: PCA menu listing Select Variables, Display Matrices, Eigenvalues, View Components, and Transform Data.]
17.  [Screen capture: explanation window for the first File heading. Legible text: "Reads a data set from an ASCII file on any disk. The file format, defined in the User's Guide, is GEO-EAS compatible. CAUTION: Data in memory will be lost." File: FULLIRIS.DAT.]

Figure 2.1: Scout's main menu with the File heading selected, displaying six headings and choices for file management and an explanation window for the first heading.

2.20 Reading Spreadsheet Files

Scout cannot read spreadsheet data directly. However, a spreadsheet file can easily be converted into a Scout data set. In order to convert a spreadsheet data file to a Scout data file, the specific file for
18.  [Screen capture: explanation window for Graph Parameters. Legible text: "Allows the user to modify the color and shape of individual observations. Observations can be removed (hidden) from the graphs by turning them black." File: IRIS.DAT.]

Figure 13.1: The Graphics menu with the explanation window for Graph Parameters displayed.

The "Graphics" module always considers all the variables in a data set. Move the cursor to highlight "2 Dimensional" and press <ENTER>. The screen will be similar to Figure 13.2. All variables in the data set are displayed across each axis in this matrix. The upper-left to lower-right d
19.  [Screen capture: outlier-testing menu, including Generalized Distance, Multivariate Kurtosis, Causal Variables, and Remove Outlier Flags, with the explanation window for Generalized Distance. Legible text: "The test statistic is the largest generalized distance. The test is iterated until no further records are rejected. Use when it is unlikely that many outliers are present."]
20.  Garner, Fitzgerald, Kirk, and Nocerino, J. (1993). Simultaneous Acceptance Regions and An Alternative Statistical Scoring Algorithm to Assess the Performance of the Laboratories Participating in the CLP Program of the USEPA. An Internal Report.

Wilks, S.S. (1963). Multivariate Statistical outliers. Sankhya, 25, 407-426.
21.  You can save this output by pressing <F> and supplying the name of a file to hold the graph, or by pressing <P> to print the graph.

11.5 Index Plots

Select STACKLSS.DAT from the Data subdirectory of the Scout directory. Return to "Robust Method", "Robust Analysis", and within the "Select Graph Type" menu, select "Index Plots". Set "Statistics Options" as shown in Figure 11.26, using "Huber Influence" to detect outliers. Accept the new settings, and then generate the graph (Figure 11.27).

[Screen capture: Statistical Options window, including Compute Statistics Using (Huber Influence), Initial Estimate (Robust), X-Y Coordinates Scale Factor, and Right Tail Cutoff.]
22.  Robust Regression & Outlier Detection. John Wiley, New York.

Rousseeuw, P.J., and van Zomeren, B.C. (1990). Unmasking multivariate outliers and leverage points. Journal of American Statistical Association, 85, 633-639.

Schwager, S.J., and Margolin, B.H. (1982). Detection of multivariate normal outliers. Ann. Statist., 10, 943-954.

Scout: A Data Analysis Program. Technology Support Project, U.S. EPA, EMSL-LV, Las Vegas, NV 89193-3478.

Stapanian, M.A., Garner, F.C., Fitzgerald, K.E., Flatman, G.T., and Englund, E.J. (1991). Properties of two tests for outliers in multivariate data. Commun. Statist. Sim., 20, 667-687.

Singh, A., and Nocerino, J.M. (1993). Robust QA/QC for Environmental Applications. Proceedings of the Ninth International Conference on Systems Engineering, Las Vegas, Nevada, 370-374.

Singh, A. (1993). Omnibus robust procedures for assessment of multivariate normality and detection of multivariate outliers. Multivariate Environmental Statistics, Patil, G.P., and Rao, C.R., Editors. Elsevier Science Publishers, Amsterdam, 445-488.

Singh, A., and Nocerino, J.M. (1995). Robust Procedures for the identification of multiple outliers, in Handbook of Environmental Chemistry, Vol. 2G. Springer Verlag, in press.

Singh, A., Singh, A.K., and Flatman, G.T. (1994). Estimation of background levels of contaminants. Math. Geol., 26, 361-388.

Singh, A., F. C.
23.  S., Britton, P.W., and Lewis, D.F. (1988). On the Prediction of a Single Future Observation from a Possibly Noisy Sample. The Statistician, 37, 165-172.

Huber, P.J. (1981). Robust Statistics. John Wiley, New York.

Iglewicz, B. (1983). Robust Scale Estimators and Confidence Intervals for Location. In Understanding Robust and Exploratory Data Analysis, Hoaglin, D.C., Mosteller, F., and Tukey, J.W., eds. John Wiley, New York.

Johnson, R.A., and Wichern, D.W. (1988). Applied Multivariate Statistical Analysis, Second Edition. Prentice Hall, New Jersey.

Jennings, L.W., and Young, D.M. (1988). Extended critical values of multivariate extreme deviate test for detecting a single spurious observation. Communications in Statistics: Simulation and Computation, 17, 1359-1373.

Kafadar, K. (1982). A Biweight Approach to the One-Sample Problem. Journal of the American Statistical Association, 77, 416-424.

Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis in testing normality and robustness studies. Biometrika, 57, 519-530.

Mardia, K.V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, 36, 115-128.

Rosner, B. (1975). On the Detection of Many Outliers. Technometrics, 17, 221-227.

Rousseeuw, P.J., and Leroy, A.M. (1987).
24.  Scout files together. The new data file is always written as an ASCII file. The append routine assumes the variables are the same in each of the input files. If the two input files do not contain the same number of variables, the routine will not allow them to be appended. The variable names from the first input file will be used as the variable names in the new file. All of the observations from each of the input files are written to the new file, even if duplicate record labels occur.

Chapter 3 Managing Data in Scout

3.1 Data Management

Scout enables the user to edit, insert, or delete observations and variables currently in memory; change the title of the data set; and change the name, units, or other attributes of the variables. Select "Data" from the main menu and "Edit Data" from the pull-down menu, as shown in Figure 3.1 below.

[Figure 3.1: the Data pull-down menu; legible entries include Edit Data, Statistics, Transform, and Print Data.]
25.  a level 1 heading.

Additional levels of menus and headings will be found in Scout. Their description will be consistent with the definitions described above. In this tutorial, you will learn (a) how to read data files, (b) how to use the Statistics choice under the Data heading, (c) how to save the Statistics output obtained by using a Statistics option, and (d) how to work with the various functions under the Transform heading.

9.2 Read Data Files

In the Scout directory, at the prompt (C:\Scout>), type "SCOUT" and press the <ENTER> key three times. This will guide you to the screen shown in Figure 9.1. Any of the headings can be selected by using the <RIGHT> or <LEFT> arrow keys.

Highlight (select) the "File" heading and press the <ENTER> key; the level 2 menu will appear. The heading "Read ASCII File" will be highlighted. Press the <ENTER> key again, and a directory will appear listing the names of files and other directories. To select a different drive, press the key (A, B, C, etc.) that represents that drive. The files and directories displayed will depend on the directory content of each individual user. The file "IRIS.DAT" should be in the Scout directory. Highlight this file and press <ENTER>; the list of files and directories will vanish, a small explanation window will appear stating "Reading data, please wait" (which may vanish before you can read it), and then the Figure 9.1 screen 
26.  are obtained using either the MLE or one of the robust approaches. The univariate simultaneous limits given by equation (10) can be plotted on the single-variable normal probability plots. Observations falling outside these limits are the univariate outliers.

14.7 Contour Plots

The contour probability plots of the Mds based on classical or robust estimators of location and scale can be used to further enhance the identification of outliers. The contour ellipsoids of the Mds are displayed at the same two levels as the warning-point ($Md^{ind}_{\alpha}$) and the maximum-point ($Md^{max}_{\alpha}$) lines on the Q-Q plot of the Mds as described above. For given values of $\alpha$ and n, the critical values $Md^{ind}_{\alpha}$ and $Md^{max}_{\alpha}$ differ significantly. The associated confidence ellipsoids are given by the following statements:

$P(Md_i \le Md^{max}_{\alpha};\ i = 1, 2, \ldots, n) = 1 - \alpha$, and $P(Md_i \le Md^{ind}_{\alpha}) = 1 - \alpha;\ i = 1, 2, \ldots, n$.

Outlying observations stick out more clearly on the plots obtained using the robustified Mds. Observations falling outside the outer contour are outliers, whereas the observations lying between the inner and the outer contours need further examination, and points falling inside the inner contour represent the main stream of data.

14.8 Robust Principal Component Analysis

Principal component analysis (Anderson, 1984; Johnson and Wichern, 1988) is one of the well-recognized data reduction techniques. It is well known that, while 
27.  be based solely on their magnitudes. Logically, one cannot truly distinguish non-normality from contamination. Discordant values should be subjected to increased scrutiny, and removal should occur only when this inspection reveals unique or unusual problems in the measurement or recording of these values. Scout is designed to enhance the user's ability to quickly identify such problems.

4.2 Select Variables

When searching for outliers, the user should decide which variables are to be included in the analysis. The "Select Variables" heading allows the user to do this. If the user skips this step, Scout will default to testing all of the variables. Once in the variable selection screen, a check mark next to a variable name indicates that the variable will be tested. The user may place or remove these check marks by using the <UP ARROW> and <DOWN ARROW> keys to move the selector to a particular variable name, and then pressing the <-> key to remove the check mark or the <+> key to place a check mark. The <-> and <+> keys also move the selector to the next variable name, so that a series of variables can easily be set by holding down one of these keys. Pressing <ENTER> or <ESC> will accept the variable selection as indicated.

4.3 The Classical Outlier Tests

The two outlier tests available in the Classical Method menu are Mardia's multi
28.  be performed for each of the regions in the Pb.add file (7 here).

5.9 Causal Variables

When Causal Variables is selected, the second window will display the message "Searches for the variables that might have caused a given observation to be an outlier. A variable is a cause if, when removed, the observation is no longer an outlier." When the <ENTER> key is pressed, the third window appears, allowing the various headings to be set. The available headings for this choice are as follows:

Headings                 Example Choices
Statistics Options       Classical
Confidence Interval      Simultaneous
Zero Lower Limit         No

Each of these headings has various choices. Any of the choices for Confidence Interval and for Zero Lower Limit can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The Zero Lower Limit option can be used when the lower limit becomes negative and the data cannot take negative values.

Statistics Options presents the same menu as described in Section 5.3. Set these headings as desired and return to the third window. The remaining headings and corresponding choices in the third window are as follows:

Headings                 Choices
Confidence Interval 
29.  benefit from reviewing the tutorial sections before reading the user's guide. Various examples presented in the tutorial section are produced by using some well-known data sets.

The main menu in Scout contains seven headings: File, Data, Classical Method, Robust Method, PCA, Graphics, and System. Each of these headings has various options. These options can be viewed by moving the cursor in the main menu to the appropriate area and pressing the <ENTER> key. A short description associated with each heading or choice is displayed automatically in the window of the main menu. The description window associated with any heading or choice can be activated by moving the cursor (or by using the <ARROW> keys) to the corresponding area. The user's guide section and the tutorial section of the manual are organized systematically from the "File" heading to the "System" heading.

1.3 Installing Scout

Place the Scout diskette in drive A (or B) and install to hard disk C:

1. Type "C:" (without quotes) and press <ENTER>. This changes the current disk drive to drive C.

2. Type MD SCOUT and press <ENTER>. This creates a directory called SCOUT, where the program will reside.

3. Place the Scout disk in drive A (or drive B) and close the drive door.

4. Type "COPY A:*.* C:\SCOUT" and press <ENTER>. This copies all the files from the program disk in drive A into the SCOUT 
30.  computers, the exact critical values based on a scaled beta distribution can be obtained quite easily. Using Scout, the critical values of the distances (Mds) and the theoretical quantiles used along the horizontal axis in the Q-Q plot of the Mds can be obtained using one of the following two options:

• The chi-square approximation
• The scaled beta distribution

The default option is the scaled beta distribution.

The Right Tail Probability (α) and the Confidence Coefficient

Scout allows the user to select a value for α, the right tail area (> 0.01), for the distribution of individual Mds (default 0.05). Also, for all of the control limits (in Q-Q plots, index plots, and interval estimates), the user can pick a confidence coefficient of his or her choice, for example 80%, 90%, 95%, or 99% (warning and maximum limits). The default confidence coefficient is 0.95.

Two Choices for the Scale Estimator

For multivariate data sets, the user can obtain the relevant statistics, such as the Mds and the PCs, using either the variance-covariance matrix or the correlation matrix. The correlation matrix is chosen by default.

Tuning Constant and Trimming Fraction

The PROP procedure requires the use of a tuning constant. An option for selection of a tuning constant is provided in Scout for interested users. The default value is 1.0. Also, the trimming fraction, representing the percent of observat
31.  contains three headings, as shown in Figure 13.1. "Graph Parameters" is used to select the color and shape of data points used in a graph. After selection of a data set, and the optional selection of desired colors and shapes of data points, a 2-dimensional or 3-dimensional graph can be displayed. The 3-dimensional capability of Scout affords opportunities to view the data from many perspectives. For this tutorial, select the FULLIRIS.DAT data set from Scout's Data directory.

[Figure 13.1: the Graphics pull-down menu; legible entries are Graph Parameters, 2 Dimensional, and 3 Dimensional.]
32.  examples, some of which are discussed in the tutorial chapters of this user's guide. Readers are encouraged to try the procedures described here on data sets from their own applications.

Some desirable properties of an outlier identification procedure are:

• The procedure should be resistant to swamping and masking effects, with a high breakdown point.

• The procedure should be graphical and intuitively appealing to the user. There is no substitute for a good and revealing graphical display of the data set.

• The resulting robust and resistant estimates of location and scale and the Mds, with or without the outliers, should also be in close agreement with the corresponding MLE estimates and the Mds obtained after the removal of the outlying observations.

• The procedure should be able to order the Mds accurately, leading to the correct identification of outliers.

14.2 General Description of Statistical Procedures in the Scout Software Package

All of the major menus available in Scout have been discussed in earlier chapters. Some statistical procedures used in Scout are listed as follows:

1. Histogram and Data Transformation. Several transformations are available, including standardization, linear and logarithmic transformations, power transformation (e.g., square root), and Box-Cox type transformations. These have been discussed in earlier chapters.

2. Normality Tests. And
33.  identification of outliers and the estimation of population parameters of location and scale typically use an influence function. The robust module of Scout computes various statistics using four methods. These include the classical MLE approach, the robust multivariate trimming approach (Devlin et al., 1981), the Huber influence function (Huber, 1981), and the proposed PROP influence function (Singh, 1993). Numerous graphical procedures are incorporated in Scout. These include the normal Q-Q plots of raw data, scatter plots, Q-Q plots and scatter plots of principal components, Q-Q plot and index plot of the Mahalanobis distances, scatter plots of discriminant scores, contour plots, plots of prediction intervals, simultaneous confidence intervals, and more. The control-chart-type quantile-quantile (Q-Q) graphical display of multivariate data combines the effect of a formal test procedure and an informal graphical display into one powerful multiple outlier identification procedure.

5.2 Choices of Robust Analyses

Several univariate and multivariate robust procedures are available in Scout; they are worked out in detail in the tutorials (Section II). There are nine options in the "Robust Method" menu:

Select Variables
Univariate Statistics
Robust Analysis
Confusion Matrix
Pattern Recognition
D Trend
Add Mean
Causal Variables
Print Destinations

There are various screens associated 
34.  in an environmental monitoring application, it is quite possible that the classification procedure based upon the distorted estimates may classify a contaminated sample as coming from the clean population and a clean sample as coming from the contaminated part of the site. This may lead to incorrect remediation decisions.

The MLE-based classical and even the robust outlier identification procedures are vulnerable to masking and swamping effects in the presence of multiple outliers. Masking means that the outliers are hidden, and the presence of some outliers may mask the existence of others. Even the sequential use of the outlier identification procedures cannot help unmask these multiple outliers (e.g., see Example 1, Chapter 10). When the outliers arise in clusters, the OLS regression model is attracted toward the outliers, resulting in deflated residuals and leading to masking of outliers. Swamping, on the other hand, means that some of the inlying observations are identified as outliers due to the presence of some other outliers. In the presence of multiple outliers, or for a mixture sample from two or more populations, the generalized distances, including robustified Mds, get distorted to such an extent that the cases with large Mds may not correspond to the outlying observations. This data masking distorts the estimates of the population parameters (e.g., μ, Σ) and the correct ordering o
35.  [Screen capture: the pattern recognition menu; legible entries include Graph Title, Save Discriminant Scores, View Eigen Values and Vectors, View Confusion Matrix, View Covariance Matrix and Means, and Begin Computations with Current Options. Directory: C:\SCOUT\DATA, Filename: FULLIRIS.DAT]

Figure 11.15. The pattern recognition menu with "Type of Graph" set to "Discriminant Scores".

The Eigen Values and Eigen Vectors associated with this analysis will first appear as shown in Figure 11.16. After examination of these values, press <ESC>, and the confusion (error) matrix will be displayed, as shown in Figure 11.17.

[Screen capture: the Eigen Values and Eigen Vectors window, with the prompt "Press <P> to print or <ESC> to exit". Directory: C:\SCOUT\DATA, Filename: FULLIRIS.DAT]

Figure 11.16. The Eigen values and Eigen vectors associated with Fisher's discriminant analysis of FULLIRIS.DAT.
36.  may choose it to be something else (i.e., "New"). Select your choice with the <ARROW> keys and the <ENTER> key, or press the key corresponding to the first letter of your choice. If your choice is not "New", Scout will automatically insert the correct values for each variable in this observation, and the label will read "Arithmetic", "Geometric", or "Median". If, however, your choice is "New", Scout will enter a value of 1E31 for each variable and "Obs n" for the label (where n is the observation number). You must enter the correct values and label manually if you select "New". Simply move about the screen with the <ARROW> keys until you find the value or label you wish to change, type the correct value or label, and press <ENTER>.

SUGGESTIONS: (1) It is recommended that means, medians, or any other summary statistics be inserted as either the first or last observation. (2) Scout allows insertion of only one observation at a time. If you wish to insert many observations with additional data, it may be more time-effective to exit Scout and insert the new data using different software (e.g., a spreadsheet).

Inserting Variables: This option allows the user to insert variables (i.e., columns) into the data set. Move about the spreadsheet screen with the <ARROW> keys until you find the column in which you wish to insert a variable. Press the <INSERT> key. You will then be given a choi
37.  of the data. The second line of the file must contain the number of variables. This number, p, must be an integer greater than or equal to 1 and less than or equal to 22. The next p lines contain the variable names in the first 10 columns (1-10) and the associated units in the next ten columns (11-20). Data formats, in FORTRAN notation, can be included after the units in columns 21-30. Finally, a comment for each variable may be included in columns 31-80. After line p + 2, the remaining lines contain the data, so that each line represents one observation. Numbers must be separated by spaces; commas must not be used. Missing values are designated by 1E31. An observation identifier may be placed at the end of each line. This identifier or label can be up to ten characters long and must be in quotes. The following is an example of a file in Scout format:

    Geostatistical Environmental Data
    5
    Easting   feet      F7.1
    Northing  feet      F7.1
    Arsenic   ppm       G16.9
    Cadmium   ppm       F10.3
    Lead      ppm       F10.3
    288.0 311.0 .850 11.5 18.25 "Sample 1"
    285.6 288.0 .630 8.50 30.25 "Sample 2"
    273.6 269.0 1.02 7.00 20.00 "Sample 3"
    280.8 249.0 1.02 10.7 19.25 "Sample 4"
    273.6 231.0 1.01 11.2 151.5 "Sample 5"
    276.0 206.0 1.47 11.6 37.50 "Sample 6"
    285.6 182.0 .720 7.20 80.00 "Sample 7"
    288.0 164.0 .300 5.70 46.00 "Sample 8"
    292.8 137.0 .360 5.20 10.00 "Sample 9"
    278.4 119.0 .700 7.20 13.00 "Sample 10"

To save data in this format, select the option "Write ASCII Data File". S
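The file layout described above (title line, variable count, p fixed-column header lines, then one observation per line) can be sketched as a small reader. This is only an illustration in Python, not code from Scout: the function name `parse_scout`, the returned structure, and the choice to map the missing-value code 1E31 to `None` are all assumptions made for the example, and labels are assumed to be wrapped in ordinary single or double quotes.

```python
import shlex

MISSING = 1e31  # Scout's code for a missing value


def parse_scout(text):
    """Parse Scout-format ASCII text (hypothetical helper, not part of Scout).

    Returns (title, variables, observations); each observation is a
    (values, label) pair, with missing values mapped to None.
    """
    lines = text.splitlines()
    title = lines[0].strip()
    p = int(lines[1].split()[0])            # line 2: number of variables
    if not 1 <= p <= 22:
        raise ValueError("number of variables must be between 1 and 22")

    variables = []
    for line in lines[2:2 + p]:             # next p lines: fixed-column fields
        line = line.ljust(80)
        variables.append({
            "name": line[0:10].strip(),      # columns 1-10
            "units": line[10:20].strip(),    # columns 11-20
            "format": line[20:30].strip(),   # columns 21-30
            "comment": line[30:80].strip(),  # columns 31-80
        })

    observations = []
    for line in lines[2 + p:]:              # remaining lines: one observation each
        if not line.strip():
            continue
        tokens = shlex.split(line)          # keeps a quoted label as one token
        numbers = [float(t) for t in tokens[:p]]
        values = [None if x == MISSING else x for x in numbers]
        label = tokens[p] if len(tokens) > p else None
        observations.append((values, label))
    return title, variables, observations


# A two-variable file built with explicit column padding.
sample = "\n".join([
    "Geostatistical Environmental Data",
    "2",
    "Easting".ljust(10) + "feet".ljust(10) + "F7.1",
    "Arsenic".ljust(10) + "ppm".ljust(10) + "G16.9",
    '288.0 .850 "Sample 1"',
    '285.6 1E31 "Sample 2"',
])
title, variables, observations = parse_scout(sample)
print(title, variables, observations)
```

Using `shlex.split` rather than `str.split` is what lets a label such as "Sample 1" contain a space; a reader that must match Scout exactly would need to confirm the quote character Scout writes.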
38.  of the variables used in the file, (3) the number of missing values for any variable, (4) the minimum and maximum values for each variable, (5) the mean of values for that variable, (6) the standard deviation (sd), (7) the percent coefficient of variation, and (8) the variance.

Should you wish to save this file (it can be incorporated in word processing software; for example, import as ASCII (DOS) TEXT in WP6.0), press the <P> key. This option brings forth a window asking for a file name. Fill in an appropriate name (perhaps linking the statistics to the data file they came from), and be sure to specify the path if it differs from the default path indicated in the lower left corner. If no name is supplied, pressing the <ENTER> key will simply print the summary statistics to the local printer.

9.4 Transformation of Variables

The next option in the "Data" menu is the "Transform" heading. This option can be used to perform variable transformation. The two headings within this menu are shown in Figure 9.3: (1) the Kolmogorov-Smirnov goodness-of-fit test and (2) the Anderson-Darling normality test. Various transformation functions can be obtained by choosing one of these two tests.

Choosing the Kolmogorov-Smirnov (Hogg and Craig, 1978) goodness-of-fit test and pressing <ENTER> will give a table of variable statistics. Choosing the variable you are interested in and pressing the <ENTER> k
39.  sure that the variables in the plots were included in the outlier test. Otherwise, the plot may include additional outliers.

4.4 Causal Variables

After an outlier test has been executed, the user may wish to identify the variables, if any, which are responsible for each discordant observation. This is done by selecting the "Causal Variables" choice from the pull-down menu. Scout will retest each discordant observation with one variable excluded at a time. Thus each discordant observation is tested p times, using all subsets of p - 1 of the variables. A variable is listed as causal only if absence of the variable prevents identification of the outlier. Although this procedure is based on iterations of rigorous tests of hypotheses, the user should consider its results only as general guidance and not as definitive proof of the cause. It is recommended to start by investigating the suspected causal variable (or group) whose removal results in the largest decrease in the value of the test statistic. As with any quality control technique, the results of these statistical procedures should be combined with experience and knowledge of the measurement system for proper interpretation of the data.

The output is described as follows. The "Outlier" column provides the observation number and label of the discordant observation being tested. "Test" shows the outlier test statistic, while "Crit" gives the critical value used in the test. The test statistic and crit
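The leave-one-variable-out retest described in Section 4.4 can be sketched as follows. This is a simplified illustration in Python, not Scout's implementation: it assumes uncorrelated variables, so the test statistic is a sum of squared standardized deviations, and it uses the familiar upper 5% chi-square points as critical values. Scout's own tests (generalized distance and Mardia's kurtosis) use different statistics and critical values. The names `distance_stat`, `causal_variables`, and `CHI2_95` are hypothetical.

```python
# Upper 5% points of the chi-square distribution for 1 to 3 degrees of freedom.
# Stand-in critical values only; the sketch therefore handles p = 2 or p = 3.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def distance_stat(obs, means, sds, keep):
    """Squared standardized distance over the variables listed in `keep`."""
    return sum(((obs[j] - means[j]) / sds[j]) ** 2 for j in keep)


def causal_variables(obs, means, sds, crit=CHI2_95):
    """Return indices of variables whose removal clears the outlier flag."""
    p = len(obs)
    everything = list(range(p))
    if distance_stat(obs, means, sds, everything) <= crit[p]:
        return []                      # not discordant, nothing to explain
    causal = []
    for k in everything:               # p retests, each on p - 1 variables
        keep = [j for j in everything if j != k]
        if distance_stat(obs, means, sds, keep) <= crit[p - 1]:
            causal.append(k)
    return causal


obs = [0.5, 6.0, -0.3]
print(causal_variables(obs, means=[0.0, 0.0, 0.0], sds=[1.0, 1.0, 1.0]))
```

When two variables are both extreme, no single exclusion clears the flag and the list comes back empty, which is one reason the guide advises treating the result as general guidance rather than proof.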
40.  the transform you have just selected along with any constants. This window keeps a record of all the transforms you have chosen for each variable. If a transform does not produce the desired results, you may "undo" that transform by selecting the undo option from the transformation menu.

3.4.4.2 Logarithm

Transforms the data by using the natural logarithm. All of the data must be greater than zero in order to use this transformation.

3.4.4.3 Power and Box-Cox

These two transformations are explained together, as they are very similar in usage. Both require a nonzero constant, "a". After entering a value for "a", you have the option of adjusting it. The value you entered will be displayed along with an incremental value (delta). Pressing the <+> key will increment "a" by delta and immediately reflect the results on the screen. Likewise, pressing the <-> key will decrease "a" by delta and show the results. This gives you the ability to quickly try many values of "a" before you decide which one to select. You may also adjust the delta value for larger or smaller increments. Press the <CTRL> and <-> keys at the same time to make delta smaller. Press the <CTRL> and <+> keys at the same time to make delta larger. The range of delta is from 0.001 to 1.0. When you find the desired value for "a", press the <ENTER> key to accept it. If you cannot fi
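For readers who want to reproduce the logarithm, power, and Box-Cox transformations outside Scout, a minimal Python sketch is given here. The guide does not print the formulas Scout uses, so the textbook forms are assumed: x**a for the power transformation and (x**a - 1) / a for Box-Cox. The function names and the `step` helper, which mimics adjusting "a" by delta within the stated range of 0.001 to 1.0, are hypothetical.

```python
import math


def log_transform(values):
    """Natural logarithm; every value must be greater than zero."""
    if any(x <= 0 for x in values):
        raise ValueError("all data must be greater than zero")
    return [math.log(x) for x in values]


def power_transform(values, a):
    """Raise each value to the nonzero power a (assumed form: x**a)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return [x ** a for x in values]


def box_cox(values, a):
    """Textbook Box-Cox form (x**a - 1) / a, for nonzero a and positive x."""
    if a == 0:
        raise ValueError("a must be nonzero")
    if any(x <= 0 for x in values):
        raise ValueError("all data must be greater than zero")
    return [(x ** a - 1.0) / a for x in values]


def step(a, delta, direction):
    """Move a by one increment; delta is clamped to the range 0.001 to 1.0."""
    delta = min(max(delta, 0.001), 1.0)
    return a + direction * delta


data = [1.0, 4.0, 9.0]
a = step(0.25, 0.25, +1)          # a becomes 0.5
print(power_transform(data, a), box_cox(data, a))
```

Because a transform replaces the values it is applied to, it is sensible to work on a copy of the data when trying several values of "a".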
41.  the mean concentration at each location is constant within the region under consideration. This assumption is often violated by the data collected from a polluted site. Therefore, in order to use OK to characterize the site under study, data with a spatial trend need to be detrended so that the constant-mean assumption is satisfied.

Scout offers the D Trend heading for removing a trend that might be present in a geostatistical data set obtained from a polluted site. It assumes that the data are in the same format as for the pattern recognition option, with the population IDs in the first column. Using an appropriate multivariate technique, the data first have to be partitioned into various strata with significantly different statistics (e.g., mean vectors). Using the geographic information of the sample observations, a site map can be prepared exhibiting the actual sampling locations and the respective population IDs. The D Trend heading, when used, subtracts the respective sub-population means from each observation in the corresponding sub-population. The resulting data satisfy the constant-mean assumption. An example illustrating its usage is included in the tutorial section.

5.8 Add Means

This heading is used after OK has been performed using the detrended data and a file with extension "grd" has been created. The means subtracted using the D Trend option need to be added back to the kriging estimates in the "grd" file. This can be achieved u
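The arithmetic behind D Trend and Add Means can be illustrated with a short Python sketch for a single variable. It is not Scout's code: the function names are hypothetical, and the way Scout assigns kriging estimates in the "grd" file to regions is not modeled. Each estimate is simply assumed to arrive with its population ID already known.

```python
from collections import defaultdict


def population_means(ids, values):
    """Mean of the values within each population ID."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pid, v in zip(ids, values):
        totals[pid] += v
        counts[pid] += 1
    return {pid: totals[pid] / counts[pid] for pid in totals}


def detrend(ids, values):
    """Subtract each observation's sub-population mean (the D Trend step)."""
    means = population_means(ids, values)
    residuals = [v - means[pid] for pid, v in zip(ids, values)]
    return residuals, means


def add_means(ids, estimates, means):
    """Add the sub-population means back to estimates (the Add Means step)."""
    return [e + means[pid] for pid, e in zip(ids, estimates)]


ids = [1, 1, 2, 2]
lead = [10.0, 14.0, 100.0, 104.0]
residuals, means = detrend(ids, lead)
print(residuals, means, add_means(ids, residuals, means))
```

After detrending, both populations have residuals centered on zero, which is the constant-mean condition ordinary kriging relies on; adding the stored means back restores the original scale.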
42.  the region bounded by (1100, 1220), (1100, 1700), (3000, 1220), and (3000, 1700). This will be performed for each of the 7 regions in the Pb.add file.

14.11 Outliers in Discriminant and Classification Analysis

Discriminant and classification analyses are multivariate techniques concerned with separating distinct groups of observations (discriminant analysis) and with allocating new observations to previously defined groups or populations (classification analysis). The separatory procedure is rather exploratory. In practice, the investigator has some knowledge about the nature and the number of groups. The study might be about k known groups, for example, k geographic regions, k treatments, k analytical methods, k species, or k laboratories. In these cases, the investigator knows the origin of each of the objects in a sample of size n obtained from these k populations. However, some of these k groups may be similar in nature and can be merged together. The objective here is to establish $g \le k$ significantly different groups. Let $s = \min(g - 1, p)$; then s discriminant functions can be computed for these g p-dimensional groups (Anderson, 1984; Johnson and Wichern, 1988). These functions are then used in all subsequent classifications. However, if the investigators have no prior information about the observations and their origin, then they have to search for natural groupings of observations (unsup
43.  their statistics, the histogram of the chosen variable, and the Transformation Menu.

CAUTION: Use of the transform option will produce values that replace the original data. Copying the original data to another file before using the transform option will ensure that the original data are retained.

9.5 Summary

(1) The first step in working with Scout is to read in a data file ("Read ASCII File" heading).

(2) Editing data is a potent Scout capability but is not needed in these tutorials.

(3) The summary statistics for a data file can be produced easily, and the output may be saved to a text file that can be incorporated in word processing software.

(4) The "Transform" heading offers the options of two normality tests. Transformations can permanently alter data values; copying to another file name prior to work is prudent.

Chapter 10 Tutorial II

Classical Method

The level 2 "Classical Method" menu contains four headings (Select Variables, Generalized Distance, Multivariate Kurtosis, and Associated Causes) and two choices (Causal Variables and Remove Outlier Flags), as shown in Figure 10.1. Remember, a data file must be read before any analysis is possible.

[Figure 10.1: the Classical Method pull-down menu; the first legible entry is Select Variables.]
44.  understanding of their data sets to group (General Cause) and subgroup (Specific Cause) variables which, according to their specialized knowledge, may be causally related. The user must specify the groupings that will be sequentially excluded from the outlier test. Any group whose exclusion results in the observation no longer being discordant will be listed as potentially causal. This is intended to aid the user in finding and correcting physical causes of discordancy. Thus the groupings should correspond with known physical causes. For example, a subset of the variables may have been measured on a single instrument. It would be natural to group these variables so that Scout can investigate the possibility that discordancies are manifest in the entire group of variables, due perhaps to faulty operation of the instrument. Variables may be grouped according to a variety of characteristics. The user should also run the "Causal Variables" routine and interpret the results of the associated causes routine in light of the fact that discordancy in a single variable will cause all groups containing that variable to appear causal.

4.6 Remove Outlier Flags

The "Remove Outlier Flags" choice provides the user with a means of unmarking any data that have been identified as outliers. Once a procedure has identified outliers, these outliers are colored red in the data file. The "Remove Outlier Flags" choice turns the red data back to white, the original color of the da
45.  viewing multiple populations with ellipses defining each population.

FORMAL GRAPHICAL OUTLIER IDENTIFICATION

11.5 Index plots. Here, we produced index plots using Huber influence and Prop influence. The different results highlight the difference between these two methods: the Prop method was able to unmask multiple outliers that the Huber method did not detect.

11.6 Generalized distance. This procedure also highlighted the difference between Huber and Prop.

11.7 Kurtosis. The value for kurtosis was calculated using "Generate Graph With Current Options". This choice in the "Robust Analysis" menu is equivalent to an "Execute" function.

INTERVAL ESTIMATES

11.4 Control charts. In this section we (1) produced simultaneous C.I. and prediction interval control charts, and (2) learned to use <Q> to display a graph after a tabular output.

Chapter 12 Tutorial IV

Classical Principal Component Analyses

The PCA module has five headings, as shown in Figure 12.1. After selecting the data set for PCA and the desired variables, any of the four remaining headings may be selected for data analysis. For this tutorial, select the data set IRIS.DAT. Move the cursor to "PCA" and press <ENTER>. Use the Select Variables option to confirm that the two width and two length variables are checked and that Count is not checked. If this is not the case, use the plus
46.  when the "Graphic" module is highlighted from Scout's main menu.

A 2-dimensional or a 3-dimensional graph can be displayed by using these options. If the number of variables in the data set exceeds the number of dimensions chosen for the graphic option, then various combinations of variables can be selected for the graphic display.

The "System" module provides on-line information on the various Scout modules. Each section of the User's Guide can be displayed on the screen by selecting the appropriate section.

Printer setup can be accomplished by using the "Printer Setup" option and setting the various parameters for that option.

Chapter 14 Statistical Procedures

14.1 Introduction to Statistical Procedures for the Identification of Multiple Outliers

Outliers (also known as extreme, anomalous, discordant, suspect, maverick, or influential observations) are inevitable in data sets originating from many applications. In a manufacturing process, outliers typically represent some mechanical disorder of the system, unexpected experimental conditions and results, raw material of an inferior quality, or misrecorded values. In biological dose-response applications, outlying observations may indicate an entirely different type of reaction (an unusual response) to a newly developed drug. In this case, "outliers" may be more informative than the rest of the data. In environmental and ecological applications, outliers could be indicat
47. 2  information on individual observations from coded symbols is lost. Use the <T> key to toggle from symbols to pixels and from pixels back to symbols.

Stop Rotations / Restore Original Plot: The user can stop all rotations of the graph by pressing the <SPACEBAR>. The user can also restore the original plot at any time by pressing the <HOME> key. These features can be very helpful when the rotations get out of hand.

7.9 Search Observation Mode

The user can identify the individual observations that make up the graph. This feature is called "Search Observation Mode" and is entered by pressing the <S> key. The user can scroll through the observations with the up and down arrows and the <PGUP>, <PGDN>, <HOME>, and <END> keys. The user can also change the color of an observation by pressing the first letter of the desired color. The available colors are (Y)ellow, (W)hite, (G)reen, (C)yan, (R)ed, and (B)lack. If an observation is changed to black, that observation will be removed from the graph, and the graph will be rescaled when the user exits Search Observation Mode. Likewise, a black observation can be put back in the graph by changing its color. The <ESC> or <ENTER> key will return the user to three-dimensional rotations.

7.10 Quick 2D Graphs

The user can have Scout display quick two-dimensional graphs of the current three variables. The <X>, <Y>, and <Z> keys are used
48.  [Screen capture residue: "Component Loadings (Covariance Matrix)" table listing the eigenvalue, proportion, cumulative percentage, and variable loadings for each component; values illegible in the source.]

Figure 12.4. Table showing the component loadings for various PCs.

12.3 Transform Data

The last heading in the PCA module is "Transform Data". This option is used to replace the original variables by principal components. To use this option, move the cursor to highlight "Transform Data" and press <ENTER>. The two choices, Covariance and Correlation, appear as they did for "Display Matrices", "Eigenvalues", and "View Components". For this tutorial session, select Covariance and press <ENTER>. At this point, the explanation window shown in Figure 12.5 will appear on the screen, stating "4 variables transformed".

[Screen capture residue: menu bar "File  Data  Classical Method  Robust Method  PCA  Graphics  System" with the PCA menu headings "Select Variables" and "Display Matrices"; remainder illegible in the source.]
49.  [Screen capture residue: "Statistics Options" window with the entries X-Y Coordinates Scale Factor, Right Tail Cutoff, Tuning Constant, Control Chart Limits, Trimming Percent, Ignore Population #, Plot Ignored Population, and Accept New Settings; values illegible in the source.]

Figure 11.20. Statistics options for simultaneous control charts.

Set the other options in the "Robust Analysis" menu to match those shown in Figure 11.21. Generate the simultaneous control chart for all observations by moving to "Generate Graph With Current Options" and pressing <ENTER>. Except for the title and the identities of a few data points, your display should match Figure 11.22.

[Screen capture residue: menu bar "File  Data  Classical Method  Robust Method  PCA  Graphics  System" with the "Robust Method" headings Select Variables, Univariate Statistics, Robust Analysis, Covariance Matrix, and Pattern Recognition; remainder illegible in the source.]
50.  [Screen capture residue: menu listing that includes "Add Means", "Causal Variables", and "Print Destination", with a "Kurtosis" result window; contents illegible in the source.]
51.  TABLE OF CONTENTS (con't)

Chapter 6 PCA

6.1 Classical Principal Components Analysis
6.2 Display Matrices
6.3 Eigenvalues
6.4 View Components
6.5 Transform Data

Chapter 7 Graphics

7.1 General Description
7.2 Modify Graph Colors and Shapes
7.3 Command Summary for 2D and 3D Graphics
7.4 2 Dimensional Graphs
7.5 Zoom Feature
7.6 3 Dimensional Graphs
7.7 Moving 3D Graphs
7.8 Change Size of 3D Graphs
7.9 Search Observation Mode
7.10 Quick 2D Graphs
7.11 Response Surfaces

Chapter 8 System Information

8.1 User's Guide
8.2 Other Options
8.3 Exiting Scout

Chapter 9 Scout Basics - Tutorial I

9.1 Nomenclature
9.2 Read Data Files
9.3 Examine and Save Statistics
9.4 Transformation of Variables
9.5 Summary

Chapter 10 Classical Method - Tutorial II

10.1 Outlier Detection
10.2 Determining Causal Variables and Removing Flags
10.3 Summary

TABLE OF CONTENTS (con't)

Chapter 11 Robust Method - Tutorial III

11.1 Q-Q Plots
11.2 Q-Q Plots of Principal Component Analysis
11.3 PCA Scatter Plots
11.4 Statistical Intervals
11.5 Index Plots
11.6 Generalized Distance
11.7 Kurtosis
11.8 Summary

Chapter 12 Classical PCA - Tutorial IV

12.1 Display Matrices
12.2 Eigenvalues
12.3 Transform Data
12.4 Summary

Chapter 13 Graphics and System - Tutorial V

13.1 Graphics
13.2 System
13.3 Summary

Chapter 14 Statistical Pr
52.  [Screen capture residue: covariance matrix of the two length and two width variables of IRIS.DAT, with the prompt "Press <P> to print or <ESC> to exit"; values illegible in the source.]

Figure 12.2. The covariance matrix for the four variables.

After the covariance matrix is calculated, the matrix can be saved by using the <P> key and typing the path and the file name to save the matrix.

12.2 Eigenvalues

To calculate the eigenvalues corresponding to the various principal components, move the cursor to highlight the "Eigenvalues" heading, press <ENTER>, select Covariance, and press <ENTER> again. This generates the cumulative variance table for the principal components, as shown in Figure 12.3.

[Screen capture residue: menu bar "File  Data  Classical Method  Robust Method  PCA  Graphics  System" with the PCA menu headings "Select Variables" and "Display Matrices"; remainder illegible in the source.]
53.  [Screen capture residue: illegible in the source.]

Figure 10.1. The level 2 "Classical Method" menu and the explanation window for "Generalized Distance".

10.1 Outlier Detection

For outlier detection, select the IRIS.DAT data file. First, choose the "Generalized Distance" heading from the "Classical Method" menu, set α to either 0.1, 0.05, or 0.01 (see note 1), and use the <ENTER> key to generate a list of outliers in the data set. No outliers are detected using this method for any of the three α values. Due to masking, the classical Generalized Distance test could not identify any outliers. Now use the Multivariate Kurtosis heading with the same three α values; as shown in Figure 10.2, with α set to 0.1, one outlier is detected in the data set.

1. The limitation of only three values for α in the classical Generalized Distance test can be overcome by using the "Robust Method", selecting "Robust Analysis", setting "Display Graphs for" to Q-Q Plot (Generalized Distance), Compute St
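As a rough illustration of what a classical generalized distance measures, the following Python sketch computes squared Mahalanobis distances from the sample mean and sample covariance matrix for two variables. It is a sketch under stated assumptions, not Scout's procedure: the function name, the restriction to two variables, the n - 1 divisor, and the data values are all hypothetical, and no critical value for α is computed here.

```python
from statistics import mean


def squared_generalized_distances(x, y):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample variances and covariance (n - 1 divisor is an assumption).
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        distances.append(
            (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        )
    return distances


# Hypothetical data; the last pair lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 1.0, 4.0, 3.0, 12.0]
d2 = squared_generalized_distances(x, y)
largest = d2.index(max(d2))
print(largest, d2)
```

In an outlier test, the observation with the largest distance would then be compared against a critical value for the chosen α; how Scout obtains that critical value is not shown in this excerpt.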
54.  [Table row illegible in the source; the example choice shown is "Classical".]

The discriminant analysis method heading has two choices, Linear and Quadratic, which can be selected by using the <ENTER> key when the cursor is at Discriminant Method in the third window. Statistics Options presents the same menu as described in Section 5.3.

Use the down <ARROW> key to move the cursor to the last selection, "Generate Confusion Matrix With Current Options". Use the <ENTER> key to generate the Confusion Matrix. Use the <ESCAPE> key to return to the third window if the parameters need to be readjusted or other analyses performed.

5.6 Pattern Recognition

The pattern recognition heading performs principal component and discriminant analysis. The data should be multivariate in nature with at least two variables. The first column should be population ID numbers (a number from 1 to 20).

When Pattern Recognition is selected, the explanation window will display the message "Pattern recognition using discriminant scores and principal components analyses". Pressing the <ENTER> key displays the third window, revealing various headings. The available headings and example choices for Pattern Recognition are as follows:

Headings                 Example Choices
Statistics Options       Classical
Numbering                Observations
Contour Ellipse          Indiv & Simul
55.  [Screen capture residue: illegible in the source.]

Figure 12.1. The PCA menu with Select Variables chosen. Count is the only variable not selected (checked).

12.1 Display Matrices

After the variables are selected, press <ENTER>, returning you to the PCA menu, and move the cursor to highlight the "Display Matrices" heading. There are two choices for this heading: (1) Covariance and (2) Correlation. Choose Covariance. Use the <ENTER> key to produce the covariance matrix as shown in Figure 12.2. The diagonal elements are the variances and the off-diagonal elements are the covariances.

[Screen capture residue: menu bar "File  Data  Classical Method  Robust Method  PCA  Graphics  System" with the PCA menu headings Select Variables, Display Matrices, Eigenvalues, View Components, and Transform Data; remainder illegible in the source.]
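The structure of a covariance matrix (variances on the diagonal, covariances off the diagonal) can be illustrated with a short Python sketch. The function name, the n - 1 divisor, and the measurements are assumptions made for this example; they are not taken from Scout or from IRIS.DAT.

```python
from statistics import mean


def covariance_matrix(columns):
    """Sample covariance matrix (n - 1 divisor) of equal-length columns."""
    n = len(columns[0])
    p = len(columns)
    means = [mean(col) for col in columns]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum(
                (columns[i][k] - means[i]) * (columns[j][k] - means[j])
                for k in range(n)
            )
            # The matrix is symmetric, so fill both halves at once.
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


# Hypothetical measurements, not the IRIS.DAT values.
length = [5.1, 4.9, 4.7, 4.6, 5.0]
width = [3.5, 3.0, 3.2, 3.1, 3.6]
cov = covariance_matrix([length, width])
for row in cov:
    print(["%.4f" % value for value in row])
```

Here `cov[0][0]` and `cov[1][1]` are the variances of the two variables, and `cov[0][1]` equals `cov[1][0]`, the covariance between them.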
56.  [Screen capture residue: PCA menu headings "View Components", "Eigenvalues", and "Transform Data" with a "Variables Transformed" message window; contents illegible in the source.]
57.  [Screen capture residue: explanation window for "Display Matrices", reading approximately: "Computes and displays either the covariance matrix or the correlation matrix. If any outliers are present, the user decides whether or not they are to be used."]

Figure 6.1. The PCA menu and a description of Display Matrices.

The Select Variables heading has been discussed in earlier chapters, so we omit its description here.

6.2 Display Matrices

The user may choose to display the covariance and/or correlation matrices. To do this, select "Display Matrices" from the PCA menu. Within this heading, users can manually remove outliers (found by the Classical Method). If any outliers have been identified, Scout will ask the user if outliers are to be used or ignored. Then Scout will ask the user which matrix is of interest, covariance or correl
58.  [Screen capture residue: "Confusion Matrix" output for the file FULLIRIS.DAT, showing predicted versus actual membership for populations 1 to 3 and a list of observation classification differences; values illegible in the source.]

Figure 11.17. The confusion matrix associated with Fisher's discriminant analysis of FULLIRIS.DAT.

Press <ESC> once more, and the scatter plot of the first two discriminant scores is displayed. Pressing <E> will once again draw ellipses around the populations, as shown in Figure 11.18. Pressing <Page Down> three times will produce Figure 11.19, Discriminant Score 1 vs pt-length.

[Plot residue: "Fisher's Classical Discriminant Analysis (All 3 Species)"; horizontal axis: Discriminant Score 2.]

Figure 11.18. Plot of discriminant scores with superimposed ellipses.

[Plot residue: "Scatter Plot of First Disc. Score vs pt-length"; horizontal axis: pt-length.]

Figure 11.19. Discriminant Score 1 vs pt-length.
59.  [Screen capture residue: explanation window for "Edit Data", reading approximately: "Edit or view the data in memory. Editing the observations allows the user to modify the labels and the data values. The title and variable information may also be edited."]
60.  [Screen capture residue: explanation window for "Select Variables", reading approximately: "Allows the user to select a subset of the variables to be used in the [illegible] tests and causes routines. The default is to use all of the variables."]

Figure 4.1. The six options of the Classical Method menu and the explanation window for Select Variables.

CAUTION: The removal of data values should not
61.  [Screen capture residue: explanation window for "Read ASCII File", reading approximately: "Reads a data set from an ASCII file on any disk. The file format, defined in the User's Guide, is GEO-EAS compatible. CAUTION: Data in memory will be lost."]

Figure 9.1. The first window in Scout, showing the level 1 m
62.  [Screen capture residue: "Cumulative Variance Table (Covariance)" with the columns Component, Eigenvalue, Difference, Proportion, and Cumulative for the four principal components of IRIS.DAT; values illegible in the source.]

Figure 12.3. The cumulative variance table for the four principal components.

To view the eigenvalues together with the component loadings, press <ESC> to return to the PCA menu, move the cursor to highlight "View Components", select Covariance, and press <ENTER> to generate the table of component loadings, as shown in Figure 12.4.

[Screen capture residue: "Component Loadings (Covariance Matrix)" table; values illegible in the source.]
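The columns of a cumulative variance table follow directly from the eigenvalues. The Python sketch below shows one plausible way to derive them; the function name, the reading of "Difference" as the gap to the next smaller eigenvalue, and the eigenvalues themselves are assumptions for illustration and are not the IRIS.DAT results.

```python
def cumulative_variance_table(eigenvalues):
    """Rows of (component, eigenvalue, difference, proportion %, cumulative %)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows = []
    cumulative = 0.0
    for i, value in enumerate(ordered):
        # Gap to the next smaller eigenvalue; None for the last component.
        difference = value - ordered[i + 1] if i + 1 < len(ordered) else None
        proportion = 100.0 * value / total
        cumulative += proportion
        rows.append((i + 1, value, difference, proportion, cumulative))
    return rows


# Hypothetical eigenvalues chosen for easy arithmetic.
table = cumulative_variance_table([4.0, 3.0, 2.0, 1.0])
for row in table:
    print(row)
```

Each proportion is the share of the total variance carried by one component, so the cumulative column ends at 100 percent.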
63.  Covariance

Heading                              Choices
Weights                              Beta
                                     Chi Squared
X-Y Coordinate Scale Factor (%)      An integer between -100 and 100
Right Tail Cutoff                    A number between 0.01 and 0.8 (to be used with Huber or PROP)
Tuning Constant                      A number between 0.1 and 5.0
Control Chart Limit                  A number between 0.01 and 0.5
Trimming Percent                     An integer between 0 and 100 (to be used with Multivariate Trimming)
Ignore Population #                  A non-negative integer representing the population not to be considered in the analysis
Plot Ignored Population              Yes/No

The last two headings assume that the data set has the population ID in the first column.

NOTE: This Statistics Options menu is also shared by the three other procedures in the Robust Analysis main menu: Confusion Matrix, Pattern Recognition, and Causal Variables. The explanations of these headings will refer back to this description.

For the last four headings in the fourth window (Statistics Options) given above, the numbers for the choices can be typed to the screen after using the <ENTER> key when the cursor is on the corresponding statement. The other choices can be selected by using the <ENTER> key repeatedly. After all selections are made, move the cursor to the bottom of the fourth window to "Accept New Settings". Use the <ENTER> key to accept the selected choices for the "Statistics Options" and return to the third window.

The remaining hea
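For readers who prepare settings outside Scout, the numeric ranges of the Statistics Options can be captured in a small lookup and checked before use. The Python sketch below is hypothetical throughout: the dictionary, the function name, and the pairing of each heading with its range (read from the flattened table in this excerpt) are assumptions, not part of Scout.

```python
# Assumed ranges, taken from the Statistics Options description:
# (lowest allowed, highest allowed, required type).
OPTION_RANGES = {
    "Right Tail Cutoff": (0.01, 0.8, float),
    "Tuning Constant": (0.1, 5.0, float),
    "Control Chart Limit": (0.01, 0.5, float),
    "Trimming Percent": (0, 100, int),
}


def check_option(name, value):
    """Return True if value is acceptable for the named option."""
    low, high, kind = OPTION_RANGES[name]
    if kind is int:
        # Whole numbers only; bool is excluded although it subclasses int.
        if not isinstance(value, int) or isinstance(value, bool):
            return False
    return low <= value <= high


print(check_option("Right Tail Cutoff", 0.05))
print(check_option("Tuning Constant", 6.0))
print(check_option("Trimming Percent", 10.5))
```

A check of this kind only confirms that a value falls inside the documented range; it says nothing about whether the value is a sensible choice for a given data set.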
64.  [Screen capture residue: illegible in the source.]

Figure 9.3. The level 2 "Data" menu with the "Transform" heading selected, and the level 3 "Transform" menu showing headings for the two different normality tests.

including "Z" standardization, Logarithmic, Box-Cox type (Johnson and Wichern, 1988), Power (square root), and more, are available in Scout.

[Screen capture residue: menu bar "File  Data  Classical Method  Robust Method  PCA  Graphics  System" with the "Transformations Menu"; remainder illegible in the source.]
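The transformations named above can be written out in a few lines of Python. These are the standard textbook forms and are offered only as a sketch: the function names are hypothetical, and the exact parameterization Scout uses for each transformation is not given in this excerpt. The example works on a copy of the data, in keeping with the caution that transformed values replace the originals.

```python
import math
from statistics import mean, stdev


def z_standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def log_transform(values):
    """Natural logarithm; values must be positive."""
    return [math.log(v) for v in values]


def box_cox(values, lam):
    """Standard Box-Cox form: (x**lam - 1) / lam, or log(x) when lam is 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def power_transform(values, exponent=0.5):
    """Power transformation; the default exponent gives the square root."""
    return [v ** exponent for v in values]


original = [1.0, 4.0, 9.0, 16.0]   # hypothetical data
working = list(original)           # transform a copy, keep the original
roots = power_transform(working)
shifted = box_cox(working, 1.0)
scores = z_standardize(working)
print(roots, shifted, scores)
```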
65.  [Screen capture residue: file name FULLIRIS.DAT.]

Figure 11.11. The "Pattern Recognition" menu with numbering set to "populations".

Select "Begin Computations with Current Options", press <ENTER>, and the scatter plot for the principal component scores will be drawn. Press <E> to draw the ellipses around the three populations, and the scatter plot should match that shown in Figure 11.12.

[Plot residue: "Scatter Plot of PCs (all 3 Species)"; horizontal axis: PC Score 2.]

Figure 11.12. Scatter plot for the principal components of all three species. The populations are identified by number and defined by ellipses.

Next, use <Page Down> once to view PC Score 1 vs PC Score 3. You will notice that the largest ellipse extends past the Y axis, as shown in Figure 11.13.

[Plot residue: "Pattern Recognition"; horizontal axis: PC Score 3.]

Figure 11.13. Three populations with one ellipse extending beyond the boundaries of the graph.

Scout can scale this scatter plot so that the entire ellipse can be seen. Press <ESC>, select "Statistics Options", press <ENTER>, select "X-Y Coordinates Scale Factor (%)", press <ENTER>, type in 20, press <ENTER> again, and regenerate the scatter plot. Figure 11.14 shows the result: all three ellipses are now entirely on the screen. The
66.  [Screen capture residue: "Transformations Menu" offering Linear, Logarithm, Power, Box-Cox, and Undo Last Transform, above a normality test table listing each variable with its standard deviation, test statistic, and critical value; values illegible in the source.]

Figure 9.4. Selection of the "Transform" heading, followed by selection of the Kolmogorov-Smirnov normality test heading and one additional pressing of the <ENTER> key, produces the list of variables and
67. [Figure 11.8. Q-Q plot of principal component #1 (x axis: Theoretical Quantiles, Normal Distribution).]

Next, we use the data file containing all three species of iris, FULLIRIS.DAT. Go to "File", select "Read ASCII File", select FULLIRIS.DAT, and press <ENTER>. Return to "Robust Method", select "Robust Analysis", and change "Numbering" from "Observations" to "Populations" using the <ENTER> key. Next, move to "Generate Graph With Current Options" and press <ENTER>; the three different species of iris should be distinguished on the graph as three different sets of numbers, as shown in Figure 11.9. This figure immediately suggests that there is more than one population. It is remarkable to see how the observations from the three populations are grouped together on this graph.

[Figure 11.9. Q-Q plot of the first principal component; three populations (species) present.]

11.3 PCA Scatter Plots

PCA scatter plots can be produced by selecting "Scatter Plot (PCA)" from the "Select Graph Type" menu found under "Display Gra
68. Plot (PCA)
Q-Q Plot (Generalized Dist.)
Control Charts Indiv. (Xi)
Control Charts Simult. (Xi)
Control Charts (Defects)
CI Limits Population Mean
Prediction Intervals
Index Plots
Multivariate Kurtosis

Use the arrow keys to reach the desired procedure and then press the <ENTER> key to make a selection from this list. The fourth window will disappear and the third window will reappear with the selected choice listed after "Display Graph For".

The "CI Limits Population Mean" choice outputs the relevant statistics and the limits of the confidence interval for the mean on the screen. These limits can be graphed by pressing the letter "Q" or "q". The Prediction Intervals can be graphed similarly. The "Control Charts Simult. (Xi)" choice produces the graph for the simultaneous confidence interval for the selected settings, as described in Singh and Nocerino (1995). "Multivariate Kurtosis" simply computes the multivariate kurtosis for the selected options; no graph is generated for this procedure. Some of these options are discussed in the tutorial section.

Move the cursor to the "Statistics Options" heading. Use the <ENTER> key to display the menu. The various choices for the "Statistics Options" headings are listed as follows:

Heading: Choices
Compute Statistics Using: Classical, Huber Influence, Proposed Influence, Multivariate Trimming
Initial Estimate: Classical, Robust
Matrix: Correlation
69. [Figure 9.2. The level 2 menu for the "Data" heading with the summary statistics for IRIS.DAT displayed.]

We are skipping "Edit Data". This is a potent choice with the potential to drastically change the output we are trying to lead you through, and while learning Scout there is no need to edit data. Keep in mind that this choice is available and will allow you to alter the input data file, including the deletion or insertion of columns (variables) or rows (observations).

The summary statistics describe IRIS.DAT (or whatever data file you used) in terms of (1) the number of data points in the file, (2) the number and identities
70. Robust Method heading of the Scout software package.

The successful identification of anomalous observations depends on the statistical procedures employed. The classical Mahalanobis distance (MD) and its variants (e.g., multivariate kurtosis) are routinely used to identify these anomalies. These test statistics depend upon the estimates of population location and scale. The presence of anomalous observations usually results in distorted and unreliable maximum likelihood estimates (MLEs) and ordinary least squares (OLS) estimates of the population parameters. These in turn result in deflated and distorted classical MDs and lead to masking effects. This means that the results from statistical tests and inference based upon these classical estimates may be misleading. For example, in an environmental monitoring application, the classification procedure based upon the distorted estimates may classify a contaminated sample as coming from the clean population and a clean sample as coming from the contaminated part of the site. This in turn can lead to incorrect remediation decisions.

It is well established among practitioners that, for the identification of multiple outliers, one should use robust procedures with a high breakdown point. The estimates obtained using the robust procedures should be in close agreement with the corresponding MLEs when no discordant observations (from different population(s)) are present. Robust procedures for the
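As a rough illustration of the classical Mahalanobis distance discussed above, the following sketch computes squared MDs for bivariate data from the sample mean and sample covariance. It is not Scout's implementation; the function name and the restriction to two variables are assumptions made to keep the example self-contained. Because the mean and covariance are computed from all points, a cluster of outliers inflates the covariance and shrinks its own distances, which is the masking effect described in the text.

```python
def squared_mahalanobis_2d(points):
    """Classical squared Mahalanobis distances for (x, y) pairs.

    Hypothetical helper for illustration only: uses the sample mean and
    the sample covariance (divisor n - 1), i.e. the classical estimates.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d' S^{-1} d written out for a 2 x 2 covariance matrix S
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances
```

For the four corners of a square, `squared_mahalanobis_2d([(0, 0), (2, 0), (0, 2), (2, 2)])` gives the same distance for every point, as symmetry suggests.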
71. SCOUT USER'S GUIDE

NOTICE

Although the production of this report was funded wholly by the United States Environmental Protection Agency through contract 68-CO-0049 to Lockheed Environmental Systems & Technologies Company, it has not been subjected to Agency policy review, and no official endorsement should be inferred.

TABLE OF CONTENTS

Chapter 1 Preliminaries
1.1 Introduction
1.2 Manual Organization
1.3 Installing Scout
1.4 Viewing the User's Guide

Chapter 2 Scout File Format
2.1 File Management
2.2 Reading Spreadsheet Files
2.3 Load Scout File
2.4 Save Scout File
2.5 Merge Two Files
2.6 Append Two Files

Chapter 3 Managing Data in Scout
3.1 Data Management
3.2 Scout Functions and Operations
3.3 Summary Statistics
3.4 Data Transformation
3.5 Print Data

Chapter 4 Classical Methods for Outlier Identification
4.1 Introduction to the Classical Methods for Outlier Identification
4.2 Select Variables
4.3 The Classical Outlier Tests
4.4 Causal Variables
4.5 Associated Causes
4.6 Remove Outlier Flags

Chapter 5 Robust Statistical Methods
5.1 Introduction to Robust Statistical Methods
5.2 Choices of Robust Analyses
5.3 Univariate Statistics
5.4 Robust Analysis
5.5 Confusion Matrix
5.6 Pattern Recognition
5.7 D-Trend
5.8 Add Means
5.9 Causal Variables
5.10 Print Destination
72. Simultaneous, Individual
Zero Lower Limit: No, Yes

When satisfied with all heading choices, use the down <ARROW> key to move the cursor to the final selection, "Begin search for causal variables". Use the <ENTER> key to generate the table for Robust Causal Variables.

5.10 Print Destination

This heading will create graphics files with an ".eps" extension. The HP LaserJet III choice will print the screen graph to a LaserJet III printer. Typing "F" will write the graphics screen to a ".pcx" file.

When Print Destination is selected, the second window will display the message "Choose print destination for graphs". When the <ENTER> key is pressed, three choices are displayed in the third window as follows:

HP LaserJet III
QMS ColorScript 100
Encapsulated Post Script

Use Encapsulated Post Script to save the graph and data output files in a format that can be imported into word processing software such as WordPerfect. This option will create a graphics file with the extension ".EPS". The HP LaserJet III choice will print to the screen or to a LaserJet III printer. Pressing <F> writes the graphics to a ".PCX" file.

Chapter 6 PCA

6.1 Classical Principal Components Analysis

For simplicity and convenience, a separate principal component analysis (PCA) menu has been included in Scout to perform the classical PCA. The Q-Q plots, sca
73. [Figure 11.7. The Robust Analysis menu prior to the generation of Q-Q plots of PCA (file IRIS.DAT).]

The principal component Q-Q plot should be similar to Figure 11.8, with the possible exception of the eight labeled data points (which could be present by using the <SHIFT> technique on the highlighted points, as described earlier). From this graph, it is clear that the observations come from a single population (Setosa). The Q-Q plots for the other three principal components can be obtained by using the <PAGE UP> or <PAGE DOWN> keys. Users can press the <N> (or <n>) key, which will number all of the points on the graph. Pressing the <N> key again will cause all numbers to disappear. (Note: All keys used in generating graphics work similarly, toggling on and off with repeated use.)
74. For a 3-dimensional scatter plot, highlight "3-Dimensional" from the "Graphics" menu and press <ENTER> to display the three-dimensional scatter plot. At this point the variables included in the data set are listed in the upper left corner of the display. One of these variables will be highlighted; use the <UP> or <DOWN> arrow keys to highlight any variable to be considered in the scatter plot. After the variable is highlighted, use the keypad to designate that variable by pressing <X>, <Y>, or <Z>, and use the <ENTER> key to generate the three-dimensional graph. Press <ENTER> one more time to position the graph in the center of the screen, as shown in Figure 13.4.

[Figure 13.4. The three-dimensional graph of sp length (x axis) vs. sp width (y axis) vs. pt length (z axis).]

To view the data from different perspectives, the 3-dimensional scatter plots can be rotated by using the <RIGHT>, <LEFT>, <UP>, or <DOWN> arrow keys. Pressing the same arrow key repeatedly increases the speed of the rotation; to reduce the speed, use the opposite arrow key. The rotation can be stopped at any position (see Figure 13.5) by pressing the opposite arrow key the same number of times, or by pressing the <SPACE BAR>. Several other features are ass
75. [Figure 11.23. The "Robust Analysis" menu for Prediction Intervals (file 4-METHYL.DAT).]
76. [Figure 11.1. The menu for "Robust Method" with "Robust Analysis" selected, and the menu for "Robust Analysis" displayed.]

Select the first heading in the "Robust Analysis" menu, "Display Graphs For", and press <ENTER>. A menu entitled "Select Graph Type" will appear, as in Figure 11.2. Select "Q-Q Plot (simul. raw data)" and press <ENTER>. The menu will disappear, and the previous window will now indicate your graph choice opposite "Display Graphs For".

[Screen capture of the "Select Graph Type" menu.]
77. [Figure 11.21. The "Robust Analysis" menu settings for simultaneous control charts (file 4-METHYL.DAT).]

Figure 11.22. Simultaneous control chart f
78. also based on similar statements with $\chi^2_{0.5,p}$ as the choice for the critical value $Md^2_{\alpha}$. This statement provides coverage to at least 50% of the observations. Small-sample correction factors are typically used to provide adequate coverage and consistency for samples from normal populations (Rousseeuw and van Zomeren, 1990).

Let $x_1, x_2, \ldots, x_n$ be a random sample from a p-variate population with elliptically contoured density function $f(x) = |\Sigma|^{-1/2}\, h\big((x-\mu)'\Sigma^{-1}(x-\mu)\big)$. The Mahalanobis distances are given by $Md_i^2 = (x_i - \mu^*)'\,\Sigma^{*-1}(x_i - \mu^*)$; $i = 1, 2, \ldots, n$, where $\mu^*$ and $\Sigma^*$ are the M-estimators of location, $\mu$, and scale, $\Sigma$, and are obtained by solving the following system of equations iteratively:

$\mu^* = \sum_i w_1(Md_i)\, x_i \Big/ \sum_i w_1(Md_i)$   (1)

$\Sigma^* = \sum_i w_2(Md_i)\,(x_i - \mu^*)(x_i - \mu^*)' \Big/ \Big(\sum_i w_2(Md_i) - 1\Big)$   (2)

The weight functions used in (1) and (2) above are based on the PROP or the HUBER influence functions, and are given by $w_1(Md_i) = \psi(Md_i)/Md_i$ and $w_2(Md_i) = w_1^2(Md_i)$, where $\psi(Md_i)$ represents the influence function used.

The PROP influence function used here is given as follows:

$\psi(Md_i) = Md_i$ for $Md_i \le Md_0$, and $\psi(Md_i) = Md_0 \exp[-(Md_i - Md_0)]$ for $Md_i > Md_0$   (3)

where $Md_0^2$ is the critical value obtained from the $(n-1)^2\,\beta\big(p/2, (n-p-1)/2\big)/n$ distribution of the distances $Md_i^2$. Notice that no tuning constant (except an $\alpha$ value, representing the area
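The weight functions described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Scout's code: the function names are hypothetical, the PROP form assumed here is psi(d) = d for d <= d0 and d0 * exp(-(d - d0)) beyond it, and the Huber form is the standard psi(d) = min(d, d0). The cutoff d0 is taken as given; how it is derived from the beta distribution is not shown.

```python
import math


def huber_psi(d, d0):
    """Standard Huber influence function: linear up to d0, then constant."""
    return d if d <= d0 else d0


def prop_psi(d, d0):
    """Assumed PROP influence function: linear up to d0, then decaying."""
    return d if d <= d0 else d0 * math.exp(-(d - d0))


def weights(d, d0, psi):
    """Return (w1, w2) with w1 = psi(d) / d and w2 = w1 ** 2."""
    if d == 0:
        # An observation at the center receives full weight.
        return 1.0, 1.0
    w1 = psi(d, d0) / d
    return w1, w1 ** 2
```

With a cutoff of 2, an observation at distance 1 keeps full weight under either function, while one at distance 4 is downweighted far more sharply by the PROP form than by the Huber form, which is why the PROP function can unmask multiple outliers that the Huber function leaves influential.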
79. [Figure 7.1. The Graphics menu with the explanation window for the Graph Parameters heading displayed.]

7.2 Modify Graph Colors and Shapes

The first heading in the graphics pull-down menu, "Graph Parameters", allows the user to modify the color and shape of individual observations (or points) that will be displayed on the graphs. There are six colors and six shapes to choose from, yielding 36 possible combinations (assuming the user has a color monitor). However, choosing black as the color of an observation has a special meaning: black observations will not be seen on the graphs, nor will they be used in the scaling of the graphs. The default col
80. aph is first displayed, the three axes are scaled independently from zero to the maximum value of each variable. The user can force equal scaling of all axes by pressing the <E> key. The <E> key functions as a toggle, turning equal scaling on and off. The user can also have the graph rescaled after removing an unwanted point. This feature is explained below in the section "Search Observation Mode".

Rotating 3D Graphs: The four arrow keys are used to rotate the graph. The left and right arrows rotate the graph around the Z axis, the blue axis that is always vertical on the screen. The up and down arrows rotate the graph around an imaginary horizontal axis that passes through the origin. The same arrow key can be pressed repeatedly to speed up the rotation in that direction. The opposite arrow key can then be pressed repeatedly to slow down the rotation, eventually stop it completely, and then begin rotating in the opposite direction.

Changing from Symbols to Pixels: This feature enables the user to inspect a 3-dimensional graph with either symbols or pixels. The pixel and the symbol for an observation will have the same color. Two advantages of displaying pixels instead of symbols on 3-D graphs are (1) an increase in the speed of rotation in large data sets and (2) improved resolution of individual observations. Disadvantages are (1) the points on the graph may be more difficult to see, since a pixel is much smaller than a symbol, and
81. approaches are given by the following probability statements, where $x^*$ and $s^*$ represent the estimates (classical or robust) of $\mu$ and $\sigma$, respectively.

(a) $(1-\alpha)100\%$ confidence interval for the population mean, $\mu$:

$P\big(x^* - t_{\nu,\alpha/2}\, s^*/\sqrt{wsum2} < \mu < x^* + t_{\nu,\alpha/2}\, s^*/\sqrt{wsum2}\big) = 1 - \alpha$   (11)

where $t_{\nu,\alpha/2}$ represents the critical value from the Student's t distribution.

(b) $(1-\alpha)100\%$ simultaneous confidence interval for all $x_i$; $i = 1, 2, \ldots, n$. The test statistic $\max(d_i^2)$ is routinely used to identify a single outlier. Let $d^2_{b\alpha}$ represent the $\alpha 100\%$ critical value for the distribution of $\max(d_i^2)$, which can be obtained using the Bonferroni inequality. The simultaneous confidence interval is given by $P\big(\max(d_i^2) < d^2_{b\alpha}\big) = 1 - \alpha$, which is equivalent to the following probability statement:

$P\big(x^* - s^* d_{b\alpha} < x_i < x^* + s^* d_{b\alpha};\ i = 1, 2, \ldots, n\big) = 1 - \alpha$   (12)

This interval is equipped with a built-in outlier detection procedure. An observation outside of this interval is an obvious outlier and may require further investigation.

(c) $(1-\alpha)100\%$ confidence limits for the individual observations, $x_i$, from a population with unknown mean and sd are given by the following statement:

$P\big(x^* - s^* d_{\alpha} \le x_i \le x^* + s^* d_{\alpha}\big) = 1 - \alpha;\ i = 1, 2, \ldots, n$   (13)

where $d^2_{\alpha}$ is the $\alpha 100\%$ critical value of the distribution of the robustified distances $d_i^2$. Singh et al. (1994) used this interval to resolve a mixture sample into its component populations. The Student
82. ated. The user can view the scatter plot of the currently displayed variables by pressing the <ENTER> key. When viewing a scatter plot, the user can scroll through the observations that make up the graph. Again, the purple box will highlight the location of the current observation being displayed. The axes are scaled independently from the minimum value to the maximum value of the variable. The user can force equal scaling of both axes by pressing the <E> key. The <E> key functions as a toggle, turning equal scaling on and off. The <ENTER> key returns the user to the correlation matrix, and the <ESC> key exits the graphics mode, returning the user to the menu screen.

7.5 Zoom Feature

This option enables the user to inspect portions of a 2-dimensional scatterplot in more detail. This is especially useful when many data occur over a relatively small range, making resolution of individual observations difficult.

To use the zoom feature on a 2-dimensional scatterplot, press the <Z> key. A white rectangle encompassing all of the observations will appear. Use the - (minus) key to decrease, or the + key to increase, the area of the rectangle. Use the <ARROW> keys to move the rectangle to the portion of the scatterplot that you wish to enlarge.

When you have surrounded the observations of interest with the white rectangle, press the <ENTER> key. Scout will automatically rescale the
83. ates are summarized below.

[Figure 1. The Classical Q-Q Plot of the Mds (Generalized Distances vs. Quantiles, Beta Distribution).]
[Figure 2. The Robust (PROP) Q-Q Plot of the Mds (Generalized Distances vs. Quantiles, Beta Distribution).]
[Figure 3. Scatter Plot of the Robust PCs (Principal Component #1 vs. Principal Component #2).]
[Figure 4. Scatter Plot of the Robust PCs (Principal Component #4 vs. Principal Component #5).]
[Figure 5. The Plot of Classical Mds Without the 8 Outliers (Generalized Distances vs. Quantiles, Beta Distribution).]
[Figure 6. Plot of Robust Mds Without the 8 Outliers (Generalized Distances vs. Quantiles, Beta Distribution).]

Also, Figs. 5 and 6 are the classical and the PROP (α = 0.05) Q-Q plots of the Mds with location and
84. ation. Scout will then display the selected matrix on the screen. If the entire matrix does not fit on the screen, the user can press the arrow keys to scroll through the matrix. Press <ESC> to return to the PCA menu after viewing the matrix.

6.3 Eigenvalues

This heading allows the user to view the eigenvalues. Scout will ask the user whether to calculate the eigenvalues using the covariance or correlation matrix. After making this choice and pressing the <ENTER> key, the eigenvalues are displayed along with their differences, proportions, and cumulative proportions. If there are more eigenvalues than will fit in the window, use the <UP ARROW> and <DOWN ARROW> keys to scroll through them. Press the <P> key to send this information either to the printer or to a file. Press the <ESC> key to close the window and return to the menu.

6.4 View Components

This heading displays a listing of the component loadings. Scout will offer the user the choice of performing PCA with either the covariance or correlation matrix. After making this choice and pressing the <ENTER> key, use the <UP ARROW> and <DOWN ARROW> keys to scroll through the information. Use the <P> key to send the information to the printer or to a file. Press the <ESC> key to close the window and return to the menu.

6.5 Transform Data

The component scores are the product of the eigenvectors and the sta
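The eigenvalue listing described under the Eigenvalues heading (eigenvalues with their differences, proportions, and cumulative proportions) can be reproduced from a list of eigenvalues as sketched below. The function name and the dictionary layout are hypothetical, and "difference" is assumed to mean the gap between an eigenvalue and the next smaller one; the guide does not spell out Scout's exact definition.

```python
def eigenvalue_summary(eigenvalues):
    """Tabulate eigenvalues with differences and (cumulative) proportions.

    Illustrative helper: eigenvalues are sorted in decreasing order, and
    the last row has no successor, so its difference is None.
    """
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    rows = []
    running = 0.0
    for k, value in enumerate(values):
        difference = value - values[k + 1] if k + 1 < len(values) else None
        running += value
        rows.append({
            "eigenvalue": value,
            "difference": difference,
            "proportion": value / total,    # share of total variance
            "cumulative": running / total,  # share explained so far
        })
    return rows
```

The cumulative column is what is usually read to decide how many components to retain: the first row whose cumulative proportion exceeds a chosen threshold marks a candidate cutoff.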
85. ations such as page orientation, scale, position, and port. When this feature is selected, a screen will appear with the following headings:

Choose Printer
Page Orientation
Use Shading Patterns
Horizontal Scaling Percentage
Vertical Scaling Percentage
X Starting Location
Y Starting Location
Formfeed After Print
Specify Printer Port

Choose Printer: To select a printer, highlight "Choose a printer" from the screen that appears, as described above. Press <ENTER> and a screen will appear, alphabetically listing various types of printers. Find the printer you wish to use by using the <ARROW>, <PAGE UP>, <PAGE DOWN>, <HOME>, or <END> keys. Press the <ENTER> key when your printer is highlighted.

Page Orientation: The user has a choice of "Landscape" or "Portrait" mode for printing graphs. "Landscape" is the default, and is usually the better choice for most graphs. To change your selection, highlight "Page orientation" as described above. Press <ENTER> to change from "Landscape" to "Portrait". Press <ENTER> again to change back to "Landscape".

Use Shading Patterns: This option allows the user to replace the color in the graphs with shading patterns. The choices are "Yes" and the default, "No". Select "Use Shading Patterns" as described above. Press <ENTER> to change the use of shading patterns to "Yes".

Horizontal and Vertical Scaling Percentage: These headings enable the user to adjust the horizontal w
86. atistics Using" to Classical, "Initial Estimate" to Classical, and setting the "Right Cutoff Tail (α)" to any number between 0.001 and 0.8.

[Figure 10.2. The results of Multivariate Kurtosis (with α = 0.1) on the IRIS.DAT file. One outlier was detected and is identified here.]

The "Select Variables" heading is a common option for three of the level 1 menu headings (Classical Method, Robust Method, and PCA). In each instance, the "Select Variables" option functions in the same way: through the use of plus (+) and minus (-) signs, users can indicate which variables they want included in, and which variables left out of, the analysis. In the above example we didn't use "Select Variables"; with this particular file, by default, all variables except Count are selected, resulting in the "4 out of 5 variables used" statement in Figure 10.2.

The headings for Genera
87. ax (Mds) is described in the following Section.

14.6 Q-Q Plot of Mahalanobis Distances Using Beta Distribution

Compute the Mds, $Md_i^2 = (x_i - \mu^*)'\,\Sigma^{*-1}(x_i - \mu^*)$, for $i = 1, 2, \ldots, n$, where $\mu^*$ and $\Sigma^*$ are M-estimates (classical or robust) obtained appropriately using one of the four procedures available in Scout.

Order the distances $Md_i^2$: $Md^2_{(1)} \le Md^2_{(2)} \le \cdots \le Md^2_{(n)}$.

Compute the expected quantiles, $b_{(i)}$, using the beta (or a chi-square) distribution. For example, the beta quantiles are given by the following equation:

$\dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \displaystyle\int_0^{b_{(i)}} x^{a-1}(1-x)^{b-1}\,dx = \dfrac{i - \alpha}{n - \alpha - \beta + 1}$   (6)

where $\alpha = (a-1)/(2a)$, $\beta = (b-1)/(2b)$, $a = p/2$ and $b = (n-p-1)/2$. Compute the theoretical quantiles, $c_{(i)}$, from the distribution of the Mds using $c_{(i)} = (n-1)^2\, b_{(i)}/n$.

Finally, plot the pairs $(c_{(i)}, Md^2_{(i)})$; $i = 1, 2, \ldots, n$.

A Q-Q plot using the chi-square approximation can be obtained similarly. For multinormal data, this plot resembles a straight line. A formal test statistic, $R_{p,n}$, and its critical values to assess multinormality are given by Singh (1993). On this graphical display of multivariate data, points well separated from the main point cloud represent potential outliers.

Formal Graphical Identification of Outliers

Construct the Q-Q plot of the robustified Mds as described above. If assessment of multi
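The plotting-position step of the Q-Q procedure above can be sketched as follows. The code computes only the right-hand side of equation (6), the probabilities (i - alpha) / (n - alpha - beta + 1), and the final scaling of a beta quantile to the distance scale; inverting the beta distribution to obtain each quantile is left to a statistics library and is not shown. Function names are hypothetical and this is not Scout's implementation.

```python
def beta_plotting_positions(n, p):
    """Probabilities (i - alpha) / (n - alpha - beta + 1), i = 1..n.

    Uses a = p / 2 and b = (n - p - 1) / 2, with alpha = (a - 1) / (2a)
    and beta = (b - 1) / (2b), following the definitions in the text.
    """
    a = p / 2.0
    b = (n - p - 1) / 2.0
    if a <= 0 or b <= 0:
        raise ValueError("need p >= 1 and n > p + 1")
    alpha = (a - 1.0) / (2.0 * a)
    beta = (b - 1.0) / (2.0 * b)
    denominator = n - alpha - beta + 1.0
    return [(i - alpha) / denominator for i in range(1, n + 1)]


def scale_beta_quantile(b_i, n):
    """Map a beta quantile b_(i) to the distance scale: (n - 1)^2 b / n."""
    return (n - 1) ** 2 * b_i / n
```

For a data set of 50 observations on 4 variables, `beta_plotting_positions(50, 4)` returns 50 increasing probabilities strictly between 0 and 1; each would then be passed to an inverse beta routine with parameters a = 2 and b = 22.5 before scaling.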
88. ble options in the "Display Graphs For" menu.

The values for the X-Axis and Y-Axis Variables can also be typed in manually after pressing the <ENTER> key when the cursor is on the X-Axis Variable or the Y-Axis Variable heading, as appropriate. In the same manner, the titles can be typed in after pressing the <ENTER> key when the cursor is at the title heading.

Use the down <ARROW> key to move the cursor to the last entry, "Generate Graph With Current Options". Use the <ENTER> key to generate the graph. The weights and the generalized distances can be viewed by moving the cursor to "View Weights and Generalized Distances" and pressing the <ENTER> key.

5.5 Confusion Matrix

This option performs linear and quadratic discriminant analysis, and expects the data to be multivariate in nature. The first column of the data set should have the population ID (a number between 1 and 20), and the number of variables should be at least two (2). Graphs cannot be produced with this option.

When the Confusion (or error) Matrix heading is selected, the second window will display the message "Robust supervised pattern recognition classification". Press the <ENTER> key to display the third window to set various options. The available headings for this choice are as follows:

Heading: Example Choices
Discriminant Method: Linear
Statistics Options
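To make the term concrete, a confusion (error) matrix of the kind this option reports can be tabulated from known population IDs and the IDs assigned by a classifier, as in the sketch below. The discriminant analysis itself is not shown; the function names and the row/column convention (rows are actual populations, columns are assigned populations) are assumptions for illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Count observations by (actual population, assigned population)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Fraction of observations that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return (total - correct) / total
```

With population IDs 1 to 3, `confusion_matrix([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 1], [1, 2, 3])` places the two misclassified observations off the diagonal, and `error_rate` reports the share of such observations.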
89. bol at the end of the name. If the user is not in the root directory, then the first item in the menu will always be the entry indicating the parent directory. Choosing this item allows the user to change to the parent directory of the current directory.

If the desired directory is not found on the current disk drive, then the user may select a new disk drive to search. To change drives, simply press the letter of the new drive. If the letter pressed is a valid drive from "A" to "N", then that drive will become the current drive.

When the user has found the desired drive and directory, a data file can then be chosen. Use the arrow keys to highlight the desired data file, and then press <ENTER> to select it. Sometimes there are too many file names to fit in the window. If the desired data file is not displayed, scroll through the file names by pressing and holding the down arrow key.

Scout can search for any file name, including names specified with wildcards. The current search string is printed at the top of the window. This string can be changed by pressing "S" and then entering a new string. It is important to remember that data files saved using the "Save Scout File" option have the .SCT extension assigned by Scout automatically, while ASCII data files may have any extension.

Chapter 2 Scout File Format

2.4 Save Scout File

This option saves a Scout file in binary format which is inten
90. btained using the procedure described earlier.

Q-Q probability plots of the principal components are sometimes used to reveal suspect observations and to check the normality assumption. Scatter plots of the first few high-variance PCs reveal outliers which may inappropriately inflate variances and covariances. Plots of the last few low-variance PCs typically identify observations that violate the correlation structure imposed by the main stream of data but that are not necessarily discordant with respect to any of the individual variables. An example is discussed next to illustrate these procedures.

Example: The data set of size 82 with five variables, including the octane readings (y) of gasoline and four explanatory variables, was first considered by Daniel and Wood (1980). Atkinson (1994) used forward searches and stalactite plots to identify multiple outliers in this data set, an approach that can be quite overwhelming for the typical user. Figure 1 is the Q-Q plot of the Mds obtained using the MLEs. Figure 2 is the corresponding graph obtained using the PROP function (α = 0.05). This graph correctly identified the 8 outliers in a single execution. From this graph it is also clear that observations 66 and 82 represent the borderline cases. This is illustrated by the scatter plots of some of the robust PCs, as given in Figures 3 and 4, respectively. For confirmation, the outlying observations 44 and 71-77 were deleted, and the recomputed estim
91. [Figure 11.29: Generalized distance Q-Q plot using Huber influence. Axis label: Quantiles (Beta Distribution).]

[Figure 11.30: Generalized distance Q-Q plot using Prop influence.]

Chapter 11 Tutorial III

11.7 Kurtosis

To calculate the kurtosis, we will again use the IRIS.DAT data set. Still in "Robust Method" and "Robust Analysis", enter the "Select Graph Type" menu, select "Multivariate Kurtosis", and press <ENTER>. Press <END> (or move to the bottom of the menu if you don't have an <END> key), and then press <ENTER> again. In this instance, "Generate Graph With Current Options" will initiate the calculation of kurtosis. When complete, the output should match Figure 11.31.

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Robust Method menu open, listing Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, and Pattern Recognition.]
92. [Screen capture: status line showing the current directory and the file name IRIS.DAT.]

Figure 11.6: Statistical options for the generation of Q-Q plots of principal component analysis (PCA).

Still in "Robust Analysis", move to "Display Graphs For", select "Q-Q Plot (PCA)", and press <ENTER>. Check that the remaining options in the "Robust Analysis" window match those in Figure 11.7. If necessary, use the same techniques as those explained in the last paragraph to make them match, finishing this time with "Generate Graph With Current Options".

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Robust Method menu open and the Robust Analysis options window displayed, including "Display Graphs For" and "Statistics Options" set to Classical.]
93. ce of "Observation" or "Variable". Select "Variable" either by highlighting "Variable" with an <ARROW> key and then pressing <ENTER>, or by pressing the "V" key. Scout will automatically insert a column and name the variable "Variable n", where n is the number of the new variable. Each observation of this inserted variable is automatically assigned the value 1E31. To enter the desired name, units, and other information about the inserted variable, see Editing Attributes of Variables. If the values of the inserted variable can be calculated with a formula involving any of the other variables, see Formulas. Otherwise, the desired values must be entered by hand. Move about the screen with the <ARROW> keys until you find the observation you wish to change, type the correct value, and press <ENTER>. Repeat this procedure until each observation has the proper value.

Formulas: It is often useful to analyze variables that are functions of one or more variables in the data set. Consider, for example, a Scout data set in which there are 4 variables, V1 through V4. It may be of interest to analyze a fifth variable, V5, that is a function of the others (for example, an expression involving V3, Log(V1), and V2). Scout enables the user to overwrite the values of a variable with values calculated by a formula involving one or more of the remaining variables in the data set. This is especially useful if the variable that you wish to overwrite is one that has just been inserted (S
94. cores
(5) identify univariate or multivariate outliers (Q-Q plots of generalized distances)
(6) perform principal component, linear, and quadratic discriminant analyses
(7) compute and plot various statistical intervals, including the confidence interval for the mean, the prediction interval, and the simultaneous confidence interval

Scout reads ASCII data files in a specific format which is discussed later in this manual. Files created in other software, such as WordPerfect, are not recognized by Scout unless they are in strict ASCII format. Scout can handle up to 22 variables, with the number of observations limited only by the available memory of the microcomputer. Scout can save data in a binary format. In this way, Scout can retain graph symbols and colors, and outlier information, in addition to the 22 variables. Spreadsheet data files can easily be converted into Scout data files, as discussed in section 2.2.

Scout allows the user to view and edit a data set. Editing is limited to the existing variables and observations. Variable fields that can be edited are name, units, format, and the comment. Observation fields that can be edited are the label and the values of the variables.

Scout is compatible with 8086-, 80286-, 80386-, and 80486-based microcomputers with at least 512K of RAM and an EGA, VGA, or Hercules graphics system. A fixed disk drive is highly recommended, as Scout performs many transfers between memory and disk during execution. Scout also uses expande
95. cout will prompt the user to enter a file name. The user may specify an extension here, and it will be used. If the file name already exists, Scout will ask the user whether the old file should be overwritten.

Chapter 2 Scout File Format

The File heading in Scout contains six choices, as displayed in Figure 2.1 below. These can be used to read, write, load, save, merge, and append various data sets.

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the File menu open, listing Read ASCII File, Write ASCII File, Load Scout File, Save Scout File, Merge Two Files, and Append Two Files.]
96. d memory, if found on the system, in two ways. First, the slow transfers between memory and disk mentioned earlier will be replaced by very fast transfers between memory and expanded memory (needs 128K). Second, Scout will use up to 64K of expanded memory for additional data storage. A color monitor will greatly enhance Scout's text windows and graphics. A 20 MHz 80386 with a math coprocessor and a fixed disk is the minimum system recommended for Scout operation. By selecting the "System" heading in the main menu and then selecting "Information", a user can display the system specification.

Scout was written by combining several subroutines and programs written for various research projects conducted by Lockheed Environmental Systems & Technologies Company in service of the United States Environmental Protection Agency (EPA). Thus, Scout is in the public domain, is not copyrighted, and no license agreement is necessary. However, users should be cautious about the source of their copy of Scout. Because of the risk of computer viruses, it is best to obtain Scout directly from Lockheed or the EPA.

1.2 Manual Organization

The user's manual for Scout is organized into three sections. Section I (chapters 1 to 8) is the User's Guide, section II (chapters 9 to 13) includes tutorials, and section III (chapter 14) provides technical notes (with examples) for statistically oriented users.

Users not familiar with Scout will
97. d the use of confirmatory analysis. This is the reason that (1) the MVE-based procedures are used only for the identification of outliers (since the MVE robust estimates differ significantly from the corresponding classical estimates after the removal of outliers), (2) the use of a small-sample correction factor is recommended, and (3) it has been suggested not to use the approximate chi-square values too rigorously to define large distances.

14.5 Normal Probability Q-Q Plots of the Original Data and of Principal Components

In the following, data denoted by $y_1, y_2, \ldots, y_n$ represent raw or standardized values of a variable in the data set, or scores on one of the principal components. The normal probability plot for these data can be obtained as follows.

Arrange the data (or PC scores) in ascending order of magnitude: $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$.

Compute the normal quantiles, $q_{(k)}$, using the following statement:

$$\frac{k - 3/8}{n + 1/4} = P(Z \le q_{(k)}) = \int_{-\infty}^{q_{(k)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz, \qquad k = 1, 2, \ldots, n. \qquad (5)$$

Plot the pairs $(q_{(k)}, y_{(k)})$, $k = 1, 2, \ldots, n$.

If the data are from a normal population, then these pairs will be approximately linearly related. Systematic departures from linearity and curved patterns suggest departures from normality. Outlying observations are well separated from the majority of the data.

The Q-Q plot of Mahalanobis distances (Mds) and an outlier test based on the M
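The three steps of the normal probability plot can be sketched as follows. This is a minimal illustration, not Scout's own code: it assumes the plotting position (k - 3/8)/(n + 1/4) read from the quantile statement above, and the function name `normal_qq_pairs` and the sample values are made up for the example.

```python
from statistics import NormalDist

def normal_qq_pairs(values):
    """Return (normal quantile, ordered value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    std_normal = NormalDist()         # standard normal: mean 0, sd 1
    pairs = []
    for k, y in enumerate(ordered, start=1):
        # Step 2: quantile q with P(Z <= q) = (k - 3/8) / (n + 1/4).
        prob = (k - 0.375) / (n + 0.25)
        pairs.append((std_normal.inv_cdf(prob), y))
    return pairs                      # step 3: plot these pairs

pairs = normal_qq_pairs([4.1, 5.0, 3.2, 6.3, 4.8])
for q, y in pairs:
    print(f"{q:8.4f}  {y:5.1f}")
```

For roughly normal data the printed pairs fall close to a straight line; a value that sits well away from that line at either end is the kind of observation the text describes as outlying.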
98. data. These discordant observations, or outliers, are highly unusual when compared to the rest of the data. For a more thorough description of outliers and their significance, see the introduction to Chapter 14.

The Classical Method menu has two tests for discordancy: Mardia's multivariate kurtosis and the (Mahalanobis) generalized distance. Mardia's multivariate kurtosis is also a useful test for assessing multinormality, and is recommended when the number of outliers is unknown but potentially substantial. The generalized distance is strictly an outlier test and is recommended when the number of potential outliers is known to be very small. Both tests assume the data represent a random sample from a univariate or multivariate normal population. Both tests are included in the menu shown in Figure 4.1 below.

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Classical Method menu open, listing Select Variables, Generalized Distance, Multivariate Kurtosis, Causal Variables, Associated Causes, and Remove Outlier Flags.]
99. ded to be used only by Scout. Generally, other software cannot read this format. The format has the advantage of retaining the graphics color and shape specified for each observation, as well as the outlier status of each observation. To save data in this format, select "Save Scout File" from the pull-down menu and enter a file name. Do not include an extension with the file name, as Scout always uses the .SCT extension. Also, do not precede the file name with a path. New data files are always written to the current drive and directory displayed at the bottom of the screen.

2.5 Merge Two Files

This utility allows the user to combine two data files into a new data file. The user first selects whether to merge two ASCII files or two Scout files. If the merge is successful, the new data file will always be written as an ASCII file.

The merge routine assumes the variables are different in each of the input files. Therefore, the output file will contain all of the variables from both input files, even if they have the same names. The routine does, however, account for common observations: two observations, one from each input file, that have the same label or name will be merged into a single observation in the output file.

2.6 Append Two Files

This utility also allows the user to combine two data files into a new data file, but in a different way than merge. The user is given the option to append two ASCII files or two
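The merge rules of section 2.5 (variables are never matched by name; observations are matched by label) can be illustrated with a short sketch. It is not Scout's implementation: the function name `merge_files`, the in-memory data layout, and the sample values are hypothetical, and the guide does not say what happens to an observation found in only one file, so filling its missing half with 1E31 is purely an assumption.

```python
MISSING = 1e31  # assumption: filler for values absent from one input file

def merge_files(vars_a, obs_a, vars_b, obs_b):
    """Merge two data sets; obs_* map an observation label to its values."""
    # Variables are never matched by name: the output keeps all of them.
    merged_vars = list(vars_a) + list(vars_b)
    merged_obs = {}
    # Labels in first-seen order: all of file A, then labels new in file B.
    labels = list(obs_a) + [label for label in obs_b if label not in obs_a]
    for label in labels:
        row_a = obs_a.get(label, [MISSING] * len(vars_a))
        row_b = obs_b.get(label, [MISSING] * len(vars_b))
        # Observations sharing a label collapse into one output observation.
        merged_obs[label] = list(row_a) + list(row_b)
    return merged_vars, merged_obs

variables, observations = merge_files(
    ["pH", "Zn"], {"S1": [7.1, 12.0], "S2": [6.8, 15.5]},
    ["Zn", "Pb"], {"S2": [15.0, 3.2], "S3": [9.9, 1.1]},
)
print(variables)
for label, row in observations.items():
    print(label, row)
```

Note that the variable "Zn" appears twice in the merged variable list, which mirrors the guide's statement that same-named variables are both kept.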
100. default scale value is 10. Larger values shrink the graph.

[Figure 11.14: Rescaled graph (Pattern Recognition plot; axis label: PC Score 2).]

Change the values in the "Pattern Recognition" menu to those shown in Figure 11.15, move to "Begin Computations with Current Options", and then press <ENTER>.

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Robust Method menu open and the Pattern Recognition options window displayed, including Statistics Options (Classical), Numbering (Populations), Contour Ellipse (Individual), and Type of Graph.]
101. dentified as discordant, the user should be aware that the problem may arise from a lack of multinormality or from the presence of multiple populations. Each observation identified as discordant is flagged as such, and the graphics elements for those points are set to downward-pointing red triangles. The discordant observations can then be viewed in the graphics module. Scout does not remove the discordant observations unless the user chooses to do so.

During outlier testing, a new data set is generated. The user must decide how Scout should handle the outliers when writing the new ASCII file. Four options are available: "Remove", "Keep", "Flag", and "Query". The "Remove" option deletes all of the outliers from the generated file. The "Keep" option saves all outliers. The "Flag" option numerically flags the outliers in the new file by adding a new variable called "OUTLIERS" to the end of the variable list. The value of this new variable for each observation is either 0 or 1, where 1 indicates that the observation is an outlier. The "Query" option allows the user to specify individually which outlier observations will be written to the new file. These features are available only in the Classical Method menu.

CAUTION: Scout only identifies outliers for the variables selected. When viewing 2-D or 3-D scatter plots which flag outliers, make
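The effect of the four options on the generated file can be summarized in a small sketch. It only mirrors the behaviour described above; the function name `write_outliers`, the list-of-rows layout, and the `keep_query` callback (standing in for Scout's interactive per-outlier prompt) are assumptions made for the illustration.

```python
def write_outliers(variables, rows, is_outlier, option, keep_query=None):
    """Return (variables, rows) for the new file under one of the four options."""
    if option == "Keep":
        # All observations, outliers included, are written unchanged.
        return list(variables), [list(r) for r in rows]
    if option == "Remove":
        # Outliers are dropped from the generated file.
        kept = [list(r) for r, out in zip(rows, is_outlier) if not out]
        return list(variables), kept
    if option == "Flag":
        # Append an OUTLIERS variable: 1 marks an outlier, 0 an inlier.
        flagged = [list(r) + [1 if out else 0] for r, out in zip(rows, is_outlier)]
        return list(variables) + ["OUTLIERS"], flagged
    if option == "Query":
        # keep_query(index) is a hypothetical stand-in for the user's answer.
        kept = [list(r) for i, (r, out) in enumerate(zip(rows, is_outlier))
                if not out or keep_query(i)]
        return list(variables), kept
    raise ValueError(f"unknown option: {option}")

rows = [[1.0, 2.0], [9.5, 0.1], [1.2, 2.1]]
flags = [False, True, False]
print(write_outliers(["V1", "V2"], rows, flags, "Flag"))
```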
102. dings and corresponding choices in the third window, "Robust Analysis", are as follows:

Heading              Choices
Zero Lower Limit     Yes/No
Limit Style          Upper Limit/Lower Limit/Two Sided
X Axis Variable      A positive integer between 1 and 22
Y Axis Variable      A positive integer between 1 and 22
Title                Title of the Graph
X Axis Title         Title of the X Axis
Numbering            Observations/Populations
Contour Ellipse      Individual, Simultaneous, Indiv & Simul, Indiv Class, Simul Class
Erase Output File    See text

The Erase Output File feature may be important if a given file is used repeatedly. Each time output is generated for a given file, it is appended to a file with the same name but a different extension (.URS). This means that the current output is appended to any output previously generated from earlier work with this file. The user has the option to erase this file before the current session's output is recorded, so that the output file reflects only the current session.

The values for the X Axis and Y Axis Variables are chosen by Scout automatically from among the selected variables. While in graphics mode, the user can also use the Page Up and Page Down keys to change the X labels, and Ctrl-Page Up and Ctrl-Page Down to change the Y labels. New graphs appear after each selection. The <F1> key can be used to see all availa
103. directory on drive C.

To run Scout, enter the following commands:

1. Type "CD \SCOUT" and press <ENTER>. This changes the current directory to the SCOUT directory.
2. Type "SCOUT" and press <ENTER>. This starts the Scout program.

If you have any problems with the operation of Scout, please write to:

Scout
c/o John Nocerino or George Flatman
Characterization and Research Division
National Exposure Research Laboratory
USEPA
P.O. Box 93478
Las Vegas, NV 89193-3478

Chapter 1 Preliminaries

1.4 Viewing the User's Guide

Scout contains an on-line User's Guide. From any mode of Scout, users can reach the on-line User's Guide for that mode by pressing the <F1> key. When a section of text is displayed in the large window covering the lower portion of the screen, users can move through the text using the following key commands:

HOME          Moves to the beginning of the text
END           Moves to the end of the text
UP ARROW      Scrolls the text up toward the beginning
DOWN ARROW    Scrolls the text down toward the end
PAGE UP       Scrolls the text up toward the beginning by a page
PAGE DOWN     Scrolls the text down toward the end by a page
ESC/ENTER     Closes the viewing window

Chapter 2 Scout File Format

2.1 File Management

Scout reads ASCII data files in the following format. The first line of the data file is a comment line, presumably to describe the origin or title
104. e <ENTER> key to highlight the corresponding heading (this applies to "Right Tail Cutoff" and "Trimming Percent" in the previous menu). The other choices can be set by pressing the <ENTER> key repeatedly. After all selections are made, move the cursor to the bottom of the third window, to "Generate Statistics Using Current Options". Press the <ENTER> key to generate the univariate statistics corresponding to the selected choices. The result of the univariate statistical analysis will then be displayed on the screen. These statistics are also stored in an output file of the same name with the extension .URS. For example, statistics for IRIS.DAT will be stored in IRIS.URS. The statistics are appended to this file: if any information from an earlier Scout session is still in the file, the current statistics are added after it.

Chapter 5 Robust Statistical Methods

5.4 Robust Analysis

When Robust Analysis is selected, the explanation window will display the message "This routine provides exploratory as well as confirmatory procedures for the assessment of multinormality and detection of multivariate outliers". When <ENTER> is pressed while Robust Analysis is highlighted, a third menu appears listing various options. The available headings and choices of this menu, and the default choices, are as follows:

Headings    Default Choices
105. e mixture of several populations with varying degrees of contamination. In this situation, the objective will be to decompose the mixture sample into the component populations. Experimentalists, especially environmental scientists dealing with large amounts of data, often need to identify experimental results that are significantly different from the rest of the data. In data sets of large dimensionality, it becomes tedious to identify these anomalies. Appropriate multivariate procedures, some of which are incorporated in Scout, need to be used to identify multiple multivariate anomalies. The successful identification of outliers depends on the statistical procedures employed. Most of the outlier identification procedures are based on the Mahalanobis distances (Mds). The maximum distance (Max Mds) is a well-documented test statistic (e.g., see Wilks (1963), Devlin et al. (1981)) for the identification of a single outlier. Observations with Mds greater than the α100% critical value of the Max (Mds) are considered potential outliers. Singh (1993), using the first-order Bonferroni inequality and the incomplete beta distribution, computed the critical values of Max (Mds) for any combination of n and p, and showed that these values are in close agreement with the available simulated values as given in Jennings and Young (1988) and Stapanian et al. (1991). Computation of the critical values of the 
106. e variable fails these tests, you may then try various transformations on the selected variables. Each time a transformation is tried, the resulting variable is retested for normality. You may select one or more transformations for each variable by selecting a suitable function, as displayed in Figure 3.2. An undo feature allows you to undo the transformations sequentially.

[Screen capture: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Transformation Menu open, listing Mean, Linear, Logarithm, Power, Box Cox, Arcsine, Undo Last Transform, and Exit when finished, above a table of normality test results.]
107. ee Inserting Variables). Here, you would be changing the inserted values from 1E31 to a formula involving one or more of the other variables.

Highlight the variable that you wish to overwrite with a formula by moving about the spreadsheet screen until you arrive at the column corresponding to the variable. Next, press the <ALT> and the <F> keys together. You will be asked "Replace (Variable name) with a formula, are you sure?" Press the "Y" key for "Yes" (the default is "No"). You will then be asked to enter the formula. Carefully enter the formula.

Chapter 3 Managing Data in Scout

3.2 Scout functions and operations

Scout recognizes the following operators and functions:

+           addition
-           subtraction or opposite sign
*           multiplication
/           division
x^y         x raised to the power of y
Abs(x)      absolute value of x
Atan(x)     arctangent of x
Cos(x)      cosine of x
Exp(x)      exponential (i.e., the value of e raised to the power of x)
Ln(x)       natural logarithm
Int(x)      integer function (e.g., Int(7.99) = 7, Int(2.000) = 2)
Log(x)      logarithm base 10
Round(x)    rounding function (e.g., 7.99 becomes 8)
Sin(x)      sine of x
Sqr(x)      x raised to the power of 2
Sqrt(x)     square root of x

When you are sure that the formula is correct, press <ENTER>. Scout will automatically do the calculations and return you to the spreadsheet.

Editing Attributes of Variables: This feature allows 
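Several of these function names are easy to confuse with their counterparts in other software (for instance, Sqr squares its argument, while Log is base 10 and Ln is the natural logarithm). The mapping below restates the list in Python as a reading aid. It is a sketch, not Scout's parser: the dictionary name `SCOUT_FUNCTIONS` is hypothetical, and the behaviour of Int for negative values and of Round for exact halves is not stated in the guide, so truncation toward zero and rounding halves up are assumptions.

```python
import math

# Hypothetical Python equivalents of the functions listed in section 3.2.
SCOUT_FUNCTIONS = {
    "Abs": abs,
    "Atan": math.atan,
    "Cos": math.cos,
    "Exp": math.exp,                          # e raised to the power of x
    "Ln": math.log,                           # natural logarithm
    "Int": math.trunc,                        # assumption: truncates toward zero
    "Log": math.log10,                        # logarithm base 10
    "Round": lambda x: math.floor(x + 0.5),   # assumption: halves round up
    "Sin": math.sin,
    "Sqr": lambda x: x ** 2,                  # square, not square root
    "Sqrt": math.sqrt,
}

f = SCOUT_FUNCTIONS
print(f["Int"](7.99), f["Round"](7.99), f["Sqr"](3), f["Sqrt"](9), f["Log"](1000))
```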
108. en above and can be obtained using one of the four procedures (three robust and one classical) available in Scout. The critical values of kurtosis are given in a simulation study performed by Stapanian et al. (1991). The classical module of Scout includes a sequential outlier detection procedure based on multivariate kurtosis and these critical values.

The robust procedures based on Campbell's (1980) influence function and the HUBER function, as given in Devlin et al. (1981), often leave some influence of outliers on the robust estimates. The weights associated with the HUBER influence function are given by $w(Md_i) = 1$ if $Md_i \le k_\alpha$, and $w(Md_i) = k_\alpha / Md_i$ otherwise, where $k_\alpha$ is the α100% critical value associated with the Mds, obtained using either a scaled beta or a chi-square distribution. For details of the HUBER influence function and the MVT procedures in Scout, the interested reader is referred to Devlin et al. (1981) and Singh (1993).

It is observed that the outliers have negligible influence on the estimates and Mds obtained using the PROP function. The PROP estimates and Mds, with or without outliers, and the corresponding classical MLEs and Mds based only upon the inlying observations (obtained after the removal of outliers) are also in close agreement. This confirms that the identified (flagged) observations are indeed all of the outliers present in the data set. In order to verify that the identified outliers are indeed the outliers, Fung (1993) suggeste
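The HUBER weight rule can be written as a one-line function. The sketch below follows the reading of the weights given above (full weight up to the critical value, then a weight of k/Md); the function name `huber_weight` and the numerical values are illustrative only, and the critical value is simply passed in rather than computed from a beta or chi-square distribution.

```python
def huber_weight(md, k_alpha):
    """Weight for a distance md, given the critical value k_alpha."""
    if md <= k_alpha:
        return 1.0            # inlying observations keep full weight
    return k_alpha / md       # larger distances are progressively downweighted

distances = [0.8, 1.9, 2.5, 5.0, 10.0]
weights = [huber_weight(d, k_alpha=2.5) for d in distances]
for d, w in zip(distances, weights):
    print(f"Md = {d:5.1f}  weight = {w:.3f}")
```

Because the weight never reaches zero, a distant observation still contributes something to the estimates, which is consistent with the remark that these procedures often leave some influence of outliers.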
109. ence function (HUBER, 1981; Devlin et al., 1981) based on Mds
c. Multivariate Trimming (MVT) (Devlin et al., 1981) based on Mds
d. PROP influence function (Singh, 1993) based on Mds

Also, numerous graphical displays are available in Scout. These include the histogram; normal probability Q-Q plots of raw data; scatter plots of raw data and contour plots; Q-Q plots and scatter plots of principal components; the Q-Q plot and index plot of the Mds; scatter plots of discriminant scores; plots of the prediction interval and simultaneous confidence intervals; contour plots; and some 3-D graphics.

(5) Principal Component Analysis (PCA)
A separate PCA option is available in Scout to compute the classical dispersion and correlation matrices, eigenvalues, eigenvectors, loadings, and principal component scores.

(6) Performs the linear and quadratic discriminant analysis (Confusion Matrix).

The pattern recognition option can be used to (1) obtain scatter plots of raw data, (2) graph the PCs, and (3) compute and graph the raw discriminant scores. The corresponding contour ellipses (5 choices are available) can also be produced on these scatter plots by pressing the "E" ("e") key. For details, see Johnson and Wichern (1988) and Anderson (1984).

D-Trend and Add Means options
These two procedures are used in geostatistical applications, especially when the spatial data need to be detrended so that the consta
110. enu. The level 2 menu for the File heading and the explanation window for "Read ASCII File" are also both displayed.

Scout is a statistical software package with several features. Navigating through the multiple levels of Scout requires a standard nomenclature that can be easily followed. The following is an explanation of the nomenclature we will use in describing Scout in these tutorials.

Menu: A set of choices or headings.

Headings: Those selections that will present further menus (lists of choices and/or headings).

Choices: Those selections that will set a given parameter or perform a specific function.

Explanation window: A box, appearing at any level, containing either an explanation of the selected heading or instructions for the performance of a Scout function.

Level 1 menu: This refers to the set of headings displayed in the first window seen upon entering Scout: File, Data, Classical Method, Robust Method, PCA, Graphics, and System.

Level 1 headings: File, Data, Classical Method, Robust Method, PCA, Graphics, or System, as shown in Figure 9.1 above.

Level 2 menu: This refers to any of the seven menus displayed after selection of a Level 1 heading.

Level 2 headings and choices: Read ASCII File, Write ASCII File, Load Scout File, Save Scout File, Merge Two Files, and Append Two Files, as shown in Figure 9.1, or any set of headings and/or choices resulting from selection of
111. er plots or XY plots. 3-Dimensional graphics are used to display three-variable plots, which can be rotated to illustrate the extra dimension. The Graphics menu is displayed in Figure 7.1 below. [Figure 7.1: screen capture of the Graphics menu; menu bar: File, Data, Classical Method, Robust Method, PCA, Graphics, System; legible entries include 2 Dimensional and 3 Dimensional]
112. ere, the upper simultaneous limit (USL), and not the upper confidence limit (UCL) for the population mean, should be used. Comparing individual observations, x_i, with the UCL for the population mean, μ, and expecting an adequate coverage for the x_i's, as is sometimes mistakenly done in practice, is inappropriate. An interval estimate given by (4) above may be used if the coverage for the individual sampled observation, x_i, is desired. The prediction interval given by (2) is used for a future and/or delayed observation, x_0. Robust interval estimates are used in some of the performance evaluation (PE) studies of the U.S. EPA (e.g., see Horn et al., 1988). For example, Horn et al. (1988) used the Biweight function (Kafadar, 1982) to obtain a robust prediction interval for a future observation, x_0, using a noisy sample (with outliers) obtained from PE studies of the U.S. EPA. Also, the robust prediction intervals based on the Biweight influence function are used to assess the performance of the various laboratories participating in the quarterly blind (QB) PE study of the U.S. EPA (Singh and Nocerino, 1995; Singh et al., 1993). However, interval estimates given above by (3), by definition, are more appropriate to provide simultaneous coverage for all of the participants in such QB PE studies. Interval Estimates: The four interval estimates obtained using the classical and robust (Huber and PROP)
113. erson-Darling test and Kolmogorov-Smirnov goodness-of-fit test, graphical normal probability Q-Q plot. Classical Method Menu: This module includes the two classical sequential outlier testing procedures based upon (1) the Max (Mds) and (2) the multivariate kurtosis. This module is given separately here for the convenience of interested users. Note that these procedures suffer from severe masking in the presence of multiple outliers. Unmasking of multiple outliers requires the use of a robust procedure with a high breakdown point. Some examples using this menu are discussed in Chapter 10. The classical test based on Max (Mds), with graphical Q-Q and index plots, is also available in the robust module of the software package. Robust Method Menu: The robust module of the Scout software package includes four different procedures to compute all of the relevant statistics, including the mean vector, the variance-covariance (or the correlation) matrix, the Mds, and the multivariate kurtosis, and also to perform the principal component, linear, and quadratic discriminant analyses. Several examples have been discussed in tutorial Section II, Chapter 11. The statistical procedures used for this module are discussed in this chapter. The four outlier identification procedures in Scout are given as follows: (a) Classical MLE method (Wilks, 1963) based on Mahalanobis Distances; (b) HUBER influ
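The classical Max (Mds) idea can be illustrated with a small sketch. It is not Scout's implementation: the function name, the restriction to two variables (so that the covariance matrix can be inverted in closed form), the n - 1 divisor, and the sample data are all assumptions made for illustration. It computes the squared classical Mahalanobis distance of each observation from the sample mean and reports the observation with the largest distance.

```python
from statistics import mean


def mahalanobis_sq_2d(data):
    """Squared classical Mahalanobis distances for bivariate data.

    Hypothetical helper, limited to two variables for brevity.
    """
    n = len(data)
    xs = [row[0] for row in data]
    ys = [row[1] for row in data]
    mx, my = mean(xs), mean(ys)
    # Sample covariance matrix entries (divisor n - 1 is an assumption).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in data:
        dx, dy = x - mx, y - my
        # Quadratic form (dx, dy) S^{-1} (dx, dy)' with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Made-up data: the last observation is far from the others in the first variable.
sample = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (20, 1)]
d2 = mahalanobis_sq_2d(sample)
worst = max(range(len(sample)), key=lambda i: d2[i])
print("observation with the largest Md:", worst + 1)
```

Because the mean and covariance here are themselves computed from all observations, a cluster of several outliers can inflate them and hide one another, which is the masking problem the text describes.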
114. ervised classification). This grouping can be done on the basis of similarities or distance measures obtained from the observed variables or characteristics (analytes, defects, etc.). Principal component analysis or cluster analysis techniques, such as complete linkage, single linkage, average linkage, and Ward's minimum distance, are used to separate observations into various groups. Several clustering techniques should be applied to the same data set. If the outcomes of these clustering techniques are roughly consistent with one another, then some well-separated groups probably exist. This separation process is often performed only once, preferably on training sets with known group membership, to investigate the differences among the various groups. Discriminant functions are then obtained using these separated groups. Classification procedures are less exploratory. Discriminant functions obtained in the separatory process are used to assign current and new observations into previously defined groups. The correct classification of the current observations with known group membership is the basis for the validity of the discriminant functions. Scout outputs the confusion (error) matrix for the linear and quadratic discriminant analyses. However, outliers can distort the discriminant functions and the corresponding discriminant scores significantly. This can result in several misclassif
115. ey a second time will bring out the Transformation Menu and a histogram of that variable as shown in Figure 9.4. [Screen capture residue: Data menu with entries Edit Data, Statistics, Transform, and Print Data, and a normality test submenu listing Kolmogorov-Smirnov and Anderson-Darling] Several transformations
116. eys, move to the top three data points, reveal their identities, and your display should now match Figure 11.3. Figures 11.3, 11.4, and 11.5 are obtained by using the classical statistics option. There, the mean and standard deviation (sd) used to obtain the horizontal lines on these graphs are the classical maximum likelihood estimator (MLE) estimates. [Plot residue: Q-Q plot with Maximum USL, Warning USL, Warning LSL, and Maximum LSL reference lines; x-axis: Theoretical Quantiles (Normal Distribution)] Figure 11.3. Q-Q plot of the sp length variable with the identities of a few data points revealed. Press the <F> key to save the graph to disk. The generated graph will be saved as a PCX file, and you can specify its location by including the path along with the file name. The graph can also be saved in PostScript (.EPS) format. To save the graph in PostScript format, press the <ESC> key twice to go back to the first screen and move the cursor to "Print Destination". Press <ENTER>; in the Print Destination window, select "Encapsulated Post Script" and use the <ENTER> key to finish the selection. After you have selected the PostScript printer, return to "Robust Analysis" and generate the graph. Press <P> and supply the graph with a name, press <ENTER>, and the
117. f the Mds in an unpredictable manner and often leads to the misidentification of outliers. The use of approximate distributions of the Mds, such as chi-square or normal, can also lead to the incorrect ordering of the Mds. It is well known (Huber, 1981; Devlin et al., 1981; Hampel et al., 1986; Rousseeuw and Leroy, 1987; Rousseeuw and van Zomeren, 1990; and Barnett and Lewis, 1994) that for the identification of multiple outliers, one should use robust and resistant procedures with a high breakdown point. Most of the robust procedures for the identification of outliers and the estimation of population parameters of location and scale are iterative, requiring several passes through the data set. This, of course, will be impossible to achieve without a computer software package. Several procedures and influence functions, including the Biweight, HAMPEL, HUBER, PROP, winsorization, univariate and multivariate trimming (MVT), and MVE-based robust procedures, exist in the literature. The robust procedures based on MVT and the HUBER and PROP influence functions can be used for univariate as well as multivariate data sets. These robust procedures, along with the classical MLE approach to locate outliers in raw data sets, in interval estimations, and in principal component and discriminant analyses, have been incorporated in Scout. These procedures have been tried on numerous
118. [Screen capture residue: System menu with the Printer Setup window; legible parameters include Choose Printer, Page Orientation, Use Shading Patterns, Horizontal Scaling Percent, Vertical Scaling Percent, X Starting Location, Y Starting Location, Formfeed After Print, and Specify Printer Port (LPT1)] Figure 13.7. The Printer Setup menu. 13.3 Summary: There are three options in the "Graphics" module of Scout (the modules are displayed in the first window
119. graph will be saved with the .EPS extension. Simply pressing <P>, when your on-line printer is specified in "Print Destination", will result in your graph being printed. After the graph is saved and/or printed, use the <ESC> key twice to return to the "Robust Method" menu. Move to "Select Variables", press <ENTER>, and using both the plus (+) and minus (-) keys, de-select the variable sp length and select the second variable, sp width. Perform the same set of operations to generate the Q-Q plot, and your display will match Figure 11.4. [Plot residue: Q-Q plot with Maximum USL, Warning USL, Warning LSL (2.24), and Maximum LSL (2.11) reference lines; x-axis: Theoretical Quantiles (Normal Distribution)] Figure 11.4. Q-Q plot of the sp width variable with the identities of a few data points revealed. Figures 11.3 and 11.4 can be generated simultaneously by selecting both variables while in the "Select Variables" option. When multiple graphs are generated, they can be displayed one after another by using the <PAGE DOWN> key while the graphic screen is displayed. Return to the "Select Graph Type" menu and select "Q-Q Plot (indiv raw data)". Press the <ENTER> key to make the selection, move the cursor to the bottom of the window, and choose "Generate Graph with Current Options"
120. gs as follows. The example choices used throughout this manual are those displayed by default using the IRIS.DAT file (which is discussed in the tutorial section). Heading: Example Choice. Compute Statistics Using: Classical. Weights: Beta. Initial Estimate: Classical. Right Tail Cutoff: 0.05. Trimming Percent: 0. Each of these headings has various choices, which can be selected by repeated use of the <ENTER> key when that heading is highlighted. After a selection is made, the arrow key can be used to move the cursor to the next heading. The process can be repeated until the desired choices have been selected. The various choices for each of the headings of the Univariate Statistics menu are as follows. Heading: Choices. Compute Statistics Using: Classical, Huber Influence, Proposed Influence, Multivariate Trimming. Weights: Beta, Chi Squared. Initial Estimate: Classical, Robust. Right Tail Cutoff: a number between 0.01 and 0.8 (active only when PROP or Huber are chosen). Trimming Percent: an integer between 0 and 100 (active only when Multivariate Trimming is used). The values for number choices can be typed directly on the screen after using th
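The option ranges listed above can be summarized as a small validation routine. This is only a sketch of the rules as stated in the text; the function name, the argument names, and the dictionary it returns are hypothetical and do not correspond to anything in Scout itself.

```python
METHODS = ("Classical", "Huber Influence", "Proposed Influence", "Multivariate Trimming")
WEIGHTS = ("Beta", "Chi Squared")
INITIAL_ESTIMATES = ("Classical", "Robust")


def check_options(method="Classical", weights="Beta", initial_estimate="Classical",
                  right_tail_cutoff=0.05, trimming_percent=0):
    """Validate one combination of choices; defaults mirror the example choices."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if weights not in WEIGHTS:
        raise ValueError(f"unknown weights: {weights!r}")
    if initial_estimate not in INITIAL_ESTIMATES:
        raise ValueError(f"unknown initial estimate: {initial_estimate!r}")
    # The cutoff is only active for the Huber and PROP influence functions.
    if method in ("Huber Influence", "Proposed Influence"):
        if not 0.01 <= right_tail_cutoff <= 0.8:
            raise ValueError("right tail cutoff must lie between 0.01 and 0.8")
    # The trimming percent is only active for multivariate trimming.
    if method == "Multivariate Trimming":
        if not isinstance(trimming_percent, int) or not 0 <= trimming_percent <= 100:
            raise ValueError("trimming percent must be an integer between 0 and 100")
    return {
        "method": method,
        "weights": weights,
        "initial_estimate": initial_estimate,
        "right_tail_cutoff": right_tail_cutoff,
        "trimming_percent": trimming_percent,
    }


print(check_options(method="Huber Influence", right_tail_cutoff=0.1))
```

Checking a parameter only when its heading is active follows the "active only when" wording of the choices list; a stricter checker could validate every field regardless.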
121. he/she is currently using by pressing the <F1> key. 8.2 Other options. The six options for the System menu are shown in Figure 8.1 below. [Figure 8.1: screen capture of the System menu; legible entries include User's Guide, Information, Sound (On), Printer Setup, DOS Shell, and Exit]
122. hed geostatistical technique frequently used in site characterization studies. However, OK assumes that there is no spatial trend present and that the mean concentration at each location is constant within the region under consideration. This assumption is often violated by the data collected from a polluted site. Therefore, in order to use OK to characterize the site under study, data with spatial trend need to be detrended so that the constant mean assumption is satisfied. Scout offers the D Trend option for removing trend that might be present in a geostatistical data set obtained from a polluted site. It assumes that the data are in the same format as for the pattern recognition option, with the population IDs in the first column. Using an appropriate multivariate technique, first the data have to be partitioned into various strata with significantly different statistics (e.g., mean vectors). Using the geographic information of the sample observations, a site map can be prepared exhibiting the actual sampling locations and the respective population IDs. The D Trend option subtracts the respective sub-population means from each observation in the corresponding sub-population. The resulting data satisfy the constant mean assumption. Add Means: This option is used after OK has been performed using the detrended data and a file with extension .grd has been created. The means subtracted using the D T
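The detrending step described above (subtracting each sub-population's mean from its own observations) can be sketched as follows. This is a minimal illustration, not Scout's code: the function name, the row layout (population ID first, then the measured values), and the sample numbers are assumptions.

```python
from collections import defaultdict


def detrend(rows):
    """Subtract sub-population means from each observation.

    rows: list of tuples (population_id, value_1, value_2, ...).
    Returns (detrended_rows, means), where means maps each population ID
    to the list of column means that was subtracted.
    """
    groups = defaultdict(list)
    for pid, *values in rows:
        groups[pid].append(values)
    # Column-wise mean for every sub-population.
    means = {
        pid: [sum(col) / len(col) for col in zip(*members)]
        for pid, members in groups.items()
    }
    detrended = [
        (pid, *[v - m for v, m in zip(values, means[pid])])
        for pid, *values in rows
    ]
    return detrended, means


# Made-up concentrations for two strata with very different mean levels.
rows = [(1, 10.0, 1.0), (1, 12.0, 3.0), (2, 100.0, 5.0), (2, 104.0, 9.0)]
detrended, means = detrend(rows)
print(means)
print(detrended)
```

Returning the subtracted means alongside the detrended rows keeps the information that a later step would need in order to restore the original level.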
123. iagonal represents the correlation of a variable with itself and therefore always has an "r" value of 1.00. All other points represent the correlations of the various variables with each other. [Screen content for Figure 13.2: Iris data in full; pt length: minimum 1.0, maximum 6.9, variance 3.12; pt width: minimum 0.1, maximum 2.5, variance 0.58; displayed values 0.963 and 0.927; color key for correlation ranges with cut points at 0.75 and 0.50; commands: View Scatter Plot, Help, Exit; prompt: Use key pad to select variables] Figure 13.2. The variable matrix for two-dimensional graphics. Focusing on the highlighted point in the matrix, use the <RIGHT>, <LEFT>, <UP>, or <DOWN> arrow keys to select the variable combination for an X-Y scatter diagram. For the current tutorial, use the pt length and pt width combination (bottom row, second from the right, or, reflectively, fourth row, far right). After the variable combination is selected (as shown in the header information of Figure 13.2), press <ENTER> to generate the scatter diagram as shown in Figure 13.3. [Plot residue for Figure 13.3: header repeating the pt width and pt length summary; keys: ENTER Select Variables, F1 Help, ESC Exit] Figure 13.3. The scatter plot for pt length and pt width selected from the variable matrix shown in Figure 13.2.
124. 11.4 Statistical Intervals. For this section, we use the data set 4 METHYL.DAT from the Scout Data directory (use "Read ASCII File" in the "Files" menu, select 4 METHYL.DAT, press <ENTER>). From the "Robust Analysis" menu, select "Display Graphs For", press <ENTER>, select "Control Charts Simul Xi", press <ENTER>, and return to the "Robust Analysis" menu. Select "Statistics Options", set the parameters to match those shown in Figure 11.20, move to "Accept New Settings", press <ENTER>, and return to the "Robust Analysis" menu. [Screen capture residue: Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D Trend) with a Statistical Options window; legible settings: Compute Statistics Using: Prop Influence; Initial Estimate: Robust; Matrix: Correlation; Weights: Beta]
125. ical value are different from those shown in the original outlier test because the dimensionality is reduced by one variable. The "variable" column provides the name of the identified causal variable. This is the variable that, when present, always allows rejection of the discordant observation. The "Observed" column displays the value in the data set for the discordant observation and causal variable. The "Expected" column gives a prediction of the value by using multiple regression and the values reported for the other variables in that observation. "Low Lim" and "Up Lim" provide the lower and upper limits, respectively, for a prediction interval. The type I error rate (alpha) of this interval is the same as was chosen for the outlier test. This process is designed to identify cases where, apparently, the discordancy resulted from substantial deviation in a single variable. This can occur when large errors in measurement are independent, or when typographical, recording, and transcription errors cause the outlier. For example, for the third variable in a ten-dimensional data set, recording 73.56 as 37.56 or as 735.6 may cause the associated observation to be identified as discordant. If so, executing the Causal Variables routine will probably indicate the third variable as the cause of the discordancy. 4.5 Associated Causes. This feature allows users with sufficient
126. ication results. For example, in environmental applications, it is possible that a distorted discriminant function can classify a reasonably clean sample as coming from the contaminated population and a contaminated sample as coming from the clean population (the background). Fisher's Robust Method for Discriminating Among k Populations: Fisher's robust classification procedure (Anderson, 1984; Singh and Nocerino, 1995) is included in Scout. The procedure has been tried on some real environmental and historical data sets. Fisher's iris data set has been used in Chapter 11. The population parameters, μ_i, and the common covariance matrix, Σ, need to be estimated based upon training samples of size n_i from population π_i, i = 1, 2, ..., g. These estimates can be obtained using an appropriate procedure (classical or the three robust procedures). Fisher's method also provides a very convenient and effective way of graphical separation of the p-dimensional data in terms of a few discriminant functions. The graphical displays of the first few Fisher's discriminant functions reveal possible groupings and clustering of the g populations. It should be pointed out that the derivation of Fisher's discriminants does not require multinormality of the distribution of the underlying g populations. Under normality and equal covariance matrices, Fisher's discriminant functions reduce to the linear discrimi
127. idth) and vertical (height) dimensions of the graph that is to be printed. The actual size of the graph that is printed depends upon this scaling percentage, the page orientation, and the printer in use. The larger the percent scaling, the larger the printed graph will be. To change your selection, highlight the scaling parameter that is to be adjusted and press <ENTER> in order to edit the scaling value. Input the desired value. X and Y Starting Locations: Use the X Starting Location to set the height of the bottom of the graph (in pixels) from the bottom of the page. Similarly, use the Y Starting Location to set the left margin. Highlight the location parameter to be changed and press <ENTER> to edit the location value. Then input the desired location. Formfeed After Print: This feature causes Scout to send a form feed command to the printer after each graph. This will cause the printer to output one graph per page. You would not select this choice when more than one graph per page is desired. Highlight "Formfeed After Print" and press <ENTER> to toggle from "Yes" to "No" and from "No" to "Yes". Specify Printer Port: This heading is used to change the printer port for output of graphs. Scout defaults to LPT1, but the user may also select LPT2 or LPT3. Highlight Specify Printer Port and press <ENTER> as needed to change the selection. DOS Shell: This choice temporarily s
128. ighlight any desired variable. To assign the highlighted variable to an axis, type the letter of the desired axis (X, Y, or Z). When all three axes have been selected, press the <ENTER> key to view the graph. The user has complete control over the position, size, scale, and rotation of the graph. The user can also identify and modify individual points or observations that make up the graph. The next few paragraphs will cover all of these controls. Should the user forget any of these controls while in the 3D graphics mode, pressing the <F1> key will bring up a summary of them. When the user is finished viewing a graph, pressing the <ENTER> key will return the user to the variable selection screen. Press <ESC> to exit 3D graphics mode and return to the main menu. 7.7 Moving 3D Graphs. The user can move the graph anywhere within its window on the screen. Pressing the <M> key puts the graph into movement mode. The arrow keys can now be used to move the axes to the desired location. To exit this mode, press <ESC>, <ENTER>, or <SPACEBAR>. 7.8 Change Size of 3D Graphs. The user can change the size of the graph by zooming in and out of the plot. The < > key zooms into the plot, which makes the graph appear larger. The < > key zooms out of the plot, which makes the graph appear smaller. Each of these keys can be used as many times as needed. Scaling 3D Graphs: When the gr
129. in the right tail of the distribution of the Mds (labelled as Right Tail Cutoff in Scout) is needed in the process. Most practitioners are familiar with choosing a significance level (α value) in their applications, as all of the statistical tests typically use some α level of significance. The M-estimates obtained using a smaller value of α (e.g., 0.001, 0.005) usually correspond to the classical estimates, whereas larger values of α, such as 0.2, 0.25, help unmask multiple outliers in small data sets of large dimensionality, or even multiple groups of discordant observations (e.g., see the example on the four-dimensional stack loss data set of size 21 in Chapter 11). A few values (2-4) of α may be tried on the same data set. All of the observations within the (1 - α)100% confidence ellipsoid (after the final iteration) can be considered to be inlying, forming the main body of the data set. Moreover, no small-sample correction factors are required to provide appropriate coverage and to achieve consistency when samples come from normal populations. The PROP procedure described here (Singh, Singh, and Flatman, 1994) can also be effectively used to decompose a mixture sample into component populations. The multivariate kurtosis statistic (Mardia, 1970; Mardia, 1974) is also available in Scout and is given by the following equation: $b_{2,p} = \frac{1}{n}\sum_{i=1}^{n} Md_i^{4}$, where the distances Md are giv
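As a rough illustration of the multivariate kurtosis statistic, the sketch below averages the fourth powers of the Mahalanobis distances, following Mardia's (1970) definition. The function names and sample values are hypothetical, the squared distances are assumed to have been computed already, and whether Scout uses the divisor-n covariance matrix of Mardia's original definition is not stated in this excerpt.

```python
def mardia_kurtosis(squared_distances):
    """Mean of the fourth powers of the Mahalanobis distances.

    squared_distances: iterable of Md_i^2 values, one per observation,
    so each term Md_i^4 is the square of an input value.
    """
    values = list(squared_distances)
    if not values:
        raise ValueError("at least one distance is required")
    return sum(d * d for d in values) / len(values)


def kurtosis_under_normality(p):
    """Large-sample expected value p(p + 2) for p-variate normal data."""
    return p * (p + 2)


# Made-up squared distances for a two-variable data set.
sq = [0.4, 1.1, 0.9, 2.3, 5.8]
print(mardia_kurtosis(sq), "compared with", kurtosis_under_normality(2))
```

A value well above p(p + 2) points to heavier tails than the normal model allows, which is why the statistic is used as a sequential outlier test.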
130. ions to be set aside) should be used for the multivariate trimming procedure. For details see Singh (1993). Two Choices for the Numbering of Points on a Scout Graph: The points on a graph generated by Scout can be marked either by the observation number (numbers from 1 to n) or by the population ID (positive integer between 1 and 20). Thus, a maximum of 20 populations can be handled by the pattern recognition procedures (e.g., PCA, Discriminant and Classification Analysis, etc.) in Scout. The default option is numbering by observations. Numbering by population is used when multiple populations are present. This option is used for pattern recognition techniques such as the PC analysis or discriminant analysis. In order to use this option, the first column of the data file should have the population ID code (e.g., see the Fulliris data set). Ignoring a Population: The user can de-select a population (the population ID should be in the first column of the data file), which will be ignored in all subsequent computations. For example, if enough observations are not available or if one of the populations is significantly different from the rest of the data, the user may wish to ignore those observations for the rest of the statistical analysis. However, the user has the choice to plot or not to plot the observations from the ignored population. The default is to plot the data from the ignored population
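The data layout described above (population ID in the first column, IDs between 1 and 20, and the option to ignore a population in computations while still plotting it) can be sketched as follows. The function name and sample rows are hypothetical; this only mirrors the rules stated in the text and is not Scout's own logic.

```python
def split_ignored(rows, ignored_ids):
    """Separate rows used in computations from rows of ignored populations.

    rows: list of tuples whose first element is the population ID.
    Returns (kept, ignored); the ignored rows are returned rather than
    discarded so that they can still be plotted if desired.
    """
    ignored_ids = set(ignored_ids)
    kept, ignored = [], []
    for row in rows:
        pid = row[0]
        # The text limits population IDs to positive integers from 1 to 20.
        if not (isinstance(pid, int) and 1 <= pid <= 20):
            raise ValueError(f"invalid population ID: {pid!r}")
        (ignored if pid in ignored_ids else kept).append(row)
    return kept, ignored


rows = [(1, 5.1, 3.5), (2, 7.0, 3.2), (3, 6.3, 3.3), (2, 6.4, 3.2)]
kept, ignored = split_ignored(rows, ignored_ids=[2])
print(len(kept), "rows used in computations;", len(ignored), "rows ignored")
```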
131. [Screen capture residue: "Select Graph Type" menu; legible entries include Q-Q plots (raw data, standardized, PCA, generalized distance), scatter plots (raw data, PCA), control charts (indiv. and simul. Xi), prediction intervals, index plots, and multivariate kurtosis; status bar shows file IRIS.DAT] Figure 11.2. The menu for "Select Graph Type", resulting from selection of the "Display Graphs for" heading in the "Robust Analysis" menu. Move the cursor to select "Generate Graph With Current Options". Press <ENTER> to generate the graph; on the graph, notice the highlighted data point. Press <SHIFT > and the identity of this data point will be revealed; use the up arrow key, press <SHIFT > again, and the identity of the next point will also be displayed. Using the arrow k
132. ive of highly contaminated areas, sections of a forest in poor or degraded states, inconsistent analytical results in a typical quality assurance and quality control (QA/QC) program, or gross typing errors. Outliers, when present, typically distort the classical estimates and the associated statistics, which in turn can result in incorrect conclusions based on the statistical inference employed. It is, therefore, important to identify and consequently down-weight the outlying observations appropriately. Several classical and robust outlier identification procedures are incorporated in the Scout software package. A brief description of some of the statistical procedures used in Scout is given in this chapter. Sufficient references are included for statistically oriented users. Various state and federal government agencies, local communities, and industries often need to estimate the extent of contamination at polluted sites. The entire cleanup process is expensive and time consuming. It is, therefore, important to obtain these estimates accurately. The presence of discordant observations can distort the entire estimation process. The use of robust and resistant procedures is essential in the estimation phase (e.g., robust kriging rather than the classical kriging would characterize the polluted site much more accurately). Given a sample of size n from a polluted site, the sample may represent th
133. Figure 13.6. The System menu with the User's Guide menu also displayed.

The "Information" choice provides the Scout version number and information about the computer system on which Scout is loaded. The explanation windows can be toggled on or off by using "Help Messages." The "Printer Setup" menu can be used to format print output for specific printers and requirements. The menu of various printer parameters is shown in Figure 13.7. The "DOS Shell" allows a user to execute DOS commands without leaving Scout. "Exit" will first ask users if they are sure they want to exit (REMEMBER THE CAUTIONS ABOUT DATA TRANSFORMS ALTERING FILES AND SAVING DATA UNDER APPROPRIATE FILE NAMES) and, if they are, return them to DOS.

[Screen capture: the Scout main menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the System menu open, showing User's Guide, Information, Printer Setup, and DOS Shell.]
134. lized Distance and Multivariate Kurtosis both lead to the same menu of three choices: cutoff values for α of 0.10, 0.05, or 0.01. Once an α is selected, the data are analyzed and the results posted to the screen.

10.2 Determining Causal Variables, and Removing Flags

Working immediately after Multivariate Kurtosis has detected the outlier, select the "Causal Variable" choice to determine the variable(s) that caused the outlier. A variable is identified as a cause if, when removed from the analyses, the observations are no longer outliers. The output is sent to the screen, identifying which variables displayed values outside the expected range.

The "Remove Outlier Flags" choice is merely a means of unmarking the data that has been identified as outliers. Once Generalized Distance or Multivariate Kurtosis has identified outliers, these outliers are colored red in the data file. The "Remove Outlier Flags" choice turns the red data back to white, the original color of the data. After identifying the outliers with Multivariate Kurtosis, move the cursor (highlighted rectangle) to the "Data" heading and select "Edit Data" (we will NOT be editing the data, merely examining it). Once the data is on the screen, use the up and down arrow keys to examine the data and identify the red outliers. Now exit "Edit Data," return to "Classical Method," "Remove Outlier Flag
135. mat has to be followed. As described in Section 2.1, the format requires including information in the file as follows:

(a) the data set name or title (line 1);
(b) the number of variables (line 2);
(c) the names of the variables (lines 3 through X, where X-2 is the number of variables);
(d) the values of the variables, optionally including the labeling of each data record with a comment in single quotes (lines X+1 through the end of the file).

Example spreadsheet file prepared for conversion to Scout:

Geostatistical Environmental Data
3
Arsenic
Cadmium
Lead
.850 11.5 18.25 'Sample 1'
.630 8.50 30.25 'Sample 2'
1.02 7.00 20.00 'Sample 3'
1.02 10.7 19.25 'Sample 4'
1.01 11.2 151.5 'Sample 5'

In this example, the data set name should be in spreadsheet cell A1, the number of variables in cell A2, the variable titles in cells A3 through A5, and the values of the variables in cells A6 through D10. In the spreadsheet, the column D6 to D10 contains the name of each record; each name must be within single quotation marks. In some spreadsheet software, such as Excel, you may have to enter one or two spaces before the left quotation mark for the data labels (the D column in this example). Remember, both single quotation marks should be visible in the spreadsheet before you save the spreadsheet file in a Space Delimited or TEXT format. One or both of these forma
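The file layout in items (a) through (d) can be sketched in a few lines of Python. This is only an illustration and not part of Scout: the function name `format_scout_file` and its arguments are hypothetical, and the sketch assumes that values on a data line are separated by single spaces with the optional record label placed last in single quotes.

```python
def format_scout_file(title, variable_names, rows):
    """Build the text of a file in the layout described above.

    rows is a list of (values, label) pairs; label may be None.
    All names here are hypothetical, chosen for illustration only.
    """
    lines = [title, str(len(variable_names))]   # line 1 and line 2
    lines.extend(variable_names)                # lines 3 through X
    for values, label in rows:                  # lines X+1 onward
        if len(values) != len(variable_names):
            raise ValueError("each record needs one value per variable")
        fields = [str(v) for v in values]
        if label is not None:
            fields.append("'" + label + "'")    # label in single quotes
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


text = format_scout_file(
    "Geostatistical Environmental Data",
    ["Arsenic", "Cadmium", "Lead"],
    [([1.02, 7.0, 20.0], "Sample 3")],
)
```

With three variables, X is 5, so the first data record lands on line 6 of the generated text.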
136. mate μ and σ, respectively. The median, M, and σ̂ are computed by first arranging the data in ascending order, $x_{(1)j} \le x_{(2)j} \le \dots \le x_{(n)j}$. The median, M, and the absolute deviations from the median, $|x_{ij} - M|$; $i = 1, 2, \dots, n$, are computed next. The median of these deviations, MAD, is then computed. For data sets from Gaussian populations, the statistic $\hat{\sigma} = \mathrm{MAD}/0.6745$ is an unbiased estimator of the population sd, σ. The use of M and σ̂ as the initial start estimators in the iterative process of obtaining robust M-estimators of location and scale has been recommended in the literature (Devlin et al., 1981). These statistics can be obtained using the univariate statistics option of the robust method menu in Scout.

Outliers in Univariate and Multivariate Data Sets

In order to obtain robust estimators of location and scale, a chi-square ($\chi^2$) approximation is typically used for the distribution of the distances, $Md_i$. The $Md_i$ are then compared with an associated chi-square reference value, $Md_{\alpha}$, satisfying the probability statement $P(Md_i \le Md_{\alpha}) = 1 - \alpha$; $i = 1, 2, \dots, n$. This statement represents an approximate confidence ellipsoid for individual distances, $Md_i$. Observations with Mds larger than the reference value are declared as outliers. However, it has also been suggested that these cutoff points should not be used too mechanically (Cook and Hawkins, 1990; Fung, 1993; Atkinson, 1994). The MVE-based robust procedures (Rousseeuw and Leroy, 1987) are 
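The median and MAD-based scale estimate described in this passage can be sketched as follows. This is a minimal illustration for a single variable, not Scout's own code; the function name `robust_start` is hypothetical, and only the Python standard library is assumed.

```python
import statistics


def robust_start(values):
    """Return (M, sigma_hat): the median and MAD / 0.6745 for one variable."""
    m = statistics.median(values)
    # Median of the absolute deviations from the median (MAD).
    mad = statistics.median([abs(x - m) for x in values])
    return m, mad / 0.6745


# One gross value (100) barely affects either estimate.
m, sigma_hat = robust_start([1, 2, 3, 4, 100])
```

For this small sample, the classical mean and standard deviation would be pulled strongly toward the value 100, while the median and the MAD-based scale are determined by the central observations.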
137. Figure 3.1. The Data menu displayed showing four options with Edit Data selected.

The data set will appear in the form of a spreadsheet. You can move about the screen and highlight any data cell. A data cell may be a label for a given observation or a value in an observation for a particular variable. The keys for moving about the screen are the four <ARROW> keys, <PAGE UP>, <PAGE DOWN>, <HOME>, and <END>. Observations that appear in red have been flagged as outliers. Press <ESC> to return to the main menu when finished.

Editing Observations or Labels. Highlight the data cell you wish to edit by moving about the screen with the keys mentioned above, then type the correct value or label and press <ENTER>. Repeat this procedure for each cell that you wish to modify. If you are in the process of changing a cell's value and decide that the original value was correct, you can restore the original value by pressing the <ESC> key.

Deleting Observations or Variables. Highlight the observation or variable that you wish to delete. Any portion of the desired observation or variable you wish to delete can be highlighted. Press the <DELETE> key. You will be given a choice of "Observation" or "Variable." If you wish to delete an observation (i.e., an entire row of the spreadsheet), press the "O" key or the <ENTER> key. A scree
138. [Screen capture: the "Robust Analysis" menu, with headings including Limit Style, X Axis Variable, Y Axis Variable, Title, X Axis Title, Numbering, Contour Ellipse, and Generate Graph With Current Options.]

Figure 5.1. The Robust Analysis menu coming from selection of Robust Analysis in the Robust Method menu.

5.3 Univariate Statistics

This heading computes univariate statistics. The four methods mentioned in the introduction to this chapter are available: (1) the classical maximum likelihood estimator (MLE), (2) the Huber, (3) the proposed (PROP) robust method, and (4) sequential trimming. The weights can be computed using the exact Beta distribution of generalized distances or the Chi-square approximation.

To perform univariate statistics, use the up and down <ARROW> keys to select "Univariate Statistics" from the menu and press the <ENTER> key. At this point, a window entitled "Univariate Robust Statistics" will be displayed. This window can be used to set various options for calculating univariate statistics. This window has five main headin
139. n will then appear, asking if you are sure that you wish to delete this specific observation. The default answer to this question is "No." If you are sure that you wish to delete the observation, type a "Y" or move the cursor to "Yes" and press <ENTER>. Repeat this procedure for each observation you wish to delete.

Similarly, if you wish to delete a variable (i.e., an entire column of the spreadsheet), press the "V" key or highlight "Variable" with an <ARROW> key and press <ENTER>. A screen will then appear, asking if you are sure that you wish to delete this specific variable. The default answer to this question is "No." If you are sure that you wish to delete the variable, type a "Y" or move the cursor to "Yes" with an <ARROW> key and press <ENTER>. Repeat this procedure for each variable you wish to delete.

Inserting Observations. This heading allows the user to insert observations (i.e., rows) into the data set. Move about the spreadsheet screen until you find the row in which you wish to insert an observation. Press the <INSERT> key. You will then be given a choice of "Observation" or "Variable." Select "Observation" by highlighting "Observation" with an <ARROW> key, if necessary, and then pressing <ENTER>, or by pressing the "O" key. You will then be given a choice of what you wish the inserted observation to be. You may choose it to be the arithmetic mean, geometric mean, or median of all of the observations for each variable or you
140. nant functions. The discriminants are extracted by maximizing the between-groups variability relative to the within-groups variability, E.

The linear combinations $y_i = l_i' x$; $i = 1, 2, \dots, s$, are called Fisher's discriminant functions. Scatter plots of the pairs $(y_i, y_j)$; $i, j = 1, 2, \dots, s$, represent valuable graphical displays of between-group separation. The constant distance ellipses can also be drawn individually for each of the g groups on the scatter plots of the discriminant scores (see the FULLIRIS data example, Chapter 11). These plots provide a formal visual separation among the various groups. Fisher's classification rule is: assign an observation $x_0$ to $\pi_h$ ($h = 1, 2, \dots, g$) if

$$\sum_{i=1}^{s} [l_i'(x_0 - \bar{x}_h)]^2 = \min_{k} \sum_{i=1}^{s} [l_i'(x_0 - \bar{x}_k)]^2; \quad k = 1, 2, \dots, g. \qquad (15)$$

Graphical displays of the discriminant functions coupled with the contour ellipses reveal the group separation (or overlap) very effectively. Moreover, the scatter plots of the discriminants versus the original variables can also be used to achieve additional insight for graphically identifying those variables that are the most significant in discriminating among the g populations under consideration.

REFERENCES

Anderson, T.W. (1984). Introduction to Multivariate Statistical Analysis, Second Edition. John Wiley, New York.

Atkinson, A.C. (1994). Fast very robust methods fo
141. nd an acceptable value for "a" and wish to abort this process, press the <ESC> key.

3.4.4.4 Arcsine

Transforms the data by using the Arcsine function. All of the data must be between zero and one. This transform is typically used on data representing proportions.

3.4.4.5 Undo Option

Undesirable transforms that have been selected can be removed with the "Undo Last Transform" choice in the menu. Transforms must be undone in the reverse order that they were selected. This feature gives you great flexibility to try various transforms without the risk of damaging your data. Your original data in memory is not modified until you are finished testing and selecting the transforms for all of the variables. When you wish to exit the transform module, the program will ask you to verify that the variables be modified with the selected transforms.

3.4.5 Remarks on Transformation

When you have finished selecting the transforms for each of the variables and you are ready to exit the transform module, press the <ESC> key to do so and answer the question box with the <Y> key. Another question box will appear asking you if you wish to modify the variables in memory by doing the transforms that have been selected. Until now, your original data has not been modified; you have only been testing the transforms. Answer the question with <ENTER> or the <Y> key to apply the transforms 
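The undo behavior described in Section 3.4.4.5 (transforms tried on a copy, undone in reverse order, and applied to the data in memory only on confirmation) resembles a simple stack of snapshots. The sketch below is an illustration of that idea under stated assumptions, not a description of how Scout is implemented; the class `TransformSession` and its method names are hypothetical.

```python
import math


class TransformSession:
    """Try transforms on a working copy; the original changes only on commit."""

    def __init__(self, original):
        self.original = list(original)
        # Stack of snapshots; the first entry is the untransformed data.
        self._snapshots = [list(original)]

    @property
    def current(self):
        """The data as it looks after all transforms still in effect."""
        return list(self._snapshots[-1])

    def apply(self, func):
        """Apply a transform to the latest snapshot and push the result."""
        self._snapshots.append([func(x) for x in self._snapshots[-1]])

    def undo_last(self):
        """Remove the most recent transform (last in, first out)."""
        if len(self._snapshots) > 1:
            self._snapshots.pop()

    def commit(self):
        """Replace the original data with the transformed working copy."""
        self.original = self.current
        self._snapshots = [list(self.original)]


session = TransformSession([0.25, 1.0])
session.apply(math.sqrt)          # first transform
session.apply(lambda x: 2 * x)    # second transform
session.undo_last()               # removes only the second transform
```

Because each transform is pushed onto a stack, the most recent one is always the first to be removed, which matches the reverse-order rule in the text.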
142. ndardized observation vectors. The user may wish to graph the component scores later using the Graphics menu discussed in Chapter 7. In order to do so, these scores need to be saved. Users can save component scores using the Transform Data heading. Before the component scores can be graphed, Scout must be instructed to save the component scores. The component scores will replace the original data in memory.

CAUTION: Scout uses the same computer memory to store the component scores as that used for the original data. The "Transform Data" heading will overwrite the original data with the component scores. If a user generates component scores and then saves them to the same file as the original data, the original data will be lost. Therefore, once generated, the component scores need to be saved to a different Scout file to avoid loss of the original data. However, the PC scores (classical or robust) can be saved in the same data file without overwriting the original data by using the Robust Method menu, where extra columns are added to the data file.

The transformed data may consist of component scores and original variables. The user must be careful not to misinterpret the resulting data.

Chapter 7 Graphics

7.1 General Description

Scout features two graphics options: 2-dimensional and 3-dimensional. 2-Dimensional graphics are used to display bivariate plots, also known as scatt
143. [Screen capture: the "Covariance Matrix" output window, with the prompt to press <P> to print or <ESC> to exit.]

Figure 12.6. The covariance matrix for the principal components.

12.4 Summary

• There are six options in the PCA module in Scout; the options are displayed in the first window when PCA is selected from Scout's main menu.
• The Select Variables option in this module is identical to the Select Variables option in any other module of Scout.
• For each heading in the PCA menu, except for "Select Variables," there are two choices: (1) Covariance and (2) Correlation.
• Any output from the PCA module can be saved by using the <P> key and typing the desired path followed by the file name.
• "Display Matrices" allows users to view the variances and covariances between any set of selected variables.
• The cumulative variance table can be calculated using "Eigenvalues," and the component loadings can then be viewed using "View Components."
• "Transform Data" replaces the original data with principal components.

Chapter 13 Tutorial V

Graphics and System

13.1 Graphics

The Graphics menu
144. normality is not of concern, the Q-Q plot can be replaced by a simpler index plot with the sample index number running along the horizontal axis and the Mds plotted along the vertical axis.

• Draw a horizontal line at the α100% critical value, $Md^{b}_{\alpha}$, of Max(Mds), which is given by the following simultaneous confidence ellipsoid:

$$P(Md_i \le Md^{b}_{\alpha};\ i = 1, 2, \dots, n) = 1 - \alpha, \qquad (7)$$

or equivalently, using the Bonferroni inequality, by the statement

$$P(Md_i \le Md^{b}_{\alpha}) = (n - \alpha)/n. \qquad (8)$$

This horizontal line is labelled as "Maximum (Largest Md)" on the Q-Q (or index) plot.

• Finally, draw a horizontal line at the α100% critical value, $Md_{\alpha}$, obtained from the distribution, $((n-1)^2/n)\,\beta(p/2,\ (n-p-1)/2)$, of the individual distances, $Md_i$, satisfying

$$P(Md_i \le Md_{\alpha}) = 1 - \alpha;\quad i = 1, 2, \dots, n. \qquad (9)$$

This line is labelled as "Warning (Individual Md)" on the Q-Q plot (or index plot).

Observations falling above the horizontal line obtained using (8) are potential outliers, observations lying between the two horizontal lines given by (8) and (9) need further examination, and points falling below the line given by (9) represent the main stream of data.

For univariate populations, the simultaneous confidence interval can be obtained by substituting p = 1 in equation (7) and is given as follows:

$$P(\bar{x} - s\sqrt{Md^{b}_{\alpha}} \le x_i \le \bar{x} + s\sqrt{Md^{b}_{\alpha}};\ i = 1, 2, \dots, n) = 1 - \alpha. \qquad (10)$$

The estimates used in statements given by equations (7) through (10)
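The three-way reading of the plot (above the "Maximum" line, between the two lines, below the "Warning" line) can be expressed as a small sketch. It assumes the two critical values have already been obtained elsewhere, since computing beta quantiles is outside its scope; the function name `classify_distances` and the numeric cutoffs in the example are hypothetical.

```python
def classify_distances(distances, warning_cutoff, maximum_cutoff):
    """Label each Md by its position relative to the two horizontal lines.

    warning_cutoff is the individual critical value; maximum_cutoff is the
    simultaneous critical value. Both are assumed to be supplied by the caller.
    """
    if warning_cutoff > maximum_cutoff:
        raise ValueError("the warning line should not lie above the maximum line")
    labels = []
    for md in distances:
        if md > maximum_cutoff:
            labels.append("potential outlier")
        elif md > warning_cutoff:
            labels.append("needs further examination")
        else:
            labels.append("main stream")
    return labels


# Hypothetical distances and cutoffs, for illustration only.
labels = classify_distances([2.0, 9.0, 15.0], warning_cutoff=8.0, maximum_cutoff=12.0)
```

The sketch treats a distance exactly equal to a cutoff as falling below that line; the text does not say how ties are handled, so that choice is an assumption.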
145. nt mean assumption can be satisfied before proceeding with ordinary kriging (OK).

14.3 Options Available For Robust Procedures

Two Options For The Initial Start Estimates

As recommended in the literature, an initial robust start in iterative robust procedures helps in unmasking multiple outliers, and also in producing reliable estimates with a higher breakdown point. Scout offers two options, given below, for the initial estimates to be used in the iterative robust procedures (HUBER, PROP, and MVT):

• Classical initial start for estimation of location and scale (e.g., simple mean vector and the covariance matrix).
• Robust initial start with the vector of medians, and the covariance matrix with the estimates of standard deviations to be the corresponding MADs/0.675, where MAD represents the median absolute deviation given in the following.

Two Options For The Distribution of The Mahalanobis Distances

As mentioned earlier, most of the robust procedures such as MVT, MVE, and HUBER use the Mds. Under normality, the Mds are known to follow a scaled beta distribution. However, for computational ease, a chi-square or a normal approximation is typically used for the distribution of the individual Mds and their corresponding cut-off points, which may not lead to correct identification of outliers, especially for large dimensional sets of small to moderate sizes. Today, using the fast personal
146. ocedures

14.1 Introduction to Statistical Procedures for the Identification of Multiple Outliers
14.2 General Description of Statistical Procedures in the Scout Software Package
14.3 Options Available For Robust Procedures
14.4 Robust Procedures in Scout
14.5 Normal Probability Q-Q Plots of the Original Data, and of Principal Components
14.6 Q-Q Plot of Mahalanobis Distances Using Beta Distribution
14.7 Contour Plots
14.8 Robust Principal Component Analysis
14.9 Interval Estimation
14.10 D-Trend and Add Means
14.11 Outliers in Discriminant and Classification Analysis

REFERENCES

Chapter 1 Preliminaries

1.1 Introduction

Scout is a univariate and multivariate data analysis tool. Several classical and robust procedures such as outlier testing and interactive 2D/3D graphics are included in Scout, making it a useful package for environmental and ecological applications. Straightforward principal component, classification, and discriminant analyses are included to increase the versatility of the software package.

Scout may be used to:

(1) transform data;
(2) assess the normality of variables in the data set;
(3) produce histograms and Q-Q plots of raw data and principal component (PC) scores;
(4) produce scatter plots of raw data, of PCs, and of discriminant s
147. ociated with the 3-Dimensional graphics, consult the user's guide for further instruction, or simply work with the software, remembering to use <F1> for help when needed.

[Screen capture: three-dimensional plot of the iris data.]

Figure 13.5. One of many possible perspectives of the three-dimensional graph from Figure 13.4.

13.2 System

The System menu has six options, as shown in Figure 13.6. The User's Guide heading leads to a menu of various topics, similar to those covered in this document. To access information on any aspect of Scout, move the cursor to highlight the appropriate section of the User's Guide and press <ENTER>. The menu of various sections is also shown in Figure 13.6.

[Screen capture: the Scout main menu bar with the System menu open, listing User's Guide, Information, Printer Setup, and DOS Shell.]
148. or is yellow and the default shape is an "x."

To select a new color and shape, press the <F2> key. The current color will now be highlighted. Use the <UP> or <DOWN> arrow keys to highlight the desired color and then press <ENTER> or the <RIGHT ARROW> key. Now the current shape will be highlighted. Again, use the <UP> or <DOWN> arrow keys to highlight the desired shape and press <ENTER> to complete the selection.

To change the graph symbol (color and shape) of an observation, first use the <F2> key to change the color and shape, then use the <UP> or <DOWN> arrow keys to highlight the observation that is to be changed, and then press the <ENTER> key. The graph symbol corresponding to the highlighted observation then changes to the selected graph symbol shown in the right window. The highlighter is then moved automatically to the next observation. This makes it very easy to change a continuous block of observations by holding down the <ENTER> key.

The user can exit this screen at any time by pressing the <ESC> key. All of the changes made are retained in memory. Sometime before exiting the program, the user should save the data in memory as a Scout file so the changes become permanent; otherwise they will be lost.

7.3 Command Summary for 2D and 3D Graphics

Scout recognizes the following field commands when either 2- or 3-dimensional plots are displayed:

<F> Outputs 
149. or the 4-METHYL.DAT data.

Using the same data set, construct the prediction interval for future observations. Select "Display Graphs For," press <ENTER>, choose "Prediction Intervals," press <ENTER>, and then set the rest of the "Robust Analysis" menu to match Figure 11.23. To generate the graph, choose "Generate Graph With Current Options" from the "Robust Analysis" menu and press <ENTER>. The first output will display statistics and the prediction interval (see Figure 11.24). Press <Q> to reveal the graph (Figure 11.25).

[Screen capture: the Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D-Trend) with the "Robust Analysis" submenu showing "Display Graphs For" set to "Prediction Intervals."]
150. phs For" in the "Robust Analysis" menu. Changing our file back to IRIS.DAT, selecting "Scatter Plot (PCA)" as described above, and revising "Numbering" back to "Observations," we select "Generate Graph With Current Options" and press <ENTER>. We now exercise two graphic options: (1) press <N>, and the identities (data labels) of the data points are displayed, and (2) press <E>, and the contour ellipse is drawn around the data (both the individual and simultaneous ellipses, if this option was not changed since our last graph). With the exception of the title, your display should now match Figure 11.10. The title can be supplied by highlighting "Title" in the "Robust Analysis" menu, pressing <ENTER>, typing in your title, pressing <ENTER> again, and then generating the graph.

[Plot: "Scatter Plot of First Two PCs."]

Figure 11.10. The scatter plot of principal components 1 and 2 for the Setosa data.

To draw the PCA scatter plots for data sets with multiple populations, "Pattern Recognition" is recommended. Change the data file to FULLIRIS.DAT and move from "Robust Analysis" to "Pattern Recognition" in the "Robust Method" menu. Press <ENTER>, view the menu, and change any choices for the various headings to those shown in Figure 11.11.
151. [Screen capture: the System menu with the help window for "User's Guide": "Allows the user to view the entire Scout manual. A menu of major topics is provided so the user can quickly find information about any topic."]

Figure 8.1. The six options of the System menu.

Information. This choice displays the Scout version and hardware configuration, including the processor, coprocessor, graphics adapter, and the amount of RAM found and used on the system.

Help Messages. The user can disable or enable the help windows that correspond to the menu items. Unless the user is very familiar with Scout, disabling the help windows is not recommended.

Printer Setup. The printer in use must be specified in order for Scout to print graphs. This heading allows the user to select the make and model of printer for graphs. The user can also set information, printer specific
152. r the detection of multiple outliers. Journal of American Statistical Association, 89, 1329-1339.

Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, Third Ed. John Wiley, UK.

Campbell, N.A. (1980). Robust procedures in multivariate analysis I: robust covariance estimation. Applied Statistics, 29(3), 231-237.

Cook, R.D., and Hawkins, D.M. (1990). Comment on "Unmasking multivariate outliers and leverage points," by P.J. Rousseeuw and B.C. van Zomeren. Journal of American Statistical Association, 85, 640-644.

Daniel, C., and Wood, F.S. (1980). Fitting Equations to Data. John Wiley, New York.

Devlin, S.J., Gnanadesikan, R., and Kettenring, J.R. (1981). Robust estimation of dispersion matrices and principal components. Journal of American Statistical Association, 76, 354-362.

Dixon, W.J. (1953). Processing data for outliers. Biometrics, 9, 74-89.

Fung, W. (1993). Unmasking outliers and leverage points: a confirmation. Journal of American Statistical Association, 88, 515-519.

Hahn, G.J., and Meeker, W.Q. (1991). Statistical Intervals. New York: John Wiley.

Hampel, F.R. (1974). The influence curve and its role in robust estimation. Journal of American Statistical Association, 69, 383-393.

Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.J. (1986). Robust Statistics: the Approaches Based on Influence Functions. New York: John Wiley.

Horn, P
153. rend option need to be added back to the kriging estimates in the .grd file. This can be achieved using the Add Means option. This option uses two input files: a statistics file with extension sts (Example.sts) and a file with extension add (Example.add). The sts file should follow the same format as the statistics file generated by Scout. A separate add file (e.g., pb.add) is required for each variable considered. The add file has the following format:

a b c
x1 x2 y1 y2 population Id1
x1 x2 y1 y2 population Id2
Repeat for each region of the site.

Here,
a = Total number of sub-populations
b = Total number of variables
c = Number of the variable in the sts file

x1, x2, y1, y2 are the coordinates of the boundary of a geographic region (a rectangle) belonging to one of the sub-populations. Thus, the region bounded by (x1, y1), (x2, y1), (x1, y2), and (x2, y2) belongs to the population with the corresponding ID.

Example: The example add file for lead (Pb) is Pb.add. There are two populations (a = 2) and 4 variables in the data file, so b = 4. Lead is the second variable in the sts file; therefore c = 2.

2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2

So using this input file, when the add means option is activated, the mean of sub-population 1 will be added to all observations within
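The add file layout can be illustrated with a short sketch that reads the header and the rectangles and then looks up the sub-population for a given location. This is an assumption-laden illustration rather than Scout's own logic: the function names `parse_add_file` and `population_for` are hypothetical, rectangle boundaries are treated as inclusive, and the first matching rectangle wins when a point lies on a shared edge.

```python
EXAMPLE_ADD = """\
2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2
"""


def parse_add_file(text):
    """Return (a, b, c, regions); each region is (x1, x2, y1, y2, population_id)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    a, b, c = (int(v) for v in rows[0])
    regions = []
    for fields in rows[1:]:
        x1, x2, y1, y2 = (float(v) for v in fields[:4])
        regions.append((x1, x2, y1, y2, int(fields[4])))
    return a, b, c, regions


def population_for(x, y, regions):
    """Return the ID of the first rectangle containing (x, y), or None."""
    for x1, x2, y1, y2, population_id in regions:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return population_id
    return None


a, b, c, regions = parse_add_file(EXAMPLE_ADD)
inner = population_for(500, 2000, regions)   # falls in a rectangle of sub-population 2
```

Once the sub-population of a location is known, the corresponding mean from the sts file would be the value added back at that location; reading the sts file is not shown because its layout is not described in this section.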
154. s," and press <ENTER>. Return to "Edit Data," re-examine the data, and note that the previously identified outliers are now white.

10.3 Summary

• Outlier detection on any data set can be accomplished by using one of the two options in the Classical Method menu of Scout.
• Each of the two outlier detection headings has three predetermined choices for α; however, using the "Robust Method," any α between 0.001 and 0.8 can be selected in the Generalized Distance test.
• In addition to outlier detection, Scout can be used to identify the variable that caused the outlier.
• The outlier flags can be removed by using the "Remove Outlier Flags" option.

Chapter 11 Tutorial III

Robust Method

The following tutorial is on robust analysis. Classical and robust techniques will be applied on some well-known data sets such as: IRIS.DAT (Fisher's (Anderson, 1984) iris data on the Setosa species of iris), FULLIRIS.DAT (data on two other species of iris, in addition to the Setosa), 4-METHYL.DAT (data on the recovery of 4-methyl phenol from 1993 performance evaluation samples), and STACKLSS.DAT (Brownlee's Stack Loss data set (Daniel and Wood, 1980)). These data files can be found using the C:\Scout\Data\*.DAT path.

11.1 Q-Q Plots

Select the file IRIS.DAT using "Read ASCII File" as described in Tutorial I. Use "Select Variables" from the "Robust Method" menu; choose only one variable (e.g., sp length) by using the
155. s information can be printed either to a specified file or directly to the printer by pressing the <P> key.

3.4.5 Histogram Window

Histograms may be displayed by pressing the <H> key. This key functions as a toggle; that is, the histogram window will be active until the <H> key is pressed again. As you scroll through the variables in the statistics window, you will notice that the histogram is updated to correspond to the current (highlighted) variable. The two numbers near the bottom of the histogram window are the minimum and maximum values for the current variable. The scale for the histogram adjusts automatically as variables and transforms are selected.

3.4.4 Transformation Menu

There are five transforms you may use. First highlight the variable to be transformed, and then press the <ENTER> key to bring up the transformation menu. The menu contains five transform functions and an "undo" option. Each of these is explained separately in the following paragraphs.

3.4.4.1 Linear

This transform allows you to change the location and scale of a variable. The program will prompt you to enter two constants, "a" and "b", to be used as follows: X     X   a    b, where "b" cannot be equal to zero. Once you have entered the constants, the transform will be applied to a copy of the data. The histogram and statistics windows will be updated according to the results of the transform. A new window in the center of the screen displays
156. s t or a normal distribution is typically used to obtain the critical values used in (3), which can result in significantly different interval estimates.

(d) $(1-\alpha)100\%$ prediction interval for a future observation, $x_0$:

$$P\left(\bar{x}^{*} - t_{\alpha/2}\, s^{*}\sqrt{1 + \mathrm{wsum2}^{-1}} \le x_0 \le \bar{x}^{*} + t_{\alpha/2}\, s^{*}\sqrt{1 + \mathrm{wsum2}^{-1}}\right) = 1 - \alpha \qquad (14)$$

A real data set from a QB study of the EPA is considered in Chapter 11 to demonstrate the differences among these intervals. The user can generate the graphs of these intervals by pressing the Q (q) key; the graphs can be printed on a LaserJet printer by pressing the p key. In summary, the procedure presented here (1) identifies multiple outliers effectively, (2) uses appropriate test statistics, (3) computes the adjusted degrees of freedom (d.f.) associated with the test statistics by assigning reduced weights to the outlying observations, and (4) provides more precise and accurate estimates of the underlying population parameters and the associated intervals.

14.10 D Trend and Add Means

These two options, D Trend and Add Means, are useful for performing geostatistical analysis. Some knowledge of geostatistical analysis, such as kriging and variogram modelling, is required. Users not interested in this may prefer to skip this section. These options require knowledge of the geographic location (e.g., Easting, Northing coordinates) for each of the sample observations. Ordinary kriging (OK) is a well establis
157. scale estimates obtained using the remaining 74 inlying observations. Both graphs are very similar, confirming the existence of the above-mentioned 8 outliers. This can be easily performed by creating an extra first column representing the population IDs, with the 74 inlying observations as coming from population 1 (say) and the 8 outliers identified as coming from population 2. The extra column (variable) can be inserted using the "Edit Data" option of Scout. The user can then use the "Ignore Population = 2" option with the "Plot Ignored Population = Yes" setting to produce graphs 5 and 6. The PROP estimates (and also the MDs, which are not included here) with or without the outliers are in close agreement with the MLEs without the outliers. The minor differences between the robust and classical results without the 8 outliers are due to the fact that borderline observations 66 and 82 are assigned reduced weights in the PROP procedure. The associated statistics are summarized as follows.

Robust Statistics: All Observations

            Covariance Matrix                          Mean Vector
            x1      x2      x3      x4      Octn
    x1      4435    0.82    7.27    0.24    3.95       62.650
    x2      0.82    1.24    0.91    0.06    0.25       1.298
    x3      1.21    0.91    12.89   0.35    0.63       56.820
    x4      0.24    0.06    0.35    0.03    0.06       1.591
    Octn    3.95    0.25    0.63    0.06    0.79       91.569

Classical Statistics After Deletion of 8 Outliers

            Covariance Matrix                          Mean Vector
            x1      x2      x3      x4      Octn
    x1      4424    0.78    7.37    0.17    4.02       62
158. se must be specified before Scout can print any graphs. See System Printer Specifications to select the make and model of the printer and other graphics specifications. Scout can only print graphs that are displayed on the monitor. Press the <P> key to print the graph that is on the screen. A line will move across the screen as Scout "reads" the graph and sends it to the printer.

7.4 2-Dimensional Graphs

The second heading in the graphics pull-down menu, "2 Dimensional", is the 2-dimensional graphics system. If any observations have been flagged as outliers, Scout will ask the user if those outliers are to be used in statistical calculations. Scout will then place the computer in graphics mode and display a color-coded correlation matrix of the data. Each point in this matrix represents the correlation of two variables. The names of these two variables are printed near the top of the screen, along with some summary statistics on each of the two variables. The correlation values are printed on the right side of the screen. The color-coding scheme works as follows: white indicates a correlation coefficient greater than 0.75; green indicates a correlation coefficient greater than 0.5 and less than 0.75; all other correlation coefficients (less than 0.5) are red.

The upper left point of this matrix will be highlighted with a purple box. The user can move through the matrix with the arrow keys and quickly get an idea of how any two variables are rel
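The color-coding rule for the correlation matrix can be written as a small function. The following is only a sketch of the rule as stated in the text, not Scout's own code: the name correlation_color is hypothetical, and the handling of values exactly equal to 0.5 or 0.75, and of negative coefficients (compared here by their signed value), is an assumption because the text does not specify it.

```python
def correlation_color(r):
    """Return the display color for a correlation coefficient r."""
    # Thresholds follow the text: greater than 0.75 is white, between 0.5
    # and 0.75 is green, everything else is red. Treating exactly 0.75 as
    # green and exactly 0.5 as red is an assumption of this sketch.
    if r > 0.75:
        return "white"
    if r > 0.5:
        return "green"
    return "red"


if __name__ == "__main__":
    for r in (0.9, 0.6, 0.2, -0.8):
        print(r, correlation_color(r))
```

If the colors are meant to reflect the strength of a relationship regardless of sign, the function could be called with abs(r) instead; the manual text does not say which is intended.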
159. sing the Add Means heading. This option uses two input files: a statistics file with extension sts (Example.sts) and a file with extension add (Example.add). The sts file should follow the same format as the statistics file generated by Scout. A separate add file (e.g., pb.add) is required for each variable considered. The add file has the following format:

    a b c
    x1 x2 y1 y2 population Id1
    x1 x2 y1 y2 population Id2
    (repeat for each region of the site)

Here, a = total number of sub-populations, b = total number of variables, and c = number of the variable in the sts file. x1, x2, y1, y2 are the coordinates of the boundary of a geographic region (a rectangle) belonging to one of the sub-populations. Thus, the region bounded by (x1, y1), (x2, y1), (x1, y2), and (x2, y2) belongs to the population with the corresponding ID.

Example: The example add file for lead (Pb) is Pb.add. There are two populations (a = 2) and 4 variables in the data file (b = 4). Lead is the second variable in the sts file; therefore c = 2.

    2 4 2
    0 200 0 3500 1
    200 3000 0 1220 1
    1100 3000 1220 1700 1
    1850 3000 1700 3500 1
    200 1850 2780 3500 1
    200 1100 1220 2780 2
    1100 1850 1700 2780 2

So, using this input file, when the Add Means heading is activated, the mean of sub-population 1 will be added to all observations within the region bounded by (1100, 1220), (1100, 1700), (3000, 1220), and (3000, 1700). This will
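To make the add file layout concrete, here is a minimal sketch that parses the Pb.add example and looks up the sub-population for a coordinate. It is not Scout's implementation. The function names parse_add_file and population_at are hypothetical, and the sketch assumes whitespace-separated values with one region per line, region boundaries that are inclusive, and the first matching rectangle winning where edges are shared. The population IDs in the last column are taken from the Chapter 14 copy of the same example, where they are legible.

```python
def parse_add_file(text):
    """Parse add-file text into (a, b, c, regions)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    a, b, c = (int(v) for v in rows[0])
    regions = []
    for x1, x2, y1, y2, pop_id in rows[1:]:
        regions.append((float(x1), float(x2), float(y1), float(y2), int(pop_id)))
    return a, b, c, regions


def population_at(regions, x, y):
    """Return the ID of the first rectangle containing (x, y), else None."""
    for x1, x2, y1, y2, pop_id in regions:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return pop_id
    return None


# Contents of the Pb.add example from the text.
PB_ADD = """\
2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2
"""

if __name__ == "__main__":
    a, b, c, regions = parse_add_file(PB_ADD)
    print(a, b, c, len(regions))
    print(population_at(regions, 2000, 1500))  # inside 1100..3000 by 1220..1700
    print(population_at(regions, 500, 2000))   # inside 200..1100 by 1220..2780
```

With the sub-population known for each observation, adding back the corresponding mean from the sts file is a per-observation lookup and addition.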
160. ta

Chapter 5 Robust Statistical Methods

5.1 Introduction to Robust Statistical Methods

Outliers are inevitable in most applied and scientific disciplines. In a manufacturing process, outliers (anomalies, extremes, maverick observations) typically represent some mechanical disorder of the system, unexpected experimental conditions and results, raw material of an inferior quality, or misrecorded values. In biological dose-response applications, outlying observations may indicate an entirely different type of reaction (an unusual response) to a newly developed drug. In this case, "outliers" may be more informative than the rest of the data. In environmental and ecological applications, outliers could be indicative of highly contaminated areas, sections of a forest in poor or degraded states, inconsistent analytical results in a typical quality assurance and quality control (QA/QC) program, or gross typing errors.

Experimentalists, especially environmental scientists, generate and analyze large amounts of data. Most of these practitioners, therefore, are familiar with situations in which some of their experimental results look suspicious or significantly different from the rest of the data. In data sets of large dimensionality, it becomes tedious to identify these anomalies. Appropriate multivariate procedures need to be used to identify multivariate anomalies. Several univariate and multivariate procedures are incorporated in the
161. test statistic, Max (MDs), can be easily incorporated in a software package. A sequential outlier detection procedure based on the test statistic, Max (MDs), and multivariate kurtosis have been included in the classical method menu in Scout. The robust module of Scout computes these critical values and uses them on the Q-Q and index plots of the generalized distances (MDs) to formally define and identify outliers.

Most outlier identification statistics, including the Max (MDs), multivariate kurtosis, and the minimum volume ellipsoid (MVE), are functions of the MDs, which depend upon the estimates of population location and scale. The presence of outliers usually results in distorted and unreliable maximum likelihood estimates (MLEs) and ordinary least squares (OLS) estimates of the population parameters. The classical MLEs of mean and variance have a "zero" breakdown point. The breakdown point of an estimator is the smallest fraction of observations that must be replaced to distort the estimator without bound (Hampel, 1974). A "zero" breakdown point means that the presence of even a single outlier can completely distort the statistic under consideration. Thus, all other related statistics, including interval estimates, principal components (PCs), and the estimates of regression parameters, get distorted by outliers. This means that the test statistics and inference based on these classical estimates may be misleading. For example
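The effect of a "zero" breakdown point is easy to see numerically. The sketch below replaces a single value in a small, made-up sample with a gross error and compares the sample mean with the sample median. The data are hypothetical, and the median is used only as a familiar estimator with a high breakdown point; it is not one of the robust procedures (Huber, PROP, MVT) implemented in Scout.

```python
from statistics import mean, median

# Hypothetical measurements clustered near 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same sample with one observation replaced by a gross error.
contaminated = clean[:-1] + [1000.0]

if __name__ == "__main__":
    print("clean:        mean =", mean(clean), " median =", median(clean))
    print("contaminated: mean =", mean(contaminated), " median =", median(contaminated))
```

The mean follows the single bad value as far as that value is moved, while the median of the contaminated sample stays at the center of the remaining observations.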
162. the <ENTER> key when the cursor is on the "Graph title" option. When satisfied with all heading choices, use the down <ARROW> key to move the cursor to the last selection, "Begin computations with selected options". Use the <ENTER> key to generate the data pattern.

The first computation in this module will be the eigenvalues and eigenvectors; use the <ESC> key once to generate the Confusion (error) Matrix. Use the <ESC> key once more to generate the scatterplots of discriminant scores. Various discriminant scores will be plotted when the <PAGE UP> or <PAGE DOWN> key is used. Use the <E> key to generate the ellipse corresponding to the various score clusters. If the Populations choice is used for the numbering heading, the graphs generated will use different colors for different populations.

5.7 D Trend

The following two headings, D Trend and Add Means, are useful for performing geostatistical analysis. Some knowledge of geostatistical analysis, such as kriging and variogram modelling, is required. Users not interested in this may prefer to skip this section. These headings require knowledge of the geographic location (e.g., Easting, Northing coordinates) for each of the sample observations. Ordinary kriging (OK) is a well-established geostatistical technique frequently used in site characterization studies. However, OK assumes that there is no spatial trend present, and
163. the first few high-variance principal components (PCs) represent most of the variation in the data, the last few low-variance PCs provide useful information about the noise that might be present in the experimental results. Graphical displays of the first few PCs are routinely used as unsupervised pattern recognition and classification techniques. The various contour ellipses can be drawn on the scatter plots of the PCs. An elliptical scatter of these PCs suggests normality of the data set. The normal probability Q-Q plots and the scatter plots of PCs are also used for the detection of multivariate outliers. However, since the MLE of the dispersion matrix gets distorted by outliers, the resulting classical PCs may also be misleading. The robust PCs give more precise estimates of the variation and noise in the data by assigning reduced weights to the outlying observations.

Outliers and Principal Component Analysis

Let $P = (p_1, p_2, \ldots, p_p)$ represent the matrix of eigenvectors corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of the sample dispersion (correlation) matrix $\Sigma$ (classical or robust). The eigenvector $p_1$ corresponds to the largest eigenvalue $\lambda_1$, and the vector $p_p$ corresponds to the smallest eigenvalue $\lambda_p$ of $\Sigma$. The equation $y = P'x$ represents the p principal components, with $y_i = p_i'x$ representing the i-th PC. The normal Q-Q plots for the PCs can be o
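As a rough illustration of how the eigenvectors of the dispersion matrix define the principal components, the sketch below computes classical PCs for two variables using only the Python standard library. It is not Scout's code. The function name pca_2d is hypothetical, the closed-form solution applies only to the two-variable case, and centering the data before projection is an assumption of this sketch. A robust version would start from a robust dispersion matrix instead of the classical one.

```python
import math


def pca_2d(data):
    """Return (eigenvalues, eigenvectors, scores) for two-variable data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Classical sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    c = sum((y - my) ** 2 for _, y in data) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            # (b, lam - a) solves (S - lam*I) v = 0 when b is nonzero.
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            eigenvectors.append((vx / norm, vy / norm))
    # Score i of an observation is p_i' (x - mean).
    scores = [
        tuple(px * (x - mx) + py * (y - my) for px, py in eigenvectors)
        for x, y in data
    ]
    return eigenvalues, eigenvectors, scores


if __name__ == "__main__":
    sample = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
    values, vectors, scores = pca_2d(sample)
    print("eigenvalues:", values)
    print("eigenvectors:", vectors)
```

For nearly collinear data such as the made-up sample above, almost all of the variation falls on the first PC, and the second PC carries what the text describes as noise.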
164. the scatter plot to a PCX file.
<H> Hides (i.e., does not display) observations that were identified as outliers (toggle).
<N> Replaces the symbol for each observation with the observation number (toggle).
<P> Prints the scatter plot on a printer.

Outputting a graph to a PCX file: Both 2-dimensional and 3-dimensional graphics screens may be written to a file on disk. When the desired graphics image is displayed, pressing the <F> key will prompt the user for a file name. Type in a file name, including the drive and directory but without an extension (".PCX" will always be used), and press the <ENTER> key. The graphics screen will be written to the file in PCX format, which many other software packages can read.

Hiding Outliers in Scatterplots: If you wish to view a scatterplot in which the outlier observations are not displayed, press the <H> key. Press the <H> key again and the outliers will be displayed as before. CAUTION: Hiding outliers from a scatter plot does not change the statistical properties of the variables.

Replacing Symbols with Observation Numbers: Sometimes it is useful to see where individual observations (or groups of observations) are located on a scatter plot. Press the <N> key and the symbols for the observations of the scatterplot will be replaced by the observation numbers. Press the <N> key again to return to symbols.

Printing a graph: The printer in u
165. the user to change the name, units, format, and any comments about the variables in the data set. Press the <ALT> and <V> keys together. A small screen will appear, showing the name, units, format, and comment for the first variable in the data set. Find the variable that you wish to edit by using the <ARROW> keys or the <PAGE DOWN> key. Pressing the <F1> key at this point will reveal a screen that shows field edit commands that make editing easier (e.g., delete to end of line). Type in the changes you wish to make. Press <ESC> to exit.

Editing the Title of the Data Set: To change the title, press the <ALT> and <T> keys together. Type in the title of the data set. Press <ENTER> to exit.

3.3 Summary Statistics

Scout will display summary statistics (such as mean, standard deviation, and variance) for each variable when "Statistics" is chosen from the pull-down menu. The "Num" field displays the number of valid observations that were used in the calculations for each of the variables. The "Miss" field displays the number of missing observations for each of the variables. The statistics can be printed by pressing <P> while the information is still on the screen.

3.4 Data Transformation

The transform module in Scout allows each of the variables in memory to be tested for normality using the Kolmogorov-Smirnov and Anderson-Darling tests. If th
166. to accomplish this. Press the <Z> key to see a graph of the X variable versus the Y variable. What Scout has really done is rotate the graph so that the Z axis is pointing straight out of the screen. Similarly, press the <Y> key to view the X variable versus the Z variable, and the <X> key to view Z versus Y.

7.11 Response Surfaces

Scout can display three-dimensional surface plots. The raw data must be in a regular grid format: the data set must be defined over a complete set of evenly spaced values in the X and Y variables. If a data set is not on a regular grid, the user may wish to modify the data set using other software so that a regular grid is achieved. The number of points on the grid must be less than 1000, which is approximately a 30x30 grid.

To generate a surface plot from a regular grid data set, select the X and Y axes so that these define the grid, and select the Z axis as the response variable. Press <ENTER> to display the three-dimensional scatter plot, then press the <R> key to draw the response surface. The <R> key functions as a toggle between the scatter plot and the response surface.

Chapter 8 System Information

8.1 User's Guide

This option enables the user to view the entire Scout Manual. A menu of major headings is provided so that the user can quickly find information about any topic in Scout. The user can access the User's Guide for the heading that
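Before preparing a data set for a response surface, it can help to confirm that the X and Y values really form a regular grid. The sketch below checks the three conditions stated above: every X-Y combination is present exactly once, the values are evenly spaced, and the grid has fewer than 1000 points. It is an illustrative check only, not part of Scout. The function names and the numerical tolerance are assumptions.

```python
import math


def evenly_spaced(values, tol=1e-9):
    """True if sorted distinct values have a constant step (within tol)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(math.isclose(s, steps[0], rel_tol=tol, abs_tol=tol) for s in steps)


def is_regular_grid(points, max_points=1000):
    """Check the grid conditions described above for a list of (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    # Complete grid: no duplicates, and every x is paired with every y.
    complete = len(points) == len(set(points)) == len(xs) * len(ys)
    return (
        complete
        and len(points) < max_points
        and evenly_spaced(xs)
        and evenly_spaced(ys)
    )


if __name__ == "__main__":
    grid = [(x, y) for x in range(30) for y in range(30)]
    print(is_regular_grid(grid))       # 30x30 complete grid
    print(is_regular_grid(grid[:-1]))  # one grid point missing
```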
167. to your original data. If for some reason you want to abort this transform process and retain your original data, answer the question with the <N> key. You should now be back in Scout's main menu. If you have modified the variables in memory, you may wish to save them to a new file on disk before you go on with your analysis.

CAUTION: Once you exit the transform module, your transform history is not retained. It is advised that you log all changes for future reference. If you start the transform module again, it is a new session and all transform lists are blank.

3.5 Print Data

This heading is used to print the data set currently in memory. Scout will ask the user if the output is to be condensed. If the user answers no, Scout will format the output with up to six variables across each page; the printer should be set to 80 columns. If the user answers yes to condensed printing, Scout will format the output with up to ten variables across each page; the printer should be set to 132 columns for this to work correctly.

Chapter 4 Classical Methods for Outlier Identification

4.1 Introduction to the Classical Methods for Outlier Identification

This chapter discusses the various procedures available within the "Classical Method" menu. These procedures are used for outlier identification. Once a data file has been converted into Scout format, Scout may be used to test for discordant observations in the
168. ts are built-in features of most popular spreadsheet software.

The following spreadsheet software has been tested for the ability to produce a usable Scout file:

    Software                      Result          File Format
    QuattroPro 6.0 for Windows    Works           Text file
    Excel 4.0a for Windows        Works           Any of 3 text file formats
    QuattroPro 1.0 for Windows    Doesn't Work    No text or space-delimited format available

If the file is saved as a Space Delimited print file, use the extension .prn. If the spreadsheet software does not have a built-in Space Delimited format, then save the file with the extension .prn along with the following options:

    1. NO MARGIN
    2. PAGE LENGTH ONE
    3. UNFORMATTED

After the file is saved from any spreadsheet, exit the spreadsheet software and copy the file into the Scout directory with the extension .dat. This newly created file in the Scout directory can be used as a Scout file.

2.3 Load Scout File

Upon start-up of Scout, the user is placed in the "File" heading of the main menu. The first thing the user should do is select either "Load Scout Data File" or "Read ASCII Data File" from this pull-down menu. Both headings display a menu of possible data files from the current directory and any subdirectories in the current directory. The user can change the current directory by highlighting the desired subdirectory and pressing the <ENTER> key. All subdirectories are identified by placing the V sym
169. tterplots, and contour ellipses for classical/robust PCA can also be produced using the "Robust Method" menu as discussed in Chapter 5.

Using PCA, the user can look at the correlation/covariance matrix directly on the screen. The PCA menu has five headings, as displayed in Figure 6.1.

[Figure 6.1: Scout menu bar (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the PCA pull-down menu open.]
170. umerical options are called for, highlight the appropriate field and type in the correct value. When satisfied, move to the bottom of this window, select "Accept New Settings", and press <ENTER>.

[Figure: Robust Method pull-down menu with the statistics options window open. Legible fields include X-Y Coordinates Scale Factor, Right Tail Cutoff, Tuning Constant, Control Chart Limits, Trimming Percent, Ignore Population, and Plot Ignored Population.]
171. uspends Scout and runs a secondary copy of COMMAND.COM. The user may then execute DOS commands or type EXIT to return to Scout.

Exiting Scout

The user can exit Scout and return to DOS by selecting "Yes" with this option.

WARNING: Make sure that all of the desired graphs, data, and changes to files have been saved before selecting this option. Unlike some software packages, Scout does not prompt the user on whether the current file is to be saved. Scout will not automatically save data sets, graphs, or changes made to a file with this option. See the appropriate sections of this User's Guide for instructions on saving graphics and data in Scout.

Chapter 9 Tutorial I: Scout Basics

9.1 Nomenclature

[Figure: File pull-down menu with the choices Read ASCII File, Write ASCII File, Load Scout File, Save Scout File, Merge Two Files, and Append Two Files.]
172. variate kurtosis (Mardia, 1970, 1974, and Schwager and Margolin, 1982), and the generalized distance (Wilks, 1963, and Barnett and Lewis, 1994), both of which have desirable properties as outlier tests. The maximum generalized distance is a multivariate extension of a univariate test known as Grubbs' test (Grubbs, 1950). This test is meant to identify a single outlier. It suffers from masking in the presence of multiple outliers. Sequential application of this test is incorporated in Scout.

Mardia's multivariate kurtosis is an extension of the univariate kurtosis. This test is more powerful than the generalized distance when multiple outliers are present (Schwager and Margolin, 1982). Mardia's multivariate kurtosis can also be used to test for deviations from multivariate normality. However, this statistic is also not resistant to outliers and, as such, may suffer from masking by multiple outliers. The critical values used for the test statistic are the simulated values as given in Stapanian et al. (1991).

This module of Scout is based on sequential application of these tests. This means that outliers are detected sequentially: they are identified in the initial data set and removed from the data, the statistics are recomputed, and the identification, removal, and recomputing are repeated until no more outliers are found. Both tests assume the data are independent observations from a single multivariate normal distribution. If a large proportion of the data are i
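The sequential scheme described above (identify, remove, recompute, repeat) can be sketched for the univariate case, where the squared generalized distance of an observation reduces to its squared standardized deviation from the mean. This is a simplified illustration, not the Scout module. The function name sequential_outliers is hypothetical, and critical_value is a caller-supplied threshold held fixed across passes; Scout uses simulated critical values (Stapanian et al., 1991), which are not reproduced here.

```python
from statistics import mean, stdev


def sequential_outliers(values, critical_value, min_size=3):
    """Flag outliers one at a time using the maximum squared distance."""
    remaining = list(enumerate(values))
    flagged = []
    while len(remaining) > min_size:
        xs = [v for _, v in remaining]
        m, s = mean(xs), stdev(xs)
        if s == 0:
            break
        # Squared generalized distance of each observation (univariate case).
        distances = [((v - m) / s) ** 2 for v in xs]
        worst = max(range(len(xs)), key=distances.__getitem__)
        if distances[worst] <= critical_value:
            break  # no further outliers at this threshold
        # Remove the flagged observation; statistics are recomputed next pass.
        flagged.append(remaining.pop(worst)[0])
    return flagged


if __name__ == "__main__":
    # Hypothetical data with one gross value at index 8.
    data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 25.0, 10.0]
    print(sequential_outliers(data, critical_value=6.0))
```

Because the mean and standard deviation are recomputed after each removal, an outlier that was masked by a larger one in an earlier pass can still be flagged in a later pass.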
173. will return. It may appear as if nothing has changed; however, in the lower right corner of the screen is the name of the file selected, and in the lower left corner is the path taken to get to this file. Scout has read the file and is now ready to analyze it. If you experiment with other files in other directories, remember that the ASCII files accompanying Scout end with the ".DAT" extension, and their format matches that defined in Chapter 2. Your own files may have any three-character extension.

9.3 Examine and Save Statistics

Assuming the file IRIS.DAT has been read, use the arrow keys to move to the "Data" heading. If you are in a level 2 menu or deeper, you may have to use the <ESC> key to get back to the level 1 menu before the left and right arrow keys will function. Pressing the <ENTER> key will give you the level 2 menu for the "Data" heading. Move the highlighted cell (cursor) to the "Statistics" choice and press the <ENTER> key. Your screen should now match Figure 9.2.

[Figure 9.2: Data pull-down menu; the legible choices include Edit Data, Statistics, and Transform.]
174. with each of these options. An explanation window associated with each of the options provides a brief description of that heading or choice.

This "Robust Method" module is independent of (cannot communicate with) the "Classical Method", "PCA", and "Graphics" headings in Scout. It can communicate with the "File", "Data", and "System" headings. For example, the robust principal components cannot be displayed using a 3-D graph without first saving them in a data file and then reading in the saved data file to plot the 3-D graph of the saved principal components.

[Figure: Robust Method pull-down menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D Trend) with the Robust Analysis window open.]
175. x and y axes, and a scatter plot containing only the observations of interest will appear. Press the <Z> key and Scout will return to the original scatter plot, with the white rectangle still surrounding the observations of interest. Pressing the <ENTER> key from the "zoomed" scatter plot will cause Scout to return to the color-coded correlation matrix.

CAUTION: You cannot use the zoom feature on a scatterplot generated by the zoom feature. If you wish to inspect an area of a "zoomed" scatter plot in detail, you must first redefine the white rectangle. To redefine the dimensions and location of the rectangle, return to the original scatter plot and press the     and <ARROW> keys until the rectangle is at the desired size and location.

If you wish to exit the zoom mode and thus eliminate the white rectangle from the original scatter plot, press <ESC>. If you press the <Z> key again, Scout will restore the rectangle as it was just prior to exiting the zoom mode.

To return to the color-coded correlation matrix from the original scatter plot, exit the zoom mode and press <ESC>.

7.6 3-Dimensional Graphs

The last heading in the Graphics menu, "3 Dimensional", is the 3-dimensional graphics system. The user first selects a variable for each of the three axes. All of the variables will be displayed on the screen with the first variable highlighted. The user may use the <ARROW> keys, <HOME> key, and <END> key to h
    