        introduction to the R Project for Statistical Computing
         Contents
1. [Figure 7: Plotting symbols]

Note the use of the as.numeric method to coerce the Species, which is a factor, to the corresponding level number (here, 1, 2 and 3), and then the use of this number to index a list of printing characters (pch argument) and colours (col argument).

The plot generic method is an example of a high-level plotting method, which begins a new graph. Once the coördinate system is set up by plot, several mid-level plotting methods are available to add elements to the graph, such as lines, points, and text. Table 1 lists the principal methods; see the help for each one for more details.

    abline     Add a Straight Line
    arrows     Add Arrows
    axis       Add an Axis
    box        Draw a framing Box
    grid       Add a Grid
    legend     Add Legends
    lines      Add Connected Line Segments
    mtext      Write Text into the Margins
    points     Add Points
    polygon    Draw Polygons
    rect       Draw Rectangles
    rug        Add a Rug
    segments   Add Line Segments
    symbols    Draw Symbols
    text       Add Text
    title      Plot Annotation

Table 1: Methods for adding to an existing base graphics plot

For example, to add horizontal and vertical lines at the mean and median centroids, use the abline method:

    > abline(v=mean(Petal.Length), lty=2, col="red")
    > abline(h=mean(Petal.Width), lty=2, col="red")
    > abline(v=median(Petal.Length),
2. [Figure 6: Default scatterplot; x-axis: Petal.Length]

This default plot is shown in Figure 6. It is informative but not very attractive. We can customize the plotting symbol (pch argument), its colour and size (col and cex arguments), the axis labels (xlab and ylab arguments), and the graph title (main argument).

Plotting symbols can be specified either by a single character (e.g., "*" for an asterisk) or by an integer code for one of a set of graphics symbols. Figure 7 shows the symbols and their codes. Note that symbols 21 to 25 also have a fill (background) colour, specified by the bg argument; for these symbols the col argument specifies the border colour.

See §5.5 for details on how to specify colours.

We now produce a customized plot, showing the species:

    > plot(Petal.Length, Petal.Width, pch=(21:23)[as.numeric(Species)], cex=1.2,
    +   xlab="Petal length (cm)", ylab="Petal width (cm)",
    +   main="Anderson Iris data",
    +   col=c("slateblue","firebrick","darkolivegreen")[as.numeric(Species)])

It's clear that the species are of different sizes (Setosa smallest, Versicolor in the middle, Virginica largest) but that the ratio of petal length to width is about the same in all three.
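The factor-indexing idiom used in the plot call above can be examined on its own, without drawing anything. The following is a minimal sketch using the built-in iris data; the variable names lvl, sym and clr are introduced here only for illustration.

```r
# Sketch: turn a factor into per-observation plotting symbols and colours.
data(iris)

# Level numbers of the factor: 1 = setosa, 2 = versicolor, 3 = virginica
lvl <- as.numeric(iris$Species)

# Index a vector of symbol codes and a vector of colour names by level number;
# the result has one entry per observation
sym <- (21:23)[lvl]
clr <- c("slateblue", "firebrick", "darkolivegreen")[lvl]

# One row per species, showing which symbol and colour it received
unique(data.frame(species = iris$Species, pch = sym, col = clr))
```

The vectors sym and clr can then be passed directly as the pch and col arguments of plot.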
3.
                 A   35  14  11   1
    Mapped       B    4  11   3   0
    Class        C   12   9  38   4
                 D    2   5  12   2

This can be entered as a vector in row-major order:

    > cm <- c(35,14,11,1,4,11,3,0,12,9,38,4,2,5,12,2)
    > cm
     [1] 35 14 11  1  4 11  3  0 12  9 38  4  2  5 12  2
    > dim(cm)
    NULL

(Footnote 27: This matrix is also used as an example in §6.1.)

Initially, the vector has no dimensions; these may be added with the dim function:

    > dim(cm) <- c(4, 4)
    > cm
         [,1] [,2] [,3] [,4]
    [1,]   35    4   12    2
    [2,]   14   11    9    5
    [3,]   11    3   38   12
    [4,]    1    0    4    2
    > dim(cm)
    [1] 4 4
    > attributes(cm)
    $dim
    [1] 4 4
    > attr(cm, "dim")
    [1] 4 4

The attributes function shows any object's attributes; in this case the object only has one, its dimension; this can also be read with the attr or dim function.

Note that the vector was converted to a matrix in column-major order, following the usual mathematical convention that a matrix is made up of column vectors. The t (transpose) function must be used to specify row-major order:

    > cm <- t(cm)
    > cm
         [,1] [,2] [,3] [,4]
    [1,]   35   14   11    1
    [2,]    4   11    3    0
    [3,]   12    9   38    4
    [4,]    2    5   12    2

A new matrix can also be created with the matrix function, which in its simplest form fills a matrix of the specified dimensions (rows, columns) with the value of its first argument:

    > (m <- matrix(0, 5, 3))
         [,1] [,2] [,3]
    [1,]    0    0    0
    [2,]    0    0    0
    [3,]    0    0    0
    [4,]    0    0    0
    [5,]    0    0    0

This value may also be a vector:

    > (m <- ma
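The two-step route shown above (set dim, then transpose) can be compared with a single call to matrix using its byrow argument. This is a small sketch, not part of the original text; the names v, cm1 and cm2 are illustrative.

```r
# Sketch: two ways to build the same 4 x 4 matrix from row-major data.
v <- c(35, 14, 11, 1,
        4, 11,  3, 0,
       12,  9, 38, 4,
        2,  5, 12, 2)

# Route 1: add dimensions (filled column by column), then transpose
cm1 <- v
dim(cm1) <- c(4, 4)
cm1 <- t(cm1)

# Route 2: ask matrix() to fill row by row
cm2 <- matrix(v, nrow = 4, ncol = 4, byrow = TRUE)

# Both routes give the same matrix
identical(cm1, cm2)
```

The byrow=TRUE form is usually easier to read when the data were typed in row by row.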
4. Factors are included in the design matrix as contrasts which divide the observations according to the classifying factors. This is quite a technical subject, treated thoroughly in standard linear modelling texts such as those by Venables & Ripley [57], Fox [18], Christensen [4] and Draper & Smith [14]. The practical importance of contrasts is mainly the interpretation of the results that is possible with a given contrast, and secondly the computational stability.

One of R's environment options is the default contrast type for unordered and ordered factors; these can be viewed and changed with the options function. Contrasts for specific factors can be viewed and set with the contrasts function, using the contr.helmert, contr.poly, contr.sum, and contr.treatment functions to build contrast matrices.

    > options("contrasts")
    $contrasts
            unordered           ordered
    "contr.treatment"      "contr.poly"

Polynomial contrasts assume equal feature-space distance between levels of the ordered predictor; this may not be justified and so you may want to change the contrast type.

For example, the meuse soil pollution dataset includes a factor for flooding frequency; this is an unordered factor but the three levels are naturally ordered from least to most flooding. So we might want to change the data type.

    > data(meuse)
    > str(meuse$ffreq)
     Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 ...
    > contrasts(meuse$ffreq)
      2 3
    1 0 0
    2 1 0
    3 0 1
    > lm(lo
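The behaviour of the contrast functions named above can be tried on a small factor without loading the meuse data. The sketch below uses a hypothetical three-level factor f and sets the contrast options explicitly to R's documented defaults so that the result does not depend on the session.

```r
# Sketch: viewing and changing contrasts on a small three-level factor.
# Use the documented defaults explicitly (restored at the end)
old <- options(contrasts = c(unordered = "contr.treatment",
                             ordered   = "contr.poly"))

f <- factor(c("1", "2", "3", "1", "2", "3"))  # hypothetical unordered factor

ct <- contrasts(f)                 # treatment contrasts: level 1 is the baseline

fh <- f
contrasts(fh) <- contr.helmert(3)  # set Helmert contrasts for this factor only
ch <- contrasts(fh)

fo <- factor(f, ordered = TRUE)    # same levels, now declared ordered
cp <- contrasts(fo)                # polynomial contrasts: linear and quadratic

ct
ch
cp

options(old)                       # restore the previous settings
```

Converting a factor to ordered changes the default contrast from treatment to polynomial, which is the change of data type the text refers to.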
5. [Screenshot: R Site Search (http://search.r-project.org) results page for the query "logistic and ROC", listing R-help messages; total 50 documents matching the query]
6. Height + Girth, data=trees)

    > (X <- model.matrix(model))
       (Intercept) Height Girth
    1            1     70   8.3
    2            1     65   8.6
    ...
    30           1     80  18.0
    31           1     87  20.6

This matrix contains the values of the predictor variables for each observation. This provides a good check on your understanding of the model structure. The matrix can be used to directly compute the least-squares linear solution:

    $\beta = (X'X)^{-1} X'Y$

using the t (matrix transpose) and solve (matrix inversion) functions, and the %*% (matrix multiplication) operator. For example, to directly compute the regression coefficients for the model of tree volume predicted by height and girth in the trees dataset:

    > Y <- trees$Volume
    > (beta <- solve(t(X) %*% X) %*% t(X) %*% Y)
                     [,1]
    (Intercept) -57.98766
    Height        0.33925
    Girth         4.70816
    > # check this is the same result as from lm()
    > lm(trees$Volume ~ trees$Height + trees$Girth)
    Coefficients:
     (Intercept)  trees$Height   trees$Girth
         -57.988         0.339         4.708

The direct computation may be numerically unstable and is certainly slow; lm uses more sophisticated numerical functions.

4.17.1 Models with categorical predictors

The lm and glm functions are also used for models with categorical predictors and for mixed models, as well as for models using only continuous predictors. The categorical variables must be ordered or unordered S factors; this can be checked with the is.factor function or examined directly with the str function.
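The comparison between the direct solution and lm can be run end to end as follows. This is a sketch under the assumption that the model object is the lm fit of Volume on Height and Girth; it solves the normal equations as a linear system instead of forming the inverse explicitly, and adds a QR-based solution (base R functions qr and qr.coef) as one example of a more stable computation.

```r
# Sketch: least-squares coefficients three ways on the built-in trees data.
data(trees)
model <- lm(Volume ~ Height + Girth, data = trees)
X <- model.matrix(model)
Y <- trees$Volume

# Normal equations, solved as a linear system (no explicit inverse)
beta_ne <- solve(t(X) %*% X, t(X) %*% Y)

# QR decomposition of the design matrix
beta_qr <- qr.coef(qr(X), Y)

# Side-by-side comparison with the coefficients reported by lm
cbind(normal.eq = as.numeric(beta_ne),
      qr        = as.numeric(beta_qr),
      lm        = as.numeric(coef(model)))
```

For a small, well-conditioned problem like this one all three columns agree to many decimal places; the differences matter when predictors are nearly collinear.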
7. The most useful plots here are the levelplot and contourplot methods for 2D plotting of one response variable on two continuous independent variables (for example, elevation vs. two coördinates); the wireframe method for a 3D version of this; and the cloud method for a 3D scatterplot of three variables. All can be conditioned on a factor. Figure 13 shows some examples, produced by the following code:

    > pl1 <- cloud(Sepal.Length ~ Petal.Length * Petal.Width,
    +    groups=Species, data=iris, pch=20, main="Anderson Iris data, all species",
    +    screen=list(z=30, x=-60))
    > data(volcano)
    > pl2 <- wireframe(volcano,
    +    shade = TRUE, aspect = c(61/87, 0.4),
    +    light.source = c(10, 0, 10), zoom=1.1, box=F,
    +    scales=list(draw=F), xlab="", ylab="", zlab="",
    +    main="Wireframe plot, Maunga Whau Volcano, Auckland")
    > pl3 <- levelplot(volcano,
    +    col.regions=gray(0:16/16),
    +    main="Levelplot, Maunga Whau Volcano, Auckland")
    > pl4 <- contourplot(volcano, at=seq(floor(min(volcano)/10)*10,
    +    ceiling(max(volcano)/10)*10, by=10),
    +    main="Contourplot, Maunga Whau Volcano, Auckland",
    +    sub="contour interval 10 m",
    +    region=T,
    +    col.regions=terrain.colors(100))
    > print(pl1, split=c(1,1,2,2), more=T)
    > print(pl2, split=c(2,1,2,2), more=T)
    > print(pl3, split=c(1,2,2,2), more=T)
    > print(pl4, split=c(2,2,2,2), more=F)
    > rm(pl1, pl2, pl3, pl4)

[Figure 13 panels: "Anderson Iris data, all species"; "Wireframe plot, Maunga Whau Volcano, Auckland"]
8. <- ~ x + y

    > # alternate command format: coordinates(meuse) <- c("x", "y")
    > class(meuse)
    [1] "SpatialPointsDataFrame"
    attr(,"package")
    [1] "sp"
    > str(meuse)
    Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
      ..@ data       :Formal class 'AttributeList' [package "sp"] with 1 slots
      .. .. ..@ att:List of 12
      .. .. .. ..$ cadmium: num [1:155] 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
      ...
      .. .. .. ..$ dist.m : num [1:155] 50 30 150 270 380 470 240 120 240 420 ...
      ..@ coords.nrs : int [1:2] 1 2
      ..@ coords     : num [1:155, 1:2] 181072 181025 181165 181298 181307 ...
      .. ..- attr(*, "dimnames")=List of 2
      .. .. ..$ : NULL
      .. .. ..$ : chr [1:2] "x" "y"
      ..@ bbox       : num [1:2, 1:2] 178605 329714 181390 333611
      .. ..- attr(*, "dimnames")=List of 2
      .. .. ..$ : chr [1:2] "x" "y"
      .. .. ..$ : chr [1:2] "min" "max"
      ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
      .. .. ..@ projargs: chr NA

The object meuse has been promoted to an S4 class SpatialPointsDataFrame, which is shown as being a formal class defined by sp.

The class hierarchy may be examined with the getClass function:

    > getClass("SpatialPointsDataFrame")
    Slots:
    Name:        data  coords.nrs  coords    bbox  proj4string
    Class:  data.frame     numeric  matrix  matrix          CRS

    Extends:
    Class "SpatialPoints", directly
    Class "Spatial", by class "SpatialPoints", distance 2

    Known Subclasses:
    Class "SpatialPixelsDataFrame", directly, with explicit coerce

This shows that this class inherits from class SpatialPoints and
9.
    mvtnorm    Multivariate Normal and T Distribution
    nlme       Linear and nonlinear mixed effects models
    nnet       Feed-forward neural networks
    rgl        3D visualization device system (OpenGL)
    rpart      Recursive partitioning
    SparseM    Sparse Linear Algebra
    spatial    Functions for Kriging and point pattern analysis
    splines    Regression Spline Functions and Classes
    stats      The R Stats Package; includes classical tests, exploratory
               data analysis, smoothing and local methods for regression,
               multivariate analysis, non-linear least squares, time
               series analysis
    stepfun    Step Functions, including Empirical Distributions
    survival   Survival analysis, including penalised likelihood
    utils      R Utilities

Table 4: Packages in the base R distribution for Windows; libraries loaded when R starts are shown in boldface

It also installs six manuals in both PDF and HTML format: (1) An Introduction to R; (2) R Installation and Administration; (3) R Language Definition; (4) Reference Index; (5) R Data Import/Export; (6) Writing R Extensions.

Other operating systems: For Mac OS X or another Unix-based system, follow the appropriate links from the CRAN home page and read the installation instructions. The GUI for the Mac OS X version has a code editor, data editor, data browser, and native Quartz graphics.

Cross-platform: RStudio is explained in §3.3.

The JGR project at the University of Augsburg (D) has developed a Java-based GUI which runs on any platform with
10.       220  204  191   16
    [3,]  920  629 1651  172
    [4,]  238  201  517   54
    > cm %*% c(1,2,3,4)
         [,1]
    [1,]  100
    [2,]   35
    [3,]  160
    [4,]   56

As the last example shows, %*% also multiplies matrices and vectors.

Matrix inversion: A matrix can be inverted with the solve function, usually with little accuracy loss; in the following example the round function is used to show that we recover an identity matrix:

    > solve(cm)
                 [,1]        [,2]         [,3]         [,4]
    [1,]  0.034811530 -0.03680710 -0.004545455 -0.008314856
    [2,] -0.007095344  0.09667406 -0.018181818  0.039911308
    [3,] -0.020399113  0.02793792  0.072727273 -0.135254989
    [4,]  0.105321508 -0.37250554 -0.386363636  1.220066519
    > solve(cm) %*% cm
                  [,1]          [,2]          [,3]          [,4]
    [1,]  1.000000e+00 -4.683753e-17 -7.632783e-17 -1.387779e-17
    [2,] -1.110223e-16  1.000000e+00 -2.220446e-16 -1.387779e-17
    [3,]  1.665335e-16  1.110223e-16  1.000000e+00  5.551115e-17
    [4,] -8.881784e-16 -1.332268e-15 -1.776357e-15  1.000000e+00
    > round(solve(cm) %*% cm, 10)
         [,1] [,2] [,3] [,4]
    [1,]    1    0    0    0
    [2,]    0    1    0    0
    [3,]    0    0    1    0
    [4,]    0    0    0    1

Solving linear equations: The same solve function applied to a matrix A and column vector b solves the linear equation b = Ax for x:

    > b <- c(1, 2, 3, 4)
    > (x <- solve(cm, b))
    [1] -0.08569845  0.29135255 -0.28736142  3.08148559
    > cm %*% x
         [,1]
    [1,]    1
    [2,]    2
    [3,]    3
    [4,]    4

Applying functions: The apply function applies a function to the margins of a matrix, i.e. the rows (1) or columns (2). For exampl
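The solve and apply calls described above can be combined into one runnable sketch. The matrix is the row-major confusion matrix used throughout this part of the text; the choice of sum as the function applied to the margins is only an illustration.

```r
# Sketch: solving a linear system and applying a function over matrix margins.
cm <- matrix(c(35, 14, 11, 1,
                4, 11,  3, 0,
               12,  9, 38, 4,
                2,  5, 12, 2), nrow = 4, byrow = TRUE)
b <- c(1, 2, 3, 4)

# Solve b = cm %*% x for x, then check by multiplying back
x <- solve(cm, b)
back <- drop(cm %*% x)           # drop() turns the 4 x 1 matrix into a vector
all.equal(back, b)

# The inverse times the original recovers the identity up to rounding error
ident <- round(solve(cm) %*% cm, 10)

# Margin 1 = rows, margin 2 = columns
row_totals <- apply(cm, 1, sum)
col_totals <- apply(cm, 2, sum)
row_totals
col_totals
```

Comparing with all.equal instead of == allows for the small floating-point errors visible in the unrounded product.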
11. It runs on many operating systems: Unix and derivatives including Darwin, Mac OS X, Linux, FreeBSD, and Solaris; most flavours of Microsoft Windows; Apple Macintosh OS; and even some mainframe OS.

4. It is the product of international collaboration between top computational statisticians and computer language designers.

5. It allows statistical analysis and visualisation of unlimited sophistication; you are not restricted to a small set of procedures or options, and because of the contributed packages, you are not limited to one method of accomplishing a given computation or graphical presentation.

6. It can work on objects of unlimited size and complexity with a consistent, logical expression language.

7. It is supported by comprehensive technical documentation and user-contributed tutorials (§9). There are also several good textbooks on statistical methods that use R (or S) for illustration.

8. Every computational step is recorded, and this history can be saved for later use or documentation.

9. It stimulates critical thinking about problem-solving rather than a "push the button" mentality.

10. It is fully programmable, with its own sophisticated computer language (§4). Repetitive procedures can easily be automated by user-written scripts (§3.5). It is easy to write your own functions (§B), and not too difficult to write whole pack

(Footnote 3: http://www.gnu.org/copyleft/gpl.html)
12. e.g.:

    \begin{Sinput}
    > summary(lm(Volume ~ Height * Girth, data = trees))$adj.r.squared
    \end{Sinput}

The code was executed by R, and the output it would print on the console was added to the document, again in a special LaTeX environment (Soutput):

    \begin{Soutput}
    [1] 0.97285
    \end{Soutput}

There is no cut-and-paste here! The code was executed by R and produced the output. "What you asked for, you got". The LaTeX environments are defined in the Sweave LaTeX package; this was specified in the preamble.

8.3 The PDF document

The LaTeX document produced by Sweave is then processed as any LaTeX document, usually to produce a PDF. Figure 18 shows the output from this example.

    Sweaving for reproducible data analysis
    A Nonymous
    14 November 2010

    Here is a simple example of R and Sweave:

    R> data(trees)
    R> str(trees)
    'data.frame':   31 obs. of  3 variables:
     $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
     $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
     $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
    R> summary(lm(Volume ~ Height * Girth, data = trees))$adj.r.squared
    [1] 0.97285

Figure 18: Example PDF produced by Sweave and LaTeX

Notice how the R input is formatted differently from the R output.

8.4 Graphics in Sweave

Since R produces graphs, Sweave can also produce them and incorporate them into documents. See Rossiter [47], Leisch [24, 25] and
13. edition. URL http://cran.r-project.org/doc/manuals/R-data.pdf

R Development Core Team. 2012. R Language Definition. Vienna: The R Foundation for Statistical Computing, 2.15.0 (2012-03-30) draft edition. URL http://cran.r-project.org/doc/manuals/R-lang.pdf

Ribeiro, Jr., P. J. & Diggle, P. J. 2001. geoR: A package for geostatistical analysis. R News 1(2):14-18. URL http://CRAN.R-project.org/doc/Rnews/

Ripley, B. D. 1981. Spatial statistics. New York: John Wiley and Sons

Ritz, C. & Streibig, J. C. 2008. Nonlinear regression with R. Use R! New York: Springer. URL http://CRAN.R-project.org/package=nlrwr

Rossiter, D. G. 2004. Technical Note: Optimal partitioning of soil transects with R. Enschede (NL): (unpublished, online). URL http://www.itc.nl/personal/rossiter/teach/R/R_OptPart.pdf

Rossiter, D. G. 2004. Technical Note: Statistical methods for accuracy assesment of classified thematic maps. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC). URL http://www.itc.nl/personal/rossiter/teach/R/R_ac.pdf

Rossiter, D. G. 2005. Technical Note: Fitting rational functions to time series in R. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC). URL http://www.itc.nl/personal/rossiter/teach/R/R_rat.pdf

Rossiter, D. G. 2007.
14.
    > summary(trees$Volume)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
       10.2    19.8    24.5    30.8    37.8    77.0     1.0

As the example shows, some functions (like summary) can deal with NA's, but others can't. For example, if we try to compute the Pearson's correlation between tree volume and girth, using the cor function, with the missing value included:

    > cor(trees$Volume, trees$Girth)
    Error in cor(trees$Volume, trees$Girth) :
      missing observations in cov/cor

This message is explained in the help for cov, where several options are given for dealing with missing values within the function, the most common of which is to include a case in the computation only if no variables are missing:

    > cor(trees$Volume, trees$Girth, use="complete.obs")
    [1] 0.967397

To find out which cases have any missing value, S provides the complete.cases function:

    > complete.cases(trees)
     [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    [14]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    [27]  TRUE  TRUE  TRUE  TRUE  TRUE

This shows that the first case (record) is not complete.

The na.omit function removes cases with a missing value in any variable:

    > trees.complete <- na.omit(trees)
    > str(trees.complete)
    'data.frame':   30 obs. of  3 variables:
     $ Girth : num  8.6 8.8 10.5 10.7 10.8 11 11 ...
     $ Height: num  65 63 72 81 83 66 75 80 75 79 ...
     $ Volume: num  10.3 10.2 16.4 18.8 19.7 15.6 ...

The first observation, with
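The missing-value workflow above can be reproduced on a copy of the built-in trees data. This sketch assumes, as the transcript suggests, that the volume of the first tree has been set to NA; the copy trees_na and the other names are illustrative. Note that current versions of R return NA from cor in this situation instead of stopping with the error shown in the transcript.

```r
# Sketch: handling one missing value in a copy of the trees data.
data(trees)
trees_na <- trees
trees_na$Volume[1] <- NA          # assumption: mimic the missing first volume

# With the default use = "everything", the NA propagates to the result
r_all <- cor(trees_na$Volume, trees_na$Girth)

# Use only the cases where both variables are present
r_cc <- cor(trees_na$Volume, trees_na$Girth, use = "complete.obs")

# Which cases (rows) have no missing value in any variable?
cc <- complete.cases(trees_na)
which(!cc)

# Drop incomplete cases altogether
trees_complete <- na.omit(trees_na)
nrow(trees_complete)
```

Whether to drop cases with na.omit or to handle them inside each function depends on whether later analyses need the other variables of the incomplete records.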
15. method for the whole workspace.

    > object.size(meuse)
    [1] 23344
    > gc()
             used (Mb) gc trigger (Mb) max used (Mb)
    Ncells 486087 13.0     818163 21.9   818163 21.9
    Vcells 326301  2.5     905753  7.0   905753  7.0

If you work with large objects, such as images, you may well exceed R's default memory limits:

    > tmp <- matrix(1, 10000, 100000)
    Error in matrix(1, 10000, 1e+05) :
      cannot allocate vector of length 1000000000

There are other, similar, messages you may see. This topic is discussed in the help page for memory limits:

    > help("Memory-limits")

MS Windows limits the total memory that can be allocated to R; this can be reviewed with the memory.size method (on Windows only); with the optional max=TRUE argument it shows the maximum vector that can be allocated. The memory.limit method (on Windows only) shows the total memory that can be allocated for all objects.

    > # running on MS Windows
    > memory.size(max=T)
    [1] 917176320
    > memory.limit()
    [1] 1038856192

This is about 874 Mb for one object and almost 1 Gb total.

Memory can be increased up to about 2 Gb under Windows and indefinitely under better-designed operating systems; see the help:

    > ?Memory

The easiest way to increase memory in Windows is by adding a command-line option to the Target field of a desktop shortcut or Start menu item. For example:

    "C:\Program File\R\R-2.5.1\bin\Rgui.exe" LANGUAGE=de --max-mem-size=2Gb

Just for fun we specify
16. [Index entries, P to S]

plot; plot.default; plot.lm; plotmath; pls package; pnorm; points; polygon; prcomp; predict; predict.lm; print (package:lattice); print

q; qnorm; qq (package:lattice); qqmath (package:lattice); qr; quantile

R.version; rainbow; randomForest package; rank; rbind; rbinom; Rcmdr package; read.asciigrid (package:sp); read.csv; read.table; recode; rect; rep; repeat; require; reshape; residuals; return; rfs (package:lattice); rgb; rgdal package; rlm (package:MASS); rm; rnorm; round; row; rownames; rpart package; RSiteSearch; rstandard; rstudent; rug; runif

sample; sapply; scale; scan; scatterplot (package:car); scatterplot.matrix (package:car); screeplot; segments; seq; sessionInfo; set.seed; setdiff; setequal; setwd; show.settings (package:lattice); showMethods; sin; slotNames; solve; sort; source; sp package; Spatial (package:sp class); spatial package; SpatialLines (package:sp class); SpatialPixelsDataFrame (package:sp class); SpatialPoints (pac
17. [Figure 9: Scatterplot with math symbols, legend and model lines]

5.1.2 Returning results from graphics methods

Many high-level graphics methods return results, which may be assigned to an S object or used directly in an expression. For example, the hist method returns the break and mid-points, counts and densities:

    > data(trees)
    > h <- hist(trees$Volume)
    > str(h)
    List of 7
     $ breaks     : num [1:8] 10 20 30 40 50 60 70 80
     $ counts     : int [1:7] 10 9 5 1 5 0 1
     $ intensities: num [1:7] 0.03226 0.02903 0.01613 0.00323 0.01613 ...
     $ density    : num [1:7] 0.03226 0.02903 0.01613 0.00323 0.01613 ...
     $ mids       : num [1:7] 15 25 35 45 55 65 75
     $ xname      : chr "trees$Volume"
     $ equidist   : logi TRUE
     - attr(*, "class")= chr "histogram"
    > hist(trees$Volume)$mids
    [1] 15 25 35 45 55 65 75

5.1.3 Types of base graphics plots

Table 2 lists the principal plot types; see the help for each one for more details.

    assocplot        Association Plots
    barplot          Bar Plots
    boxplot          Box Plots
    contour          Contour Plots
    coplot           Conditioning Plots
    dotchart         Cleveland Dot Plots
    filled.contour   Level (Contour) Plots
    fourfoldplot     Fourfold Plots
    hist             Histograms
    image            Display a Colour Image
    matplot          Plot Columns of Matrices
    mosaicplot       Mosaic Plots
    pairs            Scatterplot Matrices
    persp            Perspective (3D) Plots
    plot             Generic X-Y Plotting
    stars            Star (Spider/Radar) Plots
    stem             Stem-and-Leaf Plots
    s
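When only the returned values are wanted, hist can be asked not to draw at all. The sketch below uses the plot=FALSE argument of hist, which is part of base R; the bin counts follow from the default break points shown in the transcript above.

```r
# Sketch: using hist() for its return value only, without drawing.
data(trees)
h <- hist(trees$Volume, plot = FALSE)

h$breaks   # bin boundaries
h$counts   # number of observations per bin
h$mids     # bin mid-points

# Densities are counts divided by (n * bin width), so they integrate to 1
dens_check <- sum(h$density * diff(h$breaks))
dens_check
```

The components can be used directly in further computation, for example to label bars or to find the modal bin with h$mids[which.max(h$counts)].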
18. 2,2), more=T)

    > print(h3, split=c(1,2,2,2), more=T)
    > print(h4, split=c(2,2,2,2), more=F)
    > rm(h1, h2, h3, h4)

If you are printing to a file, more=F will end the page. The next call to the print method will start a new page. This is useful to produce a multi-page PDF file.

5.5 Colours

Both base (§5.1) and Trellis (§5.2) graphics have many places where colours can be specified, often with the col (foreground colour) or bg (background colour) optional arguments to graphics methods such as plot.

Colours may be specified either by name, by code (colour specification), or by their numeric position in the active colour palette. There are a large number of named colours, but only eight of these are in the default palette.

To get a list of possible colour names use the colours method; to see the numeric colours in the active palette use the palette method:

    > colours()
      [1] "white"         "aliceblue"     "antiquewhite"
      [4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
    ...
    [655] "yellow3"       "yellow4"       "yellowgreen"
    > colours()[655]
    [1] "yellow3"
    > palette()
    [1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta"
    [7] "yellow"  "gray"
    > palette()[4]
    [1] "blue"

The named colours can be visualised as a bar graph (Figure 15):

    > tmp <- seq(1:length(colors()))
    > plot(tmp, rep(1, length(tmp)), type="h", lwd=2,
    +    col=colors()[tmp], ylim=c(0,1), xlab="Colour number")

[Figure 15: bar graph of the named colours; x-axis: Colour number]

Then an individual colour number can be ide
19. 2.25 0.25 0.25 2.25 6.25 12.25

    > sum((sample - mean(sample))^2)
    [1] 42
    > sum((sample - mean(sample))^2)/(length(sample)-1)
    [1] 6
    > var(sample)
    [1] 6

In the expression sample - mean(sample), the mean mean(sample) (a scalar) is being subtracted from sample (a vector). The scalar is a one-element vector; it is shorter than the eight-element sample vector, so it is re-cycled: the same mean value is subtracted from each element of the sample vector in turn; the result is a vector of the same length as the sample. Then this entire vector is squared with the ^ operator; this also is applied element-wise.

The sum and length functions are examples of functions that summarise a vector and reduce it to a scalar.

Other functions transform one vector into another. Useful examples are sort, which sorts the vector, and rank, which returns a vector with the rank (order) of each element of the original vector.

    > data(trees)
    > trees$Volume
     [1] 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9
    [11] 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7 24.9 34.5
    [22] 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
    > sort(trees$Volume)
     [1] 10.2 10.3 10.3 15.6 16.4 18.2 18.8 19.1 19.7 19.9
    [11] 21.0 21.3 21.4 22.2 22.6 24.2 24.9 25.7 27.4 31.7 33.8
    [22] 34.5 36.3 38.3 42.6 51.0 51.5 55.4 55.7 58.3 77.0
    > rank(trees$Volume)
     [1]  2.5  2.5  1.0  5.0  7.0  9.0  4.0  6.0 15.0 10.0
    [11] 16.0 11.0 13.0 12.0  8.0 14.0 21.0 19.0 18.0 17.0 22.0
    [22] 20.0
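The recycling rule described above can be traced step by step. The sketch assumes the eight-element vector is 1:8, which reproduces the squared deviations, the sum of 42 and the variance of 6 shown in the transcript; it is named smp here (an illustrative name) to avoid masking the base function sample.

```r
# Sketch: element-wise arithmetic with recycling, then reduction to a scalar.
smp <- 1:8                      # assumption: reproduces the values in the text

dev  <- smp - mean(smp)         # the one-element mean is recycled 8 times
dev2 <- dev^2                   # ^ is applied element by element
dev2

ss        <- sum(dev2)                  # sum of squared deviations
v_manual  <- ss / (length(smp) - 1)     # sample variance "by hand"
v_builtin <- var(smp)                   # the same, using the built-in

# rank() averages the ranks of tied values by default
r <- rank(c(10.3, 10.3, 10.2))
r
```

The last line shows the tie handling seen in rank(trees$Volume): the two values 10.3 share ranks 2 and 3, so each receives 2.5.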
20. 211 209
    192 198 199 205 206 211 212 217 221 221 211
    195 204 209 212 217 218 222 224 228 222 211
    201 206 215 221 222 224 228 230 232 222 211
    206 212 218 227 232 234 235 240 241 234 222 211
    211 218 224 230 238 244 245 247 245 234 222 211
    215 222 230 235 241 250 256 256 234 223 214

This format is documented by ESRI; there are six header lines with self-explanatory names, followed by one row per image row.

This specific format can be read directly into a spatial object with the read.asciigrid function of the sp package; however, here we show how to read it using basic R commands.

The scan function, which we used above (§6.1) to read from the keyboard, can also be used to read from text files, if the first argument is the optional file. Lines can be skipped with the optional skip argument, and the number of lines to read can be specified with the optional nlines argument. The value to take as the missing value (NA in R) is specified with the optional na.strings argument. The separator between values is by default white space; that can be changed with the optional sep argument. Finally, scan expects to read double-precision numbers; this can be changed with the optional what argument.

In this example, we first read in the dimensions of the image. The header lines have both character strings (e.g. NCOLS) and numbers (e.g. 12); we read them into a character v
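The scan arguments described above can be tried on a tiny file written to a temporary location. This is a sketch only: the file contents (a 2 x 3 grid with a six-line header and -9999 as the missing-value code) are invented for illustration and are not the image from the text.

```r
# Sketch: reading a small ESRI-style ASCII grid with scan().
tf <- tempfile(fileext = ".asc")
writeLines(c("NCOLS 3",
             "NROWS 2",
             "XLLCORNER 0",
             "YLLCORNER 0",
             "CELLSIZE 1",
             "NODATA_VALUE -9999",
             "1 2 3",
             "4 -9999 6"), tf)

# Header: six lines, read as character strings (name, value, name, value, ...)
hdr   <- scan(tf, what = "character", nlines = 6, quiet = TRUE)
ncols <- as.numeric(hdr[2])
nrows <- as.numeric(hdr[4])

# Body: skip the header, treat -9999 as NA, read numbers (the default)
vals <- scan(tf, skip = 6, na.strings = "-9999", quiet = TRUE)

# The file is stored row by row, so fill the matrix by row
img <- matrix(vals, nrow = nrows, ncol = ncols, byrow = TRUE)
img

unlink(tf)   # remove the temporary file
```

Reading the header as character and converting selected elements avoids a failure when scan meets the header names while expecting numbers.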
21. 23.0 24.0 25.0 28.0 29.0 30.0 27.0 26.0 31.0

Note how rank averages tied ranks by default; this can be changed by the optional ties.method argument.

This example also illustrates the $ operator for extracting fields from dataframes; see §4.7.

4.5 Vector and list data structures

Many S functions create complicated data structures, whose structure must be known in order to use the results in further operations. For example, the sort function sorts a vector; when called with the optional index=TRUE argument it also returns the ordering index vector:

    > ss <- sort(samp, index=TRUE)
    > str(ss)
    List of 2
     $ x : num [1:10] -1.2392 0.0377 2.2405 3.7608 ...
     $ ix: int [1:10] 1 2 3 6 5 4 7 8 10 9

This example shows the very important str function, which displays the S structure of an object.

Lists: In this case the object is a list, which in S is an arbitrary collection of other objects. Here the list consists of two objects: a ten-element vector of sorted values ss$x and a ten-element vector of the indices ss$ix, which are the positions in the original vector where the corresponding sorted value was found. We can display just one element of the list if we want:

    > ss$ix
     [1]  1  2  3  6  5  4  7  8 10  9

This shows the syntax for accessing named components of a data frame or list using the $ operator: object$component, where the $ indicates that the component (or field) is to be found within the named object.

We can combine this with the vector ind
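The relation between the two components of the sorted list can be checked directly. The sketch below draws its own ten-element vector with a fixed seed, since the values of samp in the text are not fully shown, and spells out the argument name index.return in full (the text's index=TRUE relies on partial matching of argument names).

```r
# Sketch: the list returned by sort(..., index.return = TRUE).
set.seed(1)                      # assumption: any ten distinct numbers will do
samp <- rnorm(10)

ss <- sort(samp, index.return = TRUE)
str(ss)

# ss$ix holds, for each sorted value, its position in the original vector,
# so indexing the original by ss$ix reproduces the sorted values
rebuilt <- samp[ss$ix]
identical(rebuilt, ss$x)

# Components are extracted with $, and can be indexed like any vector
ss$ix[1]          # position of the smallest value
samp[ss$ix[1]]    # the smallest value itself
```

The index vector is the same permutation that order(samp) returns, which is the more common way to obtain it when the sorted values themselves are not needed.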
22. 3.5 Applied mathematics programs

MATLAB is a widely-used applied mathematics program, especially suited to matrix manipulation (as is R, see §4.6), which lends itself naturally to programming statistical algorithms. Add-on packages are available for many kinds of statistical computation. Statistical methods are also programmable in Mathematica.

(Footnote 9: http://www.gstat.org/)
(Footnote 10: http://www.openoffice.org/)

3 Using R

There are several ways to work with R:

- with the R console GUI (§3.1);
- with the RStudio IDE (§3.3);
- with the Tinn-R editor and the R console (§3.4);
- from one of the other IDEs, such as JGR;
- from a command-line R interface (CLI) (§3.2);
- from the ESS (Emacs Speaks Statistics) module of the Emacs editor.

Of these, RStudio is for most ITC users the best choice; it contains an R command-line interface but with a code editor, help text, a workspace browser, and graphic output.

3.1 R console GUI

The default interface for both Windows and Mac OS X is a simple GUI. We refer to these as "R console GUI" because they provide an easy-to-use interface to the R command line, a simple script editor, graphics output, and on-line help; they do not contain any menus for data manipulation or statistical procedures.

R for Linux has no GUI; however, several independent Linux programs provide a GUI development environment; an example is RStudio (§3.3).

3.1.1 On your own Windows computer

You can download and install R for Windows as i
23.  5708

The final value of the expression, in this case the value of mu, is printed, because the parentheses force the expression to be evaluated as a unit.

Removing objects from the workspace. You can remove objects when they are no longer needed with the rm function:

    > rm(s)
    > s
    Error: Object "s" not found

4.3 Methods and their arguments

In the command s <- seq(10), seq is an example of an S method, often called a function by analogy with mathematical functions, which has the form:

    method.name ( arguments )

Some functions do not need arguments, e.g. to list the objects in the workspace use the ls function with an empty argument list:

    > ls()

Note that the empty argument list, i.e. nothing between the ( and ), is still needed; otherwise the computer code for the function itself is printed.

Optional arguments. Most functions have optional arguments, which may be named like this:

    > s <- seq(from=20, to=0, by=-2)
    > s
     [1] 20 18 16 14 12 10  8  6  4  2  0

Named arguments have the form name = value.

Arguments of many functions can also be positional, that is, their meaning depends on their position in the argument list. The previous command could be written:

    > s <- seq(20, 0, by=-2); s
     [1] 20 18 16 14 12 10  8  6  4  2  0

because the seq function expects its first un-named argument to be the starting point of the vector and its second to be the end.

The command separator. This example shows the use of the 
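The difference between named and positional arguments can be checked directly. This is a small sketch using only the seq and ls functions mentioned in the text; the object names s1, s2 and s3 are chosen here for illustration.

```r
# All arguments named
s1 <- seq(from = 20, to = 0, by = -2)

# First two arguments positional (start, then end), the third named
s2 <- seq(20, 0, by = -2)

# Named arguments may be given in any order
s3 <- seq(to = 0, from = 20, by = -2)

# All three calls produce the same vector
identical(s1, s2)
identical(s1, s3)

# A function name without parentheses refers to the function object itself;
# the parentheses are what make it a call
is.function(ls)
workspace <- ls()   # the call returns a character vector of object names
```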
24.
             6 6.5 7 7.5 8 9 9.5
      John   0   0 0   0 1 1   1
      Paul   0   1 0   1 1 0   0
      George 1   0 1   0 1 0   0

The levels have the same internal numbers but different labels. This use of factor with the optional labels argument does not change the order of the factors, which will be presented in the original order. If, for example, we want to present them alphabetically, we need to re-order the levels, extracting them with the levels function in the order we want (as indicated by subscripts), and then setting these (in the new order) with the optional levels argument to factor:

    > tests$student <- factor(tests$student, levels=levels(tests$student)[c(3,1,2)])
    > str(tests)
    'data.frame': 9 obs. of 2 variables:
     $ student: Factor w/ 3 levels "George","John",..: 2 3 1 2 3 1 2 3 1
     $ score  : num 9 6.5 8 8 7.5 6 9.5 8 7
    > table(tests)
            score
    student  6 6.5 7 7.5 8 9 9.5
      George 1   0 1   0 1 0   0
      John   0   0 0   0 1 1   1
      Paul   0   1 0   1 1 0   0

Now the three students are presented in alphabetic order, because the underlying codes have been re-ordered. The car package contains a useful recode function which also allows grouping of factors.

Factors require special care in statistical models; see §4.17.1.

4.9 Selecting subsets

We often need to examine subsets of our data, for example to perform a separate analysis for several strata defined by some factor, or to exclude outliers defined by some criterion.

Selecting known elements. If we know the observation numbers, we simply name them as the first subscript 
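The re-ordering of factor levels can be reproduced with a small sketch. The scores and the student order are taken from the str output above; the way the tests data frame is built here is an assumption, since its construction is not shown in this passage.

```r
# Assumed construction of the 'tests' data frame (scores from the text)
tests <- data.frame(
  student = factor(rep(c("John", "Paul", "George"), 3),
                   levels = c("John", "Paul", "George")),
  score = c(9, 6.5, 8, 8, 7.5, 6, 9.5, 8, 7)
)
levels(tests$student)          # original order: John, Paul, George

# Re-order the levels by subscripting the existing levels
tests$student <- factor(tests$student,
                        levels = levels(tests$student)[c(3, 1, 2)])
levels(tests$student)          # now alphabetic
codes <- as.integer(tests$student)   # the underlying codes have changed
labels <- as.character(tests$student)  # the labels per observation have not

table(tests)
```

Only the internal codes and the presentation order change; each observation keeps its label.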
25.  A GUI data editor

The fix function provides a simple graphical data editor for data frames. It can be used to correct data entry mistakes or to build new data frames.

38 This matrix is also used as an example in §4.6

For example, to edit the sample trees dataset:

    > data(trees)
    > fix(trees)

This opens the data editor window (Figure 17). Warning: any changes made to the frame will be stored in the object. You can change data values, add rows, add columns (new variables), change column names, and, if you give the optional edit.row.names=T argument, change row names.

Figure 17: R graphical data editor

To edit a new data frame, you must first create it as a null data frame using the as.data.frame function, and then call the fix function:

    > myNewFrame <- as.data.frame(NULL)
    > fix(myNewFrame)

6.3 Importing data from a CSV file

The Comma-Separated Values or "CSV" file is a common interchange format which can be prepared in many ways. For example, a CSV file can be created directly as a text file, using Notepad or some other plain-text editor. However, it is common to have data already in a spreadsheet such as Excel. In this case the procedure is as follows:

1. Prepare the da
26.  variables, but new names can be given with the fieldname = variable syntax. For example:

    > oly.100 <- data.frame(year=yr, men, women)
    > str(oly.100)
    'data.frame': 27 obs. of 3 variables:
     $ year : num 1900 1904 1908 1912 1916 ...
     $ men  : num 11 11 10.8 10.8 NA 10.8 10.6 10.8 10.3 10.3 ...
     $ women: num NA NA NA NA NA NA NA 12.2 11.9 11.5 ...
    > rm(yr, men, women)

To enter a matrix, first enter the data with the scan function or as a list with the c function, and then place it in matrix form with the matrix ("create a matrix") function. We illustrate this with the sample confusion matrix of Congalton et al. [6], also used as an example by Skidmore [53] and Rossiter [43]. This example also illustrates the rownames and colnames functions to assign (or read) the row and column names of a matrix.

    > cm <- scan()
    1: 35 14 11 1 4 11 3 0 12 9 38 4 2 5 12 2
    17:
    > cm <- matrix(cm, 4, 4, byrow=T)
    > cm
         [,1] [,2] [,3] [,4]
    [1,]   35   14   11    1
    [2,]    4   11    3    0
    [3,]   12    9   38    4
    [4,]    2    5   12    2
    > colnames(cm) <- c("A", "B", "C", "D")
    > rownames(cm) <- LETTERS[1:4]
    > cm
       A  B  C D
    A 35 14 11 1
    B  4 11  3 0
    C 12  9 38 4
    D  2  5 12 2
    > cm["A", "C"]
    [1] 11

Note the use of the byrow optional argument to the matrix function, to indicate that we entered the data by row; also the use of the rownames and colnames functions to label the matrix with upper-case letters supplied conveniently by the LETTERS built-in constant.

6 2
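The effect of byrow can be seen by building the same matrix both ways. This sketch enters the confusion matrix values with c rather than the interactive scan, so that it can be run non-interactively; the object names v and cm.bycol are chosen here for illustration.

```r
# Confusion matrix values from the text, entered row by row
v <- c(35, 14, 11, 1,
        4, 11,  3, 0,
       12,  9, 38, 4,
        2,  5, 12, 2)

cm <- matrix(v, nrow = 4, ncol = 4, byrow = TRUE)
rownames(cm) <- colnames(cm) <- LETTERS[1:4]

# Access an element by row and column name
cm["A", "C"]

# Without byrow = TRUE the values fill the matrix column by column,
# which gives the transpose of the intended matrix
cm.bycol <- matrix(v, nrow = 4, ncol = 4)
identical(cm.bycol, t(unname(cm)))
```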
27.  Explorer; this is because Windows considers any file name that begins with "." to be a "hidden" file. You need to select Tools | Folder options in Explorer, then the View tab, and click the radio button for Show hidden files and folders. You must also un-check the box for Hide file extensions for known file types.

3.1.5 Saving your analysis steps

The File | Save to file ... menu command will save the entire console contents, i.e. both your commands and R's response, to a text file, which you can later review and edit with any text editor. This is useful for cutting-and-pasting into your reports or thesis, and also for writing scripts to repeat procedures.

3.1.6 Saving your graphs

In the Windows version of R, you can save any graphical output for insertion into documents or printing. If necessary, bring the graphics window to the front (e.g. click on its title bar), select menu command File | Save as ... and then one of the formats. Most useful for insertion into MS-Word documents is Metafile; most useful for LaTeX is Postscript; most useful for PDFLaTeX and stand-alone printing is PDF. You can later review your saved graphics with programs such as Windows Picture Editor. If you want to add other graphical elements, you may want to save as a PNG or JPEG; however in most cases it is cleaner to add annotations within R itself.

You can also review graphics within the Windows R GUI itself. Create the first graph, bring the graphics win
28.  Listing the workspace gives a clue:

    > ls()
    [1] "Volume" "trees"

The name Volume occurs twice: once as a local variable (visible with ls()) and once as a field name (visible with str(trees)). Even though the trees frame is attached, the Volume field is masked by the Volume local variable, which in this case is just a random permutation of the integers from 1 to 31, so the cor method gives an incorrect result.

One way around this problem is to name the field explicitly within its frame using $, e.g. trees$Volume:

    > cor(trees$Volume, Girth)
    [1] 0.96712

Another way is to delete the workspace variable with rm(); this makes the field name in the attached frame visible:

    > rm(Volume)
    > cor(Volume, Girth)
    [1] 0.96712

Another way is to use names for local variables that do not conflict with field names in the attached data frames.

10.3 How do I find the method to do what I want?

R has a very rich set of methods, and there are often several ways to accomplish the same thing, especially with contributed packages.

1. Look at the help pages for methods you do know; they often list related methods. For example, the help page for the linear models method (?lm) gives related methods for prediction, summary, regression diagnostics, analysis of variance, and generalised linear models, with hyperlinks (in the HTML help) to directly access their help.

2. Search for keywords. For example help.search("sequence") lists methods to ge
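The masking problem can be sketched without attaching the data frame at all. The example below creates a workspace variable named Volume, as in the text, and then refers to the field explicitly. The use of with is an alternative that does not appear in the text; it is offered here only as one possible way to avoid the name clash.

```r
data(trees)

# A workspace variable with the same name as a field of the data frame
Volume <- sample(1:31)

# Explicit reference with $ always finds the field, never the workspace variable
r1 <- cor(trees$Volume, trees$Girth)

# with() evaluates the expression inside the data frame first,
# so the field Volume is found before the workspace variable
r2 <- with(trees, cor(Volume, Girth))

round(r1, 5)
all.equal(r1, r2)

# Remove the conflicting workspace variable when it is no longer needed
rm(Volume)
```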
29.  a dangerous procedure when applied blindly; step can select the "best" model, based on AIC, using forward or backward selection and user-specified stopping and starting points.

• The pls package [27] implements partial least squares regression (PLSR) and principal components regression (PCR); these are often used in spectroscopy and chemometrics.

• The boot package provides bootstrapping functions.

• Principal components of multivariate matrices are computed by the prcomp function; the results can be visualised with the biplot and screeplot functions.

There are many other modelling functions; see §10.3 for some ideas on how to find the one you want. The advanced text of Venables & Ripley [57] has chapters on many sophisticated functions, all with theory, references, and S code. Many of these functions are implemented in the MASS package.

4.20 Missing values

A common problem in a statistical dataset is that not all variables are recorded for all records. R uses a special missing value for all data types, represented as NA, which stands for "not available". Within R, it may be assigned to a variable.

For example, suppose the volume of the first tree in the trees dataset is unknown:

    > trees$Volume[1] <- NA
    > str(trees)
    'data.frame': 31 obs. of 3 variables:
     $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 ...
     $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
     $ Volume: num NA 10.3 10.2 16.4 18.8 19.7 15.6 
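A short sketch of how an NA value behaves once it has been assigned. It works on a copy of the dataset (named tr here, an illustrative name) and uses the base functions is.na and the na.rm argument of mean, which are not named in the passage above.

```r
data(trees)
tr <- trees              # work on a copy so the original dataset is unchanged

# Assign the missing value to the first tree's volume
tr$Volume[1] <- NA

# is.na() tests for missing values
n.missing <- sum(is.na(tr$Volume))

# Most summary functions return NA if any value is missing ...
m.all <- mean(tr$Volume)

# ... unless the missing values are explicitly removed
m.known <- mean(tr$Volume, na.rm = TRUE)

n.missing
m.all
m.known
```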
30.  about shorter trees:

    > model <- lm(Volume ~ Height)

but even if not, the variable names can be referred to a frame with the data= named argument:

    > model <- lm(Volume ~ Height, data=trees)

Additive effects. More complicated models include additive effects, using the + formula operator:

    > model <- lm(Volume ~ Height + Girth, data=trees)

Note this is not an arithmetic addition, but rather a special use in the model notation. Here the tree volume is explained by both its height and girth, considered as independent predictors.

Interactions. The : formula operator is used to indicate interactions; usually these are used in addition to additive terms:

    > model <- lm(Volume ~ Height + Girth + Height:Girth, data=trees)

Here the tree volume is explained by both its height and girth, as well as their interaction, i.e. that the effect of girth is different at different heights.

The * formula operator is shorthand for all linear terms and interactions of the named independent variables, so that the previous example could have been more simply written as:

    > model <- lm(Volume ~ Height * Girth, data=trees)

The ^ formula operator is used to indicate predictor crossing to the specified degree:

    > model <- lm(Volume ~ (Height + Girth)^2, data=trees)

Here the ^2 expands to all interactions between the named predictors; since there are only two, this is equivalent to Height + Girth + Height:Girth, wh
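The equivalence of the three ways of writing the interaction model can be verified by comparing the fitted coefficients. This is a minimal check; the object names m1, m2 and m3 are illustrative.

```r
data(trees)

# Explicit main effects plus interaction
m1 <- lm(Volume ~ Height + Girth + Height:Girth, data = trees)

# Shorthand with the * operator
m2 <- lm(Volume ~ Height * Girth, data = trees)

# Crossing to degree 2 with the ^ operator
m3 <- lm(Volume ~ (Height + Girth)^2, data = trees)

# All three formulas expand to the same set of terms
names(coef(m1))
all.equal(coef(m1), coef(m2))
all.equal(coef(m1), coef(m3))
```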
31.  applications, clever uses of R, and R programming.

Installing R for Windows. To install R on Windows, download the setup program from CRAN by following the links for "Windows", then the "base" package, then selecting the setup program (the exact name depending on the version). Download the file (about 32 Mb) and run it; this will install R and its base packages.

This link will redirect to the current Windows binary release:

    http://mirrors.dotsrc.org/cran/bin/windows/base/release.htm

Note for Windows system managers: the "R Windows FAQ" in the same directory as the setup program has extensive information on administering R for Windows.

The setup installs the base R system and some of the most common libraries (Table 4).

65 http://www.r-project.org/
66 http://cran.r-project.org/

    base         The R Base Package
    chron        Chronological objects which can handle dates and times
    class        Functions for classification
    cluster      Functions for clustering
    datasets     The R Datasets Package
    foreign      Read data stored by Minitab, S, SAS, SPSS, Stata, ...
    graphics     The R Graphics Package
    grDevices    The R graphics devices and support for colours and fonts
    grid         The Grid Graphics Package
    KernSmooth   Functions for kernel smoothing
    lattice      Lattice Graphics
    MASS         Main Library of Venables and Ripley's MASS
    methods      Formal Methods and Classes
    mle          Maximum likelihood estimation
    multcomp     Multiple Tests and Simultaneous Confidence Intervals
32.  be edited and run as batch processes.

• The user must learn a new way of thinking about data, as data frames (§4.7) and objects each with its class, which in turn supports a set of methods (§4.13). This has the advantage common to object-oriented languages that you can only operate on an object according to methods that make sense (for example, the t (transpose) method can only be applied to matrices), and methods can adapt to the type of object (for example, the summary and plot methods give different results depending on the class of object).

• The user must learn the S language (§4), both for commands and the notation used to specify statistical models (§4.17). The S statistical modelling language is a lingua franca among statisticians, and provides a compact way to express models.

2.3 Alternatives

There are many ways to do computational statistics; this section discusses them in relation to R. None of these programs are open-source, meaning that you must trust the company to do the computations correctly.

2.3.1 S-PLUS

S-PLUS is a commercial program distributed by the Insightful corporation, and is a popular choice for large-scale commercial statistical computing. Like R, it is a dialect of the original S language developed at Bell Laboratories.

S-PLUS has a full graphical user interface (GUI); it may also be used like R, by typing commands at the command line interface or by running scripts. It has a rich interacti
33.  command separator. This allows several commands to be written on one line. In this case the first command computes the sequence and stores it in an object, and the second displays this object. This effect can also be achieved by enclosing the entire expression in parentheses, because then S prints the value of the expression, which in this case is the new object:

    > (s <- seq(from=20, to=0, by=-2))
     [1] 20 18 16 14 12 10  8  6  4  2  0

Named arguments give more flexibility; this could have been written with names:

    > (s <- seq(to=0, from=20, by=-2))
     [1] 20 18 16 14 12 10  8  6  4  2  0

but if the arguments are specified only by position the starting value must be before the ending value.

For each function, the list of arguments, both positional and named, and their meaning is given in the on-line help:

    > ?seq

Any element or group of elements in a vector can be accessed by using subscripts, very much like in mathematical notation, with the [ ] (select array elements) operator:

    > samp[1]
    [1] -1.239197
    > samp[1:3]
    [1] -1.23919739  0.03765046  2.24047546
    > samp[c(1,10)]
    [1] -1.239197  9.599777

The notation 1:3, using the : sequence operator, produces the sequence from 1 to 3.

The catenate function. The notation c(1, 10) is an example of the very useful c or catenate ("make a chain") function, which makes a list out of its arguments, in this case the two integers representing the indices of the first and last elements in the ve
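The subscripting forms described above can be tried on any vector. The sketch below uses a hypothetical ten-element vector (not the samp values of the text, which come from a random sample).

```r
# Hypothetical ten-element vector; values are illustrative only
samp <- c(-1.24, 0.04, 2.24, 3.76, 4.10, 4.90, 5.30, 6.00, 7.70, 9.60)

# A single element
first <- samp[1]

# A range of elements, using the : sequence operator
first.three <- samp[1:3]

# Arbitrary elements, with the indices collected by c()
first.last <- samp[c(1, 10)]

# The index vector can itself be computed
also.first.last <- samp[c(1, length(samp))]

first
first.three
first.last
```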
34.  fields by using the optional sep= argument:

    > write.table(round(as.data.frame(kxy), 4), file="KrigeResult.csv",
    +     sep=",", quote=T, row.names=F,
    +     col.names=c("E", "N", "LPb", "LPb.var"))

In this example we limited the precision of the output with the round function, and named the fields with the col.names= optional argument to write.table. We also specified that no row names be written, with the optional row.names=F argument, and that any strings be quoted, with the optional quote=T argument.

Here are the first few lines of the file KrigeResult.csv viewed in a plain-text editor such as Notepad:

    "E","N","LPb","LPb.var"
    181180,333740,5.7214,0.2932
    181140,333700,5.6772,0.2254
    181180,333700,5.6064,0.2391

8 Reproducible data analysis

Since R is a programming language, it can be used to write scripts (§3.5) which can then be run at any time to reproduce the analysis. This is especially useful if the data changes, or to apply the same analysis to a different dataset. It also allows the exact same figure to be produced.

A step further is to integrate the R analysis into a text document. Rather than cutting-and-pasting, a better approach is to prepare the text document with LaTeX [23, 22], and incorporate executable R code into the document. This ingenious approach is known generically as "Weaving" or "literate programming" [21], and has been implemented for R as "Sweave" [24, 25].

Note: A similar approach 
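The export step can be tried end-to-end with a stand-in data frame. In this sketch the kxy object is hypothetical (two rows of made-up prediction results, with invented column names), and the file is written to a temporary directory rather than the working directory.

```r
# Hypothetical stand-in for the kriging results 'kxy' of the text
kxy <- data.frame(x = c(181180, 181140),
                  y = c(333740, 333700),
                  pred = c(5.72143, 5.67718),
                  var = c(0.29321, 0.22544))

# Write to a temporary location (an assumption, to avoid cluttering the
# working directory)
out <- file.path(tempdir(), "KrigeResult.csv")

write.table(round(kxy, 4), file = out,
            sep = ",", quote = TRUE, row.names = FALSE,
            col.names = c("E", "N", "LPb", "LPb.var"))

# Inspect the raw text, then read it back as a data frame
header <- readLines(out)[1]
back <- read.csv(out)

header
str(back)
```

Reading the file back with read.csv is a simple way to confirm that the separator, quoting and field names were written as intended.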
35.  industry-standard Java Run-time Engine, including Mac OS X and most Windows systems. This includes the iplots interactive graphics package.

A.1 Installing new packages

The ITC network installation includes the optional packages requested by ITC users. If you need to install packages on your own copy of R, use the Packages | Install Package(s) from CRAN ... menu item, while connected to the Internet. You need administrator privileges on your system; if you can install the base program, you can install packages.

A brief description and full documentation of the available packages is available from the CRAN home page; click on the link for "Packages".

If you need a new package on the ITC network, contact the person who maintains the R distribution (footnote 95).

A.2 Customizing your installation

Many aspects of R's interactive behaviour can be changed to suit your preferences. For example, you can set up your own copy of R to load libraries at startup, and you can also change many default options. You do this by creating a file named .Rprofile either in your home directory, or in a working directory from which you will start R, or both. The second of these is used if it exists, otherwise the master copy in the home directory.

To see the current options settings, use the options function without any arguments; it's easier to review these by viewing its structure. Individual settings can then be viewed by their field name.

    > str(options())
    List of 
36.  means that the file could not be found.

Check the current working directory with the getwd method, and see if the file is there with the list.files method (or, you could look for the file in Windows Explorer):

    > getwd()
    [1] "/Users/rossiter/ds/DavisGeostats"
    > list.files(pattern="dlv.csv")
    character(0)

In fact this file is in another directory. One way is to give the full path name (note the use of the front slash / as in Unix; R interprets this correctly for MS-Windows):

    > dlv <- read.csv("/Users/rossiter/ds/DLV/dlv.csv")
    > dim(dlv)
    [1] 88 33

Another way is to change the working directory with the setwd method:

    > setwd("/Users/rossiter/ds/DLV")
    > getwd()
    [1] "/Users/rossiter/ds/DLV"
    > dlv <- read.csv("dlv.csv")
    > dim(dlv)
    [1] 88 33

3. A common problem is the attempt to use a method that is in an unloaded package.

For example, suppose we try to run a resistant regression with the lqs method:

    > lqs(lead ~ om, data=meuse)
    Error: couldn't find function "lqs"

This error has two possible causes: either there is no such method anywhere (for example, if it had been misspelled lqr), or the method is available in an unloaded package.

To find out if the method is available in any package, search for the topic with the help.search method:

    > help.search("lqs")

In this case, the list of matches to the search term "lqs" shows two, both associated with package MASS.

Once the required 
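Both checks described above can also be made from code. The sketch below assumes nothing about the reader's files: it only tests whether a file named dlv.csv (the name used in the text) is present, and whether the MASS package is installed before trying to load it. The object names are illustrative.

```r
# Where is R looking for files?
wd <- getwd()

# Test for the file before trying to read it
f <- "dlv.csv"
found.file <- file.exists(f)
if (found.file) {
  dlv <- read.csv(f)
} else {
  message("File '", f, "' is not in ", wd)
}

# Is the lqs function currently visible on the search path?
visible.before <- exists("lqs")

# Load the package that provides it, if that package is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
  visible.after <- exists("lqs")
} else {
  visible.after <- visible.before
}

visible.before
visible.after
```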
37.  records and fields, respectively. Rows are generally accessed by the row number (although they can have names), and columns by the variable name (although they can also be accessed by number). A data frame can also be considered a list whose members are the fields; these can be accessed with the [[ ]] (list access) operator.

Sample data. R comes with many example datasets (§3.8) organized as data frames; let's load one (trees) and examine its structure and several ways to access its components:

    > ?trees
    > data(trees)
    > str(trees)
    'data.frame': 31 obs. of 3 variables:
     $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 ...
     $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
     $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 ...

The help text tells us that this data set contains measurements of the girth (diameter in inches, measured at 4.5 ft height), height (feet) and timber volume (cubic feet) in 31 felled black cherry trees. The data frame has 31 observations (rows, cases, records), each of which has three variables (columns, attributes, fields). Their names can be retrieved or changed by the names function. For example, to name the fields Var.1, Var.2 etc. we could use the paste function to build the names into a list and then assign this list to the names attribute of the data frame:

28 1 inch = 2.54 cm
29 1 ft = 30.48 cm

    > (saved.names <- names(trees))
    [1] "Girth"  "Height" "Volume"
    > (names(trees) <- paste(
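The different ways of reaching into a data frame can be compared side by side. The sketch works on a copy of trees (named tr, an illustrative name). The paste call that builds Var.1, Var.2, ... is an assumption about how the truncated command in the text might be completed.

```r
data(trees)
tr <- trees                       # work on a copy

# A row by number, a column by name, and a column by number
row1 <- tr[1, ]
h.by.name <- tr$Height
h.by.number <- tr[[2]]            # list access: the second field
h.by.string <- tr[["Height"]]     # list access by field name

identical(h.by.name, h.by.number)

# Save the field names, replace them, then restore them
saved.names <- names(tr)
names(tr) <- paste("Var", 1:3, sep = ".")   # assumed form of the paste call
new.names <- names(tr)
names(tr) <- saved.names

new.names
```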
38.  species to the scatterplot of petal width and length:

    > xyplot(Petal.Width ~ Petal.Length, data=iris,
    +     col=c("darkgreen","navyblue","firebrick")[as.numeric(iris$Species)],
    +     pch=20, xlab="Petal length", ylab="Petal width",
    +     main="Anderson Iris data",
    +     panel = function(x, y, ...) {
    +         panel.fill(col="antiquewhite3")
    +         panel.xyplot(x, y, ...)
    +         panel.abline(lm(y ~ x), col="black")
    +         for (lvl in 1:length(levels(Species))) {
    +             panel.abline(lm(y ~ x,
    +                 subset=(Species==levels(Species)[lvl])),
    +                 col=c("darkgreen","navyblue","firebrick")[lvl], lty=2)
    +         }
    +     })

This plot is shown in Figure 14.

The panel= argument is used whenever we want to control the appearance of the panel beyond the default plot. Here it is a function, which takes as arguments the dummy variables x and y, which represent the two variables of the x-y plot. The ... argument to the panel passes through all the extra arguments specified previously, e.g. data=iris. Within the un-named panel function, several pre-defined panel functions are called to add graphic elements. These all have names beginning with panel. First we use panel.fill to change the main plot background,

Figure 14: Trellis scatter plot with some added elements

then panel.xyplot to draw the points; note that this must be called explicitly if there is a panel function. Then we add the regression lines with panel.abline. Note the use of the 
39.  the Sweave web site for many examples.

43 http://www.stat.uni-muenchen.de/~leisch/Sweave/

9 Learning R

The present document explains the basics of using R (§3), the S language (§4), R graphics (§5), and the on-line help system (§3.2.2). There are many other resources for learning and applying R; this section explains the most useful.

9.1 Task views

Many applications are covered in on-line Task Views. These are a summary by a task maintainer of the facilities in R to accomplish certain tasks. For example, Roger Bivand maintains a task view "Analysis of Spatial Data", which discusses how to represent spatial data in R, how to read and write it, how to analyze point patterns, and geostatistical analysis. Another useful task view is "Multivariate Statistics", maintained by Paul Hewson.

The contributed documentation at CRAN has many introductions and reference cards for specific kinds of analysis, for example regression analysis and time-series analysis.

9.2 R tutorials and introductions

Translations

From the R console, you can follow the introductory course "An Introduction to R" as follows:

1. Select menu command Help | Html help; a web page will appear in your web browser.

2. In this web page, select the link "An Introduction to R"; another web page will appear.

You will probably want to start with the section "Appendix A: A sample session"; scroll down the web page to find this in t
40.  the factor. It uses formulae similar to the statistical formulae introduced in §4.17 to specify the variables to be plotted and their relation in the plot. Customization of Trellis graphics parameters (for example, default background and symbol colours) is explained in §5.2.6.

5.2.1 Univariate plots

As a simple example, consider the iris dataset. To produce a kernel density plot (a sophisticated histogram) on the whole dataset, use the densityplot method:

    > densityplot(~ Petal.Length, data=iris)

The ~ operator here has no left-hand side, since there is no dependent variable in the plot; it is univariate. The petal length is the independent variable, and we get one plot; this is shown on the left side of Figure 11.

Figure 11: Trellis density plots, without and with a conditioning factor (left panel: all species; right panel: split by species into setosa, versicolor and virginica; x-axis: Petal.Length)

To add conditioning, we use the | operator, which can be read as "conditioned on" the variable(s) named on its right side:

    > densityplot(~ Petal.Length | Species, data=iris)

Here there is one panel per species; this is shown on the right side of Figure 11. We can clearl
41.  • time series [44];

• optimal partitioning of soil transects [42].

55 http://www.stat.pitt.edu/stoffer/tsa2/index.html
56 http://www.itc.nl/personal/rossiter/pubs/list.html#pubs_m_R

9.5 Web Pages to learn R

• http://www.rseek.org/
  The "RSeek" website is a unified portal to R discussion lists, books, FAQ, R Journal, user-contributed documentation, etc. Figure 19 shows the results of an RSeek search.

• http://www.maths.bath.ac.uk/~jjf23/book/
  "Practical Regression and Anova in R" by Julian Faraway (University of Bath, UK); this has been updated into two commercial textbooks [16, 17].

• http://rgm2.lab.nig.ac.jp/RGM2/
  "R Graphical Manual": a huge collection of graphs produced with R, all with source code.

• http://addictedtor.free.fr/graphiques/

[Screenshot: RSeek.org search results for "cokriging"]
42.  trigonometric functions such as asin:

25 In RGui, menu command Help | Manuals | R Language Manual
26 In RGui, menu command Help | Manuals | R Introduction

    > log(10); log10(10); log2(10)
    [1] 2.3026
    [1] 1
    [1] 3.3219
    > round(log(10))
    [1] 2
    > sqrt(5)
    [1] 2.2361
    > sin(45 * (pi/180))
    [1] 0.7071
    > (asin(1)/pi)*180
    [1] 90

4.2 Creating new objects: the assignment operator

New objects in the workspace are created with the assignment operator <-, which may also be written as =:

    > mu <- 180
    > mu = 180

The symbol on the left side is given the value of the expression on the right side, creating a new object (or redefining an existing one), here named mu, in the workspace and assigning it the value of the expression, here the scalar value 180, which is stored as a one-element vector. The two-character symbol <- must be written as two adjacent characters with no spaces.

Now that mu is defined, it may be printed at the console as an expression:

    > print(mu)
    [1] 180
    > mu
    [1] 180

and it may be used in an expression:

    > mu/pi
    [1] 57.2958

More complex objects may be created:

    > s <- seq(10)
    > s
     [1]  1  2  3  4  5  6  7  8  9 10

This creates a new object named s in the workspace and assigns it the vector (1 2 ... 10). (The syntax of seq(10) is explained in the next section.)

Multiple assignments are allowed in the same expression:

    > (mu <- theta <- pi/2)
    [1] 1
43.  typing commands in a command line interface window, in response to a command prompt, which usually looks like this:

    >

The > is a prompt symbol displayed by R, not typed by you. This is R's way of telling you it's ready for you to type a command.

Type your command and press the Enter or Return keys; R will execute your command.

If your entry is not a complete R command, R will prompt you to complete it with the continuation prompt symbol:

    +

13 An alternative for some analyses is the Rcmdr GUI explained in §3.6.

R will accept the command once it is syntactically complete; in particular the parentheses must balance. Once the command is complete, R then presents its results in the same command line interface window, directly under your command.

If you want to abort the current command (i.e. not complete it), press the Esc ("escape") key.

For example, to draw 500 samples from a binomial distribution of 20 trials with a 40% chance of success, you would first use the rbinom method and then summarize it with the summary method, as follows:

    > x <- rbinom(500, 20, .4)
    > summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2.000   7.000   8.000   8.232  10.000  15.000

This could also have been entered on several lines:

    > x <- rbinom(
    + 500, 20, .4)

You can use any white space to increase legibility, except that the assignment symbol <- must be written together:

    > x <- rbinom(500, 20, 0.4)

R is case-sensitive; that is, method rbino
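The binomial example can be made repeatable by fixing the random seed first. The seed value below is an arbitrary assumption and is not part of the original example, so the summary will differ from the figures printed in the text.

```r
# Fix the random number generator so that the same sample is drawn each time
# (the seed value is arbitrary)
set.seed(42)

# 500 draws from a binomial distribution: 20 trials, 40% chance of success
x <- rbinom(500, 20, 0.4)
summary(x)

# The sample mean should be close to the expected value, 20 * 0.4 = 8
sample.mean <- mean(x)
sample.mean
```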
44.  Figure 2: The Tinn-R screen, with the R command line interface also visible

3.5 Writing and running scripts

After you have worked out an analysis by typing a sequence of commands, you will probably want to re-run them on edited data, new data, subsets etc. This is easy to do by means of scripts, which are simply lists of commands in a file, written exactly as you would type them at the command line interface. They are run with the source method. A useful feature of scripts is that you can include comments (lines that begin with the # character) to explain to yourself or others what the script is doing and why.

20 http://www.sciviews.org/Tinn-R/

Here's a step-by-step description of how to create and run a simple script which draws two random samples from a normal distribution and computes their correlation:

1. Open a new document in a plain-text editor, i.e., one that does not insert any formatting. Under MS-Windows you can use Notepad or Wordpad; if you are using Tinn-R or RStudio open a new script.

2. Type in the following lines:

    x <- rnorm(100, 180, 20)
    y <- rnorm(100, 180, 20)
    plot(x, y)
    cor.test(x, y)

3. Save the file with the nam
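The same steps can be carried out entirely from R, which is convenient for trying out source. In this sketch the script is written to a temporary file whose name, test.R, is a hypothetical choice; the plot command is left out so that the example does not open a graphics device, and the test result is wrapped in print because source does not print values automatically.

```r
# Hypothetical script file in a temporary directory
script <- file.path(tempdir(), "test.R")

# Write the script: comment lines begin with the # character
writeLines(c(
  "# draw two random samples from a normal distribution",
  "x <- rnorm(100, 180, 20)",
  "y <- rnorm(100, 180, 20)",
  "# compute and show their correlation",
  "print(cor.test(x, y))"
), script)

# Run the script; objects it creates appear in the workspace
source(script)

n.x <- length(x)
n.y <- length(y)
```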
45. Figure 4: Regression diagnostic plots (x-axes: Fitted values, Leverage)

> text(h, ifelse(h > 3*mean(h), seq(1:length(h)), ""), pos=2)

A direct way to access diagnostics is with the influence.measures function applied to the fitted model:

> infl <- influence.measures(model)
> str(infl)
List of 3
 $ infmat: num [1:31, 1:8] 0.23213 0.06511 -0.15754 0.00173 -0.02461 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:31] "1" "2" "3" "4" ...
  .. ..$ : chr [1:8] "dfb.1_" "dfb.Hght" "dfb.Grth" "dfb.Hg:G" ...
 $ is.inf: logi [1:31, 1:8] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:31] "1" "2" "3" "4" ...
  .. ..$ : chr [1:8] "dfb.1_" "dfb.Hght" "dfb.Grth" "dfb.Hg:G" ...
 $ call  : language lm(formula = Volume ~ Height * Girth, data = trees)
 - attr(*, "class")= chr "infl"

The key field here is is.inf, which specifies which observations were especially influential in the fit, using eight different measures of influence, which are listed as attributes of this field:

> attr(infl$infmat, "dimnames")[[2]]

> infl$is.inf[which(apply(infl$is.inf, 1, any)), ]
   dfb.1_ dfb.Hght dfb.Grth dfb.Hg:G dffit cov.r cook.d   hat
2   FALSE    FALSE    FALSE    FALSE FALSE  TRUE  FALSE FALSE
3   FALSE    FALSE    FALSE    FALSE FALSE  TRUE  FALSE FALSE
18  FALSE    FALSE    FALSE    FALSE  TRUE  TRUE  FALSE FALSE
20  FALSE    FALSE    FALSE    FALSE FALSE  TRUE  FALSE FALSE
31  FALSE    FALSE    FALSE    FALSE FALSE  TRUE  FALSE  TRUE
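The row-filtering idiom in this passage (apply with any, wrapped in which) can be sketched step by step. This is a minimal illustration on the built-in trees data, assuming the interaction model shown in the str output; the intermediate names flagged and idx are introduced here only for clarity.

```r
# Fit the interaction model on the built-in trees data
model <- lm(Volume ~ Height * Girth, data = trees)
infl <- influence.measures(model)

# any() per row: TRUE if at least one of the eight measures flags the observation
flagged <- apply(infl$is.inf, 1, any)

# which() converts the logical vector into row indices
idx <- which(flagged)

# Show only the flagged rows; drop = FALSE keeps the matrix shape
print(infl$is.inf[idx, , drop = FALSE])
```

Splitting the one-line expression this way makes it easier to inspect each intermediate result at the console.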
46. [1] "dfb.1_"   "dfb.Hght" "dfb.Grth" "dfb.Hg:G" "dffit"    "cov.r"
[7] "cook.d"   "hat"

The most common use here is to list the observations for which any of the influence measures were excessive.

This example illustrates the use of the which function to return the indices in a vector where a logical condition is true, and the any function to perform a logical or on a set of logical vectors, in this case the eight diagnostics. In this example, observation 31 has a high leverage, 18 a high DFFITS value, and all five a high COVRATIO.

Notice also the use of the apply function to apply the any function (is any element of the vector TRUE?) to each row.

4.18.2 Model-based prediction

The fitted function only gives values for the observations; often we want to predict at other values. For this the predict generic function is used; in the case of linear models this specialises to the predict.lm function. The first argument is the fitted model and the second is a data frame in which to look for variables with which to predict; these must have the same names as in the model formula. Both confidence and prediction intervals may be requested, with a user-specified confidence level.

For example, to predict tree volumes for all combinations of heights (in 10-foot increments) and girths (in 5-inch radius increments), along with the 99% confidence intervals of this prediction:

> model <- lm(Volume ~ Height * Girth, data=trees)
> new.data <- data.fra
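As a separate, self-contained sketch of the prediction workflow described in this section: the grid ranges below (heights 60 to 90, girths 5 to 20) are illustrative assumptions, not values taken from the text, and expand.grid is used here as one convenient way to build all combinations.

```r
model <- lm(Volume ~ Height * Girth, data = trees)

# Hypothetical prediction grid; column names must match the model formula
new.data <- expand.grid(Height = seq(60, 90, by = 10),
                        Girth  = seq(5, 20, by = 5))

# 99% confidence intervals for the fitted mean at each combination
pred <- predict(model, newdata = new.data,
                interval = "confidence", level = 0.99)

# Columns of pred are fit, lwr and upr
print(cbind(new.data, pred))
```

Replacing interval = "confidence" with interval = "prediction" would request prediction intervals instead.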
47. 1) or not (logical FALSE, associated by R with the number 0); that there are three soil types arbitrarily named "1", "2" and "3" (not in any order); and that there are three flood frequency classes named "1", "2" and "3", and these are in this order from most to least flooded (n.b. we know this from the data documentation, not from anything we can see in the CSV file).

Missing values: Missing values are expressed in CSV files by the two letters NA (without quotes). For example:

x,y,cadmium,elev,dist,om,ffreq,soil,lime,landuse
181072,333611,NA,7.909,0.00135803,NA,1,1,1,Ah
181025,333558,8.6,6.983,0.0122243,14,1,1,1,Ah

In the first record neither the cadmium concentration nor the organic matter proportion is known; these will be imported as missing values, symbolized in R by NA.

Matrices: A matrix can also be prepared as a CSV file (perhaps exported from Excel) and imported into R with the read.csv function. The matrix can also be prepared with row and column labels.

For example, here is the CSV file for a small confusion matrix, as prepared in a text editor; the first row has the column names (reference classes) and each of the other rows begins with the row name (mapped class); there is no entry for the very first column of the first row:

"A","B","C","D"
"A",35,14,11,1
"B",4,11,3,0
"C",12,9,38,4
"D",2,5,12,2

Suppose this file is named cm.csv; it can be read in as follows (note that

Footnote 40: These are not the numbers 1, 2,
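The handling of NA on import can be checked with a small sketch. It writes an abridged version of the two example records (only four of the ten fields, an assumption made to keep the example short) to a temporary file and reads it back with read.csv.

```r
# Write an abridged copy of the two example records to a temporary CSV file
csv <- tempfile(fileext = ".csv")
writeLines(c("x,y,cadmium,om",
             "181072,333611,NA,NA",
             "181025,333558,8.6,14"), csv)

d <- read.csv(csv)

# Unquoted NA entries arrive as missing values
print(is.na(d))

# Most summary functions need na.rm = TRUE to skip them
print(mean(d$cadmium, na.rm = TRUE))
```

Without na.rm = TRUE the mean of a vector containing NA is itself NA.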
48. Available colours
Example of a colour ramp
R graphical data editor
Example PDF produced by Sweave and LaTeX
Results of an RSeek search
Results of an R site search
Visualising the variability of small random samples

List of Tables

Methods for adding to an existing base graphics plot
Base graphics plot types
Trellis graphics plot types
Packages in the base R distribution for Windows

0 If you are impatient ...

1. Install R and RStudio on your MS Windows, Mac OS X or Linux system (§A).

2. Run RStudio; this will automatically start R within it.

3. Follow one of the tutorials (§9.2), such as my "Using the R Environment for Statistical Computing: An example with the Mercer & Hall wheat yield dataset" [48].

4. Experiment!

5. Use this document as a reference.

1 What is R?

R is an open-source environment for statistical computing and visualisation. It is based on the S language developed at Bell Laboratories in the 1980's [20], and is the product of an active movement among statisticians for a powerful, programmable, portable, an
49. 4.2 80 31.7
14.5 74 36.3
17.3 81 55.4

Each call to sample will give a different result.

By default sampling is without replacement, so the same element cannot be selected more than once; for sampling with replacement use the replace=T optional argument.

In this example, the command dim(trees) uses the dim function to give the dimensions of the data frame (rows and columns); the first element of this two-element vector is the number of rows: dim(trees)[1].

Splitting on a factor: Another common operation is to split a dataset into several strata defined by some factor. For this, S provides the split function, which we illustrate with the iris dataset, which has one factor, the species of Iris:

> data(iris); str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num ...
 $ Sepal.Width : num ...
 $ Petal.Length: num ...
 $ Petal.Width : num ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: ...
> attach(iris)
> ir.s <- split(iris, Species); str(ir.s)
List of 3
 $ setosa    :'data.frame': 50 obs. of 5 variables:
  ..$ Sepal.Length: num [1:50] ...
  ..$ Sepal.Width : num [1:50] ...
  ..$ Petal.Length: num [1:50] ...
  ..$ Petal.Width : num [1:50] ...
  ..$ Species     : Factor ...
 $ versicolor:'data.frame': 50 obs. of 5 variables:
  ..$ Sepal.Length: num [1:50] ...
  ..$ Sepal.Width : num [1:50] ...
  ..$ Petal.Length: num [1:50] ...
  ..$ Petal.Width : num [1:50] ...
  ..$ Species     : Factor ...
 $ virginica :'data.frame': 50 obs. of 5 variables:
  ..$ Sepal.Length: num [1:50] ...
  ..$ Sepal.Width : num [1:50] ...
  ..$ Petal.Length: num [1:50] ...
50. 4 88486 X

[29] Nason, G. P. 2008. Wavelet methods in statistics with R. Use R! New York; London: Springer

[30] Paradis, E. 2002. R for Beginners. Montpellier (F): University of Montpellier. URL http://cran.r-project.org/doc/contrib/rdebuts_en.pdf

[31] Paradis, E. 2002. R para Principiantes. Montpellier (F): University of Montpellier. URL http://cran.r-project.org/doc/contrib/rdebuts_es.pdf

[32] Paradis, E. 2002. R pour les débutants. Montpellier (F): University of Montpellier. URL http://cran.r-project.org/doc/contrib/rdebuts_fr.pdf

[33] Pebesma, E. J. 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences 30(7):683-691

[34] Pebesma, E. J. & Bivand, R. S. 2005. Classes and methods for spatial data in R. R News 5(2):9-13. URL http://CRAN.R-project.org/doc/Rnews/

[35] Pebesma, E. J. & Wesseling, C. G. 1998. Gstat: a program for geostatistical modelling, prediction and simulation. Computers & Geosciences 24(1):17-31. URL http://www.gstat.org/

[36] R Development Core Team. 2012. An Introduction to R. Vienna: The R Foundation for Statistical Computing, 2.15.0 (2012-03-30) edition. URL http://cran.r-project.org/doc/manuals/R-intro.pdf

[37] R Development Core Team. 2012. R Data Import/Export. Vienna: The R Foundation for Statistical Computing, 2.15.0 (2012-03-30
51. 45

Footnote 67: http://www.rosuda.org/
Footnote 68: As of the date of these notes; their author

 $ prompt  : chr "> "
 $ continue: chr "+ "
 $ htmlhelp: logi TRUE
> options()$digits
[1] 5

Here is an example .Rprofile which sets some options and loads some libraries that will always be used in a project:

options(show.signif.stars = FALSE)
options(html.help=TRUE)
options(digits=5)
options(prompt="R> ")
options(continue="R+ ")
options(timeout = 20)
library(gstat)
library(lattice)
# optional: function to run at startup
.First <- function() {
    print("Welcome to R, you've made a wise choice")
}
# optional: function to run at shutdown
.Last <- function() {
    graphics.off()
    print("Get a life")
}

A.3 R in different human languages

R's menus and messages have been translated to several common languages; the codes of these are listed as folders under folder share/locale in the R installation.

MS Windows: If your system is set up to work in a non-English language (e.g. Spanish, code es), R GUI will automatically display its menus, and R its messages, in that language.

If you'd like R to work in a different language than your overall system, the simplest method is to:

1. Create a desktop shortcut to R Gui (probably created during installation).

2. Open its properties.

3. In the Target field, add the environment variable LANGUAGE and the code for the language you want; for example to force English: LANGUAGE
52. [str(ir.s) output, continued: each of the three list elements (setosa, versicolor, virginica) is a data frame of 50 obs. of 5 variables]

The split function builds a list of data frames named by the level of the factor on which the original data frame was split. Here the original 150 observations have been split into three data frames of 50 observations, one for each species. These can be accessed by name:

> summary(ir.s$setosa$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.40    1.50    1.46    1.58    1.90

4.9.1 Simultaneous operations on subsets

We often want to apply some computation to all subsets of a data frame. For example, to compute the mean petal length of the iris data set for each species separately, we could first split the set as shown in the previous section (§4.9), compute each subset's mean, and join them in one vector. This can be accomplished in one step using the by function:

> by(Petal.Length, Species, mean)
INDICES: setosa
[1] 1.462
INDICES: versicolor
[1] 4.26
INDICES: virginica
[1] 5.552

In
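The two routes described in this section (split, compute, join versus a single call to by) can be compared directly. The sketch below is illustrative only; it avoids attach by using the iris$ prefix, and it adds tapply as a third, closely related base R alternative that the text itself does not mention.

```r
# Route 1: split into a list of data frames, then compute each mean and join
ir.s <- split(iris, iris$Species)
m.split <- sapply(ir.s, function(d) mean(d$Petal.Length))

# Route 2: one step with by()
m.by <- by(iris$Petal.Length, iris$Species, mean)

# Route 3 (not in the text): tapply() returns a plain named vector
m.tapply <- tapply(iris$Petal.Length, iris$Species, mean)

print(m.split)
```

All three give the same per-species means; they differ only in the class of the returned object.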
53. G STATS), and ordination (CONOCO). The algorithms in these programs have or can be programmed as an R package; examples are the gstat program for geostatistical analysis [35], which is now available within R [33], and the vegan package for ecological statistics.

Footnote 6: http://www.insightful.com/
Footnote 7: There are differences in the language definitions of S, R, and S-PLUS that are important to programmers, but rarely to end-users. There are also differences in how some algorithms are implemented, so the numerical results of an identical method may be somewhat different.
Footnote 8: See the list at http://www.stata.com/links/stat_software.htm

2.3.4 Spreadsheets

Microsoft Excel is useful for data manipulation. It can also calculate some statistics (means, variances, ...) directly in the spreadsheet. There is also an add-on module (menu item Tools | Data Analysis...) for some common statistical procedures, including random number generation. Be aware that Excel was not designed by statisticians. There are also some commercial add-on packages for Excel that provide more sophisticated statistical analyses. Excel's default graphics are easy to produce, and they may be customized via dialog boxes, but their design has been widely criticized. Least-squares fits on scatterplots give no regression diagnostics, so this is not a serious linear modelling tool.

OpenOffice includes an open-source and free spreadsheet (Open Office Calc) which can replace Excel.

2
54. (Intercept)      student
      9.556       -0.917

The problem is that the student is considered as a continuous variable when in fact it is a factor. We do much better if we make the appropriate conversion:

> tests$student <- as.factor(tests$student)
> str(tests)
'data.frame': 9 obs. of 2 variables:
 $ student: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
 $ score  : num 9 6.5 8 8 7.5 6 9.5 8 7
> lm(score ~ student, data=tests)
Coefficients:
(Intercept)     student2     student3
       8.83        -1.50        -1.83

This is a meaningful one-way linear model, showing the difference in mean scores of students 2 and 3 from student 1 (the intercept).

Factor names can be any string, so to be more descriptive we could have assigned names with the labels argument to the factor function:

> tests$student <- factor(tests$student, labels=c("Harley", "Doyle", "JD"))
> str(tests)
'data.frame': 9 obs. of 2 variables:
 $ student: Factor w/ 3 levels "Harley","Doyle",..: 1 2 3 1 2 3 1 2 3
 $ score  : num 9 6.5 8 8 7.5 6 9.5 8 7
> table(tests)
        score
student  6 6.5 7 7.5 8 9 9.5
  Harley 0   0 0   0 1 1   1
  Doyle  0   1 0   1 1 0   0
  JD     1   0 1   0 1 0   0

An existing factor can be recoded with an additional call to factor and different labels:

> tests$student <- factor(tests$student, labels=c("John", "Paul", "George"))
> str(tests)
'data.frame': 9 obs. of 2 variables:
 $ student: Factor w/ 3 levels "John","Paul",..: 1 2 3 1 2 3 1 2 3
 $ score  : num 9 6.5 8 8 7.5 6 9.5 8 7
> table(tests)
        score
student
55. Introduction to the R Project for Statistical Computing
for use at ITC

D G Rossiter
University of Twente
Faculty of Geo-information Science & Earth Observation (ITC)
Enschede (NL)
http://www.itc.nl/personal/rossiter

August 14, 2012

[Title-page figures: "Actual vs. modelled straw yields" (axes: Modelled, Actual); "Frequency histogram, Meuse lead concentration" (x-axis: lead concentration, mg kg-1; counts shown above bar, actual values shown with rug plot); "Grain yield, lbs per plot" (x-axis: Column number); "GLS 2nd-order trend surface, subsoil clay"]

Contents

0 If you are impatient ...
1 What is R?
2 Why R for ITC?
  2.1 Advantages
  2.2 Disadvantages
  2.3 Alternatives
    2.3.1 S-PLUS
    2.3.2 Statist
56. Figure 19: Results of an RSeek search

Addicted to R: graphics gallery, also with source code

9.6 Keeping up with developments in R

R is a dynamic environment, with a large number of dedicated scientists working to make it both a rich statistical computing environment and a modern programming language. Almost every day brings new and modified packages added to CRAN. New versions of the R base appear regularly, abo
57. Petal.Length)

Residuals:
   Min     1Q Median     3Q    Max
-0.565 -0.124 -0.019  0.133  0.643

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.36308    0.03976   -9.13  4.7e-16
iris$Petal.Length  0.41576    0.00958   43.39  < 2e-16

Residual standard error: 0.206 on 148 degrees of freedom
Multiple R-Squared: 0.927, Adjusted R-squared: 0.927
F-statistic: 1.88e+03 on 1 and 148 DF, p-value: <2e-16

> summary(table(iris$Species))
Number of cases in table: 150
Number of factors: 1

S has functions for testing if an object is in a specific class or mode, and for converting modes or classes, as long as such a conversion makes sense. These have the form is. (test) or as. (convert), followed by the class name. For example:

> is.factor(iris$Petal.Width)
[1] FALSE
> is.factor(iris$Species)
[1] TRUE
> as.factor(iris$Petal.Width)
  [1] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 0.2 0.2 0.1 0.1
...
[145] 2.5 2.3 1.9 2   2.3 1.8
22 Levels: 0.1 0.2 0.3 0.4 0.5 0.6 1 1.1 1.2 1.3 1.4 ... 2.5
> as.numeric(iris$Species)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
...
[149] 3 3

The is.factor and is.numeric class test functions return a logical value (TRUE or FALSE) depending on the class of their argument. The as.factor class conversion function determined the unique values of petal width (22) and then coded each observation; this is not too useful here, in practice you would use the cut function. The as.numeric conversion function extracts the level number of
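The tests and conversions discussed in this passage, together with the cut function it recommends, can be tried in a few lines. This is a sketch only; the break points passed to cut (0, 1, 2, 3) are arbitrary assumptions chosen to cover the observed range of petal widths.

```r
# Class tests return a single logical value
print(is.factor(iris$Species))
print(is.numeric(iris$Petal.Width))

# as.numeric() on a factor gives the level numbers (1, 2, 3 here)
codes <- as.numeric(iris$Species)

# cut() bins a continuous variable into a small number of classes,
# usually more useful than as.factor() on raw measurements
width.class <- cut(iris$Petal.Width, breaks = c(0, 1, 2, 3))
print(table(width.class))
```

Compare nlevels(as.factor(iris$Petal.Width)), which is 22, with the three classes produced by cut.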
58. R, by John Verzani of the City University of New York [58], also available as a set of web pages (§9.5). This shows how to use R for the usual univariate and bivariate statistics and tests, linear regression, and ANOVA. It has since been updated as a commercial textbook [59].

9.3 Textbooks using R

e-books in the ITC library: Crawley [10] is a good introduction to many kinds of analysis using R, with emphasis on statistical modelling.

Other books in the ITC library: The introductory text of Dalgaard [11] was mentioned above. Venables & Ripley [57] present a wide variety of up-to-date statistical methods (including spatial statistics) with algorithms coded in S, for both R and S-PLUS. There are a variety of other texts using S or S-PLUS, which are mostly applicable to R. Fox [19] explains how to use R for regression analysis, including advanced techniques; this is a companion to his text [18]. A more mathematically sophisticated approach, but with a heavy emphasis on R techniques, is the free text by Faraway [15], which was expanded into two commercial texts [16, 17].

UseR! series: Springer publishes the UseR! series, with 39 titles, which includes practical texts with many examples of R code; among the topics are time series [26], data preparation [54], spatial data analysis [2], Bayesian data analysis [1], Lattice gra
59. RAN.R-project.org/doc/Rnews/

[51] Sarkar, D. 2008. Lattice: multivariate data visualization with R. Use R! New York: Springer. URL http://lmdvr.r-forge.r-project.org/

[52] Shumway, R. H. & Stoffer, D. S. 2006. Time Series Analysis and Its Applications, with R examples. Springer Texts in Statistics. Springer, 2nd edition. URL http://www.stat.pitt.edu/stoffer/tsa2/index.html

[53] Skidmore, A. K. 1999. Accuracy assessment of spatial information. In Stein, A.; Meer, F. v. d.; & Gorte, B. G. F. (eds.), Spatial statistics for remote sensing, pp. 197-209. Dordrecht: Kluwer Academic

[54] Spector, P. 2008. Data manipulation with R. Use R! New York: Springer

[55] Tatem, A. J.; Guerra, C. A.; Atkinson, P. M.; & Hay, S. I. 2004. Momentous sprint at the 2156 Olympics? Women sprinters are closing the gap on men and may one day overtake them. Nature 431:525

[56] Venables, W. N. & Ripley, B. D. 2000. S Programming. Springer. ISBN 0-387-98966-8

[57] Venables, W. N. & Ripley, B. D. 2002. Modern applied statistics with S. New York: Springer-Verlag, 4th edition. URL http://www.stats.ox.ac.uk/pub/MASS4/

[58] Verzani, J. 2002. simpleR: Using R for Introductory Statistics, volume 2003. New York: CUNY, 0.4 edition. URL http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html

[59] Verzani, J. 2004. Using R for Introductory Statistics. Chapman & Hall/CRC Pres
60. Figure 1: The RStudio screen

Footnote 18: http://www.rstudio.org/
Footnote 19: Windows, Mac OS X, Linux

3.4 The Tinn-R code editor

For Windows users, the Tinn-R code editor for Windows is recommended for those who do not choose to use RStudio. This is tightly integrated with the Windows R GUI, as shown in Figure 2. R code is syntax-highlighted and there is extensive help within the editor to select the proper comman
61. Technical Note: Co-kriging with the gstat package of the R environment for statistical computing. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC), 2.1 edition. URL http://www.itc.nl/personal/rossiter/teach/R/R_ck.pdf

[46] Rossiter, D. G. 2010. Technical Note: An example of data analysis using the R environment for statistical computing. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC), 1.2 edition. URL http://www.itc.nl/personal/rossiter/teach/R/R_corregr.pdf

[47] Rossiter, D. G. 2012. Technical Note: Literate Data Analysis. International Institute for Geo-information Science & Earth Observation (ITC), 1.3 edition, 29 pp. URL http://www.itc.nl/personal/rossiter/teach/R/LDA.pdf

[48] Rossiter, D. G. 2012. Tutorial: Using the R Environment for Statistical Computing: An example with the Mercer & Hall wheat yield dataset. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC), 2.51 edition. URL http://www.itc.nl/personal/rossiter/teach/R/R_mhw.pdf

[49] Rossiter, D. G. & Loza, A. 2012. Technical Note: Analyzing land cover change with R. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC), 2.32 edition, 67 pp. URL http://www.itc.nl/personal/rossiter/teach/R/R_LCC.pdf

[50] Sarkar, D. 2002. Lattice. R News 2(2):19-23. URL http://C
62. Var", 1:dim(trees)[2], sep="."))
[1] "Var.1" "Var.2" "Var.3"
> names(trees)[1] <- "Perimeter"
> names(trees)
[1] "Perimeter" "Var.2"     "Var.3"
> (names(trees) <- saved.names)
[1] "Girth"  "Height" "Volume"
> rm(saved.names)

Note in the paste function how the shorter vector "Var" was re-cycled to match the longer vector 1:dim(trees)[2]. This was just an example of how to name fields; at the end we restore the original names, which we had saved in a variable which, since we no longer need it, we remove from the workspace with the rm function.

The data frame can be accessed in various ways:

> # most common: by field name
> trees$Height
 [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69
[15] 75 74 85 86 71 64 78 80 74 72 77 81 82 80
[29] 80 80 87
> # the result is a vector, can select elements of it
> trees$Height[1:5]
[1] 70 65 63 72 81
> # but this is also a list element
> trees[[2]]
 [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69
[15] 75 74 85 86 71 64 78 80 74 72 77 81 82 80
[29] 80 80 87
> trees[[2]][1:5]
[1] 70 65 63 72 81
> # as a matrix, first by row ...
> trees[1,]
  Girth Height Volume
1   8.3     70   10.3
> # ... then by column
> trees[,2]
 [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69
[15] 75 74 85 86 71 64 78 80 74 72 77 81 82 80
[29] 80 80 87
> # get one element
> trees[1,2]
[1] 70
> trees[1,"Height"]
[1] 70
> trees[1,]$Hei

Footnote 30: 1 ft³ = 28.3168 dm³
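The access paths listed in this section all reach the same underlying vector, which a short check makes explicit. The variable names h1 to h4 below are introduced only for this sketch.

```r
h1 <- trees$Height        # by field name
h2 <- trees[[2]]          # as the second element of the list
h3 <- trees[, 2]          # as the second column of the "matrix"
h4 <- trees[, "Height"]   # column selected by name

# All four are the same numeric vector
print(identical(h1, h2) && identical(h2, h3) && identical(h3, h4))

# A single row is still a data frame; a single cell is a plain value
print(trees[1, ])
print(trees[1, 2])
```

Note the difference between trees[[2]] (a vector) and trees[2] (a one-column data frame).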
63. a German-language GUI with the LANGUAGE flag

After re-starting R from the shortcut:

> memory.limit()
[1] 21474835648

If you are doing serious image processing, MS Windows is unlikely to be a satisfactory platform. Fortunately, almost all your R code will run unchanged in R running under an OS that does not limit memory, such as some variety of Unix.

10.5 What version of R am I running?

The R.version system variable contains this information; if you want to report a possible program error it is always necessary to specify this:

> R.version
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status
major          2
minor          15.0
year           2012
month          03
day            30
svn rev        58871
language       R
version.string R version 2.15.0 (2012-03-30)

To determine the version of an installed package (whether loaded or not), use the help method with the optional package= argument:

> help(package="sp")

Information on package 'sp'

Description:

Package:   sp
Version:   0.9-98
Date:      2012-03-26
Title:     classes and methods for spatial data
Author:    Edzer Pebesma <edzer.pebesma@uni-muenster.de>, Roger Bivand <Roger.Bivand@nhh.no>
Depends:   R (>= 2.10.0), methods, graphics
Suggests:  lattice, RColorBrewer, rgdal, rgeos (>= 0.1-8)
Imports:   utils, lattice, grid

Index:

CRS-class    Class "CRS" of coordinate reference system arguments
DMS-class    Class "DMS" for degree, minute, decim
64. a data frame to a CSV file. This format can be read into Excel, ILWIS, and many other programs.

A common reason for exporting is that you have computed some result in R which you want to use in another program. For example, suppose you've computed a kriging interpolation with the krige function of the gstat package:

# load sp library to support spatial classes
# load gstat for variograms and kriging
library(sp)
library(gstat)
# load sample data and interpolation grid
data(meuse)
data(meuse.grid)
# convert these to spatial objects
coordinates(meuse) <- ~ x + y
coordinates(meuse.grid) <- ~ x + y
# krige using a known variogram model
kxy <- krige(log(lead) ~ x + y, locations=meuse,
             newdata=meuse.grid, maxdist=800,
             model=vgm(0.34, "Sph", 1140, 0.08))
[using universal kriging]
> str(as.data.frame(kxy))
'data.frame': 3103 obs. of 4 variables:
 $ x        : num 181180 181140 181180 181220 181100 ...
 $ y        : num 333740 333700 333700 333700 333660 ...
 $ var1.pred: num 5.72 5.68 5.61 5.54 5.67 ...
 $ var1.var : num 0.293 0.225 0.239 0.257 0.173 ...

Note the use of the coordinates function to promote the meuse and meuse.grid data frames to objects of class SpatialPointsDataFrame, as required by the krige function, and the as.data.frame function to extract the data frame from the kriged object, as required by the write.table function.

To export this data frame we use the write.table function, specifying that a comma separate the
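The export step itself does not depend on kriging, so it can be sketched with any data frame. In the example below the first rows of the built-in trees data stand in for the kriging result, and the output path is a temporary file; both are assumptions made so the sketch runs without the sp and gstat packages.

```r
# Stand-in for as.data.frame(kxy): any data frame is exported the same way
d <- head(trees, 3)

# Hypothetical output location
out <- tempfile(fileext = ".csv")

# Comma-separated fields, a header row, no row names, no quoting
write.table(d, file = out, sep = ",",
            row.names = FALSE, col.names = TRUE, quote = FALSE)

# Inspect the raw file, then read it back to confirm the round trip
print(readLines(out))
back <- read.csv(out)
```

Setting row.names = FALSE avoids an unlabelled first column that some programs misread on import.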
65. a frame, with the apply function:

> apply(trees, 2, median)
 Girth Height Volume
  12.9   76.0   24.2

The margin is specified in the second argument: 1 for rows, 2 for columns (fields in the case of data frames).

Surprisingly, base R has no functions for skewness and kurtosis; these are provided by the e1071 package written by TU Wien.

> require(e1071)
> skewness(trees$Volume)
[1] 1.0133
> kurtosis(trees$Volume)
[1] 0.24604

The kurtosis here is "excess" kurtosis, i.e., subtracting 3 from the kurtosis.

4.15 Classification tables

For data items that are classified by one or more factors, the table function counts the number of observations at each factor level or combination of levels. We illustrate this with the meuse dataset included with the sp package. This dataset includes four factors:

> require(sp); data(meuse); str(meuse)
'data.frame': 155 obs. of 14 variables:
 ...
 $ ffreq  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ soil   : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
 $ lime   : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
 $ landuse: Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 ...
> attach(meuse)
> table(ffreq)
ffreq
 1  2  3
84 48 23
> round(100*table(ffreq)/length(ffreq), 1)
ffreq
   1    2    3
54.2 31.0 14.8
> table(ffreq, landuse)
     landuse
ffreq Aa Ab Ag Ah Am B Bw DEN Fh Fw Ga SPO STA Tv W
108 519 6 3 3 015 3 0 0030
210 01411 0 1 1 04 0 1 2 112
3 10065 02 00100 0 0 8

The last e
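The margin argument of apply and the count-to-percentage idiom used with table can both be tried on built-in data. This is a sketch only: iris$Species stands in for the meuse factors so that the example runs without the sp package.

```r
# Margin 2: one value per column (field)
col.medians <- apply(trees, 2, median)
print(col.medians)

# Margin 1: one value per row (one per observation)
row.max <- apply(trees, 1, max)
print(length(row.max))

# Counts per factor level, then the same as percentages
counts <- table(iris$Species)
print(round(100 * counts / length(iris$Species), 1))
```

The percentages are computed by dividing the table by the total number of observations, exactly as in the ffreq example.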
66. ages if you invent some new analysis.

All source code is published, so you can see the exact algorithms being used; also, expert statisticians can make sure the code is correct.

It can exchange data in MS-Excel, text, fixed and delimited formats (e.g. CSV), so that existing datasets are easily imported (§6), and results computed in R are easily exported (§7).

Most programs written for the commercial S-PLUS program will run unchanged, or with minor changes, in R (§2.3.1).

R has its disadvantages (although "every disadvantage has its advantage"):

1. The default Windows and Mac OS X graphical user interface (GUI) (§3.1) is limited to simple system interaction and does not include statistical procedures. The user must type commands to enter data, do analyses, and plot graphs. This has the advantage that the user has complete control over the system. The Rcmdr add-on package (§3.6) provides a reasonable GUI for common tasks, and there are various development environments for R, such as RStudio (§3.3).

2. The user must decide on the analysis sequence and execute it step-by-step. However, it is easy to create scripts with all the steps in an analysis, and run the script from the command line or menus (§3.5); scripts can be prepared in code editors built into GUI versions of R or separate front-ends such as Tinn-R (§3.4) or RStudio (§3.3). A major advantage of this approach is that intermediate results can be reviewed, and scripts can
67. [Figure 13: Trellis trivariate plots. Panels: "Levelplot, Maunga Whau Volcano, Auckland" and "Contourplot, Maunga Whau Volcano, Auckland"; axes: row, column; contour interval 10 m]

Note that the volcano data set is just a matrix of elevations:

> str(volcano)
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...

The levelplot method converts this into one response variable (the z values) and two predictors, i.e. the row and column of the matrix (the x and y values).

This example shows that high-level lattice methods do not themselves draw a graph; they return an object of class trellis which can be printed with the print method. R's default behaviour when working interactively (at the console) is to print the results of any expression except an assignment, so the casual user doesn't see this behaviour. It is however quite useful to place multiple graphs on the same page as illustrated here and explained in more detail in §5.4.

5.2.4 Panel functions

A Trellis plot must be constructed in one go, unlike in the base graphics package, where elements can be added later. Each additional element beyond the default is specified by a so-called panel function. For example, suppose we want to add a least-squares regression line as well as regression lines for each
68. al second values
zerodist   find point pairs with equal spatial coordinates

To find all packages loaded at a given time, use the sessionInfo command:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] gstat_1.0-10    spacetime_0.6-2 xts_0.8-6
[4] zoo_1.7-7       sp_0.9-98

loaded via a namespace (and not attached):
[1] grid_2.15.0    lattice_0.20-6 tcltk_2.15.0   tools_2.15.0

10.6 What statistical procedure should I use?

This of course depends on your application field, your research questions, and your dataset. You should always refer to textbooks and research papers. R is a computer program to carry out your computations, not a statistical wizard to design them.

Within R, each help page gives references to textbooks or articles which you should consult if you are unsure about the theory behind the method or its options. For example:

> help("glm")

glm                  package:stats                  R Documentation

Fitting Generalized Linear Models

Description:
'glm' is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.

References:
Dobson, A. J. (1990) An Introduction to Generalized Linear Models. London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized l
69. [Figure 16: Example of a colour ramp]

The key methods here are topo.colors to return a range of colours from dark blue through light pink; length to find the number of points and from this set the length of the colour ramp; and rank to return the rank of each sample, based on its cadmium content: the lowest pollution is rank 1, and so on to the most polluted with rank equal to the number of samples. Then the correct colour for each sample is extracted from the vector with the [] (subscript) operator.

6 Preparing your own data for R

R comes with many interesting example datasets (§3.8), but of course you are most interested in your own data. There are many ways to prepare this for R; a comprehensive guide is given in the online R Data Import/Export manual, also provided as a PDF file [37].

6.1 Preparing data directly in R

A small dataset may be entered directly in R in several ways. Here we illustrate this with the winning times from the 100 m footrace in the modern Olympic games [55], which we will assemble into a data frame, with the cases being each year and the fields being the year number, the men's winning time, and the women's winning time.

One method is to assig
70. and 3

we inform R that the row names are in the first column, and convert the default class (data frame) to a matrix with the as.matrix function. The attributes function shows the correct data type and attributes, here the dimension names for the rows and columns.

> cm <- as.matrix(read.csv("cm.csv", row.names=1))
> class(cm)
[1] "matrix"
> attributes(cm)
$dim
[1] 4 4

$dimnames
$dimnames[[1]]
[1] WA Bg Gn Dp

$dimnames[[2]]
[1] PAM Bg en Dp

6.4 Importing images

Some data, such as satellite imagery, is naturally organized as a matrix of data values on a regular grid. Many export formats write a header of several lines, giving information on the file contents, followed by the data values as one long vector or else with each row of the imagery as a row in the text file. An example is the ESRI asciigrid format.

Here is a small example of a 12 by 12 image, stored as the ASCII text file image.grd. We display it within R with the file.show function:

> file.show("image.grd")
NCOLS 12
NROWS 12
XLLCORNER 180000
YLLCORNER 331000
CELLSIZE 40
NODATA_VALUE -9999
141 148 159 166 176
143 153 160 170 177
147 154 164 172 182
151 159 166 176 183
153 163 170 177 188
153 164 175 182 189
153 164 176 186 193
157 164 176 188 198
159 169 176 188 199
159 170 180 188 199
163 170 182 192 199
166 175 182 193 204
186 188 189 193 195 199 201 205 209 211
71. and r for random samples.

2. The R name of the distribution, as listed above, e.g. norm for the normal distribution.

So the functions for the normal distribution are:

p  pnorm for the CDF;
d  dnorm for the density;
q  qnorm for the quantiles (inverse probability);
r  rnorm to draw random numbers.

For example, to find the proportion of people in a normally distributed population with mean height 170 cm and standard deviation 15 cm who are shorter than 200 cm, use pnorm:

> pnorm(200, 170, 15)
[1] 0.9772499

To plot the bell-shaped normal curve, use dnorm:

> q <- seq(-3, 3, by=.05)
> plot(q, dnorm(q), type="l", xlab="z", ylab="Prob(z)")

To find which normal score corresponds to a list of critical Type I error probabilities, use qnorm:

> alpha <- c(0.1, 0.05, 0.01, 0.001)
> qnorm(1 - alpha/2)
[1] 1.644854 1.959964 2.575829 3.290527

Finally, to simulate sampling ten individuals from a normally distributed population with mean height 170 cm and standard deviation 15 cm, with a measurement precision of 1 cm, use rnorm to draw the sample and then round the results to the nearest integer:

> sort(round(rnorm(10, 170, 15)))
 [1] 147 159 166 169 169 174 176 180 183 185

Each time this command is issued it will give different results, because the sample is random:

> sort(round(rnorm(10, 170, 15)))
 [1] 155 167 170 177 181 182 185 186 188 199

To start a simulation at the same point (e.g. for testing), use the set.seed funct
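The four function families and the reproducibility point from item 71 can be checked together. This is a minimal sketch; the seed value 42 is an arbitrary choice made for the illustration, not one taken from the text.

```r
# The p/q pair are inverses of each other for the same distribution
p <- pnorm(200, mean = 170, sd = 15)   # proportion shorter than 200 cm
q <- qnorm(p, mean = 170, sd = 15)     # recovers the height, 200
d <- dnorm(170, mean = 170, sd = 15)   # density at the mean

# Re-seeding the generator makes a "random" sample repeatable.
# The seed 42 is arbitrary (an assumption for this sketch).
set.seed(42)
s1 <- sort(round(rnorm(10, 170, 15)))
set.seed(42)
s2 <- sort(round(rnorm(10, 170, 15)))
identical(s1, s2)   # TRUE: same seed, same sample
```

Without the second call to set.seed, s2 would continue the random stream and generally differ from s1.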
72. aphics device to use for the next graphics output. For example, to compare scatterplots of tree volume vs. two possible predictors (height and girth) in adjacent windows:

> dev.set(2)
> plot(Height, Volume)
> dev.set(3)
> plot(Girth, Volume)

5.4 Multiple graphs in the same window

This depends on the graphics system: base or trellis (§5.2), as implemented in the lattice R package. You can determine which system is used by a given graphics command at the top of its help page. For example:

> ?boxplot

shows the page title as boxplot {graphics}, indicating that it's in the base graphics package, whereas

> ?xyplot

shows the page title as xyplot {lattice}, indicating that it's a trellis plot.

5.4.1 Base graphics

The parameters of the active graphics device are set with the par method. One of these parameters is the number of rows and columns of individual plots in the device. By default this is (1, 1), i.e. one plot per device. You can set up a matrix of any size with the par(mfrow=...) or par(mfcol=...) commands. The difference is the order in which figures will be drawn, either row- or column-wise.

For example, to set up a two-by-two matrix of four histograms, and fill them from left to right, top to bottom:

> # the 'par' method refers to the active device
> par(mfrow=c(2, 2))
> hist(rnorm(100))
> hist(rbinom(100, 20, .5))
> hist(rpois(100, 1))
> hist(runif(100))
> par(mfrow=c(1,1))
> # next plot will fill t
73. arameters in the model:

$$1 - \frac{n-1}{n-p}\,(1 - R^2)$$

The AIC (Akaike's An Information Criterion) function is often used to compare models; the lower AIC is better:

> AIC(lm(Volume ~ Height * Girth))
[1] 155.47
> AIC(lm(Volume ~ Height + Girth))
[1] 176.91
> AIC(lm(Volume ~ Girth))
[1] 181.64
> AIC(lm(Volume ~ 1))
[1] 264.53

In this example the successively more complicated models have lower AIC, i.e. provide more information.

4.18.1 Model diagnostics

The model summary (§4.18) shows the overall fit of the model and a summary of the residuals, but gives little insight into difficulties with the model (e.g. non-linearity, heteroscedasticity of the residuals). A graphical presentation of some common diagnostic plots is given by the plot function applied to the model object; note that the generic plot function in this case automatically calls the specific plot.lm function.

> model <- lm(Volume ~ Height * Girth, data=trees)
> par(mfrow=c(2,2))   # set up a 2x2 matrix for the 4 diagnostic graphs
> plot(model)
> par(mfrow=c(1,1))   # reset the graphics device to show just 1 graph

This shows four plots (Figure 4): (1) residuals vs. fitted; (2) normal Q-Q plot of the residuals; (3) scale vs. location of residuals; (4) residuals vs. leverage. The interpretation of these is explained in many regression textbooks. The par function sets up a matrix of graphs in one figure; here there are four diagnostic graphs produce
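The AIC comparison in item 73 can be made in one step by keeping the candidate models in a named list. This is a minimal sketch using the built-in trees data; the list names (null, girth, additive, interaction) are labels invented for this illustration.

```r
# Candidate models, from simplest to most complex.
# The list names are illustrative labels, not part of the original text.
models <- list(
  null        = lm(Volume ~ 1, data = trees),
  girth       = lm(Volume ~ Girth, data = trees),
  additive    = lm(Volume ~ Height + Girth, data = trees),
  interaction = lm(Volume ~ Height * Girth, data = trees)
)

# Apply AIC to each model; the result is a named numeric vector
aic <- sapply(models, AIC)
round(sort(aic), 2)

# Name of the model with the lowest AIC
best <- names(which.min(aic))
best
```

Passing data = trees to lm makes the sketch independent of whether trees has been attached.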
74. are 256³ = 16 777 216 possible colours.

Grey scales use the slightly different gray method; its argument is a vector of values between 0 and 1 giving the gray level:

> gray(seq(0, 1, by=.125))
[1] "#000000" "#202020" "#404040" "#606060" "#808080" "#9F9F9F"
[7] "#BFBFBF" "#DFDFDF" "#FFFFFF"
> gray(0:8/8)
[1] "#000000" "#202020" "#404040" "#606060" "#808080" "#9F9F9F"
[7] "#BFBFBF" "#DFDFDF" "#FFFFFF"
> gray(c(0, .2, .3, 1))
[1] "#000000" "#333333" "#4D4D4D" "#FFFFFF"
> col2rgb(gray(0.4))
      [,1]
red    102
green  102
blue   102

Note the American spelling of "colour".

Custom colour ramps can be produced with the hsv and rainbow methods; see their help for details.

All of these ramps can be indexed to get individual colours. They are most useful, however, when these colours are linked to data values. For example, to plot soil sample points in the meuse soil pollution data set, with the points coloured in order of their rank (from least to most polluted by cadmium):

> library(sp)
> data(meuse)
> attach(meuse)
> xyplot(y ~ x, data=meuse, asp="iso", pch=20, cex=2,
+   col=topo.colors(length(cadmium))[rank(cadmium)])

This plot is shown in Figure 16. Note the use of the asp argument to xyplot to specify the aspect (ratio of axes), in this case proportional to the two scales ("isometric").

[Plot title: Soil samples, Meuse; colour ramp by Cd value]
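The idea of linking a colour ramp to data values by rank, as in item 74, can be seen on a small vector. This is a minimal sketch; the five values in x are made up for the illustration, and ties.method = "first" is an added precaution so that tied values still yield whole-number ranks usable as indices.

```r
# Made-up measurements (an assumption for this sketch, not meuse data)
x <- c(11.7, 8.6, 6.5, 2.6, 2.8)

# One colour per observation, ordered along the ramp
ramp <- topo.colors(length(x))

# rank() gives 1 for the smallest value, so the smallest value
# receives the first colour of the ramp and the largest the last
cols <- ramp[rank(x, ties.method = "first")]
cols

# Grey levels are hexadecimal strings too: 0.4 * 255 = 102 = 0x66
gray(0.4)
```

With the default ties.method, tied values get fractional ranks such as 2.5, which would be truncated when used as subscripts.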
75. ave the ability to modify or construct other subroutines and evaluate the result as an integral part of the language itself." [38, Ch. 6]

This may seem quite exotic, but it has some practical applications even for the non-programmer R user, in addition to the deeper applications explained in the Definition.

For example, consider the problem of summarising a set of variables that are named B1, B2, ... B256 (perhaps reflectances from a hyperspectral sensor). To avoid writing m[1] <- mean(B1); m[2] <- mean(B2) etc., we'd like to loop through the numbers and form the variable name (with the B prefix and a number) and perform the operation. We do this in three steps:

1. Build up a syntactically-correct string to be evaluated, using the paste function;

2. Parse this into an R language object with the parse function;

3. Evaluate it with the eval function.

> # demonstrate how the string is built up
> paste("m[", b, "] <- mean(B", b, ")", sep="")
Error in paste("m[", b, "] <- mean(B", b, ")", sep = "") :
  object 'b' not found
> # must define a value for b to see how this works
> b <- 4
> paste("m[", b, "] <- mean(B", b, ")", sep="")
[1] "m[4] <- mean(B4)"
> # what does this look like as a parse language object?
> parse(text=paste("m[", b, "] <- mean(B", b, ")", sep=""))
expression(m[4] <- mean(B4))
> # initialise the results vector
> m <- NULL
> # evaluate this one expr
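The three steps in item 75 (paste, parse, eval) can be put into a loop as follows. This is a minimal sketch with only three short made-up vectors B1, B2, B3 instead of 256; the values are assumptions chosen so the means are easy to check by hand.

```r
# Made-up variables standing in for B1 ... B256
B1 <- c(1, 2, 3)
B2 <- c(10, 20)
B3 <- 5

m <- NULL   # initialise the results vector
for (b in 1:3) {
  # 1. build the command as a string, e.g. "m[2] <- mean(B2)"
  cmd <- paste("m[", b, "] <- mean(B", b, ")", sep = "")
  # 2. parse it into a language object; 3. evaluate it
  eval(parse(text = cmd))
}
m

# The same result without constructing code: look each variable up by name
m2 <- sapply(1:3, function(b) mean(get(paste0("B", b))))
m2
```

The get version is often easier to debug, because no code is hidden inside a string; the eval(parse()) form is shown because it is the technique the text describes.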
76. away, J. J. 2006. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Boca Raton: Chapman & Hall/CRC

[18] Fox, J. 1997. Applied regression, linear models, and related methods. Newbury Park: Sage

[19] Fox, J. 2002. An R and S-PLUS Companion to Applied Regression. Newbury Park: Sage

[20] Ihaka, R. & Gentleman, R. 1996. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5(3):299-314

[21] Knuth, D. E. 1992. Literate programming. CSLI lecture notes 27. Stanford, CA: Center for the Study of Language and Information

[22] Kopka, H. & Daly, P. W. 2004. Guide to LaTeX. Boston: Addison-Wesley, 4th edition

[23] Lamport, L. 1994. LaTeX: a document preparation system: user's guide and reference manual. Reading, MA: Addison-Wesley, 2nd edition

[24] Leisch, F. 2002. Sweave, part I: Mixing R and LaTeX. R News 2(3):28-31. URL http://CRAN.R-project.org/doc/Rnews/

[25] Leisch, F. 2006. Sweave User's Manual. Vienna (A): TU Wien, 2.7.1 edition. URL http://www.stat.uni-muenchen.de/~leisch/Sweave

[26] Metcalfe, A. V. & Cowpertwait, P. S. 2009. Introductory Time Series with R. Use R. Springer. DOI: 10.1007/978-0-387-88698-5

[27] Mevik, B.-H. 2006. The pls package. R News 6(3):12-17

[28] Murrell, P. 2005. R Graphics. Boca Raton, FL: Chapman & Hall/CRC. ISBN 1-58
77. be placed in the same window for display or printing (see §5.4) and several graphics windows can be opened at the same time (see §5.3).

To get a quick appreciation of R graphics, run the demonstration programs:

> demo(graphics)
> demo(image)
> demo(lattice)

5.1 Base graphics

A technical introduction to base graphics is given in Chapter 12 of [36]. Here we give an example of building up a sophisticated plot step-by-step, starting with the defaults and customizing.

The example is a scatter plot of petal length vs. width from the iris data set. A default scatterplot of two variables is produced by the plot.default method, which is automatically used by the generic plot command if two equal-length vectors are given as arguments:

> data(iris)
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0...
 $ Species     : Factor w/ 3 levels "setosa","ve...
> attach(iris)
> plot(Petal.Length, Petal.Width)

In this form, the x- and y-axes are given by the first and second arguments, respectively. The same plot can be produced using a formula (§4.17) showing the dependence, in which case the y-axis is given as the dependent variable on the left-hand side of the formula:

> plot(Petal.Width ~ Petal.Length)
78. can apply to a class, review the help, using the class?<class name> syntax:

> class?SpatialPointsDataFrame
> class?SpatialPoints
> class?Spatial

As this example shows, classes may be organised in an inheritance structure, so that the behaviour of a more generalised class is automatically inherited by a more specialised class; these are said to extend the base class.

For example, class SpatialPointsDataFrame extends class SpatialPoints, which in turn extends the base class Spatial. These have appropriate slots and methods:

Spatial: Only has a bounding box (slot bbox) and projection (slot proj4string); no spatial objects as such.

SpatialPoints: Also has co-ordinates (slot coords) of each point; at this level of hierarchy there are also class SpatialLines and class SpatialPolygons.

SpatialPointsDataFrame: Also has attributes (slot data) for each point.

SpatialGridDataFrame: Points are on a regular grid; inherited from class GridTopology as well as class SpatialPointsDataFrame.

The real power of this approach is seen when generic methods are applied to objects of the various classes; each class then specialises the generic method appropriately. For example, the spplot plotting method gives a different kind of plot for each class that inherits from class Spatial; we can see this with the showMethods method:

> showMethods(spplot)
Function: spplot
obj="SpatialPixelsDataFrame"
obj="SpatialGridDataFrame"
obj="SpatialP
79. closer"
[1] "Point 6 is not closer"
[1] "Point 7 is not closer"
[1] "Point 8 is not closer"
[1] "Point 9 is not closer"
[1] "Point 10 is not closer"
> print(paste("Closest point: x =", round(sample.pts[closest.pt, "x"], 4),
+   "; y =", round(sample.pts[closest.pt, "y"], 4)))
[1] "Closest point: x = 0.2061 ; y = 0.8003"
> plot(sample.pts, xlim=c(0,1), ylim=c(0,1), pch=20,
+   col=ifelse(row.names(sample.pts) == closest.pt, "red", "green"),
+   main="Finding the closest point")
> text(sample.pts, row.names(sample.pts), pos=3)
> points(new.pt, pch=20, col="blue", cex=1.2)
> text(new.pt, "target point", pos=3, col="blue")
> arrows(new.pt$x, new.pt$y, sample.pts[closest.pt, "x"],
+   sample.pts[closest.pt, "y"], length=0.05)

4.22 User-defined functions

An R user can use the function function to define functions with arguments, including optional arguments with default values. See the example in §C and a good introduction in Chapter 10 of [36]. These then are objects in the workspace which can be called.

For example, here's a function to compute the geometric mean of a vector; this is defined as

$$\left( \prod_{i=1}^{n} v_i \right)^{1/n}$$

where n is the length of the vector v, but is more reliably computed by taking logarithms, dividing by the length, and exponentiating:

[Figure 5: Finding the closest point]

> hm <- function(v) exp(sum(log(v))/lengt
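The log-based computation described in 4.22 (take logarithms, divide by the length, exponentiate) yields the n-th root of the product of the elements, i.e. the geometric mean; the harmonic mean is a different quantity, the reciprocal of the mean of reciprocals. The following minimal sketch defines both for comparison. The function names geo_mean and har_mean and the na.rm argument are assumptions introduced here to illustrate an optional argument with a default value.

```r
# Geometric mean via logarithms; requires all elements > 0.
# 'na.rm' is an optional argument with a default value.
geo_mean <- function(v, na.rm = FALSE) {
  if (na.rm) v <- v[!is.na(v)]
  exp(sum(log(v)) / length(v))
}

# Harmonic mean: reciprocal of the mean of the reciprocals
har_mean <- function(v, na.rm = FALSE) {
  if (na.rm) v <- v[!is.na(v)]
  length(v) / sum(1 / v)
}

v <- c(1, 10, 100)
geo_mean(v)                      # cube root of 1000
har_mean(v)
geo_mean(c(v, NA), na.rm = TRUE) # missing value dropped first
```

Working with logarithms avoids overflow or underflow in the product when the vector is long.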
80. ctor.

4.4 Vectorized operations and re-cycling

A very powerful feature of S is that most operations work on vectors or matrices with the same syntax as they work on scalars, so there is rarely any need for explicit looping commands (which are provided, e.g. for). These are called vectorized operations. As an example of vectorized operations, consider simulating a noisy random process:

> (sample <- seq(1, 10) + rnorm(10))
 [1] 0.1878978 1.6700122 2.2756831 4.1454326
 [5] 5.8902614 7.1992164 9.1854318 7.5154372
 [9] 8.7372579 8.7256403

This adds a random noise (using the rnorm function) with mean 0 and standard deviation 1 (the default) to each of the 10 numbers 1..10. Note that both vectors have the same length (10), so they are added element-wise: the first to the first, the second to the second and so forth.

If one vector is shorter than the other, its elements are re-cycled as needed:

> (samp <- seq(1, 10) + rnorm(5))
 [1] -1.23919739 0.03765046 2.24047546 4.89287818
 [5]  4.59977712 3.76080261 5.03765046 7.24047546
 [9]  9.89287818 9.59977712

This perturbs the first five numbers in the sequence the same as the second five.

A simple example of re-cycling is the computation of sample variance directly from the definition, rather than with the var function:

> (sample <- seq(1:8))
[1] 1 2 3 4 5 6 7 8
> (sample - mean(sample))
[1] -3.5 -2.5 -1.5 -0.5 0.5 1.5 2.5 3.5
> (sample - mean(sample))^2
[1] 12.25 6.25
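The variance computation that item 80 begins can be carried through to the end and compared with var. This is a minimal sketch; the variable names x, dev and v_def are chosen for the illustration (the text uses sample, which also masks the base function of that name).

```r
x <- 1:8

# The single number mean(x) is re-cycled against all eight elements
dev <- x - mean(x)

# Sample variance from the definition: sum of squares over (n - 1)
v_def <- sum(dev^2) / (length(x) - 1)
v_def
all.equal(v_def, var(x))

# Re-cycling of a shorter vector: c(0, 100) is repeated five times
shifted <- 1:10 + c(0, 100)
shifted
```

R re-cycles silently when the longer length is a multiple of the shorter one; otherwise it still re-cycles but issues a warning.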
81. d. In the following example we change the "superposition" symbol, which is used in the xyplot method to show groups of points, as in the example just above.

The current settings are shown graphically by the show.settings method.

> show.settings()
> options <- trellis.par.get()
> names(options)
 [1] "grid.pars"         "fontsize"          "background"
 [4] "clip"              "add.line"          "add.text"
 [7] "plot.polygon"      "box.dot"           "box.rectangle"
[10] "box.umbrella"      "dot.line"          "dot.symbol"
[13] "plot.line"         "plot.symbol"       "reference.line"
[16] "strip.background"  "strip.shingle"     "strip.border"
[19] "superpose.line"    "superpose.symbol"  "superpose.polygon"
[22] "regions"           "shade.colors"      "axis.line"
[25] "axis.text"         "axis.components"   "layout.heights"
[28] "layout.widths"     "box.3d"            "par.xlab.text"
[31] "par.ylab.text"     "par.zlab.text"     "par.main.text"
[34] "par.sub.text"
> options$superpose.symbol
$alpha
[1] 1 1 1 1 1 1 1

$cex
[1] 0.8 0.8 0.8 0.8 0.8 0.8 0.8

$col
[1] "#0080ff"   "#ff00ff"   "darkgreen" "#ff0000"   "orange"
[6] "#00ff00"   "brown"

$font
[1] 1 1 1 1 1 1 1

$pch
[1] 1 1 1 1 1 1 1

$fill
[1] "transparent" "transparent" "transparent" "transparent"
[5] "transparent" "transparent" "transparent"

> options$superpose.symbol$pch
[1] 1 1 1 1 1 1 1
> options$superpose.symbol$pch <- 20
> options$superpose.symbol$col <- c("blue", "green", "red", "magenta",
+   "cyan", "black", "grey")
> options$superpose.symbol$cex <- 1.4
> trellis.par.set("superpose.symbol", options$superpo
82. d by the default plot.lm function, so a 2x2 grid is set up. See §5.4 for more information.

Many other diagnostics are available. There are several functions to directly access these from the model itself: dfbetas, dffits, covratio, cooks.distance and hatvalues; methods rstandard and rstudent return the standardized and Studentized residuals, respectively. These diagnostic measures are explained in regression texts (e.g. [14, 18]); many of these were developed or expanded by Cook & Weisberg [8].

For example, to plot the "hat" values (leverage) for each point, with the number of the most influential observations:

> h <- hatvalues(model)
> plot(h, type="h")

[Diagnostic plot panels: Residuals vs Fitted (x-axis: Fitted values); Normal Q-Q (x-axis: Theoretical Quantiles); Scale-Location (x-axis: Fitted values); Residuals vs Leverage (with Cook's distance contours)]
83. d is written on several lines to show the matching braces and parentheses.

> apply(cm, 2, function(x) {
+   sapply(x, function(i) {
+     (i - mean(x))/sd(x)
+   } # end function
+   ) # end sapply
+ } # end function
+ ) # end apply

This particular result could have been better achieved with the scale function, which in addition to scaling a matrix column-wise (with or without centring, with or without scaling) returns attributes showing the central values (means) and scaling values (standard deviations) used:

> scale(cm)
          [,1]     [,2]     [,3]   [,4]
[1,]  1.381930 -0.10742 -0.24678 -0.689
[2,] -0.087464  1.39642 -0.44420 -0.053
[3,] -0.297377 -0.32225  1.46421  1.431
[4,] -0.997089 -0.96676 -0.77323 -0.689
attr(,"scaled:center")
[1] 15.25  4.50 15.75  5.25
attr(,"scaled:scale")
[1] 14.2916  4.6547 15.1959  4.7170

There are also functions to compute the determinant (det), eigenvalues and eigenvectors (eigen), the singular value decomposition (svd), the QR decomposition (qr), and the Choleski factorization (chol); these use long-standing numerical codes from LINPACK, LAPACK, and EISPACK.

4.7 Data frames

The fundamental S data structure for statistical modelling is the data frame. This can be thought of as a matrix where the rows are cases (called observations by S, whether or not they were field observations), and the columns are the variables. In standard database terminology, these are
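The claim in item 83 that scale gives the same result as the nested apply/sapply call can be checked directly. This is a minimal sketch; the 4 x 4 matrix m below is illustrative (its column means happen to agree with the centres printed above, but it should not be taken as the original cm object).

```r
# Illustrative 4 x 4 matrix, filled column by column (an assumption)
m <- matrix(c(35, 14, 11,  1,
               4, 11,  3,  0,
              12,  9, 38,  4,
               2,  5, 12,  2), nrow = 4)

# Column-wise standardisation written out by hand; the subtraction and
# division are vectorized, so the inner sapply is not needed
manual <- apply(m, 2, function(x) (x - mean(x)) / sd(x))

# The same with scale(), which also records what it used
scaled <- scale(m)
all.equal(manual, scaled, check.attributes = FALSE)
attr(scaled, "scaled:center")   # the column means
attr(scaled, "scaled:scale")    # the column standard deviations
```

The attributes make it possible to undo the transformation later, e.g. to express model predictions in the original units.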
84. d open computing environment, applicable to the most complex and sophisticated problems, as well as "routine" analysis, without any restrictions on access or use. Here is a description from the R Project home page:

"R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:

• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities."

The last point has resulted in another major feature:

• Practising statisticians have implemented hundreds of specialised statistical procedures for a wide variety of applications as contributed packages, which are also freely-available and which integrate directly into R.

Footnotes: http://www.itc.nl/personal/rossiter/pubs/list.html#pubs_m_R, item 2; http://www.r-project.org/

A few examples especially relevant to ITC's mission are:

• the gstat, geoR and spatial packages for geostatistical analysis, contributed by Pebesma [33], Ribeiro, Jr. & Diggle [39] and Ripley [40], respectively;

• the spatstat package for spatial point pa
85. ded, it starts up in another window, with its own menu system. You can run commands from these menus, but you can also continue to type commands at the R prompt. Figure 3 shows an R Commander screen shot.

[Figure 3: The R Commander screen. Menu bar at the top; a top panel showing commands submitted to R by the menu commands; a bottom panel showing the results after execution by R]

To use Rcmdr, you first import or activate a dataset using one of the commands on Rcmdr's Data menu; then you can use procedures in the Statistics, Graphs, and Models menus. You can also create and graph probability distributions with the Distributio
86. [Figure 21: A visualisation of the variability of small random samples. Each sample of 30 has been divided into ten histogram bins on [130, 230]. The blue (solid) normal curves all have μ = 180 and σ = 20; the red (dashed) normal curves are estimated from each sample. Note the bias (left or right of the blue curves) and variances (narrower or wider than the blue curves).]

C An example function

A more powerful approach than writing and sourcing a script is to write a function which is loaded into the workspace with the source function and then run as if it were a built-in R function. The main advantage is that you can make the function adaptable with a set of arguments (parameters that can be sent to the function), so you don't have to edit the script.

Here we have converted the script of Appendix B into a function, with only one required argument (the sample size) and six optional arguments which control aspects of the display, for example the number of samples to compare on one plot.

You prepare this with a plain-text editor in the sa
87. discrete multivariate analysis statistical techniques. Photogrammetric Engineering & Remote Sensing 49(12):1671-1678

[7] Cook, D.; Swayne, D. F.; & Buja, A. 2007. Interactive and Dynamic Graphics for Data Analysis: with R and GGobi. Use R. New York: Springer Verlag

[8] Cook, R. & Weisberg, S. 1982. Residuals and influence in regression. New York: Chapman and Hall

[9] Correa, J. C. & González, N. 2002. Gráficos Estadísticos con R. Medellín, Colombia: Universidad Nacional de Colombia, Sede Medellín, Posgrado en Estadística

[10] Crawley, M. J. 2007. The R book. Chichester: Wiley & Sons. URL http://www.dawsonera.com/depp/reader/protected/external/AbstractView/S9780470515068

[11] Dalgaard, P. 2002. Introductory Statistics with R. Springer Verlag

[12] Davis, J. C. 2002. Statistics and data analysis in geology. New York: John Wiley & Sons, 3rd edition

[13] Diggle, P. J. & Ribeiro Jr., P. J. 2007. Model-based geostatistics. Springer

[14] Draper, N. & Smith, H. 1981. Applied regression analysis. New York: John Wiley, 2nd edition

[15] Faraway, J. J. 2002. Practical Regression and Anova using R. Department of Mathematical Sciences, University of Bath (UK): self-published (web). URL http://www.maths.bath.ac.uk/~jjf23/book/

[16] Faraway, J. J. 2005. Linear models with R. Boca Raton: Chapman & Hall/CRC

[17] Far
88. dow to foreground, and then select the menu command History | Recording. After this all graphs are automatically saved within R, and you can move through them with the up and down arrow keys.

You can also write your graphics commands directly to a graphics file in many formats, e.g. PDF or JPEG. You do this by opening a graphics device, writing the commands, and then closing the device. You can get a list of graphics devices (formats) available on your system with ?Devices (note the upper-case D).

For example, to write a PDF file, we open a PDF graphics device with the pdf function, write to it, and then close it with the dev.off function:

> pdf("figure1.pdf", h=6, w=6)
> hist(rnorm(100), main="100 random values from N[0,1]")
> dev.off()

Note the use of the optional height= and width= arguments (here abbreviated h= and w=) to specify the size of the PDF file (in US inches); this affects the font sizes. The defaults are both 7 inches (17.78 cm).

3.2 Working with the R command line

These instructions apply to the simple R GUI and the R command line interface window within RStudio. One of the windows in these interfaces is the command line, also called the R console.

It is possible to work directly with the command line and no GUI:

• Under Linux and Mac OS X, at the shell prompt just type R; there are various startup options which you can see with R --help

• Under Windows

3.2.1 The command prompt

You perform most actions in R by
89. ds, and arguments. Commands can be sent directly from the editor to R, or saved in a file and sourced.

[Screenshot: the Tinn-R editor with an R script open, and the R console showing summary statistics (max, median, mean, range, var, sd, quantile) of meuse$lead]
90. e, the formula weight ~ group told the unstack function that group was the column with the new column names.

This process also works in reverse: when we have a frame with several variables to make into one, we use the stack function:

> pg.stacked <- stack(pg); str(pg.stacked)
'data.frame': 30 obs. of 2 variables:
 $ values: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 ...
 $ ind   : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 ...
> names(pg.stacked) <- c("weight", "group"); str(pg.stacked)
'data.frame': 30 obs. of 2 variables:
 $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 ...
 $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 ...

The stacked frame has two columns with default names: values (for the variables which were combined) and ind (for their names); these can be changed with the names function.

A more general function for data re-shaping is reshape.

4.11 Random numbers and simulation

R includes functions to evaluate a cumulative distribution function (CDF), the probability density function (PDF) and the quantiles, and to draw random samples, from a large number of distributions including the uniform (R name unif), normal (R name norm), Student's t (R name t), binomial (R name binom), Poisson (R name pois), and many others; see Chapter 8 of [36] for a complete list.

The names of the functions are built from two parts:

1. A prefix indicating the type of function: p for the CDF, d for the density, q for the quantile, 
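As a minimal sketch of the naming scheme just described, the prefixes can be combined with the root norm as follows; the same pattern applies to the other distribution roots. The r prefix (random draws) is the standard fourth member of the family, and the seed value is an arbitrary choice made here only so the draw is repeatable.

```r
# prefix + root: d/p/q/r combined with "norm" (normal distribution)
dens <- dnorm(0)       # density of N(0,1) at x = 0
prob <- pnorm(1.96)    # CDF: P(X <= 1.96)
quan <- qnorm(0.975)   # quantile function, the inverse of the CDF
set.seed(42)           # arbitrary seed, for repeatability only
samp <- rnorm(5)       # five random draws from N(0,1)

# the same prefixes work with other roots, e.g. the binomial
pb <- pbinom(10, size = 20, prob = 0.5)   # P(X <= 10) for Binomial(20, 0.5)
```

Note that pnorm and qnorm undo each other: qnorm(pnorm(1.96)) returns 1.96 up to rounding.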
91. e, to compute the row and column sums of the confusion matrix, use apply with the sum function as the function to be applied:

> (rsum <- apply(cm, 1, sum))
[1] 61 18 63 21
> (csum <- apply(cm, 2, sum))
[1] 53 39 64 7

These can be used, along with the diag function, to compute the producer's and user's classification accuracies, since diag(cm) gives the correctly-classified observations:

> (pa <- round(diag(cm)/csum, 2))
[1] 0.66 0.28 0.59 0.29
> (ua <- round(diag(cm)/rsum, 2))
[1] 0.57 0.61 0.60 0.10

The apply function has several variants: lapply and sapply to apply a user-written or built-in function to each element of a list, tapply to apply a function to groups of values given by some combination of factor levels, and by, a simplified version of this. For example, here is how to standardize a matrix "by hand", i.e., subtract the mean of each column and divide by the standard deviation of each column:

> apply(cm, 2, function(x) sapply(x, function(i) (i-mean(x))/sd(x)))
          [,1]     [,2]     [,3]   [,4]
[1,]  1.381930 -0.10742 -0.24678 -0.689
[2,] -0.087464  1.39642 -0.44420 -0.053
[3,] -0.297377 -0.32225  1.46421  1.431
[4,] -0.997089 -0.96676 -0.77323 -0.689

The outer apply applies a function to the second margin (i.e., columns). The function is defined with the function command.

The structure is clearer if braces are used (optional here because each function only has one command) and the comman
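The margin idea can be sketched on a small made-up matrix. The matrix m and the grouping factor g below are hypothetical values chosen for illustration; they are not the confusion matrix cm of the text. The sketch also checks the "by hand" standardization against the built-in scale function.

```r
# hypothetical 3 x 4 matrix, for illustration only
m <- matrix(1:12, nrow = 3)

row.sums <- apply(m, 1, sum)   # margin 1: one sum per row
col.sums <- apply(m, 2, sum)   # margin 2: one sum per column

# standardize each column by hand: subtract its mean, divide by its sd
z <- apply(m, 2, function(x) (x - mean(x)) / sd(x))

# the built-in scale() should give the same numbers
same <- isTRUE(all.equal(z, scale(m), check.attributes = FALSE))

# tapply: apply a function to groups defined by a factor
g <- factor(c("a", "a", "b", "b"))          # hypothetical grouping
grp.means <- tapply(c(1, 3, 10, 20), g, mean)
```

Because x is a whole column inside the anonymous function, the arithmetic is vectorized and the inner sapply of the original example is not strictly needed.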
92. e test.R, in a convenient directory.
4. Start R (if it's not already running).
5. In R, select menu command File | Source R code ...
6. In the file selection dialog, locate the file test.R that you just saved, changing directories if necessary, and select it; R will run the script.
7. Examine the output.

You can source the file directly from the command line. Instead of steps 5 and 6 above, just type source("test.R") at the R command prompt (assuming you've switched to the directory where you saved the script).

Appendix B contains an example of a more sophisticated script.

For serious work with R you should use a more powerful text editor. The R for Windows, R for Mac OS X and JGR interfaces include built-in editors; another choice on Windows is WinEdt. And both RStudio (§3.3) and Tinn-R (§3.4) have code editors. For the ultimate in flexibility and power, try emacs with the ESS (Emacs Speaks Statistics) module; learning emacs is not trivial but rather an investment in a lifetime of efficient computing.

21 What is the expected value of this correlation coefficient?
22 http://www.winedt.com/
23 http://en.wikipedia.org/wiki/Emacs
24 http://stat.ethz.ch/ESS/

3.6 The Rcmdr GUI

The Rcmdr add-on package, written by John Fox of McMaster University, provides a GUI for common data management and statistical analysis tasks. It is loaded like any other package, with the require method:

> require(Rcmdr)

As it is loa
93. ector and then extract the number with the as.numeric function. Note that we read the second line by specifying skip=1 and then nlines=1.

> ncol <- scan(file="image.grd", nlines=1, what="character")
> ncol
[1] "NCOLS" "12"
> ncol <- as.numeric(ncol[2])
> ncol
[1] 12
> (nrow <- as.numeric(scan(file="image.grd", skip=1, nlines=1, what="character")[2]))
[1] 12

Now we read the image data, as one vector:

> img <- scan(file="image.grd", skip=6, na.strings="-9999")
Read 144 items
> summary(img)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    141     176     202     200     222     256
> class(img)
[1] "numeric"

Although this small image has no missing values, in general we may expect them, so we need to specify the format of the missing values.

Finally, we specify the dimensions and format of the image, thereby promoting it to a matrix, with the matrix function. By default R interprets a matrix in column-major order (each column is a vector), but the image is stored by convention in row-major order; therefore we specify the optional byrow=T argument:

> img <- matrix(img, nrow, ncol, byrow=T)
> class(img)
[1] "matrix"
> str(img)
 num [1:12, 1:12] 141 143 147 151 153 153 153 157 159 159 ...

7 Exporting from R

Data frames are easily exported from R. For all the possibilities, see the R Data Import/Export manual. Here we explain the most common operation: exporting 
94. egression, using the lm method, and a robust regression, using the lqs method of the MASS package:

> abline(lm(Petal.Width ~ Petal.Length), lty="longdash", col="red")
> require(MASS)   # for lqs function
> abline(lqs(Petal.Width ~ Petal.Length), lty=2, col="blue")

This customized plot is shown in Figure 8.

[Figure 8: Custom scatterplot. Title "Anderson Iris data"; legend "Three species of Iris" (setosa, versicolor, virginica); x-axis "Petal length (cm)"; subtitle "Centroids: mean (green) and median (gray)"]

5.1.1 Mathematical notation in base graphics

To include mathematical notation in one of the graphics functions that draw text (for example text, axis or title), there is a special syntax explained in the help for the plotmath function. The trick is to turn the argument into an R expression by using the expression function; in this case the text functions interpret the argument as a mathematical expression, and the output will be formatted according to TeX-like rules. It is possible to produce many mathematical symbols, sub- or superscripts, fractions, operators, Greek letters etc.; run demo(plotmath) to see examples.

The exp
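A minimal sketch of the plotmath mechanism, assuming the built-in iris data and a throw-away PDF file: the labels, the title formula and the annotation position are illustrative choices made here, not taken from the text.

```r
# write to a temporary PDF so the sketch also runs non-interactively
out <- tempfile(fileext = ".pdf")
pdf(out, height = 4, width = 4)

plot(iris$Petal.Length, iris$Petal.Width,
     # paste() inside expression() mixes plain text and formatted symbols
     xlab = expression(paste("Petal length, ", italic(l), " (cm)")),
     ylab = expression(paste("Petal width, ", italic(w), " (cm)")))

# Greek letters, subscripts and superscripts in a title
title(main = expression(alpha + beta * x[i]^2))

# a summation with limits, placed as text inside the plot region
text(2, 2, expression(bar(x) == sum(x[i], i == 1, n) / n))

invisible(dev.off())   # close the device so the file is written
```

The argument to expression is not evaluated as arithmetic; it is only parsed, and the text-drawing functions typeset it.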
95. en. Note the format: upper-case variable name, language name exactly as in the list of available languages, no quotes, the whole thing after the program name.

See the R for Windows FAQ, Questions 2.2 and 2.15, for more details; this FAQ is available via the Help menu in RGui.

Mac OS X: The R for Mac OS X FAQ explains that R detects the language settings from the Language & Text section of the System Preferences, and presents translated messages and GUI if they are available in the selected language. The Formats information for numbers is also used.

This can be overridden on a system-wide basis (not per session) by setting the force.LANG defaults setting. For example, to force the US English text on a system running another language, open the Terminal (i.e., a shell) and enter this command:

defaults write org.R-project.R force.LANG en_US.UTF-8

Other languages and encodings can be specified with the International Organization for Standardization (ISO) standard for the identification of languages (ISO 639-1 or -2), and locales (ISO 3166-1).

B An example script

This is an example of a moderately complicated script, which gives both a numerical and a visual impression of the variability of small random samples. The output is shown in Figure 21. If you want to experiment with the script, cut and paste it into a text editor (for example, Tinn-R or the built-in editor of RStudio or R console), modify it as 
96. ents:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   69.3963    23.8358    2.91  0.00713
Height        -1.2971     0.3098   -4.19  0.00027
Girth         -5.8558     1.9213   -3.05  0.00511
Height:Girth   0.1347     0.0244    5.52  7.5e-06

Residual standard error: 2.71 on 27 degrees of freedom
Multiple R-Squared: 0.976, Adjusted R-squared: 0.973
F-statistic: 359 on 3 and 27 DF, p-value: <2e-16

This provides the most important information about the model; for more insight consult textbooks which explain linear modelling, e.g. Venables & Ripley [57], Fox [18], Christensen [4] or Draper & Smith [14]:

- The model formula with which lm was called;
- A summary of the residuals; by definition the mean is zero so is not reported;
- The model coefficients (first column);
- The standard error associated with each coefficient (second column); these can be used to construct confidence intervals;
- The t value (third column) and the significance level of each coefficient (last column); the latter is the probability of a t value at least this extreme if the null hypothesis (a true coefficient of zero) holds;
- The residual standard error, which is defined as the square root of the estimated variance of the random error: $\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i} r_i^2$, where $r_i$ is one of the $n$ residuals and $p$ is the number of coefficients;
- The coefficient of determination $R^2$, both the unadjusted fraction of variance explained by the model $R^2 = 1 - (\text{Residual SS}/\text{Total SS})$ and the coefficient adjusted for the number of p
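The two formulas in the list above can be checked by hand against what summary reports. This is a sketch on the built-in trees dataset with the interaction model shown in the coefficient table; the object names are chosen here for illustration.

```r
model <- lm(Volume ~ Height * Girth, data = trees)

r <- residuals(model)      # the n residuals r_i
n <- length(r)
p <- length(coef(model))   # number of coefficients, including the intercept

# residual standard error: square root of sum(r_i^2) / (n - p)
sigma.hat <- sqrt(sum(r^2) / (n - p))

# unadjusted R^2 = 1 - Residual SS / Total SS
total.ss <- sum((trees$Volume - mean(trees$Volume))^2)
r.squared <- 1 - sum(r^2) / total.ss

# both should agree with the values stored in the summary object
s <- summary(model)
c(by.hand = sigma.hat, reported = s$sigma)
c(by.hand = r.squared, reported = s$r.squared)
```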
97. [Figure 20: Results of an R site search]

Attend the useR! user's conference, every two years since 2004. The papers from these meetings are available on-line.

58 links to http://finzi.psych.upenn.edu/search.html
59 http://www.r-project.org/conferences.html

10 Frequently asked questions

10.1 Help! I got an error, what did I do wrong?

1. Read the error message carefully. Often it says exactly what is wrong. For example:

> x <- rnorm(100)
> summary(X)
Error in summary(X) : Object "X" not found

This means exactly what it says: there is no object named X in the workspace nor in attached data frames, so R could not find it when it tried to execute the summary method. In this case the solution is clear: the variable is a lower-case x, not an upper-case X:

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -2.750  -0.561  -0.070  -0.014   0.745   2.350

2. If the command involves an external file, make sure your R session is in the right directory, or else use the full path name. For example, the read.csv method requires a file name:

> dlv <- read.csv("dlv.csv")
Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open file 'dlv.csv'

The "unable to open connection" message
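One way to avoid the file error shown above is to check the working directory and the file's existence before reading. The following is only a sketch: the file is created in a temporary directory so the example is self-contained, and its contents are made up.

```r
# hypothetical file location and contents, created only for this sketch
csv.file <- file.path(tempdir(), "dlv.csv")
write.csv(data.frame(a = 1:3), csv.file, row.names = FALSE)

getwd()   # where R looks for relative file names

# read only if the file can be found; otherwise report the path that failed
if (file.exists(csv.file)) {
  dlv <- read.csv(csv.file)
} else {
  message("File not found: ", csv.file)
}
```

Using file.path to build the full path name sidesteps the question of which directory the session is in.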
98. ession.

> eval(parse(text=paste("m[", b, "] <- mean(B", b, ")", sep="")))
Error in mean(B4) : object "B4" not found
> # must define this variable to compute its mean
> B4 <- 0:100
> eval(parse(text=paste("m[", b, "] <- mean(B", b, ")", sep="")))
> # result so far
> m
[1] NA NA NA 50
> # apply to all 256 variables; need B1 .. B256 defined
> for (b in 1:256)
+   eval(
+     parse(
+       text =
+         paste("m[", b, "] <- mean(B", b, ")", sep="")
+     )
+   )

Note that the for expression in this example does not need to be written on separate lines; it is so written here for clarity.

The result of the parse function is an expression; it is also possible to create an R expression, ready to be evaluated with the eval function, with the expression function. For example:

> tmp <- expression(2*pi/r)
> r <- 3
> eval(tmp)
[1] 2.0944

5 R graphics

R provides a rich environment for statistical visualisation [28]. There are two graphics systems: the base system, in the graphics package, loaded by default when R starts, and the trellis system, in the lattice package.

R graphics are highly customizable; see each method's help page for details and, for base graphics, the help page for graphics parameters: ?par. Except for casual use, it's best to create a script (§3.5) with the graphics commands; this can then be edited, and it also provides a way to produce the exact same graph later.

Multiple graphs can 
99. ev    : num 7.91 6.98 7.80 7.66 7.48 ...
 $ dist   : num 0.00136 0.01222 0.10303 0.19009 ...
 $ om     : num 13.6 14 13 8 8.7 7.8
 $ ffreq  : int 1 1 1 1 1 1
 $ soil   : int 1 1 1 2 2 2
 $ lime   : int 1 1 1 0 0 0
 $ landuse: Factor w/ 2 levels "Ah","Ga": 1 1 1 2 1 2

R could determine that landuse is a factor (categorical variable), because it was non-numeric. It could also determine the variable names from the first row. The other factors were not recognized, and in fact they have different R data types, which we now assign, using the as.* functions to change data types:

> ds$soil <- as.factor(ds$soil)
> ds$ffreq <- as.ordered(ds$ffreq)
> ds$lime <- as.logical(ds$lime)
> str(ds)
'data.frame': 6 obs. of 10 variables:
 $ x      : int 181072 181025 181165 181298 ...
 $ y      : int 333611 333558 333537 333484 ...
 $ cadmium: num 11.7 8.6 6.5 2.6 2.8 3
 $ elev   : num 7.91 6.98 7.80 7.66 7.48 ...
 $ dist   : num 0.00136 0.01222 0.10303 0.19009 ...
 $ om     : num 13.6 14 13 8 8.7 7.8
 $ ffreq  : Ord.factor w/ 3 levels "1"<"2"<"3": 1 ...
 $ soil   : Factor w/ 3 levels "1","2","3": 1 1 1 ...
 $ lime   : logi TRUE TRUE TRUE FALSE FALSE FALSE
 $ landuse: Factor w/ 2 levels "Ah","Ga": 1 1 1 2 1 2

39 read.csv is a specialised wrapper to the very general read.table function, which can deal with tables in many other formats

Using the correct data type is important for many modelling methods; here we inform R that lime was either applied (logical TRUE, associated by R with the number 
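The effect of the as.* conversions can be seen on a tiny data frame. The frame below is hypothetical: it reuses the field names of the example but with made-up values, and is not the imported dataset.

```r
# hypothetical three-row frame with integer-coded fields
ds <- data.frame(soil = c(1, 1, 2), ffreq = c(1, 2, 3), lime = c(1, 0, 0))

ds$soil  <- as.factor(ds$soil)    # unordered categories
ds$ffreq <- as.ordered(ds$ffreq)  # ordered categories: 1 < 2 < 3
ds$lime  <- as.logical(ds$lime)   # 0 becomes FALSE, non-zero becomes TRUE

str(ds)

# ordered factors support comparisons; unordered factors do not
first.less <- ds$ffreq[1] < ds$ffreq[3]
```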
100. ew data frame, and the runif function to draw random numbers, by default in the range [0 ... 1].

34 also used in C and its derivatives such as C++ and Java; of Algol it has been aptly said that it was a great improvement over its successors

Note also the use of the ifelse function to select one of two alternatives, in this case the colour to plot, based on some truth condition.

We also plot the sample points and target point, showing the nearest point and shift. This uses the plot, points, text and arrows base graphics functions. The result is shown in Figure 5.

> # simulate ten sample points
> n.pts <- 10
> sample.pts <- data.frame(x = runif(n.pts), y = runif(n.pts))
> # simulate the given point
> new.pt <- data.frame(x = runif(1), y = runif(1))
> print(paste("Target point: x=", round(new.pt$x,4), ", y=", round(new.pt$y,4)))
[1] "Target point: x= 0.0756 , y= 0.8697"
> # the distance to the nearest point can't be further on a 1x1 grid
> min.dist <- sqrt(2); closest.pt <- NULL
> for (pt in 1:n.pts) {
+   d <- sqrt((sample.pts[pt,"x"] - new.pt$x)^2 + (sample.pts[pt,"y"] - new.pt$y)^2)
+   if (d < min.dist) {
+     min.dist <- d; closest.pt <- pt
+     print(paste("Point", pt, "is closer: d=", round(d,4)))
+   } else
+     print(paste("Point", pt, "is not closer"))
+ }
[1] "Point 1 is closer: d= 0.4066"
[1] "Point 2 is not closer"
[1] "Point 3 is closer: d= 0.1478"
[1] "Point 4 is not closer"
[1] "Point 5 is not 
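The same search can be sketched without an explicit loop, since the distance calculation is vectorized. This is an alternative formulation, not the script of the text; the seed is an arbitrary choice for repeatability, and the object names follow the loop version.

```r
set.seed(1)   # arbitrary seed, only so the sketch is repeatable
n.pts <- 10
sample.pts <- data.frame(x = runif(n.pts), y = runif(n.pts))
new.pt <- data.frame(x = runif(1), y = runif(1))

# distances from the target to all sample points at once
d <- sqrt((sample.pts$x - new.pt$x)^2 + (sample.pts$y - new.pt$y)^2)

closest.pt <- which.min(d)   # index of the nearest sample point
min.dist <- d[closest.pt]    # its distance to the target
```

The loop version is useful for showing control structures and printing progress; the vectorized version is shorter and is the more usual S style.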
101. exing operation.

> ss$ix[length(ss$ix)]
[1] 9

So the largest value in the sample sequence is found in the ninth position. This example shows how expressions may contain other expressions, and S evaluates them from the inside out, just like in mathematics. In this case:

- The innermost expression is ss$ix, which is the vector of indices in object ss;
- The next enclosing expression is length(...); the length function returns the length of its argument, which is the vector ss$ix (the innermost expression);
- The next enclosing expression is ss$ix[...], which converts the result of the expression length(ss$ix) to a subscript and extracts that element from the vector ss$ix.

The result is the array position of the maximum element. We could go one step further to get the actual value of this maximum, which is in the vector ss$x:

> samp[ss$ix[length(ss$ix)]]
[1] 9.599777

but of course we could have gotten this result much more simply with the max function as max(ss$x) or even max(samp).

4.6 Arrays and matrices

An array is simply a vector with an associated dimension attribute, to give its shape. Vectors in the mathematical sense are one-dimensional arrays in S; matrices are two-dimensional arrays; higher dimensions are possible.

For example, consider the sample confusion matrix of Congalton et al. [6], also used as an example by Skidmore [53] and Rossiter [43]:

Reference Class
A B C D
102. for loop to add one line for each species.

See ?panel.functions and the explanation of the panel parameter in ?xyplot for details.

5.2.5 Types of Trellis graphics plots

Table 3 lists the principal plot types; see the help for each one for more details.

Table 3: Trellis graphics plot types

Univariate
  assocplot     Association Plots
  barchart      bar plots
  bwplot        box and whisker plots
  densityplot   kernel density plots
  dotplot       dot plots
  histogram     histograms
  qqmath        quantile plots against mathematical distributions
  stripplot     1-dimensional scatterplot
Bivariate
  qq            q-q plot for comparing two distributions
  xyplot        scatter plots
Trivariate
  levelplot     level plots
  contourplot   contour plots
  cloud         3-D scatter plots
  wireframe     3-D surfaces
Hypervariate
  splom         scatterplot matrix
  parallel      parallel coördinate plots
Miscellaneous
  rfs           residual and fitted value plot
  tmd           Tukey Mean-Difference plot

5.2.6 Adjusting Trellis graphics parameters

The Trellis graphics environment as implemented in the lattice package sets reasonable defaults for its graphics parameters, based on the output device (screen, PDF file ...). Changing these requires three steps: (1) retrieve the parameters into a data structure in memory; (2) modify them; (3) write the modified parameters back as permanent options.

Parameters are retrieved with the trellis.par.get method and set with the trellis.par.set metho
103. function to repeat a sequence. Also note that an arithmetic expression (in this case * 1000) can be applied to an entire vector (in this case rep(seq(182, 183, length=5), 2)).

In practice, this data frame would probably be prepared outside R and then imported; see §6.

Adding rows to a data frame: The rbind ("row bind") function is used to add rows to a data frame, and to combine two data frames with the same structure. For example, to add one more plot to the data frame:

> (ds <- rbind(ds, c(183500, 381000, 15)))
        x      y n.trees
1  182000 381000      10
2  182250 310300      12
3  182500 381000      22
4  182750 310300       4
5  183000 381000      12
6  182000 310300      15
7  182250 381000       7
8  182500 310300      18
9  182750 381000       2
10 183000 310300      16
11 183500 381000      15

This can also be accomplished by directly assigning to the next row:

> ds[12,] <- c(183400, 381200, 18)
> ds
        x      y n.trees
1  182000 381000      10
2  182250 310300      12
3  182500 381000      22
4  182750 310300       4
5  183000 381000      12
6  182000 310300      15
7  182250 381000       7
8  182500 310300      18
9  182750 381000       2
10 183000 310300      16
11 183500 381000      15
12 183400 381200      18

Adding fields to a data frame: A vector with the same number of rows as an existing data frame may be added to it with the cbind ("column bind") function. For example, we could compute a height-to-girth ratio for the trees (a measure of a tree's shape) and add it as a new field to 
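A compact sketch of both operations on a cut-down frame: the first four rows reuse values from the listing above, while the derived field dense and its threshold of 10 trees are hypothetical, introduced only to show cbind.

```r
# a cut-down version of the plot data, first two rows only
ds <- data.frame(x = c(182000, 182250),
                 y = c(381000, 310300),
                 n.trees = c(10, 12))

# add a row with rbind ...
ds <- rbind(ds, c(182500, 381000, 22))
# ... or by assigning directly to the next row index
ds[nrow(ds) + 1, ] <- c(182750, 310300, 4)

# add a field with cbind; 'dense' is a made-up derived field
ds <- cbind(ds, dense = ds$n.trees > 10)
```

The new row must supply one value per existing field, in field order; the new field must have one value per existing row.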
104. g(meuse$lead) ~ meuse$ffreq)

Coefficients:
(Intercept)      ffreq2      ffreq3
      5.106      -0.666      -0.626

> meuse$ffreq <- as.ordered(meuse$ffreq)
> str(meuse$ffreq)
 Ord.factor w/ 3 levels "1"<"2"<"3": 1 1 1 1 1 1 1 1 1 1 ...
> contrasts(meuse$ffreq)
              .L       .Q
[1,] -7.0711e-01  0.40825
[2,] -9.0738e-17 -0.81650
[3,]  7.0711e-01  0.40825
> lm(log(meuse$lead) ~ meuse$ffreq)

Coefficients:
  (Intercept)  meuse$ffreq.L  meuse$ffreq.Q
        4.675         -0.443          0.288

The unordered factor has treatment contrasts (sometimes called "dummy variables"), whereas the ordered factor has orthogonal polynomial contrasts. These result in different fitted coefficients.

4.17.2 Analysis of Variance (ANOVA)

The results of linear models can be expressed in the traditional language of ANOVA as found in many textbooks with the aov function; this calls lm and formats its results in a traditional ANOVA table:

> model <- aov(Volume ~ Height + Girth, data=trees)
> class(model)
[1] "aov" "lm"
> summary(model)
            Df Sum Sq Mean Sq F value  Pr(>F)
Height       1   2901    2901     193 4.5e-14
Girth        1   4783    4783     317 < 2e-16
Residuals   28    422      15

Two models on the same dataset may be compared with the anova function; this is one way to test if the more complicated model is significantly better than the simpler one:

> model.1 <- aov(Volume ~ Height + Girth, data=trees)
> model.2 <- aov(Volume ~ Height * Girth, data=trees)
> anova(model.1, model.2)
Analysis of Variance Tab
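A sketch of such a comparison on the built-in trees dataset, assuming the two models are the additive model and the model with the Height by Girth interaction. lm is used here in place of aov; the object names cmp and p.value are chosen for illustration.

```r
# simpler model: additive effects only
model.1 <- lm(Volume ~ Height + Girth, data = trees)
# more complicated model: adds the Height:Girth interaction
model.2 <- lm(Volume ~ Height * Girth, data = trees)

# F-test of the extra term(s) in model.2
cmp <- anova(model.1, model.2)
cmp

# the second row holds the test of model.2 against model.1
extra.df <- cmp$Df[2]          # number of added coefficients
p.value <- cmp[2, "Pr(>F)"]    # small value: the interaction improves the fit
```

The test is only meaningful when the simpler model is nested within the more complicated one, as it is here.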
105. ght
[1] 70

The forms like $Height use the $ operator to select a named field within the frame. The forms like [1, 2] show that this is just a matrix with column names, leading to forms like trees[1, "Height"]. The forms like trees[1,]$Height show that each row (observation, case) can be considered a list with named items. The forms like trees[[2]] show that the data frame is also a list whose elements can be accessed with the [[ ]] operator.

Attaching data frames to the search path: To simplify access to named columns of data frames, S provides an attach function that makes the names visible in the outer namespace:

> attach(trees)
> Height[1:5]
[1] 70 65 63 72 81

Building a data frame: S provides the data.frame function for creating data frames from smaller objects, usually vectors. As a simple example, suppose the number of trees in a plot has been measured at five plots in each of two transects on a regular spacing. We enter the x-coördinate as one list, the y-coördinate as another, and the number of trees in each plot as the third:

> x <- rep(seq(182, 183, length=5), 2)*1000
> y <- rep(c(381000, 310300), 5)
> n.trees <- c(10, 12, 22, 4, 12, 15, 7, 18, 2, 16)
> (ds <- data.frame(x, y, n.trees))
        x      y n.trees
1  182000 381000      10
2  182250 310300      12
3  182500 381000      22
4  182750 310300       4
5  183000 381000      12
6  182000 310300      15
7  182250 381000       7
8  182500 310300      18
9  182750 381000       2
10 183000 310300      16

Note the use of the rep 
106. graphics argument  70  chisq test  50   chol  30   circular package  2  114  class  44  45   cloud  package  lattice   79  83  cm colors  88  coefficients  57   col  27   col graphics argument  70  86  col2rgb  88   colnames  92   colours  87  complete cases  63  contour  75   contourplot  package lattice   79  83  contr helmert  56  contr poly  56    133    contr sum  56  contr treatment  56  contrasts  56   cooks distance  59   coordinates  package sp   46  99  coplot  75   cor  63  111  113   cov  63   covratio  59   cut  45    data  18   data frame  32  64   datasets package  18   densityplot  package lattice   77  83  det  30   detach  34   dev off  10   dev set  85   dfbetas  59   dffits  59   diag  27  29   dim  26  38   dnorm  42   dotchart  75   dotplot  package lattice   83  duplicated  51   dvonmises  package circular   114    e1071 package  50  eigen  30  eval  68  73   exp  19  expression  68  73    factor  34 36  file show  96  filled contour  75  fitted  57 61   fix  92  93   floor  19 51  for  22  64  82  formula  57  fourfoldplot  75  function  29  65    gc  115   geoR package  2  106  getClass  46  47  getwd  110    glm  52  55  62   graphics package  69   gray  88   grid  71  72   GridTopology  package sp  class  48  groups trellis graphics argument  79  gstat package  2  6  17  94  99    hatvalues  59   heat colors  88   help  12  116  help search  12  111  114   hist  75  84   histogram  package lattice   83  86  hsv  89    I  54   identify  77   if    el
107. h(v)
> hm(1:99)
[1] 37.6231
> mean(1:99)
[1] 50

As it stands, this function does not check its arguments; it only makes sense for a vector of positive numbers:

> hm(c(-1, -2, 1, 2))
[1] NaN
Warning message:
NaNs produced in: log(x)

To correct this behaviour we write a multi-line function with a conditional statement and the return function to leave the function:

> hm <- function(v) {
+   if (!is.numeric(v)) {
+     print("Argument must be numeric"); return(NULL)
+   }
+   else if (any(v <= 0)) {
+     print("All elements must be positive"); return(NULL)
+   }
+   else return(exp(sum(log(v))/length(v)))
+ }
> class(hm)
[1] "function"
> hm
function(v) {
...
> hm(letters)
[1] "Argument must be numeric"
NULL
> hm(c(-1, -2, 1, 2))
[1] "All elements must be positive"
NULL
> hm(1:99)
[1] 37.6231

Note the use of the any function to check if any elements of the argument vector are not positive. Also note how simply typing the function name lists the function object (this is shorthand for the print function); to call the function you must supply the argument list.

Note also the use of the if ... else control structure for conditional execution; these can be nested, i.e. the statement for either the if or else can itself contain an if ... else control structure.

4.23 Computing on the language

As explained in the R Language Definition:

"R belongs to a class of programming languages in which subroutines h
108. he package in the usual way. For example, to get help on the variogram method of the gstat package, once this package has been loaded:

> ?variogram

3.8 Sample datasets

R comes with many example datasets (part of the default datasets package) and most add-in packages also include example datasets. Some of the datasets are classics in a particular application field; an example is the iris dataset used extensively by R A Fisher to illustrate multivariate methods.

To see the list of installed datasets, use the data method with an empty argument:

> data()

To see the datasets in a single add-in package, use the package= argument:

> data(package="gstat")

To load one of the datasets, use its name as the argument to the data method:

> data(iris)

The dataframe representing this dataset is now in the workspace.

4 The S language

R is a dialect of the S language, which has a syntax similar to ALGOL-like programming languages such as C, Pascal, and Java. However, S is object-oriented, and makes vector and matrix operations particularly easy; these make it a modern and attractive user and programming environment. In this section we build up from simple to complex commands, and break down their anatomy. A full description of the language is given in the R Language Definition [38] and a comprehensive introduction is given in the Introduction to R [36]. This section reviews the most outstanding features of S.

Al
109. he table of contents.

This will give you some familiarity with the style of R sessions and more importantly some instant feedback on what actually happens. Don't worry if you don't understand everything; this is just to give you a feel for how R works and what it can do. For individual commands, it is always best to look at the command's help topic.

The introductory tutorial or similar has been translated to many languages, including Chinese, Croatian, Farsi, French, German, Hungarian, Italian, Japanese, Spanish, Polish, Portuguese and Vietnamese; these can be accessed by following the "CRAN" link on the R Project home page, and then "Contributed documentation" and "other languages".

44 http://cran.r-project.org/web/views/index.html
45 http://cran.r-project.org/web/views/Spatial.html
46 http://cran.r-project.org/web/views/Multivariate.html
47 http://cran.r-project.org/other-docs.html
48 http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
49 http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

A good introduction to R concepts is the 100-page "R for Beginners" by Emmanuel Paradis of the University of Montpellier (F) [30]. This is also available in his native French [32] and in a Spanish translation [31]. Correa & González [9] is a Spanish-language introduction to R graphics. Dalgaard [11] is a clearly-written introductory statistics textbook, using R in all examples. Another useful introduction is  simple
110. he window

5.4.2 Trellis graphics

A trellis (§5.2) graphics window can also be split, but in this case the print method of the lattice package must be used on an object of class trellis (which might be built in the same command), using the split= optional argument to specify the position (x and y) within a matrix of plots. For all but the last plot on a page the more=T argument must be specified.

Repeating the previous example:

> print(histogram(rnorm(100)), split=c(1,1,2,2), more=T)
> print(histogram(rbinom(100, 20, .5)), split=c(2,1,2,2), more=T)
> print(histogram(rpois(100, 1)), split=c(1,2,2,2), more=T)
> print(histogram(runif(100)), split=c(2,2,2,2), more=F)

A more elegant way to do this is to create plots with any lattice method, including levelplot, xyplot, and histogram, but instead of displaying them directly, saving them (using the assignment operator <-) as trellis objects (this is the object type created by lattice graphics methods), and then print them with lattice's print method. The advantage is that the same plot can be printed in a group of several plots, alone, or on different devices, without having to be recomputed.

For example:

> h1 <- histogram(rnorm(100), col="lightblue")
> h2 <- histogram(rbinom(100, 20, .5), col="snow3")
> h3 <- histogram(rpois(100, 1), col="springgreen1")
> h4 <- histogram(runif(100), col="red4")
> print(h1, split=c(1,1,2,2), more=T)
> print(h2, split=c(2,1
111. ical packages
2.3.3 Special-purpose statistical programs
2.3.4 Spreadsheets
2.3.5 Applied mathematics programs

3 Using R
3.1 R console GUI
3.1.1 On your own Windows computer
3.1.2 On the ITC network
3.1.3 Running the R console GUI
3.1.4 Setting up a workspace in Windows
3.1.5 Saving your analysis steps
3.1.6 Saving your graphs
3.2 Working with the R command line
3.2.1 The command prompt
3.2.2 On-line help in R
3.3 The RStudio development environment
3.4 The Tinn-R code editor
3.5 Writing and running scripts
3.6 The Rcmdr GUI
3.7 Loading optional packages
3.8 Sample datasets

4 The S language
4.1 Command-line calculator and mathematical operators
4.2 Creating new objects: the assignment operator
4.3 Methods and their arguments
4.4 Vectorized operations and re-cycling
4.5 Vector and list data structures
4.6 Arrays and matrices
4.7 Data frames
4.8 Factors
112. ich in this two-predictor case is also the same as Height * Girth.

Removing terms: Sometimes it is convenient to specify a model and then remove a term from it with the - formula operator. As a somewhat artificial example, to model tree volume by only tree girth and its interaction with height:

> model <- lm(Volume ~ Height * Girth - Height, data=trees)

This is equivalent to:

> model <- lm(Volume ~ Girth + Girth:Height, data=trees)

The - formula operator is often used to remove the intercept; see below.

Nested models: The / operator is used to specify that the second-named predictor is nested within the first-named predictor, i.e. levels of the nested predictor are not independent factors. Nested models are often used in designed experiments such as split-plot designs or replicated measurements within an experimental unit. This is also used for the analysis of covariance (ANCOVA), where the covariates are nested within the treatment; in this case the intercept should be removed (see next).

No intercept: The intercept term (e.g. the mean) is implicit in model formulas. For regression through the origin, it must be explicitly removed with the - formula operator, in this case the implied intercept, with the expression -1. Or, the origin can be named explicitly with the + formula operator, with the expression +0. For example, it's certainly true that a tree with no girth has no height, so if we want to force the regressio
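The claim that removing a term with - gives the same model as writing the remaining terms out can be checked directly. The following is only a sketch using the built-in trees dataset; the object names m1, m2 and same.fit are introduced here for illustration and are not from the original text.

```r
# Sketch: two spellings of the same model should give the same fit.
data(trees)

# Full factorial model with the Height main effect removed
m1 <- lm(Volume ~ Height * Girth - Height, data = trees)

# The same terms written out explicitly
m2 <- lm(Volume ~ Girth + Girth:Height, data = trees)

# Both models have an intercept, a Girth term and one interaction term
n.coef <- c(length(coef(m1)), length(coef(m2)))
print(n.coef)

# Compare fitted values, ignoring names
same.fit <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m2))))
print(same.fit)
```

The term labels may be printed in a different order (Height:Girth versus Girth:Height), which is why the comparison is made on the fitted values rather than on the coefficient names.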
113. [Figure 10: Some interesting base graphics plots. Recoverable panel labels: Sepal.Width, Sepal.Length, Petal.Length, Petal.Width.]

Some packages have implemented their own variations of these and other plots, for example scatterplot and scatterplot.matrix in the car package and truehist in the MASS package.

5.1.4 Interacti
114. in turn, can be extended as class SpatialPixelsDataFrame.

S4 classes have named slots, marked by the @ sign in the output of str and also shown in the output of getClass; these can also be listed with the slotNames function:

> slotNames(meuse)
[1] "data" "coords.nrs" "coords" "bbox" "proj4string"

The contents of these slots are explained in the help for the S4 class:

> ?"SpatialPointsDataFrame-class"

Each slot is either a primitive S object (e.g. slot bbox, which is a matrix) or another S4 class (e.g. slot data, which is an object of class AttributeList, also defined by the sp package):

> class(meuse@bbox)
[1] "matrix"
> class(meuse@data)
[1] "AttributeList"
attr(,"package")
[1] "sp"

Slots may be accessed directly with the @ operator, just as fields in data frames are accessed with the $ operator. For example, to extract the bounding box (limiting co-ordinates) of this spatial data set:

> meuse@bbox
     min    max
x 178605 181390
y 329714 333611
> meuse@bbox["x", "min"]
[1] 178605

However, it is better practice to use access methods:

> bbox(meuse)
     min    max
x 178605 181390
y 329714 333611

Each S4 class has a set of methods that apply to it. bbox is an example that applies not only to objects of class SpatialPointsDataFrame, but to objects of the generalised class from which class SpatialPointsDataFrame is specialised, namely class SpatialPoints and class Spatial.

To determine the methods that
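Slots and access methods can be tried out without installing sp. The sketch below defines a small S4 class purely for illustration: the class BoxedPoints, its slots and the accessor boundingBox are hypothetical names and are not part of the sp package.

```r
# Sketch: a toy S4 class with two slots, loosely modelled on the idea
# of a spatial object that carries its own bounding box.
# "BoxedPoints" and "boundingBox" are hypothetical names.
setClass("BoxedPoints",
         representation(coords = "matrix", bbox = "matrix"))

pts <- cbind(x = c(1, 4, 2), y = c(5, 3, 9))

# Bounding box: one row per coordinate, columns min and max
bb <- t(apply(pts, 2, range))
colnames(bb) <- c("min", "max")

bp <- new("BoxedPoints", coords = pts, bbox = bb)

slot.names <- slotNames(bp)   # list the slots
print(slot.names)

x.min <- bp@bbox["x", "min"]  # direct slot access with @
print(x.min)

# An access method hides the internal layout from the caller
setGeneric("boundingBox", function(obj) standardGeneric("boundingBox"))
setMethod("boundingBox", "BoxedPoints", function(obj) obj@bbox)

y.max <- boundingBox(bp)["y", "max"]
print(y.max)
```

If the class author later renames or restructures the slot, code that calls the access method keeps working, whereas code that uses @ directly does not; this is the reason access methods are preferred.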
115. inear models. Chapter 6 of Statistical Models in S, eds. J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.

You can also look up technical terms in your favourite statistics textbook.

One of R's strongest points is that you are not limited to "textbook" or "routine" methods; you can use the most modern techniques that you see referenced in papers, usually with contributed packages. As a last resort, you can program a procedure yourself.

A Obtaining your own copy of R

You may want your own copy of R for your portable computer, your home computer, or your organisation's computer or network. This is free and legal; not only that, it is encouraged.

Within ITC, R, RStudio and Tinn-R can be installed via the Software Manager. The latter two should be installed after installing R itself.

From anywhere in the world, everything R is found via the R Project Home Page, which has links to:

• Download R and additional packages from CRAN (The Comprehensive R Archive Network); the first time you attempt to download you will be asked to select a mirror, i.e. one of the many servers that host the R distribution;
• Installation instructions;
• Manuals;
• Frequently Asked Questions (FAQ);
• The R Journal, including innovative statistical
116. ion:

> set.seed(61921)
> sort(round(rnorm(10, 170, 15)))
[1] 129 157 157 166 168 170 173 175 185 193
> set.seed(61921)
> sort(round(rnorm(10, 170, 15)))
[1] 129 157 157 166 168 170 173 175 185 193

Now the results are the same every time.

4.12 Character strings

R can work with character vectors, also known as strings. These are often used in graphics as labels, titles, and explanatory text. A string is created by the " quote operator:

> (label <- "A good graph")
[1] "A good graph"

Strings can be built from smaller pieces with the paste function; parts can be extracted or replaced with the substring function; strings can be split into pieces with the strsplit function:

> paste(label, ":", 15, "x", 20, "cm")
[1] "A good graph : 15 x 20 cm"
> (labels <- paste("B", 1:8, sep=""))
[1] "B1" "B2" "B3" "B4" "B5" "B6" "B7" "B8"
> substring(label, 1, 4)
[1] "A go"
> substring(label, 3) <- "nice"; label
[1] "A nice graph"
> strsplit(label, " ")
[[1]]
[1] "A"     "nice"  "graph"
> unlist(strsplit(label, " "))
[1] "A"     "nice"  "graph"
> unlist(strsplit(label, " "))[3]
[1] "graph"

Note the use of the unlist function to convert the list (of one element) returned by strsplit into a vector.

Numbers or factors can be converted to strings with the as.character function; however, this conversion is performed automatically by many functions, so an explicit conversion is rarely needed.

4.13 Objects a
117. ions such as plot and boxplot and trellis graphics (§5.2) functions such as levelplot.

The simplest form is where a (mathematically) dependent variable is (mathematically) explained by one (mathematically) independent variable; like all model formulae this uses the ~ formula operator to separate the left (dependent) from the right (independent) sides of the expression:

> (model <- lm(trees$Volume ~ trees$Height))

Call:
lm(formula = trees$Volume ~ trees$Height)

Coefficients:
 (Intercept)  trees$Height
      -87.12          1.54

This is to be read like a mathematical function, where the left-hand side is the result ("dependent") and the right-hand side is the expression ("independent"). In other words, a formula is read as:

• a response variable (left-hand side) ...
• ... is explained by (the ~ symbol) ...
• ... a formula including one or more predictor variables.

In this example, the tree volume is to be explained as a linear function of its height; this is just a first-order linear regression, with the best-fit least-squares line y = -87.12 + 1.54x: for every foot increase in height, the volume increases by 1.54 ft³; also, a zero-height tree would have a negative volume.

If the data frame has been attached, this can be written more simply:

> attach(trees)

Footnote: Nicely illustrating the risks of extrapolating outside the range of calibration; this data set only has trees from 63 to 87 feet tall, so the fitted relation says nothing
118. ir description, along with the packages where these methods are found, whether already loaded or not.

In the list shown as a result of the above method, we see the Binomial (stats) topic; we can get more information on it with the ? method; this is written as the ? character immediately followed by the method name:

> ?Binomial

This shows the named topic, which explains the rbinom (among other) methods.

Vignettes: Packages have a long list of methods, each of which has its own documentation as explained above. Some packages are documented as a whole by so-called vignettes; for now most packages do not have one, but more will be added over time.

You can see a list of the vignettes installed on your system with the vignette method with an empty argument:

> vignette()

and then view a specific vignette by naming it:

> vignette("sp")

Footnotes:
16 also available via the Help | Search help ... menu item
17 from the OED meaning "a brief verbal description of a person, place, etc.; a short descriptive or evocative episode in a play, etc."

3.3 The RStudio development environment

RStudio is an excellent cross-platform integrated development environment for R. A screenshot is shown in Figure 1. This environment includes the command line interface, a code editor, output graphs, history, help, workspace contents, and package manager all in one attractive interface. The typical use is: (1) open a script or start a ne
119. is taken by the more recent knitr package.

This approach has three steps: (1) preparing the source NoWeb document; (2) compiling this with Sweave into a LaTeX source file; (3) compiling the LaTeX source into a PDF document. Following is a brief introduction; for an extended discussion see Rossiter [47].

8.1 The NoWeb document

This is a plain text file with extension .Rnw ("R NoWeb"). It contains two kinds of commands:

1. LaTeX commands, as in a standard LaTeX document;
2. R commands, to be executed.

The R commands are placed in code chunks, delimited with the special symbols <<>>= to begin a chunk, and @ to end one.

Here is a minimal example; suppose this is in a text file named example.Rnw:

\documentclass{article}
\usepackage[pdftex, final]{graphicx}
\usepackage{Sweave}
\begin{document}
\title{Sweaving for reproducible data analysis}
\author{A Nonymous}
\date{14 November 2010}
\maketitle
Here is a simple example of R and Sweave.
<<>>=
data(trees)
str(trees)
summary(lm(Volume ~ Height*Girth, data=trees))$adj.r.squared
@
\end{document}

Footnotes:
41 http://www.latex-project.org/
42 http://yihui.name/knitr/

The LaTeX commands begin with \, e.g. \documentclass. The R code is written as if at the R console, but between the <<>>= (beginning a code chunk) and the @ (ending it), e.g. data(trees). Everything else is text, e.g. "Here is a simple example".

This is processed in R with the Sweave f
120. ition for the second subscript, so all columns are selected.

To make a list of the observation numbers where a certain condition is met, use the which function:

> which(trees$Volume > 60)
[1] 31
> trees[which(trees$Volume > 60), ]
   Girth Height Volume
31  20.6     87     77

Logical expressions may be combined with logical operators such as & (logical AND) and | (logical OR), and their truth sense inverted with ! (logical NOT). For example, to select trees with volumes between 20 and 40 cubic feet:

> tr <- trees[Volume >= 20 & Volume <= 40, ]

Note that &, like S arithmetical operators, is vectorized, i.e. it operates on each pair of elements of the two logical vectors separately.

Parentheses should be used if you are unclear about operator precedence. Since the logical comparison operators (e.g. >=) have precedence over binary logical operators (e.g. &), the previous expression is equivalent to:

> tr <- trees[(Volume >= 20) & (Volume <= 40), ]

Another way to select elements is to make a subset, with the subset function:

> (tr.small <- subset(trees, Volume < 18))
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
7  11.0     66   15.6

Selecting random elements of an array: Random elements of a vector can be selected with the sample function:

> trees[sort(sample(1:dim(trees)[1], 5)), ]
Girth Height Volume
11.4 76 21.4
13.3 86 27.4
1
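The selection methods described in this part (a logical vector, which, and subset) should all return the same rows. The sketch below checks this on the built-in trees dataset; it indexes with trees$Volume rather than relying on attach, and the object names are introduced here only for illustration.

```r
# Sketch: three ways to select trees with 20 <= Volume <= 40.
data(trees)

# Vectorized comparison; & combines the two logical vectors pairwise
in.range <- (trees$Volume >= 20) & (trees$Volume <= 40)

tr1 <- trees[in.range, ]                           # logical index
tr2 <- trees[which(in.range), ]                    # row numbers from which()
tr3 <- subset(trees, Volume >= 20 & Volume <= 40)  # subset() function

same.12 <- isTRUE(all.equal(tr1, tr2))
same.13 <- isTRUE(all.equal(tr1, tr3))
print(c(same.12, same.13))

# ! inverts the condition: these are the trees outside the range
tr.out <- trees[!in.range, ]
complete <- (nrow(tr1) + nrow(tr.out)) == nrow(trees)
print(complete)
```

The subset form is usually the easiest to read because field names can be written without the trees$ prefix.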
122. l the functions, packages and datasets mentioned in this section, as well as the rest of this note, are indexed (§C) for quick reference.

4.1 Command-line calculator and mathematical operators

The simplest way to use R is as an interactive calculator. For example, to compute the number of radians in one Babylonian degree of a circle:

> 2*pi/360
[1] 0.0174533

As this example shows, S has a few built-in constants, among them pi for the mathematical constant π. The Euler constant e is not built-in; it must be calculated with the exp function as exp(1).

If the assignment operator (explained in the next section) is not present, the expression is evaluated and its value is displayed on the console. S has the usual arithmetic operators +, -, *, /, ^ and some less-common ones like %% (modulus) and %/% (integer division). Expressions are evaluated in accordance with the usual operator precedence; parentheses may be used to change the precedence or make it explicit:

> 3 / 2^2 + 2 * pi
[1] 7.03319
> ((3 / 2)^2 + 2) * pi
[1] 13.3518

Spaces may be used freely and do not alter the meaning of any S expression.

Common mathematical functions are provided as functions (see §4.3), including log, log10 and log2 functions to compute logarithms, exp for exponentiation, sqrt to extract square roots, abs for absolute value, round, ceiling, floor and trunc for rounding and related operations, trigonometric functions such as sin, and inverse
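The less-common operators %% and %/% are named above but not demonstrated. The following short sketch shows them, together with one precedence case that often surprises new users; the variable names are illustrative only.

```r
# Sketch: modulus, integer division, and operator precedence.
r <- 17 %% 5     # modulus: the remainder after division
q <- 17 %/% 5    # integer division: the whole-number quotient
print(c(q, r))

# The two operators are consistent with each other
recombined <- q * 5 + r
print(recombined)

# Exponentiation binds more tightly than unary minus
a <- -2^2        # evaluated as -(2^2)
b <- (-2)^2      # parentheses make the intended base explicit
print(c(a, b))

# The Euler constant must be computed with exp()
e <- exp(1)
print(e)
```

When in doubt about precedence, adding parentheses costs nothing and documents the intent.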
123. le

Model 1: Volume ~ Height + Girth
Model 2: Volume ~ Height * Girth
  Res.Df RSS Df Sum of Sq    F  Pr(>F)
1     28 422
2     27 198  1       224 30.5 7.5e-06

In this case the interaction term of the more complicated model is highly significant.

4.18 Model output

The result of an lm (linear models) function is a data structure with detailed information about the model, how it was fitted, and its results. It can be viewed directly with the str function, but it is better to access the model with a set of extractor functions: coefficients to extract a list with the model coefficients, fitted to extract a vector of the fitted values (what the model predicts for each observation), residuals to extract a vector of the residuals at each observation, and formula to extract the model formula:

> model <- lm(Volume ~ Height * Girth, data=trees)
> coefficients(model)
 (Intercept)       Height        Girth Height:Girth
    69.39632     -1.29708     -5.85585      0.13465
> fitted(model)
      1       2       3       4       5 ...
 8.2311  9.9974 10.8010 16.3186 18.3800 ...
> residuals(model)
        1         2         3         4         5 ...
 2.068855  0.302589 -0.600998  0.081368  0.420047 ...
> formula(model)
Volume ~ Height * Girth

The results are best reviewed with the summary generic function, which for linear models is specialized into summary.lm:

> summary(model)

Call:
lm(formula = Volume ~ Height * Girth, data = trees)

Residuals:
   Min     1Q Median     3Q    Max
-6.582 -1.067  0.303  1.564  4.665

Coeffici
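The extractor functions described here fit together in a simple way: each observed value is the fitted value plus the residual. The sketch below checks that relation for the interaction model on the built-in trees dataset; the names recon, b and adj.r2 are introduced for illustration.

```r
# Sketch: using extractor functions instead of digging into the model object.
data(trees)
model <- lm(Volume ~ Height * Girth, data = trees)

# observed = fitted + residual, for every observation
recon <- fitted(model) + residuals(model)
consistent <- isTRUE(all.equal(unname(recon), trees$Volume))
print(consistent)

# Coefficients form a named vector, so they can be picked out by name
b <- coefficients(model)
print(round(b["Height:Girth"], 5))

# summary() returns an object whose components can also be extracted
adj.r2 <- summary(model)$adj.r.squared
print(adj.r2)
```

Accessing results through extractors keeps code working even if the internal layout of the model object changes between R versions.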
124. lty=2, col="blue")
> abline(h=median(Petal.Width), lty=2, col="blue")

The lty argument specifies the line type (style). These can be specified as a code (0=blank, 1=solid, 2=dashed, 3=dotted, 4=dotdash, 5=longdash, 6=twodash) or as a descriptive name ("blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash").

To add light gray dotted grid lines at the axis ticks, use the grid method:

> grid()

To add the mean and median centroids as large filled diamonds, use the points method:

> points(mean(Petal.Length), mean(Petal.Width), cex=2, pch=23, col="black", bg="red")
> points(median(Petal.Length), median(Petal.Width), cex=2, pch=23, col="black", bg="blue")

Titles and axis labels can be added with the title method, if they were not already specified as arguments to the plot method:

> title(sub="Centroids: mean (red) and median (blue)")

Text can be added anywhere in the plot with the text method; the first two arguments are the co-ordinates as shown on the axes, the third argument is the text, and optional arguments specify the position of the text relative to the co-ordinates, the colour, text size, and font:

> text(1, 2.4, "Three species of Iris", pos=4, col="navyblue")

A special kind of text is the legend, added with the legend method:

> legend(1, 2.4, levels(Species), pch=21:23, bty="n", col=c("slateblue", "firebrick", "darkolivegreen"))

The abline method can also add lines computed from a model, for example the least squares r
125. m must be written just that way, not as Rbinom or RBINOM (these might be different methods). Variables are also case-sensitive: x and X are different names.

Some methods produce output in a separate graphics window:

> hist(x)

3.2.2 On-line help in R

Individual methods: Both the base R system and contributed packages have extensive help within the running R environment.

In Windows, you can use the Help menu and navigate to the method you want to understand. You can also get help on any method with the ? method, typed at the command prompt; this is just a shorthand for the help method.

Footnotes:
14 This simulates, for example, the number of women who would be expected, by chance, to present their work at a conference where 20 papers are to be presented, if the women make up 40% of the possible presenters.
15 Your output will probably be somewhat different; why?

For example, if you don't know the format of the rbinom method used above, either of these two forms:

> ?rbinom
> help(rbinom)

will display a text page with the syntax and options for this method. There are examples at the end of many help topics, with executable code that you can experiment with to see just how the method works.

Searching for methods: If you don't know the method name, you can search the help for relevant methods using the help.search method:

> help.search("binomial")

will show a window with all the methods that include this word in the
126. me(expand.grid(Height = seq(50, 100, by=10), Girth = seq(5, 25, by=5)))
> pred <- predict(model, new.data, interval="prediction", level=0.99)
> # add the predictor values for easy interpretation
> pred <- cbind(new.data, pred)
> str(pred)
'data.frame': 30 obs. of 5 variables:
 $ Height: num 50 60 70 80 90 100 50 60 70 80 ...
 $ Girth : num 5 5 5 5 5 5 10 10 10 10 ...
 $ fit   : num 8.93 2.69 -3.55 -9.79 -16.03 ...
 $ lwr   : num -7.37 -9.37 -12.74 -18.89 -27.88 ...
 $ upr   : num 25.222 14.743 5.639 -0.685 -4.167 ...
> # fits for trees 50 feet tall
> pred[pred$Height==50, ]
       fit     lwr    upr Height Girth
1   8.9265 -7.3694 25.222     50     5
7  13.3109  3.0322 23.590     50    10
13 17.6952  5.7180 29.672     50    15
19 22.0796  2.6126 41.547     50    20
25 26.4639 -2.0348 54.963     50    25

Footnote 33: Some of these combinations would result in strange-looking trees.

4.19 Advanced statistical modelling

The lm function is the workhorse of modelling in S, because of the importance of linear models and its versatility. However, R has many other modelling functions, including:

• lm implements weighted least-squares if the weights are specified as an optional argument;
• loess for local fitting;
• glm for generalised linear models;
• rlm and lqs, in the MASS package, for robust fitting of linear models;
• lm.ridge, also in the MASS package, for ridge regression;
• nls for non-linear least-squares fitting;
• step for stepwise regression; this is
127. me way as a script, for example creating a file named plot_normals.R and then read it into R with the source function. This is a script; as it is executed it defines the function. Once the function is defined in the workspace (which you can verify with the ls function), you run it just like a built-in R function:

> source("plot_normals.R")
> ls()
[1] "plot.normals"
> plot.normals(60)
> plot.normals(60, mu=100, sd=15)
> plot.normals(60, rows=3, cols=3, mu=100, sd=15)

## function to visualise the variability of small random samples
## required arguments:
##   n = sample size
## arguments with reasonable defaults:
##   rows, cols = dimensions of display
##   mu, sd = mean, s.d. of normal distribution to sample
##   bsd = histogram bins to represent each s.d.
##   sdd = number of s.d. to display
plot.normals <- function(n, rows=2, cols=2, mu=0, sd=1, bsd=2, sdd=3.5) {
  # set up graphic display
  par(mfrow=c(rows, cols))
  # number of random samples
  reps <- rows*cols
  # histogram bin width
  bin.width <- sd/bsd
  # scale x-axis
  x.min <- mu - sdd*sd; x.max <- mu + sdd*sd
  # scale y-axis: max(dnorm(1,0)) = 0.3989;
  # adjust to sample and bin sizes, and normalize by s.d.,
  # and leave room for higher bars
  y.max <- n*0.5*bin.width/sd
  # compute and display each graph
  for (i in 1:reps) {
    v <- rnorm(n, mu, sd)
    hist(v, xlim=c(x.min, x.max), ylim=c(0, y.max), breaks = seq(
128. mu-5*sd, mu+5*sd, by=bin.width),
         main=paste("mu =", mu, ", sigma =", sd),
         xlab=paste("Sample", i), col="lightblue", border="gray",
         freq=TRUE)
    x <- seq(x.min, x.max, length=120)
    # true normal distribution
    points(x, dnorm(x, mu, sd)*(n*bin.width),
           type="l", col="blue", lty=1, lwd=1.8)
    # normal dist. estimated from sample
    points(x, dnorm(x, mean(v), sd(v))*(n*bin.width),
           type="l", col="red", lty=2, lwd=1.8)
    # print sample params. and Pr(Type I error)
    text(x.min, 0.9*y.max, paste("mean:", round(mean(v), 2)), pos=4)
    text(x.min, 0.8*y.max, paste("sdev:", round(sd(v), 2)), pos=4)
    text(x.min, 0.7*y.max,
         paste("Pr(t):", round((t.test(v, mu=mu))$p.value, 2)), pos=4)
  }
  # clean up
  par(mfrow=c(1,1))
}

References

[1] Albert, J. 2007. Bayesian computation with R. Use R! New York: Springer.

[2] Bivand, R. S.; Pebesma, E. J.; & Gómez-Rubio, V. 2008. Applied Spatial Data Analysis with R. UseR! Springer. URL http://www.asdar-book.org/

[3] Chambers, J. M. 1998. Programming with Data. New York: Springer. ISBN 0-387-98503-4. URL http://cm.bell-labs.com/cm/ms/departments/sia/Sbook/

[4] Christensen, R. 1996. Plane answers to complex questions: the theory of linear models. New York: Springer, 2nd edition.

[5] Cleveland, W. S. 1993. Visualizing data. Murray Hill, N.J.: AT&T Bell Laboratories; Hobart Press.

[6] Congalton, R. G.; Oderwald, R. G.; & Mead, R. A. 1983. Assessing landsat classification accuracy using
129. n:

> dim(trees)[1]
[1] 31
> floor(dim(trees)[1]*2/3)
[1] 20
> (tr.calib <- trees[sort(sample(1:dim(trees)[1], floor(dim(trees)[1]*2/3))), ])
   Girth Height Volume
2    8.6     65   10.3
4   10.5     72   16.4
...
30  18.0     80   51.0

If we now want to select the validation set, i.e. the remaining trees, we use the setdiff function on the two sets of row names, extracted with the rownames function:

> rownames(trees)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31"
> rownames(tr.calib)
 [1] "2"  "4"  "5"  "8"  "10" "11" "12" "13" "14" "16" "18" "20" "21" "22" "23"
[16] "26" "27" "28" "29" "30"
> setdiff(rownames(trees), rownames(tr.calib))
 [1] "1"  "3"  "6"  "7"  "9"  "15" "17" "19" "24" "25" "31"
> (tr.valid <- trees[setdiff(rownames(trees), rownames(tr.calib)), ])
   Girth Height Volume
1    8.3     70   10.3
3    8.8     63   10.2
6   10.8     83   19.7
...
31  20.6     87   77.0
> dim(tr.calib)
[1] 20  3
> dim(tr.valid)
[1] 11  3

The data frame has been split into mutually exclusive frames for calibration (20 observations) and validation (11 observations).

4.17 Statistical models in S

Statistical models in S are specified in symbolic form with model formulae. These formulae are arguments to many statistical functions, most notably the lm (linear models) and glm (generalised linear models) functions, but also in base graphics (§5.1) funct
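The calibration and validation split shown above depends on a random sample, so the rows selected will differ on each run. The sketch below repeats the same steps with a fixed seed so that the split is reproducible, and checks that the two sets do not overlap. The seed value is arbitrary and is not taken from this part of the text.

```r
# Sketch: a reproducible 2/3 : 1/3 split of the trees data frame.
data(trees)
set.seed(123)   # arbitrary seed, chosen here only for reproducibility

n <- dim(trees)[1]
n.calib <- floor(n * 2/3)

# Calibration set: a sorted random sample of row numbers
tr.calib <- trees[sort(sample(1:n, n.calib)), ]

# Validation set: the rows whose names are not in the calibration set
tr.valid <- trees[setdiff(rownames(trees), rownames(tr.calib)), ]

print(dim(tr.calib))
print(dim(tr.valid))

# The two sets must be mutually exclusive and together cover all rows
n.overlap <- length(intersect(rownames(tr.calib), rownames(tr.valid)))
print(n.overlap)
```

Fixing the seed is useful while developing an analysis; for a final assessment it may be preferable to repeat the split several times with different seeds.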
130. n a vector directly to a variable. In the case of regular data, this can be easily accomplished with the seq or rep functions. For irregular data, the c function must be used to create a vector:

> yr <- seq(1900, 2004, 4)   # produces 1900, 1904, 1908, ... 2004
> men <- c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
+   10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06,
+   10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85)

Note the + continuation prompt from R; as long as the list was not completed (not enough ) to match the open (), R could determine that the expression was not complete, and prompted for more input.

Also note the use of the special NA value to indicate a missing value; no Olympic games were held in 1916, 1940 or 1944, so there is no time for those years.

Vectors can also be entered with the scan function. This waits for you to type at the console, until a blank line is entered. It shows the number of the next data item as a continuation prompt:

> women <- scan()
1: NA NA NA NA NA NA NA
8: 12.2 11.9 11.5 NA NA
13: 11.9 11.5 11.5 11 11.4 11.08 11.07 11.08 11.06 10.97
23: 10.54 10.82 10.94 10.75 10.93
28:

When the 28 was displayed, the user pressed the "Enter" key, and R realised the list was complete.

The three vectors are then gathered into a data frame, and the local copies are discarded. By default the fields in the frame are named from the local

Footnote 37: In RGui, select menu command Help | Manuals | R
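The NA values entered above affect later calculations, so it helps to see how they behave. The sketch below re-enters the yr and men vectors from this part and shows that the missing times line up with the years in which no games were held; the names missing.years, m.all and m.obs are introduced for illustration.

```r
# Sketch: working with the missing values in the men's times.
yr <- seq(1900, 2004, 4)
men <- c(11, 11, 10.8, 10.8, NA, 10.8, 10.6, 10.8, 10.3, 10.3, NA, NA,
         10.3, 10.4, 10.5, 10.2, 10, 9.95, 10.14, 10.06,
         10.25, 9.99, 9.92, 9.96, 9.84, 9.87, 9.85)

# Both vectors must have the same length to be combined later
print(c(length(yr), length(men)))

# is.na() marks the missing entries; use it to find the years
missing.years <- yr[is.na(men)]
print(missing.years)

# Most summary functions return NA unless told to skip missing values
m.all <- mean(men)
m.obs <- mean(men, na.rm = TRUE)
print(c(m.all, m.obs))
```

Checking lengths before building a data frame catches data-entry slips such as a dropped value.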
131. n for the command to see what the command actually does. Pay especial attention to the arguments and any defaults. All methods have examples; experiment with these to make sure you understand what the method really does.

2. Look for the same commands in a tutorial or text and follow the examples.

3. Break down the command into smaller parts; make sure each part does what you think.

4. Experiment with test data or "toy" examples to understand how the command really works.

5. Look at the data structures with str and class; sometimes a structure is different from what you thought.

6. Look at your data with summary, head, or print. Maybe your data is not what you thought.

7. A common problem occurs when a variable defined in the workspace (also called a local variable) has the same name as a field in a data frame. The local variable is found by R when it looks for the name, and masks the field name.

> data(trees); str(trees)
'data.frame': 31 obs. of 3 variables:
 $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 ...
 $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 ...
> (Volume <- sample(1:31))
 [1]  6  7  3 21 22  1 24  9 11 14 19 13 20  8  5 30 29 31
[19] 18 10 23 15 12 27 16  2 26 17 28  4 25
> attach(trees)
> cor(Volume, Girth)
[1] 0.30725

This is not the expected value; it is very low, and we know that a tree's volume should be fairly well correlated with its girth. What is wrong
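The masking problem in item 7 can be reproduced and repaired in a few lines. The sketch below is one possible way to demonstrate it, using the built-in trees dataset; the names r.masked, r.true, r.fixed and r.with are introduced here for illustration.

```r
# Sketch: a workspace variable masks a data frame field of the same name.
data(trees)

Volume <- sample(1:31)    # a workspace variable named like a field
attach(trees)             # R reports that Volume is masked

# This uses the workspace Volume, not the field in trees
r.masked <- cor(Volume, Girth)

# Naming the data frame explicitly always gives the intended field
r.true <- cor(trees$Volume, trees$Girth)

# Removing the workspace variable uncovers the attached field
rm(Volume)
r.fixed <- cor(Volume, Girth)
detach(trees)

# with() avoids attach altogether and so avoids the problem
r.with <- with(trees, cor(Volume, Girth))

print(c(masked = r.masked, true = r.true, fixed = r.fixed, with = r.with))
```

Using with or the data argument of modelling functions is generally safer than attach, since the search order is then explicit.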
132. n of height on girth to go through (0, 0):

> model <- lm(Height ~ Girth - 1, data=trees)
> model <- lm(Height ~ 0 + Girth, data=trees)

Note: Although this seems logical, in the range of timber trees this may give a poorer predictive relation than allowing a (physically impossible) intercept and only predicting in the range of the calibration data.

Arithmetic operations in formulas: Since the characters +, -, *, / and ^ have special meaning in formulas, they must be "quoted" with the I operator if they are to be interpreted as arithmetic operators. For example, to model tree volume from the height to girth ratio:

> model <- lm(Volume ~ I(Height / Girth), data=trees)

To model volume as the square of girth:

> model <- lm(Volume ~ I(Girth^2), data=trees)

This is only needed if there is a danger of misinterpretation; most functions can be used directly in formulas, e.g. the log function to compute natural logarithms. For example, to fit a log-log regression of tree height by girth:

> model <- lm(log(Height) ~ log(Girth))

For further description of model formulae, see the help topic:

> ?formula

The design matrix: For full control of linear modelling, R offers the ability to extract or build design matrices of linear models; this is discussed in most regression texts, for example Christensen [4].

The design matrix of a model is extracted with the model.matrix function:

> model <- lm(Volume
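The effect of the I operator is easiest to see by fitting the same formula with and without it. The sketch below does this on the built-in trees dataset and also confirms that the two ways of removing the intercept agree; the model names are introduced here for illustration.

```r
# Sketch: formula operators versus arithmetic operators.
data(trees)

# Without I(), ^ is a formula operator, and Girth^2 reduces to Girth
m.plain <- lm(Volume ~ Girth^2, data = trees)

# With I(), ^ is arithmetic: the predictor is the square of girth
m.arith <- lm(Volume ~ I(Girth^2), data = trees)

names.plain <- names(coef(m.plain))
names.arith <- names(coef(m.arith))
print(names.plain)
print(names.arith)

# Two equivalent ways to force the regression through the origin
m.a <- lm(Height ~ Girth - 1, data = trees)
m.b <- lm(Height ~ 0 + Girth, data = trees)
same.slope <- isTRUE(all.equal(unname(coef(m.a)), unname(coef(m.b))))
print(same.slope)
```

Checking the coefficient names after fitting is a quick way to confirm that a formula was interpreted as intended.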
133. nd classes

S is an object-oriented computer language: everything in S (including variables, results of expressions, results of statistical models, and functions) is an object, each with a class, which says what the object is and also controls the way in which it may be manipulated. The class of an object may be inspected with the class function:

> class(lm)
[1] "function"
> class(letters)
[1] "character"
> class(seq(1:10))
[1] "integer"
> class(seq(1, 10, by=.01))
[1] "numeric"
> class(diag(10))
[1] "matrix"
> class(iris)
[1] "data.frame"
> class(iris$Petal.Length)
[1] "numeric"
> class(iris$Species)
[1] "factor"
> class(iris$Petal.Length > 2)
[1] "logical"
> class(lm(iris$Petal.Width ~ iris$Petal.Length))
[1] "lm"
> class(hist(iris$Petal.Width))
[1] "histogram"
> class(table(iris$Species))
[1] "table"

The letters built-in constant in this example is a convenient way to get the 26 lower-case Roman letters; for upper-case use LETTERS.

As the last three examples show, many S functions create their own classes. These then can be used by generic functions such as summary to determine appropriate behaviour:

> summary(iris$Petal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.60    4.35    3.76    5.10    6.90
> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50
> summary(lm(iris$Petal.Width ~ iris$Petal.Length))
Call:
lm(formula = iris$Petal.Width ~ iris$
134. nerate sequences, vectors of sequences, and sequences of dates for time series analysis.

• Look at the Task Views. These are a summary of the facilities in R to accomplish certain tasks (multivariate methods, spatial statistics, ...) including the names of the applicable packages and methods.

• Review the tutorials (§9.2); they cover many common methods.

• The contributed documentation at CRAN has many introductions and reference cards for specific kinds of analysis, for example the brief reference cards by Vito Ricci for regression analysis and time-series analysis.

• Many textbooks use R to illustrate their discussion (§9.3), e.g. [11, 19, 57, 15, 13]; you can adapt their examples to your needs.

• If you don't find what you're looking for, perhaps the method is in a contributed package which has not yet been installed on your system. You can search for it on CRAN (the R archive).

For example, the von Mises distribution is the circular analogue of the normal distribution [12, p. 322]. On R with the default installation, a search for this term with the help.search method will give no results:

> help.search("von Mises")
No help files found with alias or concept or title
matching 'von Mises'

However, you can search the R archive for the term "Von Mises" with the RSiteSearch method:

> RSiteSearch("von Mises")
> RSiteSearch("von Mises", restrict="functions")

The second form restricts the search to just func
135. ng with base graphics plots

If the output graphics device is a screen, e.g. as initialised with the windows method, it is possible to query the graph with the identify method for scatterplots. This reads the position of the graphics pointer when the left mouse button is pressed, and searches the coördinates given as its arguments for the closest point in the plot. If this point is close enough to the pointer, its index is added to a list to be returned once the right mouse button is pressed.

The coördinate pairs for identify are normally the same as for the scatterplot:

> plot(Petal.Length, Petal.Width)
> (p <- identify(Petal.Length, Petal.Width))
[1] 44 65 99
> iris[p,]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
44          5.0         3.5          1.6         0.6     setosa
65          5.6         2.9          3.6         1.3 versicolor
99          5.1         2.5          3.0         1.1 versicolor

This is quite useful for identifying unusual points in the plot.

5.2 Trellis graphics

The Trellis graphics system is a framework for data visualization developed at Bell Labs (where S originated), based on the ideas in Cleveland [5]. It was implemented in S-PLUS and then in R with the lattice package; its design and basic use are well explained by its author Sarkar [50] in the R Newsletter. It is harder to learn than R base graphics, but can produce higher-quality graphics, especially for multivariate visualisation when the relationship between variables changes with some grouping factor; this is called conditioning the graph on
136. nstructed in §A; as for a typical Windows program, this will create a Start menu item and a desktop shortcut.

3.1.2 On the ITC network

R has been installed on the ITC corporate network at:

\\Itcnt03\Apps\R\bin\RGui.exe

For most ITC accounts drive P: has been mapped to \\Itcnt03\Apps, so R can be accessed using this drive letter instead of the network address:

P:\R\bin\RGui.exe

You can copy this to your local desktop as a shortcut.

Documentation has been installed at:

P:\R\doc

3.1.3 Running the R console GUI

R GUI for Windows is started like any Windows program: from the Start menu, from a desktop shortcut, or from the application's icon in Explorer.

By default, R starts in the directory where it was installed, which is not where you should store your projects. If you are using the copy of R on the ITC network, you do not have write permission to this directory, so you won't be able to save any data or session information there. So, you will probably want to change your workspace, as explained in §3.1.4. You can also create a desktop shortcut or Start menu item for R, also as explained in §3.1.4.

To stop an R session, type q() at the command prompt, or select the File | Exit menu item in the Windows GUI.

3.1.4 Setting up a workspace in Windows

An important concept in R is the workspace, which contains the local data and procedures for a given stati

1 http://www.linuxlinks.com/article/20110306113701179/GUIsforR.html
137. ntified interactively by left-clicking on the vertical colour bar at the midpoint; right-click anywhere in the graph when done.

[Figure 15: Available colours (x-axis: colour number)]

> abline(h=0.5, lwd=3)   # to aim at
> selected <- identify(tmp, rep(0.5, length(tmp)))
> selected               # colour number
> colors()[selected]     # colour name

The Red, Green, Blue of any colour can be examined with the col2rgb method:

> col2rgb("yellow3")
      [,1]
red    205
green  205
blue     0

Single colours can be created with the rgb method, specifying Red, Green and Blue contributions each in the range 0 ... 1 (completely absent ... saturated):

> rgb(0.25, 0.5, 0)
[1] "#408000"

There are several built-in colour ramps (sequences of colours that give a pleasing visual impression); these are returned by the heat.colors, terrain.colors, topo.colors, and cm.colors methods (another palette is provided by the bpy.colors method of the sp package). These all return a vector of colours from defined endpoints, according to the number of levels requested:

> terrain.colors(5)
[1] "#00A600" "#E6E600" "#EAB64E" "#EEB99F" "#F2F2F2"
> terrain.colors(10)
 [1] "#00A600" "#2DB600" "#63C600" "#A0D600" "#E6E600" "#E8C32E"
 [7] "#EBB25E" "#EDB48E" "#F0C9C0" "#F2F2F2"
> terrain.colors(10)[1]
[1] "#00A600"

The hexadecimal codes here represent Red, Green, and Blue, from 00 (no colour) to FF (full colour); thus there
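The relation between the 0 ... 255 components reported by col2rgb and the hexadecimal string can be checked with a short sketch. The variable names y3, hex and back are illustrative only; maxColorValue is a standard argument of rgb that lets it accept the 0 ... 255 scale directly.

```r
y3 <- col2rgb("yellow3")                  # 3 x 1 matrix with rows red, green, blue
hex <- rgb(y3["red", 1], y3["green", 1], y3["blue", 1],
           maxColorValue = 255)           # components given on the 0-255 scale
back <- col2rgb(hex)                      # decode the hex string again
hex
```

Each pair of hexadecimal digits encodes one component, so 205 appears as CD and 0 as 00.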
138. 4.9 Selecting subsets
4.9.1 Simultaneous operations on subsets
4.10 Rearranging data
4.11 Random numbers and simulation
4.12 Character strings
4.13 Objects and classes
4.13.1 The S3 and S4 class systems
4.14 Descriptive statistics
4.15 Classification tables
4.16 Sets
4.17 Statistical models in S
4.17.1 Models with categorical predictors
4.17.2 Analysis of Variance (ANOVA)
4.18 Model output
4.18.1 Model diagnostics
4.18.2 Model-based prediction
4.19 Advanced statistical modelling
4.20 Missing values
4.21 Control structures and looping
4.22 User-defined functions
4.23 Computing on the language
5 R graphics
5.1 Base graphics
5.1.1 Mathematical notation in base gra

Version 4.0. Copyright © D G Rossiter 2003 – 2012. All rights reserved. Non-commercial reproduction and dissemination of the work as a whole freely permitted if this original copyright notice is included. To adapt or translate please contact the author.
139. ols <- 2; reps <- rows*cols
# parameters of the normal distribution
mu <- 180; sd <- 20
# set up graphic display
par(mfrow=c(rows, cols))
# number of s.d.'s for histogram display
sdd <- 3.5
# compute bin width from s.d. and
# the number of bars for each
bin.width <- sd/3
# scale x-axis
x.min <- mu - sdd*sd
x.max <- mu + sdd*sd
# scale y-axis
y.max <- n*0.5*bin.width/sd
# compute and display each graph
for (i in 1:reps) {
  v <- rnorm(n, mu, sd)
  hist(v, xlim=c(x.min, x.max), ylim=c(0, y.max),
       breaks = seq(mu-5*sd, mu+5*sd, by=bin.width),
       main="", xlab=paste("Sample", i))
  x <- seq(x.min, x.max, length=120)
  # true normal distribution
  points(x, dnorm(x, mu, sd)*(n*bin.width),
         type="l", col="blue", lty=1, lwd=1.8)
  # distribution estimated from sample
  points(x, dnorm(x, mean(v), sd(v))*(n*bin.width),
         type="l", col="red", lty=2, lwd=1.8)
  # print sample params. and Pr(Type I error)
  text(x.min, 0.9*y.max, paste("mean:", round(mean(v), 2)), pos=4)
  text(x.min, 0.8*y.max, paste("sdev:", round(sd(v), 2)), pos=4)
  text(x.min, 0.7*y.max,
       paste("Pr(t):", round((t.test(v, mu=mu))$p.value, 2)), pos=4)
}
# clean up
par(mfrow=c(1,1))
rm(n, rows, cols, reps, mu, sd, v, i, sdd, bin.width, x.min, x.max, y.max, x)

[Figure: histograms of the simulated samples (x-axis "Sample 1", ...), each headed "mu = 180, sigma = 20" and annotated with the sample mean, sdev and Pr(t)]
140. olygonsDataFrame"
obj = "SpatialLinesDataFrame"
obj = "SpatialPointsDataFrame"

For more on the S4 class system, see Chambers [3, Ch. 7 & 8] and Venables & Ripley [56, Ch. 5].

4.14 Descriptive statistics

Numeric vectors can be described by a set of functions with self-evident names, e.g. min, max, median, mean, length:

> data(trees); attach(trees)
> min(Volume); max(Volume); median(Volume);
+ mean(Volume); length(Volume)
[1] 10.2
[1] 77
[1] 24.2
[1] 30.17097
[1] 31

Another descriptive function is quantile:

> quantile(Volume)
  0%  25%  50%  75% 100%
10.2 19.4 24.2 37.3 77.0
> quantile(Volume, .1)
 10%
15.6
> quantile(Volume, seq(0, 1, by=.1))
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
10.2 15.6 18.8 19.9 21.4 24.2 27.4 34.5 42.6 55.4 77.0

The summary function applied to data frames combines several of these descriptive functions:

> summary(Volume)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.20   19.40   24.20   30.17   37.30   77.00

Some summary functions are vectorized and can be applied to an entire data frame:

> mean(trees)
   Girth   Height   Volume
13.24839 76.00000 30.17097
> summary(trees)
     Girth           Height       Volume
 Min.   : 8.30   Min.   :63   Min.   :10.20
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40
 Median :12.90   Median :76   Median :24.20
 Mean   :13.25   Mean   :76   Mean   :30.17
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30
 Max.   :20.60   Max.   :87   Max.   :77.00

Others are not, but can be applied to margins of a matrix or dat
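A note on running the data frame examples in a recent version of R: calling mean directly on a data frame no longer returns the column means there (it gives NA with a warning). The sketch below shows some alternatives that give per-column summaries; the result names col.means, col.medians and col.max are illustrative only.

```r
data(trees)
col.means <- colMeans(trees)            # means of all numeric columns at once
col.medians <- sapply(trees, median)    # apply any summary function to each column
col.max <- apply(trees, 2, max)         # margin 2 = columns (margin 1 = rows)
col.means
```

sapply treats the data frame as a list of columns, whereas apply first converts it to a matrix, so apply is only appropriate when all columns are of the same type.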
141. ometimes the same data must be arranged different ways into data frames, depending on what we consider the observations and columns.

A typical re-arrangement is stacking and its inverse, unstacking. In stacking, several variables are combined into one, coded by the original variable name; unstacking is the reverse.

For example, consider the data from a small plant growth experiment:

> data(PlantGrowth); str(PlantGrowth)
'data.frame': 30 obs. of 2 variables:
 $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 ...
 $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 ...

There were two treatments and one control, and the weights are given in one column. If we want to find the maximum growth in the control group, we could select just the controls and then find the maximum:

> max(PlantGrowth$weight[PlantGrowth$group == "ctrl"])
[1] 6.11

But we could also unstack this two-column frame into a frame with three variables, one for each treatment, and then find the maximum of one (new) column; for this we use the unstack function:

> pg <- unstack(PlantGrowth, weight ~ group); str(pg)
'data.frame': 10 obs. of 3 variables:
 $ ctrl: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14
 $ trt1: num 4.81 4.17 4.41 3.59 5.87 3.83 6.03 4.89 4.32 4.69
 $ trt2: num 6.31 5.12 5.54 5.5 5.37 5.29 4.92 6.15 5.8 5.26
> max(pg$ctrl)
[1] 6.11

The names of the groups in the unstacked frame become the names of the variables in the stacked fram
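The round trip between the two arrangements can be sketched as follows. The object name pg.long is illustrative; the column names values and ind are the defaults produced by the stack function.

```r
data(PlantGrowth)
pg <- unstack(PlantGrowth, weight ~ group)   # wide: one column per group
pg.long <- stack(pg)                         # long again: columns 'values' and 'ind'
str(pg.long)
max(pg.long$values[pg.long$ind == "ctrl"])   # same maximum as in the wide form
```

Here ind is a factor whose levels are the column names of the wide frame, so it plays the role of the original group field.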
142. ons menu.

When using Rcmdr, observe the commands it formats in response to your menu and dialog box choices. Then you can modify them yourself at the R command line or in a script.

Rcmdr also provides some nice graphics options, including scatterplots (2D and 3D) where observations can be coloured by a classifying factor.

3.7 Loading optional packages

R starts up with a base package, which provides basic statistics and the R language itself. There are a large number of optional packages for specific statistical procedures which can be loaded during a session. Some of these are quite common, e.g. MASS ("Modern Applied Statistics with S" [57]) and lattice (Trellis graphics [50], §5.2). Others are more specialised, e.g. for geostatistics and time-series analysis, such as gstat. Some are loaded by default in the base R distribution (see Table 4).

If you try to run a method from one of these packages before you load it, you will get the error message

Error: object not found

You can see a list of the packages installed on your system with the library method with an empty argument:

> library()

To see what functions a package provides, use the library method with the named argument. For example, to see what's in the geostatistical package gstat:

> library(help=gstat)

To load a package, simply give its name as an argument to the require method, for example:

> require(gstat)

Once it is loaded, you can get help on any method in t
143. package is loaded, the method is available:

> library(MASS)
> lqs(lead ~ om, data=meuse)
Coefficients:
(Intercept)           om
       19.9         15.4
Scale estimates 46.9 45.9

4. If the command which produced the error is compound, break it down into small pieces, beginning with the innermost command and then working outwards.

5. Review the documentation for the command; it may explain situations in which an error will be produced. For example, if we try to compute the non-parametric correlation between lead and organic matter in the Meuse data set, we get an error:

> library(gstat); data(meuse)
> cor(meuse$lead, meuse$om, method="spearman")
Error in cor(lead, om) : missing observations in cov/cor
> ?cor

The help for cor says: "If use is "all.obs", then the presence of missing observations will produce an error"; in the usage section it shows that use = "all.obs" is the default, so we must have some missing values. We can check for these with the is.na method:

> sum(is.na(lead)); sum(is.na(om))
[1] 0
[1] 2

There are two missing values of organic matter (but none of lead). Then we must decide how to deal with them; one way is to compute the correlation without the missing cases:

> cor(lead, om, method="spearman", use="complete")
[1] 0.59759

10.2 Why didn't my command(s) do what I expected?

Because R does what you said, not what you meant. Some ideas to make the two match:

1. Review the on-line documentatio
144. phics
5.1.2 Returning results from graphics methods
5.1.3 Types of base graphics plots
5.1.4 Interacting with base graphics plots
5.2 Trellis graphics
5.2.1 Univariate plots
5.2.2 Bivariate plots
5.2.3 Trivariate plots
5.2.4 Panel functions
5.2.5 Types of Trellis graphics plots
5.2.6 Adjusting Trellis graphics parameters
5.3 Multiple graphics windows
5.3.1 Switching between windows
5.4 Multiple graphs in the same window
5.4.1 Base graphics
5.4.2 Trellis graphics
5.5 Colours
6 Preparing your own data for R
6.1 Preparing data directly in R
6.2 A GUI data editor
6.3 Importing data from a CSV file
6.4 Importing images
7 Exporting from R
List of Figures
8 Reproducible data analysis
8.1 The NoWeb document
8.2 The LaTeX document
8.3 The PDF document
8.4 Graphics in Sweave
9 Learning R
9.1 Task views
9.2 R tutorials and introductions
9.3 Textbooks using R
9.4 Technical no
145. phics [51], dynamic visualization [7], wavelet analysis [29], and non-linear regression [41].

It is increasingly common for sophisticated statistics texts to use R to illustrate their concepts; an example is the geostatistics text of Diggle & Ribeiro Jr. [13], which uses the authors' geoR package [39]. So theory and practice go hand-in-hand.

The text of Shumway & Stoffer [52] on time series analysis uses R code for illustration; they also provide an on-line tutorial at the book's supporting website.

9.4 Technical notes using R

I have written several technical notes on statistical topics, using R to compute and graph; these are all available as PDF files on-line. If you work through these and use them as starting points for your own analysis, you will have a good basis in R. One note [48] is designed as a tutorial on R and the S language. Another [46] is designed to show as many R techniques as possible: exploratory data analysis, univariate statistics, bivariate correlation and regression, multivariate analysis including PCA, and some geostatistics. Others are more specialised:

• land cover change with logistic regression [49];

• assessing map accuracy [43];

• co-kriging [45];

• fitting rational functions to

50 http://www.r-project.org/
51 http://www.biosino.org/R/R-doc/
52 http://rbbs.biosino.org/Rbbs/forums/list.page
53 http://www.springer.com/series/6991?detailsPage=titles
54 as of 13-August-2012
146. re called factors and are of two types: unordered (nominal) and ordered (ordinal). An example of the first is a soil type, of the second soil structure grade, from "none" through "weak" to "strong" and "very strong"; there is a natural order in the second case but not in the first. Many analyses in R depend on factors being correctly identified; some such as table (§4.15) only work with categorical variables.

Factors are defined with the factor and ordered functions. They may be converted from existing character or numeric vectors with the as.factor and as.ordered functions; these are often used after data import if the read.table or related functions could not correctly identify factors; see §6.3 for an example. The levels of an existing factor are extracted with the levels function.

For example, suppose we have given three tests to each of three students and we want to rank the students. We might enter the data frame as follows (see also §6.1):

> student <- rep(1:3, 3)
> score <- c(9, 6.5, 8, 8, 7.5, 6, 9.5, 8, 7)
> tests <- data.frame(cbind(student, score))
> str(tests)
'data.frame': 9 obs. of 2 variables:
 $ student: num 1 2 3 1 2 3 1 2 3
 $ score  : num 9 6.5 8 8 7.5 6 9.5 8 7

We have the data but the student is just listed by a number; the table function won't work and if we try to predict the score from the student using the lm function (see §4.17) we get nonsense:

> lm(score ~ student, data=tests)
Coefficients:
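One way to make the student identifier categorical is sketched below, using the as.factor function mentioned above. This is an illustration under the same toy data, not necessarily the exact steps the text goes on to take; the name mean.by.student is chosen here.

```r
student <- rep(1:3, 3)
score <- c(9, 6.5, 8, 8, 7.5, 6, 9.5, 8, 7)
# convert the numeric identifier to a factor when building the data frame
tests <- data.frame(student = as.factor(student), score = score)
levels(tests$student)                      # the three students as categories
table(tests$student)                       # number of tests per student
mean.by.student <- tapply(tests$score, tests$student, mean)
coef(lm(score ~ student, data = tests))    # one contrast per student, not a slope
```

With a factor, lm estimates a separate effect for each student instead of a single slope on the student number.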
147. ression function turns its argument into an R expression, ready to be evaluated with the eval function (more on this in §4.23).

Figure 9 shows an example of a graph with superscripts (e.g. R²) and "math" notation (e.g. Greek letters such as β and a summation), produced with the following code. Note the use of the substitute function to place a numeric value (computed on the fly from a linear model result) into a text string, along with "math" symbols. Note also the alignment of the text with the pos ("position") argument to the text function.

> plot(single$SW ~ single$SA, pch=as.numeric(single$spp),
+      xlab=expression(paste("Stem area ", plain(cm)^2)),
+      ylab=expression(paste("Sapwood area ", plain(cm)^2)),
+      xlim=c(0,150), ylim=c(0,90))
> abline(0, coefficients(m.all.sw.sa.spp.i.st)[1])
> title(main="Sapwood area vs. stem area, smaller trees")
> for (i in 2:9) {
+   abline(0, coefficients(m.all.sw.sa.spp.i.st)[1] +
+             coefficients(m.all.sw.sa.spp.i.st)[i], lty=i-1)
+ }
> legend(x=0, y=90, legend=spp, pch=c(2:length(spp), 1),
+        lty=c(2:length(spp), 1))
> text(150, 0, "S formula: lm(SW ~ SA:spp - 1)", pos=2)
> text(150, 6, substitute(plain(R)^2 == x,
+      list(x=round(summary(m.all.sw.sa.spp.i.st)$adj.r.squared, 3))),
+      pos=2)
> text(150, 12, expression(paste(plain(Model), ": ", y == beta[1]*s +
+      sum(beta[2]*x[k], k==1, n) + epsilon)), pos=2)

[Figure 9: Scatterplot with math symbols, legend and model lines. Title: "Sapwood area vs. stem area, smaller trees"; y-axis: Sapwood area (cm²)]
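The same technique can be tried on a dataset that ships with R. The following sketch uses the trees data instead of the sapwood data of Figure 9; the axis labels, the names m, r2 and lab, and the use of an off-screen pdf device are choices made here for illustration.

```r
data(trees)
m <- lm(Volume ~ Girth, data = trees)
r2 <- round(summary(m)$adj.r.squared, 3)
# substitute() splices the computed value into an unevaluated 'math' expression
lab <- substitute(plain(R)^2 == x, list(x = r2))
pdf(NULL)   # off-screen device, so the sketch also runs without a display
plot(Volume ~ Girth, data = trees,
     xlab = expression(paste("Girth (", plain(inches), ")")),
     ylab = expression(paste("Volume (", plain(ft)^3, ")")))
abline(m)
# pos = 2 places the text to the left of the given point
text(max(trees$Girth), min(trees$Volume), lab, pos = 2)
dev.off()
```

To see the graph on screen, omit the pdf and dev.off calls so that the default device is used.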
148. s  URL http    www math csi cuny edu UsingR 106    132    Index of R concepts     operator  38     formula operator  53     operator  19  27     formula operator  53  54    operator  19     formula operator  53  54    operator  19     formula operator  54     operator  19     formula operator  53     operator  22      64      operator  37       operator  20     operator  20    gt   operator  38      11  12    Devices  10     operator  47    E 1  operator  30  32       operator  22  36  37  90      operator  28  55       operator  19      operator  19    amp  operator  38   A formula operator  53   A operator  19  23     formula operator  52  78    formula operator  78     operator  38     operator  24  32  47     operator  43      64      64    abline  71  72  abs  19   AIC  59   anova  57   any  61  67   aov  57  apply  29  49  61  arrows  65  71  as    94  as character  43  as data  frame  93  99  as factor  34  45    as matrix  96   as numeric  45  71  97   as ordered  34   asin  20   asp trellis graphics argument  89  assocp lot  75  83   attach  32   attr  26   AttributeList  package sp  class  47  attributes  26  96   auto key trellis graphics argument  79  axis  71  73    barchart  package  lattice   83  barplot  75   bbox  package sp   47   bg graphics argument  70  86  biplot  62   boot package  62   box  71   boxplot  52  75   bpy colors  package sp   88  bwplot  package  lattice   83  by  29  39    c  22  91  92   car package  36  77  cbind  33   ceiling  19   cex 
149. se  64  67  ifelse  65   image  75  influence measures  60  intersect  51   iplots package  121  iris dataset  18  39  69  75  77  is element  51  is factor  45  55  is na  112  is numeric  45    krige  package gstat   99    lapply  29   lattice package  5  17  69  77  81  82  85   86   legend  71  72   length  23  25  48  89   LETTERS constant  44  92   letters constant  44   levelplot  package  lattice   52  79   83  86   levelplot  81   levels  34  36   library  17  45   lines  71   list files  110   lin  35  52  55  57  58  62  72   Im ridge  package mass   62   loess  62   log  19  54    134    10910  19   10g2  19   lqs  package mass   62  111  lqs  72   Ts  21   Tty graphics argument  72    main graphics argument  70   MASS package  17  62  63  72  77  111  matplot  75   matrix  26  92  97   max  25  48   mean  48   median  48   memory limit  115  memory size  115   meuse dataset  46  50  56  89  94  99  meuse grid dataset  99   min  48   mle vonmises  package circular   114  model matrix  55   mosaicplot  75   mtext  71    na omit  63  names  31  41  nlme package  2  nls  62   nnet package  2    object size  115  options  56  121  order  34  ordered  34    pairs  75   palette  87   panel trellis graphics argument  81  panel abline  package lattice   82  panel fill  package lattice   81  panel xyplot  package lattice   82  par  59  85   parallel  package lattice   83  parse  67  68   paste  31  43  67   pch graphics argument  70   pdf  10   persp  75   pi constant  19 
150. se symbol
> xyplot(Sepal.Width ~ Sepal.Length, group=Species, iris)

There are a large number of options, each with sub-options. For example, the superposition symbol has a character code (pch), a vector of colours (col), a vector of fonts (font), and a character expansion fraction (cex). These can all be set and then written back as shown. Subsequent graphs use the changed parameters.

Colour options are often specified with colour ramps; see §5.5 for details.

5.3 Multiple graphics windows

To open several graphics windows at the same time, use the windows method. R opens the first graphics window automatically in response to the first graphics method such as plot or hist; in the following example we assume no such commands have yet been given.

> dev.list()
NULL
> windows()
> dev.list()
windows
      2
> dev.cur()
windows
      2

At this point, there is only one window and it is, of course, the current graphics device, i.e. number 2 (number 1 is reserved for the "null device"). The results of any plotting commands will be displayed in this window.

Now we open another window; it becomes the current window:

> windows()
> dev.list()
windows windows
      2       3
> dev.cur()
windows
      3

At this point, any plot commands will go to the most recently opened window, i.e. number 3.

5.3.1 Switching between windows

The dev.set method specifies which gr
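The windows method is only available under Windows. The bookkeeping of devices can be sketched on any platform with off-screen pdf devices, as below; the names first, second and current are illustrative, and pdf(NULL) is used only so that no files or windows are created.

```r
pdf(NULL); first <- dev.cur()     # open a device and note its number
pdf(NULL); second <- dev.cur()    # a second device; it becomes the current one
dev.set(first)                    # make the first device current again
current <- dev.cur()              # plotting commands would now go to 'first'
dev.off(second); dev.off(first)   # close both devices
```

Storing the value of dev.cur right after opening a device avoids having to remember device numbers.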
151. stics project. Under Windows this is usually determined by the folder from which R is started.

Under Windows, the easiest way to set up a statistics project is:

1. Create a shortcut to RGui.exe on your desktop.

2. Modify its properties so that it starts in your working directory rather than the default (e.g. P:\R\bin).

Now when you double-click on the shortcut, it will start R in the directory of your choice. So, you can set up a different shortcut for each of your projects.

Another way to set up a new statistics project in R is:

1. Start R as just described: double-click the icon for program RGui.exe in the Explorer.

2. Select the File | Change Directory ... menu item in R.

3. Select the directory where you want to work.

4. Exit R by selecting the File | Exit menu item in R, or typing the q() command; R will ask "Save workspace image?"; answer y (Yes). This will create two files in your working directory: .Rhistory and .RData.

The next time you want to work on the same project:

1. Open Explorer and navigate to the working directory.

2. Double-click on the icon for file .RData.

R should open in that directory, with your previous workspace already loaded. (If R does not open, instead Explorer will ask you what programs should open files of type .RData; navigate to the program RGui.exe and select it.)

12 This is a special case of the q method

Revealing hidden files in Windows

If you don't see the file .RData in your
152. ta as an Excel spreadsheet with named columns.

2. Export from Excel to a "CSV" file, using Excel's File | Save As ... menu item.

3. Import into R with the read.csv function.

4. Adjust data types in R if necessary.

We illustrate this with a simplified version of the meuse dataset from the gstat and sp packages, which has been prepared to illustrate some issues with import.

Here is a small CSV file written by Excel and viewed in a plain text editor such as Notepad:

x,y,cadmium,elev,dist,om,ffreq,soil,lime,landuse
181072,333611,11.7,7.909,0.00135803,13.6,1,1,1,Ah
181025,333558,8.6,6.983,0.0122243,14,1,1,1,Ah
181165,333537,6.5,7.8,0.103029,13,1,1,1,Ah
181298,333484,2.6,7.655,0.190094,8,1,2,0,Ga
181307,333330,2.8,7.48,0.27709,8.7,1,2,0,Ah
181390,333260,3,7.791,0.364067,7.8,1,2,0,Ga

Note that:

• There is one line per observation (record);

• Each record consists of the same number of fields, which are separated by commas;

• The first line has the same number of fields as the others but consists of the field names.

Suppose this file is named example.csv. To read into R, we first change to the directory where the file is stored and then read into an R object with the read.csv function:

> ds <- read.csv("example.csv")
> str(ds)
'data.frame': 6 obs. of 10 variables:
 $ x      : int 181072 181025 181165 181298 ...
 $ y      : int 333611 333558 333537 333484 ...
 $ cadmium: num 11.7 8.6 6.5 2.6 2.8 3
 $ el
153. tes using R
9.5 Web pages to learn R
9.6 Keeping up with developments in R
10 Frequently asked questions
10.1 Help! I got an error, what did I do wrong?
10.2 Why didn't my command(s) do what I expected?
10.3 How do I find the method to do what I want?
10.4 Memory problems
10.5 What version of R am I running?
10.6 What statistical procedure should I use?
A Obtaining your own copy of R
A.1 Installing new packages
A.2 Customizing your installation
A.3 R in different human languages
B An example script
C An example function
References
Index of R concepts

1 The RStudio screen
2 The Tinn-R screen
3 The R Commander screen
4 Regression diagnostic plots
5 Finding the closest point
6 Default scatterplot
7 Plotting symbols
8 Custom scatterplot
9 Scatterplot with math symbols, legend and model lines
10 Some interesting base graphics plots
11 Trellis density plots
12 Trellis scatter plots
13 Trellis trivariate plots
14 Trellis scatter plot with some added elements
154. the data frame; we illustrate this with the trees example dataset introduced in §4.7:

> attach(trees)
> HG.Ratio <- Height/Girth; str(HG.Ratio)
 num [1:31] 8.43 7.56 7.16 6.86 7.57 ...
> trees <- cbind(trees, HG.Ratio); str(trees)
'data.frame': 31 obs. of 4 variables:
 $ Girth   : num 8.3 8.6 8.8 10.5 10.7 10.8 ...
 $ Height  : num 70 65 63 72 81 83 66 75 80 75 ...
 $ Volume  : num 10.3 10.3 10.2 16.4 18.8 19.7 ...
 $ HG.Ratio: num 8.43 7.56 7.16 6.86 7.57 ...
> rm(HG.Ratio)

Note that this new field is not visible in an attached frame; the frame must be detached (with the detach function) and re-attached:

> summary(HG.Ratio)
Error: Object "HG.Ratio" not found
> detach(trees); attach(trees)
> summary(HG.Ratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.22    4.70    6.00    5.99    6.84    8.43

Sorting a data frame: This is most easily accomplished with the order function, naming the field(s) on which to sort, and then using the returned indices to extract rows of the data frame in sorted order:

> trees[order(trees$Height, trees$Girth),]
   Girth Height Volume
3    8.8     63   10.2
20  13.8     64   24.9
2    8.6     65   10.3
...
4   10.5     72   16.4
24  16.0     72   38.3
16  12.9     74   22.2
23  14.5     74   36.3
...
18  13.3     86   27.4
31  20.6     87   77.0

Note that the trees are first sorted by height, then any ties in height are sorted by girth.

4.8 Factors

Some variables are categorical: they can take only a defined set of values. In S these a
155. the factor for each object; this can be useful if we want to give a numeric argument derived from the factor.

4.13.1 The S3 and S4 class systems

S has two class systems, referred to as S3 (old-style) and S4 (new-style). Most R functions still use S3 classes, as presented in the previous section, but most new and rewritten packages use the more powerful and modern S4 classes. The difference between the two systems is readily apparent in the output of both the class and str functions.

A good example of S4 classes is the class structure of the sp spatial statistics package [34]. First we look at an R object that is an old-style (S3) class. Note the use of the require function instead of library; require attaches the package only if it is not already attached, and returns a logical value indicating whether this succeeded.

> require(sp)
> data(meuse)
> class(meuse)
[1] "data.frame"
> str(meuse)
'data.frame':   155 obs. of 14 variables:
 $ x      : num 181072 181025 181165 181298 181307 ...
 $ y      : num 333611 333558 333537 333484 333330 ...
 $ cadmium: num 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
 ...
 $ dist.m : num 50 30 150 270 380 470 240 120 240 420 ...

The sample data meuse is imported as an S3 class, namely a data.frame. Notice that the coordinates of each point are listed as fields. However, sp has defined some S4 classes to make the spatial nature of the data explicit.

sp also provides a coordinates function to set the spatial coordinates and thereby create explicit spatial data:

> coordinates(meuse
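The S3/S4 distinction described in excerpt 155 can be seen without installing sp. The sketch below uses only the methods package that ships with R; the class name Pt2D and its slots are hypothetical, invented for illustration, and are not part of sp.

```r
library(methods)

# An ordinary data frame is an S3 object.
df <- data.frame(x = c(181072, 181025), y = c(333611, 333558))
class(df)
isS4(df)

# A hypothetical S4 class with two numeric slots.
setClass("Pt2D", representation(x = "numeric", y = "numeric"))
p <- new("Pt2D", x = 181072, y = 333611)
isS4(p)

# S4 slots are accessed with @ rather than $.
p@x

# str reports a formal class and its slots.
str(p)
```

For the S4 object, str lists the slots of the formal class, whereas for the data frame it lists fields; this is the difference in output the excerpt refers to.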
156. the missing volume was removed.

4.21 Control structures and looping

S is a powerful programming language with Algol-like syntax and control structures.

Single statements may be grouped between the braces { and }, separated either by new lines or the command separator ;. The value of the { and } expression is the value of its last statement.

The if ... else control structure allows conditional execution. These can be nested, i.e. the statement for either the if or else can itself contain an if ... else control structure.

The for, while, and repeat functions allow looping. Note however that because of R's many vectorized functions, looping is much less common in R than in non-vectorized languages such as C.

For example, to convert the sample confusion matrix from counts to proportions, you might be tempted to try this:

> cmp <- cm
> for (i in 1:length(cm)) cmp[i] <- cm[i]/sum(cm)
> cmp
           [,1]       [,2]       [,3]        [,4]
[1,] 0.21472393 0.08588957 0.06748466 0.006134969
[2,] 0.02453988 0.06748466 0.01840491 0.000000000
[3,] 0.07361963 0.05521472 0.23312883 0.024539877
[4,] 0.01226994 0.03067485 0.07361963 0.012269939

But you can get the same result with a vectorized operation:

> cmp <- cm/sum(cm)

See Chapter 9 of [36] for more details.

An example of both the for and if ... else control structures is the following code to find the closest sample point to a given point. This also illustrates the data.frame function to create a n
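The equivalence of the loop and the vectorized form in excerpt 156 can be checked directly. The excerpt does not show cm itself, so the sketch below assumes a matrix of counts chosen to reproduce the printed proportions (the counts sum to 163); treat these values as a reconstruction rather than the tutorial's definition.

```r
# Assumed confusion matrix, back-calculated from the printed proportions.
cm <- matrix(c(35, 14, 11, 1,
                4, 11,  3, 0,
               12,  9, 38, 4,
                2,  5, 12, 2), nrow = 4, byrow = TRUE)

# Loop version: one element at a time.
cmp.loop <- cm
for (i in seq_along(cm)) cmp.loop[i] <- cm[i] / sum(cm)

# Vectorized version: one expression for the whole matrix.
cmp.vec <- cm / sum(cm)

# Both give the same matrix of proportions.
all.equal(cmp.loop, cmp.vec)
```

Besides being shorter, the vectorized form computes sum(cm) once, whereas the loop recomputes it for every element.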
157. this example, we applied the mean function to the Petal.Length field in the attached data frame, grouping the petal lengths by the Species categorical factor.

A function can be applied to several fields at the same time, and the results can be saved to the workspace:

> iris.m <- by(iris[,1:4], Species, mean)
> class(iris.m)
[1] "by"
> str(iris.m)
List of 3
 $ setosa    : Named num [1:4] 5.006 3.428 1.462 0.246
  ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" ...
 $ versicolor: Named num [1:4] 5.94 2.77 4.26 1.33
  ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" ...
 $ virginica : Named num [1:4] 6.59 2.97 5.55 2.03
  ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" ...
 - attr(*, "dim")= int 3
 - attr(*, "dimnames")=List of 1
  ..$ Species: chr [1:3] "setosa" "versicolor" "virginica"
 - attr(*, "call")= language by.data.frame(data = iris[, 1:4], INDICES = Species, FUN = mean)
 - attr(*, "class")= chr "by"
> iris.m$setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
> iris.m$setosa[3]
Petal.Length
       1.462
> iris.m$setosa["Petal.Length"]
Petal.Length
       1.462

As this example shows, the result is a list with one element for each level of the grouping factor (here, the Iris species). Each element is a vector with named elements (the names attribute).

4.10 Rearranging data

As explained above (§4.7), the data frame is the object class on which most analysis is performed. S
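The by call in excerpt 157 relies on mean accepting a data frame, which it did when the tutorial was written; in R 3.0.0 and later, mean applied to a data frame returns NA with a warning. A sketch of an equivalent call for current R, assuming only the built-in iris dataset, replaces mean with colMeans:

```r
# colMeans computes the mean of each column of the per-species sub-frame.
iris.m <- by(iris[, 1:4], iris$Species, colMeans)

class(iris.m)

# Extract one species, then one named element of its vector of means.
iris.m[["setosa"]]
iris.m[["setosa"]]["Petal.Length"]
```

The structure of the result is the same as in the excerpt: one named numeric vector per species. Naming the grouping factor as iris$Species also removes the need to attach the data frame.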
158. tions; by default, documentation and the R-help mailing list archives are also searched. This opens the search page http://search.r-project.org/ in a browser.

In this case, several matches are shown, including two packages. A review of their contents shows that the circular package for circular statistics is the more complete, so it should be installed on your system (§A.1).

Once the new package is installed, its contents are available to be searched, and this time the term "von Mises" is found. Several methods in the circular package are relevant:

60 http://cran.r-project.org/web/views/index.html
61 http://cran.r-project.org/other-docs.html
62 http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
63 http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
64 http://cran.r-project.org/

> help.search("von Mises")
dmixedvonmises(circular)             Mixture of von Mises Distributions
mle.vonmises(circular)               von Mises Maximum Likelihood Estimates
mle.vonmises.bootstrap.ci(circular)  Bootstrap Confidence Intervals
pp.plot(circular)                    von Mises Probability-Probability Plot
dvonmises(circular)                  von Mises Density Function

You can get help on any of these by loading the package and viewing the help:

> library(circular)
> ?dvonmises

10.4 Memory problems

Information on memory usage is returned by the object.size method for individual objects, and the gc (garbage collection
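The two memory functions named at the end of excerpt 158 can be tried on a throwaway object. This is a small sketch; the vector length of one million is arbitrary.

```r
# A numeric vector of one million doubles (8 bytes each, plus a small header).
x <- numeric(1e6)

# Size of one object, in bytes and in a more readable unit.
object.size(x)
format(object.size(x), units = "Mb")

# Remove the object, then ask R to collect garbage and report memory in use.
rm(x)
mem <- gc()
mem
```

gc returns a matrix summarising memory used by cons cells (Ncells) and by vectors (Vcells), so storing its result allows usage before and after an operation to be compared.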
159. tripchart 1-D Scatter Plots
sunflowerplot Sunflower Scatter Plots

Table 2: Base graphics plot types

Figure 10 shows examples of a boxplot, a conditioning plot, a pairwise scatterplot, and a star plot, all applied to the Anderson iris dataset.

> boxplot(Petal.Length ~ Species, horizontal=T,
+     col="lightblue", boxwex=.5,
+     xlab="Petal length (cm)", ylab="Species",
+     main="Grouped box plot")
> coplot(Petal.Width ~ Petal.Length | Species,
+     col=as.numeric(Species), pch=as.numeric(Species))
> pairs(iris[,1:4], col=as.numeric(Species),
+     main="Pairwise scatterplot")
> stars(iris[,1:4], key.loc=c(2,35), mar=c(2, 2, 10, 2),
+     main="Star plot of individuals", frame=T)

Figure 10 panels: Grouped box plot; conditioning plot (Given: Species); Pairwise scatterplot; Star plot of indiv
160. trix(1:15, 5, 3, byrow=T))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
[5,]   13   14   15
> (m <- matrix(1:5, 5, 3))
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
[4,]    4    4    4
[5,]    5    5    5

In this last example the shorter vector 1:5 is re-cycled as many times as needed to match the dimensions of the matrix; in effect it fills each column with the same sequence.

A matrix element's row and column are given by the row and col functions, which are also vectorized and so can be applied to an entire matrix:

> col(m)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3

The diag function applied to an existing matrix extracts its diagonal as a vector:

> (d <- diag(cm))
[1] 35 11 38  2

The diag function applied to a vector creates a square matrix with the vector on the diagonal:

> (d <- diag(seq(1:4)))
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    2    0    0
[3,]    0    0    3    0
[4,]    0    0    0    4

And finally diag with a scalar argument creates an identity matrix of the specified size:

> (d <- diag(3))
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

Arithmetic operators such as * operate element-wise on matrices as on any vector; if matrix multiplication is desired the %*% operator must be used:

> cm*cm
     [,1] [,2] [,3] [,4]
[1,] 1225  196  121    1
[2,]   16  121    9    0
[3,]  144   81 1444   16
[4,]    4   25  144    4
> cm%*%cm
     [,1] [,2] [,3] [,4]
[1,] 1415  748  857   81
[2,
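The difference between * and %*% in excerpt 160 can be reproduced with a matrix whose element-wise squares match the cm*cm output shown there. The matrix below is an assumption reconstructed from that output (the square roots of the printed entries), not a definition taken from the tutorial.

```r
# Assumed confusion matrix: square roots of the entries printed for cm*cm.
cm <- matrix(c(35, 14, 11, 1,
                4, 11,  3, 0,
               12,  9, 38, 4,
                2,  5, 12, 2), nrow = 4, byrow = TRUE)

# Diagonal as a vector.
diag(cm)

# Element-wise product: each entry is squared.
(cm * cm)[1, ]

# Matrix product: row 1 of cm times each column of cm.
(cm %*% cm)[1, ]

# Multiplying by the identity matrix from diag(4) leaves cm unchanged.
all.equal(cm %*% diag(4), cm)
```

The first row of the matrix product agrees with the first row printed for cm%*%cm in the excerpt, which supports the reconstruction of cm.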
161. ttern analysis and simulation;
  - the vegan package of ordination methods for ecology;
  - the circular package for directional statistics;
  - the sp package for a programming interface to spatial data;
  - the rgdal package for GDAL-standard data access to geographic data sources.

There are also packages for the most modern statistical techniques such as:

  - sophisticated modelling methods, including generalized linear models, principal components, factor analysis, bootstrapping, and robust regression; these are listed in §4.19;
  - wavelets (wavelet);
  - neural networks (nnet);
  - non-linear mixed-effects models (nlme);
  - recursive partitioning (rpart);
  - splines (splines);
  - random forests (randomForest).

2 Why R for ITC?

"ITC" is an abbreviation for University of Twente, Faculty of Geo-information Science & Earth Observation. It is a faculty of the University of Twente located in Enschede, the Netherlands, with a thematic focus on geo-information science and earth observation in support of development. Thus the two pillars on which ITC stands are development-related and geo-information. R supports both of these.

2.1 Advantages

R has several major advantages for a typical ITC student or collaborator:

1. It is completely free and will always be so, since it is issued under the GNU General Public License;
2. It is freely available over the internet, via a large network of mirror servers; see Appendix A for how to obtain R;
3
162. unction, which takes as its argument the name of the NoWeb file:

> Sweave("example.Rnw")

The result is a LaTeX input file, with the standard file extension .tex and the same first part of the name, where the NoWeb chunks have been evaluated and formatted as both input and output. In this example the file is named example.tex.

8.2 The LaTeX document

This is now just a standard LaTeX input file. R code has been formatted in a special Schunk LaTeX environment, provided by the LaTeX Sweave package, as specified in the \usepackage{Sweave} declaration in the preamble. In this example we have the following file:

\documentclass{article}
\usepackage[pdftex, final]{graphicx}
\usepackage{Sweave}
\begin{document}
\title{Sweaving for reproducible data analysis}
\author{A Nonymous}
\date{14 November 2010}
\maketitle
Here is a simple example of R and Sweave.
\begin{Schunk}
\begin{Sinput}
R> data(trees)
R> str(trees)
\end{Sinput}
\begin{Soutput}
'data.frame':   31 obs. of 3 variables:
 $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
\end{Soutput}
\begin{Sinput}
R> summary(lm(Volume ~ Height * Girth, data = trees))$adj.r.squared
\end{Sinput}
\begin{Soutput}
[1] 0.97285
\end{Soutput}
\end{Schunk}
\end{document}

Note how the input code has been parsed and placed in the special Sinput LaTeX environment
163. using the [ ] (select array elements) operator:

> trees[1:3,]
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
> trees[c(1, 3, 5),]
  Girth Height Volume
1   8.3     70   10.3
3   8.8     63   10.2
5  10.7     81   18.8
> trees[seq(1, 31, by=10),]
   Girth Height Volume
1    8.3     70   10.3
11  11.3     79   24.2
21  14.0     78   34.5
31  20.6     87   77.0

A negative subscript in this syntax excludes the named rows and includes all the others:

> trees[-(1:27),]
   Girth Height Volume
28  17.9     80   58.3
29  18.0     80   51.5
30  18.0     80   51.0
31  20.6     87   77.0

Selecting with a logical expression: The simplest way to select subsets is with a logical expression on the row subscript which gives the criterion. For example, in the trees example dataset introduced in §4.7, there is only one tree with volume greater than 60 cubic feet, and it is substantially larger; we can see these in order with the sort function:

> attach(trees)
> sort(Volume)
 [1] 10.2 10.3 10.3 15.6 16.4 18.2 18.8 19.1 19.7
[10] 19.9 21.0 21.3 21.4 22.2 22.6 24.2 24.9 25.7
[19] 27.4 31.7 33.8 34.5 36.3 38.3 42.6 51.0 51.5
[28] 55.4 55.7 58.3 77.0

To analyze the data without this "unusual" tree, we use a logical expression to select rows (observations), here using the < (less than) logical comparison operator, and then the [ ] (select array elements) operator to extract the array elements that are selected:

> tr <- trees[Volume < 60,]

Note that there is no cond
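The logical selection in excerpt 163 can be written so that the criterion is stored and inspected before it is used. This is a sketch of the same selection without attach; the name big is illustrative, and subset is shown as a base R alternative that the excerpt does not use.

```r
# Logical vector: TRUE for trees at or above the 60 cubic feet threshold.
big <- trees$Volume >= 60

# Which rows meet the criterion, and how many?
which(big)
sum(big)

# Keep the rows that do not meet it.
tr <- trees[!big, ]
nrow(tr)

# The same selection with subset(), which evaluates Volume inside the frame.
tr2 <- subset(trees, Volume < 60)
all.equal(tr, tr2)
```

Storing the logical vector makes it easy to check how many observations a criterion removes before committing to the reduced data frame.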
164. ut twice a year. This means the R user needs to invest some time to keep up with developments:

  - Read the R Journal (57). This is issued about four times a year, and is announced on the R home page. It is an attractive PDF document with news, announcements, tutorials, programmer's tips, bibliographies, and much more. Prior to 2009 it was known as the R Newsletter; this is archived along with the Journal. Many important developments in R were and are explained, and illustrated with examples, in the Journal and Newsletter.

  - Subscribe to one or more mailing lists; follow the "Mailing Lists" link on the R Project home page. The most relevant for most ITC users are:

      - R-announce: major announcements, e.g. new versions;
      - R-packages: announcements of new or updated packages;
      - R-help: discussion about problems using R, and their solutions. The R gurus monitor this list and reply as necessary. A search through the archives is a good way to see if your problem was already discussed. You can access this search via the Search link on the R home page, or from within R with the RSiteSearch method:

> RSiteSearch("logistic and ROC", restrict="Rhelp02a", sortby="date:late")
A search query has been submitted to http://search.r-project.org
The results page should open in your browser shortly

Figure 20 shows the results of this search.

57 http://journal.r-project.org/

[Figure 20: R site search results for "logistic and ROC"]
165. ve graphics environment called Trellis, which has been emulated with the lattice package in R (§5.2). S-PLUS is licensed by local distributors in each country at prices ranging from moderate to high, depending on factors such as type of licensee and application, and how many computers it will run on. The important point for ITC R users is that their expertise will be immediately applicable if they later use S-PLUS in a commercial setting.

2.3.2 Statistical packages

There are many statistical packages, including MINITAB, SPSS, Statistica, Systat, GenStat, and BMDP, which are attractive if you are already familiar with them or if you are required to use them at your workplace. Although these are programmable to varying degrees, it is not intended that specialists develop completely new algorithms. These must be purchased from local distributors in each country, and the purchaser must agree to the license terms. These often have common analyses built in as menu choices; these can be convenient, but it is tempting to use them without fully understanding what choices they are making for you.

SAS is a commercial competitor to S-PLUS, and is used widely in industry. It is fully programmable with a language descended from PL/I (used on IBM mainframe computers).

2.3.3 Special-purpose statistical programs

Some programs address specific statistical issues, e.g. geostatistical analysis and interpolation (SURFER, gslib, GEO-EAS), ecological analysis (FRA
166. w script;
  2. change the working directory to this script's location;
  3. write R code in the script;
  4. pass lines of code from the script to the command line interface and evaluate the output;
  5. examine any graphs and save for later use.

[Screenshot: the RStudio screen, with an R script open in the editor, the Workspace and History panes, and the Files, Plots, Packages and Help tabs]
167. xample is the cross-classification or contingency table showing which land uses are associated with which flood frequency classes.

The χ² test for conditional independence may be performed with the chisq.test function:

> chisq.test(ffreq, lime)

        Pearson's Chi-squared test

data:  ffreq and lime
X-squared = 26.81, df = 2, p-value = 1.508e-06

Both the χ² statistic and the probability that a value this large could occur by chance if the null hypothesis of no association is true (the "p-value") are given; the second is from the χ² table with the appropriate degrees of freedom ("df"). Here it is highly unlikely, meaning the flood frequency and liming are not independent factors.

31 No, I don't know what the name stands for, either.

4.16 Sets

S has several functions for working with lists (including vectors) as sets, i.e. a collection of elements; these include the is.element, union, intersect, setdiff, and setequal functions. The unique function removes duplicate elements from a list; the duplicated function returns a logical vector marking which elements of the original list duplicate an earlier element.

The setdiff function can be used to select the complement of a defined set. For example, in §4.9 we selected random elements of the trees example dataset. We repeat that here; suppose this is a subsample, perhaps for calibration of some model. We select 2/3 of the trees for this set at random, rounded down to the nearest integer with the floor functio
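The complement selection described at the end of excerpt 167 can be sketched as follows. The excerpt is cut off before its own code, so the variable names calib and valid and the seed value are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed, only so the sketch is repeatable

n <- nrow(trees)

# Calibration set: 2/3 of the row indices, rounded down, chosen at random.
calib <- sort(sample(1:n, floor(n * 2/3)))

# Complement: every row index that is not in the calibration set.
valid <- setdiff(1:n, calib)

length(calib)
length(valid)

# The two sets do not overlap, and together they cover all rows.
intersect(calib, valid)
setequal(union(calib, valid), 1:n)

# duplicated() marks repeats with a logical vector; which() gives their indices.
duplicated(c(3, 5, 3, 7, 5))
which(duplicated(c(3, 5, 3, 7, 5)))
```

The index vectors can then be used as row subscripts, for example trees[calib, ] for the calibration rows and trees[valid, ] for the rest.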
168. y see that the multi-modal distribution of the entire data set is due to the different distributions for each species.

5.2.2 Bivariate plots

The workhorse here is the xyplot method, now with a dependent (y-axis) and independent (x-axis) variable; this can also be conditioned on one or more grouping factors:

> xyplot(Petal.Width ~ Petal.Length, data=iris, groups=Species, auto.key=T)
> xyplot(Petal.Width ~ Petal.Length | Species, data=iris, groups=Species)

These are shown in Figure 12. Note the use of the groups argument to specify a different graphic treatment, in this case colour, for each species, and the auto.key argument to get a simple key to the colours used for each species.

[Figure 12: Trellis scatter plots, without and with a conditioning factor. Panels: "All species" and "Split by species" (setosa, versicolor, virginica); axes: Petal.Length, Petal.Width]

5.2.3 Trivariate plots
169. you like, for example, change the sample size n or the number of replications, save it as a command file, and run it with the source function as explained in §3.5.

You prepare this with a plain-text editor as a text file, for example plot4.R (71), and then "source" it into R, which executes the commands immediately:

> source("plot4.R")

If you want to change the plotting parameters, you have to change the script and re-source it. The next section offers a better solution.

A note on R style: It is good practice to make all parameters into variables at the beginning of the script, so they can be easily adjusted. If you find yourself repeating the same number in many expressions, it probably should be converted to a variable. Examples here are the sample size and parameters of the normal distribution. You could write:

> v <- rnorm(30, 180, 20)
> hist(v, breaks = seq(80, 280, by=20/3))
> points(x, dnorm(x, 180, 20)*(30*20/3))

but it is more elegant to write:

> n <- 30; mu <- 180; sd <- 20; bin.width <- sd/3
> v <- rnorm(n, mu, sd)
> hist(v, breaks = seq(mu-5*sd, mu+5*sd, by=bin.width))
> points(x, dnorm(x, mu, sd)*(n*bin.width))

69 http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html
70 http://www.loc.gov/standards/iso639-2/php/English_list.php
71 The .R file extension is customary for R scripts.

# visualise the variability of small random samples
# sample size
n <- 30
# number of plot rows, columns
rows <- 2; c
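The parameterised style recommended in excerpt 169 can be run on its own once x is defined; the excerpt uses x without showing where it comes from, so the grid below is an assumption, as is the seed. Plotting is wrapped in interactive() so the sketch also runs in batch mode.

```r
set.seed(123)   # arbitrary seed, only so the sketch is repeatable

# All parameters are defined once, at the top.
n <- 30; mu <- 180; sd <- 20; bin.width <- sd/3

# Random sample and histogram breaks, both derived from the parameters.
v <- rnorm(n, mu, sd)
breaks <- seq(mu - 5*sd, mu + 5*sd, by = bin.width)

# Assumed grid of x values at which to evaluate the normal density.
x <- seq(mu - 5*sd, mu + 5*sd, length.out = 200)

# Expected count per bin: density times sample size times bin width.
expected <- dnorm(x, mu, sd) * (n * bin.width)

# Compute the histogram; draw it with the expected curve when interactive.
h <- hist(v, breaks = breaks, plot = FALSE)
if (interactive()) {
  plot(h, main = "Sample from a normal distribution")
  points(x, expected, pch = 20, cex = 0.5)
}
```

Changing n, mu or sd in the first line now updates the sample, the breaks and the overlaid curve together, which is the point of the style note.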
    