Home
Unix User Guide
Contents
1. Action emacs keystrokes vi keystrokes backward character CTRL B H forward character CTRL F L previous line CTRL P K next line CTRL N J beginning of line CTRL A SHIFT 6 COMMAND LINE EDITING Table 2 1 Command line editing in S PLUS Action emacs keystrokes vi keystrokes end of line CTRL E SHIFT 4 forward word ESC F Ww backward word ESC B B kill char CTRL D X kill line CTRL K SHIFT D delete word Esc D D W search backward CTRL R yank CTRL Y SHIFT Y transpose chars CTRL T X P In command mode Must press ESC to enter command mode In vi mode the editor puts you in insert mode automatically Thus any editing commands must be preceded by an Esc As an example of using the command line editor suppose you ve started S PLUS with the emacs option for the EDITOR environment variable Suppose you attempt to create a plot by typing the following gt pliot y Problem Couldn t find a function definition for plto Type CTRL P to recall the previous line then use CTRL B to return to the t in plto Finally type CTRL T to transpose the t and the o Press RETURN to issue the edited command To recall earlier commands use the backward search command CTRL R in emacs mode in vi mode followed by the command or first portion of command For example suppose you ve recently issued the following command gt plot xdata yd
2. The data region of a panel on a graph resulting from a general display function is a rectangle that just encloses the data The sole responsibility for drawing in a data region is given to a panel function that is an argument of the general display function The other arguments of the general display A ROADMAP OF TRELLIS GRAPHICS Core S PLUs Graphics Printing Devices and Settings Data Structures function manage the superstructure of the graph scales labels boxes around the data region and keys The panel function manages the symbols lines and so forth that encode the data in the data regions Panel functions are discussed in the section Panel Functions page 246 Trellis Graphics is implemented in the core traditional S PLUS graphics Also when you write a panel function you use functions and graphics parameters from the traditional graphics system Some core S PLUS graphics features are discussed in the section Commonly Used S Plus Graphics Functions and Parameters page 248 To send a graph to the printer first open a hardcopy device for example with trellis device postscript or trellis device pdf graph To actually send the graphics to the printer enter the command dev off For color graphics printing set the color T flag the default is black and white when opening the device for example gt trellis device postscript color T Trellis Graphics has many settings for graph rendering details pl
3. programmatic access or otherwise Anyone wishing programmatic access will need to be established as a user under the terms of this Agreement iii Limited Warranty iv You may make copies of the Software solely for archival purposes Any copy that you make of the Software in whole or in part is the property of MathSoft You agree to reproduce and include MathSoft s copyright trademark and other proprietary rights notices on any copy you make of the Software You must have a reasonable mechanism or process that ensures that the number of users at any one time does not exceed the number of licenses you have paid for and that prevents access to the Software to any person not authorized under the above license to use the Software You may receive the Software in more than one medium Regardless of the type or size of media you receive you may use only one medium that is appropriate for your single computer You may not use or install the other medium on another computer You may not loan rent lease or otherwise transfer the other medium to another user You may not translate reverse engineer decompile or disassemble the Software except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation If the Software is labeled as an upgrade you must be properly licensed to use a product identified by MathSoft as being eligible for the upgrade in order to use the Softw
4. setenv S FIRST S PLUS expression C shell set S FIRST S PLUS expressior export S_FIRST Bourne or dt Korn shell For example the following C shell command tells S PLUS to start the default graphics device setenv S_FIRST motif To avoid misinterpretation by the command line parser it is safest to CUSTOMIZING YOUR SESSION AT START UP AND CLOSING Customizing Your Session at Closing surround complex S PLUS expressions with a single or double quote whichever you do vot use in your S PLUS expression You can also combine several commands into a single S PLUS function then set FIRST to this function For example gt startup lt function options digits 4 options expressions 128 You can call this function each time you start S PLUS by setting S_FIRST as follows setenv S_FIRST startup Variables cannot be set while S PLUS is running just at initialization Any changes to S_FIRST will take effect only upon restarting S PLUS When S PLUS quits it looks in your data directory for a function called Last If Last exists S PLUS runs it A Last function can be useful for cleaning up your directory by removing temporary objects or files 317 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION USING PERSONAL FUNCTION LIBRARIES If you write functions that you want to use many times you should not store them in your working directory because objects in this directory are easily
5. A histogram for continuous numeric data is a rough estimate of a smooth underlying population density curve which gives the relative frequency with which the data fall in different intervals This underlying density curve formally called a probability density function allows you to compute the probability that your data fall in any interval Thus you may prefer a smooth 149 CHAPTER 6 TRADITIONAL GRAPHICS Quantile Quantile Plots 150 estimate of this density to a rough histogram estimate To get such a smooth density estimate in S PLUS use plot with the function density The optional argument width controls the smoothness of the plot For example gt plot density car gals type 1 gt plot density car gals width 2 4 type 1 gt P lt k E a a 2 Il oO T v os Boo Soa go gt s 5 Oo oor C p Bic B36 2 S ou i Bisa s s 3 5 10 15 20 25 5 10 15 20 25 density car gals x density car gals width 2 4 x Figure 6 16 Probability density plots The default value for width results in a somewhat rough density estimate in the tail whereas the choice width 2 4 produces a smoother density estimate The value 2 4 in the second plot is obtained by applying the choice width 2 iqd to the car gals data where iqd is the interquartile distance You can obtain the IQD from summary by subtracting the value 1st Qu from the value 3rd Qu gt summary car gals Min 1st
6. If you do not assign the output from the vi function either back to the original function or to a new function the changes you make are simply scrolled across the screen they are not incorporated into any function definition The value is also stored until a new value is returned by S PLUS in the object Last value You can therefore recover the changes by immediately typing the following gt myfunction lt Last value S PLUS comes with a large number of built in data sets These data sets provide examples for illustrating the use of S PLUS without forcing you to take the time to enter your own data When S PLUS is used as a teaching aid the built in data sets provide a useful basis for problem assignments in data analysis To get S PLUS to display any of the built in data sets just type its name at the gt prompt The built in data sets in S PLUS include data objects of various types To get quick hard copy of your S PLUS objects including data objects and functions use the 1pr function For example to print the object diff hs use the following command Ipr diff hs A copy of your data will be sent to your standard printer 35 CHAPTER 2 GETTING STARTED Adding Row And Column Names Adding Names To Vectors Adding Names To Matrices 36 Names can be added to a number of different types of S PLUS objects In this section we discuss adding labels to vectors and matrices To add names to a vector
7. S for Small replaced by P for Peewee to avoid duplication with Sporty gt mysymbols a CLO ae Ps ee panel superpose has an argument pch that can be used to specify the symbols gt xyplot Mileage Weight data fuel frame aspect 1 groups Type pch mysymbols panel panel superpose Notice that again we specify an argument of the panel function in this case pch by giving it as an argument to xyplot which passes it along to the panel function panel superpose will also superpose curves To superpose a line and a quadratic x lt seq 0 1 1length 50 linquad lt c x x 2 xX lt rep x 2 which lt rep c linear quadratic c 50 50 xyplot linquad x xlab Argument ylab Functions aspect 1 groups which type 1 panel panel superpose The argument type controls the method of plotting For the argument type p the default the data are rendered by plotting symbols For type 1 the data are rendered by lines The function panel superpose uses the graphical parameters in the Trellis setting superpose symbol for the default plotting symbols For black and white postscript the setting results in different symbol types gt trellis device postscript gt trellis par get superpose symbol cex 1 0 85 0 65 0 85 0 85 0 85 0 85 0 85 col TDR EEE ee font Bk taitaiy pch 1 gol TEn ea Be Mee ee gs 253 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS T
8. To name each level of each dimension use the dimnames argument to array This passes a ist of names in the same way as is done for matrices For more information on dimnames see section Naming Rows and Columns page 84 LISTS LISTS Creating Lists Up to this point all the data objects described have been atomic meaning they contain data of only one mode Often however you need to create objects that not only contain data of mixed modes but also preserve the mode of each value For example the slots of an array may contain both the dimension a numeric vector and the Dimnames slot a character vector and it is important to preserve those modes gt attributes iris dim 1 50 4 3 dimnames dimnames 1 character 0 dimnames 2 1 Sepal L Sepal W Petal L Petal W dimnames 3 1 Setosa Versicolor Virginica The value returned by attributes is a simple example of an S PLUS Zst Lists are a very general data type Lists are made up of components where each component consists of one data object of any type That is from component to component the mode and type of the object can change For example the attributes list for the iris data set consists of two components a dim component and a dimnames component The dim component the value of the Dim slot is a numeric vector of length three The dimnames component the value of the Dimnames slot is another list with
9. 18 00 Ist Qu 3 00 Median 79 00 Median 4 00 Mean 79 89 Mean ee 3rd Qu u 131 00 3rd Qu 5 00 Max 206 00 Max 29 00 kyphosis Kyphosis present Kyphosis Age Number absent 0 Min lt 1500 Min t 3 000 present 17 Let Wiss 73 09 ist Ous 42000 Median 105 00 Median 5 000 Mean 97 82 Mean 5 176 3rd Qu 128 00 ore Qus 6 000 Max 157 00 Max 10 000 The applied function supplied as the FUN argument must accept a data frame as its first argument if you want to apply a function that does not naturally accept a data frame as its first argument you must define a function that does so on the fly For example one common application of the by function is to repeat model fitting for each level or combination of levels the modeling functions however generally have a formula as their first argument The following call to by shows how to define the FUN argument to fit a linear model to each level Min lst Med Mea 3rd Max Start 1 00 He 21100 jan 14 00 n 12 61 Qu 16 00 518 00 Start Min 4 1b Uilue S Median 6 Mean 3T ora Qursi Max 214 gt by kyphosis list Kyphosis kyphosis Kyphosis Older kyphosis Age gt 105 function data 1 m Number Start data data Kyphosis absent Older FALSE Calli lm formula Number Start data data Coefficients Intercept Start 4 885736 0 08764492 Degrees of freedom 39 total 37 residual Residual standard error 1 261852 Kyphosis pres
10. An Example As you try out the various features of the motif device you can use the following S PLUS commands to generate an easily reproducible graphic gt plot corn rain corn yield type n main Plot Example gt points corn rain corn yield pch col 2 gt lines lowess corn rain corn yield Ity 2 col 3 Me legend 1Z2 23 Color 1 Color 2 Calor 3 pons liveetl 0 23 colei 2 2 Note that in the call to legend there is a space before and after the in the argument pch The plot generated by these commands is shown in figure 8 1 290 GRAPHICS WINDOW DETAILS Plot Example LO gt e e m e Ke O n 7 e s e o v a gt O e oO e LO N e A Color 1 e Color 2 oO a Color 3 N e I 8 10 12 14 16 corn rain Figure 8 1 Plot example By default the color of the title legend box axis lines axis labels and axis titles are color 1 We have specified the points to have color 2 and the dashed line representing the smooth from the lowess command to have color 3 Although we cant show you the difference in the colors in Figure 8 1 you will see the differences in your graphics window 291 CHAPTER 8 WORKING WITH GRAPHICS DEVICES The Motif Figure 8 2 shows what the Motif graphics window looks like when you first Graphics Window start the S PLUS motif windowing graphics device The features of this in S PLUS window are listed be
11. Otherwise the next plot will overwrite the previous plot and the previous plot will be irretrievably lost To create a series of Encapsulated PostScript files in a single call to postscript omit the file argument gt postscript onefile F print F gt plot corn rain gt plot corn yield Starting to make postscript file Generated postscript file ps out 0001 ps Because onefile is FALSE postscript generates a postscript file as soon as the new call to plot tells it that nothing more will be added to the first plot The file ps out 0001 ps contains the plot of corn rain A file containing the plot of corn yield is generated as soon as a new call to plot or a call to dev off closes the old plot gt plot corn rain corn yield Starting to make postscript file Generated postscript file ps out 0002 ps You can give a series specific naming convention for the series of files using the tempfile argument to postscript gt postscript onefile F print F tempfile corn dHHH ps gt plot corn rain gt plot corn yield Starting to make postscript file Generated postscript file corn 0001 ps gt plot corn rain corn yield Starting to make postscript file Generated postscript file corn 0002 ps gt dev off Starting to make postscript file Generated postscript file corn 0003 ps PRINTING YOUR GRAPHICS Setting PostScript Options The behavior of the postscript g
12. The surface of any graphics device can be divided into two regions the outer margin and the figure region The figure region contains one or more figures each of which is composed of a plot area or region surrounded by a margin By default a device is initialized with one figure and the outer margin has zero area that is typically there is just a plot area surrounded by a margin The plot area is where the data is shown In the typical plot the axis line is drawn on the boundary between the plot area and the margin Each margin whether the outer margin or a figure margin is divided into four parts as shown in figure 6 26 bottom side 1 left side 2 top side 3 and right side 4 Margin 3 Margin 2 Margin 4 Margin 1 Figure 6 26 The four sides of a margin You can change the size of any of the regions Changing one area causes S PLUS to automatically resize the regions within and surrounding the one that you have changed For example when you specify the size of a figure the margin size is subtracted from the figure size to obtain the size of the plot area S PLUS does not allow a figure with a margin that takes more room than the figure Most often you change the size of regions with the mfrow or mfcol layout parameters when you specify the number of rows and columns S PLUS automatically determines the appropriate figure size To control region size explicitly work your way inward by specifying first
13. color scheme 1i Name olor scheme 2 color scheme 22 F f color scheme 3 Background black Lines Create New Color Scheme Text yellow red cyan Polygons Images A Figure 8 5 Changing color schemes The Available Color Schemes option menu has enough space to show the first five available color schemes If there are more than five available color schemes a scrollbar appears to the right of the menu You can view the names of the additional color schemes by using this scrollbar Creating New Color Schemes To create a new color scheme follow these steps 1 Click on the button marked Create New Color Scheme Figure 8 6 shows what happens in the dialog box when you do this The name unnamed appears as the last available color scheme in the Available Color Schemes option menu The default values under the Color Scheme Specifications option menu are the name unnamed a black background and white lines 299 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 300 PA Move the pointer to the Name box and click The borders of the Name box darken and the cursor shape changes into an I Now type in text from the keyboard To delete letters to the right of the cursor use the DELETE key to delete letters to the left of the cursor use the BACKSPACE key Once you have decided on a name for the new color scheme move the pointer to the Background box and follow the same procedure as in step 2 The background c
14. contour switzerland 3 D PLOTS CONTOUR PERSPECTIVE AND IMAGE PLOTS 12 sgg 60060 30g 10 a 1 nrow switzerland 6000 8000 6000 8000 2 4 6 8 10 12 1 ncol switzerland Figure 6 22 Contour plot of Switzerland By default contour draws contour lines for each of five levels and labels each one You can change the number of levels with either the nlevels or the levels argument The nlevels argument specifies the approximate number of contour intervals desired while 1eve1s specifies a vector of heights for the contour lines You control the size of the labels for the contour lines with the labex argument You specify the size as a relative value to the current axis label font so that 1abex 1 the default yields labels which are the same size as the axis labels Setting 1abex 0 gives you unlabeled contour lines For example to view a voice spectrogram for the word five use contour on the built in data object voice five Because voice five generates many contour lines we suppress the labels with 1 abex 0 gt contour voice five labex 0 If you have an equal number of observations for each of three variables you can use interp to generate interpolated values for z on an equally spaced xy grid For example to create a contour plot of the ozone data you can use interp and contour as follows gt ozone fit lt interp ozone xy x ozone xy y ozone median gt contour ozone fi
15. from the studies For example consider the following data frames gt rand dfl norm unif binom 1 1 64542042 0 45375156 41 2 1 64542042 0 83783769 44 3 0 13593118 0 31408490 53 4 026271524 057312325 34 5 0 01900051 0 25753044 47 6 0 14986005 0 35389326 41 Pi 0 07429523 0 53649764 43 8 0 80310861 0 06334192 38 9 0 47110022 0 24843933 44 10 1 70465453 0 78770638 45 gt rand df2 norm binom chisq 1 0 3485193 50 19 359238 2 1 6454204 41 13 547288 3 1 4330907 53 4 968438 4 0 8531461 55 4 458559 5 0 8741626 47o 2 589351 These data frames have the common variables norm and binom we subscript and combine the resulting data frames as follows gt rbind rand df1l c norm binom rand df2 cC norm binom COMBINING DATA FRAMES norm binom 1 1 64542042 41 2 1 64542042 44 3 013593118 53 4 0 26271524 34 5 0 01900051 47 6 0 14986005 41 i 0 07429523 43 8 0 80310861 38 9 0 47110022 44 10 1 70465453 45 11 0 34851926 50 12 1 64542042 41 13 1 43309068 53 14 0 85314606 55 15 0 87416262 47 Warning Use rbind and in particular rbind data frame only when you have complete data frames as in the above example Do not use it in a loop to add one row at a time to an existing data frame this is very inefficient To build a data frame write all the observations to a data file and use read table to read it in Merging Data In many situations you may have data from multiple source
16. if not You can use the options function to specify a new default pager at any time during your S PLUS session Modifications to S_PAGER however take effect only when you next start S PLUS Using options usually in your First function is the preferred method for setting your pager Simply use the following function call gt options pager pager where pager is a character string containing the command with any necessary flags used to start the pager 321 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION ENVIRONMENT VARIABLES AND PRINTGRAPH S PLUS uses environment variables to set defaults for the printgraph function Your system administrator already set these variables system wide but if you would like to change the default values for your S PLUS session use your UNIX shell command to set a new value for the environment variable before you start S PLUS Note The printgraph function sets its defaults differently from the defaults for the Print button on graphics devices such as motif For example to make printgraph produce plots with the x axis on the short side of the paper type the following from the C shell setenv S_PRINT_ORIENTATION portrait Start S PLUS Any plots made with printgraph are now produced in portrait mode S PLUS uses the following environment variables with printgraph e S_PRINT_ORIENTATION controls the orientation of plots It has two possible values portrait which put
17. logical values 77 Lotus files 64 low level graphics functions 163 low level plotting functions 188 lty argument 133 lty parameter 248 lwd parameter 248 M mai parameter 172 main argument 128 242 main title of a plot 128 main effects ordering of levels 236 make groups function 259 make symbol function 195 mar parameter 172 margin 170 matrices 67 82 matrix datatype 116 matrix function 20 matrix function 83 max function 114 mean function 114 merge function 99 107 by X argument 108 by y argument 108 methods obtaining help 15 mex parameter 172 mfcol parameter 170 mfrow parameter 126 mgp parameter 182 mileage means vector 216 model matrix datatype 116 modeling statistical 50 modules add on 2 more argument 228 most useful graphics parameters 197 INDEX mtext function 178 multi line argument 68 multiple plots 129 mypanel function 246 N n argument 138 names function 81 84 nclass argument 148 ncol argument 83 nint argument 219 nrow argument 83 numeric function 80 numeric summaries 111 numeric values 77 79 O object oriented programming 77 oma parameter 171 omd parameter 171 omi parameter 171 on line help 2 operators comparison 27 logical 27 precedence hierarchy of 29 Operators arithmetic 26 ordered function 93 orientation of axis labels 182 outer margin 170 outlier data point 137 overlay figures 188 ozone data set 158 P p argument 250 page argument 263 pairs function 154 pan
18. low level plotting 43 operators comparison 27 logical 27 precedence hierarchy of 29 G gas data set 204 gauss data set 223 general 177 general display function 202 210 glm function 203 Graph Measurements with Labels 227 Graph Multivariate Data 227 graphics 176 graphics parameters 163 group component 88 H Help system on line help 2 training courses 3 high level graphics functions 163 hist function 148 histogram function 210 219 How to Change the Rendering in the Data Region 246 hstart time series 111 hypothesis testing 48 I I function 206 identify function 137 INDEX image function 158 importData function 33 56 99 importing data 33 54 dBase files 64 Lotus files 64 initialization options function 312 internally labeled axis 183 interp function 158 interpolates 158 interrupting evaluation 10 intervals argument 241 iris data set 85 87 155 J jitter argument 213 K key argument 252 255 kyphosis data frame 111 L lab parameter 181 labels argument 93 94 labex argument 159 layout algorithm 233 layout argument 230 length argument 79 80 length attribute 76 levelplot function 224 levels argument 92 93 levels attribute 90 levels function 240 line types 133 lines function 139 248 list datatype 116 list function 22 list function 87 lists 87 components 22 Im function 138 203 332 locator function 140 loess function 203 log function 130 logical function 80
19. number of rows and columns For example gt mat lt rep 1 4 rep 3 4 gt mat tJ 1 11222333444 gt dim mat lt c 3 4 gt mat LeddL 210 3304 Hg 1 2 3 4 2 i 2 3 4 3 3 t Z 3 4 More often you need to combine several vectors or matrices into a single matrix To combine vectors and matrices into matrices use the functions cbind and rbind The cbind function combines vectors column by column and rbind combines vectors row by row You can easily combine counts for a 2x3 contingency table using rbind gt rhind 200688 24 33 0 201083 27 115 eee al 1 200688 24 33 2 1 201083 27 115 Use the cbind function similarly for columns When vectors of different lengths are combined using cbind or rbind the shorter ones are replicated cyclically so that the matrix is filled in If matrices are combined they must have matching numbers of rows when using cbind and matching numbers of MATRICES columns when using rbind Otherwise S PLUS prints an error message and the objects are not combined Use the function matrix to convert objects to matrices Combine the values into a single vector using c and then group them by specifying the number of columns or rows To create a matrix from two vectors grp and thw use matrix as follows gt heart lt matrix c grp thw ncol 2 If you provide fewer values as arguments to matrix than are required to complete the matrix the values are replicated cyclically
20. par mfrow c 3 1 col 4 1ty 2 gt create some plots gt several more calls to par gt par par orig wg Warning When a device is first started before any plots are produced the graphics parameter new is set equal to T In this case a call to a high level graphics function will not clear the device before putting up a new plot see the section Overlaying Figures page 188 Thus if you follow the above commands to restore all graphics parameters to their original state you need to call frame before issuing the next plotting command Separate sets of graphics parameters are maintained for each active graphics device When you change graphics parameters with the par function you are changing their value only for the current graphics device For example if you have both a graphsheet and a postscript graphics device active and the postscript device is the current device than calling par to change graphics parameters will affect only the graphics parameters for the postscript device gt motif gt postscript gt dev listi motif postscript 2 3 gt dev cur postscript 3 gt par mfrow c 2 2 SETTING AND VIEWING GRAPHICS PARAMETERS gt par mfrow H 22 gt dev set motif 2 gt par mfrow 1 11 169 CHAPTER 6 TRADITIONAL GRAPHICS CONTROLLING GRAPHICS REGIONS 170 The location and size of a figure are determined by parameters that control graphics regions
21. type Then all the observations on a particular set of variables can be grouped into a single data frame This is particularly useful in data analysis where it is typical to have a character variable labeling each observation one or more numeric variables of observations and one or more categorical variables for grouping observations An example is a built in data set solder with information on a welding experiment conducted by AT amp T at their Dallas factory gt sampleruns lt sample row names solder 10 gt solder sampleruns Opening Solder Mask PadType Panel skips 380 L Thick A3 L7 2 0 545 L Thiek B3 D4 g 0 462 L Thin A3 D6 3 3 809 S Thick BG L9 2 7 609 a TRICK B3 L4 3 19 492 M Thin A6 D6 3 8 525 S Thin A6 L6 3 18 31a M Thin A3 L6 i 1 408 M Thick A6 D7 3 gl 540 S Thin A6 Ly 3 Z A sample of 10 of the 900 observations is presented for all six variables The variable skips is the outcome which measures the number of visible soldering skips on a particular run of the experiment The other variables are categorical and describe the levels of various factors which define the run The row names on the left are the run numbers for the experiment Combined in solder are character data the row names categorical data the factors and numeric data the outcome CREATING DATA FRAMES CREATING DATA FRAMES You can create data frames in several ways e importData reads data from a variety of application files as we
22. you can identify as many points as you wish In this case you must signal S PLUS that you ve finished identifying points by taking an appropriate action for example pressing the right mouse button or pressing both the left and right mouse buttons together depending on your configuration When you make a scatter plot you may notice an approximately linear association between the vertical axis variable and the horizontal axis variable In such cases you may find it helpful to display a straight line which has been fit to the data You can use the function abline a b to add a straight line with intercept a and slope b on the current plot The best known method of fitting a straight line to a scatter plot is the method of least squares The S PLUS function 1m fits a linear model using the method of least squares The 1m function requires a formula argument expressing the dependence of the response variable y on the predictor variable x See the Guide to Statistics for a complete description of formulas and statistical modeling To get a least squares line simply use abline on the results of 1m For example use the following S PLUS expressions to obtain a scatter plot and dotted line least squares fit gt plot x y gt abline Im y x 1ty 2 While the fitting of a least squares line to data in the plane is probably the most common data fitting procedure in the world the least squares approach has a fundamental weakness it lacks ro
23. 0 760029093 The argument fil1 T limits line length in the output file to the width specified in your options object To use cat to write to a file simply specify a file name with the file argument gt xX lt gt 121000 gt cat x file mydata fil1l T The files written by cat and write do not contain S PLUS structure information to read them back into S PLUS you must reconstruct this information The write table function can be used to export a data frame into an ASCII text file gt write table fuel frame fuel txt gt tyi fuel txt row names Weight Disp Mileage Fuel Type Eagle Summit 4 2560 97 33 3 030303 Smal1 Ford Escort 4 2345 114 33 3 030303 5mal1 Ford Festiva 4 1845 81 37 2 702703 Smal1 Honda Civic 4 2260 91 32 3 125000 Smal1 Mazda Protege 4 2440 113 32 3 125000 Smal11 73 CHAPTER 3 IMPORTING AND EXPORTING DATA Mercury Tracer 4 2285 97 26 3 846154 Smal1 Nissan Sentra 4 2275 97 33 3 030303 Smal1 Pontiac LeMans 4 2350 98 28 3 571429 Smal1 74 DATA OBJECTS Basic Data Objects Coercion of Values Vectors Creating Vectors Naming Vectors Matrices Creating Matrices Naming Rows and Columns Arrays Creating Arrays Lists Creating Lists List Component Names Factors and Ordered Factors Creating Factors Creating Ordered Factors Creating Factors from Continuous Data 76 77 79 79 81 82 82 84 85 86 87 87 89 90 91 93 94 75 CHAPTER 4 DATA OBJECTS BASIC DATA OBJECTS 7
24. 2105 127 295 2L0T The numbers denote the column widths s denotes a string data type f denotes a float data type and the asterisk denotes a skip You may need to skip characters when you want to avoid importing some characters in the file For example you may want to skip blank characters or even certain parts of the data If you wish to import only some of the rows specify a starting and ending row If each row ends with a new line S PLUS will treat the newline character as a single character wide variable that is to be skipped 63 CHAPTER 3 IMPORTING AND EXPORTING DATA Notes on Importing Excel Files Notes on Importing Lotus Files Notes on Importing dBase Files Notes on Importing Data From Enterprise Databases 64 S PLUS can read only older format Excel files Version 4 x and earlier To read Excel files from later versions of Excel including Excel 95 and Excel 97 you must save them in the Version 4 format Formatting that requires newer features will be lost If your Excel worksheet contains only numeric data in a rectangular block starting in the first row and column of the worksheet then all you need to specify is the file name and file type If a row contains names specify the number of that row at the Name Row prompt it does not have to be the first row You can select a rectangular subset of your worksheet by specifying starting and ending columns and rows Excel style co
25. 3 125000 Small As a more substantial example consider the built in data sets cu summary cu specs and cu dimensions Each of these data sets contains observations about a number of car models but the list of car models is slightly different in each All however contain data for the cars listed in the data set common names gt common names 1 Acura Integra Acura Legend 3 Audi 100 Audi 80 5 BMW 325i BMW 535i 7 Buick Century Buick Electra The data sets match summary match specs and match dims contain the row subscripts to obtain observations about the models listed in common names from respectively cu summary cu specs and cu dimensions We can use these data sets and the cbind function to compile a general car information data set 105 CHAPTER 5 DATA FRAMES Combining Data Frames by Row 106 gt car mine lt cbind cu dimensions match dims cu specs match specs cu summary match summary row names common names Compare car mine to the built in data set car a11 constructed in a similar fashion Suppose you are pooling the data from several research studies You have data frames with observations of equivalent or roughly equivalent variables for several sets of subjects Renaming variables as necessary you can subscript the data sets to obtain new data sets having a common set of variables You can then use rbind to obtain a new data frame containing all the observations
26. 7375 50 3 44 6 8 1 61 3 75 3 41 1 51 5 44 7 3927 12 40 8 67 4 53 3 62 2 65 5 47 5 51 2 74 9 59 0 40 5 OTHER DATA IMPORT FUNCTIONS Reading Data Frames If your data is in fixed format with fixed width fields you can use scan to read it in using the widths argument For example suppose you have a data file dfile with the following contents Olgiraffe 9346H01 04 88donkey 1220M00 15 77ant L04 04 20gerbil 1220L01 12 22swallow 2333L01 03 121lemming LO1 23 You identify the fields as numeric data of width 2 character data of width 7 numeric data of width 5 character data of width 1 numeric data of width 2 a hyphen or minus sign that you don t want to read into S PLUS and numeric data of width 2 You specify these types using the what argument to scan To simplify the call to scan you define the list of what arguments separately gt dfile what lt list code 0 name x 0 s n1 0 NULL n2 0 NULL indicates suppress scanning of the specified field You specify the widths as the widths argument to scan Again it simplifies the call to scan to define the widths vector separately art le Wigins lt gt C625 75 Be le Ze Te 2 You can now read the data in dfile into S PLUS calling scan as follows gt dfile lt scan dfile what dfile what widths dfile widths If some of your fixed format character fields contain leading or trailing white space you can use the strip white argument to strip it
27. CHAPTER 3 IMPORTING AND EXPORTING DATA Table 3 1 Arguments to importData Argument Required Description valueLabelAsNumber Optional centuryCutoff Optional 58 logical flag if TRUE SPSS variables with labels will be imported as numbers a numeric value Dates with two digit years are assigned to the 100 year span beginning with this value The default of 1930 means that 6 15 30 is read as June 15 1930 and 12 29 29 will be read as December 29 2029 This argument is used only when importing two digit years from an ASCII file SETTING THE IMPORT FILTER SETTING THE IMPORT FILTER The filter argument to importData allows you to subset the data you import By specifying a query or filter you gain additional functionality such as taking a random sampling of the data Use the following examples and explanation of the filter syntax to create your statement A blank filter is the default and results in all data being imported Note The filter argument is ignored if the type argument or equivalently file extension specified in the file argument is set to ASCII or FASCII Case Selection You select cases by using a case selection statement in the filter argument The case selection or where statement has the following form variable expression relational operator condition Warning The syntax used in the filter argument to importData and exportData
28. Crookston Waseca The size font and color of the text in the strip labels can be changed by the argument par strip text a list whose components are the parameters cex for size font for the font and col for the color For example we can make huge strip labels by par strip text list cex 2 The argument strip allows very delicate control of what is put in the strip labels One usage is to remove the strip labels altogether SUPT p F Another is to control the inclusion of names of conditioning variables in strip labels gt dotplot variety yield year site data barley strip function strip default strip names c T T The argument strip names takes a logical vector of length two The first element tells whether or not the names of factors should be included along with the names of the levels of the factor and the second element tells whether or not the names of shingles should be included The default is c F T 245 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS PANEL FUNCTIONS How to Change the Rendering in the Data Region Passing Arguments to a Default Panel Function Writing a Panel Function panel Argument 246 The data region of a panel on a Trellis display is the rectangular region where the data are plotted A panel function has the sole responsibility for drawing in the data regions produced by a general display function The panel function is given as an argument of the general display
29. E as well but E varies in a nearly continuous way there are 83 unique values out of a total of 88 values Clearly we cannot condition on single values Instead we condition on intervals This is done in figure 7 26 On each panel NOx is graphed against C for E in an interval The intervals which are portrayed by the darkened bars in the strip are ordered from low to high so as we go left to right and bottom to top through the panels the intervals go from low to high The intervals overlap The next section describes how they were created and the expression that produced the graph 8 10 12 14 16 18 8 10 12 14 16 18 ae EEE eee ee Ee ee ee ee ee ee AR ai GIVEN GIVEN E a oe st GIVEN E GIVEN E 5 Ls o P o o g a o o p o ie 7 og Pe 6 O O q g o ie o o 4 P fe 8 G ort pO o Op 8 o Q o o 8 x q 9 GIVEN E GIVEN E GIVEN E GIVEN E GIVENE 44 o O og gq e ERa Po 00 oo a oo oo oe a E 8 10 12 14 16 18 8 10 12 14 16 18 8 10 12 14 16 18 Figure 7 26 Conditioning intervals The nine intervals in figure 7 26 were produced by the equal count algorithm gt GIVEN E lt equal count ethanol E number 9 overlap 1 4 There are two inputs to the algorithm the number of intervals and a target fraction of points to be shared by each pair of successive intervals In figure 7 26 the inputs are 9 and 1 4 The algorithm picks interval endpoints that are values of the data the left endpoi
30. From the Color Schemes dialog box you can select an alternate color scheme modify existing color schemes or define new color schemes See the chapter Customizing Your S PLUS Session for details on working with color schemes You may want to experiment with 135 CHAPTER 6 TRADITIONAL GRAPHICS 136 many values to find the most pleasing color map For other graphics devices see the device s help file for a description of the color map S PLUS uses the color map cyclically that is if you specify col 9 and your color map has only 8 colors S PLUS prints color 1 Color 0 is the background color over plotting items using color 0 erases them on most graphics devices INTERACTIVELY ADDING INFORMATION TO YOUR PLOT INTERACTIVELY ADDING INFORMATION TO YOUR PLOT Identifying Plotted Points The functions described so far in this chapter create complete plots Often however you want to build on an existing plot in an interactive way For example you may want to identify individual points in a plot and label them for future reference Or you may want to add some text or a legend or overlay some new data In this section we describe some simple techniques for interactively adding information to your plots More involved techniques for producing customized plots are described in the section Customizing Your Graphics page 163 While examining a plot you may notice that some of the plotted points are unusual in some way To ident
31. O or O O aw O coo ED C0 3000 3500 H 3500 H 3000 Weight 2500 DO G DO 2000 4 2000 2500 Figure 7 15 Scatterplot matrix 221 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS parallel Parallel coordinates are an interesting method but it is unclear at the time of this writing whether they have the power to uncover structure that is not more readily apparent using other graphical methods Figure 7 16 is a parallel coordinates display of the variables in fuel frame gt parallel fuel frame Type Fuel Mileage Disp Weight Figure 7 16 Parallel coordinates display 222 GENERAL DISPLAY FUNCTIONS A Data Set gauss contourplot To further illustrate the general display routines we will compute a function of two variables over a grid gt datax lt rep seq 1 5 1 5 length 50 50 gt datay lt rep seq 1 5 1 5 length 50 rep 50 50 gt dataz lt exp datax 2 datay 2 datax datay gt gauss lt data frame datax datay dataz Thus dataz is the exponential of a quadratic function defined over a 50x50 grid in other words the surface is proportional to a bivariate normal density Contour plots are helpful displays for studying a function f x y when we have no need to study the conditional dependence of fon x given y or of fon y given x Conditional dependence is revealed far better by multipanel conditio
32. ON IMPORTING FILES For Informix you need to have the Informix environment variables needed are ESQL C installed The Variable Value Example INFORMIXDIR The location where homes informix7 3 ESQLIC was installed LD_LIBRARY_PATH Need to include INFORMIXDIR INFORMIXDIR lib lib INFORMIXDIR lib and INFORMIXDIR esql lib esql INFORMIXSERVER The name of the inf_dyn_tcp Informix server The environment variables needed for Oracle are Variable Value Example ORACLE HOME The location where opt1 oracle7 ORACLE was installed LD_LIBRARY_PATH Need to include opt1 oracle7 lib ORACLE_ HOME lib For Sybase you need to have the CT library installed The environment variables needed for Sybase are Variable Value Example LD_LIBRARY_PATH Need to include the lib directory where CT library was installed homes sybase lib The arguments to importData that are required when importing from these databases are type A character string specifying the database type either informix oracle or sybase server The name of the database server This is site specific user The name of the user that is allowed to connect to the database password The password for user to connect to the database 65 CHAPTER 3 IMPORTING AND EXPORTING DATA 66 database The name of the database to imp
33. Once you destroy the graphics window any changes to the original default settings are lost unless you use the Save button see below Reset Click on this button to reset the printing specifications If you have not yet clicked on the Apply button then the specifications are set to how they were when you first entered the dialog box If you have at some time clicked on the Apply button then the specifications are reset to how they were immediately after the last time you clicked on the Apply button Print Click on this button to apply any changes you have made to the printing specifications and send the graph to the printer e Save Click on this button to save the current printing specifications configuration as the default Now every time you start S PLUS this configuration of default specifications appears e Close Click on this button to make the dialog box disappear e Help Click on this button to pop up a Help window for this dialog box Figure 8 8 shows how the Printing dialog box in figure 8 7 changes when the Method specification changes from PostScript to LaserJet The Resolution option menu appears and the Command specification for sending the graph to the printer changes 304 GRAPHICS WINDOW DETAILS lt gt PostScript H Landscape Q Portrait Command Resolution wero dpi 4 150 dpi 100 dpi lt gt 200 dpi Figure 8 8 Changing printing methods 305 CHAPTER 8 WORKING WITH GRAPHICS
34. PLOTTING OPTIONS 110 rt 100 5 gt plot 1 10 1 10 main Straight Line gt hist rnorm 50 main Histogram of Normal gt qqnorm rt 100 5 main Samples from t 5 gt plot density rnorm 50 main Normal Density Straight Line 20 10 o Histogram of Normal FR r T T T T T 1 3 2 1 0 1 2 3 rmorm 50 Normal Density density morm 50 y Quantiles of Standard Normal Figure 6 4 A four plot layout density rnorm 50 x When you are ready to return to one plot per figure use gt par mfrow c 1 1 The function par is used to set many general parameters related to graphics See the section Setting and Viewing Graphics Parameters page 166 and the par help file for more information on using par The section Controlling Multiple Plots page 185 contains more information on using the mfrow parameter and describes another method for creating multiple plots 127 CHAPTER 6 TRADITIONAL GRAPHICS Titles car miles Figure 6 5 128 You can easily add titles to any S PLUS plot You can add a main title which goes at the top of the plot or a subtitle which goes at the bottom of the plot To get a main title on a plot of the car miles versus car gals data use the argument main to plot For example gt plot car gals car miles main MILEAGE DATA To get a subtitle use the sub argument gt plot car gals car miles sub Miles versus Gallon
35. Qu Median Mean 3rd Qu Max 5 80 12 30 13 090 12 72 13 50 25 70 Here IQD 13 50 12 30 1 20 A width of twice the interquartile distance generally gives a smooth plot but may obscure local details of the density On the other hand rougher density estimates may highlight random effects See Silverman 1986 for a discussion of the issues involved in choosing a width parameter A quantile quantile plot or qqplot is a plot of one set of quantiles against another set of quantiles There are two main forms of qqplots The most frequently used form checks whether a data set comes from a particular hypothesized distribution shape In this case one set of quantiles consists of the ordered set of data values which are in fact quantiles for the empirical VISUALIZING THE DISTRIBUTION OF YOUR DATA QQplots for Checking Distribution Shape car gals 25 20 15 10 distribution for the data and the other set of quantiles consists of quantiles for your hypothesized distribution If the points in this plot cluster along a straight line the data set probably has the hypothesized distribution The second form of qqplot is used when you want to find out whether two data sets have the same distribution shape If the points in this plot cluster along a straight line the two data sets probably have the same distibution shape To produce the first type of qqplot when your hypothesized distribution is normal use the function qqn
36. T T T T 0 7 0 8 0 9 1 0 1 1 122 gas E Figure 7 1 Scatterplot of gas NOx against gas E Certain single symbol operators that perform functions in S PLUS have a special meaning in the formula language for example and although Trellis as we will see uses only and If you want to use any of these operators for their conventional meaning in any formula expression 205 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS subset Argument 206 for example if you want to use as multiplication you must put the expression inside the identity function I unless it is already given as an argument to a function Here is an example log 2 gas NOx base 2 I 2 gas E We use I on the right of the formula to protect against the in 2 gas E but not on the left because 2 gas NOx sits inside a function data argument One annoyance in the use of the above formulas is that we had to continually refer to the data frame gas This is not necessary if we attach gas to the search list of databases We can draw figure 7 1 by gt attach gas gt xyplot NOx E Another possibility is to use the argument data gt xyplot NOx E data gas In this case the variables of gas are available for use in the formula argument just during the execution of xyplot The effect is the same as gt attach gas gt xyplot NOx E gt detach gas The use of the data argument has another benefit In the call to xyplot we see explicit
37. Whenever any X11 program tries to create a window on your display the program first looks at your X11 resource data base to get default values The xrdb command uses the C preprocessor to set the defaults that are appropriate for your machine See the xrdb manual page for more information SETTING UP YOUR WINDOW SYSTEM S PLus XII Resources Common Resources for the Motif Graphics Device The file SPlusMotif in the directory s HOME splus lib X1 1 app defaults holds the system wide default values for the motif graphics device Many of the resources declared in the defaults file are discussed below When you specify a resource use the form resource value where resource is the name of the resource you want to use and value is the value you want to give it For example set the resource which tells xterm windows to have a scrollbar with this command xterm scrollBar True When you add this resource to your X11 resource data base then create another window with the UNIX xterm command the window has a scroll bar In this example the name of the application for which you set defaults is xterm When you want to set resources for your motif devices you must use the proper application name sgraphMotif For example if you put the following resource into your resource data base sgraphMotif copyScale 0 75 you would specify the ratio of the size of your original graph to the size of any copies you created from it When you
38. away The scan function always strips white space from numeric fields See the scan help file for more details Data frames in S PLUS were designed to resemble tables They must have a rectangular arrangement of values and typically have row and column labels Data frames arise frequently in designed experiments and other situations If you have a text file with data arranged in the form of a table you can read it into S PLUS using the read table function For example consider the data file auto dat 69 CHAPTER 3 IMPORTING AND EXPORTING DATA Model AcuralIntegra4 Audi1005 BMW32576 ChevLumina4 FordFestiva4 Mazda929V6 MazdaMX 5Miata Nissan300ZXV6 OldsCalais4 ToyotaCressida6 Price 11950 26900 24650 12140 6319 23300 13800 27900 9995 21498 Country Reliab Mileage Type Japan 5 NA Smal Germany NA NA Medium Germany 94 NA Compact USA NA NA Medium Korea 4 37 Smal Japan 5 21 Medium Japan NA NA Sporty Japan NA NA Sporty USA 2 23 Compact Japan 3 2a Medium All fields are separated by spaces and the first line is a header line To create a data frame from this data file use read table as follows gt auto lt read table auto dat header T gt auto AcuraIntegra4 Audi1005 BMW325716 ChevLumina4 FordFestiva4 Mazda929V6 MazdaMX 5Miata Nissan300ZXV6 OldsCalais4 ToyotaCressida6 As with scan you 70 Price Country Reliab Mileage Type 11950 Japan 5 NA Smal 26900 Germany NA NA Medium 24
39. column A row of data must always end with a new line Note that field width specifications are irrelevant for ASCII files and are ignored S PLUS auto detects the file delimiter from a preset list that includes commas spaces and tabs All cells must be separated by the same delimiter that is each file must be comma separated space separated or tab separated Multiple delimiter characters are not grouped and treated the same as a single delimiter For example if the comma is a delimiter two commas are interpreted as a missing field Double quotes are treated specially They are always treated as an enclosure marker and must always come in pairs Any data contained between double quotes are read as a single unit of character data Thus spaces and commas can be used as delimiters and spaces and commas can still be used within a character field as long as that field is enclosed within double quotes Double quotes cannot be used as standard delimiters If a variable is specified to be numeric and if the value of any cell cannot be interpreted as a number that cell is filled in with a missing value Incomplete rows are also filled in with missing values NOTES ON IMPORTING FILES Notes on Importing FASCII Formatted ASCII Files You can use FASCII import to specify how each character in your imported file should be treated For example you must use FASCII for fixed width columns not separated by delimiters if the row
40. command is particularly useful for obtaining information on classes and methods If you use with a function call S PLUS offers documentation on the function name itself and on all methods that might be used with the function if evaluated In particular if the function call is methods name where name is a function name S PLUS offers documentation on all methods for name available in the current search list For example gt methods summary The following are possible methods for summary Select any for which you want to see documentation 15 CHAPTER 2 GETTING STARTED Reading S PLUS Help Files 16 summary aov summary aovlist summary data frame summary default summary factor summary gam summary glim summary 1m summary loess 10 summary mim 11 summary ms 12 summary nls 13 summary ordered 14 summary terms 15 summary tree Selection OON AD THF WY You enter the number of the desired method and S PLUS prints the associated help file if it exists the command does not check for the existence of the help files before constructing the menu After each menu selection S PLUS presents an updated menu showing the remaining choices To get back to the S PLUS prompt from within a menu enter 0 You call help with the name of an S PLUS function operator or data set as argument For instance the following command displays the help file for the c function gt help c The quote marks are option
41. components of the list are labeled by double square bracketed numbers here 1 and 2 followed by colons This notation distinguishes numbering of list components from vector and matrix numbering After each component label S PLUS displays the contents of that component For greater ease in referring to list components it is often useful to name the components You do this by giving each argument in the 1ist function its own name For instance you can create the same list as above but name the ec 39 e29 p components a and b and save the list data object under the name xyz gt xyz lt list a 101 119 b c char string 1 char string 2 To take advantage of the component names that were given in the above list command use the name of the list followed by a sign followed by the name of the component For example the following two commands display component a and component b of the list xyz gt xyz a 1 101 102 103 104 105 106 107 108 109 110 111 112 113 14 114 116 116 117 118 119 gt xyz b i char string 1 char string z7 In S PLUS any object you create at the command line is permanently stored on disk until you remove it This section describes how to name store list and remove your data objects To name and store data in S PLUS use one of the assignment operators lt or For example to create a vector consisting of the numbers 4 3 2 1 and store it with the name x use t
42. create a copy of your motif graphics device the copy is three fourths the size of your current S PLUS graphics window The following resources are commonly used with the motif graphics device e sgraphMotif copyScale sets the size ratio of the copy you produce when you click on the Copy Graph button S PLUS multiplies the height and the width of the canvas by the value in the copyScale resource to create the dimensions for the new window The default resource declaration produces a copy with dimensions one half those of the current window sgraphMotif copyScale 0 5 e sgraphMotif fonts sets the fonts that the motif graphics device use for creating axis labels and plotting characters The fonts must be named in order from smallest to largest Use the UNIX command 325 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION xlsfonts to see a complete list of the fonts available on your screen As an example the following resources tells the motif graphics device to use the vg family of fonts ranging in point size from 13 to 40 sgraphMotif fonts vg 13 vg 20 vg 25 vg 31 vg 40 Note If you select names that are too long to fit on one line use multiple lines and make sure that each line but the last ends with a backslash Since these fonts are intended to list available sizes of the same font the actual font used is controlled by the current value of par cex and the size of the fonts relative to the defaultFont desc
43. data The quotation marks around the vec data argument to scan are required You can now type x to display the data object named x that you have read into S PLUS from the UNIX file vec data If the UNIX file you want to read is not in the same directory from which you started S PLUS you must use the entire path name So if the UNIX file vec data is in a subdirectory with path name usr mabel test vec data then type gt vec data lt scan usr mabel test vec data After you have created an S PLUS data object you may want to change some of the data you have entered For editing simple vectors and S PLUS functions the easiest way to modify the data is to use the fix function which uses the editor specified in your S PLUS session options by default vi With fix you create a copy of the original data object edit it then reassign the result under its original name If you already have a favorite editor you IMPORTING AND EDITING DATA can use it by specifying it with the options function For example if you prefer to use the emacs editor you can set this up easily as follows gt options editor emacs To create a new data object by modifying an existing object use the vi function assigning the result a new name For example if you want to create your own version of a system function such as 1m you can use vi as follows gt my lm lt vi lm Warning Built in Data Sets Quick Hard Copy
44. dbf II I II IV files Microsoft Excel EXCEL xls Versions 2 1 through 4 only note that Excel 95 and Excel 97 are not supported FoxPro use same import filter as dBase files above Gauss GAUSS or dat automatically reads the related DHT GAUSS96 file Informix INFORMIX Informix database connection No file argument should be specified Lotus LOTUS wk wrk Matlab MATLAB mat must contain a single matrix in file Oracle ORACLE Oracle database connection No file argument should be specified Quattro Pro QUATTRO wq wb SPSS TSPSS sav 54 IMPORTING DATA FILES Default Format Type Extensions Notes SPSS Export SPSSP por SAS files SASI ssd01 Files from HP IBM or Sun SAS4 ssd04 Files from Digital Unix SAS sd2 Files from Windows SAS Transport SAS_TPT tpt xpt version 6 x Some special export options may need to be specified in your SAS program We suggest using the SAS Xport engine not PROC CPORT to read and write these files STATA STATA dta Versions 2 0 and higher Sybase SYBASE Sybase database connection No file argument should be specified Systat SYSTAT Sys double or single precision sys files To import a data file In most cases all you need to do to import a data file is to call importData with the name of the file as a character string argument As long as the specified file has one of the default ex
45. dimension or dim of the matrix is stored in a s ot in the representation of the matrix class All structure classes have at least one slot Data which must contain a vector The classes matrix and array have one additional required slot Dim to hold the dimension and one optional slot Dimnames to hold the names for the rows and columns of a matrix and their analogues for higher dimensional arrays Like simple vectors structure objects are atomic all of their values must be of a single mode Data objects can contain not only logical numeric complex and character values but also functions operators function calls and evaluations All the different types classes of S PLUS objects can be manipulated in the same way saved assigned edited combined or passed as arguments to functions This general definition of data objects coupled with class specific methods forms the backbone of object oriented programming and provides exceptional flexibility in extending the capabilities of S PLUS When values of different modes are combined into a single atomic object S PLUS converts or coerces all values to a single mode in a way that preserves as much information as possible The basic modes can be arranged in order of increasing information 1ogical integer numeric complex and character Thus mixed values are all converted to the mode of the value with the most informative mode For example suppose we combin
46. distributions Figure 7 14 is a density plot of mileage gt densityplot Mileage data fuel frame aspect 1 2 width 5 The argument width controls the width of the smoothing window in the same units as the data mpg here as the width increases the smoothness increases 0 0 4 oo0o00000000000 ooo 9o 0 I I I I I I 15 20 25 30 35 40 Mileage Figure 7 14 Density plot GENERAL DISPLAY FUNCTIONS splom The scatterplot matrix is an exceedingly powerful tool for displaying measurements of three or more variables Figure 7 15 is a scatterplot matrix of the variables in fuel frame gt splom fuel frame Note that the factor Type has been converted to a numeric variable and plotted just like the other variables which are numeric The six levels of Type simply take the values 1 to 6 in this conversion smal 4 0o00 00 a0 OO OF van oow o amo o owo o OCOGDO OO f spay joao aoo ow co comp wo sa onw a0 ww o aw o000 YPE meau 4 o d o gooo o 0g y Large anan aw o o com ao o igi om 3 Omg 000 do T 5H 5 o 0 0O ol o 45 s0 55 me n g oO Gao fp 5 0 oo oo aD 2 D GN o 9 Dw 8 o ica o8 6 8 o o Fuel 404 8 kea as 4 8 g o o 6 d o o g odana Pe A os v s 9 j P Bo WRP 88 O le p90 oO amp o Mileage 8 8 g o o ad By o o 0 o E fe g a ooo 2o goa g Bo 2 88 d 9 OF 300 T 00g is 8 20 250 309 200 Disp i a6 a g 3 o oO O 28 O e O
47. from one of the six basic cases shown in the table below Table 5 1 Rules for combining objects into data frames model matrix data frame 116 Data Types Sub types Rules vector numeric 1 contribute a single variable as is complex Factor ordered character character 1 converted to a factor data type logical 2 contribute a single variable category matrix matrix 1 each column creates a separate variable 2 column names used for variable names Tist Tist 1 each component creates one or more separate variables 2 variable names assigned as appropriate for individual components column names for matrices etc model matrix 1 object becomes a single variable in result data frame 1 each variable becomes a variable in result design design 2 variable names used for variable names As you add new classes you can ensure that they are properly behaved in data frames by defining your own as data frame method for each new class In most cases you can use one of the six paradigm cases either as is or with slight modifications For example the character method is a straightforward modification of the vector method gt as data frame character function x row names NULL optional F na strings NA ses ADDING NEW CLASSES OF VARIABLES TO DATA FRAMES as data frame vector factor x exclude na strings row names optional This method converts its input to a factor then calls the function as d
48. from which selection may be made A single element of a matrix can be selected by typing its coordinates inside the square brackets as an ordered pair separated by commas We use the built in dataset state x77 to illustrate The first index inside the operator is the row index and the second index is the column index The IMPORTING AND EDITING DATA following command displays the value in the third row eighth column of state x77 gt state x77 3 8 1 113417 You can also display an element using row and column dimnames if such labels have been defined So to display the above value which happens to be in the row named Arizona and the column named Area use the following command gt state x77 Arizona Area fi 113417 To select sequential rows and or columns from a matrix object use the operator for both the row and or the column index The following expression selects the first 4 rows and columns 3 through 5 for assignment to object x and then displays x gt X lt state x77 174 3 5 gt x Illiteracy Life Exp Murder Alabama Zal 69 05 15 1 Alaska ee 69 31 11 3 Arizona 158 70456 7 8 Arkansas 19 70 66 10 1 The c function can be used to select rows and or columns of matrices just as it was used for vectors above For instance the following expression chooses rows 5 22 and 44 and columns 1 4 and 7 of state x77 gt state x77 e 5 22 44 c 1 4 7 Population Life Exp Frost Cal
49. function The other arguments of the general display function manage the superstructure of the graph scales labels boxes around the data region and keys The panel function manages the symbols lines and so forth that encode the data in the data region Every general display function has a default panel function In all the examples given so far in this chapter the default panel function has been doing the drawing You can change what is drawn in the data region by one of two mechanisms First a default panel function has arguments You can change the rendering by using these arguments in fact you can give them to the general display function which will pass them along to the panel function Second you can write your own panel function The name of the default panel function for a general display function is panel followed by the name of the general function For example the default panel function for xyplot is panel xyplot You can use S PLUS online help to see the arguments of a default panel function For example panel xyplot tells you about the panel function for xyp1ot You can give an argument to a panel function by giving it to the general display function the general display function passes it on to the panel function For example xyplot can pass pch to panel xyplot to specify a 9 as the plotting symbol gt xyplot NOx E data gas aspect 1 2 pch If you write your own panel function you giv
50. gt table fuel frame Type Compact Large Medium Small Sporty Van 15 3 13 13 9 Fi The vehicle types are encoded by using different plotting symbols Nothing on the graph indicates which symbol is for which type but the next section contains information about drawing a legend or key The panel function panel superpose carries out such a superposition gt xyplot Mileage Weight data fuel frame aspect 1 groups Type panel panel superpose The factor Type is given to the argument groups of xyplot But groups is also an argument of panel superpose so Type is passed along to the panel function to be used to determine the plotting symbols The plotting symbols are the defaults that are set up by the trellis device function trellis device such trellis settings were discussed in the section Panel Functions and the Trellis Settings page 249 The specific settings used by panel superpose are discussed later in this section The default symbols have been chosen to enhance the visual assembly of each group of points that is we want to effortlessly assemble the plotting symbols of a given type to form a visual gestalt or whole If assembly can be performed efficiently then we can compare the characteristics of the data for different automobile types SUPERPOSING TWO OR MORE GROUPS OF VALUES ON A PANEL You can choose your own plotting symbols For example suppose that we want to use the first letters of the vehicle types but with
51. have eight panels per page but there are seven plots On both pages the last panel is skipped The skipping has been done because the conditioning variable income has seven levels The argument page can add page numbers text or graphics to each page of a multipage Trellis display page should be a function of a single argument n the page number the function tells what to draw on page n For example gt update market plot page function n gt text x 75 y 95 paste page n adj 5 text an S PLUS core graphics function uses a coordinate system that is the same as the panel rectangle coordinate system for the argument key 0 0 is the lower left corner and 1 1 is the upper left corner 265 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS SUMMARY OF TRELLIS FUNCTIONS AND ARGUMENTS Table 7 1 An alphabetical guide to Trellis Graphics 266 Statement Purpose Example as data frame array function iris df lt as data frame array iris col dims 2 as data frame ts function data frame ts hstart 1 5 aspect argument Xyplot NOx E data gas aspect 1 2 xlab Equivalence Ratio ylab Oxides of Nitrogen main Air Pollution sub Single Cylinder Engine barchart function barchart names mileage means mileage means aspect 1 between argument barley plot lt update barley plot between listiy 0 0 0 0 1 0 0 0 7 bwplot function bwplot Type Mileage data fuel frame aspect 1 cloud f
52. ie CO i o ie 20 7 000 0 ok O 0O O oOo O T T T 2000 2500 3000 3500 Weight Van Sporty E oaei Small J 00 Ea e Medium Large Compact 0 ef e T T T T 20 25 30 35 Mileage Figure 7 21 Multiple graphs on a page 229 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS MULTIPANEL CONDITIONING A Data Set barley About Multipanel Display formula Argument Columns Rows and Pages 230 The data frame barley contains data from an experiment carried out in Minnesota in the 1930s At six sites ten varieties of barley were grown in each of two years The data collected for the experiment are the yields in bushels acre for all combinations of site variety and year so there are 6 X 10 X 2 120 observations yield is numeric the others are factors gt names barley 1 yield variety year site Figure 7 22 uses multipanel conditioning to display the barley data Each panel displays the yields of the ten varieties for one year at one site variety is graphed along the vertical scale and yield is graphed along the horizontal scale For example the lower left panel displays values of variety and yield for Grand Rapids in 1932 The panel variables are yield and variety and the conditioning variables are year and site Figure 7 22 was made by the following command gt dotplot variety yield year site data barley The is read as given Thus the formula is read as variety is graphed against y
53. lottery lt makegroups lottery payoff lottery2 payoff lottery3 payoff market plot function update market plot page function n text x 75 y 95 paste page n adj 5 page argument see market plot example panel argument panel special lt function x y biggest lt y max y points x biggest y biggest pch points x biggest y biggest pch M panel superpose function xyplot Mileage Weight data fuel frame aspect 1 groups Type panel panel superpose 267 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS Table 7 1 An alphabetical guide to Trellis Graphics Statement Purpose Example panel loess function xyplot NOx C GIVEN E data ethanol aspect 2 5 panel function x y panel xyplot x y panel loess x y span 1 panel xyplot function see panel loess example parallel function parallel fuel frame par function par ask TRUE par strip test argument par strip test list cex 2 piechart function piechart names mileage means mileage means prepanel argument xXyplot NOx E C data ethanol prepanel function x y prepanel loess x y span 1 2 degree 2 layout c 1 6 panel function x y panel xyplot x y panel loess x y span 1 2 degree 2 prepanel loess function see prepanel example print function print box plot position c 0 0 1 4 more T print trellis function print trellis pscales argument pscales 1 qq function qq Type Mileage data fuel frame aspect 1 subset Type C
54. main bty n plot x y main heavy box box 20 184 CONTROLLING MULTIPLE PLOTS CONTROLLING MULTIPLE PLOTS Multiple figures can be created using par and mfrow For example to set a three row by two column layout gt par mfrow c 3 2 In this section we describe controlling multiple plots in more detail When you specify mfrow or mfcol S PLUS automatically changes several other parameters as follows Table 6 8 Changes induced by specifying mf row or mfcol Paramter Effects fty Set to c by mfcol and to r by mfrow This is how S PLUS knows to go along rows or columns mfg Contains the row and column of the current figure and the number of rows and columns in the current array of figures cex and If either the number of rows or the number of columns is mex greater than 2 then both cex and mex are set to 0 5 To override mfrow s choice of mex and cex you must issue separate calls to par gt par mfrow c 2 2 gt par mex 6 cex 6 The mfrow and mfcol layout parameters automatically create multiple figure layouts in which all figures are the same size You can create multiple figure plots in which the figures are different sizes by using the fig layout graphics parameter The fig parameter gives the coordinates of the corners of the current figure as fractions of the device surface An example is given in figure 6 31 in which the first plot uses the top third of the device t
55. missing parenthesis S PLUS provides a continuation prompt to remind you CIEL to complete the expression The default continuation prompt is Here are two examples of incomplete expressions which cause S PLUS to respond with a continuation prompt gt ae zi i amp 3 gt el34160 1 3416 In the first example S PLUS determined that the expression was not complete because the multiplication operator must be followed by a data object In the second example S PLUS determined that c 3 4 1 6 was not complete because a right parenthesis is needed In each of the above cases the user completed the expression after the continuation prompt and then S PLUS responded with the result of the evaluation of the complete expression Sometimes you may want to stop the evaluation of an S PLUS expression For example you may suddenly realize you want to use a different command or the output display of data on the screen is extremely long and you don t want to look at all of it RUNNING PLus Error Messages To interrupt S PLUS use the UNIX interrupt command which on most systems consists of either CTRL C pressing the C key while holding down the CONTROL key or the DELETE key If neither CTRL C nor DELETE stop the scrolling consult your UNIX manual for use of the stty command to see what key performs the interrupt function or consult your local system administrator Do not be afraid of making mista
56. object newvec and returns an S PLUS prompt To view the contents of the newly created object just type its name gt newvec 1 3416 To quit S PLUS and get back to UNIX use the q function gt a0 The are required with the q command to quit S PLUS because q is an S PLUS function and parentheses are required with all S PLUS functions This section introduces basic typing syntax and conventions in S PLUS S PLUS ignores most spaces For example Be 7 1 10 CHAPTER 2 GETTING STARTED Upper And Lower Case Continuation Interrupting Evaluation Of An Expression 10 However do not put spaces in the middle of numbers or names For example if you wish to add 321 and 1 the expression 32 1 1 causes an error Also you should always put spaces around the two character assignment operator lt otherwise you may perform a comparison instead of an assignment S PLUS is case sensitive just like UNIX All S PLUS objects arguments names etc are case sensitive Hence QWERT is different from qwert In the following example the object SeX is defined as M You get an error message if you do not type SeX exactly as stated including matching all upper case and lower case letters gt Sex L1 mu gt Sex Problem Object sex not found When you type a RETURN and it is clear to S PLUS that an expression is incomplete for example the last character is an operator or there is a
57. of data use the names function You assign a character vector of length equal to the length of the data vector as the names attribute for the vector For example the following commands take the integers 1 to 5 assign them to a vector x assign the spelled out words for those integers to the names attribute of the vector then display the result PR WS gt names x lt c one two three four five x one two three four five 1 2 3 4 5 You also use names to display the names associated with a vector gt names x one two three four five In a matrix both the rows and columns can be named Often the columns have meaningful alphabetic word names because the columns represent different variables while the row names are either integer values indicating the observation number or character strings identifying case labels Lists are useful for adding row names and column names to a matrix as we now illustrate The dimnames argument to the matrix function is used to name the rows and columns of the matrix The dimnames argument must be a list with exactly 2 components The first component gives the labels for the matrix rows and the second component gives the names for the matrix columns The length of the first component in the dimnames list is equal to the number of rows and the length of the second component is equal to the number of columns For example if we add an additional argument to the matrix command when
58. original observations make up the packet Knowing these subscripts is helpful for getting the values of other variables that might be needed for rendering on the panel In such a case the panel function argument subscripts contains the subscripts To see the observation numbers added to the graph of NOx against E given C gt xyplot NOx E C data ethanol aspect 1 2 panel function x y subscripts text x y subscripts cex 75 The core graphics functions commonly used in writing panel functions are points lines text segments and polygon You can use the S PLUS online help to see what they do The core parameters commonly used in writing panel functions are col 1ty pch 1wd and cex Use par for their definitions PANEL FUNCTIONS AND THE TRELLIS SETTINGS PANEL FUNCTIONS AND THE TRELLIS SETTINGS trellis par get Trellis Graphics as we have discussed is implemented using traditional S PLUS core graphics which has controllable graphical parameters that determine the characteristics of plotted objects For example if we want to use a symbol to show points on a scatterplot graphical parameters determine the type size font and color of the symbol In Trellis Graphics the default panel functions for the general display functions select graphical parameters to render plotted elements as effectively as possible But because the most desirable choices for one graphics device can be different from those for another dev
59. overwritten Instead to prevent yourself from inadvertently removing your functions you should create a personal function library to hold them A personal function library is simply an S chapter that you add to your S PLUS search path allowing you to access your functions from wherever you start S PLUS If you are working on a number of different projects you can create personal function libraries for each project to store the functions developed for that project To set up your own library there are two main steps 1 Create an S chapter to hold your library of functions and helpfiles 2 Place the new directory in your S PLUS search path We describe these steps in detail in the following subsections Note Creating an S Chapter If your function library would be useful to many people on your system you can ask your system administrator to create a system wide version of your function library that everyone can access with the S PLUS library function To create a chapter you use the UNIX mkdir command from the UNIX prompt followed by the S PLUS utility CHAPTER For example to create an S PLUS chapter called mysplus in your home directory use the following commands cd mkdir mysplus cd mysplus Splus CHAPTER The Splus CHAPTER utility creates a Data directory in the directory you created with mkdir you will store your functions in this Data subdirectory The Data subdirectory is created with two
60. plots in S PLUS with one or more of a collection of frequently used options These options include e Controlling plot shape and multiple plot layout e Adding titles and axis labels e Setting axis limits and specifying logarithmic axes e Choosing plotting characters and line types e Choosing plotting colors When you use an S PLUS plotting function the default shape of the box enclosing the plot is rectangular Sometimes you prefer to have a square box around your plot For example a scatter plot is usually displayed as a square plot You get a square box by using the global graphics parameter function par as follows 2 partpoty s All subsequent plots are made with a square box around the plot If you want to return to making rectangular plots use gt par pty The pty stands for plot type and the s stands for square However you should think of pty as standing for plot shape to avoid confusion with a different meaning for plot type see the section Plot Types page 130 You may want to display more than one plot on your screen or on a single page of paper To do so you use the S PLUS function par with the layout parameter mfrow to control the layout of the plots as illustrated by the following example In this example you use par to set up a four plot layout with two rows of two plots each Following the use of par we create four simple plots with titles gt par mfrow c 2 2 FREQUENTLY USED
61. putting up a plot As soon as a high level OVERLAYING FIGURES Overlay Figures by Using subplot graphics function is called new is set to FALSE In this case high level graphics functions such as plot move to the next figure or erase the current figure if there is only one in order to avoid overwriting a plot 8 y gap You can take advantage of the new graphics parameter to call two high level plotting functions in succession without having the first plot disappear The code below produces an example of a plot with the same x axis but different y axes We first set mar so that there is room for a labeled axis on both the left and the right then produce the first plot and the legend gt par mar c 5 4 4 5 1 gt plot hstart ylab Housing Starts type 1 gt legend 1966 3 220 c Housing Starts Manufacturing Shipments 1ty 1 2 Now we set new to TRUE so that the first plot won t be erased and specify direct axes for the x axis in the second plot gt par new T xaxs d gt plot ship axes F 1ty 2 type 1 gt axis side 4 gt mtext side 4 1line 3 8 Manufacturing millions of dollars gt par xaxs r release the direct axis The subplot function is another way to overlay plots with different scales The subplot function allows you to put any S PLUS graphic except brush and spin into another graphic You specify the graphics function and the coordinates of the subplot The following c
62. random samples generated from N 0 1 and N 1 1 distributions We set the random number seed with the function set seed so this example is reproducible gt set seed 19 gt x lt rnorm 10 gt y lt rnorm 5 mean 1 gt t test x y Standard Two Sample t Test data x and y t 43l2 dF 134 pevalue 0 176 alternative hypothesis true difference in means is not equal to 0 95 percent confidence interval 1 7254080 0 3502894 sample estimates mean of x mean of y 0 4269014 0 2606579 49 CHAPTER 2 GETTING STARTED Statistical Models 50 Most of the statistical modeling functions in S PLUS follow a unified modeling paradigm in which the input data are represented as a data frame and the model to be fit is represented as a formula Formulas can be saved as separate S PLUS objects and supplied as arguments to the modeling functions A partial listing of S PLUS modeling functions is given in Table 2 8 Table 2 8 S PLUS modeling functions Function Description aov manova Im glim gam loess tree nls ms Ime nlme factanal princomp pam fanny daisy clara diana agnes Analysis of variance models Linear model regression Generalized linear model including logistic and Poisson regression Generalized additive model Local regression model Classification and regression tree models Nonlinear models Mixed effects models Factor analysis Principal components ana
63. row names attribute having unique values In the above example the object was my df gt my df Kyphosis Age Number 1 absent 71 3 2 absent 158 2 3 present 128 4 4 absent 2 5 5 absent i 4 6 absent 1 ri absent 61 2 8 absent 37 3 9 absent 113 2 10 present 59 6 11 present 82 5 12 absent 148 3 13 absent 18 5 14 absent 1 4 16 absent 168 E 17 absent 1 3 18 absent 78 6 19 absent 175 5 20 absent 80 5 gl absent 27 4 101 CHAPTER 5 DATA FRAMES 102 The row names are not just the row numbers in our subset the number 15 is missing The fifteenth row of kyphosis and hence my df has the row Ww W name 16 The attributes of special types of vectors such as factors are not lost when they are combined in a data frame They can be retrieved by asking for the attributes of the particular variable of interest More detail is given in the section This method takes account of user supplied row names but ignores the argument optional a flag that is TRUE when the method is not expected to generate non trivial row names or variable names for a calling function page 117 Each vector adds one variable to the data frame Matrices and data frames provide as many variables to the new data frame as they have columns or variables respectively Lists because they can be built from virtually any data object are more complicated they provide as many variables as all of their components taken together When combining objec
64. see two menu items displayed Color Scheme and Printing The ellipses three trailing periods indicate that dialog boxes will appear if you choose these items The Color Scheme Dialog Box The Color Scheme dialog box is a powerful feature of the motif windowing graphics device it lets you change the colors in your plot interactively and immediately see the results Figure 8 4 shows an example of the Color Scheme dialog box This window has a title bar with a window menu button and the title S PLUS Color Scheme Editor 295 CHAPTER 8 WORKING WITH GRAPHICS DEVICES Available Color Schemes ff Color Scheme Specifications color scheme 2 F l Images AA Figure 8 4 The Motif Color Scheme dialog box When you first call up the Color Scheme dialog box the pane contains e The Available Color Schemes menu e The Color Scheme Specifications editor showing the specifications for the default color scheme A button marked Create New Color Scheme e A button marked Apply A button marked Reset e A button marked Save 296 GRAPHICS WINDOW DETAILS e A button marked Close e A button marked Help The Help Button The Help button is located in the lower right hand corner of the Color Scheme dialog box Click on this button to view a pop up help window for this dialog box Click on the Close button in the Help pop up window to make it disappear once you are done with it The Color Scheme Specifications Edi
65. so we use yl im c 0 270 You can obtain greater flexibility for the positioning of the legend by using the function legend after you have made your bar plot rather than relying on the automatic positioning that results from using the optional argument legend See the section Adding Legends page 140 for more information Many other options are available to you as arguments to barp ot see the help file for complete details The dot chart was first described by Cleveland 1985 as an alternative to bar plots and pie charts The dot chart displays the same information as the bar plot or pie chart but in a form that is often easier to grasp In particular the dot chart reduces most data comparisons to straightforward length comparisons on a common scale The simplest use of dotchart is analogous to the simplest use of barplot as you can see by applying dotchart to the first column of the digits matrix gt dotchart digits 1 digit names MAKING BAR PLOTS DOT CHARTS AND PIE CHARTS higit 1 Higit 2 higit 3 figit 4 digit 5 Figure 6 12 Making dot charts with the digits data To get a display of all the data in the matrix digits you could use the following command gt dotchart digits digit names or you could use the following command gt dotchart t digits sample names The argument t digits uses the function t to transpose the matrix digits i e to interchange the rows and columns of digits
66. specify more items if we choose The order of the items is the order of specification in the argument key in the above expression points is first and text is second so in the key the symbol is the first item and the text is the second Had we specified text first the symbol would have followed the text in each entry The two entries by default are drawn as an array with one column and two rows We can change this by the argument columns Also we can switch the order of the symbols and the text update barley plot key list text list levels barley year points Rows trellis par get superpose symbol 1 2 columns 2 The argument space allocates space for the key in the margins It takes one of four values top bottom right left allocating the space on the side of the graph described by the value So far it has been allocating space at the top which is the default and placing the key in the allocated space More will be said about the space argument later If the default location of the key seems a bit too far from the rest of the graph the key can be repositioned and a border can be drawn around it update barley plot key list points Rows trellis par get superpose symbol 1 2 text list levels barley year columns 2 border 1 Space tLap wS SUPERPOSING TWO OR MORE GROUPS OF VALUES ON A PANEL y 1 02 corner c 5 0 The argument border draws a border it takes a number that specifie
67. subdirectories _Help and Meta which are used to store help files and object metadata respectively USING PERSONAL FUNCTION LIBRARIES Note You can create your S chapter directory anywhere you have write permission and you can name it anything you like Placing the Chapter in Your Search Path To add an S chapter to your search path use the S PLUS attach function which provides temporary access to a directory during an S PLUS session You name the directory to be added as a character string argument to attach For example to add the chapter usr rich mysplus to your search path with attach use the following expression gt attach usr rich mysplus When specifying directories to attach you must specify the complete path name S PLUS does not expand such UNIX conventions as bob or HOME Any directories you attach are detached when you quit S PLUS In order to have your functions available at all times create a First function or modify it if it already exists and add a command to attach mysplus to your S PLUS search list as in the following example gt First lt functiont attach spud users mysplus Whenever you start S PLUS mysplus is automatically attached and your functions and help files are made available 319 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION SPECIFYING YOUR WORKING DIRECTORY 320 Whenever you assign the results of an S PLUS expression to an objec
68. superpose symbol 1 2 text list levels barley year space right To draw a border and to position the key by putting the upper left corner of the border rectangle at the same vertical position as the top of the panel rectangle and at a horizontal position slightly to the right of the right side of the panel rectangle update barley plot key list points Rows trellis par get superpose symbol 1 2 text list levels barley year space right border 1 corner c 0 1 257 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 258 x 1 05 y 1 So far we have seen that the components points and text can be used to create items in key entries A third component lines draws line items To illustrate this let us return to graphing Mileage against Weight for six types of vehicles The following code makes the plot and adds two loess smooths with two different values of the smoothing parameter span superpose line lt trellis par get superpose line superpose line col 3 6 lt 0 superpose symbol lt trellis par get superpose symbol xyplot Mileage Weight data fuel frame groups Type aspect 1 panel function x y panel superpose x y panel loess x y span 1 2 lwd superpose line lwd 1 lty superpose line lty 1 col superpose line col 1 panel loess x y span 1 lwd superpose line lwd 2 lty superpose line lty 2 col superpose line col 2 key list transparent T x 95 y 95 cor
69. that draws the symbols The list is accessed by trellis par get Here is the list plot symbol for the motif device gt trellis device motif gt plot symbol lt trellis par get plot symbol gt plot symbol cex 1 0 8 col 1 2 font 1 1 pch 1 2 The pch of 1 and col of 2 produces a cyan circle If type is 1 which means that lines is used to plot the data then the graphical parameters for the lines are in the settings list plot 1ine gt trellis device motif gt plotline lt trellis par get plot line gt plot line col Et 2 Silty 1 1 lwd 1 1 This is a cyan colored solid line 250 PANEL FUNCTIONS AND THE TRELLIS SETTINGS show settings trellis par set show settings displays the graphical parameters in the Trellis settings for the current device To see the result for black and white postscript gt trellis device motif gt show settings Each panel displays one or more settings lists The names of the settings appear below the panels For example the panel in the third row from the top and first column shows plotting symbols with graphical parameters plot symbol and lines with graphical parameters plot 1ine and the panel in the third row and third column shows that the panel function of the general display function histogram uses the graphical parameters in bar fil1 for the color that shades the bars of a histogram The Trellis settings for the current
70. the 1as graphics parameter You can choose between labels that are written parallel to the axes the default 1as 0 horizontally 1as 1 or perpendicular to the axes 1as 2 Try the following commands gt par mfrow c 2 2 gt plot x y las 0 main Parallel las 0 gt plot x y las 1 main Horizontal las 1 gt plot x y las 2 main Perpendicular las 2 gt plot x y axes F main Customized gt axis 2 gt axis l at c 2 4 6 8 labels c 2 10 4 10 6 10 S710 9 gt box The command box ensures that a complete rectangle is drawn around the plotted points see the section Controlling Axis Boxes page 184 The xaxt and yaxt parameters also control axis plotting If one of these parameters is equal to n the tick marks for the corresponding axis are not drawn For example you could also create the last panel produced by the code above with the following commands gt plots y xaxta nn gt axis l at c 2 4 6 8 labels c 2 10 4 10 6 10 8 10 To set the distance from the plot to the axis title use the mgp general parameter The parameter mgp is a numeric vector with three elements in units of mex the first element gives the location of the axis title the second the location of the tick labels and the third the location of the axis line The default value is c 3 1 0 You can use mgp to control how much space the axes consume For example if you have small margins you migh
71. the outer margins and then the figure margins CONTROLLING GRAPHICS REGIONS Controlling the You usually specify an outer margin only when creating multiple figures per Outer Margin page You can use the outer margin to hold a title for an entire page of plots or to label different pages consistently when some pages have multiple plots and others have a single plot You must specify a size for the outer margin if you want one the default size is 0 To specify the size of the outer margin use any one of three equivalent layout parameters oma omi or omd The most useful of these is oma specified as a numeric vector of length four one element for each side where the values are expressed in mex the size of the font for one line of text in the margins If you specify the outer margin with oma the specified values correspond to the number of lines of text that will fit in each margin For example to leave room for a title at the top of a page of plots we could set the outer margin as follows gt par oma c 0 0 5 0 You can then use mtext as follows to add a title to obtain figure 6 27 gt mtext A Title in the Outer Margin side 3 outer T cex 1 5 gt boxt A Title in the Outer Margin Figure 6 27 A plot with an outer margin Setting the parameter oma automatically changes both omi the outer margin in inches and omd the outer margin as a fraction of the device surface See the par help f
72. the years 1931 and 1932 are distinguished by different plotting symbols The plot has been saved in the Trellis object barley p1ot for use later on 254 SUPERPOSING TWO OR MORE GROUPS OF VALUES ON A PANEL key Argument The general display function dotp1 ot has not sent the factor variety to the panel function to be the y vector for the function rather it has sent a numeric vector of values from 1 to 10 with 1 corresponding to the first of the 10 levels of the factor 2 corresponding to the second level and so forth The display function has sent the values of yield as the vector x and the conditioning vector is site Thus on each panel there are 20 values of x and 20 values of y for each level of variety there are two values of x one for 1931 and one for 1932 and two values of y and there are 10 levels of variety The plotting symbols are drawn by panel superpose at the 20 values of x and y on each panel The panel function for this dotplot example is more complicated than that for the xyplot examples because along with superposing the plotting symbols by panel superpose the horizontal lines of the dot plot must be drawn abline draws the lines at the unique values of y The characteristics of the line are specified by the Trellis setting dot 1ine A key can be added to a Trellis display through the argument key of the general display functions The argument is a list With one exception the component names are the name
73. to 2 10 1 gt barley plot lt update barley plot layout c 2 10 1 gt barley plot Rows 1 to 5 starting from the bottom have the 1932 data and rows 6 to 10 have the 1931 data The change in the value of the year variable from rows 5 to 6 is indicated by the text of the strip label but a stronger indication of a change would occur if there was a break in the display between rows 5 and 6 The argument between can be used to insert space between adjacent rows or adjacent columns of a Trellis display To illustrate this try the following which puts space between rows 5 and 6 of the barley display gt barley plot lt update barley plot between 1ist y c 0 0 0 0 1 0 0 0 0 gt barley plot The argument between is a list with components x and y either of which can be missing x is a vector whose length is equal to the number of columns minus one the values are the amount of space measured in character height to be inserted between columns Similarly y specifies the amount of space between rows The argument skip which takes a logical vector controls skipping Each element says whether or not to skip a panel For example market plot lt bwplot age log 1 usage income pick strip function strip default strip names T skip elf FFF PF aha hs layout c 2 4 2 data market survey 4 MORE ON ASPECT RATIO AND SCALES PREPANEL FUNCTIONS page Argument gt market plot The layout will
74. to the object diff hs At the S PLUS prompt type in the name diff hs and assign to it the results of the scan command S PLUS responds with the prompt 1 which means that you should enter the first value You can enter as many values per line as you like separated by spaces When you press RETURN S PLUS prompts with the index of the next value it is waiting for In the following example S PLUS responds with 6 because you entered 5 values on the first line When you finish entering data press return in response to the prompt and S PLUS returns to the S PLUS command prompt gt 33 CHAPTER 2 GETTING STARTED Reading An ASCII File Editing Data 34 The complete example appears on your screen as follows gt diff hs lt scan Ts 06 ala 214 F 05 Of aal 12 a23 05 3 Lis s62 gt 29 pE 71 15 gt Entering data from the keyboard is a relatively uncommon task in S PLUS More typically you have a vector data set stored as an ASCII file which you want to read into S PLUS An ASCII file usually consists of numbers separated by spaces tabs newlines or other delimiters Lets say you have a UNIX file called vec data in the same UNIX directory from which you started S PLUS containing the following data 62 60 63 59 63 67 71 64 65 66 88 66 71 67 68 68 56 62 60 61 63 64 63 59 You read the file vec data into S PLUS by using the scan command with vec data as an argument gt X lt scan vec
75. which is permanently saved until you remove it For example gt weather lt c hot day COLD NIGHT gt weather 1 hot day COLD NIGHT Some functions in S PLUS are commonly used with no arguments For example recall that you quit S PLUS by typing q The parentheses are still required so that S PLUS can recognize that the expression is a function When you accidentally leave the off when you type a function the function text is displayed on the screen Typing any object s name causes S PLUS to print that object a function object is simply the definition of the function To call the function you simply need to retype the function name with parentheses after the function has finished displaying For instance if you accidentally type q instead of q when you wish to quit S PLUS the body of the function q is displayed In this case the body of the function is only two lines long gt q FUNCTION oral nternalgts eds Sdummy Ty 33 2 No harm has been done All you need to do now is correctly type q and you will return to your UNIX system prompt gt q An operator is a function which has at most two arguments and can be represented by one or more special symbols which appear between the two arguments PLus LANGUAGE BASICS For example the usual arithmetic operations of addition subtraction multiplication and division are represented by the operators and respectively
76. x y ADDING SPECIAL SYMBOLS TO PLOTS Adding Stars and Other Symbols gt for i in seq along x segments x i yli xLitl yLi 1 T T 0 2 0 4 0 6 0 8 xX Figure 6 33 Adding segments to plots You can add a third dimension of data to your plots by using the symbols function to encode it as stars circles or other special symbols To plot cities with circles whose areas represent the population the steps involved are described below First create the data We select twelve cities reasonably well distributed across the country from among those listed in the built in data set city name gt select lt c Atlanta Atlantic City Bismarck Boise Dallas Denver Lincoln Los Angeles Miami Milwaukee New York Seattle As described in the section Overlaying Figures page 188 use names to assign the city names as vector names for the data sets city x city y and city name Before city x city y or city name can be used as an argument to a replacement function it must first be assigned locally 193 CHAPTER 6 TRADITIONAL GRAPHICS gt city x lt city x city y lt city y city name lt city name gt names city x lt city name gt names city y lt city name gt names city name lt city name By assigning names in this way we can access the information necessary to plot the cities without learning their vector indices From an almanac or similar referenc
77. you use the argument col 3 in an S PLUS plotting function you are referring to the third color listed in the current color scheme Note When specifying a color scheme in your X resources the first color listed is the background color and corresponds to col 0 e Colors are repeated cyclically starting with color 1 which corresponds to col 1 For example if the current color scheme includes three colors not including the background color and you use the argument col 5 in an S PLUS plotting function then the second color is used e You may abbreviate a list of colors with the specification color n color2 This list is composed of 2 colors color1 color2 and n colors that range smoothly between color and color2 For example the color scheme blue red 10 lawn green specifies a list of 13 colors blue then red then 10 colors ranging in between red and lawn green and then lawn green Note This method of specification is especially useful with the image plotting function e You may specify a list of colors as halftones with the specification color hn color2 This list is composed of 2 colors which are actually tile patterns with progressively more co or2 on a background 309 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 310 of color1 Halftone specifications are useful on devices with a limited number of simultaneous colors For example the color scheme blue red hl0 lawn
78. 00 857 yet 65 1 fun par usr c o usr 1 2 0 1 04 max den p y xaxt 1 lines den p box The xaxt 1 parameter is necessary in the first marginal density plot since price is plotted with a logarithmic axis To plot the density estimate for mileage along the right of the main plot use subp1ot as follows gt Subp lottxect 85 1 yard 85 fun par usr c 0 1 04 max den m y o usr 3 4 lines den m y den m x box 191 CHAPTER 6 TRADITIONAL GRAPHICS ADDING SPECIAL SYMBOLS TO PLOTS Arrows and Line Segments 192 In the section Interactively Adding Information to Your Plot page 137 we saw how to add lines and new data to existing plots In this section we describe how to add arrows stars and other special symbols to existing plots To add one or more arrows to an existing plot use the arrows function To add a line segment which is essentially an unpointed arrow use the segments function Both segments and arrows take beginning and ending coordinates so that one or more line segments are drawn on the plot For example the following commands plot the corn rain data and draw arrows from the ith to i 1th observation gt plot corn rain gt for i in seq along corn rain arrows 1889 i corn rainlLi 1890 i corn rain Li 1 corn rain T T T T 1890 1900 1910 1920 Time Figure 6 32 Adding arrows to plots Use the segments function similarly gt plot
79. 1 1ty 2 gt box To control the number of tick marks on an axis you can set the lab parameter The 1ab parameter is an integer vector of length three that gives the approximate number of tick marks on the x axis the approximate number of tick marks on the y axis and the number of characters for tick labels The number is only approximate because S PLUS tries to use round numbers for tick labels It may take some experimentation with lab to get just the axis that you want To control the format of tick labels in exponential notation use the exp graphics parameter as follows Table 6 5 Controlling the format of tick labels Setting Effect exp 0 Exponential tick labels are printed on two lines so that 1e6 is printed with the 1 on one line and the e6 on the next exp 1 Exponential tick labels are printed on a single line in the form 1e6 exp 2 Default value Exponential tick labels are printed on a single line in the form 10 6 Uses of the 1ab and exp parameters are illustrated with the following code gt par mfrow c 2 2 gt plot price mileage main lab c 5 5 7 gt plot price mileage lab c 10 3 7 main lab c 10 3 7 gt plot price mileage lab c 5 5 4 main Tab c 5 5 4 exp 0 181 CHAPTER 6 TRADITIONAL GRAPHICS 182 gt plot price mileage lab c 5 5 4 exp 1 main lab c 5 5 4 exp 1 To control the orientation of the axis labels use
80. 15 3 13 13 9 7 GENERAL DISPLAY FUNCTIONS xyplot We have already seen xyplot in action in our previous examples This function is a basic graphical method graphing one set of numerical values on a vertical scale against another set of numerical values on a horizontal scale Figure 7 5 is a scatterplot of mileage against weight gt xyplot Mileage Weight data fuel frame aspect 1 The variable on the left of the goes on the vertical or y axis and the variable on the right goes on the horizontal or x axis o 35 4 o as o o oo o o o 30 4 o o oO 00 o0 oo o oo 25 4 o o o L o o 000 o o o Oo oo O o o0 o o o 20 5 0000 ok o oo o 00 0 I I I I 2000 2500 3000 3500 Weight Figure 7 5 Scatterplot 211 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS bwplot The box and whisker plot or boxplot is a very clever invention of John Tukey that is widely used for comparing the distributions of several data sets Figure 7 6 is a boxplot of mileage classified by vehicle type gt bwplot Type Mileage data fuel frame aspect 1 The factor Type is on the left in the formula because it goes on the vertical axis and the numeric vector Mileage is on the right because it goes on the horizontal axis Van e f Sporty Rare a Small o e Medium Las Large ee Compact pon ep 20 25 30 35 Mileag
81. 2 by default Resizing the graphics window has no effect on PostScript output created from the resized window it retains the aspect ratio of the original unresized window PRINTING YOUR GRAPHICS Using the Print Option from Graphics Window Menus The motif windowing graphics device is a convenient tool for exploratory data analysis and interactive graphics You can easily create PostScript versions of graphics created on these devices by using the Print option from the Graph menu The behavior of this option is determined by options specified in the Printing Options dialog box selected from the Options menu The following choices are available Method Should show PostScript selected If not move the pointer to the PostScript method and click e Orientation Determines the orientation of the graphic on the paper Landscape orientation puts the x axis along the long side of the paper Portrait orientation puts the x axis along the short side of the paper To choose the orientation move the pointer to the desired choice and click e Command A UNIX command executed when you select the Print option from the Graph menu The default value when Method is set to PostScript is the command stored in the value of ps options command To change this command move the pointer to this line and click to ensure the line has input focus then edit the command As the default command is normally to send a file to a printer the most common
82. 2599 0 256793795i1 0 53622210 il TRUE 0 9026407992 0 637563583i1 0 07595690 12 TRUE 1 1558698525 0 6552714751 0 32395563 13 FALSE 0 1049802819 0 7061285721 1 35316648 14 TRUE 0 2302154933 0 3734514291 2 42261503 16 FALSE 2 3956811151 0 086245694i 0 34412995 17 TRUE 0 0824999817 0 258623377i 2 46456956 18 FALSE 0 0248816697 0 417373099i 2 99062594 19 TRUE 0 7525617816 0 636045368i1 1 55640891 20 TRUE 1 1078423455 0 011345901i1 1 27173450 21 TRUE 2 2280610717 0 517812594i 1 54472022 X1 X2 Kyphosis Age Number 1 0 80316229 2 28681400 absent 71 3 2 0 58580658 0 06509133 absent 158 3 3 0 88756407 0 89849793 present 128 4 4 2 35672715 0 68797076 absent 2 5 5 1 26986158 0 76204606 absent 1 4 6 1 10805175 1 02164143 absent 1 2 7 0 56273335 1 34946448 absent 61 2 8 0 24542337 1 35936982 absent 37 CREATING DATA FRAMES 9 0 29190516 2 24852247 absent 113 2 10 0 98675866 1 27076525 present 59 6 11 0 10125951 0 19835740 present 82 5 12 0 30351481 2 48467422 absent 148 3 13 0 04480753 1 60470965 absent 18 5 14 1 43504492 1 35172992 absent 1 4 16 2 45929501 0 58286780 absent 168 3 17 0 90746053 0 48598155 absent I 3 18 0 50886476 0 96350421 absent 78 6 19 1 11844146 0 56341008 absent 175 5 20 0 51371598 1 32382209 absent 80 5 21 0 58229738 0 87364793 absent 27 4 The names of the objects are used for the variable names in the data frame Row names for the data frame are obtained from the first object with a names dimnames or
83. 3 the first page and in figure 7 24 the second page If we do not specify layout Trellis Graphics chooses the numbers of columns rows and pages by a layout algorithm The algorithm takes into account the aspect ratio the number of packets the number of conditioning variables and the number of levels of each conditioning variable It chooses the numbers to maximize the size of the graph within the graphics region 233 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 20 30 40 50 60 Vaivet t Walei j 1932 1931 Waseca e a Crookston e Mortis e e University Farm e Duluth e e Grand Rapids e No 475 No 475 1932 1931 Waseca e e Crookston e Mortis e e University Farm e e Duluth e e Grand Rapids e Manchuria Manchuria 1932 1931 Waseca Crookston e Mortis University Farm e e Duluth e e Grand Rapids e No 462 No 462 1932 1931 Waseca e Crookston e e Mortis e e University Farm e e Duluth e e Grand Rapids e e Svansota Svansota 1932 1931 Waseca s Crookston e e Mortis e University Farm e e Duluth e e Grand Rapids e e T T T T T T 20 30 40 50 60 yield Figure 7 23 The first page of the multipage plot of the barley data 234 MULTIPANEL CONDITIONING Main Effects Ordering n Trebi 1932 1937 Waseca e e Crookston e e Mortis e e University Farm Duluth e e Grand Rapids e e Waseca e e Crookston e e Morris e e Uni
84. 4 trama001 clemr001 2 1 F r5 andeb001 morrj001 3 1 F r6 barrm001 morrj001 2 1 F r7 boggw001 morrj001 21 0 F r8 ricej001 morrj00l 3 1 F See the chapter Data Objects for further information on data frame objects The chapter Importing and Exporting Data discusses how to read in data frame objects from ASCII files The st object is the most general and most flexible object for holding data in S PLuS A list is an ordered collection of components Each list component can be any data object Different list components can be of different modes as well For example a list might have three components consisting of a vector of character strings a matrix of numbers and another list Hence lists are more general than vectors or matrices because they can have components of different types or modes and they are more general than data frames because they are not restricted to having a rectangular row by column nature You create lists with the 1ist function For example to create a list with two components one a vector of mode numeric and one a vector of character strings one of length 19 and the other of length 2 type the following PLus LANGUAGE BASICS Managing Data Objects Assigning Data Objects gt list 101 119 c char string 1 char siring 2 S PLUS responds with LILI 1 101 102 103 104 105 106 107 108 109 110 111 112 113 14 114 115 116 117 118 119 L21 1 char string 1 char string 2 The
85. 44 146 147 147 148 149 150 154 154 154 155 156 157 158 158 160 161 163 164 166 170 171 172 173 174 174 175 176 177 177 178 180 180 180 183 184 CONTENTS Controlling Multiple Plots 185 Overlaying Figures 188 High Level Functions That Can Act as Low Level Functions 188 Overlaying Figures by Setting new TRUE 188 Overlay Figures by Using subplot 189 Adding Special Symbols to Plots 192 Arrows and Line Segments 192 Adding Stars and Other Symbols 193 Custom Symbols 195 Traditional Graphics Summary 197 References 200 Chapter 7 Traditional Trellis Graphics 201 A Roadmap of Trellis Graphics 202 Giving Data to General Display Functions 204 A Data Set gas 204 formula Argument 204 subset Argument 206 Data Frames 207 Aspect Ratio 208 General Display Functions 210 A Data Set fuel frame 210 A Data Set gauss 223 Arranging Several Graphs On One Page 228 Multipanel Conditioning 230 A Data Set barley 230 About Multipanel Display 230 Columns Rows and Pages 230 Packet Order and Panel Order 231 layout Argument 233 Main Effects Ordering 235 Summary The Layout of a Multipanel Display 237 A Data Set ethanol 237 Conditioning on Discrete Values of a Numeric Variable 237 Conditioning on Intervals of a Numeric Variable 239 xiii CONTENTS Scales and Labels 3 D Display aspect Argument Changing the Text in Strip Labels Panel Functions How to Change the Rendering in the Data Region Passing Arguments to a
86. 6 Everything in S PLUS is an object Every object has an associated class The class of an object defines how the object is represented and determines what actions may be performed on the object and how those actions are performed The simplest objects are atomic vectors objects containing 0 or more elements that can be indexed numerically Atomic vectors are so called to indicate that in S PLUS they are indeed fundamental objects All of S PLUS s basic mathematical operations and data manipulation functions are designed to work on the vector as a whole Individual elements of the vector however can be extracted using their numerical indices with the subscript operator gt car galsfet1 3 5 1 13 3 11 5 14 3 All elements within an atomic vector must be from only one of seven atomic modes logical numeric single integer complex raw or character An eighth atomic mode NULL applies only to the NULL vector The number of elements and their mode completely define the data object as a vector The class of any vector is the mode of its elements gt elasstelT T FT LL begieal gt class c 1 2 334 1 integer 2 lass ce tl 24 3 45 pid 1 numeric The number of elements in a vector is called the length of the vector and can be obtained for any vector using the ength function gt length 1 10 1 10 More complicated objects can be creat
87. 650 Germany 94 NA Compact 12140 USA NA NA Medium 6319 Korea 4 af Smal 23300 Japan 5 21 Medium 13800 Japan NA NA Sporty 27900 Japan NA NA Sporty 9995 USA Z 23 Compact 21498 Japan 3 23 Medium can use read table within functions to hide the mechanics of S PLUS from the users of your functions EXPORTING DATA SETS EXPORTING DATA SETS You use the exportData function to export S PLUS data objects to formats for applications other than S PLUS To export data for use by S PLUS use the data dump function When you are exporting to most file types with exportData you typically need to specify only the data set file name and depending on the file name you specified the file type and the data will be exported into a new data file using default settings You can specify your own settings using additional arguments to exportData All formats that can be imported from can be exported to The arguments to exportData are shown in Table 3 2 Table 3 2 Arguments to exportData Argument Required Description data Required Data frame to be exported file Required A character string containing the name of the file to be created updated type Optional One of ASCII DBASE EXCEL FASCII GAUSS GAUSS96 HTML LOTUS MATLAB QUATTRO SAS SAS1 SAS4 SAS_TPT SPSS SPSSP STATA SYSTAT keep Optional Character vector of variable names specifying which variables in data to export Only one of kee
88. 7 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS piechart Pie charts have severe perceptual problems Experiments in graphical perception have shown that compared with dot plots they convey information far less reliably But if you want to display some data and perceiving the information is not so important then a pie chart is fine Figure 7 12 is a pie chart of the mileage means gt piechart names mileage means mileage means Figure 7 12 Pie chart 218 GENERAL DISPLAY FUNCTIONS histogram A histogram can be useful for showing the distribution of a single set of data but two or more histograms are typically not nearly as powerful as a boxplot or qqplot for comparing data distributions Figure 7 13 is a histogram of mileage gt histogram Mileage data fuel frame aspect 1 nint 10 The argument nint determines the number of intervals The histogram algorithm chooses the intervals to make the bar widths be simple numbers while trying to make the number of intervals as close to nint as possible 20 15 4 F E te ke 5 amp 104 J D a 54 L 0 20 25 30 35 Mileage Figure 7 13 Histogram 219 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS densityplot 220 Like histograms density plots can be of help in understanding the distribution of a single set of data but boxplots and qqplots typically give more incisive comparisons of
89. 7500 Washington gt books AuthorFirstName AuthorLastName Book 1 Lorne Green Bonanza 2 Loren Blye Midwifery 3 Loren Blye Gardening 4 Loren Blye Perennials 5 Robin Green Who_dun_it 6 Rich Calaway Splus The data sets have different variable names but overlapping information Using the by x and by y arguments to merge we can join the data sets by the first and last names COMBINING DATA FRAMES gt merge authors books by x c FirstName LastName by y c AuthorFirstName AuthorLastName FirstName LastName Age Income Loren Loren Loren Lorne Robin ar wn Fe Blye Blye Blye Green Green 40 40000 40 40000 40 40000 82 1200000 45 25000 Home Washington Washington Washington California Book Midwifery Gardening Perennials Bonanza Washington Who_dun_it Because the desired by columns are in the same position in both books and authors we can accomplish the same result more simply as follows gt merge authors books by 1 2 More examples can be found in the merge help file 109 CHAPTER 5 DATA FRAMES APPLYING FUNCTIONS TO SUBSETS OF A DATA FRAME 110 A common operation on data with factor variables is to repeat an analysis for each level of a single factor or for all combinations of levels of several factors SAS users are familiar with this operation as the BY statement In S PLUS you can perform these operations using the by or aggregate function Use aggregate when
90. 96 16 colors Most displays are capable of many more colors than this so you can use more than one digit per phosphor Table 8 2 shows the allowed forms for an RGB triad Table 8 3 illustrates hexadecimal values for some common colors You can use up to four digits to specify the intensity of one phosphor 307 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 308 this allows for about 3 x 10 colors You do not need to know how many colors your machine can display your window system automatically scales the color specifications to your hardware Table 8 2 Legal forms of RGB triads Triad Form Approximate Number of Possible Colors RGB RRGGBB RRRGGGBBB RRRRGGGGBBBB 4 000 17 million 70 billion 3x 1014 Table 8 3 Hexadecimal values of some common colors Hex Value Color Name 000000 black FFFFFF white FF0000 red 00FF00 green 0000FF blue FFFF00 yellow 00FFFF cyan FF00FF magenta ADD8E6 light blue GRAPHICS WINDOW DETAILS Specifying Color The following conventions are used when listing colors to specify a color Schemes scheme e Color names or values are separated by spaces e When a color name is more than one word it should be enclosed in quotes For example lawn green The order in which you list the color names or values corresponds to the numerical order in which they are referred to in S PLUS with the graphics parameter col For example if
91. ARRAYS Arrays generalize matrices by extending the Dim slot to more than two dimensions If the rows and columns of a matrix are the length and width of a rectangular arrangement of equal sized cubes then length width and height represent the dimensions of a three way array You can visualize a series of equal sized rectangles or cubes stacked one on top of the other to form a three dimensional box The box is composed of cells the individual cubes and each cell is specified by its position along the length width and height of the box An example of a three dimensional array is the iris data set in S PLUS The first two cases are presented here Z APTS LAGS 33 gt 260s Sepal L Sepal W Petal L Petal W lad Sel 2 J 4 9 Versicolor Sepal L Sepal W Petal L Petal W hae Deg G 0 2 Lis 7 0 ed 4 7 1 4 T 6 4 3 2 4 5 Lee Virginica Sepal L Sepal W Petal L Petal W 1 6 3 3 3 6 0 25 Lead 5 68 ad Syl 1 9 The data present 50 observations of sepal length and width and petal length and width for each of three species of iris Setosa Versicolor and Virginica The Dim slot of iris represents the length width and height in the box analogy gt dim iris ij 564 3 There is no limit to the number of dimensions of an array Additional dimensions are represented in the Dim slot as additional values in the vector the number of values is the number of dimensions From this we can think of a matrix as a two d
92. Atlanta Atlantic City Bismarck Boise Dallas Denver Lincoln Los Angeles Miami Milwaukee New York Seattle city name lt city name City X S CTY LX city y lt CTty y names city x lt names city y lt names city name lt city name pop lt c 425 60 28 34 904 494 129 2967 347 741 072 557 usa symbols city x select city y select circles sqrt pop add T size lt ifelse pop gt 1000 2 1 size lt ifelse pop lt 100 0 5 size text city x select city y select city name select cex size 276 PRINTING YOUR GRAPHICS Creating Encapsulated PostScript Files Modifying a function containing a string of graphics commands is much easier than retyping all the commands to re create the graphic Another useful technique for preparing PostScript graphics is to use PostScript screen viewers such as ghostview If you are creating graphics for inclusion in other documents you typically want to create a single file for each graphic in a file format known as Encapsulated PostScript or EPS EPS files can be included in documents produced by many word processing and text formatting programs Documents conforming to the Adobe Document Structuring Convention Specifications Version 3 for Encapsulated PostScript have the following first line PS Adobe 3 0 EPSF 3 0 They must also include a BoundingBox comment No
93. CHAPTER WELCOME TO PLus GETTING STARTED Running S PLUS 8 Command Line Editing 12 Getting Help in S PLUS 15 S PLUs Language Basics 18 Importing and Editing Data 33 Graphics in S PLUS 41 Statistics 47 This chapter provides basic information that everyone needs to use S PLUS effectively It describes the following basic tasks Starting and quitting S PLUS Getting help Using fundamental elements of the S PLUS language such as basic operators assignments function calls etc Creating and manipulating basic data objects Opening graphics windows and creating basic graphics CHAPTER 2 GETTING STARTED RUNNING S PLus This section covers the basics of starting S PLUS opening windows for graphics and help and the basics of constructing S PLUS expressions Starting S PLUS To start S PLUS type the following at the UNIX shell prompt and press the and Entering Expressions RETURN key Splus Note that only the S is capitalized When you press RETURN a copyright message appears in your S PLUS window followed the first time you start S PLUS with a message about initializing a new S PLUS user These messages are followed by the S PLUS prompt Splus S PLUS Copyright c 1988 1998 MathSoft Inc S Copyright Lucent Technologies Inc Version 5 0 for Sun SPARC SunOS 5 3 1998 Working data will be in gt You use S PLUS by typing expressions after the prompt and pressing the return key You t
94. Characteristics The PostScript options that have the greatest immediate impact on what you see are those affecting the PostScript graphic s plotting characteristics These options include the following e fonts A vector of character strings specifying all available fonts e colors A numeric vector or matrix assigning actual colors to the color numbers used as arguments to graphics functions This option is discussed in more detail in the next section e image colors Same as colors but for use with the image function e background A numeric vector giving the color of the background as in colors background can also be a single number that is used as an index to the colors argument if it is positive or if it is negative specifies no background at all Creating Color Creating PostScript graphics in color is no more difficult than creating color PostScript graphics on your windowing graphics device With the xgetrgb function Graphics you can copy the color map from the current motif device and use it for PostScript output The following steps show how to print graphics from a motif window to a PostScript printer using the same color map 1 Start the graphics window gt motif 2 Set the color scheme using the Color Scheme dialog box accessible from the Options menu See the section The Options Menu page 295 for complete details 3 Plot the graphic in the graphics window gt image voice five 4 Capture the colors
95. DEVICES Available Colors Under XII Viewing Color Names Listed in rgb txt 306 To specify color schemes for the motif device use the Color Scheme Specifications window To specify a color scheme you must create a list of colors There are two ways to list colors in a color scheme e Use color names listed in the system file rgb txt e Use hexadecimal values that represent colors in the RGB Color Model The first method is a front end to the second method it is easier to use but you are limited to the colors listed in the rgb txt file The second method is more complex but it allows you to specify any color your display is capable of producing Both methods are described below The initial set of colors is set system wide at installation Any changes you make using the Color Scheme Specifications window override the system values This remains true even if system wide changes are installed The rgb txt file contains a list of predefined colors that have been translated from a hexadecimal code into English text To see what the available color names are you can either look at the rgb txt file with a text editor or you can use the showrgb command coupled with a paging program like more by typing the following command showrgb more The rgb txt file is usually located in the directory usr lib X11 To move into this directory type the command cd usr lib X11 GRAPHICS WINDOW DETAILS Hexadecimal Co
96. Default Panel Function A Panel Function for a Multipanel Display Special Panel Functions Commonly Used S PLUS Graphics Functions and Parameters Panel Functions and the Trellis Settings Superposing Two or More Groups of Values on a Panel Data Structures More on Aspect Ratio and Scales Prepanel Functions More on Multipanel Conditioning Summary of Trellis Functions and Arguments Chapter 8 Working With Graphics Devices Printing Your Graphics Printing with PostScript Printers Printing with HP GL Pen Plotters Creating PDF Graphics Files Managing Files from Hard Copy Graphics Devices Using Graphics from a Function or Script Graphics Window Details Basic Terminology Available Colors Under X11 Chapter 9 Customizing Your S PLUS Session Setting S PLUS Options Setting Environment Variables Customizing Your Session at Start up and Closing Setting S_FIRST Customizing Your Session at Closing Using Personal Function Libraries Creating an S Chapter Placing the Chapter in Your Search Path Specifying Your Working Directory Specifying a Pager Environment Variables and printgraph xiv 242 244 244 246 246 246 247 247 248 249 252 259 262 263 266 271 272 272 283 285 285 286 289 289 306 311 312 314 316 316 317 318 318 319 320 321 322 CONTENTS Index Setting Up Your Window System Setting X11 Resources S PLUS X11 Resources Common Resources for the Motif Graphics Device 324 324 325 325 329
97. ESOURCES Getting Help There are a variety of ways to accelerate your progress with S PLUS and to build upon the work of others This section describes the learning and support resources available to S PLUS users Online Help S PLUS offers an online help system to make learning and using S PLUS easier To get help type he1p or at the S PLUS prompt Printed and Your S PLUS license comes with four manuals this user s guide the S PLUS Online Manuals Guide to Statistics and the S PLUS Installation and Maintenance Guide all of which are also available online as PDF files and the book Programming with Data by John M Chambers Programming with Data is the definitive guide to programming with S Version 4 You can keep up to date with the latest in S programming by visiting the Programming with Data website at http cm bell labs com stat Sbook The web site also includes errata for the book Notes on Online versions of the Guides The Online manuals are viewed using Acrobat Reader which is available for free over the Internet at http www adobe com Add On Modules Add on modules that offer analytical functionality beyond that of the base S PLUS product include DOxX helps in designing and analyzing industrial experiments especially fractional factorial experiments response surface experiments and robust design experiments S GARCH provides an essential suite of tools designed for univariate and multi
98. HAPTER 2 GETTING STARTED 52 gt 2 gt 2 z gt o type lt ordered Type c Smal1 Sporty Compact Medium Large Van par mfrow c 1 1 coplot Fuel Weight o type given values sort unique o type lm fitz lt update im fitl Type Im fit3 lt update Im fit2 Weight Type anova Im fitl Im fit2 Im fit3 summary 1m fit3 IMPORTING AND EXPORTING DATA Importing Data Files Setting the Import Filter Notes on Importing Files Notes on Importing ASCII Delimited ASCID Files Notes on Importing FASCII Formatted ASCID Files Notes on Importing Excel Files Notes on Importing Lotus Files Notes on Importing dBase Files Notes on Importing Data From Enterprise Databases Other Data Import Functions Reading Vector and Matrix Data with scan Reading Data Frames Exporting Data Sets Exporting Data to S PLUS Other Export Functions 54 59 62 62 63 64 64 64 64 67 69 71 72 72 53 CHAPTER 3 IMPORTING AND EXPORTING DATA IMPORTING DATA FILES Data Import Filters One easy method of getting data into S PLUS for plotting and analysis is to import the data file The principal tool for importing data is the importData function Using importData you can select from the following file types to import into S PLUS Default Format Type Erteniiois Notes ASCII ASCII txt csv Formatted ASCII FASCII fix dBase DBASE
99. Here are some simple calculations using the arithmetic operators gt SPE 1 74 ae sa tall i 363 gt 6 5 4 75 I The exponentiation operator is which can be used as follows ZaD 1 8 Some operators work with only one argument and hence are called unary operators For example the subtraction operator can act as a unary operator PAg La SS The colon is an important operator for generating sequences of integers 7 110 I 1 23 4 p amp 7 B yg Table 2 2 lists the S PLUS operators for comparison and logic Comparisons are among the most common sources for logical data gt LPI oS 8 A ae ca ae oe ee es ee Comparisons and logical operations are frequently convenient for extracting subsets of data and conditionals using logical comparisons play an important role in flow of control in functions 27 CHAPTER 2 GETTING STARTED Table 2 2 Logical and comparison operators Operator Explanation Operator Explanation equal to not equal to gt greater than lt less than gt greater than or equal to lt less than or equal to amp vectorized And vectorized Or amp amp control And control Or not Expressions An expression is any combination of functions operators and data objects 28 Thus x lt c 4 3 2 1 is an expression that involves an operator the assignment operator and a function the combine function Here are a few more examples to give you an i
100. MathSoft S PLus 5 FOR UNIX User s Guide September 1998 Data Analysis Products Division MathSoft Inc Seattle Washington Proprietary Notice Copyright Notice Acknowledgements ii MathSoft Inc owns both this software program and its documentation Both the program and documentation are copyrighted with all rights reserved by MathSoft The correct bibliographical reference for this document is as follows S PLUS 5 for UNIX Users Guide Data Analysis Products Division MathSoft Seattle WA Printed in the United States Copyright 1988 1998 MathSoft Inc All Rights Reserved The license management portion of this product is based on lan License Manager Copyright 1989 1998 Rainbow Technologies Inc All Rights Reserved Other portions of the software are copyright Rogue Wave Software and Circle Systems Inc The following notice applies only to X Window System software included in S PLUS X Window System is a trademark of MIT Copyright 1989 by the Massachusetts Institute of Technology Permission to use copy modify distribute and sell this software and its documentation for any purpose is hereby granted without fee provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation and that the name of M I T not be used in advertising or publicity pertaining to distribution of the software wit
101. R GRAPHICS For most exploratory data analysis the complete graphics created by S PLUS with their automatically generated axes tick marks and axis labels serve your needs well Most of the graphics described in the previous sections were created with one step functions such as plot and hist These one step functions are called high level graphics functions If you are preparing graphics for publication or a presentation you need more control over the graphics that S PLUS produces The following sections describe how to customize and fine tune your S PLUS graphics with low level graphics functions and graphics parameters Low level graphics functions do not generate a complete graphic but rather one specific part of a graphic Graphics parameters control the details of the graphics that are produced by the graphics functions including where the graphics appear on the graphics device Many of the examples in this chapter use the following data gt s t seed 12 Pk lt gt repr gt y lt rnorm 12 If you use these statements you will be able to reproduce exactly the plots that use x and y We also use the following data from the built in data set auto stats gt price lt auto stats Price gt mileage lt auto stats 2 163 CHAPTER 6 TRADITIONAL GRAPHICS LOW LEVEL GRAPHICS FUNCTIONS AND GRAPHICS PARAMETERS 164 The section Frequently Used Plotting Options page 126 introduced several low level
102. TED WARRANTY AND RIGHT OF REPLACEMENT IS IN LIEU OF AND YOU HEREBY WAIVE ANY AND ALL OTHER WARRANTIES BOTH EXPRESS AND IMPLIED RELATING TO THE SOFTWARE DOCUMENTATION MEDIA OR THIS LICENSE INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY FITNESS FOR A PARTICULAR PURPOSE TITLE AND NONINFRINGEMENT IN NO EVENT SHALL MATHSOFT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING BUT NOT LIMITED TO LOSS OF USE LOSS OF REVENUES OR PROFIT LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES EVEN IF MATHSOFT HAS BEEN ADVISED OF THE POSSIBILITIES OF SUCH DAMAGES NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY MATHSOFT ITS EMPLOYEES DISTRIBUTORS DEALERS OR AGENTS SHALL INCREASE THE SCOPE OF THE ABOVE WARRANTIES OR CREATE ANY NEW WARRANTIES WE DISCLAIM AND EXCLUDE ALL OTHER IMPLIED OR EXPRESS WARRANTIES This warranty gives you specific legal rights which may vary from state to state Some states do not allow the limitation or exclusion of liability for consequential damages so the above limitation may not apply to you MathSoft hereby warns you that due to the complexity of the Software it is possible that use of the Software could lead unintentionally to the loss or corruption of data You assume all risk for such data loss or corruption the warranties provided hereunder do not cover any damage or losses resulting therefrom MathSoft s licensors do not warrant the Software do not a
103. To get a display with both the sample labels and the digit labels you need to create a factor object a grouping variable to use as an additional argument For example if you wish to use the sample number as the grouping variable then create the factor object sample fac as follows gt sample fac lt factor col digits lab sample names and use this factor object as the third argument to dotchart gt dotchart digits digit names sample fac 145 CHAPTER 6 TRADITIONAL GRAPHICS For more information on factor objects see the chapter Data Objects Several other options are available with the dotchart function see the help file for complete details Pie Charts You can make pie charts with the function pie For example you can display the first sample of the digits data as a pie chart and add the subtitle sample 1 by using pie as follows gt pie digitsL 1 names digit names angle seq 45 135 len 5 density 10 sub sample 1 sample 1 Figure 6 13 A pie chart of the digits data As an alternative try replacing digits 1 by digits 2 and digits 3 and replacing sample 1 by sample 2 and sample 3 respectively Several other options are available with the pie function see the help file for complete details Recommendation Although pie charts display all the information about the three samples of random digits they are not as easy to interpret as dot charts and bar p
104. XV CONTENTS xvi WELCOME TO S PLus Introduction Introduction 1 Help Support and Learning Resources 2 Getting Help 2 Welcome to S PLUS 5 0 for UNIX the first release of S PLUS based on the newest version of Lucent Technologies S language S Version 4 As the exclusive licensee of the S language MathSoft has molded the S technology into the most powerful data analysis product available today The S PLUS object oriented environment delivers benefits that traditional language analysis programs simply cant match With S PLUS every data set function or analysis model is treated as an object which makes it easy to examine and visually explore data run functions one step at a time and visually compare models for fit S PLUS gives you immediate feedback because it runs functions one at a time With S PLUS you ve got control over every step of your analysis Visually compare different models for fit re explore your data for outliers or other factors that might influence a result and document every analysis function Because S PLUS puts you in control you ll have complete confidence in the quality of your results When your analysis requires a new method or approach you can modify existing methods or develop new ones with the programming language By tapping into the power flexibility and extensibility of S PLUs you can take your analysis to a new level CHAPTER WELCOME TO S Plus HELP SUPPORT AND LEARNING R
105. You calla function by typing an expression consisting of the name of the function followed by a pair of parentheses which may enclose some arguments separated by commas For example runif is a function which produces random numbers uniformly distributed between 0 and 1 To get S PLUS to compute 10 such numbers type runi f 10 gt runif 10 1 0 6033770 0 4216952 0 7445955 0 9896273 0 6072029 6 0 1293078 0 2624331 0 3428861 0 2866012 0 6368730 S PLUS displays the results computed by the function followed by a new prompt In this case the result is a vector object consisting of 10 random numbers generated by a uniform random number generator The square bracketed numbers here 1 and 6 help you keep track of how many numbers are displayed on your screen and help you locate particular numbers One of the functions in S PLUs that you will use frequently is the function c which allows you to combine data values into a vector For example gt 3 7 100 103 1 3 7 100 103 gt c T F F T T 1 TFFFTT 25 CHAPTER 2 GETTING STARTED Operators 26 gt c sharp teeth COLD PAWS 1 sharp teeth COLD PAWS gt c sharp teeth COLD PAWS 1 sharp teeth COLD PAWS The last example illustrates that either the double quote character or the single quote character can be used to delimit character strings Usually you want to assign the result of the c function to an object with another name
106. a Set gauss Arranging Several Graphs On One Page Multipanel Conditioning A Data Set barley About Multipanel Display Columns Rows and Pages Packet Order and Panel Order layout Argument Main Effects Ordering Summary The Layout of a Multipanel Display A Data Set ethanol Conditioning on Discrete Values of a Numeric Variable Conditioning on Intervals of a Numeric Variable Scales and Labels 3 D Display aspect Argument Changing the Text in Strip Labels Panel Functions How to Change the Rendering in the Data Region Passing Arguments to a Default Panel Function A Panel Function for a Multipanel Display Special Panel Functions Commonly Used S PLUS Graphics Functions and Parameters Panel Functions and the Trellis Settings 202 204 204 204 206 207 208 210 210 223 228 230 230 230 230 231 233 235 237 237 237 239 242 244 244 246 246 246 247 247 248 249 201 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS A Roadmap of Trellis Graphics Getting Started with Trellis General Display Functions Common Arguments Panel Functions 202 Superposing Two or More Groups of Values on a Panel 252 Data Structures 259 More on Aspect Ratio and Scales Prepanel Functions 262 More on Multipanel Conditioning 263 Summary of Trellis Functions and Arguments 266 Trellis Graphics provide a comprehensive set of display functions that are a popular alternative to using the traditional S PLUS graphics functions described in t
107. a are all on the same scale There are three general ways to overlay figures in S PLUS 1 Call a high level plotting function then call one of the high level plotting functions that can be used as a low level plotting function by specifying the argument add T 2 Calla high level plotting function set the graphics parameter new T then call another high level plotting function 3 Use the subplot function We discuss each of these methods below There are currently four plotting functions that can act as either high level or low level plotting functions usa symbols image and contour By default these functions act like high level plotting functions to make them act like low level plotting functions set the argument add T For example you can put up a map of the northeastern U S with a call to usa then overlay a contour plot of ozone concentrations with a call to contour by setting add T gt usa xlim range ozone xy x ylim range ozone xy y 1ty 2 col 2 gt contour interp ozone xy x ozone xy y ozone median add T gt title Median Ozone Concentrations in the North East Another way to overlay figures is to reset the new graphics parameter Whenever a graphics device is initialized the graphics parameter new is set to TRUE meaning that this is a new graphics device so it is assumed there are currently no plots on it In this case a call to a high level plotting function will mot erase the canvas before
108. a are qualitative rather than quantitative or numeric If observations can be assigned only to a category rather than given a specific numeric value they are termed qualitative or categorical The values assigned to these variables are typically short character descriptions of the category to which the observation belongs The following lists some examples of categorical variables e gender where the values are male and female e marital status where the values might be single married wow separated divorced e experimental status where the values might be treatment and Ww Ww control Categorical data in S PLUS is represented with a data type called a factors The data frame fuel frame has a variable named Type which classifies each automobile as either Smal1 Sporty Compact Medium Large or Van gt fuel frame Type 1 Smal Small Small Small Small Small Small 8 Small Small Small Small Small Small Sporty 15 Sporty Sporty Sporty Sporty Sporty Sporty Sporty 22 Sporty Compact Compact Compact Compact Compact Compact 29 Compact Compact Compact Compact Compact Compact Compact 36 Compact Compact Medium Medium Medium Medium Medium 43 Medium Medium Medium Medium Medium Medium Medium 50 Medium Large Large Large Van Van Van 57 Van Van Van Van When you print a factor the values correspond to the evel of the factor for each data point or observation Internally a factor keeps track of the lev
109. ach level in a factor call summary gt summary fuel frame Type Compact Large Medium Small Sporty Van 15 3 13 13 9 7 Creating To create a factor use the factor function The factor function takes data Factors with categorical values and creates a data object of class factor For example you can categorize a group of 10 students by gender as follows gt classlist lt c male female male male male female female male female male 91 CHAPTER 4 DATA OBJECTS 92 gt Factartelass 1st 1 male female male male male female female male 9 female male S PLUS creates two levels with labels female and male respectively Table 4 2 Arguments to factor Argument Description x data to be thought of as taking values on the finite set of levels levels optional vector of levels for the factor The default value of levels is the sorted list of distinct values of x labels optional vector of values to use as labels for the levels of the factor The default is as character levels exclude a vector of values to be excluded from forming levels The levels argument allows you to specify the levels you want to use or to order them the way you want For example if you want to include certain categories in an analysis you can specify them with the levels argument Any values omitted from the levels argument are considered missing gt intensity x facter c Hi Med L
110. acter datatype 116 character function 80 character strings delimiting 26 character values 77 city name data set 193 city x dataset 193 city y dataset 193 329 INDEX class 18 class attribute 90 116 cloud function 226 codes function 91 col parameter 135 245 columns argument 256 combining data frames 104 by column 104 by row 106 merging 107 rules 116 command line editing 12 command line editor 12 command recall 14 example 13 startup 12 table of keystrokes 12 Commonly Used S PLUS Graphics Functions and Parameters 248 complex function 80 complex values 77 composite figures 191 Conditioning On Discrete Values of a Numeric Variable 237 Conditioning On Intervals of a Numeric Variable 239 conditioning variables 230 continuation 10 contour function 158 contourplot function 223 Controlling the Pages of a Multipage Display 236 corn rain dataset 192 CSi parameter 175 cuts argument 224 D data editing 33 importing 33 with import data function 33 reading from a file 33 data argument 206 data array 155 330 data frames 97 adding new classes of variables 116 applying functions to subsets 110 attributes 117 combining objects 102 dimnames attribute 101 row names 101 rules for combining objects 116 data objects 97 combining 25 editing 34 data class function 116 data frame datatype 116 data frame function 99 datax horizontal screen axis 225 datay vertical screen axis 225 dataz function 223 dataz perpendicular sc
111. al for most functions but are required for functions and operators containing special characters such as lt To get the most information from the S PLUS help system you should become familiar with the general arrangement of help files Help files are organized as follows not all files contain all sections e DESCRIPTION A brief description of the function s main use e USAGE Provides the correct syntax for a call to the function Arguments for which just the argument name is given are required while arguments stated in the form name value are optional arguments where the given value is the default value GETTING HELP IN S Plus REQUIRED ARGUMENTS Lists arguments required in every call to the function If not supplied an error results OPTIONAL ARGUMENTS Lists arguments that may be supplied in a call to the function If not supplied default values are used SIDE EFFECTS Lists any effects of the function other than returning a value DETAILS Documents some of the computational details describing the implementation of the function REFERENCES References to scientific literature or books which describe in further detail the methodology or interpretation of the results of this function SEE ALSO Lists related S PLUS functions EXAMPLES Gives examples of use of the function 17 CHAPTER 2 GETTING STARTED S PLus LANGUAGE BASICS Data Objects 18 This section introduces the most basic conce
112. an also specify settings for these graphics devices by setting X11 resources The motif graphics device uses resources of the X Window System Version 11 or X11 This section describes how to customize your graphics windows by setting X11 resources There are a number of ways you can set resources for X11 applications You should talk with your system administrator about the way that is preferred on your system This section describes one of the most flexible methods of setting X11 resources using the xrdb command As with other X11 programs before you can run the xrdb command you must give it permission to access your display To do this you need to first specify your display server which controls the access to your display and then explicitly give access to that server to the host on which you run xrdb If you are running the C shell the network name of the computer or terminal you are sitting at is displayserver and the network name of the machine on which you run xrdb is remotehost you can give the appropriate permission with the following commands setenv DISPLAY displayserver 0 xhost remotehost The setenv command sets the DISPLAY environment variable to your window server so that every X11 program knows where to create windows The xhost command gives the specified computer permission to create a window on your display The xrdb command takes a file of X11 resources as its argument and creates an X11 Resource Database
113. an murder rate by region You can use tapply as follows gt tapply state x77 Murder state region mean Northeast South North Central West 4 722222 10 58125 5 275 72153685 APPLYING FUNCTIONS TO SUBSETS OF A DATA FRAME To compute the mean murder rate by region and income use tapply as follows gt income lev lt cut state x77 Income summary state x77 Income 4 gt income lev 1 1 4 3 1 4 4 4 3 4 2 4 2 4 2 3 3 1 18 1 1 4 3 3 3 NA 2 2 2 4 2 4 1 4 1 4 36 3 1 2 2d i22ei s4i12 3 attr levels 1 3098 thru 3993 3993 thru 4519 3 4519 thru 4814 4814 thru 6315 gt tapply state x77 Murder list state region income lev mean 3098 thru 3993 3993 thru 4519 Northeast 4 10000 4 700000 South 10 64444 13 050000 North Central NA 4 800000 West 9 70000 4 933333 4519 thru 4814 4814 thru 6315 Northeast a sb 6 40 South 7 85 9 60 North Central 5 52 5 85 West 6 30 8 40 115 CHAPTER 5 DATA FRAMES ADDING NEW CLASSES OF VARIABLES TO DATA FRAMES The manner in which objects of a particular data type are included in a data frame is determined by that types method for the generic function as data frame The default method for this generic function uses the data class function to determine an objects type Thus even data types without formal class attributes such as vectors or character vectors can have specific methods The behavior for most built in types is derived
114. an only have one color value Refer to the section Available Colors Under X11 page 306 for information on available color names Now move the pointer to the Lines box and type in the desired color name s Repeat the previous step for the Text Polygons and Images boxes To make this color scheme permanent move the pointer to the Save button and click If you do not save your newly created color scheme it remains only for the duration of the graphics window Once the graphics window is destroyed you lose any color schemes that have not been saved Move the pointer to the Apply button and click The plot in the graphics window is now based on your newly created color scheme To see the new plot move the dialog box out of the way or click on the Close button to make the dialog box disappear GRAPHICS WINDOW DETAILS Available Color Schemes ff Color Scheme Specifications Name unnamed color scheme 3 p i f ff Background black Lines white Text Polygons Images A Figure 8 6 Creating a new color scheme The Reset Button Any time you are in the Color Scheme dialog box you may move the pointer to the Reset button and click If you have not yet clicked on the Apply button then the Available Color Schemes menu and Color Scheme Specifications editor are set to how they were when you first entered the dialog box If you have at some time clicked on the Apply button then the color schemes are reset to how
115. an then add axis labels using title gt title xlab Gallons per Trip ylab Miles per Trip The limits of the x axis and the y axis are set automatically by the S PLUS plotting functions However you may wish to choose your own axis limits to make room for adding text in the body of a plot as described in the section Interactively Adding Information to Your Plot page 137 For example gt plot co2 automatically determines y axis limits of roughly 310 and 360 giving just enough vertical room for the plot to fit inside the box You can make more vertical or horizontal room in the plot by using the optional arguments ylim and x1im To get y axis limits of 300 and 370 use gt plot co2 ylim c 300 370 You can change the x axis limits as well for example gt plot co2 xlim c 1955 1995 You can use both xlim and ylim at the same time S PLUS rounds your specified axis limits to sensible values You may also want to set axis limits when you are making multiple plots as described in the section Multiple Plot Layout page 126 For example after creating one plot you may wish to make the x axis and y axis limits the same for all of the plots in the set You can do so by using the function par as follows 129 CHAPTER 6 TRADITIONAL GRAPHICS Logarithmic Axes Plot Types 130 gt par xaxs d yaxs d If you want to control the limits of only one of the axes you drop one of the two arguments as appropria
116. appear and the text itself More generally you can specify vectors of x and y coordinates and a vector of text labels Thus in our example you type gt plot car miles car gals gt Dext 275 22 0utliers The text Outliers is centered on the xy coordinates 275 22 You can guess the coordinate values by eyeballing the spot on the plot where you want the text to go However this approach to locating text is not very accurate and you can do better using the locator function within text The locator function allows you to use the mouse cursor to accurately identify the location of any number of points on your plot When you use locator S PLUS waits for you to position the mouse cursor and click the left mouse button and then it calculates the coordinates of the selected point The argument to locator specifies the number of times the text is to be positioned For example we could have applied text and locator together as follows to obtain much the same result as before gt text locator 1 Outliers Suppose that you want to improve the graphical presentation by drawing a straight line from the text Outliers to each of the three data points which you regard as outliers You can add each such line one at a time with the following expression gt locator n 2 type 1 S PLUS now awaits your response Locate the mouse cursor at the desired starting point for the line and click the left button Move the mouse
117. arameters are initialized whenever a graphics device is started a change via par applies only to the current device You can write your own Device Default function to have one or more parameters set automatically when you start a graphics device see the Device Default help file e Information parameters give information about the state of the device but may not be changed directly by the user An example is din the size of the current device in inches See the par help file for descriptions of the information parameters LOW LEVEL GRAPHICS FUNCTIONS AND GRAPHICS PARAMETERS The arguments to title main sub xlab and ylab while not graphics parameters are quite similar to them They are accepted as arguments by several graphics functions as well as the title function Table 6 10 on page 197 summarizes the S PLUS graphics parameters Warning Some graphics functions do not recognize certain high level or general graphics parameters The help files for these functions describe which graphics parameters the functions will accept 165 CHAPTER 6 TRADITIONAL GRAPHICS SETTING AND VIEWING GRAPHICS PARAMETERS There are two ways to set graphics parameters 1 Use the name value form either within a graphics function call or with the par function For example gt par tnitrow c 2 1 cex 5 gt ploty pen Li gt plot price mileage log y Note that you can set several graphics parameters simulta
118. arameters of a change in mex from 1 to 2 Table 6 4 Effect of changing mex Parameter mex 1 mex 2 mar 5 14 1 4 1 2 1 5 1 4 1 4 1 2 1 mai 0 714 0 574 0 574 0 294 1 428 1 148 1 148 0 588 oma 0050 0 0 0 0 2 5 0 0 omi 0 000 0 000 0 699 0 000 0 000 0 000 0 699 0 000 CONTROLLING GRAPHICS REGIONS From the table we see that an increase in mex leaves mar and omi unchanged while mai is increased and oma is decreased When you shrink margins with mar be sure to check the mgp parameter which determines where axis and tick labels are placed if the margins don t provide room for those labels the labels are not printed and you receive a warning from S PLUS Controlling the To determine the shape of the plot use the pty layout graphics parameter Plot Area plot type The pty parameter has two possible values m for maximal and s for square By default plots fill the entire space allowed for the plot pty m Another way to control the shape of a plot is with pin which gives the width and height of the plot in inches 173 CHAPTER 6 TRADITIONAL GRAPHICS CONTROLLING TEXT IN GRAPHICS Controlling Text and Symbol Size 174 The section Interactively Adding Information to Your Plot page 137 described how to add text and legends to existing plots This section describes how to control the size of text and plotting symbols the placement of text within the plot area and the width of lines in the plot
119. are Software labeled as an upgrade replaces and or supplements the product that formed the basis of your eligibility for the upgrade You may use the resulting upgraded product only in accordance with the terms of this license which supersedes all prior agreements MathSoft reserves all rights not expressly granted to you by this License Agreement The license granted herein is limited solely to the uses specified above and without limiting the generality of the foregoing you are NOT licensed to use or to copy all or any part of the Software or the documentation in connection with the sale resale license or other for profit personal or commercial reproduction or commercial distribution of computer programs or other materials without the prior written consent of MathSoft You will not export or re export the Software without the appropriate United States and or foreign government licenses MathSoft warrants that the media on which the Software is recorded will be free from defects in materials and workmanship under normal use for a period of ninety 90 days from the date of purchase as evidenced by a copy of your receipt The liability of MathSoft pursuant to this limited warranty shall be limited to the replacement of the defective media If failure of the media has resulted from accident abuse or misapplication of the product then MathSoft shall have no responsibility to replace the media under this limited warranty THIS LIMI
120. area The size of text and most plotting symbols is controlled by the general graphics parameter cex character expansion The expansion refers to expansion with respect to the graphics device s default font By default cex is set to 1 so graphics text and symbols appear in the default font size When cex 2 text appears at twice the default font size Some devices however have only a few fonts available so that all values of cex in a certain range produce the same font See the chapter Customizing Your S PLUS Session for information on how to control available fonts on your display device Many graphics functions and parameters use or modify cex For example main titles are written with a cex of 1 5 times the current cex The mfrow parameter sets cex to 1 for a small number of plots fewer than three per row or column but sets it to 0 5 for a larger number of plots The cex parameter controls the size of plotting symbols Plotting symbols of various sizes can be shown on a single figure as shown in figure 6 28 which shows how symbols of different sizes can be used to highlight groups of data Figure 6 28 is produced with the following expressions gt plot x y gt points x x y gt 2 median x y yLx y gt 2 median x y cex 2 gt points x x y lt median x y yLx y lt median x y pch 18 cex 2 CONTROLLING TEXT IN GRAPHICS Controlling Text Placement 0 2 0 4 0 6 0 8 x Figure 6 28 Symbols of differ
121. argin If you try to write text with cex 2 it will not fit because the text is twice as high as the specified margin line TEXT IN FIGURE MARGINS To specify the position of the text along the margin you can use the at argument with the mtext command argument The value of the at argument is in units of the x or y coordinates depending on whether you are placing text on the top or bottom margin sides 1 and 3 or the left or right margin sides 2 and 4 As described in section Controlling Text Placement page 175 if you can t determine the appropriate value of the at argument you can look at the usr coordinates graphics parameter For example the following command puts text in the lower left hand corner of the figure margin of figure 6 30 gt pari usr 1 0 1758803 0 9420847 2 2629721 1 5655365 gt mtext A comment line 3 side 1 at 3 By default mtext centers text along the margin or if the at argument is supplied at the at coordinate You can also use the adj parameter to place text along the margin The default setting is adj 0 5 centered text Set adj 0 to set the text flush with the left side of the margin or at coordinate adj 1 to set the text flush right Values between 0 and 1 set the text with the specified fraction of white space placed before the text the remaining white space placed after the text Note The adj parameter is generally more useful than usr coordinates when writing in the o
122. ask T MULTIPANEL CONDITIONING Summary The To lay out a multipanel display in a certain way you specify the following Layout of a Multipanel Display A Data Set ethanol Conditioning on Discrete Values of a Numeric Variable e An ordering of the conditioning variables by the order you enter them in the argument formula e An ordering of the levels of each factor possibly by creating an ordered factor e The number of columns rows and pages through the argument layout The data frame ethanol contains three variables from an industrial experiment with 88 runs gt names ethanol 1 NOx i Fii ai S gt dim ethanol 1 88 3 The concentrations of oxides of nitrogen NOx in the exhaust of an engine were measured for different settings of compression ratio C and equivalence ratio E These measurements were part of the same experiment that produced the measurements in the data frame gas introduced in the section A Data Set gas page 204 For the barley data the explanatory variables are factors so it is natural to condition on the levels of each factor This is not the case for the ethanol data both explanatory variables C and E are numeric Suppose that for the ethanol data we want to graph NOx against E given C The variable C has five unique values in other words the variable while numeric is discrete gt table ethanol C 7 5 9 12 15 18 22 17 14 19 16 It makes sense then to conditi
123. ata frame vector You can create new methods from scratch provided they have the same arguments as as data frame gt as data frame function x row names NULL optional F UseMethod as data frame The argument allows the generic function to pass any method specific arguments to the appropriate method If you ve already built a function to construct data frames from a certain class of data you can use it in defining your as data frame method Your method just needs to account for all the formal arguments of as data frame For example suppose you have a class loops and a function make df 1oops for creating data frames from objects of that class You can define a method as data frame loops as follows gt as data frame loops function x row names NULL optional F x lt make df loops x if is null row names row names lt as character row names if length row names nrow x stop paste Provided length row names names for nrow x rows attr x row names lt row names This method takes account of user supplied row names but ignores the argument optional a flag that is TRUE when the method is not expected to generate non trivial row names or variable names for a calling function 117 CHAPTER 5 DATA FRAMES 118 TRADITIONAL GRAPHICS Introduction Getting Started with Simple Plots Plotting a Vector Data Object Plotting Mathematical Functions Cr
124. ata xlab Predictor ylab Response 13 CHAPTER 2 GETTING STARTED To recall this command type CTRL R plot The complete command is restored to your command line You can then use other editing commands to edit it if desired or press RETURN to issue the command 14 GETTING HELP IN S Plus GETTING HELP IN S PLus If you need help at any time during an S PLUS session you can obtain it easily with the and help functions The function has simpler syntax it requires no parentheses in most instances 1m Fit Linear Regression Model DESCRIPTION Returns an object of class Im or represents a fit of a linear model mim that USAGE lm formula data lt lt see below gt gt weights lt lt see below gt gt subset lt lt see below gt gt na action na fail method qr model F x F y F contrasts NULL REQUIRED ARGUMENTS formula a formula object with the response on the left of a operator and the terms separated by operators on the right OPTIONAL ARGUMENTS data a data frame in which to interpret the variables named in the formula or in the subset and the weights argument Paging with less hit q to quit lt space gt to continue or use vi commands Both and help use the less pager provided with S PLUS to display the requested help You can use the d and u keys to page down and up respectively use the q key to exit help and return to the S PLUS prompt The
125. be especially careful not to name one of your own functions C or t as these are functions used frequently in By now you are familiar with the most basic object in S PLUS the vector which is a set of numbers character values logical values etc Vectors must be of a single mode i e you cannot have a vector consisting of the values T 2 3 If you try to create such a vector S PLUS coerces the elements to a common mode For example 2 CCT 372 3 Fi 160 2 3 Vectors are characterized by their ength and mode Length can be displayed with the length function and mode can be displayed with the mode function An important data object type in S PLUS is the two way array or matrix object For example 3 0 Zal J G ae sa 220 729 10 0 16 1 Daa l0 6 9 Matrices and their higher dimensional analogues arrays are related to vectors but have an extra structure imposed on them S PLUS treats these objects similarly by having the matrix and array classes inherit from another virtual class the structure class To create a matrix use the matrix function The matrix function takes as arguments a vector and two numbers which specify the number of rows and columns PLus LANGUAGE BASICS For example gt matrix 1 12 nrow 3 ncol 4 Lobd Eee Let Lot 1 il 4 Pi 10 2 2 5 8 11 Lsd 3 6 9 12 The first argument to matrix is a vector of integers from 1 through 12 The second and third arguments are the number of r
126. be controlled by the arguments xlim and ylim or by the argument scales Another argument prepanel is a function that supplies information for the banking and range calculations The code below will plot the ethanol data NOx is graphed against E given C and loess curves have been superposed gt xyplot NOx E C data ethanol aspect 1 2 panel function x y panel xyplot x y panel loess x y span 1 2 degree 2 There are now two things we would like to do with this plot one involving the aspect ratio and the other involving the ranges of the scales First we have set the aspect ratio to 1 2 using the aspect argument We could have set the aspect argument to xy to carry out 45 degrees banking of the line segments that connect the points of the plot that is the graphed values of E and NOx But normally we do not want to carry out banking of the raw data if they are noisy rather we want to bank an underlying smooth pattern In this example we want to bank using the line segments of the loess curves Second in the top panel the loess curve exceeds the maximum value along the vertical scale and so is chopped off It is important to understand why this happened The scales where chosen based on the values of E and NOx The loess curves were computed by the panel function after all of the scaling had been carried out We would like a way for the scaling to take account of the values of the loess curve The argument prepane
127. bers The default value is digits 7 SETTING S Plus OPTIONS pager tells S PLUS what pager program to use in such places as the help and page functions The default for pager is the value of environment variable S_PAGER which in turn defaults to the value of environment variable PAGER or less if that is not set See the options help file for a complete description of the available options If you want to set an option each time you start a session see the section Customizing Your Session at Start up and Closing page 316 You can also determine the value of any option with options For example to find the current value of the echo option type the following expression at the gt prompt gt options echo S PLUS answers with the following options echo echo 1 T Because echo is true we set it in the first paragraph of this section S PLUS prints the command you type in before returning the requested value 313 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION SETTING ENVIRONMENT VARIABLES The following is a list of the environment variables recognized by S PLUS You are not required to set them Table 9 1 Variables Variable Description ALWAYS_PROMPT Chiefly affects the actions of the parse function Normally parse prompts for input only when the input appears to be coming from a terminal When ALWAYS_PROMPT is set to anything at all parse prompts even if the standard
128. bustness in the sense that the least squares method is very sensitive to outliers A robust method is one which is not affected very much by outliers and which gives a good fit to the bulk of the data Once you have created a plot you may want to add additional data to it For example you might plot an additional data set with a different line type or plotting character Or you might add a statistical function such as a smooth curve fit to the data already in the plot To add data to a plot created by plot INTERACTIVELY ADDING INFORMATION TO YOUR PLOT you use one of the two functions points or lines These functions are virtually identical to plot except that they plot without creating a new set of axes The points function is used to add data points while lines is used to add lines All the arguments to plot that weve discussed so far including type pch and 1ty work with points and lines exactly as before This means that you can choose line types and plotting characters as you wish You can even make line type plots with points and points type plots with lines For example suppose you plot the built in data set co2 which gives monthly levels of carbon dioxide at the Mauna Loa volcano from January 1959 to December 1990 gt plot co2 By default plot uses points to plot the data The plot function recognizes that co2 is a time series data set consisting of monthly measurements and provides appropriate yearly labels o
129. can obtain the desired coordinates by interpolating from the values in the layout parameter usr The usr parameter gives the minimum and maximum of the x and y coordinates Controlling Two graphics parameters crt character rotation and srt string rotation Text control the orientation of text in the plot region and the figure and outer i margins Figure 6 29 shows the result of typing the following commands after Orientation starting a postscript device gt plot 1 10 type n gt text 2 2 srt 0 crt 0 srt 0 crt 0 gt text 4 4 srt 0 crt 90 srt 0 crt 90 gt text 6 6 srt 90 crt 0 srt 90 crt 0 gt text 8 8 srt 90 crt 90 srt 90 crt 90 o j e i 0 4 oO 4 t a r fon i N eo S 8 T t r s 4 DH lO OH IDO N 4 srt 0 crt 0 T T T T T 2 4 6 8 10 Index Figure 6 29 Character and string rotation The postscript device is the only graphics device that uses both the crt and srt graphics parameters All other graphics devices ignore crt so you can rotate only the whole string with srt 176 CONTROLLING TEXT IN GRAPHICS Warning If you use both crt and srt in a plotting command while running the postscript device you must supply crt afier srt otherwise it will be ignored Controlling The width of lines both within a plot and in the axes is controlled by the Line Width general graphics parameter Iwd The default value of lwd is 1 larger numbers pr
130. ce and by the printgraph function these options are discussed in the section Setting PostScript Options page 279 The append onefile and print it arguments however are specific to calls to postscript The onefile argument is specified as a logical value which defaults to TRUE By default when you start the postscript device explicitly plots are accumulated into a single file as given by the file argument If no file argument is specified the file is named using the template specified in ps options tempfile When onefile is FALSE a separate file is created for each plot and the PostScript file created is structured as an Encapsulated PostScript document See the section Creating Encapsulated PostScript Files page 277 for further details The append option is a logical value that specifies whether PostScript output is appended to file if it already exists In addition to appending the new graphics S PLUS edits the file to comply with the PostScript Document Structuring Conventions If append FALSE new graphics output writes over the existing file destroying its previous contents You can use the print it argument to specify that the graphic created on the postscript device be both sent to the printer and written to a file as follows gt postscript tile nystutf2 ps print i1t T gt plot corn rain gt title A plot created with postscript gt dev off Starting to make postscript file null device 1 gt l
131. component selection C CL subscripts elements s exponentiation unary minus sequence operator hh A b modulus integer divide matrix multiply multiply divide 29 CHAPTER 2 GETTING STARTED Table 2 3 Precedence of operators Continued Operator Use add subtract lt lt gt s comparison not amp amp amp and or formulas Se gt SE assignments Note 30 When using the operator if the base is a negative number the exponent must be an integer Among operators of equal precedence evaluation proceeds from left to right within an expression Whenever you are uncertain about the precedence hierarchy for evaluation of an expression you should use parentheses to make the hierarchy explicit S PLUS shares a common feature of many computer languages that the innermost parentheses are evaluated first and so on until the outermost parentheses are evaluated In the following example we assign the value 5 to a vector of length 1 called x We then use the sequence operator and show the difference between how the expression is evaluated with and without parentheses In the expression 1 x 1 x 1 is evaluated first and 4 is the result S PLUS displays the integers from 1 to 4 gt x lt 5 gt LIX 1 1234 However when the parentheses are left off the operator has greater precedence than the operator and so the expression 1 x 1
132. copy graphics devices a plot is sent to a plot file not when initially requested but only after a subsequent high level graphics command is issued a new frame is started the graphics device is turned off or you quit S PLUS To write the current plot to a plot file assuming you have started the graphics device with the appropriate file option you must do one of the following e Make another plot assuming a single figure layout e Call the function frame again assuming a single figure layout e Call the function dev off to turn off the current graphics device e Call the function graphics off to turn off all of the active graphics devices e Quit S PLUS 285 CHAPTER 8 WORKING WITH GRAPHICS DEVICES Once you have created a graphics file you can send it to the printer or plotter without exiting S PLUS by using the following procedure 1 Type to escape to UNIX 2 Type the appropriate printing command and then the name of the file 3 Type a carriage return To remove graphics files after sending them to the plotter without exiting S PLUS 1 Type to escape to UNIX 2 Type rm file where file is the name of the graphics file you want removed 3 Type a carriage return Using Graphics Most experienced users of S PLUS use a function or script to construct from a complicated plots for presentation or publication This method lets you use the motif display device to preview the plots on your
133. create a list with two components one for rows and one for columns and assign them using the dimnames function gt dimnames mat lt list paste row letters 1 3 paste col LETTERS 1 41 gt mat col A col B col col D row a 1 2 3 4 row b 1 2 3 4 row c 1 2 3 4 In the example above letters and LETTERS are character vectors with values the letters of the alphabet in lower and upper case respectively The character strings row and col are replicated to match the length of vectors containing the letters for labeling The paste function binds values into a single character string To suppress either row or column labels use the NULL value for the corresponding component of the list For example to suppress the row labels and number the columns gt dimnames mat lt list NULL paste col 1 4 gt mat col 1 col 2 col 3 col 4 Lied 1 rd 3 4 feed 1 g 3 4 35 1 2 3 4 To specify the row and column labels when defining a matrix with matrix use the optional argument dimnames as follows gt mat2 lt matrix 1 12 ncol 4 dimnames list NULL paste col 1 4 A second set of functions for working with matrices is described in the chapter The Object Oriented Matrix Library of the Guide to Statistics The library includes contstructor functions for a Matrix class and numerous subclasses and methods for many matrix computations based on the LAPACK library of numerical Fortran routines ARRAYS
134. ctively adding information to your plot e Bar plot pie chart and dot chart type presentation graphics e Visualizing the distribution of your data e Visualizing correlation in your time series data e Using multiple active graphics devices We recommend that you read the first two sections carefully before proceeding to any of the other sections In addition to the graphics features described in this chapter S PLUS includes the Trellis Graphics library Trellis Graphics features additional functionality such as multipanel layouts and improved 3 D rendering See the chapter Traditional Trellis Graphics for more information 121 CHAPTER 6 TRADITIONAL GRAPHICS GETTING STARTED WITH SIMPLE PLOTS This section helps you get started with S PLUS graphics by using the function plot to make simple plots of your data You use the function plot to make plots of vector data objects plots of mathematical functions and scatter plots of two vector data objects i e plots of the values of one variable against the values of another variable Plotting a You can graphically display the values of a batch of numbers or graphically display Vector Data observations using the function plot For example you obtain a graph of Obiect the built in vector data object car gals using plot as follows jec gt plot car gals e wo N o g n gt eo 2 o e e ee oe e e e oso e esee ot 0 nA
135. cursor to the desired ending point for the line and click the left button again S PLUS then draws a straight line between the two points Often you make plots which contain one or more sets of data displayed with different plotting characters or line types In such cases you probably want to provide a legend which identifies each of the plotting characters or line types For example if you use gt plot smooth co2 type 1 gt points co2 pch INTERACTIVELY ADDING INFORMATION TO YOUR PLOT to plot the data shown in figure 6 10 you probably want to add the legend shown in the figure To do this first make a vector leg names which contains the character strings co2 and smooth of co2 and then use legend as follows gt leg names lt c co2 smooth of co2 gt legend locator 1 leg names pch 1ty c 0 1 a co2 aft B smooth of co2 ane 7 Ht o Hgt a serie O m m O N 4 m 1960 1965 1970 1975 1980 1985 1990 Time Figure 6 10 Plot with added legend S PLUS now waits for you to respond Move the mouse cursor to the location on the plot where you want to place the upper left corner of the legend box then click the left mouse button 141 CHAPTER 6 TRADITIONAL GRAPHICS MAKING BAR PLOTS DOT CHARTS AND PIE CHARTS Bar plots and pie charts are familiar methods of graphically displaying data for oral presentations reports and publications In this section w
136. data matrix use barplot in a more powerful way in which each bar represents a sample i e a column of the matrix and each bar is divided into a number of blocks representing the digits with different shadings in each of the blocks You do this as follows gt barplot digits angle seq 45 135 len 5 density 16 names sample names Using the optional argument angle seq 45 135 1en 5 establishes five angles for the shading fill for each of the five blocks in each bar with the angles equally spaced between 45 degrees and 135 degrees Setting the density optional argument at the value 16 causes the shading fill lines to have a density of 16 lines per inch If you want the density of the shading fill lines to vary cyclically you need to set density at a vector value with the vector of length five in the case of the digits data For example 143 CHAPTER 6 TRADITIONAL GRAPHICS Dot Charts 144 gt barplot digits angle seq 45 135 len 5 density 1 5 5 names sample names To produce a legend that associates a name to each block of bars use the legend argument with an appropriate character vector as its value For the digits data example you use legend digit names to associate a digit name with each of the blocks in the bars gt barplot digits angle c 45 135 density 1 5 5 names sample names legend digit names ylim c 0 270 To make room for the legend you usually need to increase the range of the vertical axis
137. data object you want to share is not on the working data you must specify the object s location in the search path with the where argument gt data dump halibut where data The inverse operation to the scan function is provided by the cat and write functions Similarly the inverse operation to read table is provided by write table The result of either write or cat is just an ASCII file with data in it There is no S PLUS structure written in Of the two commands write has an argument for specifying the number of columns and thus is more useful for retaining the format of a matrix By default write writes matrices column by column five values per line If you want the matrix represented in the ASCII file in the same form it is represented in S PLUS transform the matrix first with the t function and specify the number of columns in your original matrix EXPORTING DATA SETS gt mat Codd Weel Led Lis 1d 1 4 7 10 2 g 5 8 11 3 3 6 9 12 gt write t mat mat ncol 4 You can view the resulting file with a text editor or pager it contains the following three lines 147 10 AN 36 9 12 The cat function is a general purpose writing tool in S PLUS used for writing to the screen as well as writing to files It can be useful in creating free format data files for use with other software particularly when used with the format function gt cat format runif 100 fi11 T 0 261401257 0 556708986 0 184055283
138. dered function to create ordered factors The arguments to ordered are the same as those to Factors factor To create an ordered version of the intensity factor do gt ordered cl HiT Med Lo AT Ha LO Jy levels c Lo Med Hi 1 Hi Med Lo Hi Hi Lo Lo lt Med lt Hi The order relationship between the different levels is printed for an ordered factor along with the values The order of the values used in the levels argument determines the order placed on the levels Warning If you don t provide a levels argument an ordering will be placed on the levels corresponding to the default ordering of the levels by S PLUS 93 CHAPTER 4 DATA OBJECTS Creating To create categorical data out of numerical or continuous data use the cut Factors from function You provide either a vector of specific break points or an integer specifying how many groups to divide the numerical data into then cut Continuous creates levels corresponding to the specified ranges All the values falling in Data any particular range are assigned the same level For example the murder rates in the 50 states can be grouped into High and Low values using cut gt cut state x77 Murder breaks c 0 8 16 1 2 26 1 attr level L T Or thru 2 Bt thru 16 Pr l2 Laigetligiiteziegiegige 21 ee2hR li Li2te2dtiszs it it eve no KR Ph we ee The breakpoints must completely enclose the values you want included
139. describes how to use S PLUS to create simple plots To put S PLUS to work creating the many other types of plots see the chapters Traditional Graphics and Trellis Graphics Plotting engineering scientific financial or marketing data including the preparation of camera ready copy on a laser printer is one of the most powerful and frequently used features of S PLUS S PLUs has a wide variety of plotting and graphics functions for you to use The most frequently used S PLUs plotting function is plot When you call a plotting function an S PLUS graphics window displays the requested plot gt plot car miles The argument car miles is an S PLUS built in vector data object Since there is no other argument to plot the data are plotted against their natural index or observation numbers 1 through 120 Since you may be interested in your gas mileage you may want to plot car miles against car gals This is also easy to do with plot gt plot car gals car miles The result is shown in Figure 2 1 41 CHAPTER 2 GETTING STARTED car gals lo N N wo _ i R a PRS y j PA ii o d ee oO fe m T T T T T T 100 150 200 250 300 350 car miles Figure 2 1 An S PLUS plot 42 You can use many S PLUS functions besides plot to display graphical results in the S PLUS graphics window Many of these functions are listed in Table 2 4 and Table 2 5 which display respectively high level and low l
140. device can be changed gt trellis device motif gt plot symbol lt trellis par get plot symbol gt plot symbol col i 2 gt plot symbol col lt 3 gt trellis par set plot symbol plot symbol gt plot symbol lt trellis par get plot symbol gt plot symbol col bi 2 trellis par set sets an entire Trellis setting list not just some of the components Thus the simplest way to make a change is to get the current list alter it and then save the altered list The change lasts only as long as the device continues If the S PLUS session is ended the altered settings are removed 251 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS SUPERPOSING TWO OR MORE GROUPS OF VALUES ON A PANEL panel superpose 252 One common visualization task is superposing two or more groups of values in the same data region encoding the different groups in different ways to show the grouping For example we might graph leaf width against leaf length for two samples of leaves one from maple trees and one from oaks and use a circle as the plotting symbol for the maples and a plus for the oaks Superposition is achieved by the panel function panel superpose In addition the key argument of the general display functions can be used to show the group encoding Superposition is illustrated by using the data frame fuel frame For 60 automobiles Mileage is graphed against Weight for six types of vehicles described by the factor Type
141. divided by its width is a critical factor in determining how well a data display shows the structure of the data There are situations where choosing the aspect ratio to carry out banking to 45 degrees shows information in the data that cannot be seen if the graph is square that is has an aspect ratio of 1 More generally any time we graph a curve or a scatter of points with an underlying pattern that we want to assess controlling the aspect ratio is vital One advance of Trellis Graphics is the direct control of the aspect ratio through the argument aspect You can use the aspect argument to set the ratio to a specific value In figure 7 3 the aspect ratio has been set to 3 4 gt xyplot NOx E data gas aspect 3 4 Setting the aspect argument to xy banks line segments to 45 degrees Here is how it works Suppose x and y are data points to be plotted Consider the line segments that connect successive points The aspect ratio is chosen so that the absolute values of the slopes of these segments are centered on 45 degrees This is done in figure 7 4 by the expression gt xyplot NOx E data gas aspect xy We have used the data themselves in this example to carry out banking just to illustrate how it works The resulting aspect ratio is about 0 4 Ordinarily though we should bank based on a smooth underlying pattern in the data that is we should bank based on the line segments of a fitted curve You can do that with Trellis Graphic
142. e Figure 7 6 Boxplot 212 GENERAL DISPLAY FUNCTIONS stripplot A strip plot sometimes called a one dimensional scatterplot is similar to a boxplot in general layout but the individual data points are shown instead of the boxplot summary Figure 7 7 is a strip plot gt stripplot Type Mileage data fuel frame jitter TRUE aspect 1 Setting ji tter TRUE causes some random noise to be added vertically to the points to alleviate the overlap of the plotting symbols When jitter FALSE the default the points for each level lie on a horizontal line Van 8 8 2 Sporty e o o Bo 9 j Small es g 88 o Medium o 8 6 Large o i Compact 8 8388250 20 25 30 35 Mileage Figure 7 7 Strip plot 213 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS qq 214 The quantile quantile plot or qqplot is an extremely powerful tool for comparing the distributions of two sets of data The idea is quite simple quantiles of one data set are graphed against corresponding quantiles of the other data set The variable fuel frame Type has five levels gt table fuel frame Type Compact Large Medium Small Sporty Van 15 3 13 13 5 7 Figure 7 8 is a qqplot comparing the quantiles of mileage for compact cars with the corresponding quantiles for small cars gt qq Type Mileage data fuel frame aspect 1 subset Type Compact Type Smal11 The factor on the left side of the formula must have at least t
143. e look up the populations of the selected cities and create a vector to hold the information in thousands gt pop lt c 425 60 28 34 904 494 129 2967 347 741 7072 557 Use the usa function to plot the map gt usa Next add the circles representing the cities gt symbols city x select city y select circles sqrt pop add T The next two lines use the ifelse command to create a size vector for controlling the text size gt size lt ifelse pop gt 1000 2 1 gt size lt ifelse pop lt 100 5 size Taken together these two lines specify a size of 2 for cities with population greater than one million a size of 1 for cities with population between one hundred thousand and one million and a size of 0 5 for cities with population less than one hundred thousand Finally we add the text using the size just determined to specify the text size gt text city x select city y select city name select cex size 194 ADDING SPECIAL SYMBOLS TO PLOTS You can use any one of the following shapes as an argument to symbol with values as indicated Table 6 9 Using shapes as an argument to the function symbol Shape Values Cire es Vector or matrix with one column containing the radii of the circles squares Vector or matrix with one column containing the lengths of the sides of the squares rectangles Matrix with two columns giving widths and heights of rectangles Missing values are allo
144. e Software and the documentation are protected under applicable copyright laws international treaty provisions and trade secret statutes of the various states This Agreement grants you a personal limited non exclusive non transferable license to use the Software and the documentation This is not an agreement for the sale of the Software or the documentation or any copies or part thereof Your right to use the Software and the documentation is limited to the terms and conditions described therein You may use the Software and the documentation solely for your own personal or internal purposes for non remunerated demonstrations but not for delivery or sale in connection with your personal or internal purposes a if you have a single license on only one computer at a time and by only one user at a time however the user of the computer on which the Software is installed may make a copy for his or her exclusive use on a portable computer so long as the Software is not used on both computers at the same time b if you have acquired multiple licenses the Software may be used on either stand alone computers or on computer networks by a number of simultaneous users equal to or less than the number of licenses that you have acquired and c if you maintain the confidentiality of the Software and documentation at all times Persons for whom license fees have not been paid may not access or use the Software or any part thereof through
145. e a logical value a numeric value and a character value as follows gt c T 2 seven Ei TRUE sia i seven S PLUS coerces all three values to mode character because this is the 77 CHAPTER 4 DATA OBJECTS 78 most informative mode represented Similarly in the following example all the values are coerced to mode numeric 2 CLT Fe Pty 7 1 1 000000 0 000000 3 141593 7 000000 When logical values are coerced to integers TRUE values become the integer 1 and FALSE values become the integer 0 The same kind of coercion occurs when values of different modes are combined in computations For example logical values are coerced to zeros and ones in integer or numeric computations VECTORS VECTORS Creating Vectors The simplest type of data object in S PLUS is a vector A vector is simply an ordered set of values The order of the values is emphasized because ordering provides a convenient way of extracting parts of a vector If you want to create a vector you can do so in a number of ways You have seen that you can combine arbitrary values to create a vector with the c function and type in data from the keyboard or a data file with the scan function Other functions are useful for repeating values or generating sequences of numeric values The rep function repeats a value by specifying either a times argument or a length argument If times is specified the value is repeated the number
146. e a vector x with values ranging from 0 to 20 at intervals of 0 1 compute the vector y by evaluating the function at each value in x then plot y against x gt RAP senil gt y lt exp x 10 cos 2 x gt plot x y type 1 1 0 0 5 0 0 l 0 5 l Figure 6 3 Plot of exp x 10 cos 2x For a rougher plot use fewer points for a smoother plot use more 124 GETTING STARTED WITH SIMPLE PLOTS Creating Scatter Plots Scatter plots reveal relationships between pairs of variables You create scatter plots in S PLUS with the plot function applied to a pair of equal length vectors a matrix with two columns or a list with components x and y For example to plot the built in vectors car miles versus car gals use the following S PLUS expression gt plot car miles car gals When using plot with two vector arguments the first argument is plotted along the horizontal axis and the second argument is plotted along the vertical axis If x is a matrix with two columns you use plot x to plot the second column versus the first For example you could combine the two vectors car miles and car gals into a matrix called miles gals by using the function cbind gt miles gals lt cbind car miles car gals Then use gt plot miles gals 125 CHAPTER 6 TRADITIONAL GRAPHICS FREQUENTLY USED PLOTTING OPTIONS Plot Shape Multiple Plot Layout 126 This section tells you how to make
147. e data have been jittered before plotting as data frame ts The function as data frame ts takes one or more time series as arguments and produces a data frame with components named series which time and cycle The series component is the data from all of the time series combined into one long vector The time component gives the time associated with each of the points measured in the same units as the original series for example years and cycle gives the periodic component of the time for example 1 Jan 2 Feb Finally the which component is a 260 DATA STRUCTURES factor that tells which of the time series the measurement came from In the following example there is only one series hstart but in general as data frame ts can take many arguments gt as data frame ts hstart 1 5 series which time cycle 1 81 9 hstart 1966 000 Jan 2 79 0 hstart 1966 083 Feb 3 122 4 hstart 1966 167 Mar 4 143 0 hstart 1966 250 Apr 5 133 9 hstart 1966 333 May To graph housing starts for each month separately from 1966 to 1974 gt xyplot series time cycle data as data frame ts hstart type b xlab Year ylab Housing Starts by Month 261 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS MORE ON ASPECT RATIO AND SCALES PREPANEL FUNCTIONS prepanel Argument 262 Banking to 45 degrees is an important display method built into Trellis Graphics through the argument aspect The ranges of scales on the panels can
148. e it to the general display function as the argument panel For example if you have your own panel function mypanel you specify panel mypanel PANEL FUNCTIONS A Panel Function for a Multipanel Display Special Panel Functions A panel function is always a function of at least two arguments the first two are named x and y Suppose for the gas data that you want to use xyplot to graph NOx against E and use a as the plotting symbol for all observations except that for which NOx is a maximum in which case you want to use M There is no provision for xyplot to do this so you must write your own First let us write the panel function gt panel special lt function x y biggest lt y max y points x biggest y biggest pch points x biggest y biggest pch M The function points is a core graphics function It graphs individual points on a graph Its first argument x contains the coordinates of the points along the horizontal scale and its second argument y contains the coordinates of the points along the vertical scale The third argument pch gives the symbol used to display the points To show the result of giving panel special to xyplot try gt xyplot NOx E data gas aspect 1 2 panel panel special The panel function for this could also have been defined as part of the xyplot command gt xyplot NOx E data gas aspect 1 2 panel function x y biggest lt y max y point
149. e level plot AER dataz I M MN OND PN DM NRN NON OW S OY D XRO ERIR S LSE ff i Ww N AD ESE See i Wa w O PN i ROO Ny y ne HAN N KS V KS datay datax Figure 7 19 3D wireframe plot 225 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS cloud 226 A static 3 D plot of a scatter of points is typically not effective because the depth cues are insufficient to give a strong 3 D effect Still on rare occasions such a plot can be useful sometimes as a presentation or teaching tool Figure 7 20 is a 3 D scatterplot of the first three variables in the data frame fuel frame gt cloud Mileage Weight Disp data fuel frame screen 1ist z 30 x 60 y 0 xlab W ylab D zlab M The behavior of the argument screen is the same as that for wireframe We have used three additional arguments to specify scale labels such labeling will be discussed in the section Scales and Labels page 242 Figure 7 20 3D scatterplot or cloud GENERAL DISPLAY FUNCTIONS The Display Functions and Their Formulas The following listing of the general display functions and their formulas is instructive because it shows certain conventions and consistencies in the formula mechanism Graph One Numerical Variable Against Another xyplot numericl numeric2 Compare the Sample Distributions of Two or More Sets of Data bwp
150. e show you how to use S PLUS to make these plots We also show you how to make another type of chart called a dot chart that is less widely known but often more useful than the more familiar bar plots and pie charts We illustrate each of the above types of plots with the following 5 x 3 matrix digits gt digits sample 1 sample 2 sample 3 digit 1 20 15 30 digit z 16 17 30 digit 3 24 16 17 digit 4 l 24 20 digit 5 19 13 28 For convenience in what follows create this matrix and take the row labels and the column labels from the matrix as follows gt digits lt tatrix c 20 15 30 16 17 30 24 16 17 21 24 20 19 13 28 nrow 5 byrow T gt dimnames digits lt list paste digit 1 5 sep _ paste sample 1 3 sep gt digit names lt dimnames digits 1 gt sample names lt dimnames digits 2 Bar Plots The function barplot is a flexible function for making bar plots The simplest use of barp1ot is with a vector or a single column of a matrix For example using the first column of digits gives the result in figure 6 11 gt barplot digits 1 names digit names 142 MAKING BAR PLOTS DOT CHARTS AND PIE CHARTS digit 1 digit 2 digit 3 digit 4 digit 5 Figure 6 11 A bar plot of the digits data In this case the height of each bar is the value usually a count occurring in the corresponding component of the vector or matrix column To make a bar plot of the entire digits
151. e vector function This function takes two arguments the first specifies the mode and the second specifies the length gt vector logical 3 fl FFF The functions logical integer numeric complex and character generate vectors of the named mode Each of these functions takes a single argument which specifies the length of the vector Thus logical 3 generates the same initialized vector as above Table 4 1 Useful functions for creating vectors 80 Function Description Examples scan read values any mode scan scan data c combines values any mode C 1 3 2 8 G yes no rep repeat values any mode rep NA 5 rep c 1 2 3 numeric sequences lba desl seq numeric sequences seg pi pi 5 vector initialize vectors vector complex 5 logical initialize 1ogical vectors logical 3 VECTORS Table 4 1 Useful functions for creating vectors Function Description Examples numeric initialize numeric vectors numeric 4 complex initialize complex vectors complex 5 character initialize character vectors character 6 Naming Vectors such as case labels or value identifiers with each value of the vector To create a vector with named values you assign the names with the names function You can assign names to vector elements to associate specific information gt numbered letters lt letters gt names numbered letters lt paste obs 1 26 sep gt numbered lett
152. e x axis is 2 The default text in the strip label for a numeric conditioning variable is the name of the variable This can be illustrated with the code below which displays the ethanol data introduced in the section A Data Set ethanol page 237 gt xyplot NOx E C data ethanol The default text in the strip label for a factor conditioning variable is the name of the factor level for the panel The barley data introduced in the section A Data Set barley page 230 illustrate this gt dotplot variety yield year site data barley The name of the factor for example site does not appear because seeing the names of the levels is typically enough to convey the name of the factor Thus the text comes from the names given to variables and factor levels in the data sets that are plotted If we want to change the text we can change the names For example if we want to change the long label University Farm to U Farm then we can change the names of the levels of the factor site as follows gt levels barley site 1 Grand Rapids Duluth University Farm 4 Morris Crookston Waseca Before barley can be used as an argument to a replacement function it must first be assigned locally gt barley lt barley gt levels barley site 3 lt U Farm SCALES AND LABELS par strip text Argument strip Argument gt levels barley site 1 Grand Rapids Duluth U Farm 4 Morris
153. eating Scatter Plots Frequently Used Plotting Options Plot Shape Multiple Plot Layout Titles Axis Labels Axis Limits Logarithmic Axes Plot Types Line Types Plotting Characters Controlling Plotting Colors Interactively Adding Information to Your Plot Identifying Plotted Points Adding Straight Line Fits to a Current Scatter Plot Adding New Data to a Current Plot Adding Text to Your Plot Making Bar Plots Dot Charts and Pie Charts Bar Plots Dot Charts Pie Charts Visualizing the Distribution of Your Data Boxplots Histograms Density Plots Quantile Quantile Plots Visualizing Higher Dimensional Data Multivariate Data Plots 121 122 122 123 125 126 126 126 128 129 129 130 130 133 134 135 137 137 138 138 140 142 142 144 146 147 147 148 149 150 154 154 119 CHAPTER 6 TRADITIONAL GRAPHICS 120 Scatterplot Matrices Plotting Matrix Data Star Plots Faces 3 D Plots Contour Perspective and Image Plots Contour Plots Perspective Plots Image Plots Customizing Your Graphics Low level Graphics Functions and Graphics Parameters Setting and Viewing Graphics Parameters Controlling Graphics Regions Controlling the Outer Margin Controlling Figure Margins Controlling the Plot Area Controlling Text in Graphics Controlling Text and Symbol Size Controlling Text Placement Controlling Text Orientation Controlling Line Width Plotting Symbols in Margin Text in Figure Margins Controlling Axes Enabling and Disabl
154. ected straight line segments e As both points and lines with points isolated e As overstruck points and lines points not isolated e As a vertical line for each data point this is known as a high density plot FREQUENTLY USED PLOTTING OPTIONS e Asa stairstep plot e As an empty plot with axes and labels but no data plotted The method used for plotting data on a graph is called the graph s plot type Scatter plots typically use the first plot type while time series plots typically use the second In this section we give examples of the other plot types You choose your plot type by using the optional argument type The possible values for this argument correspond to the choices listed above Table 6 1 Possible values of the plot type argument Setting Plot type type p points type 1 lines type b both points and lines type 0 lines with points overstruck type h high density plot type s stairstep plot type n no data plotted Different graphics functions have different default choices For example plot and matplot use the default type p while ts plot uses the default type 1 Although you can use any of the plot types with any plotting function some combinations of plot function and plot type may result in an ineffective display of your data The option type n is useful for obtaining precise control over axis limits and box line types For example you might want to have the axes a
155. ects 38 To display more than one element at a time use the c function within the characters The following displays the second and fifth elements of x gt XO 5 J 1 14 5 Use negation to display all elements except a a specified element or list of elements For instance x 4 displays all elements except the fourth gt xL i S14 8 5 Similarly x c 1 3 displays all elements except the first and third PRLS I 1 14 9 5 A more advanced use of subsetting uses a logical expression within the characters Logical expressions divide a vector into two subsets one for which a given condition is true and one for which the condition is false When used as a subscript the expression returns the subset for which the condition is true For instance the following expression selects all elements with values greater than 8 gt ALB 1 14 9 In this case the second and fourth elements of x with values 14 and 9 meet the requirements of the logical expression x gt 8 and so are displayed As usual in S PLUS you can assign the result of the operation to another object For example you could assign the above selected subset to an object named y and then display y or use y in subsequent calculations gt y lt x x gt 8 ae 1 14 9 In the next section you will see that the same principles also apply to matrix data objects although the syntax is a little more complicated because there are two dimensions
156. ed from atomic vectors in two basic ways by allowing complete S objects as elements or by building new data classes from old using slots Objects that contain other S objects as elements are called recursive objects and include such common S PLUS objects as lists and data frames A ist is a vector for which each element is a distinct S object of any type A data frame is essentially a list in which each of the elements is an atomic vector and all of the elements have the same length BASIC DATA OBJECTS Coercion of Values A list is a completely flexible means for representing data in earlier versions of S it was the standard means of combining arbitrary objects into a single data object Much the same effect can be created however using the notion of slots With slots you can store any information you need to uniquely define your data object that is the object s attributes in one or more slots The virtual class vector extends all of the atomic vector classes New vector classes can be created by defining class specific methods for length and a few other functions Next in complexity after the atomic vectors are the structures which extend vectors by imposing a structure typically a multi dimensional array upon the data The simplest structure is the two dimensional matrix A matrix starts with a vector then adds the information about how many rows and columns the matrix contains This information the
157. ed to a graph position takes a vector of four numbers the first two numbers are the coordinates of the lower left corner of the graph rectangle and the second two numbers are the coordinates of the upper right corner The argument more has been give a value of T which says that more drawing is coming Notice that in the above example the graph rectangles overlap somewhat Here is the reason The graph contains margins empty space around the edges of the graph But in arranging graphs on a page we might well want to overlap margin space to use the page space as efficiently as possible The following code illustrates another argument split that provides a different method for arranging the plots on the page attach fuel frame scatter plot lt xyplot Mileage Weight other plot lt xyplot Mileage Disp detach print scatter plot split ct1 1 1 2 more T gt gt gt gt gt gt print other plot split c 1 2 1 2 ARRANGING SEVERAL GRAPHS ON ONE PAGE split takes a vector of four values The last two define an array of subregions in the graphics region In our example the array has one column and two rows for both plots The first two values of split prescribe the subregion in which the current plot is to be drawn 35 4 o L oOo oo ie 30 4 o O 2 O O 2 016 CO lole oOo 25 4 o o o L co ooo O ie O ie oo O
158. el argument 246 263 Panel functions 202 panel functions 246 panel function 246 panel variables 230 panel loess function 247 panel special function 247 panel superpose function 252 254 panel xyplot function 246 248 249 par function 126 par strip text argument 245 parallel function 222 paste function 84 pch argument 134 246 pch parameter 248 pdf graph function 203 pie function 146 piechart function 218 plot 126 plot area 170 plot function 122 plot types 131 plot line function 251 plot symbol function 251 plots high level functions for 42 low level functions for 43 plotting characters 134 points function 139 247 248 polygon function 248 position argument 228 postscript argument 203 precedence of operators 29 prepanel argument 262 prepanel loess function 263 print function 90 prompt screen function 186 Prompts continuation 312 Prompts S Plus 312 pscales argument 243 pty argument 126 pugetN data set 161 333 INDEX Q qq function 214 qqline function 151 qqmath function 215 qqnorm function 151 qqplots 150 qqunif function 151 Quitting S PLUS 9 R rbind function 82 99 106 107 read table function 69 70 99 recalling previous commands 14 rectangular plot shape 126 reorder factor function 236 rep function 79 rm function 25 Rows function 255 S S_CLEDITOR environment variable 12 S_CMDFILE variable 316 scales and labels of graphs 242 scales argument 243 scan function 67 69 scatterplo
159. els or different categorical values contained in the data and indices which point to the appropriate level for each data point The different levels of a factor are stored in an attribute called levels Factor objects are a natural form for categorical data in an object oriented programming environment because they have a class attribute that allows specific method functions to be developed for them For example the generic print function uses the print factor method to print factors If FACTORS AND ORDERED FACTORS you override print factor by calling print default you can see how a factor is stored internally gt print default fuel frame Type 11 14444444444444555555555111 PHGILLLELL LEP LLalSsa esa a aes a233 3 lJ 2226608 606 a attr levels 1 Compact Large Medium Small Sporty Van attri elase iy faeror The integers serve as indices to the values in the levels attribute You can return the integer indices directly with the codes function gt codes fuel frame Type 11 14444444444444555555555111 CG LILDELULL EL Le See Seeees3 222 a S 51 22 2eoe Goo 6 Or you can examine the levels of a factor with the levels function gt levels fuel frame Type 1 Compact Large Medium Small Sporty Van The print factor function is roughly equivalent to gt levels fuel frame Type codes fuel frame Type except the quotes are dropped To get the number of cases of e
160. ent Older FALSE 000 000 000 294 000 000 APPLYING FUNCTIONS TO SUBSETS OF A DATA FRAME Call lm formula Number Start data data Coefficients Intercept Start 6 071257 0 1191617 Degrees of freedom 9 total 7 residual Residual standard error 1 170313 Kyphosis absent Older TRUE As in the above example you should define your FUN argument simply If you need additional parameters for the modeling function specify them fully in the call to the modeling function rather than attempting to pass them in through a argument Warning Again as with aggregate you need to be careful that the function you are applying by to works with data frames and often you need to be careful that it works with factors as well For example consider the following two examples gt by kyphosis kyphosis Kyphosis function data apply data 2 mean kyphosis Kyphosis absent Kyphosis Age Number Start NA NA 3 75 12 60938 kyphosis Kyphosis present Kyphosis Age Number Start NA 97 82353 5 176471 7 294118 Warning messages 1 64 missing values generated coercing from character to numeric in as double x 2 17 missing values generated coercing from character to numeric in as double x gt by kyphosis kyphosis Kyphosis function data apply data 2 max 113 CHAPTER 5 DATA FRAMES 114 Error in FUN x character Dumped Numeric summary undefined for mode The function
161. ent sizes A parameter equivalent to cex is csi which gives the height interline space of text with the current cex measured in inches Changing either cex or csi changes the other The csi parameter is useful when creating the same graphics on different devices since the absolute size of graphics is device dependent When you add text to the plot area you specify its coordinates in terms of the plotted data in essence S PLUS treats the added text as a data point If axes have been drawn and labeled you can read the coordinates off the plot If not you can obtain the desired coordinates by interpolating from the values in the layout parameter usr For example figure 6 28 has an x axis with values from 0 to 1 and a y axis with values running from approximately 2 5 to 1 To add the text Different size symbols we could specify any point within the grid determined by these x and y limits as follows gt text 4 7 Different size symbols By default the text is centered at the specified point However you can left or right justify the text at the specified point by using the general parameter adj The adj parameter determines the fraction of the text string that appears to the left of the specified xy coordinate The default is 0 5 Set adj 0 to left justify adj 1 to right justify 175 CHAPTER 6 TRADITIONAL GRAPHICS If no axes have been drawn and you can t determine the coordinates by looking at your graphic you
162. ers bs1 obs2 obs3 obs4 obs5 obs6 obs7 obs8 obs9 bs10 0bs11 mye Ee igh re Ser pr a F e ae ES obs12 obs13 obs14 obs15 obs16 obs17 obs18 obs19 obs20 obs21 st ii a mos g mape mpm un Rpa gn obs22 obs23 obs24 obs25 obs26 mye y myn map os In the above example the first 26 integers are converted to character strings by the paste function and then attached to each value The quotes around the numbers are suppressed in the printing The actual values of the vector numbered letters are character strings each containing one letter If you specify too many or too few names for the values S PLUS gives an error message 81 CHAPTER 4 DATA OBJECTS MATRICES Creating Matrices 82 Matrices are used to arrange values by rows and columns in a rectangular table For data analysis different variables are usually represented by different columns and different cases or subjects are represented by different rows Thus matrices are convenient for grouping together observations that have been measured on the same set of subjects and variables Matrices differ from vectors by having a Dim slot which specifies the dimension of the matrix that is the number of rows and columns Any vector can be turned into a matrix simply by specifying its Dim slot as we see in the examples below To create a matrix from an existing vector use the dim function to set the Dim slot To use dim you assign a vector of two integers specifying the
163. es the UNIX command Ip Ipr etc used to send files to a PostScript printer S PRINTGRAPH_ONEFILE Determines whether plots generated by the postscript function are accumulated in a single file TRUE or whether each plot is put in a separate EPS file This environment variable sets the default for the onefile arguments to pS options and postscript S_PRINT_ORIENTATION Specifies the orientation of the graphic as landscape or portrait Determines the default value of the horizontal argument to ps options and printgraph S_SHELL Specifies the shell used during shell escapes that is commands issued from the escape character The default value is the value of SHELL S_SILENT_STARTUP Disable printing of copyright version messages S_WORK Specifies the location of the working data directory that is the directory in which S PLUS creates and reads data objects Equivalent to SWORK VISUAL Sets the command line editor to either emacs or vi Overridden by S CLEDITOR if it contains a valid value Many of the variables in this section take effect if you set them to any value and do not take effect if you do not set them so you may leave them unset without harm For example to set S_SILENT_STARTUP you can enter setenv S SILENT_STARTUP X on the command line and S PLUS will not print its copyright information on startup because the variable S_SILENT_STARTUP has a value any value User code can check the current values fo
164. evel plotting functions High level plotting functions create a new plot complete with axes while low level plotting functions typically add to an existing plot Table 2 4 Common high level plotting functions barplot hist Bar graph histogram boxplot Boxplot GRAPHICS IN S Plus Table 2 4 Common high level plotting functions Continued brush contour image persp symbols coplot dotchart faces stars map pairs pie plot qqnorm qqplot scatter smooth tsplot usa abline axis box Brush pair wise scatter plots spin 3D axes 3D plots Conditioning plot Dotchart Display multivariate data Plot all or part of the U S part of the maps library Plot all pair wise scatter plots Pie chart Generic plotting Normal and general QQ plots Scatter plot with a smooth curve Plot a time series Plot the boundary of the U S Table 2 5 Common low level plotting functions Add line in intercept slope form Add axis Add a box around plot 43 CHAPTER 2 GETTING STARTED Quick Hard Copy Using the Graphics Window 44 Table 2 5 Common low level plotting functions Continued contour image Add 3D information to plot persp symbols identify Use mouse to identify points on a graph legend Add a legend to the plot lines points Add lines or points to a plot mtext text Add text in the margin or in the plot stamp Add date and time information to t
165. f the scales and then passes along the changes of the line segments that will make up the plotted curve Any of the component names can be missing from the list if either dx or dy is missing the other must be as well When dx and dy are present they give the information needed for banking to 45 degrees as well as the instruction to do so thus the aspect argument should not be used as an argument when dx and dy are present The multipanel conditioning of Trellis Graphics has three more arguments that assist in the control of the layout visual design and labeling The argument between puts space between adjacent columns or adjacent rows The argument skip allows a panel position to be skipped when packets are sent to the panels for drawing The page argument can add page numbers text or even graphics to each page of a multipage Trellis display 263 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS between Argument skip Argument 264 To graph the barley data gt barley plot lt dotplot site yield variety year data barley aspect xy layout c 2 5 2 gt barley plot In the resulting two page Trellis display yield is plotted against site given variety and year The layout 2 columns 5 rows and 2 pages has put the measurements for 1931 on the first page and for 1932 on the second page The display will be saved in barley plot for future editing The panels can be squeezed into one page by changing layout from 2 5 2
166. from the device using xgetrgb gt my colors lt xgetrgb type images 281 CHAPTER 8 WORKING WITH GRAPHICS DEVICES The type argument to xgetrgb should be appropriate for the type of graph being reproduced Here we use type images because we want the colors used to produce an image plot The default type is polygons which is appropriate for barplots histograms and pie charts and is usually also suitable for scatter plots and line plots such as time series plots Other valid types are lines text and background 5 Send the color specification to update the graphics window s printer options gt ps options send image colors my colors The image colors argument assigns colors for image plots Use the colors argument to assign colors for all other plots Use the background argument to specify the background color You can of course use the results of xgetrgb as arguments without first assigning them to an S PLUS object as is shown below gt ps options send image colors xgetrgb images colors xgetrgb lines background xgetrgb background 6 Select the Print button to print the colored graphic To create color graphics with the postscript function you follow essentially the same steps as in the following example 1 Start the graphics window gt motif 2 Set the desired color scheme using Options Color Scheme from the motif menu 3 Capt
167. fy a different one On most devices there are eight distinct line types figure 6 7 illustrates the various types Figure 6 7 Line types 133 CHAPTER 6 TRADITIONAL GRAPHICS If you specify a higher value S PLUS produces the line type corresponding to the remainder on division by the number of line types For example if you specify 1ty 26 on the graphsheet graphics device S PLUS produces the line type shown as Ity 2 Warning lty 2 The value of 1ty must be an integer This contrasts with the value of type which is of character mode and is therefore enclosed in quotes For example to plot the time series hal ibut cpue using plot with gt plot halibut cpue type 1 1ty 2 Plotting Characters 134 When your plot type involves points you can choose the plotting character for the points By default the plotting character is usually a circle 0 depending on your graphics device and the plot function you use For matplot the default plotting character is the number 1 because matp1ot is often used to plot more than one time series or more than one vector In such cases more than one plotting character is needed to distinguish the separate graphs one plotting character for each time series or vector to be plotted The default plotting characters in such cases are the numbers 1 2 However you can choose alternative plotting characters when making a points type plot with any of the above plotting funct
168. g vi Do not include commands that start a graphics device 2 In S PLUS start a graphics device then use source to execute the S PLUS commands in your file gt motif gt source plotcmds asc 3 View your graphs If you want to change something edit your file with an editor 287 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 4 Once you are satisfied with your plots start a hard copy graphics device source your plotting commands and then turn the hard copy graphics device off gt postscript gt source plotcmds asc gt dev off 5 Save your file of graphics commands if you will need to reproduce the plots in the future 288 GRAPHICS WINDOW DETAILS GRAPHICS WINDOW DETAILS Basic Terminology Opening and Removing Graphics Devices This section describes in detail how to use the motif graphics device This device is available only on machines that run either the X Window System Version 11 X11 The motif device is available on all UNIX platforms The motif device lets you interactively change the color specifications of your plots and immediately see the results and also interactively change the specifications that are used to send the plot to a printer In this section we assume you are familiar with your particular window system In particular we assume you know how to start your window system and set your display so that X11 applications can display windows on your screen Fo
169. graphics functions including points which adds a scatter of points to an existing plot and abline which adds a specified line to an existing plot Low level graphics functions unlike high level graphics functions do not automatically generate a new coordinate system Thus you can use several low level graphics functions in succession to create a single finished graphic Some functions such as image and contour which are described in the section 3 D Plots Contour Perspective and Image Plots page 158 can be used as either high or low level graphics functions Graphics parameters add to the flexibility of graphics by controlling virtually every detail of a page of graphics There are about 60 parameters which fall into four classes e High level graphics parameters can be used only as arguments to high level graphics functions An example is x1im which gives the approximate limits for the x axis Layout graphics parameters can be set only with the par function These parameters typically affect quantities that concern the page as a whole The mfrow parameter is an example this states how many rows and how many columns of plots are placed on a single page General graphics parameters may be set either in a call to a graphics function or with the par function When used in a graphics function the change is valid only for that function call If you set a parameter with par the change lasts until you change it again Graphics p
170. green specifies a list of 13 colors just as our previous example did In this example however only 3 entries in the X server s color table are allocated rather than the 13 allocated by the previous example CUSTOMIZING YOUR S PLUS SESSION Setting S PLUS Options 312 Setting Environment Variables 314 Customizing Your Session at Start up and Closing 316 Setting S_FIRST 316 Customizing Your Session at Closing 317 Using Personal Function Libraries 318 Creating an S Chapter 318 Placing the Chapter in Your Search Path 319 Specifying Your Working Directory 320 Specifying a Pager 321 Environment Variables and printgraph 322 Setting Up Your Window System 324 Setting X11 Resources 324 S PLUS X11 Resources 325 Common Resources for the Motif Graphics Device 325 S PLUS offers a number of ways to customize your session You can set options specifying how S PLUS displays data and other information create your own library of functions or load C or Fortran code You can even define a function to set these options each time you start S PLUS and another function to clean up each time you end a session This chapter describes changes that apply only to your S PLUS session To install them for every user on your system talk with your system administrator or see the procedures in the Installation and Maintenance Guide 311 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION SETTING S PLus OPTIONS 312 Options in S PLUS serve much the
171. gure 7 18 Level plot GENERAL DISPLAY FUNCTIONS wireframe Wireframe displays can be quite useful for displaying f x y when we have no need to study conditional dependence Figure 7 19 is a 3 D wireframe plot of the gauss surface gt wireframe dataz datax datay data gauss drape F screen 1ist z 45 x 60 y 0 The argument screen is a list The three components of the list x y and z trefer to screen axes The first component is horizontal and the second is vertical both in the plane of the screen The third component is perpendicular to the screen The surface is rotated about these axes in the order given in the list Here is how it worked for figure 7 19 The surface began with datax as the horizontal screen axis datay as the vertical and dataz as the perpendicular The origin was at the lower left in the back First the surface was rotated 45 degrees about the perpendicular screen axis where a positive rotation is counterclockwise Then there was a 60 degrees rotation about the horizontal screen axis where a negative rotation brings the picture at the top of the screen away from the viewer and the bottom toward the viewer Finally there was no rotation about the vertical screen axis had there been one with a positive number of degrees then the left side of the picture would have moved toward the viewer and the right away If drape T a color encoding is added to the surface using the same encoding method of th
172. gure regions interactively with your mouse When you type 186 CONTROLLING MULTIPLE PLOTS gt Split sereant pronpt screent S PLUS responds with Click at 2 opposite corners Now move your mouse cursor into your graphics window and click at two opposite corners After you do this the region you indicated will be colored in and labeled with the number 1 This is the first screen In the command window S PLUS responds again with Click at 2 opposite corners Repeat this action until you have created all the screens you want then click on the right mouse button Once you have divided up the graphics device into separate screens use the screen function to move between screens See the help file for split screen for more information on using these functions Warning If you want to issue a high level plotting command in a screen that already has a plot in it but you don t want the plots in the other screens to disappear use the erase screen function before calling the high level plotting command 187 CHAPTER 6 TRADITIONAL GRAPHICS OVERLAYING FIGURES High Level Functions That Can Act as Low Level Functions Overlaying Figures by Setting new TRUE It is often desirable to include more than one data set in the same plot Simple additions can be made with the lines and points functions The matplot function plots a number of columns of data at once These all assume however that the dat
173. h the file argument to name the output file then make a scatter plot and time series plot using dev off to append the second plot to the file and turn off the hpg1 device After sending the files to the plotter we remove them gt hpgl file hpgl com gt plot corn rain corn yield gt ts plot lynx gt dev off Append the last plot to hpgl com PRINTING YOUR GRAPHICS Creating PDF Graphics Files Managing Files from Hard Copy Graphics Devices gt lpr P hpgl hpgl com gt rm hpgl com In this example two plots are written to the file hpg1 com We then escape to the UNIX shell and issue the 1pr command to send the file to the plotter The command for sending your file to the plotter may be different for your system Finally we escape to the UNIX shell and issue the rm command to remove the file The Portable Document Format PDF is a popular electronic publishing format closely related to PostScript You can create PDF graphics files in S PLUS using the pdf graph graphics device You can create a PDF graphics file simply by calling pdf graph with the desired output file name gt pdf graph mygraph pdf gt plot corn rain corn yield main Another corny plot gt dev off Once you ve created your PDF graphics you can view them using Adobe s Acrobat Reader available on most personal computers and some UNIX platforms See the pdf graph help file for more details With all hard
174. he Import Filter Notes on Importing Files Notes on Importing ASCII Delimited ASCII Files Notes on Importing FASCI Formatted ASCII Files Notes on Importing Excel Files Notes on Importing Lotus Files Notes on Importing dBase Files Notes on Importing Data From Enterprise Databases Other Data Import Functions Reading Vector and Matrix Data with scan Reading Data Frames Exporting Data Sets Exporting Data to S PLUS Other Export Functions Chapter 4 Data Objects Basic Data Objects Coercion of Values Vectors Creating Vectors Naming Vectors Matrices Creating Matrices Naming Rows and Columns Arrays Creating Arrays Lists Creating Lists List Component Names 47 47 48 50 53 54 59 62 62 63 64 64 64 64 67 67 69 71 72 72 75 76 77 79 79 81 82 82 84 85 86 87 87 89 CONTENTS Factors and Ordered Factors Creating Factors Creating Ordered Factors Creating Factors from Continuous Data Chapter 5 Data Frames The Benefits of Data Frames Creating Data Frames Combining Data Frames Combining Data Frames by Column Combining Data Frames by Row Merging Data Frames Applying Functions to Subsets of a Data Frame Adding New Classes of Variables to Data Frames Chapter 6 Traditional Graphics Introduction Getting Started with Simple Plots Plotting a Vector Data Object Plotting Mathematical Functions Creating Scatter Plots Frequently Used Plotting Options Plot Shape Multiple Plot Layout Titles Axi
175. he c function and type 2 lt lt 04 3 2 1 23 CHAPTER 2 GETTING STARTED Storing Data Objects Listing Data Objects 24 You type lt by typing two keys on your keyboard the less than key lt followed by the minus character with no intervening space To store the vector containing the integers 1 through 10 in y type gt y lt 1 10 The following assignment expressions using the operator are identical to the two previous assignments above gt x 0 4 3 2 1 gt y 1 10 The lt form of the assignment operator is highly suggestive and readable so the examples in this manual use the arrow The is easier to type and matches the assignment operator in C so many users prefer it However the S language also uses the operator inside function calls for argument matching if you want assign the value of an argument inside a function call you must use the lt operator Data objects in your working directory are permanent They remain even if you quit S PLUS and start S PLUS again later If you do not start S PLUS in a valid chapter directory S PLUS creates a temporary working directory for you You can also change the UNIX directory location where S PLUS objects are stored by using the attach function See the attach help file for further information You can specify the working directory explicitly through the environment variable S_WORK which can specify one directory or a colon sepa
176. he graphics window is the Graph menu title Move the pointer to this title and click to call up a menu with the following items e Redraw Redraws the graph that appears in the pane of the graphics window e Copy Creates a copy of the current graphics window as shown in figure 8 3 The copy has a title bar a menu bar a pane and a footer just like the original The title in the title area is S PLUS Copy The menu bar in a copy of the graphics window does not contain an Options menu title only the Graph and Help menu titles Print Converts the current plot in the graphics window to either a PostScript or LaserJet file and then sends this file to your printer Choosing Print is not equivalent to typing the printgraph command in the S PLUS window The printgraph command uses S PLUS environment variables to determine printing defaults whereas Print uses the specifications shown in the Printing dialog box When you select Print a message is displayed in the footer of the graphics window telling you what kind of file was created and the command that was used to route this file to the printer See the section The Options Menu page 295 for a description of how to set the defaults for printing 294 GRAPHICS WINDOW DETAILS Figure 8 3 A copy of the motif graphics window The Options The Options menu title is the second menu title in the menu bar of the Menu graphics window Move the pointer to this title and click to
177. he plot title Add title x axis labels y axis labels and or subtitle to plot Each graphics window also offers a simple straightforward way to get a hard copy of the picture you have composed on the screen the Print option on the Graph pull down menu You can exercise even more control over your instant hard copy such as specifying whether the copy is in landscape or portrait orientation which printer the hard copy is sent to and for HP Laserjet systems the dpi dots per inch resolution of the printout You can use a mouse to perform basic functions in a graphics window such as redrawing or copying a graph The standard graphics window also known as the motif device Figure 2 2 has a set of pull down menus providing a mouse based point and click capability for copying redrawing and printing hard copy on a printer In general you select actions by pulling down the appropriate menu and clicking the left mouse button GRAPHICS IN S Plus Options Redraw Figure 2 2 The motif window Copying A Graph Fach graphics window provides a mechanism to copy a graph on the screen Redrawing A Graph Multiple Plot Layout This option allows you to freeze a picture in one state but continue to modify the original The motif device has a Copy choice under the Graph pull down menu on the menu bar Each graphics window provides a mechanism to redraw a graph This option can be used to refresh the pictu
178. he previous chapter The Trellis functions are particularly geared towards multipanel and multipage plots This chapter describes the Trellis system based on traditional S PLUS graphics Open a Trellis Graphics device with the command trellis device If no device is open Trellis commands will open one by default but by using this command you ensure the open graphics device is compatible with Trellis Graphics gt trellis device The Trellis library has a collection of general display functions that draw different types of graphs For example xyplot makes x y plots dotplot makes dot plots and wireframe makes 3 D wireframe displays The functions are general because they have the full capability of Trellis Graphics including multipanel conditioning These functions are introduced in the the section General Display Functions page 210 There is a set of common arguments that all general display functions employ The usage of some of these arguments varies but each has a common purpose across all functions Many of the general display functions also have arguments that are specific to the types of graphs that they draw The common arguments which are listed in the section Summary of Trellis Functions and Arguments page 266 are discussed in many sections Panel functions are a critical aspect of Trellis Graphics They make it easy to tailor displays to your data even when the displays are quite complicated ones with many panels
179. he second plot uses the left half of the bottom two thirds of the device and the last plot uses the right half of the bottom two thirds The example begins with the frame function which tells the graphics device to begin a new figure You use frame frequently when creating graphics from low level graphics functions gt frame 185 CHAPTER 6 TRADITIONAL GRAPHICS gt par fig c 0 1 66 1 mar e 5 4 2 2 1 gt plottx gt par fig c 0 5 0 66 gt plot x gt part figect 5 1 0 66 gt ploty yaxs a gt part tig c00 1 0 1 5 10 15 20 0 0 50 100 150 200 Index roy 78 7B a a o o o amp 0 amp i a g ap 67 7 o amp sq E o amp g oe a ga oo So oo Bo o oo o o 9 93 5a 9 54 a gt a aan a4 aa gt z a og 2 Dg Ooo 28 2 o 4 o S o 0o00 po 9 F S o o o 0 ag oa o B OQ p oo 9 4 oo GOO Goa Dg oa 00 25 WwW af WZ ou oa oo On oo oo 7 oo 6 e oo TT Ga o0 Ga o0 oo E oo oo oo w T T T if T if T T T T f 0 5 10 15 20 0 50 100 150 200 x Index Figure 6 31 Controlling the layout of multiple plots on one page Once you create one figure with fig you must use it to specify the layout of the entire page of plots When you complete your custom plot reset fig to c 0 1 0 1 An easy way to use fig with a display device is through the functions split screen and prompt screen These functions used together let you specify the fi
180. hemes or the old color schemes will be unavailable 327 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION INDEX INDEX operator 27 Symbols argument 117 First function 316 First function 319 Last function 317 A abline function 138 164 About Multipanel Display 230 add argument 188 adding a legend 140 adding new data to a plot 138 adding straight lines to a scatter plot 138 adding text to existing plot 140 add on modules 2 adj parameter 175 aggregate function 110 along argument 80 angle argument 143 aov function 203 argument 117 arguments abbreviating 31 Arithmetic operators 26 array function 86 arrays 85 arrows function 192 as data frame function 99 as data frame array function 259 as data frame ts function 260 ASCII files 62 ASCII specifying a format string 62 aspect argument 208 244 aspect function 262 at argument 179 223 attach function 24 206 319 auto dat data set 69 auto stats data set 163 axes parameter 180 axis function 180 B bar fill parameter 251 barchart function 217 barley data set 230 barplot function 142 between argument 263 border argument 257 breaks argument 94 bwplot function 212 by function 110 113 byrow argument 83 C c function 25 calling functions 25 car miles dataset 132 cat function 72 73 categorical variables 90 cbind function 82 99 104 125 cex argument 216 cex parameter 174 245 248 Changing the Text in Strip Labels 244 char
181. here are seven symbols providing for up to seven groups If there are two groups the first two symbols are used if there are three groups the first three symbols are used and so forth The setting for the default line types is superpose line gt trellis par get superpose line col HJIT Silty 1 L238 45 6 F lwd H LL11 1IL There are seven line types A call to trellis settings will show the seven symbols in the first panel and the seven line types in the second panel of the top row The function panel superpose can be used with any general display function where superposing different groups of values makes sense For example we can superpose data sets with xyplot or with dotplot or with many of the other general display functions By achieving superposition through the panel function we do not need a special superposition general display function for each type of graphical method which makes things much simpler To illustrate this the following code produces a dot plot of the barley data discussed earlier gt barley plot lt dotplot variety yield site data barley groups year layout c 1 6 aspect 5 xlab Barley Yield bushels acre panel function x y dot line lt trellis par get dot line abline h unique y wd dot line 1wd lty dot line lty col dot 1line col panel superpose x y gt print barley plot On each panel data for two years are displayed and
182. hout specific written prior permission M I T makes no representations about the suitability of this software for any purpose It is provided as is without express or implied warranty This software is not subject to any license of the American Telephone and Telegraph Company or of the Regents of the University of California S PLUS is a registered trademark of MathSoft Inc S and New S are trademarks of Lucent Technologies Inc Elan License Manager is a trademark of Rainbow Technologies All other trademarks are acknowledged S PLUS would not exist without the pioneering research of the Bell Labs S team at AT amp T now Lucent Technologies John M Chambers Richard A Becker Allan R Wilks Duncan Temple Lang David James Mark Hansen William S Cleveland and colleagues License Agreement and Limited Warranty MathSoft Inc License Agreement Warning MATHSOFT IS WILLING TO LICENSE THE ENCLOSED SOFTWARE TO YOU ONLY UPON THE CONDITION THAT YOU ACCEPT ALL OF THE TERMS CONTAINED IN THIS LICENSE AGREEMENT PLEASE READ THE TERMS CAREFULLY BEFORE OPENING THE PACKAGE WITH THE CD ROM OR OTHER MEDIA AS OPENING THE PACKAGE WILL INDICATE YOUR ASSENT TO THEM IF YOU DO NOT AGREE TO THESE TERMS THEN MATHSOFT IS UNWILLING TO LICENSE THE SOFTWARE TO YOU IN WHICH EVENT YOU SHOULD RETURN THIS COMPLETE PACKAGE WITH ALL ORIGINAL MATERIALS AND THE UNOPENED PACKAGE WITH THE CD ROM OR OTHER MEDIA AND YOUR MONEY WILL BE REFUNDED Both th
183. ice the default graphical parameters are device dependent These parameters are contained in lists that we will refer to as the Trellis settings When trellis device sets up a graphics device the Trellis settings are established for that device and are saved on a special data structure When you write your own panel functions you may want to make use of the Trellis settings to provide good performance across different devices Three functions enable you to access display and change the settings for the current device trellis par get lets you get settings for use in a panel function show settings shows graphically the values of the settings trellis par set lets you change the settings for the current device Here is the panel function panel xyplot function x y type p cex plot symbol cex pch plot symbol pch font plot symbol font lwd plot line lwd ty plot line lty col if type 1 plot line col else plot symbol col if type 1 plot line lt trellis par get plot line lines x y lwd lwd 1ty lty col col type type 249 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS else plot symbol lt trellis par get plot symbol points x y pch pch font font cex cex col col type type If the argument type is p which means that point symbols are used to plot the data then the plotting symbol is defined by the settings list plot symbol the components of this list are given to the function points
184. ield given year and site This simple use of formula creates a complex multipanel display A multipanel conditioning display is a three way rectangular array laid out into columns rows and pages In figure 7 22 there are two columns six rows and one page The numbers of columns rows and pages are selected by an algorithm that attempts to fill up as much of the graphics region as possible subject to certain constraints As we will see in the section Summary The Layout of a Multipanel Display page 237 there is an argument layout that allows you to choose the numbers MULTIPANEL CONDITIONING Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota Trebi Wisconsin No 38 No 457 Glabron Peatland Velvet No 475 Manchuria No 462 Svansota fi Waseca 1932 Crookston 932 Mortis 1932 Duluth Duluth 1932 1931 Grand Rapids 1932 Grand Rapids 1931 yield Figure 7 22 Multipane
185. ifornia 21198 Tiari 20 Michigan 9111 70 63 125 Utah 1203 72 90 137 As before if row or column names have been defined they can be used in place of the index numbers gt state x77 Lec California Michigan Utah y ec Population Life Exp Frose Population Life Exp Frost California 21198 Sled 20 Michigan 9111 70 63 125 Utah 1203 72 90 137 39 CHAPTER 2 GETTING STARTED Selecting All To select all of the rows leave the expression before the comma blank To Rows or All select all columns leave the expression after the comma blank The following Columns From a pression chooses all columns for the states California Michigan and Utah Matrix Object In the following expression the closing bracket appears immediately after the comma this means that all columns are selected gt state x7 7 c California Michigan Utah Population Income Illiteracy Life Exp Murder California 21198 5114 Loi fig el loos Michigan 9111 4751 O83 70 63 tli Utah 1203 4022 0 6 72 90 4 5 HS Grad Frost Area California 62 6 20 156361 Michigan 62 8 125 56817 Utah 67 3 137 82096 40 GRAPHICS IN S Plus GRAPHICS IN S PLus Making Plots Graphics are central to the S PLUS philosophy of looking at your data visually as a first and last step in any data analysis With its broad range of built in graphics functions and its programmability S PLUS lets you look at your data from many angles This section
186. ify the observation numbers of such points use the identify function which lets you point and click with a mouse on the unusual points For example consider the plot of y versus x plotted as follows gt set seed 12 gt x lt runif 20 gt y lt 4 x rnorm 20 2R Re CAME xy x Cl 2 gt plot x y You immediately notice one point separated from the bulk of the data Such a data point is called an outlier To identify this point by observation number use identify as follows gt identify x y n 1 After pressing RETURN you do not get a prompt Instead S PLUS waits for you to identify points with the mouse Now move the mouse cursor into the graphics window so that it is adjacent to the data point to be identified and click the left mouse button The observation number appears next to the point If you click when the cursor is more than 0 5 inch from the nearest point in the plot a message appears on your screen to tell you there are no points near the cursor After identifying all the points that you requested in our example n 1 S PLUS prints out the observation numbers of the identified points and returns your prompt 137 CHAPTER 6 TRADITIONAL GRAPHICS Adding Straight Line Fits toa Current Scatter Plot Adding a Least Squares Straight Line Adding a Robust Straight Line Fit Adding New Data toa Current Plot gt identify x y n 1 1 21 If you omit the optional argument n
187. ile for more information on omi and omd 171 CHAPTER 6 TRADITIONAL GRAPHICS Warning If you set oma to something other than the default value c 0 0 0 0 and then later reset a of the graphics parameters in a call to par e g par orig par you will see the warning message Warning messages Graphics error Figure specified in inches too large in zzfigz in This message can be safely ignored Controlling Figure Margins 172 To specify the size of the figure margins use one of two equivalent graphics layout parameters mar or mai The mar parameter specified as a numeric vector of length four with values expressed in mex is generally the more useful of the two because it can be used to specify relative margin sizes The mai parameter measures the size of each side of the margin in inches and is thus useful for specifying absolute margin sizes If for example mex is 1 the default and mar equals c 5 5 5 5 there is room for five lines of default font text cex 1 in each margin If mex is 2 and mar is c 5 5 5 5 there is room for 10 lines of default font text in each margin The mex parameter specifies the size of font that is to be used to measure the margins When you change mex S PLUS automatically resets some margin parameters to decrease the size of the figure margins to correspond to smaller text without changing the size of the outer margin Table 6 4 shows the effects on the various margin p
188. imensional array and a vector as a one dimensional array 85 CHAPTER 4 DATA OBJECTS Creating Arrays 86 To create an array in S PLUS use the array function The array function is analogous to matrix It takes data and the appropriate dimensions as arguments then produces the array If no data is supplied the array is filled with NAs When passing values to array combine them in a vector so that the first dimension varies fastest the second dimension the next fastest and so on The following example shows how this works gt array c 1 8 11 18 111 118 dim c 2 4 3 led WEIT Lig I B amp F 2 1 2 4 6 8 ae a ae 9 ied Ld Te 2s 17 2 12 14 16 18 ba PE RA Hisl A MS eS day 2 112 114 116 118 The first dimension the rows is incremented first This is equivalent to placing the values column by column The second dimension the columns is incremented second The third dimension is incremented by filling a matrix for each level of the third dimension For creating arrays from existing vectors the dim function works for arrays in the same way it works for matrices The dim function lets you set the Dim slot as you can for a matrix For example if the data above were stored in the vector vec you could create the above array by defining the Dim slot with the vector c 2 4 3 gt vec Lt 23436 7 8 il 1213 12 14 15 16 17 18 111 112 113 114 115 116 23 117 118 gt dim vec lt c 2 4 3
189. in the factors Data less than or equal to the first breakpoint or greater than the last breakpoint are returned as NA To create a specific number of groups by partitioning the range of the data into equal sized intervals use an integer value for the breaks argument gt cut state x77 Murder breaks 2 1 22122111 eli 1211222 attr levels 1 1263 thru 8 250 8 2504 thru 15 237 2 hie Ue eg be le he C2 LiL e2 hegi betas By default cut creates abels of the form first breakpoint thru second breakpoint etc using either the breakpoints you provide or the ones it creates However you can assign different labels to the levels with the 1abe1 s argument gt CULtstate x77 Murder 8 16 labels c Low High ETATE TTS 20 112112722 attr levels 1 Low High Q2 LL 220ULe2fe2leled Oe i ei ee ies a a te Note As you may notice from the style of printing in the above examples cut does not produce factors directly Rather the value returned by cut is a category object 94 FACTORS AND ORDERED FACTORS To create a factor from the output of cut just call factor with the call to cut as its only argument gt Factor eut state x77 Murder 0 8 16 labels c Low High 1 High High Low High High Low Low Low High High 11 Low Low High Low Low Low High High Low High 21 Low High Low High High Low Low High Low Low 31 High High High Lo
190. ing Axes Controlling Tick Marks and Axis Labels Controlling Axis Style Controlling Axis Boxes Controlling Multiple Plots Overlaying Figures High Level Functions That Can Act as Low Level Functions Overlaying Figures by Setting new TRUE Overlay Figures by Using subplot Adding Special Symbols to Plots Arrows and Line Segments Adding Stars and Other Symbols Custom Symbols Traditional Graphics Summary References 154 155 156 157 158 158 160 161 163 164 166 170 171 172 173 174 174 175 176 177 177 178 180 180 180 183 184 185 188 188 188 189 192 192 193 195 197 200 INTRODUCTION Introduction Visualizing data is a powerful data analysis tool because it allows you to easily detect interesting features or structure in the data This may lead you to immediate conclusions or guide you in building a statistical model for your data This chapter shows you how to use S PLUS to visualize your data The first section Getting Started with Simple Plots page 122 shows you how to plot vector and time series objects Once you have read this first section you will be ready to use any of the plotting options described in the section Frequently Used Plotting Options page 126 These options which can be used with many S PLUS graphics functions control most features in a plot such as plot shape multiple plot layout titles axes etc The remaining sections of this chapter cover a range of plotting tasks e Intera
191. ing all kinds of statistical analysis including hypothesis testing linear regression analysis of variance contingency tables factor analysis survival analysis and time series analysis Estimation techniques for all these branches of statistics are described in detail in the manual Guide to Statistics This section gives overviews of the functions that produce summary statistics perform hypothesis tests and fit statistical models S PLUS includes functions for calculating all the standard summary statistics for a data set together with a variety of robust and or resistant estimators of location and scale Table 2 6 gives a list of the most common functions for summary statistics Table 2 6 Common functions for summary statistics cor Correlation coefficient cummax cummin Cumulative maximum minimum product and cumprod cumsum sum diff Create sequential differences max min Maximum and minimum pmax pmin Maxima and minima of several vectors mean Arithmetic mean median 50th percentile prod Product of elements of a vector quantile Compute empirical quantiles range Returns minimum and maximum of a vector 47 CHAPTER 2 GETTING STARTED Hypothesis Testing 48 Table 2 6 Common functions for summary statistics Continued sample Random sample or permutation of a vector sum Sum elements of a vector summary Summarize an object var Variance and covariance The summary function is a generic fu
192. input and standard error streams are pipes or files See the parse help file for more details EDITOR Sets the command line editor to either emacs or vi Overridden by S_CLEDITOR or VISUAL if either contains a valid value PATH Specifies the directories which are searched when a command is issued to the UNIX shell In particular the Splus5 command should be installed in one of the listed directories S_CLEDITOR Sets the command line editor to either emacs or vi S_CLHISTFILE Sets the name of the command line editor s history file The default is HOME Splus_history S_CLHISTSIZE Specifies the maximum number of lines to put in the command line editor s history file S_CLNOHIST Suppresses writing of the command line editor s history file S_EDITOR Sets the value of options editor The specified editor is used by the fix function S_FIRST S PLUS function evaluated at start up See section Setting S_FIRST page 316 SHELL Specifies the UNIX command shell which S PLUS uses to determine the shell to use in shell escapes if S_ SHELL is not set SHOME Specifies the directory where S PLUS is installed By default this is set to the parent directory of the program executable 314 SETTING ENVIRONMENT VARIABLES Table 9 1 Variables S_PAGER Specifies which pager to use Sets the value of options pager the specified pager is used by the page help and functions S_POSTSCRIPT_PRINT_COMMAND Specifi
193. ins a character vector which briefly describes the data To access a list component specify the name of the list and the name of the component separated by a For example to display the grouping data gt heart list group Li Pitta Pata Z22222 222z More generally you can access list components by an index number enclosed in double brackets LE For example the grouping information can also be accessed by gt heart list 1 me fae Le i Gs ee eA dea Once you ve accessed a component you can specify particular values of the component in the usual way using the single bracket notation For example since the group component is a vector you can obtain the 11th and 12th elements with gt heart Tisti LIJIAR IZ AI L2 or LISTS gt heart list group 11 12 fi 2 2 If you define a list without naming the components components can be accessed only using the double bracket notation When the components are named you can use either the double bracket notation or the names convention with a separating the list name and the component name List The names of a list s components can be changed by assigning them with the Component names function Names gt names heart list lt c group total heart weight descrip gt names heart list 1 group total heart weight descrip 89 CHAPTER 4 DATA OBJECTS FACTORS AND ORDERED FACTORS 90 In data analysis many kinds of dat
194. inter to the desired option and click These option menus and the Command text entry box are described below Method e Orientation Command Resolution Determines the kind of file that is created when the Print option under the Graph menu is applied The PostScript method produces a file of PostScript graphics commands the LaserJet method produces a file of LaserJet graphics commands Determines the orientation of the graph on the paper Landscape orientation puts the x axis along the long side of the paper Portrait orientation puts the x axis along the short side of the paper Shows the command that is used to send the file of graphics commands to the printer To change this command move the pointer to this line and click The cursor changes into an I You can now type in text from the keyboard Appears only if Method is set to LaserJet Controls the resolution of the HP LaserJet plots The default settings for Method Orientation Command and Resolution are initially set using X resources The way to change these settings is explained below Printing Options Buttons Apply Click on this button to apply any changes you have made to the printing specifications Only the specifications are changed no printing is done Any changes you make last only as long 303 CHAPTER 8 WORKING WITH GRAPHICS DEVICES as the graphics window remains or until you make more changes and select Apply again
195. ion shingle For example the following creates five intervals of equal width and no overlap for the variable ethanol E gt endpoints lt seq min ethanol E max ethanol E length 6 MULTIPANEL CONDITIONING 0 0 0 0 GIVEN E lt shingle ethanol E intervals cbind endpoints 6 endpoints 1 levels GIVE min max 5350 0 6744 6744 0 8138 8138 0 9532 9932 120926 1 0926 1 2320 N E The argument intervals is a two column matrix holding the left endpoints and the right endpoints of the intervals respectively Panel Figure 7 27 T T T T 0 6 0 8 1 0 1 2 GIVEN E Plotting intervals using shingles 241 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS SCALES AND LABELS xlab ylab main and sub Arguments xlim and ylim Arguments 242 The functions presented in the section General Display Functions page 210 have arguments that specify the scales and labels of graphs These arguments are discussed in this section To produce a scatterplot of NOx against E for the gas data which were introduced in the section A Data Set gas page 204 gt xyplot NOx E data gas aspect 1 2 The labels appearing on the plot for the horizontal or x scale and the vertical or y scale are taken from the names used in the argument formula We can specify these scale labels as well as a main t
196. ions by using the optional argument pch Any printing character can be used as a plotting character The plotting character is specified as a character string so it must be enclosed in quotes For example gt plot halibut biomass pch B FREQUENTLY USED PLOTTING OPTIONS Controlling Plotting Colors You can also choose any one of a range of plotting symbols by using pch n Here you must use numeric mode for the value of pch The symbol corresponding to each of these integers is shown in figure 6 8 O WwW O 10 13 16 11 14 17 12 15 18 E H lt 0g0 e Wx O N gt O XOD Figure 6 8 Plotting symbols from the pch parameter To specify the color in which your graphics are plotted use the col parameter You can use color to distinguish between sets of overlaid data gt plot co2 gt lines smooth co2 col 2 The colors available are determined by the device s color map The default color map for graphsheet has sixteen colors fifteen foreground and one background color To see all the colors in the default color map use the following expression gt pie rep 1 15 col 1 15 This expression plots a pie chart with 15 colors on the background color color 0 for a total of 16 colors You specify the color map for the graphsheet device using the Color Schemes dialog box which lists the default color map or scheme together with several other predefined schemes and any color schemes you define
197. is interpreted by S PLUS as meaning take the integers from 1 to 5 and then subtract one from each integer Hence the output is of length 5 instead of length 4 and starts at 0 instead of 1 as follows PLus LANGUAGE BASICS Optional Arguments to Functions gt Ae 1 0 1234 When using S PLUS keep in mind the effect of parentheses and of the default operator hierarchy One powerful feature of S PLUS functions is considerable flexibility through the use of optional arguments At the same time simplicity is maintained because sensible defaults for optional arguments have been built in and the number of required arguments is kept to a minimum You can determine which arguments are required and which are optional by looking in the help file in the REQUIRED ARGUMENTS and the OPTIONAL ARGUMENTS sections For example to produce 50 random normal numbers with mean 0 and standard deviation 1 use the following gt rnorm 50 If you want to produce 50 normal random numbers with mean 3 and standard deviation 5 you can use any of the following rnorm 50 3 5 rnorm 50 sd 5 mean 3 rnorm 50 m 3 s 5 rnorm m 3 s 5 50 VVV MV In the first expression you are supplying the optional arguments by value When supplying optional arguments by value you must supply all the arguments in the order they are given in the help file USAGE statement In the second through fourth expressions above you are supplying the
198. is not standard S PLUS syntax and the expressions described are not standard S PLUS expressions Do not use the syntax described in this section for any purpose other than passing a filter argument to importData or exportData Variable Expressions You can specify a single variable or an expression involving several variables All of the usual arithmetic operators are available for use in variable expressions Relational Operators The following relational operators are available Operator equals not equal 59 CHAPTER 3 IMPORTING AND EXPORTING DATA 60 Operator lt less than gt greater than lt less than or equal gt greater than or equal amp and or not Examples Examples of selection conditions given by filter expressions are sex 1 amp age lt 50 income benefits famsize lt 4500 incomel gt 20000 income2 gt 20000 incomel gt 20000 amp income2 gt 20000 dept auto loan Note that strings used in case selection expressions must be enclosed in single quotes if they contain embedded blanks Wildcards or are available to select subgroups of string variables For example account 22 mid m 3 The first statement will select any accounts that have 2 s as the 5th and 6th characters in the string while the second statement will select strings of any length that begin with 3 The comma o
199. itle at the top and a subtitle at the bottom using the following code gt xyplot NOx E data gas aspect 1 2 xlab Equivalence Ratio ylab Oxides of Nitrogen main Air Pollution sub Single Cylinder Engine Each of these four label arguments can also be a list The first component of the list is a new character string for the text of the label The other components specify the size font and color of the text The component cex specifies the size font a positive integer specifies the font and col a positive integer specifies the color The following code changes the sizes of the title and subtitle gt xyplot NOx E data gas aspect 1 2 xlab Equivalence Ratio ylab Oxides of Nitrogen main list Air Pollution cex 2 sub list Single Cylinder Engine cex 1 25 In Trellis the upper value of the scale line for a numeric variable is the maximum of the data to be plotted plus 4 of the range of the data Similarly the lower value of the scale line for a numeric variable is the minimum of the data to be plotted minus 4 of the range of the data The 4 helps prevent the data values from running into the edge of the plot We can alter the extremes of the horizontal scale line by the argument x1im a vector of two values The first value replaces the minimum of the data in the above procedure and the second value replaces the maximum Similarly we can alter the vertical scale by the ylim argument In plots crea
200. ize etd 0 5 0 omd layout numeric outer margin size coe G5 01 omi layout numeric outer margin size c0 050 PLOT AREA pin layout numeric plot area e 3 5 4 pit layout numeric plot area Ct 05 852 1658 pty layout character plot type ee uin information numeric inches per usr unit Cleat oe DS usr layout numeric limits in plot area CLIO OP gees xlim high level numeric limits in plot area C1358 ylim high level numeric limits in plot area c 3 8 199 CHAPTER 6 TRADITIONAL GRAPHICS Table 6 10 Summary of the most useful graphics parameters Name Type Mode Description Example MISCELLANEOUS col general integer color 2 err general integer print warnings a new layout logical is figure blank TRUE References Chernoff H 1973 The Use of Faces to Represent Points in k Dimensional 200 Space Graphically Journal of American Statistical Association 68 361 368 Cleveland W S 1985 The Elements of Graphing Data Monterey California Wadsworth Martin R D Yohai V J and Zamar R H 1989 Min max bias robust regression Annals of Statistics 17 1608 30 Silverman B W 1986 Density Estimation for Statistics and Data Analysis London Chapman and Hall TRADITIONAL TRELLIS GRAPHICS A Roadmap of Trellis Graphics Giving Data to General Display Functions A Data Set gas formula Argument subset Argument Data Frames Aspect Ratio General Display Functions A Data Set fuel frame A Dat
201. kes when using S PLUS You will not break anything by making a mistake Usually you get some sort of error message after which you can try again Here are two examples of mistakes made by typing improper expressions ogee deel Problem Syntax error illegal literal 1 on input line 1 gt 5 2 4 Problem Invalid object supplied as function Here we typed something that S PLUS tried to interpret as a function because of the parentheses However there is no function named 5 11 CHAPTER 2 GETTING STARTED COMMAND LINE EDITING 12 Included with S PLUS is a command line editor that can help improve your productivity by enabling you to recall and edit previously issued S PLUS commands The editor can do either emacs or vi style editing The command line editor uses the first valid value in the following list of environment variables S_CLEDITOR VISUAL EDITOR To be valid the value for the environment variable must end in vi or emacs If none of the listed variables has a valid value the command line editor defaults to vi style For example from the C shell you issue the following command to set your S_CLEDITOR to emacs setenv S CLEDITOR emacs To use the command line editor within S PLUS start S PLUS with the following command Splus e Table 2 1 summarizes the most useful editing commands for both modes of the command line editor Table 2 1 Command line editing in S PLUS
202. l or character For example the vectors described above have length 4 8 and 2 and class numeric logical and character respectively S PLUS assigns the class of a vector containing different kinds of values so as to preserve the maximum amount of information character strings contain the most information numbers somewhat less logical values still less S PLUS coerces less informative values to equivalent values of the more informative type gt ely TRUE FALSE 1 17 1 0 gt CCIF TRUE hel lo i wT TRUE hello PLus LANGUAGE BASICS Data Object Object names must begin with a letter and may include any combinations of Names upper and lower case letters numbers and periods For example the following are all valid object names mydata data ozone RandomNumbers lottery ohio 1 28 90 Warning If you create S PLUS data objects on a file system with more restrictive naming conventions than those your version of S PLUS was compiled for you may lose data if you violate the restrictive naming conventions in naming your S PLUS objects For example if you are running S PLUS on a machine allowing 255 character names and create S PLUS objects on a machine restricting file names to 14 characters object names greater than 14 characters will be truncated to the 14 character limit If two objects share the initial 14 characters the latest object will overwrite the earlier object S PLUS warns you whene
203. l allows us to bank to 45 degrees based on the loess curves and to take the curves into account in computing the ranges of the scales gt xyplot NOx E C data ethanol prepanel function x y prepanel loess x y span 1 2 degree 2 layout c 1 6 panel function x y MORE ON ASPECT RATIO AND SCALES PREPANEL FUNCTIONS More on Multipanel Conditioning panel xyplot x y panel loess x y span 1 2 degree 2 The prepanel argument takes a function and does panel by panel computations just like the argument panel but these computations are carried out before the scales and aspect ratio are determined and so can be used in their determination The returned value of a prepanel function is a list with prescribed component names These names are shown in the prepanel function prepanel loess gt prepanel loess FUNCETONCX Y ss xlim lt range x ylim lt range y out lt loess smooth x y x lt outix y lt out y list xlim range x xlim ylim range y ylim dx diff x dy diff y The component values x1im and y1im determine ranges for the scales just as they do when they are given as arguments of a general display function The values of dx and dy are the horizontal and vertical changes of the line segments that are to be banked to 45 degrees The function prepanel loess computes the smooths for all panels computes values of x1im and ylim that ensure the curve will be included in the ranges o
204. l conditioning on the barley data Packet Order and Panel Order In the above formula the conditioning variable year appeared first and site appeared second This gives an explicit ordering to the conditioning variables Each of these variables is a factor with levels gt levels barley year 1932 1 Ig31 231 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 232 gt levels barley site 1 Grand Rapids Duluth University Farm 4 Morris Crookston Waseca The levels of each factor are ordered by their order of appearance in the levels attribute As we will discuss shortly we can control the order by making the factor an ordered factor A packet is information sent to a panel for display For figure 7 22 each packet includes the values of variety and yield for a particular combination of year and site Packets are ordered by the orderings of the conditioning variables and their levels the levels of the first conditioning variable vary the fastest the levels of the second conditioning variable vary the next fastest and so forth For figure 7 22 the order of the packets is 1932 Grand Rapids 1931 Grand Rapids 1932 Duluth 1931 Duluth 1932 University Farm 1931 University Farm 1932 Morris 1931 Morris 1932 Crookston 1931 Crookston 1932 Waseca 1931 Waseca The panels of a multipanel display are also ordered The bottom left panel is panel one From there we move fastest through the columns next fas
205. ll as from relational databases and ASCII files e read table reads in data from an external file e data frame binds together S PLUS objects of various kinds e as data frame coerces objects of a particular type to objects of class data frame You can also combine existing data frames in several ways using the cbind rbind and merge functions The importData function is described in detail in Chapter 3 Importing and Exporting Data The read table function reads data stored in a text file in table format directly into S PLUS The as data frame function is primarily a support function for the top level data frame function it provides a mechanism for defining how new variable classes should be included in newly constructed data frames This mechanism is discussed further in section Adding New Classes of Variables to Data Frames page 116 For most purposes when you want to create or modify data frames within S PLUS you use the data frame function or one of the combining functions cbind rbind or merge This section focuses specifically on the data frame function for combining S PLUS objects into data frames The following section discusses the functions for combining existing data frames The data frame function is used for creating data frames from existing S PLUS data objects rather than from data in an external text file The only required argument to data frame is one or more data objects All of the objects must produce co
206. lor Values Table 8 1 gives some examples of available colors in the rgt txt file Table 8 1 Some available colors in rgb txt violet blue green yellow orange red black white ghost white peach puff lavender blush lemon chiffon lawn green chartreuse olive drab lime green magenta medium orchid blue violet purple You can also specify a color by using a hexadecimal value from the Red Green and Blue RGB Color Model A hexadecimal value is made up of hexadecimal digits A hexadecimal digit can take on any of the values 0 1 2 3 4 5 6 7 8 9 A B C D E F listed from smallest to largest Most color displays are based on the RGB Color Model Each pixel on the screen is made up of three phosphors one red one green and one blue Varying the intensities of each of these phosphors varies the color that you see on your display You can specify the intensities of each of the three phosphors with a hexadecimal triad The first part of the triad corresponds to the intensity of the red phosphor the second to the intensity of the green phosphor and the third to the intensity of the blue phosphor A hexadecimal triad must begin with the symbol For example the hexadecimal triad 000 corresponds to no intensity in any of the phosphors and yields the color black while the triad FFF corresponds to maximum intensity in all of the phosphors and yields white A hexadecimal triad with only one digit per phosphor allows for 4 0
207. lot factor numeric stripplot factor numeric qq factor numeric Graph Measurements with Labels dotplot character numeric barchart character numeric piechart character numeric Graph the Sample Distribution of One Set of Data qqmath numeric histogram numeric densityplot numeric Graph Multivariate Data splom data frame parallel data frame Graph a Function of Two Variables Evaluated on a Grid contourplot numericl numeric2 numeric3 levelplot numericl numeric2 numeric3 wireframe numericl numeric2 numeric3 Graph Three Numerical Variables cloud numericl numeric2 numeric3 227 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS ARRANGING SEVERAL GRAPHS ON ONE PAGE print 228 Several graphs made separately by Trellis display functions can be displayed on a single page There is one restriction None of the individual graphs may be a multipanel conditioning display with more than one page Figure 7 21 shows two graphs arranged on one page gt attach fuel frame gt box plot lt bwplot Type Mileage gt scatter plot lt xyplot Mileage Weight gt detach gt print box plot position c 0 0 1 4 more T gt print scatter plot positronmc 0 35 1 1 The argument position specifies the position of each graph on the page using a page coordinate system in which the lower left corner of the page is 0 0 and the upper right corner is 1 1 The graph rectangle is the portion of the page allocat
208. lots Bar plots too introduce perceptual ambiguities particularly in the divided bar chart For these reasons we recommend the dot chart 146 VISUALIZING THE DISTRIBUTION OF YOUR DATA VISUALIZING THE DISTRIBUTION OF YOUR DATA For any data set you need to analyze you should try to get a visual picture of the shape of its distribution The distribution shape is readily visualized from such familiar plots as boxplots histograms and density plots Less familiar but equally useful are quantile quantile plots qqplots In this section we show you how to use S PLUS functions to make these kinds of plots Boxplots A boxplot is a simple graphical representation showing the center and spread of a distribution along with a display of unusually deviant data points called outliers To create a boxplot in S PLUS you use the boxp1ot function gt boxplot corn rain Figure 6 14 Boxplot from corn rain data The horizontal line in the interior of the box is located at the median of the data This estimates the center of the distribution for the data The height of the box is equal to the interquartile distance or IQD which is the difference 147 CHAPTER 6 TRADITIONAL GRAPHICS Histograms 148 between the third quartile of the data and the first quartile The IQD indicates the spread or width of the distribution for the data The whiskers the dotted lines extending from the top and bo
209. low Title bar Contains the window menu button the title S PLUS the minimize button and the maximize button e Menu Bar Contains three menu titles Graph Options and Help The Help menu title produces a pop up window rather than a menu when you select it Pane Area where S PLUS displays any graphs that you create while the motif graphics device is active Footer Area where S PLUS puts status or error messages concerning the graph you have created e Resize Borders Used to change the size of the window 292 GRAPHICS WINDOW DETAILS Now type the rain vs yield example shown in the section An Example page 290 S PLUS Graph Options Help Figure 8 2 The motif window The Help Menu The Help menu title appears at the far right side of the menu bar Move the pointer to this menu title and click to call up a help pop up window This help window contains a condensed version of the motif help file Click on the Close button in this pop up window to make this window disappear once you have finished with it 293 CHAPTER 8 WORKING WITH GRAPHICS DEVICES The Graph Menu The first menu title in the menu bar of t
210. lumn names for example A AB can be used to specify the starting and ending columns If your Lotus type worksheet contains numeric data only in a rectangular block starting in the first row and column of the worksheet then all you need to specify is the file name and file type If a row contains names specify the number of that row in the colNameRow argument it does not have to be the first row You can select a rectangular subset of your worksheet by specifying starting and ending columns and rows Lotus style column names for example A AB can be used to specify the starting and ending columns The row specified as the starting row is always read first to find out the data types of the columns Therefore there cannot be any blank cells in this row In other rows blank cells are filled in with missing values S PLUS imports dBase and dBase compatible files The file name and file type are often the only things you need specify for dBase type files Column names and data types are obtained from the dBase file However you can select a rectangular subset of your data by specifying starting and ending columns and rows The importData function supports importing data from Informix Oracle and Sybase databases The importData function makes S PLUS a client that connects to the databases The database must be properly configured for network client access and appropriate environment variables must be set for the import to work NOTES
211. lumns of the same length Vectors must have the same number of observations as the number of rows of the data frame matrices must have the same number of rows as the data frame and lists must have components that match in lengths for vectors or rows for matrices If the objects don t match appropriately you get an error message saying the arguments imply differing number of rows For example suppose we have vectors of various modes each having length 20 along with a matrix 99 CHAPTER 5 DATA FRAMES with two columns and 20 rows and a data frame with 20 observations for each of three variables We can combine these into a data frame as follows 100 gt my logical lt sample c T F size 20 replace T gt my complex lt rnorm 20 runif 20 1i gt my numeric lt rnorm 20 gt my matrix lt matrix rnorm 40 ncol 2 gt my df lt kyphosis 1 20 1 3 gt my df2 lt data frame my logical my complex my numeric my matrix my df gt my df2 my logical my complex my numeric 1 FALSE 1 8831606111 0 501943978i1 1 09345678 re FALSE 0 3368386818 0 858758209i1 0 09873739 3 TRUE 0 0003541437 0 381377962i1 0 91776485 4 FALSE 1 2066770747 0 006793533i1 1 76152800 5 FALSE 0 0204049459 0 158040394i 0 30370197 6 FALSE 1 0119328923 0 8603261291 0 52486689 7 FALSE 0 9163081264 0 474985190i1 1 46745534 8 FALSE 1 3829848791 0 932033515i 0 45363152 9 FALSE 0 4695526978 0 795743512i1 0 40777969 10 TRUE 0 803589
212. ly that the data frame gas is being used this can be helpful for understanding at some future point how the graph was produced Suppose you want to redo figure 7 1 and omit the observations for which E is 1 1 or greater You could do this by gt xyplot NOx LE lt 1 1 ELE lt 1 1 data gas But it is a nuisance to repeat the logical subsetting E lt 1 1 and the nuisance would be much greater if there were many variables in the formula instead of just two It is typically easier to use the argument subset instead gt xyplot NOx E data gas subset E lt 1 1 The result is shown in figure 7 2 The argument subset can be a logical or numerical vector GIVING DATA TO GENERAL DISPLAY FUNCTIONS Data Frames O a 0 5 O O O ie ie 4 R L ie g xX O O 35 L ie G O 205 o 0 el T T T T T 0 7 0 8 0 9 1 0 1 1 E Figure 7 2 Using the subset argument on the gas data You can keep variables as vectors and draw Trellis displays without using data frames Still data frames are very convenient But data sets are often stored at least initially in data structures other than data frames so we need ways to go from data structures of various types to data frames Functions to do this are discussed in the section Data Structures page 259 207 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS ASPECT RATIO aspect Argument 208 The aspect ratio of a graph the height of a panel data region
213. ly with the argument method postscript S PRINTGRAPH METHOD determines the default value for the method argument to printgraph and specifies the type of printer for which printgraph produces output Environment variables cannot be set from within S PLUS if you want to change an environment variable quit S PLUs reset the environment variable then restart S PLUS Within your S PLUS session you can control the default printing behavior by using ps options We recommend that you use ps options instead of environment variables whenever possible The options that can be controlled through ps options are described in the section Setting PostScript Options page 279 To call printgraph to print an immediate hard copy of the current graphic use the following call gt printgraph You can override the default method command and orientation with arguments to printgraph gt printgraph horizontal F method postscript conmand Tpr h You can start the postscript device directly very simply as follows gt postscript By default this writes PostScript output to a temporary file using the template specified in ps options When the device is shut down the output is printed with the command specified in ps options PRINTING YOUR GRAPHICS You can specify many options as arguments to postscript most of these are global PostScript printing options that are also used by the Print option of the windowing graphics devi
214. lysis Cluster analysis In a formula you specify the response variable first followed by a tilde and the terms to be included in the model Variables in formulas can be any STATISTICS expression that evaluates to a numeric vector a factor or ordered factor or a matrix Table 2 9 gives a summary of the formula syntax Table 2 9 Summary of the S PLUS formula syntax Expression Meaning A B A is modeled as B B C Include both B and C in the model B C Include all of B except what is in C in the model B C The interaction between B and C B C Include B C and their interaction in the model C in B C is nested within B B C Include Band C in B in the model The following sample S PLUS session illustrates some steps to fit a regression model to the fuel frame data containing five variables for 60 cars We do not show the output type these commands at your S PLUS prompt and you ll get a good feel for doing data analysis with the S PLUs language gt names fuel frame gt par mfrow c 3 2 gt plot fuel frame gt pairs fuel frame gt attach fuel frame gt par mfrow c 2 1 gt scatter smooth Mileage Weight gt scatter smooth Fuel Weight gt Im fitl lt Im Fuel Weight gt Im fitl gt names 1m fitl gt summary 1m fit1 gt qqnorm residuals 1m fitl gt plot m influence 1Im fitl hat type h xlab Case Number ylab Hat Matrix Diagonal 51 C
215. ment you must allow your qqplot function to pass the required argument For example you create qqchi sq as follows gt qqchisq lt function x df plot qchisq ppoints x df sort x QQplots for When you want to check whether two sets of data have the same distribution Comparing Two use the function qqplot If the two data sets have the same number of Sets of Data observations qqp1ot plots the ordered data values of one data set versus the ordered data values of the other data set If the two data sets have different numbers of observations then the ordered data values for one data set are plotted against interpolates of the ordered data values of the other data set For example to compare the distributions of the two New Jersey lottery data sets lottery payoff and lottery3 payoff use the following expression gt qqplot lottery payoff lottery3 payoff 153 CHAPTER 6 TRADITIONAL GRAPHICS VISUALIZING HIGHER DIMENSIONAL DATA Multivariate Data Plots Scatterplot Matrices 154 For data with three or more variables many methods of graphical visualization have been developed Some of these are highly interactive and take full advantage of the power of personal computers The following sections describe how to use S PLUS functions in analyzing multi dimensional data This section describes several methods for static data visualization that are widely considered useful scatterplot matrices matplots star pl
216. middle of a character string The most important feature of this function is that it uses pty s so that the figure will be drawn to proper scale when used with draw symbol The draw symbol function takes some locations and a symbol given in the form of a list with x and y components gt draw symbol lt function x y sym size 1 fill F uin lt par uin inches per user unit sym x lt sym x uin 1 size sym y lt sym y uin 2 size if Cf for i in 1 length x lines x Lil sym x yLil sym y else for i in 1 length x polygon xLil sym x y i tsym y The uin graphics parameter is used to scale the symbol into user units The make symbol and draw symbol functions are examples of how to create your own graphics functions using the built in graphics functions and graphics parameters TRADITIONAL GRAPHICS SUMMARY TRADITIONAL GRAPHICS SUMMARY Table 6 10 Summary of the most useful graphics parameters Name Type Mode Description Example MULTIPLE FIGURES fig layout numeric figure location UTEE D fin layout numeric figure size cl3 5 4 fty layout character figure type ae mfg layout integer location in figure array coat od ears mfcol layout integer figure array size CUS mfrow layout integer figure array size clZ 4 TEXT adj general numeric text justification 5 cex general numeric height of font 1 5 CPL general numeric character rotation 90 csi general numeric height
217. more powerful way to execute UNIX commands because it allows you to capture and manipulate output produced by UNIX within an S PLUS session IMPORTING AND EDITING DATA IMPORTING AND EDITING DATA Reading a Data File Entering Data From Your Keyboard There are many kinds and sizes of data sets that you may want to work on in S PLUS The first step is to get your data into S PLUS in appropriate data object form In this section we show you how to import data sets that exist as files and how to enter small data sets from your keyboard The data you are interested in may have been created in S PLUS but more likely it came to you in some other form perhaps as an ASCII file or perhaps from someone else s work in another software package such as SAS You can read data from a variety of sources using the S PLUS function importData For example suppose you have a SAS file named Exenvirn ssd01 To import that file using the importData function you must supply the file s name as that function s file argument gt Exenvirn lt import data file Exenvirn ssd01 After S PLUS reads the data file it assigns the data to the Exenvirn data frame To get a small data set into S PLUS create an S PLUS data object using the function scan with no argument mydata lt scan where mydata is any legal data object name S PLUS prompts you for input as described in the following example We enter 14 data values and assign them
218. n EPS files have the following first line PS Adobe 3 0 Warning S PLUS supports the Encapsulated PostScript file format EPSF It does not support the Encapsulated PostScript Interchange format EPSI EPS files created by S PLUS do not include a preview image so if you import an S PLUS graphic into WYSIWYG software such as FrameMaker or Word you will see only a gray rectangle or a box where the graphic is included You can use printgraph to produce separate files for each graphic you produce as soon as you ve finished composing it on a windowing graphics device or terminal emulator that supports printgraph You can specify the file name and orientation of the graphics file For example you can create the PostScript file mystuff ps containing a plot of the dataset corn rain as follows gt motif gt plot corn rain gt title My Plot of Corn Rain Data gt printgraph file mystuff eps You can produce EPS files with direct calls to postscript by setting onefile FALSE To create a single file with a name you specify call postscript with the file argument and onefile F 277 CHAPTER 8 WORKING WITH GRAPHICS DEVICES gt postscript file mystuff eps onefile F print F gt plot corn rain gt dev off Warning 278 If you supply the fi 1e argument and set onefile F in the same call to postscript you must turn off the device with dev off after completing the first plot
219. n the horizontal axis The series co2 has an obvious seasonal cycle and an increasing trend It is often useful to smooth such data and display the smoothed version in the same plot The function smooth produces a smoothed version of an S PLUS data object You can use smooth as an argument to lines to add a plot of the smoothed version of co2 to the existing plot gt lines smooth co2 If your original plot was created with matplot you can add new data with functions analogous to points and lines To add data to a plot created with matplot use matpoints or matlines See the corresponding help files for further details r anes g EAEE m 8 cola anpes greet eee se o PESEE 4 p nites o P 0 S o ETR PEFR Q Anit oO 4 e Giogo e oO a PIE ER RIRKA E ETE o 3 PET USE a a 4 aS pd esy e ro oprise gts mpegs s 2 1960 1965 1970 1975 1980 1985 1990 Time Figure 6 9 The co2 data 139 CHAPTER 6 TRADITIONAL GRAPHICS Adding Text to Suppose you want to add some text to an existing plot For example consider Your Plot Connecting Text and Data Points with Straight Lines Adding Legends 140 the automobile mileage data plot in figure 6 5 To add the text Outliers near the three outlying data points in the upper right hand corner of the plot use the text function To use text you specify the x and y coordinates the same coordinate system used by the plot itself at which you want the text to
220. nction providing appropriate summaries for different types of data For example for an object of class 1m created by fitting a linear model the returned summary includes the table of estimated coefficients their standard errors and t values along with other information The summary for a standard vector is a six number summary of the minimum maximum mean median and first and third quartiles gt summary stack loss Min lst Qu Median Mean 3rd Qu Max 7 14 15 17 52 19 42 S PLUS contains a number of functions for doing classical hypothesis testing as shown in Table 2 7 Table 2 7 S PLUS functions for hypothesis testing Test Description t test Student s one or two sample t test wilcox test Wilcoxon rank sum and signed rank sum tests chisq test Pearson s chi square test for 2D contingency table var test F test to compare two variances kruskal test Kruskal Wallis rank sum test STATISTICS Table 2 7 S PLUS functions for hypothesis testing Continued Test Description fisher test Fisher s exact test for 2D contingency table binom test Exact binomial test friedman test Friedman rank sum test mcnemar test McNemar s chi square test prop test Proportions test cor test Test for zero correlation mantelhaen test Mantel Haenszel chi square test The following example illustrates how to use t test to perform a two sample t test to detect a difference in means This example uses two
221. nd FALSE The default value is the opposite of the value of auto e auto Determines whether the device can automatically advance the paper Possible values are TRUE and FALSE The default value is FALSE 283 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 284 e color Determines the degree of color plotting support provided by the device See the help file for details e speed Determines maximum allowed axis pen velocity See the help file for details e rotated Determines whether the x axis lies along the long side of the paper landscape mode or the short side of the paper portrait mode Possible values are TRUE portrait mode and FALSE landscape mode The default value is FALSE e file Determines the name of the file that the HP GL commands are stored in By default the commands are sent to your terminal e hw control Determines whether hardware control escape sequences are to be included These escape sequences may be unnecessary depending on how the output is to be used For example if the output will be imported into another software package it may help to set hw control to FALSE The default is TRUE To use the hpg1 graphics device follow these steps 1 Type the hpg command along with any arguments you want to specify For example use the file argument to send your graphics output to a file 2 Type your S PLUS graphics commands For example the following commands start the hpg graphics device wit
222. nd labels in one color and the data plotted in another You could do this easily as follows gt plot x y type n gt points x y col 3 131 CHAPTER 6 TRADITIONAL GRAPHICS Figure 6 6 shows the different plot types for the built in data set car miles plotted with the plot function gt plot car miles gt plot car miles type 1 gt plot car miles type b gt plot car miles type 0 gt plot car miles type h gt plot car miles type s Bay ware P So gg Paf Ca H oe oP a gat e gar oa g oF amp o G op e 2 a 2 T so Med H 8 Ys de Figure 6 6 Plot types for the function plot Top row page 132 points and lines second row both points and lines and lines with points overstruck third row page 133 high density plot and stairstep plot 132 FREQUENTLY USED PLOTTING OPTIONS a A a 7 i e a ade da ifs i i A Figure 6 6 Plot types for the function plot Top row page 132 points and lines second row both points and lines and lines with points overstruck third row page 133 high density plot and stairstep plot Line Types When your plot type involves lines you can choose the ine type for the lines By default the line type for the first line on a graph is a solid line If you prefer a different line type you can use the argument 1ty n where 7 is an integer to speci
223. ndication of the variety of expressions you will be using in S PLUS gt os runrtdo 1 1 6006757 2 2312820 0 8554818 2 4478138 2 3561580 6 1 1359854 2 4615688 1 0220507 2 8043721 2 5683608 x orez ipl Li amp 32 gt c 2 runif s 10 20 1 0 6010921 0 3322045 1 0886723 0 3510106 5 0 9838003 10 0000000 20 0000000 gt Set e x 5221 1 41 14 The last two examples above illustrate a general feature of S PLUS functions arguments to functions can themselves be S PLUS expressions Here are three examples of expressions which are important because they show how arithmetic works in S PLUS when you use expressions involving PLus LANGUAGE BASICS both vectors and numbers If x consists of the numbers 4 3 2 1 then the following operations work on each element of x gt xl lJ s2i a ea 1 6420 oe 1 16941 Any time you use an operator with a vector as one argument and a number as the other argument the operation is performed on each component of the vector Hint If you are familiar with the APL programming language this treatment of vectors will be familiar to you Precedence The evaluation of S PLUS expressions has a precedence hierarchy shown below Hierarchy in Table 2 3 Operators appearing higher in the table have higher precedence than those appearing lower operators on the same line have equal precedence Table 2 3 Precedence of operators Operator Use
224. neously in a single call to par 2 Supply a list to the par function The names of the list components are the names of the graphics parameters you want to set For example gt my list lt list mfrow c 2 1 cex 5 gt par my list When you change graphics parameters with par it returns a list containing the original values of the graphics parameters that you changed This list will not print out on your screen you must assign the result of calling par to a variable name if you want to see it gt par orig lt par mrrow c 2 1 cex 5 gt par arig mfrow 1 i 1 cex 1 f You can use this list returned by par to restore parameters after you have changed them gt par orig lt partmrrow c 2 1 cex 9 gt Now make some plots gt par par orig 166 SETTING AND VIEWING GRAPHICS PARAMETERS When setting multiple parameters with par check for possible interactions between parameters Such interactions are indicated in Table 6 3 and in the par help file In a single call to par general graphics parameters are set first then layout graphics parameters If a layout graphics parameter affects the value of a general graphics parameter what you specify for the general graphics parameter may get overridden For example changing mfrow automatically resets cex see the section Controlling Multiple Plots page 185 If you type gt par mfrow c 1 cex 75 Table 6 3 Interaction between gra
225. ner e 1 1 lines list Rows superpose line 1 6 Size cl3 3 0 0 0 0 text istie Span 0 5 Span 1 0 rep 4 points Rows superpose symbol 1 6 text list levels fuel frame Type DATA STRUCTURES DATA STRUCTURES Trellis Graphics uses the S PLUS formula language to specify the data for plotting This requires the data to be stored in data sets that work with formulas Roughly speaking this means that the data variables must either be from a data frame or be vectors of the same length this is also true of the S PLUS modeling functions such as 1m But in S PLUS there are many other data structures So that Trellis functions will be easy to use three functions convert data structures of different kinds into data frames make groups as data frame array and as data frame ts make groups The function make groups takes several vectors and constructs a data frame with two components data and which For example consider payoffs of the New Jersey Pick It lottery from three time periods The data are stored as three vectors of values Suppose we want to make boxplots to compare the three distributions We first convert the three vectors to a data frame gt lottery lt make groups lottery payoff lottery2 payoff lottery3 payoff gt names lottery 1 data which gt levels lottery which 1 lottery payoff lottery2 payoff lottery3 payoff The data component is simply the combined numbers from all
226. nica The matrix pet width contains 50 observations of petal widths for each of the same three species To graphically explore the relationship between petal lengths and petal widths use matplot to display widths versus lengths simultaneously on a single plot gt matplot pet length pet width 3 33 3 3 3333 3 3 3 3 3 3 3 3333 3 3 3333 3 3 33 3 3 23 3 33 3 3 3 3 2 2 g 2 3 2 222 233 2 2 222 3 2 2222222 22 2 2 2 2 22 22 2 e 1 1 pe ee A 11101 1 411111 1 1 11 1 2 3 4 5 6 7 Figure 6 19 Simultaneous plots of petal heights versus widths for three species of iris 155 CHAPTER 6 TRADITIONAL GRAPHICS Star Plots 156 If the matrices x and y you are plotting with matplot do not have the same number of columns then the columns of the smaller matrix are cycled so that every colulmn in the larger matrix is plotted Thus if x is a vector i e a matrix with a single column then matplot x y plots every column of the matrix y against the vector x A star plot represents multivariate data as a set of stars with each star representing one case or row and each point or radial of a star representing a particular variable or column The length of each radial is proportional to the data value of the corresponding variable Thus both the size and the shape of the stars have meaning size reflects the overall magnitude of the data and shape reveals the relationships between variables Comparing two star
227. ning Figure 7 17 is a contour plot of the gaussian surface gt contourplot dataz datax datay data gauss aspect 1 at seqt l e9 by 2 5 The argument at specifies the values at which the contours are to be computed and drawn If no argument is specified default values are chosen 0 5 a 0 0 aa O 7 datay ols 0 5 0 3 1 5 1 0 0 5 0 0 0 5 1 0 1 5 Figure 7 17 Contour plot 223 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS levelplot 224 Level plots are also helpful displays for studying a function f x y They are no better than contour plots when the function is simple but often are better when there is much fine detail for example many peaks and valleys Figure 7 18 is a level plot of the gauss surface gt levelplot dataz datax datay data gauss aspect 1 cuts 6 The values of the surface are encoded by color a gray scale in this case For devices with full color the scale goes from pure magenta to white and then to pure cyan If the device does not have full color a gray scale is used For a level plot the range of the function values is divided into intervals and each interval is assigned a color A rectangle centered on each grid point is given the color of the interval containing the value of the function at the grid point In figure 7 18 there are six intervals The argument cuts specifies the number of breakpoints between intervals datay datax Fi
228. nt of the lowest interval is the minimum 239 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS shingle 240 of the data and the right endpoint of the highest interval is the maximum of the data The endpoints are chosen to make the counts of points in the intervals as nearly equal as possible and the fractions of points shared by successive intervals as close to the target fraction as possible The command that produced figure 7 26 is gt xyplot NOx C GIVEN E data ethanol aspect 2 5 The aspect ratio was chosen to be 2 5 to approximately bank the underlying pattern of the points to 45 degrees Notice that the automatic layout algorithm chose five columns and two rows The result of equal count is an object of class shingle The class is named shingle because of the overlap like shingles on a roof First a shingle contains the numerical values of the variable and can be treated as an ordinary numeric variable gt range GIVEN E YT 0 535 2 232 Second a shingle has the intervals attached as an attribute There is a plot method a special Trellis function that displays the intervals Figure 7 27 shows the intervals of GIVEN E gt plot GIVEN E You can use the function levels to extract the intervals from the shingle gt levels GIVEN E min max 0 535 0 686 0 055 0 761 0 733 0 811 0 808 0 899 0 892 1 002 0 990 1 045 1 042 1 125 1 115 1 189 1 175 1 232 A shingle can be specified directly by the funct
229. nts curves or both A full discussion of xyplot is in the section General Display Functions page 210 but for now we will use it to illustrate how to specify data The plot in figure 7 1 is a scatterplot of gas NOx against gas E gt xyplot formula gas NOx gas E The argument formula specifies the variables that are to be graphed In this case they are gas NOx and gas E For xyplot the variable to the left of the goes on the vertical axis and the variable to the right of the goes on the horizontal axis The formula gas NOx gas E is read as gas NOx is graphed against gas E The use of formula here is the same as that in the S PLUS statistical modeling functions such as 1m and aov To the left or right of the you can use any S PLUS expression For example if you want to graph the log base 2 of gas NOx you can use the formula log gas NO0x base 2 gas E GIVING DATA TO GENERAL DISPLAY FUNCTIONS The argument formula is a special one in Trellis Graphics It is always the first argument of a general display function such as xyplot We can omit typing formula provided the formula is the first argument Thus the expression xyplot gas NOx gas E also produces figure 7 1 The arument formula is the only one that should be given by position all others must be given by name 0 O O O fe 54 is L ie O ie ie 4 7 o ie ie ie fs 2 37 z G o oO D fe Oo 274 o p oO Oo 1H o O T T
230. o Hi Hi Lo levels c Lo Hi gt intensity 1 Hi NA Lo Hi Hi Lo gt levels intensity 1 Le i If you had left the levels argument off the levels would have been ordered alphabetically as Hi Low Medium You use the labels argument if you want the levels to be something other than the original data FACTORS AND ORDERED FACTORS gt factor et hi Lo Med Hi Hi LO Jevels c Lo Hi labels c LowDose HighDose 1 HighDose LowDose NA HighDose HighDose LowDose Warning If you provide the levels and labels arguments then you must order them in the same way If you don t provide the levels argument but do provide the 1abels argument then you must order the labels the same way S PLUS orders the levels of the factor which is alphabetically for character strings and numerically for a numeric vector which is converted to a factor Use the exclude argument to indicate which values to exclude from the levels of the resulting factor Any value that appears in both x and exclude will be NA in the result and will not appear in the levels attribute The intensity factor could alternatively have been produced with gt factorCeC Hi ww Med Lo Hi E n 3 Le 5 exclude c Med 1 Hi NA Lo Hi Hi Lo Creating If the order of the levels of a factor is important you can represent the data as Ordered a special type of factor called an ordered factor Use the or
231. ode will produce a plot showing selected cities in New England and New England s position relative to the rest of the United States To do this subplot is called several times To create the main plot use the usa function with the arguments x1im and ylim to restrict attention to New England gt usa xlim c 72 5 65 ylim c 40 4 47 6 189 CHAPTER 6 TRADITIONAL GRAPHICS 190 The coordinates shown in the example were obtained by trial and error using as a starting point the coordinates of New York These were obtained from the three built in data sets city x city y and city name Before city x or city y can be used as an argument to a replacement function it must first be assigned locally gt CIE eK A CLCYRs CIty y lt Cliycy gt names city x lt city name gt names city y lt city name gt nyc coord lt c city x New York city y New York gt nyc coord New York New York 73 9667 40 7833 To plot the city names we first use city x and city y to determine which cities are contained in the plotted area gt ne cities lt city x gt 72 5 amp city y gt 40 4 We then use this criterion to select cities to label gt text city x ne cities city y ne cities city name ne cities For convenience in placing the subplot retrieve the usr coordinates gt usr lt pat usr Now create a subplot of the entire U S in a blank spot and save the value of this call to subplot so
232. oduce wider lines while smaller numbers produce narrower lines Some graphics devices can produce only one width Plottin Generally plotting symbols are clipped so that the symbols don t appear in Yp g sy pp y pp Symbols in the margin You can allow plotting in the margin by setting xpd to TRUE the allowable plotting area is expanded Margin 177 CHAPTER 6 TRADITIONAL GRAPHICS TEXT IN FIGURE MARGINS To add text in margins use the mtext marginal text function You specify which of the four margins with the side argument which is a number from 1 to 4 the default is 3 The line argument to mtext gives the distance in mex between the text and the plot You may specify non integer values for line in mtext For example figure 6 30 shows the placement of the following marginal text gt gt partnar c 5 5 5 5 4 1 plot x y type n axes F xlab ylabe box mtext Some text line 0 mtext Some more text side 2 cex 1 1line 2 mtext Still more text side 4 cex 5 line 3 Some text x lt eb i 2 Q g E amp O op Figure 6 30 Placing text in margins Text is not placed in the margin if there is not room for it this usually happens only when the margin sizes or cex have been reset or with long axis labels For example suppose mex 1 the default and you reset the figure margins with mar c 1 1 1 1 to allow precisely one line of text in each m
233. of font re main title character main title Y versus X srt general numeric string rotation 90 sub title character subtitle Y versus X xlab title character axis titles X in dollars ylab title character axis title Y in size 197 CHAPTER 6 TRADITIONAL GRAPHICS Table 6 10 Summary of the most useful graphics parameters Name Type Mode Description Example SYMBOLS lty general integer line type 2 lwd general numeric line width 3 pch general character plot symbol Ema A integer smo general integer curve smoothness 1 type general character plot type H xpd general logical symbols in margins TRUE AXES axes high level logical plot axes FALSE bty general integer box type 4 exp general numeric format for exponential 1 numbers lab general integer tick marks and labels c 3 7 4 las general integer label orientation 1 10g high level character logarithmic axes xy mgp general numeric axis locations el31309 tek general numeric tick mark length 1 xaxs general character style of limits TRADITIONAL GRAPHICS SUMMARY Table 6 10 Summary of the most useful graphics parameters Name Type Mode Description Example yaxs general character style of limits mge xart general character axis type ms yart general character axis type ele MARGINS mai layout numeric margin size C0 64505 556 62 mar layout numeric margin size aiie A T mex layout numeric margin units 15 oma layout numeric outer margin s
234. of times specified the value may be a vector gt rep NA 5 1 NA NA NA NA NA gt reptc T 7 FJ 2 Re ae ame in ie If times is a vector with the same length as the vector of values being repeated each value is repeated the corresponding number of times gt repic yes ne 4 2 1 yes yes yes yes no no The sequence operator generates sequences of integer values spaced one unit apart 21 5 Li 2 gt 1 224 1 1 2 gt 1 toy More generally the seq function generates sequences of numeric values with an arbitrary increment For example gt S q pi pi 5 1 3 1415927 2 6415927 2 1415927 1 6415927 1 1415927 6 0 6415927 0 1415927 0 3584073 0 8584073 1 3584073 79 CHAPTER 4 DATA OBJECTS 11 1 8584073 2 3584073 2 8584073 You can specify the length of the vector and seq computes the increment gt seq pi pi length 10 1 3 1415927 2 4434610 1 7453293 1 0471976 0 3490659 6 0 3490659 1 0471976 1 7453293 2 4434610 3 1415927 Or you can specify the beginning the increment and the length with either the length argument or the along argument gt seq 1 by 05 1length 10 GI 2 00 1205 2 10 1 15 1 20 1 25 1 20 1 35 1 40 1 45 gt seq 1 by 05 along 1 5 C1 1 00 1 05 1 10 1 15 1 20 See the help file for seq for more information on the length and along arguments To initialize a vector of a certain mode and length before you know the actual values use th
235. ollows gt ozone fit lt interp ozone xy x ozone xy y ozone median 3 D PLOTS CONTOUR PERSPECTIVE AND IMAGE PLOTS gt persp ozone fit Warning It is not a good idea to convert a persp plot to objects so many objects can result that the conversion takes a considerable time Image Plots An image plot is a two dimensional plot that represents three dimensional data as shades of color or gray scale You produce image plots with the image function gt image voice five 60 50 40 30 20 10 0 2000 4000 6000 8000 10000 12000 Figure 6 24 Image of the voice spectrogram A more conventional use of image is to produce images of topological data as in the following example gt image pugetN The data set pugetN contains elevations in and around Puget Sound It is not part of the standard S PLUS distribution 161 CHAPTER 6 TRADITIONAL GRAPHICS 162 48 8 48 4 e vt 123 0 122 8 122 6 122 4 122 2 122 0 Figure 6 25 Image plot of Puget Sound If you have an equal number of observations for each of three variables you can use interp to generate interpolated values for z on an equally spaced xy grid For example to create an image plot of the ozone data you can use interp and image as follows gt ozone fit lt interp ozone xy x ozone xy y ozone median gt image ozone fit CUSTOMIZING YOUR GRAPHICS CUSTOMIZING YOU
236. ompact Type Smal1 qqmath function qqmath Mileage data fuel frame subset Type Smal1 reorder factor function barley variety lt reorder factor barley varietry barley yield median Rows function Rows trellis par get superpose symbol 122 scales argument Xyplot NOx E data gas aspect 1 2 ylim c 0 6 scales list cex 2 tick number 4 268 SUMMARY OF TRELLIS FUNCTIONS AND ARGUMENTS Table 7 1 An alphabetical guide to Trellis Graphics Statement Purpose Example screen argument wireframe dataz datax datay data gauss drape F screen 1ist z 45 x 60 y 0 shingles function GIVEN E lt shingle ethanol E intervals cbind endpoints 6 endpoints 1 show settings function show settings skip argument bwplot age log 1 tusage income pick strip function strip default strip nanes 1 skipect FFF Fs PaF layout c 2 4 2 data market survey span argument see prepanel loess example space argument update barley plot key list points Rows trellis par get superpose symbol 1 2 text list levels barley year space right splom function splom fuel frame strip argument see Skip example stripplot function see jitter example sub argument see aspect example subscripts argument Xyplot NOx E C data ethanol aspect 1 2 panel function x y subscripts text x y subscripts cex 75 subset argument xyplot NOx E data gas subset E lt 1 1 s
237. on on the unique values of C Figure 7 25 does this 237 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 238 gt xyplot NOx E C data ethanol aspect 1 2 a4 Oo L 0 ie 34 L O 24 o O g O 14 e oF Oo O 0 96 4 o O Fa O O 3 O J A g 2 Q 4 o Qo 1 oO O C a4 L O o O O 34 A L 5 fe ie 2 24 F O o o 14 o L fs 20 a J 4 80 4 o o 3 ie Q 2 io O 4 o 2 Hi o0 o i 4 L 34 2 H oO 24 8 a H O 14 o o L Oo o O T T T T 06 08 1 0 12 E Figure 7 25 Multipanel conditioning When a numeric variable is used as a conditioning variable in the argument formula then conditioning is automatically carried out on the sorted unique values In other words the levels of the variable in such a case are the unique values The order of the levels is from smallest to largest For C the first level is 7 5 the second 9 and so forth Thus the first packet includes values of NOx and E for C 7 5 the second packet includes the values for C 9 and so on The packets fill the panels according to the packet order and the panel order In figure 7 25 the values of C which are indicated by the darkened bars in the strip labels increase from bottom to top MULTIPANEL CONDITIONING Conditioning on Intervals of a Numeric Variable equal count For the ethanol data we graphed NOx against E given C in figure 7 25 We would like to see NOx against C given
238. optional arguments by name When supplying arguments by name order is not important However we recommend that for consistency of style you supply optional arguments after required arguments The third and fourth expressions illustrate that you may abbreviate the formal argument names of optional arguments for convenience so long as the names are uniquely identified You will find that supplying arguments by name is convenient because you can then supply them in any order 31 CHAPTER 2 GETTING STARTED Access to UNIX 32 Of course you do not need to specify all of the optional arguments For instance the following are two equivalent ways to produce 50 random normal numbers with mean 0 the default and standard deviation of 5 gt rnorm 50 m 0 s 5 gt rnorm s0 s 5 One important general feature of S PLUS is easy access to and use of UNIX tools For example S PLUS provides a simple shell escape character for issuing a single UNIX command from within S PLUs gt date Mon Apr 15 17 46 25 PDT 1991 Here date is a UNIX command which passes its result to S PLUS for display as shown You can use any UNIX command in place of date Of course if you have separate UNIX windows open on your workstation screen as will often be the case you can just move into another window to issue a UNIX command read your mail etc The escape function is not the only way to execute UNIX commands There is a unix function which is a
239. ornm gt qqnorm car gals gt qqline car gals Quantiles of Standard Normal Figure 6 17 A qqnorm plot The qqline function gives the highly robust straight line fit which is not much influenced by outliers You can also make qqplots to check whether or not your data come from any of a number of other distributions To do so you need to create a simple S PLUS function for each distribution which we illustrate for the case of a hypothesized uniform distribution Create the function qqunif as follows 151 CHAPTER 6 TRADITIONAL GRAPHICS gt qqunif lt function x plot qunif ppoints x sort x The function qunif computes quantiles for the uniform distribution at probabilitiy values pi i 5 n computed by ppoints and sort orders the data x gt qqunif car gals Now you can create a qqplot for other hypothesized distributions by replacing qunif by one of the functions from Table 6 2 Table 6 2 Distributions for qqplots 152 Function Distribution Required Arguments Optional Arguments Defaults qbeta beta shapel shape2 none qcauchy Cauchy none location scale 0 1 qchisq chi square df none qexp exponential none rate 1 qf F afia none qgamma Gamma shape none qlnorm log normal none mean sd 0 1 qnorm normal none mean sd 0 1 qt Student s t df none qunif uniform none min max 0 1 VISUALIZING THE DISTRIBUTION OF YOUR DATA Note For functions requiring a parameter argu
240. ort from For Oracle this should be the empty string table The table in database to import OTHER DATA IMPORT FUNCTIONS OTHER DATA IMPORT FUNCTIONS Reading Vector and Matrix Data with scan While importData is the recommended method for reading data files into S PLUS there are several other functions that you can use to read ASCII data into S PLUS These functions are commonly used by other functions in S PLUS so it is a good idea to familiarize yourself with them The two functions discussed in this section are scan and read table The scan function which can read from either standard input or from a file is commonly used to read data from keyboard input By default scan expects numeric data separated by white space although there are options that let you specify the type of data being read and the separator When using scan to read data files it is helpful to think of each line of the data file as a record or case with individual observations as fields For example the following expression creates a matrix named x from a data file specified by the user x lt matrix scan filename ncol 10 byrow T Here the data file is assumed to have 10 columns of numeric data the matrix contains a number of observations for each of these ten variables To read in a file of character data use scan with the what argument x lt matrix scan fi7ename what ncol 10 byrow T Any character vector can be u
241. ots and Chernoff s faces A scatterplot matrix is an array of pairwise scatter plots showing the relationship between any pair of variables in a multivariate data set To produce a static scatterplot matrix in S PLUS you use the pairs function with an appropriate data object as its argument For example the following S PLUS expression generates a scatterplot matrix gt pairs longley x 250 350 450 550 150 250 350 1950 1955 1960 GNP deflator 90 110 GNP 250 400 550 i Unemployed 200 350 Armed Forces 150 250 350 Population 110 120 130 Year 1950 1960 90 100 110 200 300 400 110 120 130 Figure 6 18 A scatterplot matrix VISUALIZING HIGHER DIMENSIONAL DATA Plotting Matrix For visualizing several vector data objects at once or for visualizing some Data 2 5 1 5 0 5 kinds of multivariate data you can use the function matp1ot to plot columns of one matrix against columns of another For example S PLUS has a built in multivariate data set iris The iris data set is in the form of a data array which is a generalized matrix Let s extract two particular 50 x 3 matrices from the iris array gt pet length lt irisl 3 gt pet width lt iris 4 The matrix pet 1ength contains 50 observations the rows of petal lengths for each of three species of iris the columns Setosa Versicolor and Virgi
242. otting symbols colors line types and so forth that are automatically chosen depending on the device you select The section Panel Functions and the Trellis Settings page 249 discusses the Trellis settings The general display functions take in data just like many of the S PLUS modeling functions such as 1m aov glm and loess This means that there is a heavy reliance on data frames The Trellis library contains several functions that change data structures of certain types to a data frame which makes it easier to pass the data on to the general display functions or in fact on to the modeling functions The section Data Structures page 259 discusses these functions that create data frames 203 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS GIVING DATA TO GENERAL DISPLAY FUNCTIONS A Data Set gas formula Argument 204 For a graphics function to draw a graph it needs to know the data on which the drawing is based This section is about arguments to the Trellis drawing functions that allow you to specify the data The data frame gas contains two variables from an industrial experiment with twenty two runs in which the concentrations of oxides of nitrogen NOx in the exhaust of an engine were measured for different settings of equivalence ratio E gt names gas 1 NOx bi as gt dim gas Lid 22 2 The function xyplot makes an x y plot a graph of two numerical variables the result might be scattered poi
243. ows and number of columns Each row and column is labeled The row labels are 1 2 3 and the column labels are 1 2 3 4 This notation for row and column numbers is derived from mathematical matrix notation In the above expression the vector 1 12 fills the first column first then the second column and so on This is called filling the matrix by columns If you want to fill the matrix by rows use the optional argument byrow T to matrix For a vector of given length used to fill the matrix the number of rows determines the number of columns and vice versa Thus you need not provide both the number of rows and the number of columns as arguments to matrix It is sufficient that you provide only the number of rows or the number of columns The following command produces the same matrix as above gt matrixt 1 12 3 You can also create the same matrix by specifying the number of columns only To do this type gt matrix 1 12 nco1 4 You have to provide the optional argument ncol 4 in name value form because by default the second argument is taken to be the number of rows y lt 9 23 8 When you use the by name form i e ncol 4 as the second argument a y g you override the default See the section Optional Arguments to Functions page 31 for further information on using optional arguments in function calls The structure classes have three slots a Data slot to hold the actual
244. p or drop may be specified drop Optional Character vector of variable names specifying which variables in data are not to be exported Only one of keep or drop may be specified delimiter Optional Character to be used as delimiter Used only with type ASCII The default is format Optional A character string specifying the width and precision for each field colNames Optional Logical flag if TRUE column names are also exported rowNames Optional Logical flag if TRUE row names are exported 71 CHAPTER 3 IMPORTING AND EXPORTING DATA Table 3 2 Arguments to exportData Exporting Data to S PLUS Other Export Functions 72 Argument Required Description quote Optional Logical flag specifying whether to put quotes around character strings TRUE or FALSE Default is TRUE filter Optional Character string specifying the output filter See the section Setting the Import Filter for details When you want to export data to share with another S PLUS user use the data dump function gt data dump matz By default the data object matz is exported to the file dumpdata in your S PLUS startup directory You can specify a different output file with the connection argument to data dump gt data dump matz connection matz dmp The connection argument needn t specify a file it can specify any valid S PLUS connection object See Programming with Data for more details on connections If the
245. perator is used to list different values of the same variable name that will be used as selection criteria It allows you to bypass lengthy OR expressions when giving lists of conditional values for example state CA WA OR AZ NV caseid I 22 30774200 SETTING THE IMPORT FILTER Missing Variables You can test to see that any variable is missing by comparing it to the special internal variable NA For example income NA amp age NA 61 CHAPTER 3 IMPORTING AND EXPORTING DATA NOTES ON IMPORTING FILES Notes on Importing ASCII Delimited ASCII Files 62 When importing ASCII files you have the option of specifying column names and data types for imported columns This can be useful if you want to name columns or if you wish to skip over one or more columns when importing Format String Use the format argument to importData to specify the data types of the imported columns For each column you need to specify a sign and then the data type Dates may automatically be imported as numbers After importing you can change the column format type to a dates format Here is an example ASCII format string aSa Afs Anr AT The s denotes a string data type f denotes a float data type and the asterisk denotes a skipped column If you do not specify the data type of each column S PLUS looks at the first row of data to be read and uses the contents of this row to determine the data type of each
246. phics parameters Parameters Interaction cex mex mfrow mfcol ert Srt If mf row or mfcol specify a layout with more than two rows or columns cex and mex are set to 0 5 otherwise cex and mex are both set to 1 When srt is set crt is set to the same value unless crt appears later in the command than srt S PLUS will first set cex 75 because cex is a general graphics parameter then set mfrow c 2 1 because mfrow is a layout graphics parameter but setting mfrow c 2 1 automatically sets cex back to 1 To set both mfrow and cex you need to call par twice gt par mfrow c 2 1 gt par cex 75 You can also use the par function to view the current setting of any or all graphics parameters To view the current values of parameters give par a vector of character strings of the names of the parameters gt parc usr or gt par c mfrow cex To get a list of all of the parameters call par with no arguments gt par During an extended S PLUS session you may make repeated calls to par to change graphics parameters Sometimes you may forget what you have changed and may just want to restore the device to its original defaults It is 167 CHAPTER 6 TRADITIONAL GRAPHICS often a good idea to save the original values of the graphics parameters as soon as you start a device You can then call par to restore the device to its original state gt par orig wg lt par gt
247. playing labeled data Let us compute the mean mileage for each vehicle type gt mileage means lt tapply fuel frame Mileage fuel frame Type mean Figure 7 10 is a dot plot of the log base 2 means gt dotplot names mileage means logb mileage means base 2 aspect 1 cex 1 25 The argument cex is passed to the panel function to change the size of the dot of the dot plot more on this in the section Panel Functions page 246 Notice that the vehicle types in figure 7 10 are ordered from bottom to top by the order of the elements of the vector mileage means If you wanted the graph to show the values from smallest to largest going from bottom to top you could first redefine mi leage means gt mileage means lt sort mileage means Sporty e Small e Medium e Large Compact e T T 4 4 4 6 48 log mileage means base 2 Figure 7 10 Dot plot GENERAL DISPLAY FUNCTIONS barchart Overall dot plots are a more effective display method than bar charts avoiding some of the perceptual problems of bar charts Still there are circumstances where bar charts are harmless Figure 7 11 is a bar chart of the mileage means without logs gt barchart names mileage means mileage means aspect 1 Sporty Small Medium Large Compact T T T T 20 22 24 26 28 30 mileage means Figure 7 11 Bar chart 21
248. pts you need in using the S PLUS language expressions operators assignments data objects and function calls When using S PLUS you should think of your data sets as data objects belonging to a certain class Each class has a particular representation often defined as a named list of s ots Each slot in turn contains an object of some other class Among the most common classes are numeric String list and data frame This chapter introduces the most basic data objects see the chapter Data Objects for a more detailed treatment The simplest type of data object is a one way array of values all of which are numbers logical values or character strings but not a combination of those For example you can have an array of numbers 2 0 3 1 5 7 7 3 Or you can have an array of logical values T T F T F T F F where T stands for TRUE and F stands for FALSE Or you can have an ordered set of character strings sharp claws COLD PAWS These simple one way arrays when stored in S PLUS are called vectors The class vector is a virtual class encompassing all basic classes whose objects can be characterized as one way arrays in which any individual value can be extracted and replaced by referring to its index or position in the array The ength of a vector is the number of values in the array valid indices for a vector object x are in the range 1 length x Most vectors belong to one of the following classes numeric integer logica
249. r further information on a particular window system consult your system administrator or the following references e Quercia V and O Reilly T 1989 X Window System Users Guide Sebastopol California O Reilly and Associates e Quercia V and O Reilly T 1990 X Window System Users Guide Motif Edition Sebastopol California O Reilly and Associates In this section we refer to the window in which you start S PLUS as the S PLUS window The window that is created when you start a windowing graphics device from the S PLUS window is called the graphics window To open a graphics device type gt motif at the S PLUS prompt The motif device is also started automatically if no other graphics device is open when you ask S PLUS to evaluate a high level plotting function 289 CHAPTER 8 WORKING WITH GRAPHICS DEVICES To remove a graphics window without quitting S PLUS use the function dev off or graphics off Warning Do not destroy the S PLUS graphics window by using a window manager menu If you remove a graphics window in this way S PLUS will not know that the graphics device has been removed Thus this graphics device will still appear on the vector returned by dev 1ist but if you try to send plot commands to it you will get an error message If you do accidentally remove the graphics window with a window manager menu use the dev off function to tell S PLUS that this device is no longer active
250. r these variables by using getenv from C or S code 315 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION CUSTOMIZING YOUR SESSION AT START UP AND CLOSING Creating the First Function Setting S_FIRST 316 If you routinely set one or more options each time you start S PLUS you can store these options and have S PLUS set them automatically whenever it starts You can store the options by doing one of the following e Create an S PLUS function named First containing the desired options e Create a text file of S PLUS tasks named S init in your home directory e Set the S PLUS command line variable S_FIRST as described below When S PLUS starts up it checks whether the S_FIRST variable exists If not S PLUS runs the First function if the function exists from your data directory If S_FIRST is set S PLUS ignores the First function If S PLUS encounters any errors in your First function it starts without executing it After running the command specified in S_FIRST or executing the First function S PLUS looks for the S init file and executes any commands it finds there Here is asample First file that starts the default graphics device gt First lt function motif After creating a First function you should always test it immediately to make sure it works Otherwise S PLUS will not execute it in subsequent sessions To store a sequence of commands in the _FIRST variable use the following syntax
251. raphics device whether activated by the Print option from a motif graphics device by a call to printgraph or by a direct call to postscript is controlled by options you can set with the ps options function These options allow you to control many aspects of the PostScript output including the following e The name of the PostScript output file e The UNIX command to print your PostScript output e The orientation and size of the finished plot e Printer specific characteristics including paper size number of rasters per inch and the size of the imageable region e Plotting characteristics of the graphics including the base point size for text and available fonts and colors Specifying the PostScript File Name All PostScript output is initially written to a file Unless you explicitly call the postscript device with the onefile T argument S PLUS writes a separate PostScript file for each plot in compliance with the Encapsulated PostScript Document Structuring Conventions You can specify the file name for the output file using the file argument to postscript or printgraph or provide a template for multiple file names using the PostScript option tempfile which defaults to ps out HHHF ps You can specify this option as an argument to the printgraph postscript and ps options functions The template you specify must include some symbols as in the default S PLUS replaces the first series of these symbols that it encounters
252. raphics window s Options Printing menu that specified orientation is taken to be the default You specify the plotting region in inches with the width the x axis dimension and height y axis dimension options Thus to create graphics for inclusion in a manual you might specify the following options gt ps options horizontal F width 5 height 4 The default value for width and height are determined by the printer s imageable region as described in the next subsection Specifying Printer Characteristics PostScript can describe pages of virtually any size but it does little good to create enormous page descriptions if you don t have an output device capable of printing them Most PostScript printers have remarkably similar characteristics so you may not have to change the options that specify them For example in the United States most printers default to letter 8 1 2 x 11 paper Among the options that you can specify for your printer the paper option is the most important The paper argument is a character string most standard ANSI and ISO paper sizes are accepted Each paper size has a specific imageable region which is the portion of the page on which the printer can actually print This region can vary slightly depending on the printer hardware even for paper of the same size The imageable region determines the default values for the width and height options PRINTING YOUR GRAPHICS Specifying Plotting
253. rated list of directories The first valid directory in the list is used as the working directory To display a list of the names of the data objects in your working directory use the objects function as follows gt objects If you created the vectors x and y in the section Assigning Data Objects page 23 you see these listed in your working directory The S PLUS objects function also searches for objects whose names match a character string given to it as an argument The pattern may include wildcard characters For instance the following expression displays all of your objects which start with the letter d gt ebjectsit d See the help file for grep for information on wildcards and how they work PLus LANGUAGE BASICS Removing Data Objects Displaying Data Objects Functions Because S PLUS objects are permanent from time to time you should remove objects you no longer need Use the rm function to remove objects The rm function takes any number of objects as its arguments and removes each one For instance to remove two objects named a and b use the following expression gt rm a b To look at the contents of a stored data object just type its name gt x flj 43 21 2y ft 12 345678910 A function is an S PLUS expression that returns a value usually after performing some operation on one or more arguments For example the c function returns a vector formed by combining the arguments to c
254. rd copy graphics devices PostScript laser printers and Hewlett Packard HP GL plotters S PLUS also supports publication on the World Wide Web by means of a graphics device for creating files in Portable Document Format PDF These devices are discussed in the following sections General rules for making plot files are discussed in the section Managing Files from Hard Copy Graphics Devices page 285 One important and widespread use of S PLUS is to produce camera ready graphics plots for technical reports and papers For many S PLUS users that means producing graphics suitable for printing on PostScript compatible printers In S PLUS you can create PostScript graphics using any of the following methods e Choose Print from the Graph menu on the motif windowing graphics device e Use the printgraph function with any graphics device that supports it The motif device supports printgraph as do many others See the Devices help file for a complete list e Use the postscript function directly We discuss each of these methods in the following subsections If you are using postscript directly the aspect ratio of the finished graphic is determined by the width and height if any that you specify the orientation and the paper size If you use the other methods by default the aspect ratio is the original aspect ratio of the device on which the graphic is originally created For the windowing graphic devices motif this ratio is 8 6 3
255. re if your screen has become cluttered The motif device offers the Redraw option as a selection from the Graph pull down menu It is often desirable to display more than one plot in a window or on a single page of hard copy To do so you use the S PLUS function par to control the layout of the plots The following example shows you how to use par for this purpose The par command is used to control and customize many aspects of S PLUS plots See the chapter Traditional Graphics for further information on use of the par command In this example you use par to set up a a window or a page to have four plots in two rows of two each Following the par command we issue four plot commands Each creates a simple plot with a main title gt par mfrow c 2 2 gt plot 1 10 1 10 main Straight Line gt hist rnorm 50 main Histogram of Normal gt qqnorm rt 100 5 main Samples from t 5 gt plot density rnorm 50 main Normal Density 45 CHAPTER 2 GETTING STARTED The result is shown in figure 2 3 46 Straight Line 1 10 246 8 2 4 6 8 10 1 10 samples from t 5 rt 100 5 0 density rnorm 50 y Quantiles of Standard Normal Figure 2 3 A multiple plot layout Histogram of Normal 0 5 10 rnorm 50 Normal Density 0 3 0 0 e amp 2 1 0 1 2 density rnorm 50 x STATISTICS STATISTICS Summary Statistics S PLUS includes functions for do
256. reen axis 225 dBase files 64 delimiters for character strings 26 density argument 143 density plot function 220 dev off function 203 Device Default function 164 digits 142 digits argument 145 dim attribute 77 dim function 86 dimnames argument 86 dimnames function 84 Direct axis 183 dot plot function 216 dotplot function 202 210 E editing command line 12 data objects 34 editing data 33 Editor 312 EDITOR environment variable 12 emacs 12 INDEX emacs editor table of keystrokes 12 emacs_unixcom editor table of keystrokes 12 Environment variables PAGER 313 environment variables 314 EDITOR 12 S_CLEDITOR 12 S_CMDFILE 316 S WORK 320 VISUAL 12 equal count algorithm 239 erase screen function 187 error messages 9 ethanol dataset 237 244 Excel files 64 exclude argument 93 Exiting S PLUS 9 exp parameter 181 export data function 71 exporting data 7 1 expressions multiple line 10 Extended axes label 183 eye argument 160 F faces function 157 factor class 91 factor function 91 factors 90 FASCII files 63 FASCII importing specifying a format string 63 fig parameter 185 figure region 170 files importing 54 fill argument 73 font parameter 245 format function 73 formula argument 204 230 242 frame function 185 fuel frame dataset 210 222 FUN argument 112 functions calling 9 25 for hypothesis testing 48 for statistical modeling 50 for summary statistics 47 high level plotting 42 import data 33
257. ribed below sgraphMotif defaultFont tells the motif graphics device which font in the font resource list to use as the default font when cex 1 Note The fonts are numbered from 0 so that the following resource tells the motif graphics devices to use the third font in the list given by sgraphMotif fonts sgraphMotif defaultFont 2 e sgraphMotif canvas width and sgraphMotif canvas height control the starting size of the drawing area of the graphics windows The following resources set the size of the plotting area for the mot if graphics device to 800 by 632 pixels sgraphMotif canvas width 800 sgraphMotif canvas height 632 Note When S PLUS creates graphics to display in the graphics windows it uses the initial values of canvas width and canvas height resources as the size of the drawing area If you create a graphics device with a small drawing area and later resize the graphics window to a larger size the resolution of the graphics image is reduced so that your plots may look blocky To set color resources for motif devices interactively we recommend that you use the menus provided in the graphics windows You can also use the 326 SETTING UP YOUR WINDOW SYSTEM sgraphMotif colorSchemes resource to define new color schemes However if you use sgraphMotif colorSchemes to define new color schemes you must copy the existing resource completely before defining your new sc
258. rom bottom to top by the variety medians Svansota has the smallest median and Trebi has the largest The site panels have been ordered from bottom to top by the site medians Grand Rapids has the smallest median and Waseca has the largest Finally the year panels are ordered from left to right by the year medians 1932 has the smaller median and 1931 has the larger This median ordering is achieved by making the data set for each explanatory variable an ordered factor where the levels are ordered by the medians For example suppose variety started out as a factor without the median ordering We get the ordered factor through the following gt barley variety lt ordered barley variety levels names sort variety medians Main effects ordering is so important and is carried out so often that Trellis Graphics includes a function reorder factor to carry it out Here it is used to reorder variety gt barley variety lt reorder factor barley variety barley yield median The first argument is the factor to be reordered the second is the data on whose main effects the reordering is based and the third argument is the function to be applied to the second argument to compute main effects If a multipage display is sent to a screen device the default behavior is that each page will be drawn in order with no pause between pages You can force the screen device to pause and prompt you before drawing each page by first using par
259. round the name of the new color scheme disappears Figure 8 4 illustrates a setup in which there are 3 available color schemes called color scheme 1 color scheme 2 and color scheme 3 The default color scheme is color scheme 1 The specifications for this color scheme are shown in figure 8 4 under the Color Scheme Specifications option menu It uses a black background and white lines The specifications for Text Polygons and Images are blank Your available color schemes will not necessarily have the names or specifications shown in figure 8 4 Initially the available color schemes are defined using X resources How to define new color schemes and save them is explained below Figure 8 5 shows what happens when the color scheme color scheme 2 is selected Under the Available Color Schemes option menu the color scheme color scheme 2 is now boxed in dashed lines and the specifications under the Color Scheme Specifications option menu have changed to the ones that correspond to color scheme 2 When color scheme 2 is applied the example plot that you created earlier of rain vs yield has the following characteristics e The title legend box axis lines axis labels and axis titles are yellow color 1 e The points are red color 2 e The dashed line representing the smooth from the lowess command is cyan color 3 GRAPHICS WINDOW DETAILS PLUS Color Scheme Edito Available Color Schemes ff Color Scheme Specifications
260. rwards Axis styles can be illustrated with the following expressions gt par mfrow c 2 2 gt plot x y main Rational axes gt plot x y xaxs i yaxs i main Internal axes 183 CHAPTER 6 TRADITIONAL GRAPHICS gt plot x y xaxs e yaxs e main Extended axes gt plot x y xaxs s yaxs s main Standard axes Controlling You control boxes around the plot area using the bty box type parameter Axis Boxes which specifies the type of box to be drawn around a plot The available types are as follows Table 6 7 Specifying the type of box around a plot using the bty paramter Setting Effect n No box is drawn around the plot although the x and y axes are still drawn g The default box type draws a four sided box around the plot The box resembles an uppercase O hence the option name gr Draws a three sided box around the plot in the shape of an uppercase C ky Draws a two sided box around the plot in the shape of an uppercase L i Draws a two sided box around the plot in the shape of a square numeral 7 The box function draws a box of given thickness around the plot area The shape of the box is determined by the bty parameter You use box to draw full boxes on plots with customized axes for example gt gt 7 P gt gt par mfrow c 2 2 plot x y main bty o plowcx yy bty 1 main bty T plot x y bty n
261. s To get both a main title and a subtitle use both arguments gt plot car gals car miles main MILEAGE DATA sub Miles versus Gallons MILEAGE DATA e e B m e e Q 2 D e e B e e e vs Pe le Ss e bd D Cak Oo th e e ea e e e B e T e o Q a e g e T T T T 10 15 20 25 car gals Miles versus Gallons Putting main titles and subtitles on plots Alternatively you can add the titles after creating the plot using the function title as follows gt plot car gals car miles gt title main Mileage Data sub Miles versus Gallons FREQUENTLY USED PLOTTING OPTIONS Axis Labels Axis Limits When you use plot S PLUS provides axis labels which by default are the names of the data objects passed as arguments to plot However data object names such as car gals and car miles are chosen with brevity in mind You may want to use more descriptive axis labels For example you may prefer Gallons per Trip and Miles per Trip respectively to car gals and car miles To obtain your preferred labels use the xlab and ylab arguments For example gt plot car gals car miles xlab Gallons per Trip ylab Miles per Trip If you dont want the default labels you can suppress them by using the arguments xlab and ylab with the value as follows gt plot car gals car miles xlab ylab This gives you a plot with no axis labels If desired you c
262. s Labels Axis Limits Logarithmic Axes Plot Types Line Types Plotting Characters Controlling Plotting Colors Interactively Adding Information to Your Plot Identifying Plotted Points Adding Straight Line Fits to a Current Scatter Plot Adding New Data to a Current Plot Adding Text to Your Plot 90 91 93 94 97 98 99 104 104 106 107 110 116 119 121 122 122 123 125 126 126 126 128 129 129 130 130 133 134 135 137 137 138 138 140 xi CONTENTS xii Making Bar Plots Dot Charts and Pie Charts Bar Plots Dot Charts Pie Charts Visualizing the Distribution of Your Data Boxplots Histograms Density Plots Quantile Quantile Plots Visualizing Higher Dimensional Data Multivariate Data Plots Scatterplot Matrices Plotting Matrix Data Star Plots Faces 3 D Plots Contour Perspective and Image Plots Contour Plots Perspective Plots Image Plots Customizing Your Graphics Low level Graphics Functions and Graphics Parameters Setting and Viewing Graphics Parameters Controlling Graphics Regions Controlling the Outer Margin Controlling Figure Margins Controlling the Plot Area Controlling Text in Graphics Controlling Text and Symbol Size Controlling Text Placement Controlling Text Orientation Controlling Line Width Plotting Symbols in Margin Text in Figure Margins Controlling Axes Enabling and Disabling Axes Controlling Tick Marks and Axis Labels Controlling Axis Style Controlling Axis Boxes 142 142 1
263. s as a component y that is a list and specifications for both scales appear as remaining components of the argument scales There is an exception to the behavior of the scales argument The two 3 D general display functions wireframe and cloud currently do not accept changes to each scale separately in other words components x y and z cannot be used The general display function piechart has no tick marks and labels so the argument scales does not apply at all The general display function splom has many scales so the same delicate control is not available but more limited control is available through the argument pscales See the on line help for pscales for more details 243 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 3 D Display aspect Argument Changing the Text in Strip Labels 244 The aspect ratio the height of a panel data region divided by the width is controlled by the aspect argument This argument was introduced in the section Aspect Ratio page 208 for 2 D displays The behavior of the aspect argument for the two 3 D general display functions wireframe and cloud is somewhat different Since there are three axes we must specify two aspect ratios to specify the shape of the 3 D box around the data Suppose the formula and the aspect arguments are formula z x y aspect c 1 2 Then the ratio of the length of the y axis to the length of the x axis is 1 and the ratio of the length of the z axis to the length of th
264. s as well an example is in the section More on Aspect Ratio and Scales Prepanel Functions page 262 ASPECT RATIO o 5 4 Q oe L o o o 47 o F 9 o o x 34 i o 27 o o 14 o H o I I I I I I 0 7 0 8 0 9 1 0 11 1 2 E Figure 7 3 The scatterplot of the gas data with an aspect ratio of 3 4 o a 2 pa ie 5 5 amp O 47 o 8 F x o G 354 6 fea 25 om a o 17 o H o I I I I I I 0 7 0 8 0 9 1 0 1 1 1 2 E Figure 7 4 The scatter plot of the gas data with line segments banked to 45 degrees 209 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS GENERAL DISPLAY FUNCTIONS A Data Set fuel frame 210 Each general display function draws a particular type of graph For example dotplot makes dot plots wireframe makes 3 D wireframe displays histogram makes histograms and xyplot makes x y plots This section describes a collection of general display functions The data frame fuel frame contains five variables that measure characteristics of 60 automobile models gt names fuel frame 1 Weight Disp Mileage Fuel Type gt dim fuel frame Li 50 5 The variables are weight displacement of the engine fuel consumption in miles per gallon fuel consumption in gallons per mile and a classification into type of vehicle The first four variables are numeric The fifth variable is a factor gt table fuel frame Type Compact Large Medium Small Sporty Van
265. s beginning with row 10 and places them in the new data frame beginning at row 1 End row from range in source Spreadsheets only The default 1 is to read to the last row in the spreadsheet The page number of the spreadsheet Spreadsheets only The default is to read all pages The row containing the column names If the file you are importing contains names for the columns of data S PLUS can use these names as column names In the colNameRow argument specify which row number in the file being imported contains the column names If you do not specify a named row S PLUS attempts to locate column names in the first row of the file Specify Row 0 to have S PLUS not search for a name row In a delimited ASCII file the name row must come before the first data rows to be read in the start row a character string specifying the database server if importing from a relational database a character string specifying the user name when importing from a relational database a character string specifying the password for the database user a character string specifying the name of the database to use when importing from a relational database This should be set to if type ORACLE a character string specifying the name of the table in database to import logical flag if TRUE strings are converted to factors when imported logical flag if TRUE levels for any factors created from strings will be sorted 57
266. s gives a quick graphical picture of similarities and differences between two cases similarly shaped stars indicate similar cases For example to create a star plot from the data used to create our scatterplot matrix gt stars longley x 4 RS 1947 1951 1955 1959 gt C XO 1948 1952 1956 1960 1949 1953 1957 1961 1950 1954 1958 1962 Figure 6 20 A star plot VISUALIZING HIGHER DIMENSIONAL DATA Faces Chernoff introduced the idea of using faces to represent multivariate observations Each variable in a given observation is associated to one feature of the face Two cases can be compared using a feature by feature comparison You can create Chernoff s faces with the S PLUS faces function gt faces t cereal attitude labels dimnames cereal attitude 2 ncol 3 s B I corn flakes shreaded wheat frosties OO weet abix sugar puffs all bran y rice krispies special k Figure 6 21 A faces plot See the faces help file and Chernoff 1973 for complete details on interpreting Chernoff faces 157 CHAPTER 6 TRADITIONAL GRAPHICS 3 D PLOTS CONTOUR PERSPECTIVE AND IMAGE PLOTS Contour Plots 158 Many types of data are usefully viewed as surfaces generated by functions of two variables Familiar examples are meteorological data topographic data and other data gathered by geographical location S PLUS provides three functions for viewing such data The simplest con
267. s in your file are not separated by line feeds or if your file splits each row of data into two or more lines For FASCII import you need to specify the file name and the file type In addition because FASCII files are assumed to be non delimited for example there are no commas or spaces separating fields you also need to specify each column s field width and data type in the Format String This tells S PLUS where to separate the columns Each column must be listed along with its data type character or numeric and its field width If you want to name the columns specify a list of names in the colNames argument Column names cannot be read from the FASCII data file When importing FASCII files you need to specify the following arguments to importData colNames Enter a character vector of column names for the imported data columns separated by spaces or commas Specify one column name for each imported column for example Apple Oranges Pears You can use an asterisk to denote a missing name for example Apples Pears format Specify the data types and field widths of the imported columns For each column you need to specify a sign then the field width and then the data type Commas or spaces must separate each specification in the string The format string is necessary because formatted ASCII files do not have delimiters such as commas or spaces separating each column of data Here is an example format string
268. s mean and max are not very different conceptually Both return a single number summary of their input both are only meaningful for numeric data Because of implementation differences however the first example returns appropriate values and the second example dumps However when all the variables in your data frame are numeric or when you want to use by with a matrix you should encounter few difficulties gt dimnames state x7 7 2 4 lt Life Exp gt by state x77 c Murder Population Life Exp state region summary INDICES Northeast Murder Population Life Exp Min 2 400 Min 472 Min 2 70 39 Ist OQu 2 3 100 Tst Qus Gal Ist Gi 770 55 Median 3 300 Median 3100 Median 71 23 Mean 2 4 722 Mean 5495 Mean 271 26 srd Qu 5 500 Brd Qu 7333 srd Oil t7 1 83 Max 10 900 Max 18080 Max 772 48 INDICES South Murder Population Life Exp Min s 6 20 Min 579 Min 267 96 1st Quet 9 25 Ist es Z622 ist Qu 568 98 Median 10 85 Median 3710 Median 70 07 Mean 210 58 Mean 4208 Mean 69 71 ord Queill 3rd Qu 4944 Sra Qus 7033 Max s15 19 Max 12240 Max 271 42 Closely related to the by and aggregate functions is the tapply function which allows you to partition a vector according to one or more categorical indices Each index is a vector of logical or factor values the same length as the data vector to use more than one index create a list of index vectors For example suppose you want to compute a me
269. s of the arguments of the function key which actually does the drawing of the key so the values of these components are given to the corresponding arguments of key The exception is the component argument space which can leave extra space for a key in the margins of the display The key argument is easy to use yet is quite powerful it has the capability to draw most keys used in practice and many yet to be invented update barley plot key list points Rows trellis par get superpose symbol 1 2 text list levels barley year The plot would be drawn using update to alter barley plot The component text of the key argument is a list with the year names The component points is a list with the graphical parameters of the two symbols used by panel superpose to plot the data These parameters are from the Trellis setting superpose symbol which panel superpose uses to draw the plotting symbols We want to give the component points only the parameters of the symbols used so the function Rows extracts the first two elements of each component of superpose symbol gt trellis device postscript 255 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS 256 gt Rows trellis par get superpose symbol 1 2 cex Eid 2b al col CI r1 font 1 1 pch 1 ee ee The key has two entries one for each year If there had been four years there would have been four entries Each entry has two items as we shall see we can
270. s the color in which the border should be drawn The repositioning uses two coordinate systems The first describes locations in the rectangle that just encloses the panels of the display but not including the tick marks the lower left corner of this panel rectangle has coordinates 0 0 and the upper right corner has coordinates 1 1 A location in the panel rectangle is specified by the components x and y The second coordinate system describes locations in the border rectangle of the key which is shown when the border is drawn the lower left corner of the key rectangle has coordinates 0 0 and the upper right corner has coordinates 1 1 A location in the border rectangle is specified by the component corner a vector with two elements the horizontal and vertical coordinates The key is positioned so that the locations specified by the two coordinate systems are at the same place on the graph Having two coordinate systems makes it far easier to get the key to a desired location quickly often on the first try Notice that we specified the space argument to be top The reason is that as soon as we specify a value for any of the coordinate arguments x y or corner no default space is allocated in any margin location unless we explicitly use the argument space If we do not use the coordinate arguments the space argument defaults to top To allocate space to the right update barley plot key list points Rows trellis par get
271. s the x axis along the short side of the paper and landscape which puts the y axis along the short side of the paper e PRINTGRAPH_ONEFILE controls whether S PLUS writes printgraph output to one file or many It has two possible values yes and no If yes printgraph sends its output to PostScript out If no printgraph creates a separate file each time and tries to send it to the printer by executing the command specified in the variable S_POSTSCRIPT_PRINT_COMMAND e POSTSCRIPT_PRINT_COMMAND sets the UNIX PostScript printing command 322 ENVIRONMENT VARIABLES AND PRINTGRAPH Note You cannot change the values of any environment variable once you start S PLUS If you want to change a variable you must stop S PLUS change the variable then start S PLUS again To change printgraph s behavior temporarily see the printgraph help file for optional arguments You can also modify printgraph s behavior using options passed to ps options send See the section Printing with PostScript Printers for details on how to control PostScript options 323 CHAPTER 9 CUSTOMIZING YOUR S PLUS SESSION SETTING UP YOUR WINDOW SYSTEM Setting XI I Resources 324 The motif graphics device has a control panel to help you pick the colors fonts and printing commands you want for your S PLUS graphics When you save these settings they are used each time you start one of these devices You c
272. s with some Frames duplicated data To get the cleanest possible data set for analysis you want to merge or join the data before proceeding with the analysis For example player statistics extracted from Total Baseball overlap somewhat with player statistics extracted from The Baseball Encyclopedia You can use the merge function to join two data frames by their common data For example consider the following made up data sets gt baseball off player years ML BA HR 1 Whitehead 4 0 308 10 2 Jones 3 04235 11 3 Smith 5 0 207 4 4 Russell NA 0 270 19 5 Ayer T Ulba 5 107 CHAPTER 5 DATA FRAMES gt baseball def player years ML A FA 1 Smith 5 300 0 974 2 Jones 3 7 0 990 3 Whitehead 4 9 0 980 4 Russel NA 55 0 963 5 Ayer 7 532 0 955 These can be merged by the two columns they have in common using merge gt merge baseball off baseball def player years ML BA HR A FA 1 Ayer 7 0 283 5 32 0 955 Z Jones S Geldo LI 7 0 990 3 Russell NA 0 270 19 55 0 963 4 Smith 5 0 207 4 300 0 974 5 Whitehead 4 0 308 10 9 0 980 By default merge joins by the columns having common names in the two data frames You can specify different combinations using the by by x and by y arguments For example consider the data sets authors and books gt authors FirstName LastName Age Income Home 1 Lorne Green 82 1200000 California 2 Loren Blye 40 40000 Washington 3 Robin Green 45 25000 Washington 4 Robin Howe 2 0 Alberta 5 Billy Jaye 40 2
273. s x biggest y biggest pch points x biggest y biggest pch M In most cases a panel function that is used for a single panel display can be used for a multipanel display as well The panel function panel special could be used to show the maximum value of NOx on each panel of a multipanel display of the ethanol data gt xyplot NOx E C data ethanol aspect 1 2 panel panel special Even if you write your own panel function you might want to use the default panel function as part of it This is often true when you want to augment a standard Trellis panel Also Trellis Graphics provides some special purpose panel functions One of them is panel loess It adds smooth curves to scatterplots 247 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS subscripts Argument Commonly Used S PLUS Graphics Functions and Parameters 248 To add smooth curves to a multipanel display of the ethanol data gt GIVEN E lt equal count ethanol E number 9 overlap 1 4 gt xyplot NOx C GIVEN E data ethanol aspect 2 5 F panel function x y panel xyplot x y panel loess x y span 1 The default panel function panel xyp1ot draws the points of the scatterplot on each panel The special panel function panel 10ess computes and draws the smooth curves the argument span the smoothing parameter has been specified If you request it another component of the packet sent to each panel is the subscripts that tell which
274. same purpose as environment variables in UNIX they determine the behavior of many aspects of environment You can set or modify these options with the the S PLus options command For example to tell S PLUs to echo back to the screen the commands you type in use this expression options echo T Among the most useful options you can set are the following echo prompt tells S PLUS whether to repeat commands it receives back to the screen The default value is echo F tells S PLUS what character string to print when it is ready for input The default value is prompt gt continue tells S PLUS which character string to print when you press the return key before completing an S PLUS expression The default value is continue width tells S PLUS how wide the screen is You can change this value to get the print command to create very wide or very narrow lines The default value is width 80 length check tells S PLUS how tall the screen is This controls how frequently the print command prints out the summary of column names when printing a matrix The default value is ength 48 tells S PLUS to perform automatic validity checking at various points in the evaluation The default is false or check F editor tells S PLUS what text editor will be used in history and fix The default is vi digits tells many of the printing functions how many digits to use when printing num
275. screen and then once Function or you are satisfied with your plots send them to a hard copy device without Script having to re type the same plotting commands Note Direct use of a hard copy device ensures the best hard copy output To use this method using an S PLUS function follow these steps 1 Put all the S PLUS commands necessary to create the graphs into a function in S PLUS say plotfcn using fix Do not include commands that start a graphics device 2 In S PLUS start a graphics device then call your function gt motif gt plotfcn 286 PRINTING YOUR GRAPHICS Note If you are creating several plots on separate pages you may want to set the graphics parameter ask to TRUE before calling your plotting function In this case the sequence of steps is gt motif gt par ask T gt plotfcn 3 View your graphs If you want to change something use fix to modify your plotting function 4 Once you are satisfied with your plots start a hard copy graphics device call your function and then turn the hard copy graphics device off gt postscript gt plotfcn gt dev off 5 Save your function containing graphics commands if you will need to reproduce the plots in the future To use this method using a script follow these steps 1 Put all the S PLUS commands necessary to create the graphs into a file outside of S PLUS say plotcmds asc using an editor e
276. sed in place of For most efficient memory allocation what should be the same size as the object to be read in For example to read in a character vector of length 1000 use gt scan what character 1000 The what argument to scan can also be used to read in data files of mixed type for example a file containing both numeric and character data as in the following sample file table dat Tom 93 37 Joe 47 42 Dave 18 43 In this case you provide a list as the value for what with each list component corresponding to a particular field gt z lt scan table dat what list 0 0 67 CHAPTER 3 IMPORTING AND EXPORTING DATA 68 x 2 LL fi Tom Joe Dave 23 1 93 47 18 eeu 1 37 42 43 S PLUS creates a list with separate components for each field specified in the what list You can turn this into a matrix with the subject names as column names as follows gt Matz lt rbind z 2 1 z 03JJ3 gt dimnames matz lt list NULL z 1 gt matz Tom Joe Dave 1 93 47 18 2 37 42 43 You can scan files containing multiple line records by using the argument multi 1line T For example suppose you have a file heart al1 containing information in the following form johns 1 450 54 6 marks 1 760 73 5 You can read it in with scan as follows gt scan heart all what list 0 0 0 multi line T ELL ais 1 johns marks avery able simpson L4 1 34 6
277. ssume any liability regarding the Software and do not undertake to furnish any support or information regarding the Software IN NO CASE WILL MATHSOFT S LIABILITY EXCEED THE AMOUNT OF THE LICENSE FEE ACTUALLY PAID BY YOU TO MATHSOFT The Software and documentation are provided with restricted rights Use duplication or disclosure by the Government is subject to restrictions as set forth in subparagraph c 1 ii of the Rights in Technical Data and Computer Software clause at DFARS 252 227 7013 or subparagraphs c 1 and 2 of the Commercial Computer Software Restricted Rights at 48 CFR 52 227 19 as applicable Manufacturer is MathSoft Inc 101 Main Street Cambridge MA 02142 vi Without prejudice to any other rights MathSoft may terminate this license if you fail to comply with the terms and conditions of this Agreement If this license is terminated you agree to destroy all copies of the Software and documentation in your possession This License agreement shall be governed by the laws of the Commonwealth of Massachusetts and shall inure to the benefit of MathSoft its successors representatives and assigns The license granted hereunder may not be assigned sublicensed or otherwise transferred by you without the prior written consent of MathSoft If any provisions of this Agreement shall be held to be invalid illegal or unenforceable the validity legality and enforceability of the remaining provisions shall in no wa
278. t 159 CHAPTER 6 TRADITIONAL GRAPHICS Perspective Plots 160 Perspective plots give a three dimensional view of data in the form of a matrix of heights on an evenly spaced grid The heights are connected by line segments to produce the familiar mesh appearance of such plots As a simple example consider again the voice spectrogram for the word five The contour plot of the voice data was difficult to interpret because the number of contour lines forced us to omit the height labels Had we included the labels the clutter of labels would have made the graph unreadable The perspective plot in figure 6 23 gives a much clearer view of how the spectrogram varies To create the plot function use the following S PLUS expression gt persp voice five 2 00 e009 a S 70 2000 Figure 6 23 Perspective plot of a voice spectrogram You can modify the perspective by choosing a different eye location You do this with the eye argument By default the eye is located c 6 8 5 times the range of the x y and z values For example to look at the voice data from the other side we could use the following command gt persp voice five eye c 72000 350 30 If you have an equal number of observations for each of three variables you can use interp to generate interpolated values for z on an equally spaced xy grid For example to create a perspective plot of the ozone data you can use interp and persp as f
279. t using the lt or operator within an S PLUS session S PLUS creates the named object in your working directory The working directory occupies position 1 in your S PLUS search list so it is also the first place S PLUS looks for an S PLUS object You specify the working directory with the environment variable S WORK which can specify one directory or a colon separated list of directories The first valid directory in the list is used as the working directory and the others are placed behind it in the search list To be valid a directory must be a valid S PLUS chapter and be one for which you have write permission For example to specify the chapter usr rich mysplus as your working directory set S WORK as follows setenv S_WORK usr rich mysplus If S WORK is not set S PLUS sets the working directory according to the rules given on page 123 of Programming with Data SPECIFYING A PAGER SPECIFYING A PAGER A pager is a tool for viewing objects and files that are larger than can fit on your screen They function much like pagers for moving around files but typically do not have actual editing functions The most common uses for pagers in S PLUS are to look at lengthy functions and data sets with the page function and to look at help files with the he1p function Both functions use the pager specified in options pager The value of options pager is initially specified by the S_PAGER environment variable if set or to less
280. t 154 screen argument 225 screen axes 225 segments function 192 248 seq function 79 Session options continuation prompt 312 session options echo 312 Session options editor 312 Session options printing digits 312 Session options prompt 312 Session options screen dimensions 312 shingle function 240 show settings function 249 251 single symbol operators 205 skip argument 263 smooth function 139 334 S news mailing list 3 solder data set 98 space argument 255 span argument 248 span parameter 258 split argument 228 split screen function 186 splom function 221 S PLUS syntax formulae in 51 S Press newsletter 3 square plot shape 126 Standard axes 183 star plot 156 Starting S PLUS 8 12 static data visualization 154 statistical modeling 50 statistics summary 47 common functions for 47 StatLib 3 strip argument 245 strip names argument 245 strip white argument 69 stripplot function 213 sub argument 128 242 subscripts argument 248 subset argument 206 subtitle of a plot 128 summary function 91 summary statistics 47 common functions for 47 superpose symbol function 253 switzerland data set 158 symbols function 193 syntax 9 case sensitivity 10 continuation lines 10 spaces 9 T t function 72 145 tapply function 114 INDEX tck parameter 180 technical support 4 testing hypothesis 48 text function 140 248 times argument 79 title function 128 165 training courses 3 Trellis settings 249
281. t create a plot with gt plotix y tck 02 mgp c 2 1 0 CONTROLLING AXES which draws the tick marks inside the plot and brings the labels closer to the axis line Controlling The xaxs and yaxs parameters determine the style of the axes The available Axis Style styles are as follows Table 6 6 Axis styles Setting Style r The default axis style this extends the range of the data by 4 and then labels internally An internally labeled axis has labels that are inside the range of the data 1 Labels internally without expanding the range Thus there will be at least one datapoint on each boundary of an i style axis if Xl im and ylim are not used e Extended axes label externally that is a pretty value beyond the range of the data is included and expand the range by half a character if necessary so that no point is precisely on a boundary S Standard axes are similar to extended axes but do not expand the range A plot with standard axes will be exactly the same as a plot with extended axes for some data sets but for other data sets the extended axes will contain a slightly wider range ng Direct axis retains the axis from the previous plot For example you can make several plots that have precisely the same x axis or y axis by giving xaxs d or yaxs d as an argument to the second and subsequent plot commands You can also set it with par but then you need to remember to release the axis afte
282. ta objects you are combining into the data frame gt data frame price country reliab mileage type row names c Acura Audi BMW Chev Ford Mazda MazdaMX Nissan Olds Toyota price country reliab mileage type Acura 11950 Japan 5 NA Smal Audi 26900 Germany NA NA Medium 103 CHAPTER 5 DATA FRAMES COMBINING DATA FRAMES Combining Data Frames by Column 104 We have already seen one way to combine data frames since data frames are legal inputs to the data frame function you can use data frame directly to combine one or more data frames For certain specific combinations other functions may be more appropriate This section discusses three general cases 1 Combining data frames by column This case arises when you have new variables to add to an existing data frame or have two or more data frames having observations of different variables for identical subjects The principal tool in this case is the cbind function 2 Combining data frames by row This case arises when you have multiple studies providing observations of the same variables for different sets of subjects For this task use the rbind function 3 Merging or joining data frames This case arises when you have two data frames containing some information in common and you want to get as much information as possible from both data frames about the overlapping cases For this case use the merge function All three of the func
283. ta point falls in the rightmost interval The number of intervals produced by hist e g six intervals in the above example is determined automatically by hist to balance the tradeoff between obtaining smoothness and preserving detail However no automatic rule is completely satisfactory Thus hist allows you to choose the number of intervals yourself by using the optional argument nclass Choosing a larger number of intervals produces a rougher histogram with more detail and choosing a smaller number produces a smoother histogram with less detail For example gt hist corn rain nclass 10 gives the rougher but more detailed histogram VISUALIZING THE DISTRIBUTION OF YOUR DATA o 2 z You can also use hist to make a histogram in which you specify the number of intervals and their locations You do this by using the optional argument breaks with value a vector whose values give the interval boundary points The length of this vector is one plus the number of intervals you want For example to specify 12 intervals for the corn rain histogram with interval boundaries at the integers 6 through 18 use gt hist corn rain breaks 6 18 Figure 6 15 Density Plots 6 8 10 12 14 16 18 corn rain Histogram of corn rain with specified break points Many other options are available with hist including many of the arguments to barplot See the help files for hist and barplot for complete details
284. te Using the xaxs d and yaxs d arguments sets all axis limits to the values for the most recent plot in a sequence of plots If those limits are not the widest required in the sequence points outside the limits are not plotted and you receive the message Points out of bounds To avoid this error you can first make all plots in the usual way without specifying axis limits to find out which plot has the largest range of axis limits Then create your first plot using x1im and ylim with values determined by the largest range Now set the axes with xaxs d and yaxs d as described above To return to the usual default state in which each plot determines its own limits in a multiple plot layout use gt par xaxs yaxs The change goes into effect on the next page of figures Often a data set you are interested in does not reveal much detail when graphed on ordinary axes This is particularly true when many of the data points bunch up at small values making it difficult to see any potentially interesting structure in the data Such data sets yield more informative plots if you graph them using a logarithmic scale for one or both of the axes To put the horizontal axis on a logarithmic scale use 1og x similarly for the vertical axis use 1og y To put both the horizontal and vertical axes on logarithmic scales use og xy You can plot data in S PLUS in any of the following ways e As points e As lines i e as conn
285. te kyphosis by kyphosis Kyphosis FUN sum Error in Summary factor structure Data ell 1 A factor is not a numeric object Dumped For time series aggregate returns a new shorter time series that summarizes the values in the time interval given by a new frequency For instance you can quickly extract the yearly maximum minimum and average from the monthly housing start data in the time series hstart gt aggregate hstart nf 1 fun max 1966 143 0 137 0 164 9 159 9 143 8 205 9 231 0 234 2 160 9 start deltat frequency 1966 1 1 gt aggregate hstart nf 1 fun min 1966 62 3 61 7 82 7 85 3 69 2 104 6 150 9 90 6 54 9 start deltat frequency 1966 1 1 gt aggregate hstart nf 1 fun mean 1966 99 6 110 2 128 8 125 0 122 4 173 7 198 2 171 5 112 6 start deltat frequency 1966 1 1 The by function allows you to partition a data frame according to one or more categorical indices conditioning variables and then apply a function to the resulting subsets of the data frame Each subset is considered a separate data frame hence unlike the FUN argument to aggregate the function passed to by does not need to have a numeric result Thus by is useful for functions that work on data frames by fitting models for example gt by kyphosis INDICES kyphosis Kyphosis FUN summary kyphosis Kyphosis absent 111 CHAPTER 5 DATA FRAMES 112 Kyphosis Age Number absent 64 Min z go Min 72 00 present 0 lst Qu
286. ted with the code listed above NOx is graphed along the vertical scale The limits of this variable are SCALES AND LABELS scales and pscales Arguments gt range gas NOx 1 0 537 5 344 To include the values 0 and 6 in the vertical scale gt xyplot NOx E data gas aspect 1 2 ylim c 0 6 The argument scales affects tick marks and tick mark labels In the plot produced by the code above there would be seven tick marks and tick mark labels along the vertical scale and six along the horizontal The function scales is used to reduce the number of ticks and increase the size of the tick labels gt xyplot NOx E data gas aspect 1 2 ylim c 0 6 scales list cex 2 tick number 4 The argument scales is a list The list component cex affects the size The list component tick number affects the number but it is just a suggestion an algorithm tries to find tick values that are pretty while trying to come as close as possible to the specified number We can also specify the tick marks and labels separately for each scale The specification scales list cex 2 x list tick number 4 y list tick number 10 changes cex on both scales but tick number has been set to 4 for the horizontal or x scale and to 10 for the vertical or y scale Thus the rule is this specifications for the horizontal scale appear in the argument scales as a component x that is itself a list specifications for the vertical scale appear in scale
287. tensions shown in the above table you need not specify a type nor in most cases any other information For example suppose you have a SAS data set rain sd2 in your startup directory You can read this into S PLUS using importData as follows sas rain data lt importData rain sd2 Note If a file extension is inappropriate an error may appear indicating an unrecognized format or the data file may be converted incorrectly If you have trouble reading the data most likely you just need to supply additional arguments to importData to specify extra information required by the data importer to read the data correctly 55 CHAPTER 3 IMPORTING AND EXPORTING DATA Arguments to importData The importData function has the arguments shown in table 3 1 Table 3 1 Arguments to importData Argument Required Description file type keep drop colNames rowNamesCol format filter startCol endCol 56 Required except for database reads Optional Optional Optional Optional Optional Optional Optional Optional Optional A character string giving the name of the file and directory path See the Type column in the previous table A character vector of variable names in data file to be imported A character vector of variable names in data file that are not to be imported A character vector of column names for the data columns to import separated by an
288. test through the rows and the slowest through the pages The panel ordering rule is like a graph not like a table the origin is at the lower left and as we move either from left to right or from bottom to top the panel order increases The following shows the panel order for figure 7 22 which has two columns six rows and one page 11 12 9 10 7 5 3 1 me FD MULTIPANEL CONDITIONING layout Argument In Trellis Graphics packets are assigned to panels according to the packet order and the panel order Packet 1 goes into panel 1 packet 2 goes into panel 2 and so forth In figure 7 22 the two orderings result in the year variable changing along the columns and the site variable changing along the rows Note that as the levels for one of these factors increase the darkened bars in the strip label for the factor move from left to right Multipanel conditioning is a powerful tool for understanding how a response depends on two or more explanatory variables In such an analysis it is typically important to make as many displays as necessary to have each explanatory variable appear at least once as a panel variable In figure 7 22 variety an explanatory variable appears as a panel variable We will make a new display with site as a panel variable The argument layout specifies the numbers of columns rows and pages gt dotplot site yield year variety data barley layout c 2 5 2 The result is shown in figure 7 2
289. th the axis function to create 8 plots of mathematical functions on a standard Cartesian coordinate system For example you can define the following simple function to plot a set of ae y l 8 Simp e tonc P points from the domain of a function against the set s image on a Cartesian grid gt mathplot lt function domain image plot domain image type 1 axes F axis 1 pos 0 axis 2 pos 0 To control the length of tick marks use the tck general parameter This parameter is a single number which is interpreted as a fraction of a plot dimension If tck is less than one half the tick marks on each axis have the same length this length is the fraction tck of the smaller of the width and height of the plot area Otherwise the length of the tick marks on each axis are a fraction of the corresponding plot dimension Use tck 1 to draw grid lines The default is tck 02 meaning tick marks of equal length on each axis are drawn pointing out from the plot Try the following expressions gt par mfrow c 2 2 gt plotix y Naina tek 02 gt ploii x y maine tek 05 tek 05 gt plot x y main tek 1 teksi CONTROLLING AXES You can have tick marks of different lengths on each axis The following code draws a plot with no axes then adds each axis individually with different values of tck and 1ty the line type gt plot x y axes F main Different tick marks gt akis 1 gt axis 2 tck
290. that information can be added to it gt subpars lt subplot x c 69 usr 2 y c usr 3 43 usa xlim c 130 50 The rest of the commands add to the small map of the entire U S First draw the map with a box around it gt subplot box pars subpars Next draw a box around New England gt subplot polygon c usr 1 65 65 usr 1 c usr 3 usr 3 usr 4 usr 4 density 0 pars subpars OVERLAYING FIGURES Finally add text to indicate that the boxed region just created corresponds to the enlarged region gt subplot text usr 1l tusr 2 2 usr 4 4 Enlarged Region pars subpars The subplot function can also be used to create composite figures For example to plot density estimates of the marginal distributions in the margins of a plot of Mileage against Price enter the following code First we set up the coordinate system with par and usr and create and store the main plot with subplot gt frame gt par usr c 0 1 0 1 2 o par lt subplot x c 0 85 y c 0 85 fun plot price mileage log x We next find the usr coordinates from the main plot and calculate the density estimate for both variables gt o usr lt o par usr gt den p lt density price width 3000 gt den m lt density mileage width 10 Finally we plot the two marginal densities with two calls to subplot The first plots the density estimate for price along the top of the main plot gt Subp letix o
291. the make groups arguments The which component is a factor with three levels giving the names of the original data vectors Now we can make the boxplots gt bwplot which data data lottery as data frame array The function as data frame array converts arrays into data frames Consider the object iris a three way array of 50 measurements of four variables for each of three varieties of irises gt dimtiris 1 50 4 3 259 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS To turn iris into a data frame in preparation for Trellis plotting use jiris df lt as data frame array iris col dims 2 names iris df 5 6 lt c flower variety The resulting data frame has what used to be its second dimension turned into four columns IF sd lls Sepal L Sepal W Petal L Petal W flower variety 1 Sl 3 5 1 4 0 2 1 Setosa 2 4 9 30 1 4 Use 2 Setosa 3 4 7 owe Wl cee Uere 3 Setosa 4 4 6 3 1 laS 0 2 4 Setosa 5 5 0 3 6 1 4 0 2 5 Setosa To produce a scatterplot matrix of the data superpose symbol lt trellis par get superpose symbol for i in 134 ints dtl il lt gittertiris dt 7 splom iris df 1 4 key list space top columns 3 text list levels iris df variety points Rows superpose symbol 1 3 varnames c Sepal Length n cm Sepal Width n cm Petal Length n cm Petal Width n cm groups iris df variety panel panel superpose To prevent exact overlap of many of the plotting symbols th
292. they were immediately after the last time you clicked on the Apply button The Printing Dialog Box The second menu item under the Options menu is labeled Printing When you select Printing the Printing dialog box appears This window lets you interactively change the specifications of the printing method used when you choose the Print menu item under the Graph menu See the section The Graph Menu page 294 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 302 Figure 8 7 shows an example of the Printing dialog box This window has a header with a window menu button and the title S PLUS Graph Printing Options The pane of the Printing dialog box contains option menus entitled Method Orientation and if Method is LaserJet Resolution as well as a text entry box labeled Command There are also six buttons labeled Apply Reset Print Save Close and Help These features are explained below 5 PLUS Graph Printing Options Method Orientation PostScript HE Landscape a7 LaserJet ar Portrait Figure 8 7 The Motif Printing dialog box GRAPHICS WINDOW DETAILS Method Orientation Resolution and Command The Method Orientation and Resolution option menus all contain options marked with diamond shaped buttons called radio buttons Radio buttons are used to distinguish mutually exclusive options The option that is currently active is denoted by a darker radio button To change the currently active option move the po
293. three components The first component is an empty character vector character 0 the second component is a vector of four character strings indicating whether the measurement is sepal length or width or petal length or width and the third component is a vector of three character strings specifying the species of iris To create a list use the list function Each argument to list defines a component of the list Naming an argument using the form name component creates a name for the corresponding component For example you can create a list from the two vectors grp and thw as follows 87 CHAPTER 4 DATA OBJECTS 88 grp lt c rep 1 11 rep 2 10 thw lt c 450 760 325 495 285 450 460 375 310 615 425 245 350 340 300 310 270 300 360 405 290 heart list lt list group grp thw thw descrip heart data gt heart list group BIEL ei tiliagegegeze2e2s fe A ate Ot as thw 1 450 760 325 495 285 450 460 375 310 615 425 245 350 14 340 300 310 270 300 360 405 290 descrip 1 heart data The first component of the list contains a numeric vector with grouping information for the data so it is named group The second component is the total heart weight thw in grams The name of the component is the same as the name of the object stored in that component The thw on the left of the equal sign is the component name and the thw on the right of the equal sign is the object stored there The third component conta
294. tions mentioned above cbind rbind and merge have methods for data frames but in the usual cases you can simply call the generic function and obtain the correct result Suppose you have a data frame consisting of factor variables defining an experimental design When the experiment is complete you can add the vector of observed responses as another variable in the data frame In this case you are simply adding another column to the existing data frame and the natural tool for this in S PLUS is the cbind function For example consider the simple built in design matrix 0a 4 2p3 representing a half fraction of a 2 4 design gt 0a 4 2p3 A B C 1 Al BI C1 2 Al B2 C2 3 A2 Bl C2 4 A2 B2 Cl COMBINING DATA FRAMES If we run an experiment with this design we obtain a vector of length four one observation for each row of the design data frame We can combine the observations with the design using cbind as follows gt runl lt cbind oa 4 2p3 resp c 46 34 44 30 gt runl A B C resp 1 Al B1 C1 46 2 Al B2 C2 34 3 A2 B1 C2 44 4 A2 B2 C1 30 Another use of cbind is to bind a constant vector to a data frame as in the following example gt fuell lt cbind 1 fuel frame gt fuell 1 Weight Disp Mileage Fuel Type Eagle Summit 4 1 2560 97 33 3 030303 Smal Ford Escort 4 1 2345 114 33 3 030303 Smal Ford Festiva 4 1 1845 81 37 2 702703 Small Honda Civic 4 1 2260 91 32 3 125000 Smari Mazda Protege 4 1 2440 113 32
295. tor The Color Scheme Specifications editor includes specifications for the following characteristics Name The name of the color scheme Background The color of the background This specification can have only one color name or value Lines The color names or values used for lines e Text The color names or values used for text Polygons The color names or values used with the polygon pie barplot and hist plotting functions e Images The color names or values used with the image plotting function All color schemes must have values for the specifications Name Background and Lines The specifications for Text Polygons and Images default to the specifications for Lines if left blank See the section Available Colors Under X11 page 306 for information and rules on how to specify colors with the mot i f windowing graphics device 297 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 298 Selecting a Different Color Scheme To select a different color scheme move the pointer to one of the color scheme names under the Available Color Schemes option menu and click The name of the newly chosen color scheme is boxed in dashed lines and its specifications are displayed in the Color Scheme Specifications editor The plot in the graphics window however is still based on the original color scheme To apply the newly chosen color scheme you must click on the Apply button Once you apply the new color scheme the box a
296. tour represents the surface as a set of contour plot lines on a grid representing the other two variables The perspective plot persp creates a perspective plot with hidden line removal The image function plots the surface as a color or grayscale variation on the base grid All three functions require similar input a vector of x coordinates a vector of y coordinates and a length x by length y matrix of z values In many cases these arguments are all supplied by a single list such as the output of the interp function The interp function interpolates the value of the third variable onto an evenly spaced grid of the first two variables For example the built in data set ozone contains the objects ozone xy a list of latitudes and longitudes for each observation site and ozone median a vector of the medians of daily maxima ozone concentrations at all sites To create a contour or perspective plot we can use interp to interpolate the data as follows ozone fit lt interp ozone xy x ozone xy y ozone median For contour and persp but not image you can also provide a single matrix argument which contour and persp interpret as the z matrix The two functions then automatically generate an x vector 1 nrow z and a y vector I ncol z See the persp and contour help files for more information To generate a contour plot use the contour function For example the built in data set switzerland contains elevation data for Switzerland gt
297. trellis device function 202 249 trellis par get function 249 trellis par set function 249 251 type argument 123 253 Type factor 221 U unix function 32 usa function 194 using logarithmic scale 130 usr parameter 175 V vector arithmetic 29 vector data type 116 vector function 80 vectors 67 79 creating 25 vi editor 12 table of keystrokes 12 vi function 35 VISUAL environment variable 12 W what argument 67 69 width argument 150 220 widths argument 69 wireframe function 202 210 225 working directory how set 320 write function 72 Writing A Panel Function 246 X xaxs argument 130 xlab argument 129 242 xlim argument 129 242 xyplot function 202 204 211 Y yaxs argument 130 ylab argument 129 242 ylim argument 129 242 335 INDEX 336
298. ts of different types into a data frame some objects may be altered somewhat to be more suitable for further analysis For example numeric vectors and factors remain unchanged in the data frame Character and logical vectors however are converted to factors before being included in the data frame The conversion is done because S PLUS assumes that character and logical data will most commonly be taken to be a categorical variable in any modeling that is to follow If you want to keep a character or logical vector as is in the data frame pass the vector to data frame wrapped in a call to the I function which returns the vector unchanged but with the added class AsIs For example consider the following logical vector my logical gt my logical EI TTT errr TP eee Tear TT oT We can combine it as is with a numeric vector rnorm 20 ina data frame as follows gt my df lt data frame a rnorm 20 b I my logical gt my df a 0 6960192 0 4342069 0 4512564 0 8785964 0 8857739 ort wWwhnr Fe amaaa CREATING DATA FRAMES 6 0 2865727 F 7 10816919 T 8 22956470 T 9 O f27 7701 F 10 0 6382045 T 11 0 9127547 T Iz R177152 F 13 05361920 T 14 0 3633339 F 15 0 5164660 T 16 0 4362987 T 17 1 2920592 T 18 0 831435 T 19 0 6188006 T 20 1 4910625 T gt mode my df b 1 legical You can provide a character vector as the row names argument to data frame Just make sure it is the same length as the da
299. ttom of the box extend to the extreme values of the data or a distance 1 5 x IQD from the center whichever is less For data having a Gaussian distribution approximately 99 3 of the data falls inside the whiskers Data points that fall outside the whiskers may be outliers and so they are indicated by horizontal lines In our example the two horizontal lines at the top of the graph represent outliers Boxplots provide a very powerful method for visualizing the rough distributional shape of two or more samples of data For example to compare the distributions of the New Jersey lottery payoffs lottery payoff lottery2 payoff and lottery3 payoff in each of three different years use gt boxplot lottery payoff lottery2 payoff lottery3 payoff You can modify the style of your boxplots and many other features as well using arguments to boxp1ot see the help file for complete details A histogram shows the number of data points that fall in each of a number of intervals You create histograms in S PLUS with the hist function gt hist corm rain Notice that a histogram gives you an indication of the relative density of the data points along the horizontal axis For example there are 10 data points in the interval 8 to 10 and only one data point in the interval 14 to 16 The histogram produced by the above simple use of hist always spans the range of the data i e the smallest data value falls in the leftmost interval and the largest da
300. uk General Becker R A Chambers J M and Wilks A R 1988 The New S Language Wadsworth amp Brooks Cole Pacific Grove CA Krause A and Olson M 1997 The Basics of S and S PLUS Springer Verlag New York Spector P 1994 An Introduction to S and S PLUS Duxbury Press Belmont CA Data Analysis Bruce A and Gao H Y 1996 Applied Wavelet Analysis with S PLUS Springer Verlag New York Chambers J M and Hastie T J 1992 Statistical Models in S Wadsworth amp Brooks Cole Pacific Grove CA Everitt B 1994 A Handbook of Statistical Analyses Using S PLUS Chapman amp Hall London Hardle W 1991 Smoothing Techniques with Implementation in S Springer Verlag New York Kaluzny S P Vega S C Cardoso T P and Shelly A A 1997 S SPATIALSTATS Users Manual Springer Verlag New York Marazzi A 1992 Algorithms Routines and S Functions for Robust Statistics Wadsworth amp Brooks Cole Pacific Grove CA HELP SUPPORT AND LEARNING RESOURCES Venables W N and Ripley B D 1994 Modern Applied Statistics with S PLUS Springer Verlag New York Graphical Techniques Chambers J M Cleveland W S Kleiner B and Tukey P A 1983 Graphical Techniques for Data Analysis Duxbury Press Belmont CA Cleveland W S 1993 Visualizing Data Hobart Press Summit NJ Cleveland W S 1985 The Elements of Graphing Data Hobart Press Summit NJ
301. unction cloud Mileage Weight Disp data fuel frame screen 1ist z 30 x 60 y 0 xlab W ylab pe zi ab M contourplot function contourplot dataz datax datay data gauss aspect 1 at seq 1 9 by 2 data argument see aspect example densityplot function densityplot Mileage data fuel frame aspect 1 2 width 5 dev cur function dev cur dev list function dev list dev off function dev off dev set function dev set which 2 dotplot function dotplot names mileage means log mileage means base 2 aspect 1 cex 1 25 SUMMARY OF TRELLIS FUNCTIONS AND ARGUMENTS Table 7 1 An alphabetical guide to Trellis Graphics Statement Purpose Example equal count function GIVEN E lt equal count ethanol E number 9 overlap 1 4 formula argument xyplot formula gas NOx gas E histogram function histogram Mileage data fuel frame aspect 1 nint 10 intervals argument GIVEN E lt shingle ethanol E intervals cbind endpoints 6 endpoints 1 jitter argument stripplot Type Mileage data fuel frame Jitter TRVE aspect 1 key argument update barley plot key list points Rows trellis par get superpose symbol 1 2 text list levels barley year layout argument dotplot site yield year variety data barley layout c 2 5 2 levelplot function levelplot dataz datax datay data gauss aspect 1 cuts 6 levels function levels barley year main argument see aspect example make groups function
302. until the matrix is filled in If you provide more data than necessary to complete the matrix excess values are discarded If either ncol or nrow is provided but not both the missing argument is computed using the following relations e nrow the smallest integer equal to or greater than the number of values divided by the number of columns e ncol the smallest integer equal to or greater than the number of values divided by the number of rows Thus nrow and ncol are computed to create the smallest matrix from all the values when ncol or nrow is given individually By default the values are placed in the matrix column by column That is all the rows of the first column are filled then the rows of the second column are filled etc To fill the matrix row by row set the byrow argument to T For example gt matrix 1 12 nco1 3 byrow T eld 2 8 i fi 2 3 2e 4 5 6 35 7 8 g 4 10 JII The byrow argument is especially useful when reading in data from a text file that is arranged in a table The data are read in with scan row by row in this case so the byrow argument is used to place the values in a matrix correctly 83 CHAPTER 4 DATA OBJECTS Naming Rows and Columns 84 For a vector you saw that you could assign names to each value with the names function For matrices you can assign names to the rows and columns with the dimnames function To create a matrix with row and column names of your own
303. uperpose symbol argument trellis par get superpose symbol trellis args function trellis args trellis device function trellis device postscript onefile FALSE trellis par get function plot line lt trellis par get plot line 269 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS Table 7 1 An alphabetical guide to Trellis Graphics Statement Purpose Example trellis par set function trellis par set plot symbol plot symbol update function foo lt update foo main Dependence of NOx on E width argument see densityplot example wireframe function see screen example xlab argument see aspect example xlim argument Xlim lt range x xyplot function xyplot Mileage Weight data fuel frame aspect 1 ylab argument see aspect example ylim argument see scales example 270 WORKING WITH GRAPHICS DEVICES Printing Your Graphics Printing with PostScript Printers Printing with HP GL Pen Plotters Creating PDF Graphics Files Managing Files from Hard Copy Graphics Devices Using Graphics from a Function or Script Graphics Window Details Basic Terminology Available Colors Under X11 272 272 283 285 285 286 289 289 306 271 CHAPTER 8 WORKING WITH GRAPHICS DEVICES PRINTING YOUR GRAPHICS Printing with PostScript Printers 272 One important and widespread use of S PLUS is to produce camera ready graphics plots for technical reports and papers S PLUS supports two kinds of ha
304. ure the colors from the device using xgetrgb and specify the captured colors as the PostScript color scheme using ps options gt ps options colors xgetrgb colors background xgetrgb background 4 Start the postscript device using the postscript function gt postscript file colcorn ps 282 PRINTING YOUR GRAPHICS Printing with HP GL Pen Plotters 5 Plot the graphic the following commands produce a plot with three different colors gt plot corn rain corn yield type n gt points corn rain corn yield col 2 gt title main A plot with several colors col 3 6 Turn off the postscript device gt dev off The hpgl graphics device translates your S PLUS plotting commands into commands that can be read by pen plotters that accept the Hewlett Packard HP GL instruction set To start the hpg1 graphics device type gt hpgl file file where file is a file name specifying where to write the plotting commands When the hpg device is the current graphics device no graphics appear on your screen The following arguments may be supplied to the hpg1 function e width Determines the width of the x axis dimension in inches The default value is 10 e height Determines the height of the y axis dimension in inches The default value is 7 25 e ask Determines whether you are prompted by G0 prior to advancing to a new frame Possible values are TRUE a
305. use of the Print option is to create immediately a hard copy of the displayed graphic You can however specify a command such as the following to store the PostScript output in a named file cat gt mpyfile lt Here myfile is any desired file name However the printgraph function described in the next section provides a more convenient method for creating files of PostScript output To choose the Print option from the graphics device 1 Move the pointer to the button labeled Graph 2 Click and a menu appears 273 CHAPTER 8 WORKING WITH GRAPHICS DEVICES Using the printgraph Function Using the postscript Function 274 3 Drag the pointer to the Print option then release the mouse button A message appears in the footer of the graphics window telling you that the specified command has been executed In its simplest use the printgraph function is just another way to produce immediate hard copies of graphics created on windowing or other graphics devices Many graphics devices for use with graphics terminals and emulators including tek14 support the printgraph function The default behavior of the printgraph function is determined by a number of environment variables These are discussed in the section Environment Variables and printgraph page 322 To make printgraph produce PostScript output you should make sure that the environment variable S_PRINTGRAPH_METHOD is set to postscript or call printgraph direct
306. uter margin of multiple figures because the usr coordinates are the coordinates from the most recent plot created in the figure region By default mtext rotates text to be parallel to the axis To control the orientation of text in the margins use the srt argument along with the at argument For example the following command displays upside down text in the top figure margin gt mtext Title with srt 180 line 2 at 5 srt 180 Warning If you supply mtext with the srt argument you must supply the at argument otherwise srt will be ignored 179 CHAPTER 6 TRADITIONAL GRAPHICS CONTROLLING AXES Enabling and Disabling Axes Controlling Tick Marks and Axis Labels 180 The high level graphics commands described in the section Getting Started with Simple Plots page 122 create complete graphics including labeled axes Often however you need to create graphics with axes different from those provided by S PLUS You may need to specify a different choice of axes or different tick marks or different plotting characteristics This section describes how to control these characteristics Whether axes appear on a plot is determined by the high level graphics parameter axes which takes a logical value If axes FALSE no axes are drawn on the plot If axes are not drawn on the original plot they can be added afterward with one or more calls to the axis function You can use plot with axes F together wi
307. values a Dim slot to hold the dimensions vector and an optional Dimnames slot to hold the row and column and so on names The most important slot for a matrix data object is the dimension or Dim slot Use the dim function to display the dimension For example gt my mat lt matrix 1 8 4 2 21 CHAPTER 2 GETTING STARTED Data Frame Objects List Objects 22 gt dim my mat 1 4 2 shows that the dimension of the matrix my mat that you created is 4 rows by 2 columns Matrix objects also have length and mode which correspond to the length and mode of the vector in the Data slot A matrix object has a single mode This means that you cannot create for example a two column matrix with one column of numeric data and one column of logical or character data For that you must use a data frame S PLUS also contains an object which is very similar to a matrix object called a data frame object A data frame object consists of rows and columns of data just like a matrix object except that the columns can be of different modes The following object baseball df is a data frame object consisting of some baseball data from the 1988 season The first two columns are factor objects codes for names of players the next two columns are numeric and the last column is logical gt baseball df bat ID pitch ID event typ outs play err play rl pettg001 clemr001 2 1 F r2 whit1001 clemr001 14 0 F r3 evand001 clemr001 3 1 F r
308. variate GARCH modeling of financial time series data S SPATIALSTATS provides a comprehensive set of tools for statistical analysis of spatial data including tools for hexagonal binning variogram estimation and kriging autoregressive and moving average modeling and testing for spatial randomness S WAVELETS offers a visual data analysis approach to a whole range of signal processing techniques such as wavelet packets local cosine analysis and matching pursuits GETTING HELP StatLib S News Training Courses S Press StatLib is a system for distributing statistical software data sets and information by electronic mail FTP and the World Wide Web It contains a wealth of user contributed S PLUS functions e To access StatLib by FTP open a connection to lib stat emu edu Login as anonymous and send your e mail address as your password The FAQ frequently asked questions is in S FAQ or in HTML format at http www stat math ethz ch S FAQ To access StatLib with a web browser visit http lib stat cmu edu e To access StatLib by e mail send the message send index from S to statlib lib stat emu edu You can then request any item in StatLib with the request send item from S where item is the name of the item S news is an electronic mailing list by which S PLUS users can ask questions and share information with other users To get on this list send a message with message body subscribe to s news request
309. ver you attach a directory with more restrictive naming conventions than it is expecting Hint You will not lose data if when creating data objects on a file system with more restrictive naming conventions than your version of S PLUS was compiled for you restrict yourself to names that are unique under the more restrictive conventions However your file system may truncate or otherwise modify the object name To recall the object you must refer to it by its modified name For example if you create the object aov devel smal1 on a file system with a 14 character limit you should look for it in subsequent S PLUS sessions with the 14 character name aov devel smal The use of periods often enhances the readability of similar data set names as in the following data 1 data 2 data 3 19 CHAPTER 2 GETTING STARTED Warning function S PLUS Vector Data Objects Matrix Data Objects 20 You should not choose names that coincide with the names of S PLUS functions If you store a function with the same name as a built in S PLUS function access to the S PLUS function is temporarily prevented until you remove or rename the object you created S PLUS warns you when you have masked access to a function with a newly created function To obtain a list of objects that mask other objects use the masked At least seven S PLUS functions have single character names C D c I q S and t You should
310. versity Farm e e Duluth e e Grand Rapids e e No 457 No 457 1932 1931 Waseca Crookston e e Mortis e University Farm e e Duluth e e Grand Rapids e e Glabron Glabron 1932 1981 Waseca e Crookston e e Morris e e University Farm e Duluth Grand Rapids e Peatland Peatland 1932 1931 Waseca Crookston e e Mortis e e University Farm e e Duluth e e Grand Rapids e e yield Figure 7 24 The second page of the multipage plot of the barley data For the barley data the explanatory variables are categorical The data set for each is a factor Since there are only two years the year variable is treated as a factor rather than a numeric vector For each factor consider the median yield for each level For example for variety the level medians are gt variety medians lt tapply barley yield barley variety median 235 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS reorder factor Controlling the Pages of a Multipage Display 236 gt variety medians Svansota No 462 Manchuria No 475 Velvet Peatland 28 55 30 45 30 96667 31 06667 32 16 32 38334 Glabron No 457 Wisconsin No 38 Trebi 32 4 33 96666 36 95 39 2 The barley displays in figure 7 22 to figure 7 24 use an important display method main effects ordering of levels This greatly enhances our ability to perceive effects Consider figure 7 22 On each panel the varieties are ordered f
311. w Low Low Low Low Low High 41 Low High High Low Low High Low Low Low Low 95 CHAPTER 4 DATA OBJECTS 96 DATA FRAMES The Benefits of Data Frames 98 Creating Data Frames 99 Combining Data Frames 104 Combining Data Frames by Column 104 Combining Data Frames by Row 106 Merging Data Frames 107 Applying Functions to Subsets of a Data Frame 110 Adding New Classes of Variables to Data Frames 116 Data frames are data objects designed primarily for data analysis and modeling You can think of them as generalized matrices generalized in a way different from the way arrays generalize matrices Arrays generalize the dimensional aspect of a matrix data frames generalize the mode aspect of a matrix Matrices can be of only one mode for example logical numeric complex character Data frames however allow you to mix modes from column to column For example you could have a column of character values a column of numeric values a column of categorical values and a column of logical values Each column of a data frame corresponds to a particular variable each row corresponds to a single case or set of observations 97 CHAPTER 5 DATA FRAMES THE BENEFITS OF DATA FRAMES 98 The main benefit of a data frame is that it allows you to mix data of different types into a single object in preparation for analysis and modeling The idea of a data frame is to group data by variables columns regardless of their
312. we create a matrix the matrix will have the row and column labels specified by the dimnames argument IMPORTING AND EDITING DATA Extracting Subsets of Data Subsetting From Vectors gt matrix 1 12 nrow 3 dimnames list c I II III OCR RZ TRS ROT ID x1 x2 x3 x4 1 2 4 7 10 Ik 2 5 8 11 iil 3 912 You can assign row and column names to existing matrices using the dimnames function which works much like the names function for vectors gt y lt matrix 1 12 nrow 3 gt dimmnamesty lt list ec i ITP 111 3 et x1 KZ PKS MA F xl x2 x3 x4 i i 4 7 10 fi 2 amp gn lit amp 9 2 Another powerful feature of the S PLUS language is the capability to extract subsets of data for viewing or for further manipulation The examples in this introductory chapter illustrate subset extraction for vectors and matrices However similar techniques can be used to extract subsets of data from other S PLUs data objects Suppose you create a vector of length 5 consisting of the integers 5 14 8 9 5 as follows gt x lt gt 5 14 8 9 5 gt xX fi 5 14 6 9 5 To display a single element of this vector just type the vector s name followed by the elements index within characters For example type x 1 to display the first element and x 4 to display the fourth element gt RL 1 5 gt x 4 1 9 37 CHAPTER 2 GETTING STARTED Subsetting From Matrix Data Obj
313. wed points containing missing values are not plotted stars Matrix with 7 columns where 7 is the number of points to a star The matrix must be scaled from 0 to 1 thermometers Matrix with 3 or 4 columns The first two columns give the widths and heights of the rectangular thermometer symbols If the matrix has 3 columns the third column gives the fraction of the symbol that is filled from the bottom up If the matrix has 4 columns the third and fourth columns give the fractions of the rectangle between which it is filled boxplots Matrix with 5 columns of positive numbers giving the width and height of the box the amount to extend on the top and bottom and the fraction of the way up the box to draw the median line Note Missing values are allowed points containing missing values are not plotted except in stars where they are treated as zeros Custom The following functions provide a simple way to add your own symbols to a Symbols plot The make symbol function facilitates creating a symbol gt make symbol lt function on exit par p p lt par pty s plot 0 0 type n xlim c 0 5 0 5 ylimec 0 5 0 5 195 CHAPTER 6 TRADITIONAL GRAPHICS 196 cat Now draw your symbol using the mouse Continue string clicking at corners n locator type 1 This returns a list with components named x and y The Continue string prompt is given because there was a new line while in the
314. wee ee se cy e e ee e r eee amp e o e e e e e e e o e a e e e e T T T T T T T 0 20 40 60 80 100 120 Index Figure 6 1 Scatter plot of a single vector The data are plotted as a set of isolated points For each plotted point the vertical axis location gives the data value and the horizontal axis location gives the observation number or index 122 GETTING STARTED WITH SIMPLE PLOTS If you have a vector x which is complex plot plots the real part of x on the horizontal axis and the imaginary part on the vertical axis For example a set of points on the unit circle in the complex plane can be plotted as follows gt unit circle lt complex arg seq pi pi length 20 gt plot unit circle 1 0 e wo o T 2 Se eo E e Te S e a T T T T T 1 0 0 5 0 0 0 5 1 0 Re unit circle Figure 6 2 Scatter plot of a single complex vector Plotting Mathematical Functions sufficiently dense set of plotting points You can obtain smooth solid line plots of mathematical functions with plot by using the optional argument type 1 to produce a plot with connected solid line segments rather than isolated points provided you choose a 123 CHAPTER 6 TRADITIONAL GRAPHICS For example to plot the mathematical function in the equation 0 10 y f x cos 2x 6 1 for x in the range 0 20 creat
315. with a sequential number of the same number of digits in the generated file names For example if you have a project involving the halibut data and you know your project will use fewer than 1000 graphics files you can set the tempfile option as follows to use the name of your data set gt ps options tempfile halibut HHF ps Specifying a Printer Command What happens to the file after it is created is determined by the command option The command option is a character string specifying the UNIX command used to print a graphic If file is specified and is neither a template nor an empty string the command option must be activated by 279 CHAPTER 8 WORKING WITH GRAPHICS DEVICES 280 some user action either choosing the Print option from a windowing graphics device specifying print TRUE in the printgraph function or specifying print it TRUE in the postscript function The default for command is the value of the environment variable S_POSTSCRIPT_PRINT_COMMAND Specifying Plot Orientation and Size You specify the plot orientation with the horizontal option TRUE for landscape mode x axis along long edge of paper FALSE for portrait Most figures embedded in documents should be created in portrait mode because that is the usual orientation of documents The default is the orientation specified by the S_PRINT_ORIENTATION which by default is set to TRUE that is landscape mode If you specify an orientation with your g
316. wo levels The default labels for the two scales are the names of the levels o 35 4 H o o o 30 7 o T 5 o O e o 25 4 a T T T 25 30 35 Compact Figure 7 8 gqplot GENERAL DISPLAY FUNCTIONS qqmath Normal probability plots or normal qqplots are the single most powerful tool for determining if the distribution of a set of measurements is well approximated by the normal distribution Figure 7 9 is a normal probability plot of the mileages for small cars gt qqmath Mileage data fuel frame subset Type Smal1 That is the ordered data are graphed against quantiles of the standard normal distribution The formula for qqmath is used in a way unlike any of the previous examples Only one data object appears in the formula to the right of the because this graphical method utilizes only one data object If we used gt qqmath Mileage data fuel frame subset Type Smal1 aspect 1 distribution qexp the result would be an exponential probability plot Note that the name of the function appears as the default label on the horizontal scale of the plot 36 E 32 4 o o L Mileage 30 a 26 4 o o es qnorm Figure 7 9 Normal probability plot 215 CHAPTER 7 TRADITIONAL TRELLIS GRAPHICS dotplot 216 The dot plot which displays data with labels provides highly accurate visual decodings typically far more accurate than other methods for dis
317. wubios wustl edu To get off this list send a message with body unsubscribe to the same address Once enrolled on the list you will begin to receive e mail To send a message to the S news mailing list send it to s news wubios wustl edu Do zor send subscription requests to the full list use the s news request address shown above MathSoft Educational Services offers a variety of courses designed to quickly make you efficient and effective at analyzing data with S PLUS The courses are taught by professional statisticians and leaders in statistical fields Courses feature a hands on approach to learning dividing class time between lecture and online exercises All participants receive the educational materials used in the course including lecture notes supplementary materials and exercise data on diskette S Press is a free quarterly newsletter about S PLUS mailed to primary users of S PLUS S Press features stories by S PLUS users in industry and academia a technical support column and provides new product announcements and other information from MathSoft CHAPTER WELCOME TO S Plus Technical Support Books on Data Analysis Using S PLUS In North America to contact technical support call 206 283 8802 ext 235 or fax to 206 283 6310 or send e mail to support statsci com In Europe Asia Australia Africa and South America call 44 1276 475350 or fax to 44 1276 451224 or email to shelp mathsoft co
318. y be affected or impaired thereby CONTENTS OVERVIEW Introduction Chapter Welcome to S PLus Chapter 2 Getting Started Chapter 3 Importing and Exporting Data Data Structures Chapter 4 Data Objects Chapter 5 Data Frames Graphics Chapter 6 Traditional Graphics Chapter 7 Traditional Trellis Graphics Chapter 8 Working With Graphics Devices Advanced Topics Chapter 9 Customizing Your S PLUS Session Index 53 75 97 119 201 271 311 329 CONTENTS OVERIVEW vill CONTENTS Chapter Welcome to S PLus Introduction Help Support and Learning Resources Getting Help Chapter 2 Getting Started Running S PLUS Starting S PLUS and Entering Expressions Quitting S PLUS Basic Syntax and Conventions Command Line Editing Getting Help in S PLUS Reading S PLUS Help Files S PLUS Language Basics Data Objects Managing Data Objects Functions Operators Optional Arguments to Functions Access to UNIX Importing and Editing Data Reading a Data File Editing Data Built in Data Sets Quick Hard Copy Adding Row And Column Names Extracting Subsets of Data Graphics in S PLUS Making Plots Quick Hard Copy Using the Graphics Window Multiple Plot Layout pi NYOVvVvoO0N NN ARAARA BBR GO GH OG DH OH WO ON NHN m m Mod RN DN ON RD DDN HH DN DH CO DOA WN CONTENTS Statistics Summary Statistics Hypothesis Testing Statistical Models Chapter 3 Importing and Exporting Data Importing Data Files Setting t
319. y of the delimiters specified in the Delimiters field Specify one column name for each imported column for example Apples Oranges Pears You can use an asterisk to denote a missing name for example Apples Pears An integer denoting which column is to be used as the row names for the resulting data frame If specified the column of row names is dropped from the resulting data frame A single character string specifying the format for formatted ASCII text files type FASCII See notes on Importing ASCII Files See the section Setting the Import Filter Starting column in source from 1 to n For example if you specify 5 S PLUS reads the columns beginning with column 5 and places them in the new data frame beginning at the Target Start Column Spreadsheet style letters for example A AB can be used to specify the start and end columns to import End column in source The default 1 means to read to the last column IMPORTING DATA FILES Table 3 1 Arguments to importData Argument Required Description startRow endRow pageNumber colNameRow server user password database table stringsAsFactors sortFactorLevels Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Starting row from range in source Spreadsheets only For example if you specify row 10 S PLUS reads the row
320. yi mystuff2 ps PS Adobe 3 0 melitle S PLUS Graphics eCreator S PLUS For Rich Calaway x240 meCreationDate Thu Jul 30 21 45 21 1992 BoundingBox 20 11 592 781 Pages atend 275 CHAPTER 8 WORKING WITH GRAPHICS DEVICES Warning If you want to both print the graphic and keep the named PostScript file be sure that the UNIX print command does not delete the printed file For example on some computers the default value of ps options command which is determined by the environment variable S_POSTSCRIPT_PRINT_COMMAND is 1pr r h where the r flag causes the printed file to be deleted The following call to postscript replaces this default with a command that does not delete the file gt postscript file mystuff2 ps print it T command Ipr h Using postscript directly can be cumbersome since you dont get immediate feedback on graphics produced incrementally You can however build a graphics function incrementally using a windowing graphics device or graphics terminal Then when the graphics function works well on screen start a postscript device and call your graphics function Such an approach will result in fewer hard copies for the recycling bin For example consider the complicated graphic constructed in section Adding Special Symbols to Plots page 206 We can combine the commands of that section into a single function as follows gt usasymb plot function select lt c
321. you want numeric summaries of each variable computed for each level use by when you want to use all the data to construct a model for each level The aggregate function allows you to partition a data frame or a matrix by one or more grouping vectors and then apply a function to the resulting columns The function must be one that returns a single value such as mean or sum You can also use aggregate to partition a time series univariate or multivariate by frequency and apply a summary function to the resulting time series For data frames aggregate returns a data frame with a factor variable column for each group or level in the index vector and a column of numeric values resulting from applying the specified function to the subgroups for each variable in the original data frame gt aggregate state x7 7 c Population Area by state division FUN sum Group Population Area 1 New England 12187 6295 2 Middle Atlantic 37269 100318 South Atlantic 32946 266909 4 East South Central 13516 178982 5 West South Central 20868 427791 6 East North Central 40945 244101 7 West North Central 16691 507723 8 Mountain 9625 856047 9 Pacific 28274 891972 APPLYING FUNCTIONS TO SUBSETS OF A DATA FRAME Warning For most numeric summaries all variables in the data frame must be numeric Thus if we attempt to repeat the above example with the kyphosis data using kyphosis as the by variable we get an error gt aggrega
322. ype in an expression at the S PLUS gt prompt and S PLUS responds Among the simplest S PLUS expressions are arithmetic expressions such as the following gt Bri 1 10 X orel 1 63 The symbols and represent S PLUS operators for addition and multiplication respectively In addition to the usual arithmetic and logical operators S PLUS has special operators for special purposes For example the colon operator is used to obtain sequences ey Lae fi 12345 6 7 RUNNING PLus Quitting S PLus Basic Syntax and Conventions Spaces The 1 in each of the output lines is the index of the first S PLUS response on the line of S PLUS output If S PLUS is responding with a long vector of results each line is preceded by the index of the first response of that line The most common S PLUS expression is the function call An example of a function in S PLUS is the c function used for combining comma separated lists of items into a single item Functions calls are always followed by a pair of parentheses with or without any arguments in the parentheses 7 03 4 1 56 1 3416 In all of our examples to this point S PLUS has simply returned a value To reuse the value of an S PLUS expression you must assign it with the lt operator For example to assign the above expression to an S PLUS object named newvec youd type the following gt newvec lt c 3 4 1 6 S PLUS creates the
Download Pdf Manuals
Related Search
Related Contents
Veilux RMS User`s Manual BS1000 messenger to php web server GE 3504502P327 User's Manual Guide du tri sélectif complet - Mairie de Rueil Mode d`emploi 406J-A - FMミニ放送 Bedienungsanleitung Espresso Perfect Crema Plus Swingline CombBind Fax-Client-Handbuch Copyright © All rights reserved.