GRAP A Language for Typesetting Graphs Tutorial and User Manual

Contents

1. X

This program re-expresses the x axis with GRAP arithmetic and uses an if statement to graph only part of the data file. It produces

[Figure: Population in Millions against the re-expressed x axis; bottom ticks at 1800, 1850, 1900]

The EQN "space 0" clause is necessary to keep EQN from adding extra space that would interfere with positions computed by GRAP (see Section 4).

The file army.d contains four related time series describing the United States Army.

    40 16 9 249 1 42 43 44 45 83 190 521 692 TED 80 12 2867 36 6358 47 7144 62 7283 9 606 l 55 71 90 67

The first field is the year; the next four fields give the number of male officers, female officers, enlisted males and enlisted females, each in thousands. (Actually, there were no female enlisted personnel in the Army until 1943; the value 1 in 1940 and 1942 is just a placeholder, since GRAP has no mechanism for handling missing data.) The following GRAP program draws the four series with four different sets of draw and next commands.

    coord x 38,85 y .8,10000 log y
    label bot "U.S. Army Personnel"
    label left "Thousands" left .3
    draw of solid     # Officers Female
    draw ef dashed    # Enlisted Female
    draw om dotted    # Officers Male
    draw em solid     # Enlisted Male
    copy "army.d" thru X
        next of at $1,$3
        next ef at $1,$5
        next om at $1,$2
        next em at $1,$4
    X
    copy thru % "$1 $2" size -3 at 60,$3 % un
2. [Figure: direction field; bottom label "Direction field is y' = x^2/y"]

Programmers familiar with floating point arithmetic may be surprised that the above graph is correct. Because of roundoff error, iteration from 0 to 1 by .05 usually produces the values 0, .05, .10, ..., .95. GRAP uses a "fuzzy test" in the for statement to avoid that problem, which may in turn introduce other problems. Such problems may be avoided by iterating over an integer range and incrementing a non-integer value within the loop.

Most of the data we have seen so far is inherently two- (or more) dimensional. As an example of one-dimensional data, we will return to the populations of the fifty states, which is the third field in the file states.d introduced on page 9; the file is sorted in increasing order of population. Our first graph takes the most space, but it also gives the most information.

    frame ht 4 wid 5
    label left "Rank in Population"
    label bot "Population (Millions)"
    label top "$log sub 2$ (Population)"
    coord x .3,30 y 0,51 log x
    define L % exp($1*log(2))/1e6 "$1" %
    ticks bot out at .5, 1, 2, 5, 10, 20
    ticks left out from 10 to 50 by 10
    ticks top out at L(19), L(20), L(21), L(22), L(23), L(24)
    thisy = 50
    copy "states.d" thru X
        "$1" size -4 at ($3/1e6, thisy)
        thisy = thisy-1
    X
    line dotted from 15.3,1 to .515,50

The L macro (for Label) with input parameter X evaluates to the number 2^X/1,000,000 followed by the string "X"; the ticks command expe
3.
    .ft H
    .ps -2
    .vs -2
    frame ht 2.5 wid 2.5
    label left "Weight" "(Pounds)" left .3
    label bot "Gallons per Mile"
    coord x 0,.10 y 0,5000
    ticks left from 0 to 5000 by 1000
    ticks bot from 0 to .10 by .02
    copy "cars.d" thru X circle at (1/$1, $2) X
    .vs +2
    .ps +2
    .ft

GRAP supports logarithmic re-expression of data with the log clause in the coord statement; any other re-expression of data must be done with GRAP arithmetic, as above.

[Figure: Weight (Pounds), 0 to 5000, against Gallons per Mile, 0 to 0.1]

This graph shows that gallons per mile is roughly proportional to weight. The two outliers near 4000 pounds are the Cadillac Seville and the Oldsmobile 98.

In Visual Display of Quantitative Information, Tufte proposes the dot-dash plot as a means for maximizing data ink (showing the two-dimensional distribution and the two one-dimensional marginal distributions) while minimizing what he calls chart junk: ink wasted on borders and non-data labels. His preference is easy to express in GRAP:

    frame invis ht 3 wid 3
    coord x 0,.10 y 0,5000
    copy "cars.d" thru X
        tx = 1/$1; ty = $2
        bullet at tx,ty
        tick bot at tx ""
        tick left at ty ""
    X

Although visually attractive, we do not find the resulting graph as useful for interpreting the data.
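The reciprocal re-expression used in the cars.d programs (plotting 1/$1 rather than $1) can be tried outside GRAP. The following is a minimal Python sketch, not part of GRAP; the function name `gallons_per_mile` is hypothetical, and the five rows are the sample lines of cars.d listed in this manual.

```python
# Sample (miles per gallon, weight in pounds) rows from cars.d as listed in the manual.
CARS_SAMPLE = [(22, 2930), (17, 3350), (22, 2640), (17, 2830), (23, 2070)]


def gallons_per_mile(rows):
    """Re-express mileage as gallons per mile, mirroring GRAP's (1/$1, $2)."""
    return [(1.0 / mpg, weight) for mpg, weight in rows]


points = gallons_per_mile(CARS_SAMPLE)
for gpm, weight in points:
    # Every re-expressed x value should fall inside the coord range 0 to .10.
    print(f"{gpm:.4f} gallons per mile, {weight} pounds")
```

The re-expressed x values land between 0 and .10, which is why the coord statement uses that range instead of the original miles-per-gallon range.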
4. a synonym for log x log y.

    label left "Representatives" "to Congress" left .3
    label bot "Population (Millions)"
    coord x .3,30 y .8,50 log log
    define PlotState X circle at ($3/1e6, $2) X
    copy "states.d" thru PlotState

Although the population is given in persons, the PlotState macro plots the population in millions by dividing the third input field by one million (written in exponential notation as 1e6, for 1x10^6).

[Figure: Representatives to Congress against Population (Millions), both on logarithmic scales]

Using circle as a plotting symbol displays overlapping points that are obscured when the data is plotted with bullets. The representation of a state is roughly proportional to its population, except in the very small states.

Our next plot will use the state's rank in population as the x coordinate and two different y coordinates: population and number of representatives. We will use two coord commands to define the two coordinate systems pop and rep. We then explicitly give the coordinate system whenever we refer to a point, both in constructing axes and plotting data.

    frame ht 3 wid 3.5
    label left "Population in Millions" "(Plotted as \(bu)"
    label bot "Rank In Population" up .2
    label right "Representatives" "(Plotted as \(sq)"
    coord pop x 0,51 y .2,30 log y
    coord rep x 0,51 y .3,100 log y
    ticks left out at pop .3, 1, 3, 10, 30
    ticks bot out at pop 1, 50
    ticks right out at r
5. they are interpreted as by printf. If no str is supplied, the tick labels will be the values of the expressions. If the by clause is omitted, steps are of size 1. If the by expression is preceded by one of +, -, * or /, the step is scaled by that operator, e.g., *10 means that each step is 10 times the previous one.

The grid statement produces grid lines along (i.e., perpendicular to) the named side.

    grid: grid side linedesc shift tick-locations

Grids are labeled by the same mechanism as ticks.

Plot statements place text at a point.

    plot: strlist at point
          plot expr str at point
    point: name expr, expr

As in the label statement, the string list may contain position and size modifiers. The plot statement uses the optional format string as in the C printf statement; it may contain a %f or %g. The optional name refers to a coordinate system.

The line statement draws a line or arrow from here to there:

    line: line arrow from point to point linedesc

The circle statement draws a circle:

    circle: circle at point radius expr

The radius is in inches; the default size is small.

The draw statement defines a sequence of lines:

    draw: draw name linedesc str

Subsequent data for the named sequence will be plotted as a line of the specified style, with the optional str plotted at each point.

The next statement continues a sequence:

    next: next name at point linedesc

If a line description is specif
6. through them. We will use the standard least squares regression, in which

$\text{slope} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

where the summations range over all n x and y values in the data set, and the y-intercept is

$\dfrac{\sum y - \text{slope} \times \sum x}{n}$

The following GRAP program boldly (and rather foolishly) implements that formula.

    label left "Heights in Feet" "(Median and" "fifth percentiles)"
    label bot "Heights of Boys in U.S., ages 2 to 18"
    cmpft = 30.48    # Centimeters per foot
    minx = 1e12; maxx = -1e12
    n = sigx = sigx2 = sigy = sigxy = 0
    copy "boyhts.d" thru X
        line from $1,$2/cmpft to $1,$4/cmpft
        ty = $3/cmpft
        bullet at $1,ty
        n = n+1
        sigx = sigx+$1; sigx2 = sigx2+$1*$1
        sigy = sigy+ty; sigxy = sigxy+$1*ty
        minx = min(minx,$1); maxx = max(maxx,$1)
    X
    # Calculate least squares fit and draw it
    slope = (n*sigxy - sigx*sigy) / (n*sigx2 - sigx*sigx)
    inter = (sigy - slope*sigx) / n
    # print slope; print inter
    line from minx,slope*minx+inter to maxx,slope*maxx+inter

It plots the extreme (fifth) percentiles as a bar through the median, which is plotted as a bullet. All heights are converted to feet before plotting and calculating the regression line.

[Figure: Heights in Feet (Median and fifth percentiles) against Heights of Boys in U.S., ages 2 to 18]

GRAP print statements write on stderr as they are processed by GRAP; their single argument can be either an expression or a strin
7.
    52
    1972 51.08
    1976 49.29
    1980 48.88

To add these times to the graph, we use

    frame invis ht 2 wid 3 left solid bot solid
    label left "Time" "(in seconds)" left .2
    label bot "Olympic 400 Meter Run: Winning Times"
    coord x 1894,1982 y 42,56
    ticks left out at 44 "44", 46 "", 48 "48", 50 "", 52 "52", 54 ""
    ticks bot in from 1900 to 1980 by 20
    draw solid
    copy "400mpairs.d"
    new dotted
    copy "400wpairs.d"
    "Women" size -3 at 1958,52
    "Men" size -3 at 1910,47

The file 400wpairs.d contains the times for the women's 400 meter race, which has been run only since 1964.

The new command tells GRAP to end the old curve and to start a new curve, which in this case will be drawn with a dotted line. Text is placed on the graph by commands of the form

    "string" at xvalue, yvalue

The size clauses following the quoted strings tell GRAP to shrink the characters by three points (absolute point sizes may also be specified). Strings are usually centered at the specified position, but can be adjusted by clauses to be illustrated shortly.

[Figure: Time (in seconds) against year, 1900 to 1980; Olympic 400 Meter Run: Winning Times, with Men and Women curves]

The file phone.d records the number of telephones in the United States from 1900 to 1970.

    00 01 02 03 04 w ANH F WOW OO W 70 120.2

Each line gives a year and the number of telephones present in that year (in millions), truncated to the nearest hundred thousand. The simple GRA
8. Tukey or Visual Display of Quantitative Information by Tufte.

Throughout this document we will show only the first five lines and the last line of data files; omitted lines are indicated by "...".

The graph shows the decrease in winning times from 54.2 seconds to 44.60 seconds. If the times are contained in the file 400mtimes.d, we could produce the same graph with the shorter program

    copy "400mtimes.d"

Writing copy "fname" in a GRAP program is equivalent to including the contents of file fname at that point in the file. (In the interests of compatibility with other programs, include is a synonym for copy.)

Each line in the file 400mpairs.d contains two numbers: the year of the Olympics and the winning time.

    1896 54.2
    1900 49.4
    1904 49.2
    1908 50.0
    1912 48.2
    ...
    1980 44.60

If we plot this data with the program

    copy "400mpairs.d"

the bottom (x) axis represents the year of the Olympics.

[Figure: winning times against year, 1900 to 1980]

The holes in x values reflect the fact that the 1916, 1940 and 1944 Olympics were cancelled due to war. Because the previous data (in 400mtimes.d) had just one number per line, GRAP viewed it as a time series and supplied x values of 1, 2, 3, ... before plotting the data as y values. The input to the second program has two values per line, so they are interpreted as (x,y) pairs.

Rather than a scatter plot o
9. accounted for more than half of the run time.

This file is difficult for GRAP to deal with: even though if statements would allow us to extract lines 2 through 11 of the file, we could not remove the leading _ from a routine name or access the last field in a record. We will therefore process it with the following AWK program.

    awk 'NR >= 2 && NR <= 11 { print $1, substr($NF, 2) }' <prof1.d >prof2.d

The program produces

    21.1 yylook
    11.2 yyparse
    9.3 _doprnt
    9.1 write
    5.9 input
    ...
    2.0 nextchar

We could even use the sh statement to execute the AWK program from within the GRAP program, which would make the latter entirely self-contained (see the reference manual for details). We will display the data with this program:

    ticks left off
    cury = 0
    barht = .7
    copy "prof2.d" thru X
        line from 0,cury to $1,cury
        line from $1,cury to $1,cury-barht
        line from 0,cury-barht to $1,cury-barht
        "  $2" ljust at 0,cury-barht/2
        cury = cury-1
    X
    line from 0,0 to 0,cury+1-barht
    bars = -cury
    frame invis ht bars/3 wid 3

Observe that the program knows nothing about the range of the data. It uses default ticks and a frame statement with a computed height to achieve total data independence.

[Figure: horizontal bar chart of yylook, yyparse, _doprnt, write, input, print, sprintf, unput, yylex and nextchar; bottom axis 0 to 20]

This bar chart highlights the fact that m
10. i l i lt n i EE 5110 print TY print graph A s s with Frame w at A Frame e 1 0 print frame ht 5 n wid 5 n print label bot xlabel i if i 1 print label left ylabel if logtext print coord print ticks off print copy fname thru X symtext at xfield i yfield x print

Running this program on the above description produces the following output, which is typically piped directly to GRAP.

    graph A
    frame ht 1.66667 wid 1.66667
    label bot "Male_Officers"
    label left "Enlisted_Men"
    coord log x log y
    ticks off
    copy "army.d" thru X "\s-3$1\s+3" at $2,$4 X
    graph A with .Frame.w at A.Frame.e +(.1,0)
    frame ht 1.66667 wid 1.66667
    label bot "Female_Officers"
    coord log x log y
    ticks off
    copy "army.d" thru X "\s-3$1\s+3" at $3,$4 X
    graph A with .Frame.w at A.Frame.e +(.1,0)
    frame ht 1.66667 wid 1.66667
    label bot "Enlisted_Women"
    coord log x log y
    ticks off
    copy "army.d" thru X "\s-3$1\s+3" at $5,$4 X

The generated program uses the PIC trick of re-using the same name (A) for several objects.

Although the program above is merely a toy, minicompilers can produce useful preprocessors for GRAP. The scatmat program, for instance, is a 90-line AWK program that reads a simple input language and produces as output a GRAP program to produce a scatterplot matrix, which is a handy graphical device for spotting pairwise
11. in states2.d:

    2 BwWNH EF O Non or 23 1

There are 12 states with population between 0 and 999,999, 5 states with population between 1,000,000 and 1,999,999, and so on. This GRAP program uses three line commands to plot each rectangle in the histogram.

    frame invis bot solid
    label bot "Populations (in Millions) of the 50 States"
    label left "Number of States" left .3
    ticks bot out from 0 to 25 by 5
    coord x 0,25 y 0,13
    copy "states2.d" thru X
        line from $1,0 to $1,$2
        line from $1,$2 to $1+1,$2
        line from $1+1,$2 to $1+1,0
    X

It produces

[Figure: histogram of Number of States against Populations (in Millions) of the 50 States, 0 to 25]

The same file can be plotted in a more attractive and more useful form by

    frame invis bot solid left solid
    label bot "Populations (in Millions) of the 50 States"
    label left "Number of States" left .3
    ticks bot out from 0 to 25 by 5
    coord x 0,25 y 0,13
    copy "states2.d" thru X
        line dotted from $1+.5,0 to $1+.5,$2
        "\(bu" size +3 at $1+.5,$2
    X

which produces one of Bill Cleveland's dot charts (or lolliplots).

[Figure: dot chart of Number of States against Populations (in Millions) of the 50 States, 0 to 25]

We use "\(bu", the TROFF character for a bullet, rather than the built-in string bullet to get a larger size. Other histograms are po
12. interactions among several variables. If GRAP lacks a feature you desire, consider building a simple preprocessor to provide it. An alternative is to define macros for the task; which approach is best depends strongly on the job you wish to accomplish.

The next graph uses iterators to make a graph without reading data from a file. Rather, its data is a function of two variables that describes a derivative field, and a function of one variable that describes one solution to the differential equation.

    frame ht 2.5 wid 2.5
    coord x 0,1 y 0,1
    label bot "Direction field is $y sup prime = x sup 2 / y$"
    label left "$y = sqrt { (2x sup 3 + 1)/3 }$" right .3
    ticks left in 0 at 0, 1
    ticks bot in 0 at 0, 1
    len = .04
    for tx from .01 to .91 by .1 do {
        for ty from .01 to .91 by .1 do {
            deriv = tx*tx/ty
            scale = len/sqrt(1+deriv*deriv)
            line from (tx,ty) to (tx+scale, ty+scale*deriv)
        }
    }
    draw solid
    for tx = 0 to 1 by .05 do {
        next at tx, sqrt((2*tx*tx*tx+1)/3)
    }

The left label uses EQN text between the $ delimiters. The variable scale ensures that all lines in the direction field are the same length. The in clauses in the ticks statements specify that the ticks go in zero inches, to avoid overprinting. The variables tx and ty are so named because x and y are reserved words (for the coord statement).
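The scaling step in the direction-field program can be checked numerically. The sketch below is a Python illustration rather than GRAP; the names `direction_segments` and `solution` are hypothetical. It assumes the derivative x^2/y and the solution sqrt((2x^3+1)/3) from the program above, and it uses integer loop counters so that floating-point roundoff cannot change the number of iterations.

```python
import math


def direction_segments(length=0.04, steps=10):
    """Return line segments of equal length along the field y' = x*x/y."""
    segments = []
    for i in range(steps):            # integer counters: .01, .11, ..., .91
        tx = 0.01 + 0.1 * i
        for j in range(steps):
            ty = 0.01 + 0.1 * j
            deriv = tx * tx / ty
            # Dividing by sqrt(1 + deriv^2) normalizes the vector (1, deriv).
            scale = length / math.sqrt(1 + deriv * deriv)
            segments.append(((tx, ty), (tx + scale, ty + scale * deriv)))
    return segments


def solution(x):
    """One solution of the equation: y = sqrt((2x^3 + 1)/3)."""
    return math.sqrt((2 * x ** 3 + 1) / 3)


segs = direction_segments()
lengths = [math.dist(p, q) for p, q in segs]
print(len(segs), min(lengths), max(lengths))
```

Every segment has the same length because the vector (1, deriv) is divided by its own norm before being multiplied by len; differentiating y^2 = (2x^3+1)/3 gives 2yy' = 2x^2, which is the direction field itself.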
13. this graph were to tell a story about American politics rather than to illustrate multiple coordinate systems, it should be redrawn with a single coordinate system.

Many graphs plot both observed data and a function that theoretically describes the data. There are many ways to draw a function in GRAP: a series of next commands is tedious but works, as does writing a simple program to write a data file that is subsequently read and plotted by the GRAP program. The for statement often provides a better solution. This GRAP program

    frame ht 1 wid 3
    draw solid
    pi = atan2(0,-1)
    for i from 0 to 2*pi by .1 do { next at i, sin(i) }

produces

[Figure: one period of a sine curve]

The for statement uses the same syntax as the ticks statement, but the from keyword can be replaced by "=", which will look more familiar to programmers. It varies the index variable over the specified range and for each value executes all statements inside the delimiter characters, which use the same rules as macro delimiters. It is, of course, useful for many tasks beyond plotting functions.

The if statement provides a simple mechanism for conditional execution. If a file contains data on both cities and states, and lines describing states have "S" in the first field, it could be plotted by statements like

    if "$1" == "S" then {
        PlotState($2,$3,$4)
    } else {
        PlotCity($2,$3,$4,$5,$6)
    }

The else clause is optional; delimit
14. December 1984

AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974

Computing Science Technical Report No. 114

GRAP A Language for Typesetting Graphs Tutorial and User Manual

Jon L. Bentley
Brian W. Kernighan

ABSTRACT

GRAP is a language for describing plots of data. This graph of the 1984 age distribution in the United States

[Figure: Population (in millions), 0 to 5, against 1984 Age, 0 to 80]

is produced by the GRAP commands

    coord x 0,89 y 0,5
    label left "Population" "(in millions)"
    label bottom "1984 Age"
    draw solid
    copy "agepop.d"

Each line in the data file agepop.d contains an age and the number of Americans of that age alive in 1984; the file is sorted by age.

The GRAP preprocessor works with PIC and TROFF. Most of its input is passed through untouched, but statements between .G1 and .G2 are translated into PIC commands that draw graphs.

1. Introduction

GRAP is a language for describing graphical displays of data. It provides such services as automatic scaling and labeling of axes, and for state
15. P program

    copy "phone.d"

produces the simple graph

[Figure: number of telephones (0 to 100) against year (0 to 60), plotted as unconnected dots]

The number of telephones appears to grow exponentially; to study that we will plot the data with a logarithmic y-axis by adding log y to the coord command. We will also add cosmetic changes of labels, more ticks, and a solid line to replace the unconnected dots.

    label left "Millions of" "Telephones" "(log scale)" left .5
    coord x 0,70 y 1,130 log y
    ticks left out at 1, 2, 5, 10, 20, 50, 100
    ticks bot out at 0 "1900", 70 "1970"
    ticks bot out from 10 to 60 by 10 "'%g"
    draw solid
    copy "phone.d"

The third ticks command provides a string that is used to print the tick labels. C programmers will recognize it as a printf format string; others may view the %g as the place to put the number and anything else (in this case just an apostrophe) as literal text to appear in the labels. To suppress labels, use the empty format string (""). The program produces

[Figure: Millions of Telephones (log scale), 1 to 100, against year, 1900 to 1970]

The number of telephones grew rapidly in the first decade of this century, and then settled down to an exponential growth rate upset only by a decrease in the Great Depression and a post-war growth spurt to return the curve to its pre-Depression line.

Our presentation so far has been to s
16. a period that are not numbers are passed through literally, under the assumption that they are TROFF commands.

The graph statement

    graph: graph Picname pic-text

defines a new graph named Picname, resetting all coordinate systems. If any graph commands are used in a GRAP program, then the statement after the .G1 must be a graph command. The pic-text can be used to position this graph relative to previous graphs by referring to their Frames, as in

    graph First
    ...
    graph Second with .Frame.w at First.Frame.e + (0.1,0)

Macros and expressions in pic-text are not evaluated. Picnames must begin with a capital letter to satisfy PIC syntax.

The print statement

    print: print expr str

writes on stderr as GRAP processes its input; it is sometimes useful for debugging.

Many reserved words have synonyms, such as thru for through, tick for ticks, and bot for bottom.

The # introduces a comment, which ends at the end of the line. Statements may be continued over several lines by preceding each newline with a backslash character. Multiple statements may appear on a single line separated by semicolons. GRAP ignores any line that is entirely blank, including those processed by copy thru commands.

When GRAP is first executed it reads standard macro definitions from the file /usr/lib/grap.defines. The definitions include bullet, plus, box, star, dot, times, htick, vtick, square, and delta.

Summary of GRAP Commands

In the f
17.
    copy "400mpairs.d"

The y-axis now ranges from 42 to 56 seconds (a little more than before), and the x-axis from 1894 to 1982 (a little less).

[Figure: Time (in seconds) against year, 1900 to 1980; Olympic 400 Meter Run: Winning Times]

The ticks in the preceding graphs were generated by GRAP guessing at reasonable values. If you would rather provide your own, you may use the ticks command, which comes in the flavors illustrated below.

    frame invis ht 2 wid 3 left solid bot solid
    label left "Time" "(in seconds)" left .2
    label bot "Olympic 400 Meter Run: Winning Times"
    coord x 1894,1982 y 42,56
    ticks left out at 44 "44", 46 "", 48 "48", 50 "", 52 "52", 54 ""
    ticks bot in from 1900 to 1980 by 20
    draw solid
    copy "400mpairs.d"

The first ticks command deals with the left axis: it puts the ticks facing out at the numbers in the list. GRAP puts labels only at values with strings, except that when no labels at all are given, each number serves as its own label, as in the second ticks command. That command is for the bottom axis: it puts the ticks facing in at steps of 20 from 1900 to 1980. (The command ticks off turns off all ticks.) GRAP does its best to place labels appropriately, but it sometimes needs your help: the left .2 clause moves the left label 0.2 inches further left to avoid the new ticks.

[Figure: Time (in seconds), ticks labeled 44, 48, 52, against year, 1900 to 1980; Olympic 400 Meter Run: Winning Times]

    1964 52
    1968
18. cts a number followed by a string label.

[Figure: state abbreviations plotted by Rank in Population against Population (Millions) on a logarithmic scale; top scale log2(Population), 19 to 24]

The dotted line is the least squares regression

$\log_{10} \text{Population} = 7.214 - .03 \times \text{Rank}$

which gives 15.3 million as the population of the largest state and .515 million as the population of the smallest state. It says that population drops by a factor of two every ten states (compare the top and left scales). As sloppy as the exponential fit is, though, it is a much better fit to this data than a Zipf's Law curve is; drawing that curve is left as an exercise for the reader.

The next graph is a more standard representation of one-dimensional data.

    frame invis ht .3 wid 5 bottom solid
    label bot "Populations (in Millions) of the 50 States"
    coord x .3,30 y 0,1 log x
    ticks bot out at .5, 1, 2, 5, 10, 20
    ticks left off
    copy "states.d" thru X vtick at ($3/1e6, .5) X

The markers were chosen to be vticks because they denote only an x value.

[Figure: vertical ticks along a logarithmic axis, .5 to 20; Populations (in Millions) of the 50 States]

The next one-dimensional graph uses th
19. e. The optional expressions after dotted and dashed change the spacing, exactly as in PIC.

The label statement places a label on a specified side.

    label: label side strlist shift
    shift: left right up down expr
    strlist: str rjust ljust above below size expr str ...

Lists of text strings are stacked vertically. In any context, string lists may contain clauses to adjust the position or change the point size. Each clause applies to the string preceding it and all following strings.

Normally the coordinate system is defined by the data, with 7 percent extra on each side. To change that to 5 percent, assign 0.05 to the GRAP variable margin, which is reset to 0.07 at each .G1 statement. The coord statement defines an overriding system.

    coord: coord name x expr, expr y expr, expr log x log y log log

Coordinate systems can be named; ranges, logarithmic scaling, etc., are done separately for each.

The ticks statement places tick marks on one side of the frame.

    ticks: ticks side in out expr shift tick-locations
    tick-locations: at name expr str, expr str, ...
                    from name expr to expr by op expr str

If no ticks are specified, they will be provided automatically; ticks off suppresses automatic ticks. The optional expression after in or out specifies the length of the ticks in inches. The optional name refers to a coordinate system. If str contains format specifiers like %f or %g
20. e state's name as its marker; to reduce overprinting, the graph is jittered by using a random number as a y value.

    frame invis ht 1 wid 5 bottom solid
    label bot "Populations (in Millions) of the 50 States"
    coord x .3,30 y 0,1000 log x
    ticks bot out at .5, 1, 2, 5, 10, 20
    ticks left off
    copy "states.d" thru X "$1" size -4 at ($3/1e6, 100+900*rand()) X

The function rand() returns a pseudo-random real number chosen uniformly over the interval from 0 to 1.

[Figure: state abbreviations jittered vertically along a logarithmic axis, .5 to 20; Populations (in Millions) of the 50 States]

This graph is too cluttered; circles would have been a better choice as a plotting symbol (bullets, once again, would hide data).

Histograms are a standard way of presenting one-dimensional data in two-dimensional form. Our first step in building a histogram of the population data is the following AWK program, which counts how many states are in each bin of a million people.

    awk '
    BEGIN { bzs = 0; bw = 1e6 }    # bin zero start, bin width
          { count[int(($3-bzs)/bw)]++ }
    END   { for (i in count) print i, count[i] }
    ' <states.d | sort -n >states2.d

The variable bzs tells where bin zero starts; although it is zero in this graph, it might be 95 in a histogram of human body temperatures in degrees Fahrenheit. The program produces the following output
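The binning rule in the AWK program, int((value - bzs)/bw), can be tried on a handful of numbers. The following Python sketch is an illustration only: the function name `bin_counts` is hypothetical and the populations in the example are made-up values, not the contents of states.d.

```python
from collections import Counter


def bin_counts(values, bin_zero_start=0.0, bin_width=1e6):
    """Count values per bin, using the rule int((value - bzs) / bw)."""
    counts = Counter(int((v - bin_zero_start) / bin_width) for v in values)
    # Sorting by bin number plays the role of `sort -n` in the pipeline.
    return sorted(counts.items())


# Made-up populations, chosen to exercise the bin boundaries.
example = [400_000, 999_999, 1_000_000, 2_500_000, 23_700_000]
for bin_number, count in bin_counts(example):
    print(bin_number, count)
```

As in the AWK version, empty bins produce no output line, so a plotting program reading the result sees only the bins that contain at least one value.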
21. ep 1, 3, 10, 30, 100
    thisrank = 50
    copy "states.d" thru X
        bullet at pop thisrank,$3/1e6
        square at rep thisrank,$2
        thisrank = thisrank-1
    X

The copy statement in the program uses an immediate macro enclosed in X's, and thus avoids having to name a macro for this task. Because the program assumes that the states are sorted in increasing order of population, it generates thisrank internally as a GRAP variable. The program produces

[Figure: Population in Millions (plotted as bullets, left scale) and Representatives (plotted as squares, right scale) against Rank In Population, 1 to 50]

The plotting symbols were chosen for contrast in both shape and shading. This graph also indicates that representation is proportional to population. Once we see this graph, though, we should realize that we don't really need two coordinate systems: we can relate the two by dividing the population of the U.S. (about 226,000,000) by the number of representatives (435) to see that each representative should count as 520,000 people. If the purpose of
22. ers use the same rules as macros and for statements.

3. A Collection of Examples

The previous section covered the GRAP commands that are used in common graphs. In this section we'll spend less time on language features, and survey a wider variety of graphs. These examples are intended more for browsing and reference than for straight-through reading. You should be prepared to refer to the manual in Section 5 when you stumble over a new GRAP feature.

The file cars.d contains the mileage (miles per gallon) and the weight (pounds) for 74 models of automobiles sold in the United States in the 1979 model year.

    22 2930
    17 3350
    22 2640
    17 2830
    23 2070
    T A370

The trivial GRAP program

    copy "cars.d"

produces

[Figure: scatter plot of weight, 2000 to 5000, against mileage, 10 to 40]

This graph shows that weights bottom out somewhat below 2000 pounds and that heavier cars get worse mileage; it is hard to say much more about the relationship between weight and mileage.

The next graph provides labels, uses circles to expose data hidden in the clouds of bullets, and re-expresses the x-axis in gallons per mile. It also changes the point size and vertical spacing to a size appropriate for camera-ready journal articles and books; the size changes should be made outside the GRAP program. The .ft command changes to a Helvetica font, which some people prefer for graphs.
23. f points, we might prefer to see the winning times connected by a solid line. The program

    draw solid
    copy "400mpairs.d"

produces the graph

[Figure: winning times connected by a solid line, 1900 to 1980]

(Eric Liddell of Great Britain won his gold medal in Paris in 1924 with a time of 47.6 seconds. Remember Chariots of Fire?)

We can make the graph more attractive by modifying its frame and adding labels.

    frame invis ht 2 wid 3 left solid bot solid
    label left "Time" "(in seconds)"
    label bot "Olympic 400 Meter Run: Winning Times"
    draw solid
    copy "400mpairs.d"

The frame command describes the graph's bounding box: the overall frame (which has four sides) is invisible, it is 2 inches high and 3 inches wide (which happen to be the default height and width), and the left and bottom sides are solid (they could have been dashed or dotted instead). The labels appear on the left and bottom, as requested.

[Figure: Time (in seconds) against year, 1900 to 1980; Olympic 400 Meter Run: Winning Times]

To set the range of each axis, GRAP examines the data and pads both dimensions by seven percent at each end. The coord (coordinates) command allows you to specify the range of one or both axes explicitly; it also turns off automatic padding.

    frame invis ht 2 wid 3 left solid bot solid
    label left "Time" "(in seconds)"
    label bot "Olympic 400 Meter Run: Winning Times"
    coord x 1894,1982 y 42,56
    draw solid
24. g. The print statements (which are commented out in the above GRAP program) at one time showed that the regression line is

$\text{Height (in Feet)} = 2.61 + .19 \times \text{Age}$

Thus for most American boys between 3 and 16, you may safely assume that they started out life at 2 feet 7 inches and grew at the rate of two and a quarter inches per year.

This program probably misapplies GRAP; if you really want to perform least squares regressions on data, you should usually use a simple AWK program like

    awk '
        { x += $1; x2 += $1*$1; y += $2; xy += $1*$2 }
    END { slope = (NR*xy - x*y) / (NR*x2 - x*x)
          print (y - slope*x)/NR, slope }'

Be warned, though, that this program is not numerically robust.

While we're on the subject of fitting straight lines to data, we'll redraw three graphs from J. W. Tukey's Exploratory Data Analysis. The file usapop.d records the population of the United States in millions at ten-year intervals.

    1790 3.93
    1800 5.31
    1810 7.24
    1820 9.64
    1830 12.87
    ...
    1950 150.7

Tukey's first two graphs indicate that the later population growth was linear while the early growth was exponential. The following GRAP program plots them as a pair, using graph commands to place internally unrelated graphs adjacent to one another.

    graph Linear
    coord x 1785,1955 y 0,160
    label left "Population" "in Millions" left .2
    label right "Linear Scale," "Linear Fit"
    ticks bot off
    copy "usapop.d"
    define fit X 35 + 1.4*($1-1870) X
    line from 1850,fit(1850) t
25. g X

The else clause is optional. Relational operators include ==, !=, >, >=, <, <=, !, ||, and &&. Strings may be compared with the operators == and !=.

GRAP provides the same macro processor that PIC does:

    define: define macro_name X anything X

Subsequent occurrences of the macro name will be replaced by the string, with arguments of the form $n replaced by corresponding actual arguments. Macro definitions persist across .G2 boundaries, as do values of variables.

The copy statement is somewhat overloaded:

    copy "filename"

includes the contents of the named file at that point;

    copy "filename" thru macro_name

copies the file through the macro; and

    copy thru macro_name

copies subsequent lines through the macro; each number or quoted string is treated as an argument. In each case, copying continues until end of file or the next .G2. The optional clause until "str" causes copying to terminate when a line whose first field is str occurs. In all cases, the macro can be specified inline rather than by name:

    copy thru X macro body X

The sh command passes text through to the Unix shell.

    sh: sh X anything X

The body of the command is scanned for macros. The built-in macro pid is a string consisting of the process identification number; it can be used to generate unique file names.

The pic command passes text through to PIC with the pic removed; variables and macros are not evaluated. Lines beginning with
26. h Picname (pic-text)
    print → print (expr | str)
    define → define macro_name X anything X
    copy → copy ("filename") thru macro_name | X macro body X (until "endstring")
    sh → sh X anything X
    pic → pic anything
    assignment → var = expr

X: any single character, or braces { }.

Predefined strings include bullet, plus, box, star, dot, times, htick, vtick, square, and delta.

Built-in functions include log (base 10), exp (base 10), int, sin, cos, atan2, sqrt, min, max, and rand.
27. heir macros to copy into your GRAP programs. The above program produces

[Figure: box plots comparing "Highest Point in 50 States" (Florida at the low extreme, Alaska at the high extreme) with "Heights of 219 Volcanoes" (Ilhanova to Guallatiri); vertical scale in feet, labeled at 5,000, 10,000, 15,000 and 20,000]

Even though the extreme heights are the same, state heights have a lower median and a greater spread.

Someday you may use GRAP to prepare overhead transparencies, only to find that everything comes out too small. The following program illustrates some ways to get larger graphs.

    .ps 14
    .vs 18
    .G1 4
    frame ht 2 wid 2
    label left "Response Variable" left .5
    label bot "Factor Variable"
    line from 0,0 to 1,1
    line dotted from .5,0 to .5,1
    define blob X "\v'.2m'\(bu\v'-.2m'" X
    blob at 0,.5; blob at .5,.5; blob at 1,.5
    .G2
    .ps
    .vs

The .ps and .vs commands preceding the graph set the text size to 14 points and the vertical spacing to 18 points; the two quantities are reset by the commands following the .G2. Such size changes should be made outside the GRAP program, as mentioned earlier. The 4 following the .G1 stretches the graph (including GRAP's estimate of the accompanying text) to be four inches wide; it is an alternative to altering the frame command. The macro blob is a plotting symbol that is much larger than bullet; the different name ensures that later references to bullet are unaffected. The TROFF commands within the blob string move the character down two tenths of an em
28. ied, it overrides the default display mode for the line segment ending at point.

The new statement starts a new sequence; it has the same format as the draw statement.

A line consisting of a set of numbers is treated as a family of points x, y1, y2, etc., to be plotted at the single x value.

    numberlist → number ...

If there is only one number, it is treated as a y value, and x values of 1, 2, 3, ... are supplied automatically.

GRAP provides arithmetic with the operators +, -, *, / and ^. Variables may be assigned to; assignments are expressions. Built-in functions include log, exp (both base 10: beware!), int (truncates towards zero), sin, cos (both use radians), atan2(dy,dx), sqrt, min (two arguments only), max (ditto), and rand() (returns a real number random on [0,1)).

The for statement provides a modest looping facility.

    for → for var from expr to expr (by (op) expr) do X anything X

X is any single character that doesn't appear in the string. If X is a left brace {, then the string may contain internally balanced braces and is terminated by a right brace }. The text anything (which may contain new lines) is repeated as var takes on values from expr1 to expr2. As with tick iterators, the by clause is optional, and may proceed arithmetically or multiplicatively. In a for statement, the from may be replaced by "=".

The if-then-else statement provides conditional evaluation.

    if → if expr then X anything X else X anythin
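The rule for bare lines of numbers can be restated as a short Python sketch. The function name `points_from_numberlists` is hypothetical, and two details are assumptions not settled by the text: blank lines are skipped, and the automatic x counter advances only on single-number lines.

```python
def points_from_numberlists(lines):
    """Turn lines of numbers into (x, y) points using the numberlist rule.

    One number on a line is a y value whose x is supplied as 1, 2, 3, ...
    Several numbers are x followed by y1, y2, ... at that single x.
    """
    points = []
    implicit_x = 0  # assumed: counts only the single-number lines seen so far
    for line in lines:
        nums = [float(tok) for tok in line.split()]
        if not nums:
            continue  # assumed: blank lines are ignored
        if len(nums) == 1:
            implicit_x += 1
            points.append((float(implicit_x), nums[0]))
        else:
            points.extend((nums[0], y) for y in nums[1:])
    return points
```

A time series with one value per line therefore plots against 1, 2, 3, ..., while a line with four numbers yields three points stacked at the same x.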
29. in a graph, or balanced \s and \f commands within a string. Do not, however, add space (.sp) or change the line spacing (.vs, .ls) within a graph. Some defined terms like bullet contain embedded size changes; further qualifying them with GRAP size commands may not always work.

Because GRAP is built on top of PIC, the following quote from the PIC manual is relevant:

    There is a subtle problem with complicated equations inside PIC pictures: they come out wrong if EQN has to leave extra vertical space for the equation. If your equation involves more than subscripts and superscripts, you must add to the beginning of each such equation the extra information space 0.

This feature was illustrated on page 20.

Alternatives

Besides GRAP and your local draftsperson, what other choices are there?

The S system provides a host of tools for statistical analysis, but somewhat fewer tools than GRAP for producing document-quality graphs. S produces graphs on the screen of a 5620 terminal much more quickly than GRAP (often in seconds rather than minutes), but it takes somewhat longer to learn (at least for us). If you expect to do a lot of interactive data analysis, then S is probably the right tool for you. S may be used to generate PIC commands.

The standard Unix program GRAPH provides many of the basic features of GRAP, though with quite a bit less control over details, particularly text. It produces output only in the Unix plot language, which ma
30. inal graphs, all of which have logarithmic scales.

[Figure: scatterplot vector of Enlisted Men (y axis) against Male_Officers, Female_Officers and Enlisted_Women (three x axes); points are labeled with two-digit years]

The number of enlisted men is almost linearly related to the number of male officers; it is somewhat related to the number of female officers, and it varies widely as a function of the number of enlisted women.

Much more interesting than the graph itself is the method we used to produce it. We wrote a miniature compiler that accepts as its source language a description of a scatterplot vector, and produces as object code a GRAP program to draw the graph. The source program for the above example is

    file army.d
    log x log y
    symbol \s-3$1\s+3
    y 4 Enlisted_Men
    x 2 Male_Officers
    x 3 Female_Officers
    x 5 Enlisted_Women

The program lists several global attributes of the graph, the y variable to be plotted, and as many x variables as are desired; with each variable is its field in the file and a descriptive string. The language is compiled by the following AWK program.

    awk '
        # Parse all commands
    $1=="file"   { fname = $2 }
    $1=="log"    { logtext = $0 }
    $1=="symbol" { symtext = $2 }
    $1=="y"      { yfield = $2; ylabel = $3 }
    $1=="x"      { n++; xfield[n] = $2; xlabel[n] = $3 }
        # Generate n graphs
    END { print ".G1"
          for
31. ments, if statements, and macros to facilitate user programmability. GRAP is intended primarily for including graphs in documents prepared on the Unix† operating system, and is only marginally useful for elementary tasks in data analysis.

Section 2 of this document is a tutorial introduction to GRAP (readers who find it slow going may wish to skim ahead). The examples in Section 3 illustrate the various kinds of graphs that GRAP can produce and some common GRAP idioms. Mundane matters about using GRAP are discussed in Section 4, and Section 5 contains a brief reference manual.

We have tried to illustrate good principles of statistics and graphical design in the graphs we present. In several places, though, good taste has lost to the necessity of illustrating GRAP capabilities. Readers interested in statistical integrity and taste should consult the literature.††

2. Tutorial

The following is a simple GRAP program.

    .G1
    54.2
    49.4
    49.2
    50.0
    48.2
    ...
    44.60
    .G2

The single number on each line is the winning time in seconds for the men's 400 meter run, from the first modern Olympic Games (1896) to the nineteenth (1980). If the file olymp.g contains the text above, then typing the command

    grap olymp.g | pic | troff >junk

creates a TROFF output file junk that contains the picture

† Unix is a Trademark of AT&T Bell Laboratories.
†† See, for instance, Graphical Methods in Data Analysis by Chambers, Cleveland, Kleiner and
32. o 1950,fit(1950)
    graph Exponential with .Frame.n at Linear.Frame.s -(0,.05)
        coord x 1785,1955 y 3,160 log y
        label left "Population" "in Millions" left .2
        label right "Logarithmic Scale," "Exponential Fit"
        copy "usapop.d"
        define fit X exp(0.75 + .012*($1-1800)) X
        line from 1790,fit(1790) to 1920,fit(1920)

The statements defining each graph are indented for clarity. The second graph has the northern point of its frame 0.05 inch below the southern point of the frame of the first graph; the with clause is passed directly through to PIC without being evaluated for macros or expressions. The names of both graphs begin with capital letters to conform to PIC syntax for labels.

[Figure: two stacked graphs of Population in Millions against year (1800 to 1950): "Linear Scale, Linear Fit" above and "Logarithmic Scale, Exponential Fit" below]

Polynomial functions lie between the linear and exponential functions; Tukey shows how a seventh-degree polynomial provides a better and longer fit to the early population growth.

    label left "Population" "in Millions" left .2
    label right "$x$ re-expressed as" "$space 0 left ( {date-1600} over 100 right ) sup 7$" left 1.2
    define newx X exp(7*(log(($1-1600)/100))) X
    ticks bot out at newx(1800) "1800", newx(1850) "1850", newx(1900) "1900"
    copy "usapop.d" thru X
        if $1<=1900 then { bullet at newx($1),$2 }
    X
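The two fit macros in Tukey's pair of graphs are ordinary functions of the year, so they are easy to check numerically. The Python sketch below rests on two assumptions: that the garbled macro bodies read 35 + 1.4*($1-1870) and exp(0.75 + .012*($1-1800)), and that GRAP's exp is base 10 as the reference section says. The function names are invented for the sketch.

```python
def linear_fit(year):
    """Assumed reading of the Linear graph's fit macro: 35 + 1.4*(year - 1870)."""
    return 35 + 1.4 * (year - 1870)


def exponential_fit(year):
    """Assumed reading of the Exponential graph's fit macro.

    GRAP's exp is base 10, so exp(e) is written as 10 ** e here.
    """
    return 10 ** (0.75 + 0.012 * (year - 1800))


if __name__ == "__main__":
    for year in (1790, 1800, 1850, 1920, 1950):
        print(year, round(linear_fit(year), 2), round(exponential_fit(year), 2))
```

Comparing these values with the rows of usapop.d shows where each fit tracks the data, which is the point the two graphs make visually.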
33. ollowing, italic terms are syntactic categories, typewriter terms are literals, parenthesized constructs are optional, and ... indicates repetition. In most cases, the order of statements, constructs and attributes is immaterial.

    grap program → .G1 (width in inches) grap statement ... .G2
    grap statement → frame | label | coord | ticks | grid | plot | line | circle | draw | new | next | graph | numberlist | copy | for | if | pic | assignment | print | define | sh
    frame → frame (ht expr) (wid expr) ((side) linedesc) ...
    side → top | bot | left | right
    linedesc → solid | invis | dotted (expr) | dashed (expr)
    label → label side strlist ... shift
    shift → left | right | up | down expr ...
    strlist → str ... (rjust | ljust | above | below) ... (size expr) ...
    coord → coord (name) (x expr,expr) (y expr,expr) (log x | log y | log log)
    ticks → ticks (side) (in | out (expr)) (shift) (tick-locations)
    tick-locations → at (name) expr (str), expr (str), ... | from (name) expr to expr (by expr) str
    grid → grid side (linedesc) (shift) (tick-locations)
    plot → strlist at point | plot expr (str) at point
    point → (name) expr, expr
    line → (line | arrow) from point to point (linedesc)
    circle → circle at point (radius expr)
    draw → draw (name) (linedesc) (str)
    new → new (name) (linedesc) (str)
    next → next (name) at point (linedesc)
    numberlist → number ...
    for → for var from expr to expr (by expr) do X anything X
    if → if expr then X anything X (else X anything X)
    graph → grap
34. one-mile run; because its GRAP program is so similar to its automotive counterpart, we won't show the program or data.

[Figure: World Record One Mile Run; Time (seconds) against year]

The three graphs show three different kinds of changes. Although horses are getting faster, they appear to be approaching a barrier near two minutes. Cars show great jumps as new technologies are introduced, followed by a plateau as limits of the technology are reached. Milers have shown a fairly consistent linear improvement over this century, but there must be an asymptote down there somewhere.

The next file gives the median heights of boys in the United States aged 2 to 18, together with the fifth and ninety-fifth percentiles.

[Data file boyhts.d: one line per age from 2 to 18, each giving the age followed by three heights; the values are garbled in this copy]

The heights are given in centimeters (1 foot = 30.48 centimeters). The trivial program

    .G1
    copy "boyhts.d"
    .G2

displays the data as

[Figure: three rows of bullets, heights up to 150 and beyond against ages 5, 10, 15]

Because there are four numbers on each input line, the first is taken as an x value and the remaining three are plotted as y values. The three curves appear to be roughly straight (at least up to age 16), so it makes sense to fit a line
35. ost of the time spent by GRAP is devoted to input and output.

J. W. Tukey's box and whisker plots represent the median, quartiles, and extremes of a one-dimensional distribution. The following GRAP program defines a macro to draw a box plot, and then uses that shape to compare the distribution of heights of volcanoes with the distribution of heights of States of the Union.

    frame invis ht 4 wid 3 bot solid
    ticks off
    coord x .5,3.5 y 0,25
    define Ht X "$1,000" size -3 at 2,$1 X
    Ht(5); Ht(10); Ht(15); Ht(20)
    "Highest Point" "in 50 States" at 1,23
    "Heights of" "219 Volcanoes" at 3,23
    "Feet" at 2,21.5; arrow from 2,22.5 to 2,24
    define box X   # (x, min, 25%, median, 75%, max, minname, maxname)
        xc = $1; xl = xc-boxwidth/2; xh = xc+boxwidth/2
        y1 = $2; y2 = $3; y3 = $4; y4 = $5; y5 = $6
        bullet at xc,y1
        "  $7" size -3 ljust at xc,y1
        line from xc,y1 to xc,y2    # lo whisker
        line from xl,y2 to xh,y2    # box bot
        line from xl,y3 to xh,y3    # box mid
        line from xl,y4 to xh,y4    # box top
        line from xl,y2 to xl,y4    # box left
        line from xh,y2 to xh,y4    # box right
        line from xc,y4 to xc,y5    # hi whisker
        bullet at xc,y5
        "  $8" size -3 ljust at xc,y5
    X
    boxwidth = .3
    box(1, .3, 2.0, 4.6, 11.2, 20.3, Florida, Alaska)
    box(3, ..., 9.5, 19.9, Ilhanova, Guallatiri)

Boxes are one of many shapes used for the graphical representation of several quantities. If you use such shapes frequently, then you should make a library file of t
36. s graph does point out two facts that are not obvious in the previous graphs: there is a gap in car weights near 3000 pounds (exhibited by the hole in the y-axis ticks), and the gallons-per-mile axis is regularly structured (the ticks are the reciprocals of an almost dense sequence of integers). The reader may decide whether those insights are worth the decrease in clarity.

Throughout the twentieth century horses, cars and people have gotten faster; let's study those improvements. For horses, we'll consider the winning times of the Kentucky Derby from 1909 to 1983, in the file speedhorse.d:

    126
    126
    125
    129
    124
    ...

The program

    label left "Winning Time" "(seconds)" left .3
    label bot "Kentucky Derby, 1909 to 1983"
    bestsofar = 1000    # Greater than first time
    year = 09
    copy "speedhorse.d" thru X
        bullet at year,$1
        bestsofar = min(bestsofar, $1)
        line from year,bestsofar to year+1,bestsofar
        year = year+1
    X

produces the graph

[Figure: Kentucky Derby, 1909 to 1983; Winning Time (seconds), 120 to 130, against year]

Data from the file speedcar.d and the opening of the land speed record program, which belong to the discussion of automobiles, appear here:

    06 127
    10 131
    11 141
    19 149
    20 155
    ...
    83 633

    label bot "World Land Speed Record"
    label left "Miles" "per" "Hour" left .4
    ticks bot out from 10 to 70 by 10
    ticks bot out at 0 "1900", 40 "1940", 80
    firstrecord = 1
    copy "speedcar.d" thru {
        if firstrecord == 1 then {
            firstrecord = 0
        } else {
    ...

Each race is recorded with a bullet, and record times are marked b
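The bookkeeping in the Kentucky Derby program, a running minimum that is redrawn as a one-year horizontal segment for every race, can be traced with a small Python sketch. The function name `running_best` is hypothetical, and `float("inf")` stands in for the program's starting value of 1000.

```python
def running_best(times, first_year=9):
    """Return one horizontal segment ((year, best), (year + 1, best)) per race.

    `best` is the fastest time seen so far, mirroring
    bestsofar = min(bestsofar, $1) in the GRAP program.
    """
    best = float("inf")  # the GRAP program uses 1000, "greater than first time"
    segments = []
    for offset, time in enumerate(times):
        year = first_year + offset
        best = min(best, time)
        segments.append(((year, best), (year + 1, best)))
    return segments


if __name__ == "__main__":
    # First five winning times listed for speedhorse.d in the text.
    for segment in running_best([126, 126, 125, 129, 124]):
        print(segment)
```

Because the segment is emitted for every year, slower races simply extend the current record line to the right.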
37. ssible. The following AWK program

    awk '
    BEGIN { bzs = 0; bw = 1e6 }    # bin zero start; bin width
          { thisbin = int(($3-bzs)/bw)
            print $1, thisbin, count[thisbin]++
          }
    ' <states.d >states3.d

produces the file states3.d

    AK 0 0
    WY 0 1
    VT 0 2
    DE 0 3
    ND 0 4
    ...
    CA 23 0

which lists the state's abbreviation, bin number, and height within the bin. The GRAP program

    frame invis wid 4 ht 2.5 bot solid
    ticks bot out from 0 to 25 by 5
    ticks left off
    label bot "Populations (in Millions) of the 50 States"
    coord x 0,25 y 0,13
    copy "states3.d" thru X "$1" size -4 at ($2+.5, $3+.5) X

reads that file to make the following histogram, in which the state names are used to display the heights of the bins. In each bin, the states occur in increasing order of population from bottom to top.

[Figure: histogram built from state abbreviations stacked in one-million-wide bins; x axis 0 to 25, "Populations (in Millions) of the 50 States"]

The next data set is a run-time profile of an early version of GRAP, created by compiling the program with the -p option and running prof after the program executed.

    %time  cumsecs  #call  ms/call  name
     21.1    11.02  26834     0.41  _yylook
     11.2    16.89     30   195.60  _yyparse
      ...      ...    ...      ...  __doprnt
      ...      ...    ...      ...  _write
      ...
      0.0    52.19    170     0.00  _tickside

Although there were more than fifty procedures in the program, the top four time hogs
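The binning step that turns states.d into states3.d can be restated in Python as a sketch. The function name `bin_states` and its parameter names are made up for illustration; the arithmetic follows the AWK program (bin = int((population - bin zero start) / bin width), height = number of states already placed in that bin).

```python
def bin_states(records, bin_zero_start=0.0, bin_width=1e6):
    """Map (name, population) pairs to (name, bin, height-within-bin) triples."""
    counts = {}  # bin number -> how many states have landed there so far
    rows = []
    for name, population in records:
        this_bin = int((population - bin_zero_start) / bin_width)
        height = counts.get(this_bin, 0)
        counts[this_bin] = height + 1
        rows.append((name, this_bin, height))
    return rows


if __name__ == "__main__":
    # Populations taken from the states.d excerpt in this manual.
    sample = [("AK", 401851), ("WY", 469557), ("VT", 511456), ("CA", 23667902)]
    for row in bin_states(sample):
        print(*row)
```

Since the input is sorted by population, the height column doubles as the bottom-to-top order of names within each bin.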
38. tart with a simple GRAP program that illustrates the data, and then refine it. Later in this document we will ignore the design phase, and present rather complex graphs in their final form. Beware.

All the examples so far have placed data on the graph implicitly by copying a file of numbers (either a time series with one number per line or pairs of numbers). It is also possible to draw points and lines explicitly. The GRAP commands to draw on a graph are illustrated in the following fragment.

    frame ht 2 wid 2
    coord x 0,100 y 0,100
    grid dotted bot from 20 to 80 by 20
    grid dotted left from 20 to 80 by 20
    "Text above" above at 50,50
    "Text rjust" rjust at 50,50
    bullet at 80,90
    vtick at 80,80
    box at 80,70
    times at 80,60
    circle at 50,50
    circle at 50,80 radius .25
    line dashed from 10,90 to 30,90
    arrow from 10,70 to 30,90
    draw A solid
    draw B dashed delta
    next A at 10,10
    next B at 10,20
    next A at 50,20
    next A at 90,10
    next B at 50,30
    next B at 90,30

The grid command is similar to the ticks command, except that grid lines extend across the frame. The next few commands plot text at specified positions. The plotting characters (such as bullet) are implemented as predefined macros; more on that shortly. Unlike arbitrary characters, the visual centers of the markers are near their plotting centers. The circle command draws a circle centered at the specified location. A radius in inches may be specified; if no radius is given, then the circle will be the small circle shown at the cen
39. ter of the graph. The line and arrow commands draw the obvious objects shown at the upper left.

[Column residue of the preceding fragment: draw A solid; draw B dashed delta; next points at 10,10 10,20 50,20 90,10 50,30 90,30]

This figure also illustrates the combined use of the draw and next commands. Saying draw A solid defines the style for a connected sequence of line fragments to be called A. Subsequent commands of next A at point add point to the end of A. There are two such sequences active in the above example (A and B); note that their next commands are intermixed. Because the predefined string delta follows the specification of B, that string is plotted at each point in the sequence.

[Figure: the 2-inch frame produced by the fragment, with "Text above" and "Text rjust" at its center]

GRAP has numeric variables (implemented as double-precision floating point numbers) and the usual collection of arithmetic operators and mathematical functions; see the reference section for details.

GRAP provides the same rudimentary macro facility that PIC does:

    define name X replacement text X

defines name to be the replacement text; X is any character that does not appear in the replacement (open and closing braces may also be used as delimiters). Any subsequent occurrence of name will be replaced by replacement text.

The replacement text of a macro definition may contain occurrences of $1, $2, etc.; these will be replaced by the corresponding actual arguments when the macro is invoked. The invocation for a macro
40. til "XXX"
    Enlisted Men 1200
    Male Officers 140
    Enlisted Women 12
    Female Officers 2.5
    XXX

The program labels the lines by copying immediate data; the program is therefore shorter to write and easier to change. The delimiter string XXX in the until clause could be deleted in this graph; the .G2 line also denotes the end of data. Even though that string is enclosed in quotes, it may not contain spaces. The y positions of the labels are the result of several iterations.

[Figure: U.S. Army Personnel, in Thousands on a logarithmic scale (1 to 1000), for 1940 to 1980; curves labeled Enlisted Men, Male Officers, Enlisted Women, Female Officers]

This data can tell many stories: the buildup during the Second World War is obvious, as is the exodus after the war; increases during Korea and Vietnam are also apparent. We will consider a different story: the ratio of enlisted men to the three other classes of personnel. There are several ways to plot this data; the most obvious graph uses three time series showing how the ratios change over time, and is left as an exercise for the reader. We will instead construct a graph that gives little insight into this data, but illustrates a general method that is quite useful in conjunction with GRAP.

The graph is a scatterplot vector that shows how one variable (the number of enlisted men) varies as a function of the other three. Breaking with tradition, we first show the f
41. to center its plotting position (determined experimentally) and then reset the vertical position. The program produces this trivial but large graph.

[Figure: Response Variable (0 to 1) against Factor Variable (0, 0.5, 1)]

4. Using GRAP

Following are a few day-to-day matters about using GRAP.

Errors

GRAP attempts to pinpoint input errors; for example, the input

    .G1
    i = i + 1

results in this message on stderr:

    grap: syntax error near line 1, file
     context is
        i = i >>> + <<< 1

The error was noticed at the "+". Unfortunately, pinpointing is not the same as explaining: the real error is that the variable i was not initialized.

The words x and y are reserved (for the coord statement); you will get an equally inexplicable syntax error message if you use them as variable names.

GRAP tries to load a file of standard macro definitions (/usr/lib/grap.defines) for terms like bullet, plus, etc., but doesn't complain if that file isn't found. If you later use one of these words, however, you'll get a syntax error message.

Certain constructs suggested by analogy to PIC do not work. For example, .GS and .GE would have been nicer than .G1 and .G2, but they were already taken. The PIC construct .PS <file has been superseded by GRAP's copy command, which in turn has been retrofitted into PIC.

TROFF issues

You may use TROFF commands like .ps or .ft to change text sizes and fonts with
42. with arguments is

    name(arg1, arg2, ...)

Non-existent arguments are replaced by null strings.

The following GRAP program uses macros and arithmetic to plot crude approximations to the square and square root functions.

    frame ht 1.5 wid 1.5
    define square X ($1)*($1) X
    define root { exp(log($1)/2) }
    define P {
        times at i, square(i); i = i+1
        circle at j, root(j); j = j+5
    }
    i = 1; j = 5

Because GRAP has the square root function sqrt, the macro root is superfluous. The program produces

[Figure: times marks and circles plotted on axes running from 0 to 25]

The copy command has a thru parameter that allows each line of a file to be treated as though it were a macro call, with the first field serving as the first argument, and so on. This is the typical GRAP mechanism for plotting files that are not stored as time series or as (x,y) pairs. We will illustrate its use on the file states.d, which contains data on the fifty states.

    AK 1 401851
    WY 1 469557
    VT 1 511456
    DE 1 594338
    ND 1 652717
    ...
    CA 45 23667902

The first field is the postal abbreviation of the state's name (Alaska, Wyoming, Vermont, ...), the second field is the number of Representatives to Congress from the state after the 1981 reapportionment, and the third field is the population of the state as measured in the 1980 Census. The states appear in increasing order of population.

We will first plot this data as (population, representative) pairs. In the coord statement, log log is
43. y be processed by a variety of filters for a variety of output devices.

The original Unix typesetter graphics programs are PIC and IDEAL; you may be able to do as well without using GRAP as an intermediary. In particular, IDEAL provides shading and clipping, which are useful in presentation-quality bar charts and the like, but are well beyond the capabilities of PIC.

The Analyst's Workbench family of programs includes a plotting package called D.

The DISSPLA software purveyed by Bell Labs computer centers has extensive facilities for drawing graphs.

5. Reference Manual

In the following, italic terms are syntactic categories, typewriter terms are literals, parenthesized constructs are optional, and ... indicates repetition. In most cases, the order of statements, constructs and attributes is immaterial.

    grap program → .G1 (width in inches) grap statement ... .G2

A width on the .G1 line overrides the computed width, as in PIC.

    grap statement → frame | label | coord | ticks | grid | plot | line | circle | draw | new | next | graph | numberlist | copy | for | if | sh | pic | assignment | print

The frame statement defines the frame that surrounds the graph.

    frame → frame (ht expr) (wid expr) ((side) linedesc) ...
    side → top | bot | left | right
    linedesc → solid | invis | dotted (expr) | dashed (expr)

Height and width default to 2 and 3 inches; sides default to solid. If side is omitted, the linedesc applies to the entire fram
44. y horizontal lines. Secretariat is the only horse to have run the one-and-a-quarter-mile race in under two minutes: he won in 1973 in 1:59.4.

For automobiles, we will study the world land speed record (even though those vehicles are by now just low-flying airplanes). The file speedcar.d lists years in which speed records were set and the record set in that year, in miles per hour averaged over a one-mile course. We will plot the data with the following GRAP program, which uses nested braces in the copy and if statements.

    ... "1980"
    ...
            line from lastyear,lastrec to $1,lastrec
        }
        lastyear = $1; lastrec = $2
    }
    line from lastyear,lastrec to 84,lastrec

Each record line is drawn after the next record is read, because the program must know when the record was broken to draw its line. The if statement handles the first record, and the extra line command extends the last record out to the current date.

[Figure: World Land Speed Record; Miles per Hour (100 to 600) against year, 1900 to 1980]

The horizontal lines reflect the nature of world records: they last until they are broken. The records could also have been plotted by a scatterplot in which each point represents the setting of a record, but it would be misleading to connect adjacent points with line segments (which is what we inappropriately did in the graphs of the Olympic 400 meter run).

The following graph shows the world record times for the
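The "draw each record's line only after the next record is read" logic of the land speed program can be illustrated with a Python sketch. The function name `record_segments` is invented here, and tracking the previous record with `None` replaces the program's firstrecord flag; the final segment plays the role of the extra line command that extends the last record to the current date.

```python
def record_segments(records, end_year):
    """Return horizontal segments showing how long each record stood.

    `records` is a list of (year, value) pairs in chronological order.
    A record's segment runs from its own year to the year it was broken;
    the last record is extended to `end_year`.
    """
    segments = []
    last = None  # stands in for the firstrecord flag in the GRAP program
    for year, value in records:
        if last is not None:
            segments.append(((last[0], last[1]), (year, last[1])))
        last = (year, value)
    if last is not None:
        segments.append(((last[0], last[1]), (end_year, last[1])))
    return segments


if __name__ == "__main__":
    # A few rows from the speedcar.d excerpt (two-digit years).
    for segment in record_segments([(6, 127), (10, 131), (83, 633)], end_year=84):
        print(segment)
```

Each segment is flat, which is why the resulting graph is a staircase rather than a connected scatterplot.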
