Home
        User Guide - Scientific Computing Associates
         Contents
1.                                                                                                                E   Ec    wave  ser_wave c    CC  ser wave c  o wav lm       Parallel Version                  Linda Concurrent Wave Equation   C Example       FILE  clinda_wave cl  ER FILES  draw_wave c  make clinda wave cl       O  H  d          E    ESCRIPTION    This program implements the concurrent wave equation described  in Chapter 5 of Fox et al   1988  Solving Problems on Concurrent  Processors  vol 1              A vibrating string is decomposed into points  Each processor is  responsible for updating the amplitude of a number of points over  time     At each iteration  each processor exchanges boundary points with  nearest neighbors  This version demonstrates domain decomposition    Note  use compile option  1m                Explanation of constants   values  0 1001  values at time t  oGldvaltO l001  values at time  t dt   newval 0 1001    values at time  t dt                       LI NP EE ND IS MOM II I I 1 E 11 121 1 0 1 2 01 0 M E ei    ee    include  lt stdio h gt   include  lt math h gt        define MAXPOINTS 1000  define MAXSTEPS 10000  define NWORKERS 4  define PI 3 14159265             yx Routine for creating the X graph of the wave        extern void draw wave       double values  MAXPOINTS 2  newval  MAXPOINTS 2  oldval  MAXPOINTS 2       Linda User Guide 145    PEE     real main    ag    int tpoints     total points on the line  int nsteps     number of tim
2.                                                  Linda Simple Array Program Makefile    FILE           make clinda array cl  RIPTION  see clinda array cl  make  f make clinda array cl    To compile using tuple scope us linda tuple scope compiler option   CFLAGS    linda tuple scope    FEFE HE TE FE FE TE FE FE HE FE FE HE TE FE HE TE FE EE EE HEH k HE HE EE EE HE EE EE EE EE HEH EH EE EE EE HE E E E E E E H      ele                                                                                        elinda arrav el  C  S CFLAGS  clinda array cl  o array    Linda User Guide 131    pi Calculation    Serial Version                                              A    Serial pi Calculation   C Version    FILE  ser pi calc c    OTHER FILES  make pi_calo e    DESCRIPTION  pi calculation example program  C Version     This program calculates pi using a  dartboard  algorithm  See  M Fox et al  1988  Solving Problems on Concurrent Processors  vol 1    page 207  All processes contribute to the calculation  with the    master averaging the values for pi          include  lt stdlib h gt    include  lt stdio h gt      void srandom  int seed    double dboard  int darts     define DARTS 5000    number of throws at dartboard      define ROUNDS 10    number of times  darts  is iterated     main     double pi     average of pi after  darts  is thrown     double avepi     average pi value for all iterations     int i  ni  srandom  5    avepi   0   for  i   05 i  lt  ROUNDS  i    4     Perform pi
3.                                     Note  To compile using tuple scope us linda tuple scope compiler flags     CCFLAGS    linda tuple scope   k HE FE FE EE ERE HE TE FE FE FE FE FE HE FE HE E TE FE HE EE EE HEH EH EH FE FE EE EE E TE FE E TE FE EE EE EE EE EE HE HE HE EE EH  i  E   ele  CFLAGS    mm  clinda_mm cl         CC  S CFLAGS  clinda mm cl  o mm    Linda User Guide 141    Concurrent Wave Eguation    Serial Version                                                                   fe    SERIAL Concurrent Wave Equation   C Version     FILE  ser_wave c     OTHER FILES  make wave c     DESCRIPTION    5 This program implements the concurrent wave equation described     in Chapter 5 of Fox et al   1988  Solving Problems on Concurrent   m Processors  vol 1         s A vibrating string is decomposed into points  In the parallel     version each processor is responsible for updating the amplitude     of a number of points over time        M At each iteration  each processor exchanges boundary points with     nearest neighbors  This version uses low level sends and receives    to exchange boundary points          include  lt stdlib h gt    include  lt math h gt    include  lt stdio h gt    include  lt time h gt    define MAXPOINTS 1000   define MAXSTEPS 10000   define MINPOINTS 20   define PI 3 14159265  void init param void    void init  line void    void update  void     int nsteps     number of time steps      tpoints     total points along string     rcode     generic return
4.                   This program calculates pi using a  dartboard  algorithm  See  Fox et al  1988  Solving Problems on Concurrent Processors  vol 1  page 207  All processes contribute to the calculation  with the  master averaging the values for pi              i        incl   incl    lude  lt stdlib h gt   lude  lt stdio h gt       void srandom  int seed         doub     le dboard  int darts       define MAXWORKERS 20   define DARTS 5000    number of throws at dartboard              define ROUNDS 10    number of times  darts  is iterated     real_main         local variables     double homepi     value of pi calculated by current task     pi     average of pi after  darts  is thrown     avepi     average pi value for all iterations     pi recv     pi received from worker     pisum     sum of workers pi values     int nworkers     number of workers       round nbr   wrker  i   int worker          get number of workers          nworkers   0   while   nworkers  lt  1      nworkers  gt  20      printf   Enter number of workers between 1 and 20 n       scanf   d    amp nworkers         start up the workers       for  i 1  i  lt   nworkers  i         eval   workers   worker i     printf   starting worker  d n   i         Start collecting the results for each round     avepi   0    srandom 0     for  round nbr   0  ro  nd nbr  lt  ROUNDS  round _nbr         134 Sample Programs    pisum   0     pi Calculation       master does pi calc too     homepi   dboard DARTS          get result
5.          float EX   float cy   int nts   parms    0 1  0 1  50         real_main        float u NXPROB   NYPROB    int nwrkers  ntile  ixmin  ixmax  idum  i  ix  iz  it  worker     void inidat    prtdat            zs     Initialize grid    ui   printf   Grid size  X   d Y   d Time steps   din  NXPROB NYPROB  parms nts    printf  Initializing grid and writing initial dat file   in      inidat  NXPROB  NYPROB  u            p  edat  NXPROB  NYPROB  u   initial dat              Compute tile extents  putting grid data into Tuplespace      Create workers for all but the right most tile           ui   out  struct par  s   parms     nwrkers   1    ntile   0    for  ix   1  ix  lt   NXPROB 2  ix    NXTILE     ixmin   ix   ixmax   ix   NXTILE   1   ntile       1f  ixmax  lt  NXPROB 2        start up the workers     printf   starting worker  d n   nwrkers    eval   worker   worker  ixmin  ixmax     nwrkers           put out the initial data into tuple space     out   initial data   ixmin   amp u ixmin   0   NXTILE NYPROB                   Linda User Guide 153       Put grid boundary data into Tuplespace    f   out  left    amp u 0  0  NYPROB      cut   rigbt      amp u NXPROB 1   0   lt NYPROB       P     Call a worker to compute values for right most tile          idum   worker  ixmin  ixmax             Gather results from T  plespace   ui   for   1  i  lt   ntile  i                i  n  result id    ixmin  Tixmax    n        result   rxmin   kulixminj 1013      printf  Writing final d
6.        When subroutine f is invoked using the Ca11 statement  the variable val will have the  value 1  since that value was assigned just prior to the call  However  when the subroutine  is invoked using the eval statement  the variable will have the value 5  since that was its  initial value at the inception of the program     eval Function Restrictions    evaled functions may have a maximum of 16 parameters  and both their parameters and  return values must be of one of the following types     C Linda  int  long  short  char  float  double    Fortran Linda  integer  real  double precision  logical       The first four C types can be optionally preceded by unsigned  the Fortran types may  include a length specifier  e g  real   4   Note that no arrays  structures  pointers  or  unions are allowed as function arguments  Of course  data of these types can always be  passed to a process through tuple space     The other fields in an eval are also limited to the data types in the preceding list plus  string constants     24 Using the Linda Operations    Linda Operations    Under Fortran Linda  subprograms appearing in an eval statement can be either  subroutines or functions  Subroutines are treated as if they were integer functions always  returning the value 0  Functions must return values of type integer  logical  or double  precision  Intrinsic functions may not be used in eval statements     Predicate Operation Forms  inp and rdp    inp and rdp are predicate forms of in and
7.       inda       User Guide    September  2005    SCIENTIFIC    COMPUTING  ASSOCIATES    One Century Tower  265 Church Street  New Haven  CT 06510 7010 USA  203 777 7442 fax 203 776 4074   lsupport LindaSpaces com             The software described in this manual is distributed under license from Scientific Computing Associ   ates  Inc   SCAI   Your license agreement specifies the permitted and prohibited uses of this software     Any unauthorized duplication or use of Linda    TCP Linda     ot the Linda CDS in whole or in part  in  print  or in any other storage or retrieval system is prohibited     The Linda software is provided    as is  without warranty of any kind  either expressed or implied  re   garding the software package  its merchantability  or its fitness for any particular purpose  While the in   formation in this manual has been carefully checked  SCIENTIFIC cannot be responsible for errors and  reserves the right to change the specifications of the software without notice     The Linda software is Copyright    1988 2005  SCIENTIFIC Computing Associates  Inc  This manual  is Copyright    1989 2005  SCIENTIFIC Computing Associates  Inc  All rights reserved     Linda is a registered trademark SCIENTIFIC Computing Associates  Inc    Tuplescope is a trademark of Yale University    UNIX is a registered trademark of AT amp T    The X Window System is a trademark of MIT    All other trademarks and registered trademarks are property of their respective holders     Manu
8.      The only changes here are the two loops which create and terminate the worker  processes  The first for loop consists of num workers eval operations  Each worker is  passed a unique integer as its argument  its worker ID number  so to speak  It will use this  value to retrieve its own task from tuple space     The final for loop creates one    wakeup    task per worker  with its second field set to  1   This is a poison pill  telling the worker to exit  The loop also retrieves the passive data  tuple the worker emits as it dies before generating the next poison pill  This ensures that  the master process will not terminate until all the workers have     If the speed of the shutdown process were important  then the in operations could be  placed in their own loop  so that all of the poison pills would be generated essentially at  once  Then  the workers die in parallel  with the master collecting the    worker    data    tuples only after generating all of the poison pills     In this program  there is no master routine per se  rather  the master functions are split  among three routines       real main  which creates and terminates the workers       nb setup energy  which places global  i e   non iteration specific  data into  tuple space and controls the parallel special charges calculation     Linda User Guide 89    90 Case Studies      nb energy  which controls the nonbonded interaction energy calculation for    each iteration     Hereis nb energy setup     void nb en
9.      will match it and assign the value 512 to i  assuming that i is an integer  Similarly  if j is  an integer variable equal to 8  then the statement     C Form Fortran Form    in  oube   j  Pa   in  coube  T  Ti     will match it  again if i is an integer   assign the value 512 to i  and remove the tuple  from tuple space     If more than one matching tuple for a template is present in tuple space  one of the  matching tuples will be used  Which one is non deterministic  it will not necessarily be the  oldest  the most recent  or a tuple specified by any other criteria  Programs must be  prepared to accept any matching tuple  and to receive equivalent tuples in any order   Similarly  repeated rd operations will often yield the same tuple each time if the tuples in  tuple space remain unchanged     If no matching tuple is present in tuple space  then both rd and in will wait until one  appears  this is called b ocking  The routine that called them will pause  waiting for them to  return     Note that the direction of data movement for the in and out operations is from the point  of view of the process calling them and vot from the point of view of tuple space  Thus    an out places data into tuple space  and an in retrieves data from it  This is similar to the  general use of input and output in conventional programming languages     Al data operations to and from tuple space occur in this way  Data is placed into tuple  space as tuples  and data is read or retrieved from t
10.     How ntsnet Finds Executables    ntsnet locates the local executable   the executable file that will run on the local node   which is the node where the ntsnet command is executed    in the following manner  If  a full pathname is specified on the command line  that path specifies the exact location of  the local executable  For example  the following command executes the network program  hello worldin  tmp       ntsnet  tmp hello world          If only an executable name is specified  then ntsnet uses the TSNET PATH environment  variable to locate the executable file  The environment variable   s value should be a colon  separated list of directories  which are searched in order for required executables   working just like the UNIX PATH vatiable   If TSNET PATH is unset  it defaults to         usr bin linda      This means that first the directory  usr bin linda is searched  followed by the  current directory     Linda User Guide 51    The location of the local executable can play a large part in determining the locations of  the executable files to be run on remote nodes using ntsne  s map translation feature   described below   These remote directory locations are also explicitly specifiable using  the rexecdir resource  application and node specific   Here is an example   Tsnet Appl Node rexecdir   usr local bin    Tsnet Appl moliere rexecdir   usr linda bin    Tsnet hello world Node rexecdir   usr bin                Tsnet hello world moliere rexecdir   usr linda bin t
11.     Linda User Guide 45    46 Using TCP Linda    the command line contained a full or partial directory specification  that location is  searched for this configuration file  If only an executable filename was given  then the  directories in the TSNET_PATH environment variable are searched in turn        The optional local configuration file and map translation file are named   tsnet config and  tsnet  map respectively  If used  these must be located in the  user s home directory  The global files   tsnet   config and tsnet  map   are located  in the common 1ib subdirectory of the Linda tree  for example  in  usr sca   linda7 1 common 1ib if the Linda tree begins at  usr sca linda7 1  Settings in  the local files always take precedence over those in the global files  and settings in the  application specific file take precedence over the local file  The global files may also be  ignored entirely if desired  Note that precedence operates on a setting by setting basis   and not by an entire file     The ntsnet configuration files contain entries which specify various execution parameters  and desired characteristics  Depending on the parameter to which they apply  entries can  vaty significantly in their ultimate scope  they can affect     The execution of any TCP Linda program  The execution of a specific program on every node it uses    The execution of any program on a specific node           e     The execution of a specific program only on a specific node    The configurat
12.    0   for i 0  i   nb n 1  i       force update i    amp  force update il    force update i  gt x   0 0   force update i  gt y   0 0   force update i  gt z   0 0        in  wakeup    wakeflag  workerid      Keep in mind that this code executes at the same time as the master routine for this part  of the calculation nb_energy_setup  which itself waits for the workers to place their  partial sums into tuple space     If we return our focus to the master program thread  once the initialization phase is  complete  real main enters its main loop  The routine verl step begins each  iteration  eventually  control passes to the routine nb energy  The sequential version of  this routine computes the nonbonded interactions  the parallel version of nb  energy  performs the master functions for this part of the calculation     nb energy begins by initializing some variables and then sending out the    wakeup     tuple for each worker along with the current coordinates of the atoms in the molecule        void nb energy str coor force evdw elec esup              elec    evdw    esup   count   0 0       wake up workers     for  workerid   0  workerid  lt  num workers  workerid     out   wakeup   workerid  workerid        send out current coordinates on each round     out   coor    coor   o str   n   1      The worker  who has been waiting to receive the    wakeup    tuple  retrieves it and checks  the value in its second field  If this value is not  1  then the next time step commences   T
13.    Arrays match other arrays whose elements are of the same type  Thus  an  array of integers will match only other arrays of integers and not arrays of  characters  Similarly  real arrays will not match integer arrays  even when they  contain the same length in bytes     e Scalar types don t match aggregate types  For example  if a is an array of  integers  then a  1 won t match an integer  but a  2  in C and a  2  in  Fortran will   Similarly  in C  if p is a pointer to an integer   p and p  1 do not  match  the  n array notation is discussed later in this chapter        The corresponding fields in the tuple contain the same values as the actuals in the  template        Scalars must have exactly the same values  Care must be taken when using  floating point values as actuals to avoid inequality due to round off or  truncation       Aggregate actuals such as arrays  which otherwise match  must agree in both  the number of elements and the values of all corresponding elements     The following sections contain many illustrations of these matching rules     Linda User Guide 27    Scalars    Arrays    The following C operations all place a tuple with an integer second field into the tuple  space     int i   p  ar20     i     p   amp i    out  integer   3     constant integer      out   integer   i     integer variable     out  integer    p     dereferenced ptr to int     out   integer   a 5      element of an int array     Out  integer   f 1      function returns an int     out
14.    Begin program execution    Pause program execution    Resume execution of a paused program    Create  edit and or compile a TDL program    Save the current contents of tuple space to a file  not available when  Dynamic Tuple Fetch mode is in effect   The file is named  program N dump  where program is the application name and N is a    integer incremented each successive save operation     End Tuplescope session     Controls whether single step mode is in effect or not  default is off      Display Aggregates    Controls whether the contents of aggregates are displayed in tuple  displays  off by default   If Display Aggregates is not in effect   aggregates in tuple displays appear as the word Block  The format for  an aggregate display is the one that was in effect when its tuple icon was    Linda User Guide 123    opened if Dynamic Tuple Fetch is in effect or when it entered tuple  space if Dynamic Tuple Fetch is not in effect     Dynamic Tuple Fetch  When in effect  tuple contents are copied to Tuplescope only when  requested  This mode may speed up execution somewhat  but it has the  side effect that not all tuples are always continuously available for  inspection as they are under the normal mode     Reverse Execution    Available in postmortem mode only  Causes execution to run in reverse  when in effect     Exit Modes Menu  Close the Modes menu     The Debug Menu    Edit program debug  Edit a TDL program named for the current application  The editor  specified by the 
15.    Here are five independent matrix multiplication operations  Certainly  it would be nice to  produce a parallel version of this program  but the question is  where to parallelize  We  could replace each matrix multiplication call with a parallel version  ot execute all five  calls simultaneously  In some circumstances  each of these solutions will be the right one     Here  however  we d also like to execute some of those exp calls in parallel as well  which  would tend to favor the second approach  creating a parallel version of gaus3 that might  very well use a sequential matrix multiplication routine     There is a third possibility as well  gaus3 is itself called a large number of times     Do 10 I 1 npts  call gaus3 x m   10 Continue    and again the calls are independent of one another  It might be possible to execute some  of these calls in parallel  leaving gaus3 essentially untouched  Whether this is a good idea  or not depends on the likely value of the loop s upper limit  npt s  If npt s is typically   say  8  then parallelizing at this point will limit the number of processor which the  program could take advantage of to 8  and so it is probably better to parallelize gaus3  itself  If  on the other hand  npt s is typically 500000  then this is a perfectly good place  to focus attention  and the job will be much simpler as well     Linda User Guide 81    Database Searching    82 Case Studies    This case study looks at a generalized database search program  What 
16.    oj woddns sa day        Command Options    Many of these options have the same meanings as they do with other UNIX compilers      C Suppress linking and only produces  10 object files for each Linda  source file      Dname   definition   Define a C preprocessor symbol  clc only       g Produce additional symbol table information for debuggers    help Display a help message    Ipathname Add the specified directory to the include list  clc only       linda option  arguments   Linda specific compiler directives  Values for option are     compile args s     Pass the string s on to the native compiler when it is  used to compile source files     c args s Pass the string s on to the C compiler  fle only      info Print out the pathname of the Linda directory and  the default size of tuple space     link args s Pass the string s on to the compiler when used to  link the executable     main file Use specified file in place of cmain o or fmain o    ts N Initialize tuple space to N 200 byte blocks  This  option is applicable in Code Development System  code    tuple_scope Prepare the object files and or executable to be run    with the Tuplescope debugger  This option can be  abbreviated as t_scope      Ix Link in the library libx a     Ldirectory Add directory to the object file library search list used by the linker    o outputfile Name the executable file as indicated     v Display subcommands for each step of the compilation     110 Linda Usage and Syntax Summary     w72    The 
17.   4 0    double  score   double  darts   return  pi      HEE HEH GRA EH FE HE HH HE FE HE FE FE FE TE FE FE EH FE E TE FE HE EH FE HE FE FE HE FE HE TE FE E TE FE HE EE AA  erial   pi calc Makefile  ILE  make ser pi calc c                                                                               ESCRIPTION  See ser pi calc c             USE  make  f make ser pi calc c  LAST REVISED  4 18 94 Blaise Barney    CC                               EAE AE E E AE AE AE FE FE FE AE AE AE RENI                                                   cc    pi cales ser pl calce       SqQC    ser p   calc c  o pi calec    136 Sample Programs    Matrix Multiply    Matrix Multiply    Serial Version     KKK KK oko oko k oko k k kok ok k kok ok kok kok RRA RR RRA kok k kok k kok k Koko kok I I He I I                         arrays are stored passed  C arrays are row major order but Fortran    arrays are column major order   Ck CK Ck Ck CK Ck CK Ck AA A X AA A X A A X AXA AX X A AX X AA X X A A X A A X ZA Z X A A XZ A A X  A Z Z A A X Z A Z Z A A Z   X KA HH KKK ko A KH          include  lt stdio h gt       Serial Matrix Multiply   C Version     FILES Ser mm c     OTHER FILES  make mm c     DESCRIPTION  Sequential version of a matrix multiply     To make this a parallel processing progam this program would be divided into  T two parts   the master and the worker section  The master task would     distributes a matrix multiply operation to numtasks 1 worker tasks      NOTE1  C and Fortran versions 
18.   Bi   return     rd row_index   row    row                         for  col index   0  col index  lt  N  col index    clump     rd col index   columns     columnst    tor       0  r  lt  clump    r  4  result ptr   result  VE col index    M    for  e   0  e  lt  clump   t          dot   0   rp   rows    r   M    ep   columns         M    for  i   G7 1  lt  M   42   p  ttep  dot       rp   icp         result potrete   dot           Put a block of the result matrix into tuple space     out   result   rows index  result M clump          end while          end worker          The worker begins by retrieving a task  and exiting if all tasks are already done   It reads  the corresponding tuple from tuple space  containing clump rows of A   It then reads  the columns of B  in chunks of clump columns at a time  as they have been partitioned  into tuples   For each chunk  all corresponding elements of the result matrix are  computed  when all of the chunks of the columns array  holding B  have been read and  processed  the worker sends the completed chunk of the result matrix to tuple space and  begins the next task     The variable clump can get its value in any number of ways  from a preprocessor  variable  from a command line argument  as some function of the sizes of the matrices   and so on  It allows you to adjust program execution in a number of ways  For example   if local memory for worker processes is limited  clump could be chosen so that the  portions of A and B a worker 
19.   Linda User Guide 5    To put it another way  a computer program doesn   t merely solve a particular problem   but rather it embodies a particular approach to solving its specific problem  Making a  parallel version potentially can involve changes to the program structure  or the  algorithms it implements  or both  The examples in this manual include instances of all  three possibilities     Once the work has been divided into pieces  yielding a parallel program  another factor  comes into play  the inherent cost associated with parallelism  specifically the additional  effort of constructing and coordinating the separate program parts  This overhead often  is dominated by the communication between discrete program processes  and it increases  naturally with the number of chunks the program is divided into  eventually reaching a  point of diminishing return in which the cost of creating and maintaining the separate  execution threads overshadows the performance gains realized from their parallel  execution  An efficient parallel program will maximize the ratio of work proceeding in  parallel to the overhead associated with its parallel execution     This ratio of computation to communication is referred to as granularity  Think of  dividing a rock into roughly equal sized parts  There are many ways to do it  in fact  there  is a continuum of possibilities ranging from fine grains of sand at one end to two rocks  half the size of the original at the other  A parallel program 
20.   The value in the second field is i s current value     Linda provides four operations for accessing tuple space  plus two variant forms  desctibed in Predicate Operation Forms  inp and rdp on page 25         Operation Action    out Places a data tuple in tuple space    eval Creates a live tuple  usually starting new process es      in Removes a tuple from tuple space    rd Reads the values in a tuple in tuple space  leaving the tuple there     For example  this out operation places a data tuple with one string and two integer fields  into tuple space     C Form Fortran Form  out   eube   4  64   out  cube   4  64     There are two kinds of tuples  data tuples  also called passive tuples   like those we ve looked  at so fat  which contain static data  and process tuples  also known as    ive tuples  which are  under active evaluation     An eval operation creates a process tuple consisting of the fields specified as its argument  and then returns  This process tuple implicitly creates a process to evaluate each  argument  Actual implementations create a process only for arguments consisting of a  simple function call  and fulfilling some other conditions which we ll note later   all other  fields in the eval are evaluated sequentially     While the processes run  the eval s tuple is referred to as a live tuple  as each process  completes  its return value is placed into the corresponding field  and once all fields are  filled   all processes have completed   the resulting da
21.   Using TCP Linda    This chapter describes the special considerations involved when executing Linda    programs on a network of computers   The network version of Linda is called TCP  Linda  and that terminology is used here as well  Release notes for particular platforms  often contain additional information relevant to the use of TCP Linda     Running parallel programs on networks is complicated by issues such as process  scheduling  executable location  remote execution facilities  and the like  ntsnet is a  powerful  flexible utility for executing Linda programs on a network  designed to help  you manage this complexity  This section discusses the simplest possible case  and is  designed to enable you to get started running programs right away  The remainder of the  chapter covers more complex scenarios and the ntsnet features provided to handle them     Normally  TCP Linda runs the real_main process on the local system  and evaled  worker processes run on remote hosts  This requires that it is possible to successfully  execute an rsh command from the local host to each remote host without being required  to enter a password  Consult the man page for rsh or your system administrator if this  condition does not hold true for your site     In the simplest case  the current working directory is a commonly mounted directory   accessible by the same pathname from every host that you want to use  it can be a    permanent mount point or be auto mounted      Given these assumpti
22.   gogol    flaubert     u home  net aurora home       A Sample Network    These concepts will become clear once we look at several examples of map translation   We ll use the sample network shown in the preceding illustration  which shows the  location of users    home directories for each node in the network  for our initial examples   On aurora  users    home directories are located in  home  the mount point for one of its  local disks  This same disk is normally statically mounted via NFS to three other systems   blake  chaucer  and flaubert  On blake  it is also mounted at  home  on the other two  systems  it is mounted at a different directory location  at  u on chaucer  and at  u home  on flaubert   Home directories on erasmus are found in the local directory  m    On the  node gogo   the home directories from aurora are automounted as necessary at  net    aurora home  Finally  users don   t generally have their own home directories on node  degas  when they run remote worker processes on this node  they use  tmp as their  working directory  which is what is listed in the diagram      How ntsnet Finds Executables    Consider the following command  run by user chavez from the work subdirectory of her  home directory on flaubert  Le    u home chavez  work        ntsnet test24g    For this command to work properly and successfully execute on all of the nodes in the  sample network  we need to construct rules that tell ntsnet how to translate this working  directory for eac
23.   integer      a     dereferenced ptr to int       Note that single elements from an array are scalars of the type of the array     The constructs  amp i and p are not included among these examples because each of them is  a pointer to an integer  they are treated as arrays not as scalars  see the next section    Thus  the following tuple and template will not match even though p points to i  an  integer        integer   p 1     integer    i     Here are some example Fortran Linda operations involving scalars   Real a 20   x    Integer i  Character  10 name       out   integer   i   out  real   x   out  real   a 6    out  character   name     Note that Fortran Character vatiables are scalars     Atray handling within tuples and templates is vety easy  Here are some examples of  tuples and templates involving arrays     char ap20    int len     out   arrayl      ans  in  arrayl   Pas      cut t array2    ini arrayz      28 Using the Linda Operations    asilo     a len      Fortran     Specifying Tuples and Basic Tuple Matching Rules    Dimension a  20   Integer len    out  arravyl   a   in   arrayl    a     Qut  array2   azl0   in  array2    a len     The general format for an array field in a tuple is name   length  where name is the array  name  and length is the length of the array  number of elements   As the out operations  in the examples indicate  the length is often optional  When it is omitted  the entire length  of the array is assumed  In C  you must still include t
24.   maxprocspernode  Maximum number of 1  maxprocspernode n  application specific     mp n    Scope       getload    Whether or not to use  current system load averages  when scheduling workers on  nodes     true     getload  getload    application specific       loadperiod    The period in minutes over  which to compute load  averages  when they are used  for scheduling       loadperiod mins     m mins    application specific       threshold    Maximum load allowed on a  node  if the normalized load  exceeds this value  then no  wotker will be started     20    node specific       speedfactor    A number indicating relative  CPU capacity compared to  other nodes  Larger values  indicate increased ability to  run multiple workers  Used in  computing the adjusted load  average     1 0    node specific       masterload    Load that the master process  puts on its node  the local  node   Used in computing the  adjusted load average      masterload n    application specific       workerload    Load that a worker process  puts on its node  Used in  computing the adjusted load  average      workerload n    application specific       fallbackload    Value to use if ntsnet is  unable to obtain the current  system load average  Setting  this resource to a large value  will ensure that nodes that are  down will be excluded     0 99     fallbackload 7    application specific       available          Whether a node is available or  not  useful for temporarily  disabling a node without  rem
25.   the following command prepares the program test 24 for use with  gprof       cc  o test24  pg test24 o    Then  you run the resulting executable in the normal manner  Doing so will create a file  named mon out  prof  or gmon out  gprof  in the directory from which the program  was executed  These files contain the profiling data obtained during the run  You then  run prof or gprof on the output files     There can be a lot of output from both of these commands  Among the most useful are  the breakdown of time spent  the number of times each routine was called  and the call  graph information  where each routine was called from   Here is an example of the first     Stime seconds cum   cum sec procedure  file   29 2 235 9100 2 2 235 91 gaus3  gaus3 f   24 6 198 5800 53 8 434 49 dgemm mm   dgemm mm s   13 0 105 1600 66 8 3539 65 f  nc3   fune3 t    9 1 73 2500 75 8 612 90 tria  tria tE    8 0 64 8500 83 9 6717 75 exp  exp s    Tad 98 5500 91 1 736 30 intar     intarc t     This display shows the total amount and percentage of CPU time used by each routine  in  decreasing order  In this program  90  of the total execution time is spent in just 6  routines  one of which is a matrix multiply library call  About 8  of the time is spent in  calls to the exponential function     The following display is an example of a call freguency table     polls  calls cum  bytes procedure  file   20547111 68 53 68 53 480 exp  exp s   768 0 00 68 54 17072 gaus3  gaus3 f     14 Overview of Parall
26.  0005 seconds  So a work task that  invovled 20000 bytes total of input and output data  should run at least 3milliseconds in  order to justify the communication time  while a task involving a few bytes of input and  output data should run at least 1 millisecond     As new network interconnect technology is developed  the granularity effects must be  re examined  The use of certain high speed switches  for example  may give networks  performance characteristics almost identical to distributed memory parallel computers     Forcing an eval to a Specific Node or System Type    68 Using TCP Linda    While there is no way within an eval operation itself to force its execution on any  particular node or type of system  adding an additional function layer in front of the  target routine can accomplish this  Here is an example        master     for  i 0  i  lt  NWORKERS  i     eval   worker   do worker        do_worker         get the hostname or architecture type via standard system call  if    strcmp host   moliere    worker 1     elseif    strcmp  host   goethe    worker 2     elseif    strcmp arch   p4    p4 worker     and so on       The eval operation calls a generic worker function  do  worker  which determines the  hostname  or architecture type   and then calls the appropriate real worker function     Debugging TCP Linda Programs    Debugging TCP Linda Programs    There are two ways of debugging a TCP Linda program       Use the ntsnet debug option     Manually start program
27.  1 e   when  kaon is true       kaon This option turns on the keep alive mechanism  This is the default    kaon This option turns off the keep alive mechanism      loadperiod minutes  This option specifies the number of minutes over which the machine  load is averaged  Typical values for loadperiod are 1  5  and 10  The  default is 5      masterload  oad  This option specifies the load that the master  real main  process is  considered to put on the node  The value specified can be any real  number  gt   0  The default is 1  Typically 1 or some smaller fraction is  used  If the master process uses much less CPU time than the workers   the master load should be set smaller than the worker load      maxprocspernode number  This option specifies the maximum number of Linda processes started  on any given node the application is running on  On the local node   maxprocspernode includes the master  The default value is 1      mp number Synonym for  maxprocspernode      n minworkers  maxworkers   This option specifies the acceptable range of the number of workers  that the application can run with  If maxworkers is omitted  it is set to the  same value as minworkers  ntsnet initially starts up the number of  workers equal to the maximum of the minworkers and maxworkers  resoutce values  The master then waits as specified in the minwait and  maxwait resources  for the workers to join the execution group  If at  least minworkers join before the maxwait interval has elapsed  execution 
28.  119  message passing 7   8  messages  increasing number 67  minwait resource 62  119  minworkers resource 62  119  molecular dynamics 86    multidimensional arrays 32  37    N  named common blocks 33  native compiler  passing options to 110  nice resource 51  119  node names  in ntsnet configuration files 48  node set 61   nodefile 50  nodefile resource 51  119  nodelist resource 50  61  119   nodefile value 50  ntsnet command 21   bcast options 111   cleanup options 59  111   d options 59  112   distribute options 59  112   getload options 65  112  th options 51  112  thigh options 51  112  tkaon options 112  redirect options 67  114  Tsuffix options 60  114   translate options 58  114  tuseglobalconfig options 67   114  tuseglobalmap options 67  114  tv options 67  115   verbose options 67  115    tveryverbose options 67  115       tvv options 67  115    appl option 48  111   bcastcache option 111  configuration file format 116  configuration files 45  configuration information  sources 45    debug option 69  112    delay option 67   disabling global configuration  files 67    fallbackload option 65  112   help options 112     kainterval option 112   loadpetiod option 65  113   m option 65  113   masterload option 65  113   maxprocspernode option 65   113    mp option 65    n option 62  113    nodefile option 51  113   nodelist option 51  113   opt option 50  114   options vs  resources 48    p option 52  61  114  process scheduling 61  return value 106   searching for exe
29.  36  image rendering 73  in 11 12  21  105  initialization   workers and 75  inp 25  105    interprocessor communication 6    K  kainterval resource 118    kaon resource 117   118    L   language definition 105   length of array 28   lexit 37  106   lhalt 37  106   libraries 110     linda  eval 26     linda in 26     linda inp 26     linda out 26     linda rd 26     linda rdp 26   Linda Code Development System 95   Linda directory tree 46   Linda model 10     linda prefix 105   LINDA  CC environment vatiable   122   LINDA  CC LINK environment   variable 122   LINDA  CLC environment vatiable   95  122   LINDA PATH environment   vatiable 122   LINDA PROCS environment  variable 107  123   linda  rcp shell scripts 118   linda rsh shell script 118    lindarcparg resource 118       lindarsharg resource 118  linking 110   passing switches to 110  linking with clc 20  lintoff 106  linton 106  live tuple 11  load balancing 10  loadperiod resource 63  65  118  loffexit 107  lonexit 107  Iprocs 107    M  map translation 52  121  configuration files 53  disabling 120  master  becoming worker 38   39  distributed 89  waiting until workers finish 19  master worker paradigm 8  73  masterload resource 64   65  118  matching rules 26  37  matrix multiplication  extended example 77  under distributed data structures  9  under message passing 7 8  maximum fields in tuple 26  maxnodes resource 118  maxprocspernode resource 65   66   118  maxwait resource 62  70  117  119  maxworkers resource 62 
30.  Assignment   C Version  ILE  clinda array cl             OTHER FILES  make clinda array cl    D          ESCRIPTION     In this simpl xample  the master task initiates a number of tasks  indicated by user input  It then distributes portion of the   array into tuple space to be worked on by the workers                       Each worker task retrieves data indicating what portion of the array  the worker should update  and performs a simple value assignment  to each of its elements  The value assigned to each element is    simply that element s index in the array l  Each worker task  then puts its array back into tuple space and will  continue retrieving work assignments until it is notified to stop          As the master receives back each portion of the array  selected  elements are displayed  When it has received all portions of the array   It will put out a  poison pill  to notify the workers to stop work           EOKCKCKCKCKCkCkCK Ck kCk ck kCk Ck kCk RR k kok RR RARA RRA CK Ck kCk Ck kCKCk KCk Ck KCk k kCk Ck Ck kc k Ck kc k k ck kok ck kok ck kok I e ke       include   stdio h     define ARRAYSIZE 1000   define MAXWORKERS 20          real main int argc  char  argv            Local Variables          128 Sample Programs    Array Assignment          int numworkers  index  i    nt data ARRAYSIZE   result  ARRAYSIZE    int extrachunk  id    TE data_index    int chunksize  nbr_chunks    ime worker         KKK RIA RARA RARA RR KK K initial lizatiohs KKKKKKKKKKKKKKKKKKKKKKKKK
31.  DESCRIPTION           Example   Heat Egaution Domain Decomposition   C Version      l    make clinda_heat2d cl    This example is based on a simplified two dimensional heat equation  domain decomposition     The initial temperature is computed to  le of the domain and zero at the boundaries  The  at zero throughout the simulation  During the    an array containing two domains is used  these domains       alternate between ol    The paral  advanced  worker al    in time by       LSO     kk hk A 0X 0    006 006 06 09    00 0X 0    152 Sample Programs    llel version decomposes the domain into tiles     note that the code assumes that NXTILE  quotient is the number of parallel processes     Id data and new data     and each tile is                a separate process  The master process performs as a  NXPROB  NYPROB  and NXTILE may be varied to study performance   E evenly divides  NXPROB 2   This  In the C version  the    X                2D Heat Equation    tiles are actually strips  This is because C Linda does not have the rich  Fortran 90 array section notation that is available in Fortran Linda        Two data files are produced  an initial data set and a final data set   An X graphic of these two states displays after all calculations have  completed        eK k ok k I I kCk Ck kCk Ck kCkCk KCkCk kCkCK Ck kCKCk kCkCk kCkCk kCkCk kok k kok k Ck kc k Ck kck ck kok I I ke ke       include  lt stdio h gt    define NXPROB 11   define NYPROB 11   define NXTILE 3  struct Farms   
32.  Index          SCIENTIFIC    COMPUTING ASSOCIATES    One Century Tower  265 Church Street  New Haven  CT 06510 7010 USA  203 777 7442 fax 203 776 4074   lsupport LindaSpaces com          
33.  Integer len    out  array2   b 20   in  array2    d len     In both cases  the first twenty elements of array d are set to the value of the  corresponding element in array b  and the variable len is set to 20  This requirement  makes sense since these constructs can be used with arrays of different sizes within the  same program     The following out operations yield identical tuples     C  int  p  al201    p  a    cut   array     al    out  array      20  7   out  array   pi20 37   Fortran  Integer a 20    out  array   a   out   array   gr     out  array   ai20     The length is required in the third C example  pointers must always specify an explicit  length     You could create a tuple containing the first ten elements of a with any of these  operations     C  ink  p  ap20   BL20       10    l  n   pu    out  ten elements   a 10         out  ten elements   p 10      30 Using the Linda Operations    Fortran     Fortran     Fortran     C     Fortran     Specifying Tuples and Basic Tuple Matching Rules    Integer a 20   b     c 10   len  out  ten elements   a 10     and retrieve it with any of these operations      Len elements    a len          Len elements    p len     ten elements    b          in    inti  in    in  ten elements    c       in  ten elements    a len        in  ten elements    a    All of the operations will retrieve ten integers  and len will be assigned the value 10 for  those operations including it  Note that omitting the length variable is allowed  and
34.  It is useful  for protecting time consuming system calls from being interrupted   Interrupts should not be disabled for long periods  linton and lintoff  calls can be nested  For many implementations  this is a no op     Restores the interrupts associated with tuple space handling  see lintoff  below   For many implementations  this is a no op     The clc and flc Commands     flloffexit hd    Deletes a specific handler from the list of handlers set up by lonexit   where hd is the handler descriptor returned by lonexit  Returns 0 on  success and  1 on failure  descriptor out of range or not referring to an  active termination handler      lonexit  p a  Names a routine to be called after a Linda process calls lexit  Ihalt  or  P  returns normally  The routine p is called as       p   status  a     where status is the argument with which return  lexit or Ihalt was called   and a is typically the address of an argument vector  although it also can  be an integer value  Multiple calls can be made to lonexit  specifying up  to 16 termination handlers  which are called in reverse chronological  order  i e   the last specified routine is called first   lonexit returns a  unique  non negative termination handler descriptor upon success or  1  if the termination handler could no be stored      f lprocs   Returns the total number of processes that have joined the computation   including the master process   In the Code Development System  this  function is not meaningful  and it return
35.  Linda is easy to use  Conceptually  Linda implements parallelism via a logically  global memory  virtual shared memory   called zup  space  and a small number of  simple but powerful operations on it  Tuple space and the operations that act on  it are easy to understand and quickly mastered  In addition  the C Linda and  Fortran Linda compilers support all of the usual program development features   including compile time error checking and runtime debugging and visualization     How to Use this Manual       If you are new to parallel programming  Chapter 1  Overview of Parallel Programming  and the Linda Model  provides a general introduction to the subject and to Linda  It also  introduces tuple space  the Linda operations  and other essential concepts        Linda User Guide 1    For a more in depth look at parallel programming and algorithm development   SCIENTIFIC recommends the book  How to Write Parallel Programs  A First Course  by  Nicholas Carriero and David Gelernter  which is available in PDF format at no cost on  the SCAI Website at www lindaspaces com  See the Bibliography for the complete  citation for this book and related works        If you are using Linda to create parallel programs  Chapter 2  Using the Linda  Operations  describes the Linda operations in detail  It explains the program compilation  and execution process  provides a couple of simple example programs  and includes an  extended discussion of tuple matching rules and restrictions        If y
36.  Strinps   idees eere RE yee ease eens ee KK ERG NU RETE ca 35  Anonynious Fotrmals   12452 dpa dae eae reed sodn   Qao Qa nde tacite 36  Fixed Aggregates IN C   esee sese cies kee epee el cn Re R ooo vd ea e Rus 36  Termination of Linda Programs   sss 0 0    eee eee eee 37  Example  Freewake   24  cues kode ce oe He a a HERR oe eie n 38    Linda User Guide i    3 USING  TCP Linda x   ose y Zea dates hail REDUCES 43    Quick Start d EM PM LIP TM 43  Mhacntsnet Do6esu cse coe iisa la gane NS 44  Using the ntsnet Command                  00000 k   45  Customizing Network Execution               eee eee ee 45  ntsnet    Confiouration Piles s   uro tes sieves e ete ds a Peeters 45  Resource PES vue ine rte teer snes BO Ba Ep SMS 49  Determining Which Nodes a Program Will Run On    50  Specifying Execution Priority rears a aa ey aa eee eee eee es 51  How ntsnet Finds Executables                 44 4 4444 51  About Map Translation    ees ee eh  eee l ese mr n   52  The Map Translation File    53  Map Translation Entry Wildcards    0    ee eee eee 58  Distributing Executables      0 0    4444040000000 k k   59  Architecture   Specific Suffixes    socrus e iai e een 59  Specifying the Working Directory for Each Node                lt     61  Permissions and Security Issues s oona nai p a cee etter ees 61  ntsnet Worker Process Scheduling                     4444444400000 61  Forming The Execution Group                  44444444 eee 61  Selecting Nodes for Workers                   ce
37.  The default is 0 99  The  value specified can be any real number  gt   0  If failure of the RPC call  indicates that the node is down  this option can be used to set  fallbackload to a very large value  effectively making the node  unavailable to ntsnet     This option indicates that ntsnet should use load average information    when scheduling workers on the nodes in the network  This is the  default     This option indicates that ntsnet should not use load average  information  This can be used to make worker scheduling consistent  between different runs of ntsnet  It also makes sense if the rstatd  daemon is not available on the network     Synonymous with  high  high   This option causes ntsnet to display the usage message     This option causes all workers to be run at normal priority and causes  Linda internodal communication to run at full speed  This is the default     This option causes all workers to run at a nice ed priority  unless  specifically overridden on a per node per application basis using the nice  resource   It also causes Linda internodal communication to be throttled  to avoid flooding the network      kainterval seconds    112 Linda Usage and Syntax Summary    Specifies how often  in seconds  each Linda process sends out a keep  alive message  The default is 100 seconds  The range of legal values is  100 to 31536000  one year   The range is silently enforced  This    The ntsnet Command    resource is only useful when the keep alive mechanism is used 
38.  This method is used to study the very large molecules of commercial and  biological interest  typically containing thousands of atoms  The atoms in these molecules  are constantly in motion  this movement results in changes in the overall molecular  structure  which can in turn affect the molecule   s properties  A molecular dynamics  simulation calculates the changing structure of a molecule over time in an effort to  understand and predict its properties  These calculations are carried out iteratively   solving for the total molecular energy and the forces on and positions of each atom in the  molecule for each time step  In general  an atom   s position depends on the positions of  every other atom in the molecule  making molecular dynamics calculations require  significant computational resources     This case study illustrates the following techniques        Per iteration worker wakeup     Cleaning up tuple space     Distributed master functions     Since the original program for this case study is very long  we ll only look at the central  portions as we examine how it was parallelized with C Linda  This program required  changes to several key routines  and illustrates using C Linda to parallelize the calculation  setup work as well as the computation core     The diagram below presents a schematic representation of the sequential version of the  computation     After performing some initial setup steps in which it reads in and stores the data for the  calculation and
39.  We ve now introduced all the pieces of the ntsnet configuration file  The following  sections will introduce many of the specific resources in the context of TCP Linda  execution scenarios     Determining Which Nodes a Program Will Run On    50 Using TCP Linda    Two resources control what nodes a given application will run on  First  the nodelist  resource  which takes a list of nodes as its value  specifies a node list for a given  application  Here are some examples     Tsnet Appl nodelist   chem gauss newton descartes                Tsnet hello world nodelist  gauss moliere dalton avogadro    The first line specifies the default set of execution nodes for TCP Linda programs  in  addition to the local node   The second line specifies a different set for the application  hello world  which overrides the default value set in the first line      Duplicates are automatically removed from node lists  Variant name forms for the same  node   the full name and the nickname  for example   are also discarded  the first one is  kept   In such cases  a warning message is printed     The nodelist resource can also take the special value  nodefile  This indicates that the  contents of the file specified in the nodefile resource contains the list of nodes to be used   one name per line   If nodefile has not been given a value  then the file tsnet  nodes  in the current directory is used  The value for nodelist defaults to  nodefile  so  ignoring both of these resources will result in th
40.  a far better programming practice to use a counter or semaphore tuple in situations  where inp or rdp seems called for  Consider this C Linda example     if  rdp  globals    x   y   z   0   do globalsix yvy z      Linda User Guide 25    If the    globals  tuple is not available in tuple space  then rdp returns 0 and the process  computes the globals itself  Simply doing a rd would result in blocking if the    globals     tuple weren   t available  and recomputing them can be faster than waiting for them   although slower than reading them   The same effect can be accomplished via a tuple  that can take on one of two values  for example       globals ready   0 or 1     The master process outs this tuple with value 0 at the beginning of the program  and the  process that computes the globals and sends that tuple to tuple space also ins this tuple  and outs it again  changing its second field to 1  Then  the preceding code can be replaced  by     rd  globals ready    1    if  1    pai  globals   x   y  T     else   do globalsix v z      C Linda Alternate Operation Names    The alternate names  linda in  linda rd  linda out  linda eval  linda inp  and    linda rdp are provided for cases where the shorter names conflict with other program  symbols  Each alternate operation name begins with two underscore characters     Specifying Tuples and Basic Tuple Matching Rules    This section discusses tuples and tuple matching rules in more detail and includes  examples that use a variety
41.  an optimization designed to  improve performance by detecting patterns in tuple usage and attempting to place tuples  on those nodes where they will eventually be used  If successful  this optimization  produces significant communications savings  This feature can be disabled by setting the  value of the redirect resource  application specific  to false  its default value is true   The  value of this resource may also be set with the  redirect  redirect command line options     Disabling Global Configuration Files    The useglobalconfig and useglobalmap resources  both application specific  specify  whether to use the entries in the global configuration file and global map file in addition  to the local files  In any case  local file entries take precedence over those in the global  files  The default value for both resources in true  Command line options are available for  both resources   useglobalconfig  useglobalconfig and  useglobalmap  useglobalmap     Generating Additional Status Messages    ntsnet can optionally display informational messages as program execution proceeds   Whether and how frequently messages are displayed are controlled by the verbose and  veryverbose resources  both application specific   Both are mainly useful for debugging  configuration and map files  and both default to false  The command line options   verbose  verbose  abbreviable to  v  v  and  veryverbose  veryverbose  or  vv  vv  can  also be used to specify these resources     Process Ini
42.  array 28  literal strings in 11  live 11  matching 12   13  matching rules 26  37  max    of fields 10  maximum fields 26  multidimensional arrays in 32  pronunciation 10  redirection 67  scalars in 28  strings in 35  structures in 33  varying length structures in 35  tuple classes 13  tuple space 10  cleaning up 94  discarding data 36  operations 11  results when full 22  size under TCP Linda 22  Tuplescope  Break button 99  buttons 123  clc compiler options 110  Continue button 99  control panel 96  Debug button 101  Debug menu 124  display 96  dynamic tuple fetch 99  exiting from 96  99  icons 98  invoking 95  menus 123  Modes menu 98  123  program preparation 95  Ouit button 96  99    reguirements 95    Run button 99   run modes 99   single step mode 99   TDL 101   using dbx with 100   varying execution speed 96   viewing ageregates 98   viewing tuples 98   X Toolkit options and 95  Tuplescope Debugging Language   101   syntax 101  124  types allowed in tuples 26  typographic conventions 3    U   useglobalconfig resource 67  120  useglobalmap resource 67  120  user resource 61  121   username    specifying an alternate 61    V   varying length structures 35  verbose resource 67  121  veryverbose resource 67  121    virtual shared memory 8    Ww   watermarking 84   worker  compated to task 9  initiated by eval 11  repeating setup code 75  wakeup technique 92   workerload resource 65  121    workerwait resource 70  117  121    X  X Windows 95    Linda User Guide v       vi
43.  be  easily modified for different environments  Changing the granularity level then becomes  as simple as changing a few parameter definitions  This technique complements the  preceding one  and both can be used in the same program     These are the major issues facing any parallel programmer  In the next section we ll look    at three different approaches to creating parallel programs and indicate how Linda is  situated with respect to each of them     6 Overview of Parallel Programming and the Linda Model    Approaches to Parallel Programming    Approaches to Parallel Programming  There are two main challenges facing any parallel programmer     How to divide the work among the available processors     Where to store the data and how to get it to processors that need it     Two radically different approaches to these problems have emerged as the dominant  parallel processing paradigms  They are message passing and distributed data structures  implemented in virtual shared memory     Message Passing    Message passing focuses on the separate processes used to complete the overall  computation  In this scheme  many concurrent processes are created  and all of the data  involved in the calculation is distributed among them in some way  There is no shared  data  When a process needs data held by another one  the second process must send it to  the first one     For example  let   s again consider a matrix multiplication calculation     A B 2C  A message passing version might cr
44.  calculates the sums of special charges and some other quantities for the  molecule  the program enters its main loop  For each time step   one loop iteration   the  program must calculate the bonded and nonbonded interactions among all of the atoms  in the molecule  The bonded interactions occur between atoms that are directly  connected together by a chemical bond  and the nonbonded interactions are the effects  of atoms that are not bonded upon one anothet s position  The latter take up the bulk of  the time in any molecular dynamics calculations because they are both more numerous  and more complex than the bonded interactions     Molecular Dynamics       Read data  from disk                  H  Setup H c  u     alloc memory N E all  gt  H     build matrix C Se       sum changes      Repeat  NSteps C C    Calculate  bonded  interactions    times       Calculate  nonbonded  interactions    For this molecule  the oxygen atom s  only bonded interaction is with the  carbon atom it is connected to  A  molecular dynamics calculation will  also compute the effects of its  nonbonded interactions with every    other atom in the molecule   Pm     Sequential Version of the Molec  Dyn  Program  amp  Bonded vs  Nonbonded Interactions       Calculate  new atomic  coordinates             Here is a simplified version of the original main routine     main  argc  argv      T   300 0   process_args  argc argv     Read data   Initialize parameters  amp  data structures     verl init  str coo
45.  calculation on serial processor     pi   dboard DARTS    avepi     avepi   i    pi   i   1    prints    After  3d throws  average value of pi    10 8f n     DARTS    i   1   avepi            KKK k kok k oko k kok ko k kok k oko k A A I A A kok k kok I I e ke k k      dboard    KOKCKCKCKCKCKCKCk kCk Ck KCkCk kok k Ck kCK Ck kCkCk kCkCk KCkCk kok k Ck kCK kok k k kCKCk kok kok k Ck k Ck k kc k Ck kck ck k kk k kk ckck kc kk ke e kx x      fdefine sqr x    x   x      long random void      double dboard int darts       double x coord     x coordinate  between  1 and 1     y_coord     y coordinate  between  1 and 1     pi  je pi    ES    random number between 0 and 1      int score     number of darts that hit circle       n     unsigned long cconst     132 Sample Programs       used to convert integer random number        between 0 and 2 31 to double random number       pi Calculation       between 0 and 1       cconst   2  lt  lt   31   1    score   0      KOR KKK KK KR KK KK AK I A A I I IK kk    for    Throw darts at board  Done by generating random numbers   between 0 and 1 and converting them to values for x and y  coordinates and then testing to see if they  land  in   the circle   If so  score is incremented  After throwing the  specified number of darts  pi is calculated  The computed value  of pi is returned as the value of this function  dboard    Note  the seed value for rand   is set in pi calc        KCKCKCKCKCKCKCKCk kCk Ck kCkCk kCkCK Ck kCK Ck kCk Ck kCkCk kok k 
46.  chavez  work   which will in turn be translated to the appropriate directory when ntsnet starts remote  Worker processes     If we want to run the ntsnet command on node gogo   however  we must create an  additional rule  Home directories on gogo  are automounted from aurora on demand   When referred to in the context of a process starting from a remote system  their location  can be written as in the first rule  thus  when a remote process is initiated on gogo  from  flanbert  the current working directory for the remote node is correctly translated to   net aurora home chavez work     Linda User Guide 55    56 Using TCP Linda    However  if the ntsnet command is run instead on gogo  from this same directory  the  literal directory location   what is returned by pwd or the get cwd system call   is what is  used for map translation  in this case  using the actual automounter mount point     tmp  mnt  net  aurora home chavez work  Thus  the translation from generic to  remote directories is handled correctly by the first rule  and what is needed is a rule for  translating this local directory to a generic directory  This is the function of mapto entries   The following entry maps the local directory on gogo  to the same generic directory we   ve  been using     mapto  home      gogol    tmp_mnt net aurora home     With this rule in place  running ntsnet from gogo  will be successful  and the remote  working directories we   ve considered so far will be set appropriately  Sinc
47.  code     double values  MAXPOINTS 2      values at time t     oldval  MAXPOINTS 2      values at time  t dt      newval  MAXPOINTS 2      values at time  t dt       fe     Obtains input values from user           void init param void     char tohar s           set number of points  number of iterations     tpoints   0     142 Sample Programs    Concurrent Wave Eguation    nsteps   0   while   tpoints  lt  MINPOINTS      tpoints  gt  MAXPOINTS      printf   Enter number of points along vibrating string n     Scant   ss   tehar   tpoints   atoi tchar    if   tp  ints  lt  MINPOINTS      tpoints  gt  MAXPOINTS     printf   enter value between   d and  d n   MINPOINTS  MAXPOINTS             while   nsteps  lt  1      nsteps  gt  MAXSTEPS      printf  Enter number of time steps n     Scant   Ss   taharis  nsteps   atoi tchar    if   nsteps  lt  1      nsteps  gt  MAXSTEPS     printf   enter value between 1 and  d n   MAXSTEPS                   printf   points    d  steps    d n   tpoints  nsteps         je    All processes initialize points on line        void init line void     int ip Te E  double x  fac                 calculate initial values based on sine curve          fac   2 0   PI        k   0   for  j   17 j  lt   tpoints  j    ktt  1  x    double k   double   tpoints   1    values j    sin  fac   x       for  i   1  i  lt   tpoints  i    oldval i    values i                M Calculate new values using wave eguation          void do math int i  1  double dtime  c  dx  ta
48.  consisting of the entire array b   will match     The last example is a bit tricky  in this case  the out operation creates a tuple with the first  element of a as its second field  and any integer can be used in a formal to retrieve it     32 Using the Linda Operations    Specifying Tuples and Basic Tuple Matching Rules    Fortran 90 Array Sections    Fortran Linda recognizes a subset of the array syntax used in the Fortran 90 standard  within its operations  This syntax provides an alternate way of referring to arrays and  their subsections and may not be combined with the name  ength notation we ve  considered so far     Array subscript references in Fortran Linda may take the following form    ifirst   ilast   istride     where jfirstis the smallest index of a slice of the array    last is the largest index in the slice   and   stride is the stride  if omitted  the stride defaults to 1   A full array section is specified  with this type of expression for each dimension in the array  The shorthand form of a    single colon alone refers to the entire range of values for that array dimension with stride  1     Here are some examples   r  sl a 100  100  15007   100  100     ont  whole array   sa 1 100   1 100     out   second row   b 2 2       ott   divide an Z  part 1   a t  2 250    outi i dryrde in 2  part 2   a t t  Sle         every other   b 1 100 2 1 100        out    The first out operation places the entire array a into the tuple space  The second places  only the s
49.  cud este hm eee iy E Dep epi qe e RHET on 98    Viewing Process Information sa tiyini ia eiaei nia e en 99  Tuplescope Run Modes    eee ee n rito ttes t tnt oa Enis ea hy 99  Using Tuplescope with a Native Debugger    100  The Tuplescope Debugging Language               4 4 4444 101   TDL Language Syntax  bss cic caw ee ry va   eee K y ee a E a 101   6 Linda Usage and Syntax Summary                        105  Lida  Operations eriep ikoe A V oe bie toe Woes Raa pob bosse ee ray RS 105   Formal GC Litda Syntax jake side sti ath eve nasal a dee p Sisk Wane 105  Timing Functions  e o er oo blew See eile a eee RR Sees 106  Support Functions  5e ved le AW aa a eR rre ee 106  The cl    and fle Commands creerse se ta Ree ond ose Be 107   Commarid Synta  usb bon b   osad une breed ke e aw edn EM Ter dees 107   Command Options Lesen e gece aaa ees NE hehe ee lado old ban ode 110  The ntsnet  Command ie eh RR zde wie e Ae rt Re ia 111   DYNA corem o e eO EET EUN Eee ure diode re tede uen 111   Parameters tentar dde aeu sd vede Aa Da ol tated dE Orbe ats nt 111   Options Syntax Convention      cilisis enn 111   Command Options aises ein e Re var pe d ei eee a ea m San prc a eta 111  ntsnet Configuration File Format               lt     ccc cece k 116   Resource Definition Syntax          lice ee 116   R  SOUtCeS 5543 scai o oec eu Dei d E oo oda 116  Map Translation File Format                  cece eee k 121  Environment Vatiables       eser err A  6 oba EE ES 122  Tuplescope Referenc
50.  example     ntsnet can draw its configuration information from a variety of sources  These sources  are  in order of precedence     command line options   ntsnet s application specific configuration file  if any   ntsnet s local configuration file   ntsnet s global  system wide  configuration file  ntsnet s built in default values             9       When they do not conflict  the settings from all of these soutces ate metged together to  create the ntsnet execution environment     We ll cover each of these items separately  in the context of actual execution tasks  See  the Linda Usage and Syntax Summary for a complete reference to all command line  options and configuration file resources and formats        ntsnet Configuration Files    ntsnet uses several configuration files  the global  local  and application specific  configuration filee   we ll use this term to refer to the specific files rather than in a generic  sense from this point on   which define program execution characteristics  and the local  and global map translation files  which define directory equivalences on the various potential  execution nodes in the network     ntsnet first looks for an application specific configuration file  named  tsnet config appliration name  where application name is the name of the application  being executed with ntsnet  application names will be discussed shortly   ntsnet looks for  an application specific configuration file in the following way  first  if the executable on
51.  execution on each node     This section looks at each method in turn  both discussions assume that you have  properly prepared executables for use with a debugger by compiling them with  g     ntsnet   s Debug Mode    The  debug option to the ntsnet command initiates TCP Linda program execution in  debug mode  Including this option on the command line starts an xterm process running  the debugger specified by the debugger resource on each participating node  The value  for the debugger resource defaults to dbx  the other currently supported debugger is gdb   the debugger from the Free Software Foundation   Setting the value for debugger to  none effectively disables debugging on a particular node  as in this example     Tsnet moliere debugger  none  Tsnet Node debugger  gdb       The second configuration file command sets the default value for the debugger resource  to gdb  while the first line prevents debugging on node 70 ere  The debugger resource is  node specific     For example  the following command will create three debugging windows executing the  program test24 on nodes selected from the ntsnet node list in the usual way       ntsnet  debug  n 2 test24  The node and application name will appear in the title bar of each window     Once all of the debugger processes have started up  you can set breakpoints  run in single  step mode  examine variables  and perform all other normal debugging functions for each  process  ntsnet facilitates program initiation by defining
52.  is chosen arbitrarily  When a match is found  the actuals  in the matching tuples are assigned to the formals in the corresponding  fields of s     Look for a tuple matching s in tuple space  If a matching tuple is found   actual to formal assignment occuts  If no matching tuple exists  the  process blocks until one becomes available     Predicate forms of rd and in respectively  They do not block if no  matching tuple exists  but return 0  FALSE  and exit  If a match is  found  they return 1  TRUE  and perform actual to formal assignment           Each field of s containing a simple function call results in the creation of  a new process to evaluate that field  All other fields are evaluated  synchronously prior to process creation  When all field values have  become available  the tuple s is placed into tuple space     Synchronously evaluates the fields of the tuple s and then places it into  tuple space     You can use the prefix __linda_ to construct an alternate name for any operation if the  shorter name conflicts with other symbols  The prefix begins with two underscore    characters     Formal C Linda Syntax  linda call    call type     call body                call type call body  in   linda in  inp   linda inp  ra __ linda rd  rdp   linda rdp  out     Vinda out  eval   linda eval     element   element        Linda User Guide 105    element     formal      actual    formal    lvalue  length    type name   actual  fvalue  length    length  expression   type name floa
53.  iz   1   0   NXTILE NYPROB                KOR k k kok k oko k oko ko k kok k RARA RARA k ok k kok RRA k kok RRA RRA RR K kok k kok ck kck ck kk ck kkk k      Ste  ic                        He     E TT  void step int it  int ixmin  int ixmax  float ul  NXTILE 2   NYPROB    float u2 NXTILE 2   NYPROB      void update                       Put boundary data into Tuplespace   kg  if  ixmin    1  1   out     west   it  ixmin  sul   1   0   NYPROB        if  ixmax    NXPROB 2       out    east   it  ixmax   amp ul  NXTILE   0   NYPROB          re     Get boundary data from Tuplespace   ui  if  ixmin f  1  1   int  Bast   up  uxmim l  Pearl   O   O  eh     if  ixmax    NXPROB 2          int  west   it  ixmaxtl    amp ul NXTILE 1  0               Update solution    S   update  NXTILE 2  NYPROB  ul  u2         Linda User Guide 155     KOR KKK KK kok k ok ko k kok k RARA RARA RR KR kkk kkk k kkk kkk kk kkk k kkk k kkk k kkk k k kk k      update  KOKCKCKCKCKCKCKCkCKCkCk kCkCkCkCkCK Ck kCk Ck kCk Ck KCkCk kCkCk Ck kc k Ck kCkCk kok K kc k k kc k k kc kckckck ck k kk ckok kc kok c ke e xe kx x      void update  int nx  int ny  float  ul  float  u2     int ix  iy                                     for  ix   1  ix  lt   nx 2  ix    1  for  iy   1  iy  lt   ny 2  iytt       uz41x nyt1y      ul ix  nytiy   parma cx     Kiult ixtl j ny  ftiy      ult ix l  nytiy   2 0       ul ix nytiy        permscv       oleistnytiytl    uitix  nytiy L   2 0   Klul ix ny  y  iy  js           KOR KKK KK kok
54.  kok ko k KK IK I A I A K kok kkk kkk kk kkk kkk k      inidat  eK A kCkCk kCkCK Ck kCk Ck kCk k KCkCk kCkCk Ck kCK Ck K KCkCk kCk Ck Ck kc k k kc k k kc k ck kck ck kok ck I I I KH      void inidat  int nx  int ny  float  ul       int ix  iy    for  ix   0  ix  lt   nx l  ix    1  for  iy   0  iy  lt   ny 1  iy       kiul ix nytkiy     El    t   ix    nx   xx   Ij   iy    n     iy     1  g            KOK KK KK KK kok ko KK ok kok ok kok RRA      pridat    KOKCKCKCKCKCKCKCk KCkCk kCk Ck kCkCK Ck kCK Ck kCk Ck kCKCk kok k kCkCK Ck kCKCk kCkCk KCkCk k kc k k kc k k kc k kckck ck kck ck k kk ckck kc kok sk ke ke e kx x      void prtdat int nx  int ny  float  ul  char  fnam       int ix  iyi  FILE  fp        fp   fopen fnam   w          for  iy   ny 1  iy  gt   0  iy     for  ix   0  ix  lt   nx 1  ix       Foriner  fo   SO 2f     lultix ny b ury  s  if  ix    nx 1   fprintt fp      ys  else    fnrintfifp  Xn         fclose fp      156 Sample Programs    2D Heat Equation       HEHEHE HEH HEH EEE EEE EEE EH EE HEE HEE HEE HEE HEE HEE HE HE HE HE HEE HR  FILE  make clinda_heat2D cl  DESCRIPTION  see clinda_heat2D cl  USE  make  f make clinda_heat2D cl                                                                                                             Note  To compile using tuple scope us linda tuple_scope compiler option   CFLAGS    linda tuple scope   FE HE TE FE FE k EH EH EH EH E FE FE FE HE FE FE HE TE FE HE EE EE EE HEH EH EH FE FE HH EE E TE FE E TE FE EE EE EE EE EE H
55.  o mm       Parallel Version   KKK kok k oko k oko k kok k oko k RAR RR KARA k oko k kkk kok ok k oko kok RRA RRA RRA kkk kkk kk kkk kkk kk ck k kk    LINDA Matrix Multiply   C Version            PILE  elind   mm cl    OTHER FILES  make clinda mm cl     DESCRIPTION      In this code  the master task distributes a matrix multiply                                            operation to n worker tasks   NOTE  C and Fortran versions of this code differ because of the way  arrays are stored passed  C arrays are row major order but Fortran    arrays are column major order   KOKCKCKCKCKCKCkCK Ck kCk Ck kCk k kok k Ck kCk Ck kc k Koko k kCk CK Ck kCK Ck kCKCk KCk k KCk Ck kok k kok Ck kk kck ck kck ck kck ck k kk kk             include  lt stdio h gt    define NRA 62   define NCA 15   define NCB 7    define MAXWORKERS 20       real main       int id   LME numworkers  nbr_rows  offset i j k   iT extra   int worker       worker function     int work count     number of work assignments made       double a NRA   NCA   b NCA  NCB   c NRA   NCB   results NRA   NCB         ACkCk ck ck ck ck ck ck ck ck k ck kk k k k kk k k k  i  itializations KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK      Get the number of tasks from the user   KCKCKCKCKCKCkCkCK Ck kCk ck k kc k k kc K Ck kCk k kCK Ck KCkCk KCk Ck kCkCK Ck kCK Ck kCkCk KCk k KCk k kok Ck k kc k K kok kok kok ck ckck ck kok ck ke ke e Y    numworkers 0              while   numworkers  gt  MAXWORKERS      numworkers  lt  1      printf   Enter number 
56.  of data types     Tuples can have a maximum of 16 fields  In C Linda  tuple fields can be of any of the  following types     int  long  short  and char  optionally preceded by unsigned    float and double   struct   union   Arrays of the above types of arbitrary dimension  including multidimensional      9 9   e     arrays     Pointers must always be dereferenced in tuples     In Fortran Linda  tuple fields can be of these types          Integer   1 through  8   Real  Double Precision  Logical   1 through   8   Character  Complex  Complex 16     Synonyms for these standards types  for example  Real 8       Arrays of these types of arbitrary dimension  including multidimensional arrays   and or portions thereof      Named common blocks    26 Using the Linda Operations    Specifying Tuples and Basic Tuple Matching Rules    Formal Tuple Matching Rules    A tuple and a template match when       They contain the same number of fields     All corresponding fields are of the same type     e The type of a field containing an expression is whatever type the expression  resolves to  The type of a field containing a formal 1s the type of the variable  used in the formal          In C  fora structure or union field  the type is extended to include the  structure or union tag name  The tag name and size of structures must match   Tagless unions and structures are not allowed       In Fortran  common blocks match based upon their name alone  Their  internal structure is 70  considered    
57.  on the ntsnet command line  it takes precedence over the  resoutce value in the configuration files     available Specifies whether a node is available for use as a worker  This resource is  node specific  The default is true     116 Linda Usage and Syntax Summary    bcast    bcastcache    cleanup    debug    debugger    delay    distribute    fallbackload    getload    high    ntsnet Configuration File Format    Specifies whether or not the tuple broadcast optimization is enabled   The default is false     Specifies the size of the broadcast cache  This size is a trade off between  memory consumption and hit rate  The default size is 1Mb  This  resource is only used when beast is true     Specifies whether or not remote executables shall be removed from  remote nodes after execution completes  Executables are removed only  if they were distributed by ntsnet in the current execution  The local  executable is protected from removal  The default is true        Specifies whether or not to run in debug mode  see Debugging TCP  Linda Programs on page 69 for more information   The default is  false  If true  it also changes the default value for kaon to false   workerwait to 1000000  and maxwait to 1000000  and overrides the  value of nice to be false     Specifies the debugger to use when running in debug mode  The default  is dbx     Specifies the delay period in seconds between invocations of rsh when  ntsnet initiates execution on remote nodes  The default value is 0     Speci
58.  rd respectively  that   s what the p stands for    They attempt to match the template specified as their argument to a tuple in tuple space  in the same way as in and rd  and they perform the same actual to formal assignment  when a matching tuple is available  As expected  inp removes the matching tuple from  tuple space  while rdp does not  When either of them successfully matches a tuple  it  returns a value of 1 in C and  TRUE  in Fortran        If no matching tuple is available  inp and rdp will not block  Rather  they will return a  value of 0  C  or  FALSE   Fortran  and exit  Using inp and rdp can complicate your  program  because they tend to introduce timing dependencies and non deterministic  behavior that may not have been intended        For example  consider the following C Linda code fragments        Master code     real main           for  i 0  i lt tasks    i  O  0bL  task   1    out  tasks outed          Worker code     worker           rd  tasks outed     while  inp  t  sk    1    do task i      Clearly  if the rd of the    tasks outed    tuple were omitted  the the worker code would be   non deterministic  It might get any number of tasks before the loop terminated  which is  not the intent  What is perhaps less clear is that the program is still non deterministic even  with the rd  This is due to the fact that out is asynchronous  There is no guarantee that all  of the task tuples will be in tuplespace before the    tasks outed    tuple arrives     It is
59.  shall proceed  otherwise execution shall terminate      nodefile filename  This option specifies the name of the file containing a list of nodes on  which this application can run  The default is tsnet   nodes  This  resource is for backward compatibility with the old tsnet utility  This file  is only used if the nodelist resource is set to  nodefile  which is the  default value  See the description of the nodelist resource for more  details      nodelist  node specifiers      This option specifies a space separated list of nodes on which an  application may run  This list must be inclosed in quotes if more than  one node specifier is used  A node specifier can be any one or a  combination of the types described below   The keyword  nodefile    Linda User Guide 113    A node name  A user defined resource    See the description of the nodelist resource for more details     Note  If the  nodelist option is not used and you have not specifically  set the nodelist resource in the ntsnet configuration file s   the  application will run on the nodes contained in the tsnet   nodes file in  your current working directory      opt  resource  value   This option specifies a value to override any resource in the  configuration file  It provides a mechanism for overriding resources for  which no specific command line option is provided      p path This option specifies both the directory on a remote node where the  Linda executable resides or will be distributed to  and the directory th
60.  some part of the output matrix C  At the beginning of the program  the  master process creates tasks for each chunk of C that is to be computed separately  Each  worker removes a task from the shared data space  It then reads the required rows of A  and columns of B  if necessary   and forms the dot products  When it is finished  it places  the resulting chunk of C into the shared data space  and at the conclusion of the program   the master gathers up all the chunks of C  The granularity of the calculation can be  adjusted by varying the amount of C that each task computes  This approach to the  problem is illustrated in the preceding diagram     Notice once again the distinction between tasks and workers  In this example  the    elements of C to be computed are divided into groups  and each task consists of  computing one of the groups  The elements of C are not divided among the worker    Linda User Guide 9    processes in any explicit way  Worker processes do not know what tasks they will  perform when they are created  Workers acquire tasks as they are ready  and perform  whatever task they get     This approach has many benefits  For one thing  it is generally easy to code  since the  worker processes don   t need to worry about explicit interprocess communication  that is  taken care of by the parallel programming environment  which manages the shared data  space  Processes read and write data via Linda   s operations  In addition  this method also  tends to be natura
61.  specific instance  nt snet     Application names are usually the name of the corresponding executable program  In  order to make the separations between components possible  however  periods in  application names must be translated to underscores  Thus  the application big  job  would appear in configuration file entries as big_job  Application names can also be  user defined if desired  and the application name to use for a given run can be specified  on the command line with ntsnet   s  appl option  For example       ntsnet  appl big job medium  job    This option can be used either to apply one application   s settings to a different  application program or to specify the use of a user defined application name  which need  not correspond to the name of any executable program   Note that periods in executable  names are translated to underscores only when used as application names in the  configuration files  such translation should not take place at any other time  such as when  they are invoked in the ntsnet command line     Node names can be full node names  such as moliere frachem com  ot node nicknames   moliere  In the node component of configuration file entries only  periods again have to be  translated to underscores  so that ntsnet can figure out where the component boundaries  are   Anywhere else in the configuration file   as part of resource values  for example     and on the command line  no such translation is used     Configuration file resources ate keywor
62.  subsection of a  multidimensional array  you can do so by specifying a starting element and or length  but  a more powerful mechanism for doing so is provided by the Fortran 90 array syntax   described in the next section     The remainder of this section will discuss multidimensional arrays in C    In C  the basic principle to keep in mind is that multidimensional arrays match only when  their types and shapes are the same  the same holds true for sections of multidimensional  arrays  We will use these arrays in the examples    int a 100  6  2   b 60   4   5   ar100  6   2      The following operations create and retrieve a tuple containing a multidimensional array     Sut multi   i   in  multa  di      The following in operation will not succeed because the shapes of arrays  a and b ate different  even though they have the same number of elements      cut   multi section   ax    in  multi section    b      WILL NOT WORK          Portions of multidimensional arrays may also be specified  Here are some examples   int a 3  5  2   b 51 2   cl2   i     out  section   alol  01 35  int section   26i        out    2d section   a 0 2    in  2d section   Tbi s       out   not an array   a 0  0   D    in  not an array    i      just a scalar sss       In the first pair of operations  the construct a  0   0    points to the start of an array of  length 2 which is why the formal involving array c matches it  In the second pair of  operations  two 5x2 array sections  with the second one
63.  such  statements still will retrieve whatever number of elements is present in the    ten elements     tuple   Assuming that the first ten elements of array b have the same values as the corresponding  elements of array a  the following in operations would consume one of the    ten element     tuples without assigning any values   in  ten elements   b l0     in  ten elements     b 10   Here array b 1s used as an actual  and a matching tuple is simply removed from the tuple  space  assuming one exists   Note that there are better ways to remove tuples from the  tuple space without copying the data in them  as we will discuss later   Artay references can begin with any desired element  as in these examples   in  ten elements      amp b 4  1en     in  ten elements    b 5  len   These operations would retrieve a    ten elements  tuple  place the ten array elements it  held into the fifth through fourteenth elements of array b  and set the value of 1en to 10   Although the examples in this section have used integer arrays  exactly the same    principles and syntax apply when accessing arrays with elements of other types     Note that retrieving a tuple containing an array longer than the target array used in the  template will result in writing past the end of the target array     Linda User Guide 31    Multidimensional Arrays    In Fortran  array shape is ignored  and so multidimensional arrays can be handled in a  similar way to one dimensional arrays  If you want to refer to a
64.  tets    tpoints  lt  10  Y tpoints i 10        Print something out for validation      printf   first Sd pointe  for validation   Wo   tpts    for  1   0  i  lt  tpts  i    printt    4 2Ff     values 1i     printi   anys      Display results with draw wave routine          draw wave   amp values      return  0      Linda User Guide 149                                                                                                                                                             HEHE HH EH EH EH FE HE TE FE FE FE FE FE HE FE FE HE TE FE EH EE EE HEH EH HH TE FE FE HE FE HE EE EE EE EE EE HEH EH EE EH HE E E E HER H    Linda Concurrent Wave Equation Program Makefile     FILE  make clinda wave cl     DESCRIPTION  see clinda wave cl     USE  make  f make clinda wave cl     Note  To compile using tuple scope us linda tuple scope compiler option      CFLAGS    lm  linda tuple scope    k k FE E ERE FE HE FE FE HE TE FE E TE FE HE TE FE FE HE FE FE E TE FE HE TE FE FE HE FE FE EE EE FE HE TE FE EE EE EE EE EE HE TE FE EH EE E E E E      CC   ele  CFLAGS    lm  wave  clinda_wave cl   S CC  S CFLAGS  clinda_wave cl  o wave          2D Heat Equation          in    Serial Version      e kk k kok k oko k ok ko RARA RRA kkk kkk k kkk RRA k kok k kok RRA RRA RRA k kok k kok k Koko k kk k k    Ser  FIL  OTH  DES  two  tem  zer  the    dom  FEF          clu                      ial HEAT2D Example   C Version   E  heat2D c   ER FILES    CRIPTIONS  This example is based on a simplif
65.  that remote TCP Linda processes ate initiated with  rsh  not rlogin   Hence  make sure that the PATH environment variable is set  properly even if the login initialization file is not executed  You can test this by  running rsh manually  and you can ensure this by placing the variable   s definition  in    cshrc rather than     login if you use the C shell       In some network configurations  it can be necessary to give remote hosts access  to the local X server  This is the purpose of the xhost command  You will need  to run xhost if you see error messages like this one     Xlib  Client is not authorized to connect to Server    If it is required  you can execute the xhost command manually  For example  the  form xhost   grants access to the local X server to all remote hosts  You can also  specify a list of nodes  check the man page for details      Alternatively  you can modify the linda rsh shell script  located in the bin  subdirectory of the Linda tree   adding the option  access to the xon command   which causes the latter script to run xhost automatically       ntsnet s debug mode also changes the default values of some other resources       The workerwait resource defaults to 1000000 seconds        The maxwait resource defaults to 1000000 seconds        The nice resource is overridden to be false        ntsnet ensures a consistent dbx environment across all nodes by copying the   dbxinit file from the local node to all participating remote nodes  The dbx  command ig
66.  the  master could easily generate tasks much faster than the workers could complete them   and fill up tuple space in the process  causing the program to run out of memory and  terminate     A technique known as watermarking can provide protection against this eventuality   Watermarking involves making sure that there are no more than a fixed number of task  tuples at any given time  the high water mark  so to speak   Once this limit is reached  the  master process must do something else   such as gathering results   until the number  reaches a lower bound  the low water mark   at which time it can go back to creating  tasks  When the number of tasks once again reaches the upper bound  the process  repeats     Here is a version of the master process with watermarking     20 Call Get_Next  DB  Rec   IF  Rec  EQ  EOF  Go TO 30  out  task   rec  OK   NTasks NTasks 1  IF  NTasks  LE  UPPER BOUND  Goto 20  DO While NTasks  GT  LOWER BOUND   in  result   res                          Call Process  res   NTasks NTasks 1  EndDo  Goto 20             30 Do 40 I 1 NTasks  in  result    res        Call Process  res   40 Continue    Database Searching    Creating a new task increments the ntasks counter  Once it reaches its maximum value   the master switches over to gathering and processing results  decrementing the ntasks  counter  which now holds the number of outstanding tasks  since every time the master  finds a result tuple  it can be sure a task has been consumed  When the number 
67.  the alias Irun within each  debugging session to be the appropriate command line to begin application execution   You should use this alias  rather than the debugger s normal program initiation command   e g   run in dbx      Once the program has finished executing  the controlling  master  process will exit and  the debugger prompt will appear in the corresponding window  However  the other   worker  processes will not return  To terminate all program processes  enter the quit  command to the debugger for the master process  and ntsnet will automatically terminate  all of the other processes     The debug resource  application specific  can be used instead of the command line    option  A value of true is equivalent to including the command line option  the default  value is false      Linda User Guide 69    Hints on running TCP Linda programs in Debug Mode    70 Using TCP Linda    Keep in mind that each process is handling a portion of tuple space in addition to  running the application program  Therefore  when a process is paused   for example  at a  breakpoint   then no tuple space requests can be handled by it  For this reason  it   s best  to break only a single process at a time  with all other processes either continuing or  stepping through a  blocked  in or rd operation       ntsnet relies on the command search path being set appropriately on all remote  nodes  Specifically  the locations of xterm  the debugger  sh  and rcp scp  need to  be in the search path  Note
68.  the rate limiting steps in the entire job  For  example if the comparison which took the longest was started last  the other workers  would all finish and sit idle waiting for it     Sometimes  such problems can be avoided by paying attention to the order in which  records are obtained  for example  by making get_next_rec more sensitive to task size  in our example  This can complicate get next rec a great deal  to the point where it  too would benefit from becoming a parallel operation  Of course  the same kinds of  considerations hold for the routine process as well     At other times  it is the comparison itself that needs to be parallelized  It may not be  sufficient to perform several comparisons at once when an individual comparison takes a  very long time     In either of these cases  it will not be possible to take the generic approach to database  searching that we have here  Rather  the specifics of the comparison or record retrieval or  results processing algorithms will have to be examined explicitly  and creating a parallel  version of one or more of them will be necessary to achieve good performance  In  Chapter 7 of their book  Carriero and Gelernter present an elegant combination database  searching program that parcels out small comparisons as a whole and divides large ones  into discrete pieces     Linda User Guide 85    Molecular Dynamics    86 Case Studies    Molecular dynamics calculations are performed to simulate the motion within molecules  over time 
69.  the slider is at the  far right end of the bar  which represents normal  full  speed  Moving the slider  to the left causes program execution to proceed at a slower rate  You can use the  slider at any time during program execution to change the execution speed   Slower execution rates represent a middle ground between single  step execution  mode  in which execution breaks at every Linda operation  and normal full speed  execution     96 Using Tuplescope         The Tuplescope Display    A tow of buttons for the tuple class window at the bottom left of the control  panel  Click on a button to open its cotresponding tuple class window  Buttons  for currently open tuple class windows are grayed out     When a process is initiated with an eval  an icon for it briefly appears above the tuple  class window icons  This icon is a white outlined black box containing a white number  within it  in the diagram  the sample icon contains the number 2  This number functions  as a sort of internal process ID and is incremented every time a new process is created  it  does not correspond to any user assigned numbering scheme  Once the process  performs an operation on tuple space  this icon disappears and the appropriate icon  appears in one of the tuple class windows     Tuple Class Windows    A tuple class window has the following parts          A sizing box  black with two white veins   located in the upper left corner of the  window controls the size of the window  Click the box to cho
70.  usually abort the process and cause the Linda program to fail     It is recommended that you quit from all native debugger processes before exiting from  Tuplescope  Pressing the Tuplescope Quit button while debugging windows are still  open causes their processes to be terminated    out from under them     Tuplescope will  make no attempt to stop the debugger processes  so you will have to do it manually   Some debuggers have difficulty shutting down in this state  so you may have to use the  UNIX kill command to stop those processes     The Tuplescope Debugging Language    Clicking on Tuplescope s Debug button brings up a menu that can be used to create   compile  and control debugging programs written in the Tuplescope Debugging  Language  TDL   TDL is the means Tuplescope provides for users to specify that certain  actions be taken on various program conditions  The various items on this menu have the  following meanings     ltem Effect   Edit program debug Edit the TDL program for this application   Compile program debug Compile the TDL program for this application   Clear Debugging Actions Cancel all debugging actions in effect    Exit Debug Menu Close the Debug menu     The Edit and Compile items edit and compile the file program  debug where program is the  name of the application running under Tuplescope  Edit will open an editor in a separate  X window  using the contents of the EDITOR environment variable to determine which   editor to run        The Compile item 
71.  will block until the counter tuple   s second field has its final value  the  number of worker processes     A third approach involves retrieving the final data tuples created after the eval   ed  processes exit  for example     C Version Fortran Version  in  worker   retval   in  worker    iretval     This allows the program to examine the return code from the function started by the eval  operation  While it isn t really necessary for a function as simple as hello it is a  technique that is quite useful in mote complex programs     Compiling and Running the Program    To run this program  you must first compile and link it  The C Linda compiler has similar  syntax to standard C compilers  Its name is clc  and its source files must have the  extension  cl  Here is a clc command that would compile the program in the file  hello world cl     Linda User Guide 19    C Linda Compilation      clc  o hello world hello world cl    The  o option has the same meaning as with other compilers  so this command would  complle and link the source file hello world cl  creating the executable program  hello world     The Fortran Linda version of the program is created using the fle command  as shown  below     Fortran Linda Compilation      flc  o hello world hello world fl do args f    Note that the extension on the Fortran Linda source file is    1  As illustrated  additional  source files  Linda and non Linda alike  may also be included on a Linda compilation  command line when approptia
72. 0 Do 40 I 1 NTasks  in  result    res        Call Process  res   40 Continue       DO 50 I 1 NWORKERS  out   task   dummy  DII       E    50 Continue  Return  End       This program first starts the workers  gets the target  and places it into tuple space  Then  it loops  retrieving one record from the database and creating a corresponding task tuple  until there are no mote records  Then it retrieves the results generated by the workers  from tuple space and hands them to process     Finally  in its final loop  the master process generates one additional task tuple for each  worker  These tasks serve as poison pills  special tasks telling the workers to die  The task  tuple   s third field holds either the value represented by OK  meaning    this is a real task      or the one represented by DIE  meaning terminate        Linda User Guide 83    84 Case Studies    Here is the corresponding worker   Subroutine worker  rd  target   target     DO While  TRUE    in  task   rec  flag   If  flag  EQ  DIE  Goto 100  Compare rec  target  result   out   result   result   EndDo  100 Continue    E             The worker loops continuously  reading tasks and compating records  placing the results  into tuple space for the master to gather later  until it encounters the poison pill  at which  point it exits     Straightforward as this version is  it has some potential pitfalls  The most serious occurs if  the database has large records  or large numbers of records  or both  In either case 
73. 9    database searching 82    Linda User Guide i       declarations in 2  dividing work among workers  74  Freewake 38  hello wotld 17  matrix multiplication 77  molecular dynamics 86  ntsnet configuration file entries  47  poison pill 83  ray tracing 73  Rayshade 73  simplification of 73  watermarking 84  executable locations 51  execution group 62  119  execution priority 51  extensions   cl 19  C Linda soutce files 19    extensions of C Linda source files 19    F  fallbackload resource 63  65  117  field types in tuples 26  file permissions 61  files  gmon out 14  mon out 14  tsnet config application name  45  fixed aggregates 36  flexit 106  flhalt 106  flintoff 106  flinton 106  floffexit 107  flonexit 107  flprocs 107  formal 12  anonymous 36  Fortran 38  106   107    array numbering 41    ii Index    array otder vs  C 41  calling from C 41  Fortran 90 array syntax 33  Fortran common blocks 33   Fortran support functions 106   107   functions  eval restrictions on 24  flexit 106  flhalt 106  flintoff 106  flinton 106  floffexit 107  flonexit 107  flprocs 107  lexit 106  Ihalt 106  lintoff 106  linton 106  loffexit 107  lonexit 107  Iprocs 107  print times 106  start timer 106  support 106  timet split 106  timing 106    G  getload resource 63  65  117  gprof command 14  granularity 5  adjustability 6  9  81  definition 6  networks and 68  granularity knob 77    H   hello_world program 17  high resource 51  117  119  HP UX 100   hpterm 100    I  ignoring fields in tuples
74. Configuration File Format    The maximum amount of time to wait for a valid execution group to be  formed  Note that maxwait specifies the total time to wait  including the  time specified in minwait  it does not represent an amount of time to  wait over and above the minwait interval  The default is 30 seconds   which is the same as the default for minwait  unless debug is true  when  the default value is 1000000 seconds  See the discussion of minwait  below for more details about this resource     Specifies the maximum number of workers started for a given  application  The default is the number of distinct nodes in nodelist  minus one for the local node running the master  ntsnet initially starts  up the number of workers equal to the maximum of the minworkers and  maxworkers resource values  The master then waits the time period  specified in the minwait and maxwait resources for the workers to join  the execution group  If at least minworkers join within that time   execution shall proceed  otherwise execution shall terminate  See the  discussion of minwait below for full details     Specifies the minimum amount of time to wait to allow an execution  group to form in seconds  the default is 30  Execution will proceed  according to the following criteria  First  if at any point before the  minwait interval has elapsed  maxworkers workers have joined the  execution group  execution will commence at once  When the minwait  interval expires  if at least minworkers workers ha
75. Determining working directories and executable file locations on remote nodes   using map translation and the associated configuration files  if necessary    Copying executable files to remote nodes  if necessary     Initiating remote processes    Waiting for normal or abnormal termination conditions during program  execution    Shutting down all remote processes at program termination    Removing executables from remote systems  if appropriate      The remainder of this chapter will look at these activities   and the ways that the user can  affect how ntsnet performs them   in considerable detail     Using the ntsnet Command    Using the ntsnet Command    The general syntax for the ntsnet command is   ntsnet  options  executable  arguments     where options are ntsnet s options  executable is the executable file to run on the local  system  and arguments are command line arguments for the specified network program   ntsnet uses the command line options  the location of the local executable  and the  settings in its configuration files to determine all of the remaining information it needs to  execute the network parallel program     Customizing Network Execution    TCP Linda provides the ntsnet command to execute parallel programs across networks  of machines  ntsnet is designed for maximum flexibility  Proper configuration makes  running a TCP Linda program as simple as prepending its executable   s pathname   and  any arguments   with the ntsnet command  as in the previous
76. E HE HE HE E E E E                                                          CC   elg   OBJ   heat2D   SRC   clinda heat2D cl  XLIBS   CFLAGS      B OBJ   S SRE   SICC  S CFLAGS    SRC  S INCLUDE  S LIBS  S XLIBS   o   0BJ        Linda User Guide 157    158 Sample Programs    Bibliography    Nicholas Carriero and David Gelernter  How to Write Parallel Programs  A First Course   The MIT Press  Cambridge  MA  1990    An introduction to developing parallel algorithms and programs  The book uses Linda as   the parallel programming environment     David Gelernter and David Kaminsky     Supercomputing out of Recycled Garbage   Preliminary Experience with Piranha     Proceedings of the ACM International  Conference on Supercomputing  July 19 23  1992    An overview of and preliminary performance results for Piranha     Brian W  Kernighan and Dennis M  Ritchie  The C Programming Language  Second  Edition  Prentice Hall  Englewood Cliffs  NJ  1988   The canonical book on C     Robert Bjornson  Craig Kolb and Andrew Sherman     Ray Tracing with Network Linda      SIAM News 24 1  January 1991    Discusses the Rayshade program used as an example in Chapter 3 of this manual     Harry Dolan  Robert Bjornson and Leigh Cagan     A Parallel CFD Study on UNIX Networks  and Multiprocessors     SCIENTIFIC Technical Report N1   Discusses the Freewake program used as an example in Chapter 2 of this manual     Mark A  Shifman  Andreas Windemuth  Klaus Schulten and Perry L  Miller     Molecular  Dyna
77. EDITOR environment variable is opened in a new  window        Compile program debug  Translate the TDL program to Tuplescope   s internal form and put its  statements into effect     Clear Debugging Actions  Cancel all debugging directives in effect via the current TDL program     Exit Debug Menu  Close the Debug menu     TDL Language Syntax  TDL statements have the following form   if  condition  then action    Conditions have one of the following formats  select one item from each column   This  format tests the value in a tuple field and performs the action for matching tuples       field N    constant value   I    A Fs    This format tests for the specified Linda operation and performs the action for matching  processes     124 Linda Usage and Syntax Summary    TDL Language Syntax      linda op   eval    l  out  in  ra   block_in   block rd       This format tests for the specified process number and performs the action for matching  processes       process    N         A Vo    Note that the brackets are part of the condition syntax and must be included  Multiple  conditions can be joined with and and or  The entire condition is enclosed in  parentheses when it is placed into the TDL statement     Actions must be one of the following     break Pause program execution   hide Hide the triggering processes tuples   color c Change the triggering process tuple to the color c  one of  red     orange  yellow  green  blue  indigo  and violet   save Save the current contents of tu
78. In this chapter  we will look at them in more detail and in the context of complete  if simple  parallel  programs  Note that this chapter contains examples of using both C based and  Fortran based Linda  and it refers to them as  C Linda  and  Fortran Linda   respectively     Quick Start  Hello  world    Fortran     Fortran     In this section we ll construct a parallel version of the canonical first example program   hello world  Here is the sequential version     main       printrf  Hello  worldXn       Program Hello World  Print     Hello  world   End       It would be absurd to try to perform the  computation  done by this program in parallel   but we can create a parallel version where each worker process executes this program     and says  Hello  world    at the same time  Here is a program that does so     real main int argc char  argv        int nworker  j  hello     nworker atoi argv 1                   for  j 0  j lt nworker  j    val  worker   hello j     for  j 0  j lt nworker  j    in  done     printf   hello world is finished n        return  0      Subroutine real_main  integer I  NProe    Linda User Guide 17    Obtain number of workers  amp  store in NProc  Do 10 I 1 NProc  eval  worker   hello I    10 Continue  Do 11 I 1 NProc  in  done    11 Continue  Print    hello world is finished   Return  End          The first thing to notice about this program is its name and top level structure    real main for the C Linda  which requires that the top level routine 
79. KK KK KK KK oko k KK KK worker task OKCKCKCKCkCkCk ck k I I ko ke X RARAS    int worker      int offset rows length   int next offset   int i  jg  kj  double a NRA   NCA   b NCA   NCB   c NRA   NCB         First read in the a and b arrays       Ed     array x   Pals  rd   array b    bj        This is the head of an infinite loop over tasks          while  1          See if there is any work out there to do     in   offset    offset      if  offset     1      out  offset   ljs   return  0       There is still data to process  Receive matrix data from master task     in   array data   offset   rows         Update offset for the next guy       if   offset   rows   lt  NRA   next_offset   offset   rows        140 Sample Programs       Matrix Multiply    else  next offset    1     out   offset   next offset       Do matrix multiply       for  i offset i lt rows offset i      for  3 0  j  NCB  j         st  t 20   for  k 0 k lt NCA k     el21 141 e  11191   ela  ik    biel ta              Send results back to master task     out   results   offset  rows  Eli            end while loop        It s all over       return  0           EE AE aE aE aT aE AE k aE aE FE aE FE FE AE AE FE aE HE aE AE AE AE EH aE EH AE aa aH aa Ha aa aE aH aaa AE AE HEE aaa                                              Linda Matrix Multiply Makefile  FILE  make clinda_mm cl  DESCRIPTION  see clinda_mm cl  USE  make  f make clinda mm cl                                                                            
80. KKKKKKK      Get the number of tasks from the user  Then     define the number of worker tasks and the array partition size as chunksize   KKEKK KKK KKK KKK KKK KKK A A X KKK KKK KKK KKK KKK Z A AX A KKK KK KKK KKK KKK KKK KKK KKK KKK k  Edid        numworkers 0                 while   numworkers  gt  MAXWORKERS      numworkers  lt  1      printf   Enter number of workers between 1 and  d n   MAXWORKERS     scanf    d    amp numworkers          chunksize   100      KKK KK KR RIA RARA k kok k RAR kkk k master task OKCKCKCKCk kCk k k   OR ck ko ke ke ke e x x f    printf                Starting LINDA ARRAY Example        X   X IN             Initialize the array          i 0    while  i  lt  ARRAYSIZE     data i    0   i          put it out there     out   init arrav   data         Start the worker tasks       i 1    while  i  lt   numworkers     printf   Starting worker task  d n   i    eval   worker   worker i     i4       nbr chunks   ARRAYSIZE chunksize     i   1    index   0    while  i  lt   nbr chunks     out   my section   index  chunksize    index   index   chunksize   dues        end while       Linda User Guide 129       print a few sample values        i 1   while  in   result data       i  lt   nbr chunks     7id   index   result       put it into the data array      data index   index    while  data index  lt  index chunksize     data data_index  result  data index    data index                     print  t     n     printf   MASTER  Sample results from worker tas
81. ail  about how that happens     22 Using the Linda Operations    Linda Operations    Logically  the fields of an eval are evaluated concurrently by separate processes  evaling a  five field tuple implicitly creates five new processes  When every field has been evaluated   then the resulting data tuple is placed into tuple space     In current Linda implementations  however  only expressions consisting of a single  function call are evaluated within the live tuple and actually result in a new process  These  functions can use only simple data types as their arguments and return values  see below    All other fields are evaluated sequentially before new processes ate created     Here is a typical eval statement     C Version Fortran Version  evali  coord    x  t x    eval  ocoord   X  Elx      This eval will ultimately result in the same data tuple as the out operation we looked at  previously  Howevet  in this case  the eval operation will return immediately  and a new  process will evaluate     x   By contrast  the out operation will not complete until the  evaluation of     x  is complete and it has placed the data tuple into tuple space     Compare these two C Linda loops     Loop with out Loop with eval  for  30  i lt  100  144  for  10  i  lt  100  i     out  f values     i     1    eval  t values   Ap fi      The loop on the left will sequentially evaluate f  i  for the first 100 non negative  integers  placing a tuple into tuple space as each one completes  The loop on 
82. al version  7 0  Corresponds to TCP Linda version 7 0  September  2005    Printed in the U S A        Acknowledgments    Many Linda users and developers have contributed to the direction and quality of both the  Linda products and this documentation  In particular  we d like to give special thanks to  Harry Dolan  Kevin Dowd  and Joe Casper of United Technologies Research Center  who  produced the Linda version of the Freewake program described in Chapter 2  and T  Alan  Egolf  the author of Freewake  Craig Kolb  the developer of the Rayshade program described  in Chapter 3  Mark A  Shifman  Andreas Windemuth  Klaus Schulten  and Perry L  Miller  the  developers of the Molecular Dynamics program described in Chapter 3  Paul Bercovitz  who  created the original version of the Tuplescope debugger described in Chapter 5  and Donald  Berndt and Steven Ericsson Zenith  authors of previous Linda manuals for SCIENTIFIC     Contents    Introduction    esses j   bd d     oc AE 1  How to Use this Manual 0 005 006 eere e m9 rk nr eee  eh  ha Eh ire en 1  About the Example Programs    sies e hr a er eR e Rees 2  Typographic Conventions                   ccc ce ee eee ee eens 3   1 Overview of Parallel Programming and the Linda Model        5  Approaches to Parallel Programming                 4444444444440 7   Message Passitnous coe eee er eA He peque EP Pa e PEE a  Distributed Data Structite    s se shee re Gorin o Gewese REESE E EET 8  The  Linda Model citrico ERU E Tr peor OG nee ak E
83. an  Version    20    10       ell  segmnt  iseg         calcdisp   amp x iseg  ifil  iblade     amp y iseg  ifil  iblade    amp z iseg  ifil  iblade     amp dy iseg    amp dz iseg    amp y   amp z       place results in tuple space                       amp dx iseg    amp x        out   delta   index  dx  dy  dz    Subroutine worker  Double Precision x     y     z       rd  wake    x   y   z   nblades   nfilmnt   nsegmnt     Do 10 I 1 VERY BIG NUM    in  index          index   out   index   index 1    GE  nblades nfilmnt        if  index Return    iblade  index   nfilmnt    1  ifil modulo  index nfilmnt   1       Do 20 iseg 1 nsegmnt       Gali       caledisp x iblade ifil iseg    z iiblade ifil iseg         y iblade ifil iseg                           dx iblade ifil iseg   dy iblade ifil iseg    dz  iblade  ifil iseg    Continue  out   delta   index  dx  dy  dz   Continue  Return  End       Each worker process first reads the position arrays and their index limits from tuple  space  The worker then loops continuously until all points are done  At the beginning of  each loop  it removes the    index    tuple from tuple space  increments the counter in its  second field  and then returns it to tuple space for possible use by other processes     40 Using the Linda Operations    Example  Freewake    This counter  stored in the variable index  serves as a composite index  combining the  outer two loops of the original Fortran code  Each task consists of executing the inner  Fortr
84. an loop for a fixed pair of I and J values  i e   specific blade and filament indices   A  single counter tuple is easily retrieved  incremented  and returned to tuple space        There are NFILMNT times NBLADES distinct pairs of I and J values  so the worker first  tests whether the counter   s value is greater than or equal to this product  equality is  included in the test since index begins at 0 and runs to  nfilmnt nblades  1   we   ve given the equivalent variables in the Linda programs lowercase names   The  variable iblade is defined as index divided by nfilmnt and ifil is defined as  index modulo nfilmnt  Because these are integer operations and all fractions are  truncated  iblade will remain 0 until index reaches nfilmnt  then become and  remain 1 until index reaches nfilmnt 2  and so on  At the same time  ifil counts  from 0 to nfilmnt 1 for each value of iblade over the same period  The following  table indicates the encoding of iblade and ifil within index for nfilmnt 3 and  nblades 2     blade filament  index iblade iseg       0 0 0  1 0 1  2 0 2  3 1 0  4 1 1  5 1 2  6 terminate    Once iblade and ifil are computed  a for loop calls a slightly modified version of the  otiginal calcdisp routine once for each segment value  using the computed iblade  and ifil values each time  In the C Linda version  this loop differs from the Fortran  inner loop it replaces in several ways  First  the arguments to calcdisp are explicitly the  addresses of the relevant array el
85. an terminate normally when the last process  usually real main   finishes  returns from its outermost call   A program may terminate by having the  final process call the C Linda support function lexit  flexit is the  Fortran Linda form   but this is not required  Note that if you want to call an exit  function  use lexit  never use the standard system call exit       The program can force termination by calling 1halt  C  or flhalt  Fortran    When such a call occurs in any process of the Linda program  the entire    computation will be terminated immediately and automatically          Hany process terminates abnormally  then the entire program will again be  terminated automatically at once     Linda User Guide 37    An individual process within the parallel program may terminate execution of user code  by calling lexit or flexit  This function terminates local user processing without  affecting the other running processes  if any         See the Linda Usage and Syntax Summary for full details on available termination  routines and their use        Example  Freewake    We re now ready to look at another Linda program which illustrates all four operations  and several tuple matching principles in action  This program also illustrates the following  techniques     The master process becoming a worker after initialization is complete     The use of a composite index to combine two existing indices     The program we ll examine is named Freewake  a computational fluid dynamics  
86. ansform the serial program into a parallel program in the most straightforward way  possible  measure its performance  and then search for ways to improve it     If we focus on step 2  a natural question arises  where does the parallelism come from   The language of parallel programming can be quite ambiguous  On the one hand  there is  talk of  parallelizing programs   a phrase which focuses on the programmers who  convert sequential programs into parallel ones  On the other hand  much is said about   exploiting parallelism   implying that parallelism is already inherent in the program itself   Which is correct  Is it something you find or something you create     The answer is  of course   it depends   Sometimes a program can be trivially adapted to  run in parallel because the work it does naturally divides into discrete chunks  Sometimes  a serial program must be restructured significantly in order to transform it into a parallel  program  reorganizing the computations the program does into units which can be run in  parallel  And sometimes an existing program will yield little in the way of independent  computations  In this case  it is necessary to rethink the approach to the problem that the  program addresses  that is create new algorithms to formulate a solution which can be    implemented as a parallel program t       t of course  even in these cases  it may still be possible to take advantage of a parallel computer  by running multiple concurrent sequential jobs   
87. application  It was developed to model the wake structure created by helicopter rotor  blades and its influence on the blades themselves to aid in the design of new helicopters   It was originally parallelized with C Linda  but we will provide both C Linda and  Fortran Linda versions here     At its core  the calculation is essentially a very complex N body problem  the wake  surface is divided into a large number of discrete elements  The structure of this surface  depends on the properties of the individual elements  including their velocities  For each  time step  the change in velocity of each element is a function of its interactions with all  of the other elements  and all of these changes in velocity determine the new structure of  the wake surface  Once it is obtained  the program calculates the interaction of the wake  and the blades themselves     The vast majority of the computation is spent calculating the displacement of each point  of the wake for each time step  This is computed by these three nested loops  See How  and Where to Parallelize on page 14 for information about determining the most  computationally intensive portions of a program   Here is the original Fortran code                                            DO I   1  NBLADES  Typically   2  DO J   1  NFILMNT  Typically   16  DO K   1  NSEGMNT  Typically   512    CALL CALCBISP1 Silja  Rie Vib Rly  AIJ K   DX I J K  DY I J K  DZ I J K    END DO  END DO  END DO                The subroutine CALCDISP ca
88. arrays in tuples simply by name   i e   without the colon notation   For example  if a is an array of declared length 20  then  it can be referred to in a tuple by its name alone  as in this example   int a 20   b 20      out   arrav   a     Similarly  you can retrieve such an array by name     36 Using the Linda Operations    Termination of Linda Programs    rd    aeray   25   in  array   Ta   You can also treat multidimensional arrays  and sections thereof  this way     int a 100 1 61 121  5 25   241  2      out   milti section     a 90  0     in  multi section    b 0  01 5       In these statements  both arrays are essentially being used as pointers to the start of a  fixed aggregate of length 2  Two array sections treated as fixed aggregates in this way  must have the same length and shape to match     Fixed aggregates never match arrays specified in the usual way  Fixed aggregates match  only other fixed aggregates  Literal character strings specified without a terminal colon  are treated as fixed aggregates  Thus  neither of the following in operations will find a  match     char a 20   8161   int len     out    wrong   all   cut  also wrong    hello       in  wrong    a     no match     in  also wrong    stlen     no match       Termination of Linda Programs    Since a Linda program can consists of many concurrent processes  program termination  requires a bit more care than for sequential programs  Linda programs can terminate in  one of three ways       The program c
89. at  shall be eded to prior to executing the remote Linda process  Thus the   p option simultaneously overrides both the rexecdir and rworkdir  resources     Since  p specifies a local directory  the value of  p is subject to map  translation  The translation occurs before the  p value overrides the  rexecdir and rworkdir resources  This option is intended to provided a  mechanism very similar to the  p option on the previous tsnet utility      redirect This option turns on the tuple redirection optimization  This is the  default     redirect This option turns off the tuple redirection optimization     suffix This option causes a node specific suffix  indicated by the suffixstring    resource  to be appended to the executable name  This is the default   Note that the default value of suffixstring is an empty string      suffix This option indicates that TCP Linda is not to use node specific  suffixes    translate This option indicates that map translation shall be performed  This is    the default  Note that it is not an error to have translation on and to  have no map file  In that case  no translations will be performed      translate This option indicates that map translation shall not be performed      useglobalconfig  This option causes ntsnet to use the resource definitions in the global  configuration file  The resource definitions in the global configuration  file are used in addition to the command line options and the user s local  configuration if one exists  This i
90. at file and generating graph    n     prtdat NXPROB  NYPROB  u   final dat        KOR KK kok k oko ok ok ko k kok RRA kok k oko k oko kok RR RR RRA kkk RRA k kkk k kkk kkk k kkk kk ke kk      Worker routine  KOKCKCKCKCKCkCK Ck kCk Ck kCk Ck kok k kCkCKCk kCk Koko k Ck kCk Ck kok kok Ck kc kk k Ck kck ck kck ck k kk kk kk kk kk      int worker int ixmin  int ixmax          float utile 2   NXTILE 2   NYPROB    int ix  iy  iz  ES  void step             ss     Get parameters and initial data from Tuplespace   af   rd  struct parms   paris     imi initral data   asin  Penatile  0   1   0  eps    Ze     Set edges of utile to grid boundary values if appropriate   ui  if  ixmin    1  1  int   Ieft    T  utile O0  O   0   lt     for  iy   0  iy  lt   NYPROB 1  iy       utile 1  0   iy    utile 0   0   iy         if  ixmax    NXPROB   2     ind  right   2  utile  0   NXTILE 1   0  2    for  iy   0  iy  lt   NYPROB 1  iy       utile 1   NXTILE 1   iy    utile 0   NXTILE 1   iy      El                   154 Sample Programs    2D Heat Equation                   for  ix   0  ix  lt   NXTILE41  ix    4  utile 1   ix   0    utile 0   ix   0    utile 1   ix   NYPROB 1    utile 0   ix   NYPROB 1         FE       Iterate over all timesteps    ui   iz   0    for  it   1  it  lt   parms nts  ittt     step it 1  ixmin  ixmax   amp utile iz  0  0   sutile 1 iz   0  0     az   1   Tus               Put results into Tuplespace    k    out   result id   ixmin  ixmax    out   result   ixmin   amp utile
91. be given this name  rather than the usual main  Similarly  the top level Fortran Linda routine is named  real main  note also that it is defined as a Subroutine  notas a Program        The C Linda program requires one command line atgument  the number of wotker  processes to cteate  and to save space we ve eliminated the code that checks whether or  not it got a valid argument   There are a number of ways that the Fortran Linda version  could obtain the number of worker processes  we won t dwell upon those possibilities at  this point     Next  the program s first loop initiates nworker worker processes using the eval  operation  each executing the function hello  Here is hello     C  hello int 1      printf   Hello  world from number  d n  i    out   done     return 0      Fortran  Subroutine Hello  ID   Print    Hello  world from number   ID    out   done    Return       End    hello is only a minor variation of our original sequential program  It prints the  Hello   world  message along with the number it was passed from real main  this integer  serves as a sort of internal process number   Each message will look something like this     Hello  world from number 3    The routine hello places a tuple containing the string  done  into tuple space just  before exiting  These tuples are then gathered up by the master process   real main     in its second loop  This technique has the effect of forcing real_main to wait until all  the worker processes have terminated before exiti
92. causes the file to be translated into an internal form used by  Tuplescope  Successful compilation results in the message Compilation done   Otherwise  an error message is printed  Once compiled  the statements in the debugging  program go into effect until cancelled by the Clear Debugging Actions item     TDL Language Syntax    TDL programs consist of one or more lines of the following form    if  condition  then action   where condition is a test condition  and action is some action to be taken when the test is  true  1   Note that the parentheses are part of the condition    Conditions are formed from the following components     item operator test_value     Note that the brackets are a required part of the syntax     There are three distinct kinds of conditions     Linda User Guide 101    102 Using Tuplescope    Tuple field comparison tests  where   e7 is field N  where N is an integer    operator is one of the C operators          gt  and  lt   and  est_value is a constant  This  sort of test selects tuples on the basis of one or more of their fields  For example     This test chooses tuples whose second field contains the value 2     Character strings used as constants must be enclosed in double guotation marks   Single characters must be enclosed in single quotation marks     Note that tuples from distinct tuple classes can be selected by the same tuple field  comparison  Fields containing aggregates may not be used in such conditions     Tuple space operation tests  w
93. ce eee eee 63  Special  Purpose  Resources  y siria seins tre e adi Wee av n   vie ets 67  Tuple Redirection Optimization              lt     e 67  Disabling Global Configuration Files                  eee eee eee 67  Generating Additional Status Messages    0 6    cee eee eee eee 67  Process Initiation Delays    1    nee ence ee 67  Appropriate Granularity for Network Applications               lt     68  Forcing an eval to a Specific Node or System Type               lt  lt     68  Debugging TCP Linda Progtams   5  eere rte eek Rn ee 69  ntsnet s Deb  g Mode  s   esses eer ex Cy  Er e e a 69  Hints on running TCP Linda programs in Debug Mode                       70  Running TCP Linda Programs Without ntsnet                444444444 71  4 Case  Studies      ers er ere Osea AO keer see NR AR 73  Ray Tracing e zo boa ieee x ns GAY BU va Veta Gad Wed e es Au VR Bink We e 73  Matrix Multiplication icc   eder EIE ry eg d NR Helo de nae eB 77  Database Searching desees Bes  fy eR ORE REA each Re eee rede udis 82  Molecular Dynamics 2406 o mestre ene Ro UR e scene t UR D bts 86  5 Using Tuplescope x Mia A dd 95  Program Preparation 4   cess rk e Rm ere eR Aura darn d UR E Ds 95  Invoking TFuplescope   eee e y e  Re hme  nm ehh heit kt alo da m eon 95  Th   Tuplescope Display  soi dio ere e eren eed ed eet dde a 96  The Control Paneler esee ee rm eer Cr Ee p CE PUN Caepio ds 96  Tuple Class WII dO WS  senil oh vp wat C epa ee i e V P RES RS 97    ii Contents    Niewino ADOTEDALES v
94. ched  or come closest to matching  so far  When all relevant records  have been tested  output will print the final results     This version could search any kind of database  given appropriate versions of       get_next_record  compare  and process  get_next_record could be  implemented to return every record in the database in sequential order  or according to  some sorting criteria  or it could return only selected records   those most likely to match  the target  for example  think of searching a database containing fingerprints      Database Searching    compare might return a yes or no answer  depending on whether target matched the  current record or not  or it might return some value indicating how close a match the two  were  In the first case  process would only need to keep track of positive results     matches   while in the second it would probably want to report on some number of best  matches at the conclusion of the program     Transforming this program into a parallel version is fairly straightforward  Each task will  be one comparison  This time  we ll use one tuple for each task  holding the actual  database record  rather than a single counter tuple  Here is the parallel master routine     Subroutine real_main    Do 10 I 1 NWORKERS  eval   worker   worker            10 Continue  Call Get_Target  target   out   target   target     NTasks 0   20 Call Get_Next  DB  Rec   IF  Rec  EQ  EOF  Go TO 30  out   tesk   rec  OR   NTasks NTasks 1  Goto 20             3
95. cons for processes that have accessed tuples in this class  The form of the icon  varies depending on the operation that the process performed and its status   These are the possible icons  all icons display the process number in their center      Icon Appearance Meaning   Solid black arrow pointing up Successful in operation  Solid black atrow pointing down Successful out operation  White arrow in black box pointing up Successful rd operation  Solid black diamond Blocked in operation  White diamond Blocked rd operation    There are examples of each type of icon in the Tuplescope display diagram     Viewing Aggregates    98 Using Tuplescope    The Display Aggregates item on the Modes menu controls the display of aggregates   Click the Modes button to display its menu  and click the Display Aggregates item to  togele its current state  If the item is selected  a check mark appears to the left of its  name  You must select the Exit Modes Menu item to close the modes menu     Tuplescope Run Modes    When Display Aggregates is on  the Aggregates menu button is active  and you can use its  menu to select the data format for subsequent aggregates displays  It contains the choices  Long  Short  Float  Double  Character  and Hexadecimal  Only one format choice is  active at a given time  a check mark appears to the left of its name  All other formats are  grayed out and unavailable  The default format is Long     To select a different format  first deselect the current format by choo
96. consumption and hit rate  The default size  is 1 MB  This resource is only used when beast is true     This option indicates that remote executables should be removed when  execution completes  This is the default  Note that the local executable    is protected from removal     This option indicates that remote executables should not be removed  when execution completes     Linda User Guide 111     d  d     debug     distribute     distribute    Synonymous with  distribute  distribute     Run application in debug mode  see Debugging TCP Linda Programs  on page 69 for more information   This option also changes or  overtides the values of several ntsnet resources  see the discussion of the  debug resoutce later in this chapter for details     This option causes executables to be copied to remote nodes prior to  execution  Executables shall only be copied to nodes which are actually  going to take part in the execution  After execution completes  ntsnet  automatically removes the remote executables that it just distributed   The local executable is protected from removal  See the cleanup  command line option or resource for information on preventing the  automatic removal of remote executables     This option indicates that executables are not copied  This is the default      fallbackload load     getload     getload     h  h     help     high     high    This option specifies the load average the scheduler shall use for a node  if the RPC call to get system load average fails 
97. cont command in dbx      You may now use the native debugger to examine the process  The figure below  illustrates a sample combination debugging session     ex        Medes   frenos   Run   Break   Continue   Debus   Save   it  EBM      globals  task  result  worker                             Tuplescope windows exec 2  worker   ng   dbx version 3 1 for AIX   ane mn     Joe Type  help  for help   sl   alobals  INT  INT  reading symbolic information      m    dbx   9                Native debugger window    By default  the debugger started is dbx  except under HP UX where it runs xdb in an  hpterm window  A different debugger can be specified by setting the DEBUGGER  environment variable to it  If the executable is not in the current path  give the entire  pathname as its value  otherwise  its name alone is sufficient  Currently supported  debuggers are dbx  gdb  and xdb  under HP UX                  It takes a little practice to understand all the nuances of Tuplescope native debugger  interactions  The trickiest part is usually figuring out where the next continuation  command needs to be executed  If the Tuplescope Continue button does not resume  execution  try issuing a next command to the native debugger process es      100 Using Tuplescope    The Tuplescope Debugging Language    To exit from a native debugging session without affecting Tuplescope  detach from it  before quitting  in dbx  give the detach command  followed by quit   Quitting without  detaching first will
98. cutables 51  specifying resources without  option equivalents 114  syntax 45  111    wait option 62  115   workerload option 65  115     workerwait option 115    o  on command 118  operations 11  105  alternate names for 26  blocked 13  predicate forms 25  out 11  22  105  field evaluation order 22    ovethead 6    P  parallel programming  issues 7  steps 5  parallel programs  interprocess communication  options 8  parallel resource value 52    parallelism 5    Linda User Guide iii       and hardware environment 6  fine vs  coarse grained 6  parallelization  level to focus on 81  passive tuple 11  permissions 61  poison pill 8  portability 1  POSTCPP_CC environment  variable 123  predicate operation forms  inp 25  rdp 25  print_times 106  process scheduling 61  processor  definition 7  prof command 14  profiling 14  program examples  See examples    program termination 37    Q    quick start 17  Network Linda 43    R  ray tracing 73  rd 11   12  21  105  compared to in 21  rdp 25  105  real_main 18  37  creating from existing main 74  redirect resource 67  120  resource value 50  resources 46  ntsnet command line options  and 48  50  parallel value 52  58  120  process scheduling 61  specificity types 46  49  specifying those without    iv Index    command line options 114  syntax 116   rexecdir  parallel value 58   rexecdir resource 52  59  61  120  parallel value 52   rsh command 118   running C Linda executables 20   running C Linda programs  on a network 21   rworkdir res
99. dd significantly to the time needed to process the tuple  Many  non shared memory Linda implementations provide a way for the user to request that the  array not be copied  The user does this by negating the length  In this case  a supporting  Linda implementation simply records the address of the data  This address is then used to  locate the data that must be retrieved To satisfy a rd or an in  Any changes made to the  data after the out but prior to retrieval will be reflected in the data sent  Since the user  generally cannot know when the system internally retrieves the data  the tuple s content in  the presence of such modifications has to be viewed as nondeterministic  While it 1s  possible to design a parallel algorithm that exploits this non determinism  using a negated  length field is tantamount  in most cases  to a promise  to oneself   that the array will not  be modified until  by some means  it is known that all rds and ins against the tuple have  completed  Refer to the release notes to see if a particular implementation supports this  optimization  For more on arrays  Fixed Aggregates in C on page 36        Linda User Guide 29    Pointers and Assumed Size Arrays    C pointers to arrays and Fortran assumed size arrays must a mays specify an explicit length  when they are used as an actual within a tuple  as in these out operations     C  char b 2Z0    py Al30    int len     p   b   out  array2    pr20    in  arrayZ    d len      Fortran  Dimension b     d 30  
100. de  resource  value  where the various components have the following meanings     Program is either the class name Tsnet  or a specific instance of this class  1 e    ntsnet   In the future  there may be alternate versions of Tsnet type programs  such as  xtsnet  but currently there is only nt snet     Appl is either the class name Appl  or a specific application name  such as ping  The  application instance names cannot contain a period  you must convert periods to  underscores     Node is the class Node  or a specific node name  such as mysys  The node instance  names can be either the node   s official name or a nickname  The node instance names are  node names found in either the  etc hosts file  the NIS hosts database  or the  Internet domain name database  An example of an official name is   fugi mycompany com  A typical nickname for this node is fugi  If a node name  contains a period  you must convert the period to an underscore  The other option would    ec      be to use a nickname not containing the         character    Resource is a vatiable name recognized by ntsnet that can be assigned values    Value is the value assigned to the resource    If both the appl and node components are required for a given resource definition  the    appl component must precede node  If an incorrect format is used  the resource  definition will be ignored by ntsnet     Note  All resources are application specific unless otherwise specified  Also  if the  corresponding option is used
101. de specific  The default  is 1     suffix Specifies whether or not to append a node specific suffix  indicated by  the suffixstring resource  to the executable name  The default is true     suffixstring Specifies a suffix to be appended to a particular executable name when  run on a particular node  the default is the null string  meaning no  suffix   This resource is node and application specific     threshold Specifies the maximum load allowed on a specific node  The ntsnet  scheduler is prevented from starting another worker on this specific  node when this threshold is reached  This resource is node specific  The  default is 20     translate Specifies whether map file translation is used  The default is true   useglobalconfig  Specifies whether the global configuration file is used  The default is    true     useglobalmap Specifies whether the global map translation file is used  The default is  true     120 Linda Usage and Syntax Summary    Map Translation File Format    user Specifies a username to use on remote nodes  instead of your local  username  If this resource is unspecified  the remote username is the  same as your local username  This resource is node specific     verbose Specifies whether or not ntsnet works verbosely  The default is false  If  the verbose resource is true  ntsnet displays remote commands being  issued  and information about each node specified in the nodelist  resource     veryverbose Specifies whether the maximal amount of status messages s
102. ds whose values specify some kind of execution  behavior option or application or node characteristic  Consider these examples     Tsnet Appl Node rworkdir   tmp       Tsnet Appl maxprocspernode  4  Tsnet Node speedfactor  1    These three entries all refer to application and node classes only  thereby serving as  default values for instances not specifically defined in other entries  The first line sets the  default working directory for remote nodes to  tmp on each node  The second line sets  the maximum number of processes per node to 4  and the final line sets the default node  speed factor to 1  These entries are completely general  applying to every application  program and or node  Contrast them to the earlier example  which applied only to the  explicitly named applications and nodes     Resource Types    One can also freely intermix classes and specific instances  as in these examples     Tsnet hello world Node rworkdir   tmp hello  Tsnet Appl moliere rworkdir   xtmp       The first example sets the default working directory to  tmp hello for any remote  node when the program hello world is run  The second example sets the default  working directory on node moliere to  xt mp whenever a network application without its  own entry for this resource runs remotely on it     Many resources have corresponding ntsnet command line options  These options are  designed to be used to override configuration file settings for a specific program run   although they can be used inst
103. du Eie eas 10  How and Where to Parallelize  c  iced ee e asi em Ree e cete es 14   2 Using the Linda Operations     2    22cs ea hr Rn 17   Quick Start  Hello  world    0    eect nen 17   Compiling and Running the Program                  eee 19  C Linda Compilation escoria whe a dada hie des 20  Fortran Linda Compilation     0    0    ce eee eee ee 20  Linda  Operation edge ale eee ep eg nee rho E aa 21  Dr toi A gue E o ade cdi alan salad oan R bad bb Sis 21  O A REO ee RET VS Als GAM Gad Reg bs 21  o PC   CP  22  IU etd esce eM eie up eu P PU QR M UE eee paie edd meas ease 22  eval   s Inherited Environment               4444444444400 23  eval Function Restrictions ass ceder ene nee bnew E nem e  e oV VA A E SUE 24  Predicate Operation Forms  inp and fdp              eee 25  C Linda Alternate Operation Names              4    26  Specifying Tuples and Basic Tuple Matching Rules          ooooooomommmmmo  o   26  Formal Tuple Matching Rules                   4 eee eee ees 27  No 0  tt u Pa A A tek ak HASTE nas R ede na   28  ATTAYS 45 bees BN eS Gee GaSe ed ban  Eder   eho eRe RAO Ta Meee 28  Avoiding  Array COPIES isade reame erre rhe ds 29  Pointers and Assumed Size Arrays      0    eect etnies 30  Multidimensional Arrays             4444444000000 ence eens 32  Fortran 90 Array Sections 0 60 5  c ees cece ee ee ee eee e e ese 33  Fortran Named Common Blocks        0 0 0    cece eee eee 33  fcu PD 34  Varying Length Structures iiis sss ies re ee et e a eee 35   C Character
104. e    workerdone  tuples and merges their results into the  arrays used by the remainder of the  unaltered sequential  program  The calculation then  continues as in the sequential version  with the final calculated energies corrected for  temperature at the end of each iteration     Once all time steps have been competed  real main kills the workers and cleans up  tuple space  finishing the master functions it started when the program began  As we  have seen  master responsibilities are distributed among three separate routines in this  program  each controlling a different phase of the calculation     5    Using Tuplescope    Tuplescope is an X based visualization and debugging tool for Linda parallel programs   In addition to the usual debugger features such as single step mode and breakpoints   Tuplescope can display tuple classes  data in specific tuples  and visual indications of  process interaction throughout the course of a program run     Linda programs are usually debugged using the Code Development System and  Tuplescope  This combination offers the most convenient and least intrusive method of  debugging  Tuplescope is part of the Linda Code Development System  which simulates  parallel program execution in a uniprocessor environment  It requires a workstation  running X Windows  This chapter assumes knowledge of standard debuggers and  debugging activities  Refer to the manual pages for your system or to the relevant books  in the Bibliography for information on 
105. e Linda    product family   TCP Linda  the Linda Code  Development System  CDS   and Shared Memory Linda  SM Linda   The Linda product  family are parallel programming languages based on the C and Fortran languages that  enable users to create parallel programs that perform well in a wide range of computing  environments     This manual includes examples of using both the C based and Fortran based Linda  products and refers to them as  C Linda  and  Fortran Linda   C Linda combines the  coordination language Linda with the programming language C  similarly  Fortran Linda  combines the coordination language Linda with the Fortran programming language   Parallel programs are created with Linda by combining a number of independent  computations  processes  into a single parallel program  The separate computations are  written in C or Fortran  and Linda provides the glue that binds them together     Linda has several characteristics which set it apart from other parallel programming  environments       Linda augments the serial programming language  C or Fortran   It does not  replace it  nor make it obsolete  In this way  Linda builds on investments in    existing programs       Linda parallel programs are portable  and they run on a large variety of parallel  computer systems  including shared memory computers  distributed memory  computers  clusters  and networks of computers  With few exceptions  Linda  programs written for one environment run without change on another       
106. e rows to each successive workers until the  numbet of elements it has exceeds the target number  or until all the rows are gone  This    Linda User Guide 91    92 Case Studies    simple heuristic works quite well so long as the number of workers is much  much  smaller than the number of atoms  a condition invariably satisfied by real world molecular  dynamics problems     Once the starting position  in the variable st art  and the length  out size  have been  computed  the    nbwconfig    tuple is sent to tuple space  its second field is workerid   the worker   s ID number assigned at startup  The tuple   s final three arguments are the  total number of atom pairs and two constants     The final for loop in the routine gathers the results of the special charges calculations  performed in parallel by the workers  The partial results  from the    qsum    tuple  are  summed up by nb_energy_setup and used in the remaining  sequential  initialization  activities  after which the routine returns     Before continuing with the main thread of the computation  let   s look briefly at the  beginning of the worker function  nb  energy worker        in the configuration for each worker     in  nbwconfig   workerid   start   outsize   fub n  T  xzpfac   cont  c    if  workerid    num workers 1     PU  A   PAL   B   TB   pack atc   Farlen    out   done reading       else    for  rdct   1  rdct  lt  num workers  rdct     in  done reading     in  A    At   B    B    pack atc    arlen      T
107. e same behavior as under previous  releases of Network Linda  which used the tsnet command   providing backward  compatibility if you do nothing     How ntsnet Finds Executables    The  nodelist and  nodefile command line options correspond to the nodelist and  nodefile resources respectively  The value for  nodelist must be enclosed in double  guotation marks if multiple nodes are listed       ntsnet  nodelist  moliere leopardi sappho  hello world 4    Specifying Execution Priority    Two resources control the execution priority of processes started by ntsnet  The high  resource  application specific  indicates whether processes are nice   d or not  it defaults to  true   If high is set to false  then processes are run at lowered priority  The command line  options  high  high  abbreviable to  h and  h  correspond to this resource     The nice resource  node and application specific  allows you to specify whether processes  should be nice d or not on a node by node basis for each application  For example  the  following lines state that processes running hello world on z0 ere should be nice d  but those on chaucer shouldn t be  although generally processes that run on chaucer are  nice   d      Tsnet hello_world moliere nice  tru                Tsnet hello_world chaucer nice  false             Tsnet  Appl chaucer nice  tru    The default value for the nice resource is true  If the value of the high resoutce is set to  true  then it overrides the setting of the nice resource 
108. e steps     int npts  first  i k    int nmin  nleft    int worker      tpoints   0    nsteps   0    while   tpoints  lt  NWORKERS      tpoints  gt        printf   Enter number    IAXPOINTS      of points along vibrating string n          scant   sd    amp tpoints     if   tpoints  lt  NWORKERS              1    end while        Cpoints  gt   printf   Enter value between  d and  d n            while   nsteps  lt  1      printf    scanf     nsteps  gt  MAXS          Sd    amp nsteps      EPS       Enter number of times steps n             if   nsteps  lt  1      nsteps  gt  MAXS1          1    end while                      EPS     printf   Enter a value between 1 and  din            Put out total points  and time steps      out   globals   tpoints  nsteps  NWORKERS       Assign chunks of work     nmin   tpoints NWORKERS   nleft tpoints   NWORKERS   k   0   for  i   1  i  lt   NWORKERS  i       npts    i  lt   nleft    nmin  1 nmin   first  k   1   eval   worker   worker i first npts     k k npts         Get results          get_results  tpoints NWORKERS             int worker  int id  int first  int npoints  4  int left  right   int tpoints  nsteps  nwrkers   double  ret val     146 Sample Programs    AAXPOINTS        NWORKE       RS  MAXPOINTS                 zj    MAXSTE          get global values       printrf   d  first    d  npolnts   d n   id  first   rd   globals    tpoints   nsteps   nwrkers         Initialize line      init line  tpoints  npoints  first        Determi
109. e template does not require a string of a  specific length  This requirement holds for such templates even when the length of the  literal string in the tuple and the declared length of the array used in the template are the  same   even if hi had been hello  the colon would still be needed in the out operation     Anonymous Formals    A formal need not include a variable name  but instead can use a data type name in C  and  in Fortran can use a typeof construct with a data type name as its argument  Here are  some examples     C  struct STYPE s        in  data   Pint    in  struct    struct STYP   in  int array   Pant  i          33       Fortran  Common  sample array num  junk    in  data    typeof  integer     in  array    typeof  real 8     in  common    typeof   sample       Such formals are called anonymous formals  Anonymous formals within in operations  still remove a tuple from the tuple space  but the data in that field is not copied to any  local variable  It is effectively thrown away  This construct can be useful when you want  to remove a tuple containing  for example  a large array from a tuple space  Anonymous  formals allow it to be removed without the time consuming copying that would result  from an in operation     A rd operation with a template containing only actuals and anonymous formals has no  effect if there is a matching tuple  but still blocks when none is available     Fixed Aggregates in C  In C  you can reference fixed length aggregates such as 
110. e this is a mapto  entry  it will never be used to translate a generic directory to a working directory on gogo   an appropriate restriction given that automounted directories should not be explicitly  referred to by their temporary mount points     The last item we need to consider is translation rules for node degas  This node is a special  case in that users do not have permanent home directories there  and are accordingly  placed in the root directory when they log in  However  the system administrator feels  that it is undesirable for remote processes to run from the root directory and wants them  to run from  tmp instead  So  we need to equivalence the working directory we re using  to  tmp  Unfortunately  given that there are no subdirectories under   tmp  corresponding to the various users who might run remote processes on the system  we  cannot write one simple rule to cover everyone  Instead  user chavez would need to  include a rule like this one for het working directory within her local map translation  configuration     mapfrom  home chavez work 1    degas    tmp     Map Translation    The local directory  u   home  chavez work is  translated to the  generic directory   home chavez work   which in turn is trans   lated to the specified  remote directory for  each listed remote  node  a null translation  occurs for nodes  auroraandblake   as well as for any other  node not mentioned in  the map file   leaving  the remote directory as    home   chavez   work     H
111. ead of the configuration files if desired   Here is an  example command which runs hello world  overriding its usual maxprocspernode  resource setting       ntsnet  maxprocspernode 3 hello world    Command line options override any configuration file settings  Remember also that local  configuration file settings take precedence over those in the global  system wide   configuration file     Resource Types    In addition to their scope   node specific  application specific  or node and application  specific   resources can also be classified by the kind of value they expect     Some resources  like those we ve looked at so far  requite a value  an integer or a  directory  for example  Many others are Booleans  and expect a true or false setting for  their value  For such resources  the following values are all interpreted as true  true  yes   on  1  These values are all interpreted as false  false  no  off  0  For example  the following  entry indicates that the node marlowe is not currently available     Tsnet marlowe available  no    ntsnet command line options corresponding to resources taking Boolean values use the  following syntax convention  If the option name is preceded by a minus sign  then the  resource is set to true  and if it is preceded by a plus sign  the resource is set to false  For  example  the command line option  useglobalconfig sets the resource useglobalconfig to  true  stating that the global ntsnet configuration file should be consulted  The option   
112. eate many processes  each responsible for computing  one row of the output matrix C  If the matrices are large enough  it might not be possible  for each process to hold all of B at once  In this case  each process might then hold the  row of A  cotresponding to its output row  and  at any given time  one column of B  The  process computes the dot product of its tow and the column it currently holds   producing one element of its output row  When it finishes with one column  it sends it on  to another process  and receives a new column from some process  Once all processes  have received all the columns of B and have finished their final dot product  the matrix  multiplication is complete  although the completed rows would still need to be explicitly  moved from the component processots to the desired output location      The message passing approach to the problem is illustrated in the diagram below  It has a  number of implications for the programmer  First  the program needs to keep track of  which process has what data at all times  Second  explicit send data and receive data  operations must be executed whenever data needs to move from one process to another   Unless they are coded extremely carefully  such bookkeeping and communication  activities can cause bottlenecks in program execution          The term processor is used in a generic sense here to designate a distinct computational resource  whether it is one CPU in a multiprocessor computer or a separate computer o
113. eceives  the join message is the worker added to the execution group and counted toward the  minimum and maximum number of workers     The following table summarizes the preceding information  At a given time 7  having  received p join requests  the master process will act as follows     States  amp  Outcomes p lt  minworkers     minworkers  lt  p  lt  maxworkers p  gt  maxworkers       lt  minwait wait wait SUCCESS  minwait  lt     lt  maxwait wait SUCCESS SUCCESS     gt  maxwait failure SUCCESS SUCCESS    Selecting Nodes for Workers    Table 1 lists the resources used to determine whether a node is used for a worker process   When it wants to start a new worker process  ntsnet determines which node to start it on  in the following manner  First  it calculates a new adjusted load for each node assuming    that the process were scheduled to it  using the following formula      initial load    Nmaster   masterload     Nworker   workerload     speedfactor    This quantity represents a load average value corrected for a variety of factors  The  various components have the following meanings     initial_load If the getload resource is true  its default value   this is the load  average obtained from the node  over the period specified in the  loadperiod resource   If the remote procedure call to get the load  average fails  the value in the fallbackload resource is substituted  If  getload is false  then zwitial_load is set to 0     One potential use of the fallbackload resource 
114. econd row of array b  The third and fourth operations divide array a in half and  place the two halves into the tuple space  defaulting the beginning and ending elements in  the third dimension to the first and last array element  respectively  The final operation  illustrates the use of the stride value     Fortran Named Common Blocks    Entire named common blocks can be transferred to a tuple space as a single unit  Named  common blocks ate referenced by their common block name  enclosed in slashes  For  example  the following operations place and retrieve the designated common block     Common  example a b c d n    out   common    example         in  common     example      The out operation places the entire common block into the tuple space  and the  following in operation matches and retrieves this same tuple  For matching purposes   common blocks with identical names match  and the internal structure of the common  block is ignored  Blank common may not be placed into tuples using this method  The  best solution in such cases is usually to convert it to a named common block     Linda User Guide 33    C Structures    From the point of view of syntax  structures work very much like scalars  For example   these two out operations create identical tuples        struct STYPE S  t  pr    p      s   out  structure   s    out  structure    p      Hither of these in operations will retrieve one of the tuples created by the previous out  operations     in  structure    t    in  
115. ed for the various remote systems  e g   executables with different compilation  options   all required executable files must be in the same directory as the local  executable  and they will be copied to the target remote execution directories  as  determined by the specific values set in the rexecdir resource for each node or by map  translation  The distribute resource is false by default     The cleanup resource  application specific  determines whether remote executables are  removed after the execution completes  Its default value is true  Note that the local  executable is never removed  regardless of the setting of cleanup  The cleanup resource  setting is relevant only when distribute is true     The distribute resource may also be set with the  distribute  distribute command line  options  both abbreviable to d   The cleanup resource can be set with the   cleanup  cleanup command line options     Architecture Specific Suffixes    By default  the names of executable files on all nodes ate the same as the executable name  given on the ntsnet command line  When executables are distributed prior to execution   this executable is the one that is copied by default     Howevet  it s possible that different flavors of the executable should be run on different  nodes   ntsnet provides a mechanism for specifying which executable to use based on an  architecture specific suffix  For example  ntsnet can be told to use executables having the    extension  1686 on some nodes and t
116. either constants or expressions which  resolve to constants   and some of which hold placeholders for the data in the  corresponding field of the matched tuple in tuple space  These placeholders begin with a  question mark and are known as formals  When a matching tuple is found  variables used  as formals in the template will be assigned the values in corresponding fields of the  matched tuple     Here is an example     C Form Fortran Form      simple    i    simple   TEJ    In this template  the first field is an actual  and the second field is a formal  If this  template is used as the argument to a rd operation  and a matching tuple is found  then  the variable i will be assigned the value in the second field of the matched tuple     A template matches a tuple when       They both have the same number of fields      The types  values  and length of all actuals  literal values  in the template are the  same as the those of the corresponding fields in the tuple      The types and lengths of all formals in the template match the types and lengths  of the corresponding fields in the tuple     We ll consider these conditions in more detail in Chapter 2  for now  let s look at some  examples  If the tuple        H Also known as an anti tuple     12 Overview of Parallel Programming and the Linda Model    The Linda Model    C Form Fortran Form    cube   8  512    cube   8  512     is in tuple space  then the statement     C Form Fortran Form  rtd  cube   8   1   rdi  cube   8  21
117. either normal or single step mode  depending on the setting of the Single Step item on  the Modes menu  When single step mode is in effect  Tuplescope executes until the next  Linda operation takes place  Tuplescope assumes that required source files are in the  current directory     It is not possible to execute a program more than once within a single Tuplescope  session  To rerun a program  exit Tuplescope and restart it     Clicking on the Break button will cause program execution to pause at the next Linda  operation  Execution will resume when you click on the Continue button  which may also    be used to resume execution in single step mode     To exit Tuplescope  click on the Quit button     Linda User Guide 99    Using Tuplescope with a Native Debugger    The following method is recommended for combining Tuplescope with a native    debugger like dbx     Compile the program for Tuplescope and the native debugger by including any  necessary compiler options on the clc command  i e    g for dbx   linda tuple scope for Tuplescope      Execute in single step mode in Tuplescope until the desired process is created        Click on the new process icon with the middle mouse button  This will create a  new window running the native debugger attached to that process       Set desired breakpoints in the native debugger  Then turn off single step mode in  Tuplescope     Give the continue execution command to the native debugger to resume  execution of that process  e g  use the 
118. el Programming and the Linda Model    How and Where to Parallelize    This sort of display summarizes the number of times a given routine was called  Often  it  is helpful to also know where a routine was called from  A call graph table will indicate  this information  Here is an example           called procedure fcalls  calls from line calling proc file   exp T557120  36 78 48 gauss   gaus3 t   3022848 14 71 63 gaus3  gaus3 f   3022848 14 71 79 gaus3_  gaus3      3022848 14 71 95 qgaus3   gaus3 f   503808 2 45 143 Gauss   gaus3 t   503808 2 45 127 gaus3  gaus3 f   503808 2 45 111 gaus3  gaus3 f   503808 2 45 159 gaus3_  gaus3 f   503808 2 45 L73 Gauss   gassi   503808 2 45 LIL gaus3   gaus3 t   Bart 1007616 15 03 111 tunes  funez  tt   1007616 15 03 110 furc3   funca  E   1007616 15 03 108 tunes     fune3 tf   1007616 15 03 109 func  Lunes       503808 Td  44 tunes   Eines  E   503808 3 51 147 func3  tfunc3 f   503808 3  148 func3   func3 f   503808 Tol 149 fune3   funes  E                    Here we can easily see that the exponential function is called literally millions of times  all  from within one routine  We would want to try to do some of those calls in parallel if  they are independent     Linda User Guide 15    16 Overview of Parallel Programming and the Linda Model    2    Using the Linda Operations    In the last chapter we discussed the process of creating parallel programs from a general  point of view  We also introduced tuples and the main Linda operations  
119. ements  since Fortran subroutine arguments are always  passed by reference  Second  the first and third array indices are reversed  due to the  differing Fortran and C multidimensional array ordering conventions  The Fortran arrays  have not changed in any way  so the location denoted by the Fortran X  i1  i2  i3   for  example  must be accessed as x  i 3   12   11  from C  Third  C arrays begin at 0 while  the Fortran arrays begin at 1  However  this is easily accounted for by making iseg run  from 0 to nsegmnt 1 rather than from 1 to nsegmnt  as it did in the Fortran loop    Fourth  we   ve also added the addresses of the arrays x  y  and z to the subroutine   s  arguments  in Fortran  CALCDISP accesses them via a COMMON block not shown  here   Finally  we ve translated the subroutine name to lowercase and appended an  underscore  a common requirement when calling a Fortran subroutine from C     After the inner loop completes  the worker sends the displacement values it has created  to tuple space  and the outer loop begins again  When the counter in the    index    tuple  exceeds its maximum value  each worker process will terminate the next time it examines  it     Linda User Guide 41    As we ve seen  it was very easy to transform Freewake into a parallel program with Linda  because all of its work is isolated so well within a single routine  In the next chapter we ll  look at several more complicated case histories     42 Using the Linda Operations    Quick Start    3  
120. en local directoty  trees and directory trees on remote nodes  If your network presents a consistent view of a  common file system  via NFS or AFS  for example   then you will not need to worry  about map translation  On the other hand  if your networked file systems ate not  completely consistent   if a file system is mounted as  home on one system and as   net  home on another system  for example   then map files can be a great help in  automating TCP Linda execution     Basically  map files provide a way of saying     When I run a program from directory X on  the local node  always use directory Y on remote node R     Map translation occurs for  both the execution directories  locations of executable files  and working directories on  each node     How ntsnet Finds Executables    Map translation means that ntsnet translates local directories to their properly defined  eguivalents on a remote node before specifying an executable on  or copy an executable  to  that remote node  executable file distribution is discussed later in this chapter   If map  translation is enabled   as it is by default   but no translation is explicitly specified for a  given directory and or node  whether because its rules haven   t been specified or no map  translation file exists   the rule is simple  look for the same directory on the remote node     If enabled  map translation occurs whether the local directories ate specified explicitly  as  when the full executable file pathname is given on t
121. ergy setup str coor count          Perform sequential setup work       global data all workers get once      Oub   A   ESTELARES  B    sStr   B 2    pack ate   packatomte istr   nti         taskav   total work  workers    target pairs per worker     taskav    nb num     nb num 1    2   num workers        create worker tasks        start   beginning of worker   s domain    outsize   size of worker s section     for  index 1 w id 0  w id  lt  num workers  w id       start   index     increments of index follow           compute start  amp  length for this worker  tasksum holds the    number of pairs given to this worker so far  increment it    with each successive value of index    size of current    matrix row  until task  gt   target or we re out of pairs              for  tasksum 0   tasksum  lt   taskav  amp  amp  index  lt  nb num l   index     tasksum    index     if  w id    num workers   1     last worker     outsize   nb num   1   start    else  outsize   index   start        start workers    initialization phase     out   nbwconfig   wid  start  outsize   nb num  expfac  confac    Fill worker specific data arrays and send to tuple space        end worker init loop          get partial g sums and nb list size calculated by workers         qasum   gbsum   0 0    for  i 0  gt  i  lt  num workers  itt     in  qsum    uasum   ubsum   ucount    qasum    uasum        qbsum    ubsum    count    ucount     Molecular Dynamics    Finish initialization   return    After performin
122. es data IRR M eene el e Saltem uus Va 123   Menu Buttons    oou cen oie ure A RA ra ca ee a etae 123   The Modes  Menu    oic re A a a a REPRE 123   The Debug MER 1 0    s e Rr vie xD ERE TOES od Y Ee 124  TID Loap Syntax pi sem eed yer EPOR Ree LET DESC ES Ee eges 124   7 Sample Programs             eel n 127  Array  ASSI E SA AAA EPIIT eo O AS E Gye 127   Serial Version oi aje e pem ees EIE IRAE ARS re ET OA P ees 127   Parallel  Version  z aeneo AA noone erin Sipe one Sao TUS 128  pi  Calculation zoe ate esten pu V Bay as dae eel ae SESS etam a A 132   Serial Version s imos in ed ke  ees eh e ere ene ER Sees 132   Parallel Verlo   cce eee ete hcm eme P gie tel e t eere els 134  Matri Multiply    usce dere RR ERA SERA HRS waned RC ER o t ee As 137   RN uei sO skens re aces voee tS ep Se ee ee Sere ese RE ERGY CERIS BBA 137   Parallel Versiones ose seven ee ERE REI exequi e pre cuente I 138  Concurrent Wave Equation               cette eens 142   Serial Version  aue tg wie ees A lacks eee oeste an sha daa no  ena o Sealers eel utet 142   Parallel Verso eee e HR ERR Rep eere Ee p reg 145    Linda User Guide iii    2D Heat  EQU ts sage as a v   pl hey ea eS en   Oe EE SP See ae 150    DELAL VERSION cte li ee RD ed A CA aM ST de odd a Dah nde uod 150  Parallel VersIOD us cece metet teet E a wo ara dele a Sak Base ee A los 152  Bibliograplij    cobos hess wee oboe ot ad e Vet ek i  lndex  eb a aa ee a a VE Va fg NU ees i    iv Contents    Introduction    This manual describes th
123. est       The first line sets the default location for remote executables to  usr local bin on  the remote system  meaning that ntsnet should use this directory on each node as the  default location for the executable file to startup  Subsequent lines set different default  values for the application hello world and for the node o ere  The final line sets a  specific value for the application hello world when running remotely on molere  the  executable for hello world on ofer resides in the directory  usr linda bin   test     The rexecdir resource can also be given the special value parallel  which is its default  value   This indicates that map translation is to be performed on the directory where the  local executable resides  for evety node which has no specific remote execution directory  set  Map translation involves taking the name of the local directory containing the  executable to be run and determining what the equivalent directory is for each remote  node participating in the job  These equivalent directories are determined according to  the rules set up by the user in the local and or global map translation files  The format of  these files is discussed in the next section     The  p command line option specifies the values for both rexecdir and rworkdir   overriding all other methods of specifying them   p is discussed later in this chapter      About Map Translation    52 Using TCP Linda    As we ve stated  map translation is a way of defining equivalences betwe
124. f tuple space in addition to  running the application program  Therefore  when a process is paused   for  example  at a breakpoint   then no tuple space requests can be handled by it  For  this reason  it   s best to break only a single process at a time  with all other  processes either continuing or stepping over an in operation          To do so effectively  you need to have compiled the application with  g     Linda User Guide 71    72 Using TCP Linda    Ray Tracing    4    Case Studies    This chapter provides detailed discussions of several real applications parallelized with  Linda  In some cases  the descriptions will follow the programming process  recommended in Chapter 1 by first transforming a sequential program into a parallel  program in a straightforward manner and then  if necessary  optimizing it to produce a  more efficient parallel program  in others  we ll focus on topics and issues of special  importance  Space limitations will not allow for a full treatment of any of these programs   but these case studies illustrate many important techniques useful to any programmer  working with Linda     Image rendering is a very computationally intensive application  The Rayshade program   written in C  renders color images using ray tracing  It is capable of including texturing  effects to simulate different materials in the image  multiple light sources and types   point  spotlight  diffuse   and atmospheric effects like fog or mist  Rayshade computes  the effects 
125. fies whether or not the executable s  shall be distributed to the  remote nodes  Executables are distributed only to those remote nodes  that are actually going to take part in the execution  After the execution  completes  ntsnet automatically removes the remote executables that it  just distributed  The local executable is protected from removal  The  default is false  See the cleanup resource for information on preventing  the automatic removal of remote executables     Specifies the load average the scheduler shall use for a node if the RPC  call to get system load average fails  The default is 0   99  The value  specified can be any real number  gt   0  If failure of the RPC call  indicates that the node is down  this option can be used to set  fallbackload to a very large value  effectively making the node  unavailable to ntsnet     Specifies whether ot not to use load averages when scheduling workers  on the nodes in the network  The default is true  This can be used to  make worker scheduling consistent between different runs of ntsnet  It  also makes sense if the rstatd daemon is not available on the network     Specifies whether all workers shall run at normal ptiority and Linda  internodal communication should run at full speed  The default is true   If the high resource is false  the workers run nice d  unless specifically  overtidden on a per node pet application basis using the nice resource   note that high being true overrides the setting for nice   Also when 
126. g the same initial setup steps as the sequential version  the parallel version  of nb  energy  setup places some global data in tuple space     The majority of the code shown in this routine is concerned with dividing the work up  among the worker processes  The tasks created here define the range of the data each  worker is responsible for  This represents a somewhat different technique from the usual  mastet worker scenario  In the latter  the work is divided into tasks which are  independent of any particular worker  and each worker grabs a task when it needs one   Here  the total work is divided among the workers at the beginning of the calculation  the  wotk each worker does is essentially part of its structure   in this case  a function of its  wotker ID number  This technique of dividing the problem space into discrete chunks  and assigning each to one worker is known as domain decomposition   The parameters  calculated here and communicated to each worker via tuple space will be used in both the  summing of special charges done in the setup phase  with results collected at the end of  this routine   and in the actual nonbonded interactions computation later in the program     The scheme for dividing the work here is somewhat complex  but it is designed to ensure  good load balancing among the worker processes   this is always a concern when a  domain decomposition approach is taken     For a molecular dynamics calculation  the total number of nonbonded interactions fo
127. h of the nodes  Most of the work can be done by this map entry  which  uses  home as its generic directory     map  home    chaucer    u   erasmus    mf   flaubert    u home     gogol    net aurora home     This entry translates the listed local directories to the generic directory  home  The  translation applies to the entire tree starting at the specified locations  Thus  in our  example  ntsnet will translate the  local  current working directory on flaubert    u  home   chavez  work  to the generic directory  home chavez work  we don t have to  explicitly construct a rule for the current directory   When it needs to detetmine the  wotking directory for a remote process participating in the execution of the application  test24g it will use this genetic directory  So  when it starts a remote worker process on  chaucer  it will use the directory  u  chavez  work as the current directory  translating    home to  u  as specified in the rule for node chaucer  When ntsnet starts a remote worker  process on blake  it still attempts to translate the generic directory  but no rule exists for  blake  In this case  a null translation occuts  and the directory remains  home chavez   work  which is what is appropriate for this system     The rule we   ve written will allow us to run our ntsnet from the work subdirectory of  cbavex s home directory on nodes aurora and blake  and on any of the listed nodes except  gogol  in each case  the current working directory will translate to  home
128. h shell script  called by  ntsnet to start up a worker process on a remote node  This resource  provides a hook enabling users to control the behavior of the shell script   which can itself be modified by the user   In the default implementation  of linda rsh  located in the Linda bin subdirectory   only the string   on  is meaningful as a value to this resource  If  on  is passed to  linda rsh  then the on command will be used instead of rsh to initiate  the remote process  This is a node specific resource     Specifies the number of minutes over which the machine load is  averaged  This is the load average then used by the worker  Typical  values for loadperiod ate 1  5  and 10  The default is 5     Specifies the load that the master  real_main  process is considered to  put on the node  The value specified can be any real number  gt   0  The  default is 1  Typically 1 or some smaller fraction is used  If the master  process uses much less CPU time than the workers  then masterload  should be set smaller than workerload     Specifies the maximum number of nodes on which to execute  The  default value is the number of nodes in the node list     maxprocspernode    118 Linda Usage and Syntax Summary    Specifies the maximum number of Linda processes started on any given  node the application is running on  On the local node   maxprocspernode includes the master  The default value is 1     maxwait    maxworkers    minwait    minworkers    nice    nodefile    nodelist    ntsnet 
129. he colon when omitting the length   while in Fortran  the colon may also be omitted  although including it is correct too   For  example  the first out operation in each language places the entire array a into the tuple  space     If you only want to place part of an array into the tuple space  then you can include an  explicit length in the out operation  In this way  the second out operation in each  language places only the first ten elements of array a into the    array2    tuple     The array format is basically the same for arrays used as formals in templates  The one  difference is that an integer variable is used for the length parameter  and its value is set  to the length of the array in the matching tuple  Thus  the final pair of in operations in the  preceding example will both result in the value of len being set to 10     The semantics of Linda ensure that a user may alter the contents of an array immediately  upon return from the out   the tuple generated by the out will still reflect the contents of  the array just prior to the out  To enforce these semantics  Linda makes a copy of the  array before returning from the out  Shared memory implementations of Linda allocate  space for this copy in the shared memory region  Non shared memory implementations  of Linda allocate memory in the address space of the process executing the out     Avoiding Array Copies    This copying can be problematic for very large arrays  It can neatly double memory  requirements and a
130. he command line  or determined  implicitly  using the TSNET_PATH for example   Thus  map translation will occur for  both of the following commands          ntsnet  tmp test24    ntsnet test24    In the first case  ntsnet will translate the directory you ve specified   tmp  for each  node where the application test 24 will run  if no explicit translation has been defined   then ntsnet will perform a null translation and look for the program test 24 in  tmp on  each remote node as well     For the second command  ntsnet will first determine the location of the test 24  executable on the local node  using TSNET PATH  or its default value   and then  translate that location for each remote node involved in the job        If the rworkdir resource is set to parallel  the default value   then the current working  directory is also subject to map translation     The Map Translation File    ntsnet uses the local and global map translation files     tsnet map and common   lib tsnet map  relative to the Linda tree   respectively  to determine the  correspondences between directories on different nodes  The first matching entry is used  to perform each translation  The map translation mechanism is extremely powerful and  can be used to define equivalences among systems  whether or not their file systems are  linked with NFS     Map translation is a two step process  Rather than having to specify the exact translation  for every pair of hosts within a network  map translation allows you 
131. he worker next obtains the current coordinates of the atoms           in  wakeup    wakeflag  workerid    if  wakeflag     1  return 0    eat poison and die     if workerid    num workers 1       rdi coor   Teilen    out   read _ coords     7    else      Linda User Guide 93    94 Case Studies    for  rdct   1  rdet  lt  num workers  rdct     in  read coords     in  eoor    c len      Most workers rd this tuple  however  the last worker waits until all the other workers have  rd it  using the same technique of gathering semaphore tuples we saw earlier  before  removing it from tuple space with the in operation  In addition to freeing the memory   this is necessary so that the new coordinates can be unambiguously transmitted to tuple  space on the next iteration     The worker then calculates the nonbonded interactions for its pairs of atoms  It uses  code differing only in its loop limits from that in the sequential version of nb energy     for  i190  gi start   i  lt  outsize  i    git   f  do many computations       The initial value of the vatiable gi and the limit against which the variable i is tested  were both obtained from the    nbwconfig  tuple  Once this calculation is complete  the  worker sends its results to tuple space        out  workerdone   fu start outsize  elec  evdw  esup  count      It then waits for the next wakeup tuple commencing the next iteration of its while loop   eventually  it retrieves a poison pill and exits     nb energy ultimately gathers th
132. held at any given time  2 c1ump M total elements  would  fit within the available amount  Or if the program were running on a network containing  processors of greatly differing speeds  then it might sometimes be preferable to make the    Matrix Multiplication    task size smaller so that there are enough total tasks to keep every processor busy for  essentially the entire execution time  with faster processors completing more tasks in that  time   Building such adjustability into a program is one way to ensure easy adaptability to  different parallel computing environments     Even this version makes some optimistic assumptions  however  For example  it assumes  that the master process has sufficient memory to read in both matrices before sending  them to tuple space  If this assumption is false  then the disk reads and out operations  would need to be interleaved to minimize the amount of data the master has to hold in  local memory     We ll close this consideration of parallel matrix multiplication by looking at a case where  one might not want to use it  Here is a portion of a subroutine from a computational    chemistry program  written in Fortran      Subroutine gaus3 x n        Loops containing many independent exponentials  call matrix multiply  chi coo   psi        call matrix_multiply  chix  coo  gx    chiy  coo  gy                 call matrix multipl  ehiz coo gz                    y  call matrix multiply  y       call matrix multipl GhidZ coo d2   Return  End    
133. here  fe is linda op  operation is either    or      and  est_value is one of the following  eval  out  in  rd  block_in  and  block_rd  This kind of test detects the occurrence of a particular kind of Linda  operation  Here is an example      linda_op    eval   This condition detects the occurrence of an eval operation   Process comparison tests  where   ez  is process  operation is one of          lt   and    and  est_valueis an integer  representing a Tuplescope process number  This  sort of test detects when any tuple space operation is performed by any process  whose process number fulfills the condition  For example  this condition detects    when any process with process number less than 5 accesses tuple space      process  lt  5     Multiple conditions can be joined with and or or  as in this example      process  lt  5  and  linda_op    out     The entire composite condition is enclosed in parentheses in the complete TDL  statement  as we ll see below     Action can be one of the following     Action  break    Effect    Pause program execution  resume execution with the Continue button  Ifa  tuple is associated with the tuple space operation that triggers a break   Tuplescope turns it solid black  The process which performed that tuple  space operation is marked by a notch in its icon   Ordinary display of all  tuples and process icons is restored when you click the Continue button      The Tuplescope Debugging Language    hide Suppress display of matching tuples or 
134. hese statements retrieve the    nbwconfig  task tuple and the global values tuple from  tuple space  After it starts up  the worker will block until the  nbwconfig  tuple is  available  The appearance of this tuple is a trigger that initiates active computation  It is in  this sense that we refer to it as a wakeup for the wotker  causing it to resume active  execution after a significant pause     sleep         The if statement checks whether this is the worker with the highest worker ID  If not  it  rds the globals tuple  and then creates a    done reading  tuple  If it is the last worker  then  it ins all the    done reading  tuples from the other workers and then ins the globals tuple   removing it from tuple space  This technique is useful when you want to remove large   unneeded sets of data from tuple space or when some data must be removed because  new or updated versions will replace it  We ll see an example of the latter later in the  program     The wotker next computes its portion of the special charges sum  and dispatches the  results with an out operation  It then enters its main infinite while loop  performing a  few data initialization operations fot the coming nonbonded interaction calculation  and  then waiting for its next wakeup tuple  again tied to its worker ID and appropriately  labelled    wakeup        Molecular Dynamics    qasum   qbsum   0   Calculate new values   out   qsum   qasum  qbsum  count      while  1     vdw   elec   esup   0 0        count
135. hould be  displayed  The default is false  The veryverbose and verbose resources  are independent     workerload Specifies the load that a worker will put on a node  The value specified  can be any real number  gt   0  The default is 1  Typically 1 or some  smaller fraction is used  A larger value could be used to increase the  chances of having one Linda process running on each node     workerwait Specifies the time  in seconds  that a worker waits for a response to its  join message  from the master  The default is 90  unless debug is true  in  which case it is 1000000  If a worker does not get a response within the  specified time  telling the worker that it has joined  the worker will exit   and therefore not participate in the execution of the application     Map Translation File Format    This section documents the format of the map translation files used by TCP Linda  These  files ate  t snet  map in the user s home directory   the local map file   and lib   tsnet  map  the global map file  located relative to the Linda tree      Map file entries have one of the following formats     map generic directory    nodel   specific directory    node2   specific directory         mapto generic directory    nodel   specific directory     node2   specific directory         mapfrom generic directory      nodel   specific directory     node2   specific directory       Linda User Guide 121    Note that generic directory need not be a real directory location at all  but can be any  
136. ied    dimensional heat equation domain decomposition  The initial  perature is computed to be high in the middle of the domain and  o at the boundaries  The boundaries are held at zero throughout       simulation  During the time stepping  an array containing two    ains is used  these domains alternate between old data and new data   OKCKCKCKCKCkCkCk Ck kCk ck kCkc k kCkCKCkCk CK Ck kk Ck kCk Ck kCk CK Ck kCk Ck kk Ck kCkCk KCk Ck kCk Ck k kc k Ck kck ck kok ck k kk k kk kok ck ke         de  lt stdio h gt      define NXPROB 11   define NYPROB 11  str    t Farms       E    f    Joa       Loa    E CE   t cy     int nts     parms   10 1  0 1  BUT     mai    n      f  int       at u 2   NXPROB   NYPROB         voi               kkk    ini    iX  iy  iz  it   d inidat    prtdat    update       CKkCkckCckck AKA AKA AKA A KA ck ZA Z X A X KA X X ck X ck ck X AZ A X AZ ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck ck k kk k k kk kk k     Initialize grid   OKCKCKCKCKCkCkCk Ck kCk ck kCk k kCk CK kCk CK Ck kCkCk kCkCk RR RR k k kCkCk IA k kc k kckck ck kok ck kok ck k kk kok e ke      dat  NXPROB  NYPROB  u      150 Sample Programs    2D Heat Equation    prtdat  NXPROB  NYPROB  u   initial dat     for  ix   0  ix  lt   NXPROB 1  ix       wpr pzs   107   nl  llizilV   u 1   ix   NYPROB 1    u 0   ix   NYPROB 1         for  iy   0  iy  lt   NYPROB 1  iy       urr ro  liv    34101 107 liy    u 1   NXPROB 1   iy    u 0   NXPROB 1   iy             KKK KK KK oko k oko k k k 
137. ing the same  routine used by the sequential program  However  instead of an immediate call to   out line  the computation is followed by an out operation  placing the results into tuple  space  Sometime later  do_t race will retrieve the line and then make the call to   out line to write it to the output file  Meanwhile  the worker has started its next task  It  will continue to perform tasks until all of them are done        worker bears a striking resemblance to the original version of do trace  it adds a call  to rayshade_main unnecessary in the sequential version  and it lacks only the call to  outline from its main loop  which remains behind in the master process version of  do_trace   This separation results from the need to pass data between master and  workers via tuple space   It is also much easier to code than sending explicit messages  between all of the various processes      Matrix Multiplication    Matrix multiplication is a common operation central to many computationally intensive  engineering and scientific applications  We   ve already looked at matrix multiplication in a  general way in Chapter 1  Here we will look at a couple of approaches to parallel matrix  multiplication  and conclude by considering when one should   and should not   use  them  This example again uses C Linda     This case study illustrates the following techniques       Adjustable granularity and granularity knobs     Deciding where in a program to parallelize     The basic matrix mu
138. interactions NSteps    out worker  wakeup    out current  coordinates    in wakeup      Exit     rd or in   coordinates  v    Calculate  nonbonded  interactions    out results            in results   amp  fill matrix       Calculate  new atomic Exit  coordinates                For each loop iteration  the master calculates the bonded interactions and other energy  terms  and then divides the work of the nonbonded interactions among the workers   placing the data the worker will need into tuple space  Eventually  the master gathers the  workers    results  using them along with the quantities it computed itself to calculate the  new position and velocity for each atom in the molecule     Here is real_main  which is a renamed version of the original main     real_main  argc  argv             T   300 0   process args  argc  argv       Molecular Dynamics       startup workers     for  i   0  i  lt  num workers  i     eval   worker   nb energy worker i   num workers       Read data   Initialize parameters  amp  data structures     verl init  str coor vel s         Te   temperature  str kinetic_energy  str  vel     verl scale coor  sqrt T Te     for  i 0  i  lt  NSteps  i        verl_step str  coor  vel  s            Te   temperature str  kinetic_energy str  vel     verl scale coor  sqrt T Te                      kill workers  amp  clean tuple space     for  i   0  i  lt  num workers  i      out   wak  up    l  2     in  worker   Pant         best  Ah    use C Linda exit function  
139. ion file syntax is modeled after the Xlib resources of the X Window  System  However  ntsnet and TCP Linda are neither part of X nor do they require it   Some users may find a general introduction to X resources helpful  see the Bibliography  for the appropriate references     A resource is basically just an execution parameter  characteristic   The configuration file  defines values for the various available resources and specifies the contexts   application  programs and or execution nodes   for which the value applies     The ntsnet configuration files consist of lines of the following form   program     application     node   resource  value    where program is the system program name to which the resource being defined is applied   application is the relevant user application program  node is the relevant node name  resource  is the resource characteristic name  and value is the value to be set for this resource in this  context  We ll look at each of these parts in turn     To begin with  the system program for the ntsnet configuration file entries will always  refer to TCP Linda  We ll look at the exact syntax of this component in a moment  At  this point in time  this component is simply a carryover from the X syntax     A resource can be either application specific  node specific  ot apply to both applications and  nodes  For example  the resource rworkdir  which specifies the working directory on a  remote node  is application and node specific  meaning that a diffe
140. is designed to prevent  attempts to start new workers on nodes that have gone down  If the  RPC to get the load average for a node fails and fallbackload is set to  a large value  then it will be quite unlikely that the node will be chosen  by the scheduler  Setting fallbackload to a value greater than threshold    speedfactor will ensure that it is never chosen     N    n 1 if the master process is running on the node  0 otherwise        T ntsnet makes use of rstatd to gather this information  If this service is not provided on a node   default values are assumed     Linda User Guide 63    masterload The second term in the numerator of the formula for the adjusted  load means that the value of the masterload resource is added to the  raw load average if the master process is running on the node  This  enables an estimate of how the CPU resources the master will  eventually use to be included in the adjusted load average  even  though it is not currently consuming them   Set masterload to a  smaller value than its default of 1 if it does not consume significant  resources during execution  for example  if it does not itself become a  worket      N optant The number of worker processes already started on the node     64 Using TCP Linda    Table 1  Process Scheduling Resources    ntsnet Worker Process Scheduling       Resource    Meaning    processes that can be run on  any node  Includes the master  process on the local node     Default  Value    Equivalent    ntsnet Option s 
141. k    trace jit line y  out buf     Out  result   v  out but Xres         return     The worker begins by reading the command line arguments from tuple space into a local  array with the rd operation  It then calls rayshade_main to perform the necessary  initialization and setup activities  It is often just as fast to have workers each repeat the  sequential program s initialization process rather than doing it only once and then  attempting to transmit the results of that process to the workers through tuple space     However  when you choose to have the worker repeat the program initialization  it is  important not to repeat steps that must occur only once  Here  the workers call  rayshade main with a third argument of 1  ensuring that picture initialization is not  repeated     The wotker function next retrieves the global parameters from tuple space  and sets the  sampling parameters  using the same code as originally appeared in do trace     Matrix Multiplication    Each iteration of the while loop computes one scan line of the final image and places it  into tuple space for later collection by the master process  executing do trace   It  removes the    scaninfo    counter tuple from tuple space  decrements it  and puts it back so  that other processes can use it  If the counter has fallen below zero  all scan lines are  finished  the loop ends  and the worker returns  terminating its process      If the counter is non negative  the worker generates that scan line by call
142. k  d n   id    prank     data     d      d  Wn   index  datalindex      printf    data   d      din   index 10  data index 10     print     0 data   d      d n   index 20  datalindex 20     Printe  M Wy    bb       now kill off the workers       to do  1 e   chunksize   0     i 1   while  i  lt   numworkers     out  my section   0 0    iri      printf    MASTER  All Done  n        by telling them there is no more work     KOR KKK KKK KK KK kok k koe ke ke e e k worker task TK A kck ck kck ck RK      int worker int id        local variables     int index   int data ARRAYSIZI  int chunksize  1   int next index        Di       1        Get the initialized array from tuple space  rd   init array     data       loop until no more work        1       while          Index into my portion of the array that was passed in     in   my section    index    chunksize      130 Sample Programs    KE    Get the chunksize                  py    ret    THE          DESC  USE     Note                K    a   CFLAGS  array     C    Array Assignment       check to see if there is more work      if  chunksize    y  d     there is no more work so let other tasks know     return 0            Do a simple value assignment to each of my array elements       i index    while  i  lt  index chunksize     data i    i   1   Itt                 Send my results back to the master     ut   result  data   id  index  data      end of loop     utn  0    FEFE HE TE FE FE FE FE FE HE FE EEE EE HH HER HER ER HE      
143. kCkCK Ck kCK Ck kCKCk KCkCk k kc k k kc k k kc k k kc k Ck kck ck k I kk                           A           x                    include  lt stdio h gt      defineARRAYSIZE600000       main   1  int i     loop variable     floatdata ARRAYSIZE      the intial array           KOR KKK KKK KK KK kk k koe ke ke e e k initializations OKCKCKCKCkCKCk ck k   kk kk k kk kk X kk e e kx x      printf  Tyn tkrk Starting Serial Example     EXARHEREFRA QU    fElush  stdout            Initialize the array     for i 0  i lt ARRAYSIZE  i     data i    0 0      Do a simple value assignment to each of the array elements          Linda User Guide 127    for i l  i  lt  ARRAYSIZE  i       data i    i   1   printf  Sample results n     pesmttrit data 1   f n    data 1     printf  data 100   f n   data 100     print    data 1000   f n n   data 1000     rtriuwshisbdout      printf  All Done  Xn                         k k k AE FE ERE      Serial   Array Makefile                                                                          FILE  make ser array c    DESCRIPTION  See ser array c  4 USE  make  f make ser array c    k k k FE FE FE FE AE ERE    CC                                                                            array  Ser array c    s4CG  sef array c  o array    Parallel Version     KR k ko kok k kok k oko ko kok k oko k kok ARK RR RRA k ok k RA k kok RRA RRA kkk kkk kk kck ck kok ck kk ck k kk                                                               B    LINDA Example   Array
144. ker L  M  N        rd  columns    columns  len      while  1      im  task   Pay    if  L  gt   Ly    out   tssk   i    break        else  out    task   itl      vales   y BBE     for j 0  j  lt  N  j     result j  dot product  row  columns j M   M      out   result   i  result N           return     The worker next reads in the row of A that it needs  Its final loop computes each element  of the corresponding row of C by forming the dot product  When all of its entries have  been computed  the worker sends the completed row of C to tuple space and begins the  next task     Matrix Multiplication    Unfortunately  this version has a disadvantage that not only inhibits it from always  performing well  but sometimes even prevents it from running at all  It assumes that each  worker has sufficient local memory to store the entire B matrix  For large matrices  this  can be quite problematic  For such cases  we could modify the master to place each  column of B into a separate tuple  and have the workers read them in for each row of A   a column at a time  but this would add significant communications overhead and  probably yield poor performance  Instead  a more flexible program is needed where the  amount of work done in each task can be varied to fit memory limitations and other  restrictions imposed by the parallel computing environment     Here is a version which fulfills this goal     master L  M  N       Allocate memory for matrices   Read in matrices        put matrices int
145. kok I IK k kkk k kkk k kkk k kkk k kkk ck kokck k kkk I k kk k       Iterate over all timesteps   KKK I RRA RR k I kCkCk RARA RR RR RARA RA k k kCk Ck kc k Ck kck ck kok ck ckck ck I RRA ARAS    iz   0    for  it   1  it  lt   parms nts  it       update  NXPROB  NYPROB   amp u iz  0  0    amp u 1 iz  0  01    iz   1   iz     prtdat NXPROB  NYPROB   amp u iz  0  O    final dat        KR KK kok k RARA kok k RR KAR KR k kok k oko k A k oko kok k kok k kkk K KC k kok k kok kok ck I e k k      subroutine update  XKOKCKCKCKCKCkCKCk kCk Ck kCkCk kCkCKCkCkCK Ck kCk Ck kCKCk I I kCK I K KCkCk A I kk I e e kx      void update  int nx  int ny  float  ul  float  u2     int Xx  1y                                   for  ix   1  ix  lt   nx 2  ix       for  iy   1  iy  lt   ny 2  1y      Ue ti        ole nyt y   parms cx      ul  ix l j nytiy   fui  1x 1  nytiy     2 0      Ul ixtnytiy       parms cy    Kiul  ix n  tiy rl      ul ix nytiy   L   2 0    lyltix nriy  js           KOR KKK KKK kok kok k oko k A A A A k kok k kok k A k kok ck kok ck I HK kk      subroutine inidat    kok kok Ck KCk Ck kCkCk kCkCk Ck kCK Ck kCk Ck kCKCk KCkCk kok k Ko k Ck kCKCk kCkCk kCkCk A kCk Ck kc kckckck ck kok ck k kk ckck X kk kk      void inidat  int nx  int ny  float  ul  4  int qx  iy        for  ix   0  ix  lt   nx 1  ix       for  iy   0  iy  lt   ny 1  iy        iplris aytiy o   Elost   ix    nx    ix   1    iy    ny   iy   I e          Linda User Guide 151     KR k kok ok k oko k ok ko ok k k kok ok 
146. kok k Ck kCKCk KCkCk kCk Ck k kc k Ck kc k k kc k ck kck ck k kk ckck kc kk ke e kx kx          n   1  n  lt   darts  n    1      generate random numbers for x and y coordinates     r    double  random   cconst    x coord    240   B     XD    r    double  random   cconst    y coord    2 0   r    1 0        if dart lands in circle  increment score     if     sqrix coord    sgr  y coord    lt   140   score             calculate pi       pi    4 0    double  score   double  darts     return  pi      EEEH    f Serial   pi calc Makefile             FE AE TE AE AE AE FE FE FE AE AE AE FE FE FE AE AE AE FE AAA AE E FE AE E AE AE E FE AE AE AE AE AE FE AE E AE AE AE E FE AE AE AEE AE FE                                                               FILE           make ser_pi_calc c            DESCRIPTION  See ser_pi_calc c            USE   EEEH    QE    make  f make ser_pi_calc c                         FE AE TE AE AE AE FE FE FE AE AE AE FE FE FE AE AE AE FE FE FE AE AE AE E FE FE AE AE AE AE FE AE AE AE AE AE FE AE AE E AE AE E FE AE E AE AE AE FE AE FE AE AE AE FE AE E AE AE AE AE FE FE AE AEE FE FE                                                         cc    pi calc  ser pi calc c  BICOI Ser pi calc c  o pi calc       Linda User Guide 133    Parallel Version                  A ACA F F F F       LINDA pi Calculation Example   C Version       FILE  clind   pi _cale el  OTHER FILES  dboard c  make clinda pi calc c  DESCRIPTION  Linda pi calculation example program  C Version        
147. lculates the displacement of a single point in three  dimensional space  Each call to CALCDISP is independent  It takes the x  y  and  Z coordinates of a point in space  elements of the arrays X  Y  and Z  respectively   and  produces the displacement in each direction of the specified point in space as its only  output  elements of the arrays DX  DY  and DZ   Thus  performing some of those calls at  the same time would have a large effect on overall program performance     38 Using the Linda Operations    C Version    Fortran  Version    Example  Freewake    Here is the key part of the code which serves as the master process and coordinates the  calls to CALCDISP        put data into tuple space     out   wake   x  y  z  nblades  nfilmnt  nsegmnt         out   indesx   0     create task counter     for  i   0  i  lt  NWORKERS  i       start workers      eval   worker  worker       worker        then become a worker     for  index   0  index  lt   nblades   nfilmnt  index            gather data from tuple space     in  delta   index   TEmp x   tmp y  Ptmp x    Put data into the displacement arrays DX  DY  and DZ            e Put data into tuple space  out  wake   x  y  z  nblades  nfilmnt  nsegmnt     C Create task counter  start workers  then become  Ge a worker yourself   out   index   0   Do 10 I 1 NWORKERS  eval  worker   worker          10 Continue  Call worker    Do 20 index 0 nblades nfilmnt 1  in  delta   index   tmp x   tmp y   tmp z        Put data into the displace
148. lel Programming       Shared Data Space       est    Tasks                                                                             B                                                                                                                Worker reads  required data       Worker sends results    Worker Processes    Distributed Data Structures Matrix Multiplication    In a distributed data structures parallel program  workers retrieve tasks from the shared data space  complete them  and then  repeat the process until all tasks are done  Here each task is to compute a portion of the result matrix C  Each worker reads the  data it needs for its current task   here the relevant portions of A and B   from the global data space  and places its results there  when finished as well  The master process  which generated the tasks and placed them into the shared data space  also eventually  gathers the results placed there by the worker processes     Note that the tasks and workers are also independent of one another  The total work is  not split among the workers  rather  the total work is split into a number of chunks  and  each worker performs task after task until the entire job is done  In this case  it is the task  size  and not merely the number of workers  that primarily determines the granularity of  the program  and this granularity can be adjusted by varying the task size     If we look again at our matrix multiplication example  each task might consist of  computing
149. ling for this rendering operation   do  trace s main loop executes  For each scan line in the final image  it calls   trace jit line and outline  The former handles the ray tracing computations   through many subordinate routines  and outline writes the completed line to a file     This sequential program already divides its work into natural  independent units  each  scan line of the final image  The parallel version will compute many separate scan lines in  parallel  We ll look at how Rayshade was restructured to run in parallel in some detail     To begin with  a new real main routine was created  This is not always necessary   sometimes it is best just to rename the existing main to real main In this case  main  was renamed rayshade  main and real  main calls it  This was done because the  original main routine needs to be called by each worker process  as we ll see     Here is real main     real main argc  argv       for  i   Or 1   arge  cb    strepy  args  i   afravlil     cut   comm args   args iargc      return rayshade_main argc  argv  0       real_main   s tasks are vety simple  place a copy of the command line arguments into  tuple space   accomplished by copying them to a local array  which is then used by the  out operation   and then call rayshade_main with its original arguments and one  additional new one     Here is rayshade_main  the additions made for the C Linda version are in boldface      rayshade_main  argc  argv  worker       Setup    parse options  a
150. llel programming techniques  while others  discuss the example programs in greater detail or from different perspectives     About the Example Programs    2    Introduction    The chapters in this guide that describe using Linda include many code samples and  fragments as examples  All of this code is derived from real programs  but in most cases  has been shortened and simplified  usually to make it fit into the allowed space  Typically   declarations and preprocessor directives are omitted except when they are vital to  understanding the program  Also  sections of code that are not relevant to the point  being made are replaced often by a one line summary of their function  set in italics    Blank lines  without initial comment indicator  have been inserted into Fortran programs  for readability  Therefore  although the code examples are derived from real programs   they do not generally constitute  working  code  Many examples are provided in both C    Typographic Conventions    and Fortran versions  differences between the two languages are highlighted in the text  where appropriate  For examples of  working code   see the chapter  Sample Programs  on page 127     Typographic Conventions    Fixed width type is used for all code examples  whether set off in their own  paragraphs or included within the main text  For example  variable names and filenames  refetred to within the text are set in fixed width type     Boldface fixed width type is used in examples to indicate te
151. llo  world from number running on cervantes   Hello  world from number running on sappho   Hello  world from number running on blake   Hello  world from number running on virgil     Hello  world from number running on leopardi              oor     JJ    BS B    Hello  world from number running on goethe           hello world is finished        The ntsnet command initiates a Linda program on a network  ntsnet and its  configuration and options are described in detail in Using TCP Linda     Linda Operations    rd    This section describes the Linda operations we looked at in the previous chapter in more  detail  including some simple examples  Additional examples are given in the next  section     Tuples and Tuple Matching      The in operation attempts to remove a tuple from tuple space by searching for a data  tuple which matches the template specified as its argument  If no matching tuple is  found  then the operation blocks  and the process executing the in suspends until one  becomes available  If there are one or more matching tuples  then one of them is chosen  arbitrarily  The matching tuple is removed from tuple space  and each formal in the  template is set to the value of its corresponding field in the tuple     For example  this in operation removes a tuple having the string    coord    as its first field  and two other fields of the same type as the variables x and y from tuple space  it also  assigns the values in the tuple s second and third fields to x and y res
152. lly load balancing  Workers continually execute tasks as long as any of  them remain  If one worker runs on a faster processor than some others  it will finish  each task more quickly and do proportionately more of them   Of course  there are times  when reality isn t quite this simple  so we ll look at some techniques to ensure good load  balancing in Chapter 3      The Linda Model    Linda   or more precisely  the Linda model   is a general model of parallel computing  based on distributed data structures  although as we   ve noted before  it can be used to  implement message passing as well   Linda calls the shared data space tuple space  C Linda  is an implementation of the Linda model using the C programming language  and  Fortran Linda is an implementation of the Linda model using the Fortran programming  language  Processes access tuple space via a small number of operations that Linda  provides  For example  parallel programs using C Linda are written in C and incorporate  these operations as necessary to access tuple space  In this way  C Linda functions as a  coordination language  providing the tools and environment necessary to combine  distinct processes into a complete parallel program  The parallel operations in C Linda  are orthogonal to C  providing complementary capabilities necessary to parallel programs   C Linda programs make full use of standard C for computation and other non parallel  tasks  C Linda enables these sequential operations to be divided amo
153. ltiplication procedure is well known  multiplying an L by M matrix A  by an M by N matrix B yields an L by N matrix C where C  is the dot product of the 7   row of A and the    column of B  We won t bother translating this procedure into a  sequential C program     Instead  let s look at a simple  straightforward parallel matrix multiplication  Here is the  master routine     real main         read mats  rows  columns  L  M  N      for  i 0  i lt  NWORKERS  i     eval  worker   worker L  M  N             tor  i0  i  lt  Le i          Linda User Guide 77    78 Case Studies    out   row   i  rows  il M    out  columns   columns  M N       out   task      z    for  i0  i  lt  bz i     in  result   i   result i  len         The function begins by reading in the matrices  we re ignoring the details of how this  happens   Next  it starts the worker processes  places each row of A and the entire B  matrix into tuple space  and then creates the task tuple  Finally it collects the rows of C as  they are completed by the workers     The worker process begins by rding the B matrix from tuple space  After doing so  it  entets an infinite loop from which it will exit only when all tasks are completed     Each task consists of computing an entire row of C  as specified by the task tuple  The  worker process retrieves this tuple  increments its value  and places it back into tuple  space  checking to make sure it 1s not already greater than the maximum number of rows  to be computed     wor
154. me  and user defined    Linda User Guide 119    resources  The default is  nodef ile  plus the local node name  The  key word  nodefile refers to the nodefile resource value  which is a  file containing a list of node names  User defined resources provides a  way to specify a list of node names symbolically  The user defined  resource must be preceded with the indirection symbol  The maximum  number of indirections is 16     redirect Specifies whether or not tuple redirection optimization is used  The  default is true     rexecdir Specifies the directory on a remote node where the TCP Linda  executable resides  Or if distributing  it also specifies the directory on  the remote node where the Linda executable shall be distributed to prior  to execution  This resource is node and application specific  The default  is the key word Parallel  The Parallel keyword indicates that  ntsnet should use the map file to translate the name of the local  executable directory for that remote node     rworkdir Specifies the remote node s working directory  This resource is node and  application specific  The default is the key word Parallel  The  Parallel keyword indicates that ntsnet should use the map file to  translate the name of the local working directory for that remote node     speedfactor Specifies the relative aggregate CPU capability of a particular node  The  larger the relative speedfactor  the more capable that particular node is  of running multiple workers  This resource is no
155. ment arrays DX  DY  and DZ   20 Continue    The first out operation places the position arrays x  y  and z into tuple space in the   wake  tuple  later  the workers will each rd it  Then  the master process creates the     index    tuple  from which the workers will generate tasks  In the first for loop  the master  process next starts NWORKERS worker processes        At this point  the master process has completed all necessary setup work  so it becomes a  worker itself by calling the same worker function used in the eval operations  This is a  common technique when few startup activities are required  and worker processes run  for a long time  If the master did not become a worker  then its process would remain  idle until the workers finished     After the workers finish  the master process executes the final for loop  which gathers the  results produced by all the workers  removing them from tuple space and placing them in    the locations expected by the original Fortran program     Here is a simplified version of the worker routine     Linda User Guide 39    C Version       worker       rd  wake    x   y   z   nblades   nfilmnt   nsegmnt    while  1       loop until work is done     in   index    index      get current task index     out   index   index 1       increment and put back     if  index  gt   nblades   nfilmnt      test if done     break   iblade   index   nfilmnt     blade     ifil   index   nfilmnt     filament     for  iseg 0  iseg lt n segment       Fortr
156. mics Simulation on a Network of Workstations Using a Machine Independent Parallel  Programming Language     SCIENTIFIC Technical Report INZ   Discusses the Molecular Dynamics code used as an example in Chapter 3 of this manual     Mike Loukides  UNIX for FORTRAN Programmers  Sebastopol  CA  O Reilly  amp   Associates  1990   Chapter 5 discusses the use of UNIX debugging utilities  including dbx  Chapter 8  provides a good overview of standard UNIX profiling utilities     Tim O Reilly et  al  X Window System User s Manual  Volume 3 of The X Window System  series  Sebastopol  CA  O   Reilly  amp  Associates  1988 1992   Chapter 10 of the standard edition and Chapter 9 of the Motif edition discuss the theory   philosophy of X resources  which has been used in the design of the Network Linda  configuration files     Linda User Guide  B i    B ii Bibliography    Symbols    notation 29      notation 12    A  actual 12  adjustability 81  aggregates  viewing via Tuplescope 98  alternate operation names 26    linda eval 26    linda in 26    linda inp 26    linda out 26    linda rd 26    linda rdp 26  anonymous formal 36  anti tuple 12  application  specifying to ntsnet 48  application specific configuration  file 45  array  copying 29  fixed length 36  length 28  arrays 28  36  colon notation 29  35  Fortran 90 syntax 33  Fortran vs  C 41  multidimensional 32  of structures  C  34    available resource 65  116    B   bcast resource 117  bcastcache resource 117  blank common 33  blocki
157. mum and maximum times to wait for  nodes to join the execution group  If maxwait is omitted  it is set to the  same value as minwait  Both default to 30 seconds  Execution will  commence once the execution group is set  based on the values of the  minwait  maxwait  minworkers  and maxworkers resources  see the  discussion of these resources in the next section for details      This option specifies the load that a worker will put on a node  The  value specified can be any real number  gt   0  The default is 1  Typically  1 or some smaller fraction is used  A larger value could be used to  increase the chances of having one Linda process running on each node      workerwait seconds    This option specifies the time  in seconds  that a worker waits for a  response to its join message  from the master  The default is 90  If a  worker does not get a response within the specified time  telling the    Linda User Guide 115    worker that it   s joined  the worker will exit  and therefore not participate  in the application execution     ntsnet Configuration File Format    This section serves as a reference for the format of both the user  local  ntsnet  configuration file      tsnet   config  and the global ntsnet configuration file  1ib   tsnet config relative to the Linda tree      When setting Boolean resources in the configuration files  values can be specified as true  ot false  yes or no  on or off  or as 1 or 0     Resource Definition Syntax    Resources    program  appl   no
158. n a network     Linda User Guide 7       Process i 1    send                               Bj 1                   Message Passing Matrix Multiplication    receive    Process i                                              receive    Bj    send    Process i 1                                              send    B j 1    receive    In message passing parallel programs  there is no global data  All data is always held by some process and must be  explicitly sent to other processes that need it  Here each process holds one row of A and computes one element of C while  it has the corresponding column of B  The columns of B are passed from process to process to complete the computation     Distributed Data Structures    Distributed data structure programs are a second approach to parallel programming  This  method decouples the data required for the calculation and the distinct  simultaneously executing processes which each perform a part of it  making them  autonomous  Distributed data structure programs use a shared data space  with the  individual processes reading data from it and placing results into it  The data structure is  distributed in the sense that different parts of it can reside in different processors  but it  looks like one single global memory space to the component processes  In this sense  a  distributed data structure might be termed virtual shared memory  VSM   Although  Linda can be used to implement both types of parallel programs  the most natural one for  u
159. ndom void      double dboard int darts       double x coord     x coordinate  between  1 and 1     y coord     y coordinate  between  1 and 1     pi     pi     pr    random number between 0 and 1       int score        number of darts that hit circle       Linda User Guide 135    n   unsigned long cconst     used to convert integer random number        between 0 and 2 31 to double random number        between 0 and 1       cconst   2  lt  lt   31   1    score   0      KOR KK kok k oko k oko ko k kok k kok kok ok RARA k oko k kok RR KARA RRA RAR kok K kok I I I kkk k            tt  5  F    DI    Throw darts at board  Done by generating random numbers   between 0 and 1 and converting them to values for x and y  coordinates and then testing to see if they  land  in   the circle   If so  score is incremented  After throwing the  specified number of darts  pi is calculated  The computed value  of pi is returned as the value of this function  dboard     Note  the seed value for rand   is set in pi_calc   kok kok k A kCkCk kCkCK Ck kCk Ck kCKCk kCkCk kCkCk Ck kCK Ck kCkCk KCkCk kCkCk kCk Ck Ck kc k Ck kc k ck kck ck kck I I I ke e kx f          for  n   1  n  lt   darts  n          generate random numbers for x and y coordinates     r    double  random   cconst   x coord    2 0 om    1 03  r    double  random   cconst   y coord    2 0   Y    1 05       if dart lands in circle  increment score     if   sqr   x coord      ar y coord    lt   1 0   score             calculate pi     pi 
160. ne left and right neighbors     if  id    nwrkers    right ds    else  right   id   1        if  id    1    left   nwrkers   else   left   id   1           Update the line     update  left  right tpoints nsteps  npoints  first id         Output the results       Concurrent Wave Equation    npoints         the first value   0   was only used for passing data to  neigbors so reset array to send back only values we are interested in          ret val   values l           out   results   id first  ret val npoints       fe    Initialize points on line  des       int n  t Line  int tpoints  int  points  int first        local variables     int X  Je Ki  double x  fac              Calculate initial values based on sine curve     fac   2 0   PI    k   first 1    for  j l  J  lt   npoints  j    ktt  4   x    double k   double   tpoints   1     values j    sin  fac   x         for  i l  i lt   npoints  i       oldval  i    values i      Linda User Guide 147         Update all values along line a specified number of times          d       int update int left  int right  int tpoints  int nsteps  int npoints   ine first  int Ta  d     local variables     double dtime  c  dx  tau  sqtau   int iteration  1  j                 Update values along the wave for nstep time steps     dtime   0 3     o   1 0   dx   1 0   tau   le   dtime Y dx      Sqtau   tau   tau        Update values for each point along string                   for  i l  i lt   nsteps  i       iteration   i      Exchange data wi
161. ned the execution group        When the minwait interval has elapsed  if at least minworkers have joined the  execution group  execution will start       Otherwise  ntsnet will continue trying to create workers until the maxwait interval  has elapsed  which includes the time already spent in minwait   As soon as  minworkers workers have started  execution will immediately commence       Once maxwait seconds have passed and the execution group is still smaller than  minworkers  the startup process will fail  and execution will not proceed     The default value for both maxwait and minwait is 30 seconds  The values for maxwait  and minwait may also be specified using the  wait command line option  Its syntax is      wait minwait   maxwait   For example   w 30 60 would set minwait to 30 seconds and maxwait to a total of 60  seconds  If  wait is given only a single value as its parameter  then that value is used for    both resources     Similarly  the values for maxworkers and minworkers may also be specified using the  n  command line option  Its syntax is      n minworkers     maxworkers      ntsnet Worker Process Scheduling    For example   n 2 4 would set minworkers to 2 and maxworkers to 4  If  n is given  only a single value as its parameter  then that value is used for both maxworkers and  minworkers     Once ntsnet has attempted to start a process on a node  it waits for a message from the  worker indicating that it has joined the execution group  Only after the master r
162. ng 13  25    C  cds 95  character strings  C 35     cl file extension 19  cle command 19   c option 110   D option 110   g option 110   help option 110   I option 110   L option 110   loption 110   linda compile args option 110   linda info option 110   linda link args option 110   linda t scope option 110   linda tuple scope option 95   110   o option 20  110  passing switches to native  compiler 110  specifying executable 110  syntax 107   v option 110   w option 110  cleanup resource 59  117  C Linda operations 11  Code Development System 95  colon notation 29  35  common blocks 33  compiling 19  110  composite index 41  computational fluid dynamics 38  configuration files  disabling global 67  formats 116  121  map translation 53    ntsnet 45    Index    D  data tuple 11  database searching 82  dbx  Tuplescope and 100  debug resource 69  112  117   119   121  debugger  use with Tuplescope 100  DEBUGGER environment variable  122  debugger resource 69  112  117  delay resource 67  117  directory  Linda 46  discarding tuples 36  distribute resource 59   60  117  distributed data structures 8  benefits 10  load balancing in 10  distributed master 89    E  environment variables 51  95  122  eval 11  22  105  compared to out 22  completion 12  creating processes 23  creating workers 11  data types in 24  field evaluation 23  forcing to specific node 68  function restrictions 24  retrieving resulting data tuples  19  eval server 62  106  examples  comments in 73  counter tuple 1
163. ng itself  a recommended Linda    18 Using the Linda Operations    Quick Start  Hello  world    programming practice  Each in operation removes one  done  tuple from tuple space   and it will block if none is currently present and wait until some worker finishes and  sends its tuple there     This same effect could have been achieved by means of a counter tuple which each  worker process increments as it finishes  In this case  real main would create and  initialize the counter     C Version Fortran Version  out  counter   0   out      counter     0     and each worker would update it as its last action     C Version Fortran Version  in  oco  unter   215 in  counter    J   out  counter   j l   out counter   J l     These statements remove the counter from tuple space  assign the current value of its  second field   the number of processes that have finished so far   to the variable j  and  then increment it and place the tuple back into tuple space  Note that only one process  can access the counter tuple at a time  and so some processes may have to wait for others  to finish before they can terminate  In this case  the waiting time is minuscule  so for this  program  the concern is a non issue  However  in general it is best to avoid building  unnecessary dependencies among processes into a program     With a counter tuple  the final loop in real_main would be replaced by the statement     C Version Fortran Version    in  counter   nworker   in  counter   NProc     real_main
164. ng the available  processors  Since C Linda is implemented as a precompiler  C Linda programs are  essentially independent of the particular  native  C compiler used for final compilation  and linking  Fortran Linda operates in an analogous manner     Linda programmers don   t need to worry about how tuple space is set up  where it is  physically located  or how data moves between it and running processes  all of this is  managed by the Linda software system  Because of this  Linda is logically independent of  system architecture  and Linda programs are portable across different architectures   whether they   re shared memory computers  distributed memory computers  or networks  of workstations     Data moves to and from tuple space as tuples  Tuples are the data structures of tuple  space  A tuple is a sequence of up to 16 typed fields  it is represented by a  comma separated list of items enclosed in parentheses  Here is a simple example     C Form Fortran Form    Simple   1    simple   1        Tt Pronounced    two pull    with the emphasis on the first syllable     10 Overview of Parallel Programming and the Linda Model    The Linda Model  This tuple has two fields  the first is a character string  and the second is an integer  and in  this case  both of them contain literal values  Variables may also be used in tuples     C Form Fortran Form    easy   i    easy   i     This tuple also has a string as its first field  and a second field of whatever type the  vatiable i is
165. nore IO is also often useful when debugging TCP Linda programs   This is not necessary for gdb     Debugging TCP Linda Programs    Running TCP Linda Programs Without ntsnet    Master  session    Worker  sessions    TCP Linda programs can also be executed manually by invoking ntsnet with the   launchByHand option  These are the steps for doing so     hd    Establish a session on each desired remote node  via rlogin for example   It will  be most convenient to start each discrete session in a separate window  In general   start as many processes as you would when executing the program with ntsnet  If    you plan to use a debugger  start it on the desired remote nodes    Run ntsnet  launchByHand     in a separate session     In the session where you will start the master process  make sure that the  LINDA_PATH environment variable is set correctly  It should point to the  top level Linda installation directory  and the directory specification must include  a final slash     Begin program execution using the commands output by ntsnet  cut and paste  will reduce errors   one per remote session  These have the general form     application  appl arguments   LARGS  linda args     node name application  LARGS   linda args     where application is the program command  app  arguments are the application   s  arguments  if needed   linda args are any TCP Linda run time kernel options  and    node name is the name of a remote node     Keep in mind that each process is handling a portion o
166. ntsnet Command    The ntsnet Command    Suppress warning messages     Suppress warning messages about text beyond column 72  text is still  ignored  flc only      ntsnet  options  executable  arguments     Syntax   Parameters  options  executable  arguments    One or more command line options  listed below   Command line  options override configuration file settings     Executable file to execute     Arguments to the executable program     Options Syntax Convention    When setting boolean resources on the command line  ntsnet uses the convention that an  option name preceded by a minus sign sets the corresponding resource to true  and one  preceded by a plus sign sets the corresponding resource to false     Command Options     appl name     bcast     bcast     bcastcache s     cleanup     cleanup    This option causes ntsnet to use name as the application name for the  purposes of querying the configuration file database  Normally  ntsnet  uses the executable name  as typed on the ntsnet command line  as the  application name in the configuration file database  This can be useful if  several different executables use the same configuration parameters   Note that  appl has no corresponding resource parameter in the  configuration file     This option enables the tuple broadcast optimization     This option disables the tuple broadcast optimization  This is the  default     This option specifies the size of the broadcast cache in bytes  This size is  a trade off between memory 
167. o tuple space     for  i   0  i  lt  L  i    clump           rows    M clump  columns    M clump     out  1   rows   rows  M clump     cut  i   oolumns   columns  M clump          start workers and make first task     for  i 0  i  lt  workers    i   eval   worker   worker L  M  N  clump       out  task   U     for  i   0  i  lt  L  i    clump  result    M clump   in  result matrix   i    result           The master begins by allocating memory for and reading in the matrices  Next  it places    the matrices into tuple space in chunks  each of which is clump rows or columns  of its  matrix  clump functions as a granularity knob in this program  a parameter whose value  determines the task size  at least in part  Changing clump   s value directly affects how  large each chunk is     The master then starts the workers and creates the task tuple as before  Its final loop  retrieves the result matrix  C  from tuple space  again in chunks of clump rows  Code to  reassemble C from these chunks is omitted          Both versions of this program assume that A is stored by row in the rows array  and B is stored  by column in the columns array  Such a strategy makes accessing the proper elements of each one  much easier     Linda User Guide 79    80 Case Studies    Here is the corresponding worker function     worker L  M  N  clump      Allocate memory for matrix chunks   while  1     in  task     rows index    if  rows_index  lt  L   out   task   rows index   clump    else    cGut  task 
168. o use ones with the extension  i586 on others           Using this mechanism  it   s even possible to support accidental homogeneity   in which different  machine architectures support the same data layout  but obviously  great care must be taken here     Linda User Guide 59    60 Using TCP Linda    These suffixes are used when the suffix resource  application specific  is true  the default  value   Which suffix to use for a given node is specified by the suffixstring resource   application and node specific   The default suffixstring value is the null string  so even  though suffixes are used by default  this fact has no effect until some specific suffixes are  defined using suffixstring     Here is a section of a ntsnet configuration file illustrating a common use of these  features     Tsnet Appl Node rexecdir  parallel       Tsnet test24 nodelist  chaucer moliere sappho    Tsnet test24 suffix  true    Tsnet  Appl chaucer suffixstring   athalon    Tsnet  Appl moliere suffixstring   pIII       Tsnet Appl sappho suffixstring   xeon    Tsnet test24 sappho suffixstring   xOptBlasLib       Tsnet Appl aurora suffixstring   duror    These entries would result in the following executables being used for these nodes when  running the application test24  located in whatever directory resulted from map  translation         chaucer test24 athalon  moliere test24 pIII   sappho test24 xOptBlasLib  aurora  local  test24 duron    If the distribute resource for the application test 24 is t
169. of  outstanding tasks reaches its minimum allowed value  the do loop ends  the master  begins to make new tasks  and the process begins again     After all needed tasks have been created  the master still needs to gather any remaining  results from tuple space  which is the purpose of the second while loop     UPPER_BOUND and LOWER_BOUND allow this program to adapt to its environment   Their values can be adjusted based on the size of tuple space  on the sizes of individual  database records of the database as a whole  on the relative speeds of the  get_next_record  compare  and process functions  and so on           It can be as important to make sure that there are enough tasks in tuple space as to make  sure there aren   t too many  When there aren   t enough tasks to keep all the workers busy   then work starvation sets in  and performance diminishes  Thus  if LOWER_BOUND were too  low  there might be periods where workers would actually have to wait for their next task   a condition which is virtually never desirable        Load balancing is another consideration that often comes into play  This program will  perform fine if all of the comparisons take about the same amount of time  as would be  the case when comparing fingerprints  However  there are many cases where different  comparison operations take vastly different amounts of time   comparing DNA  sequences  for example  In such cases  the program must ensure that the more time  consuming comparisons do not become
170. of all of these factors on the image  also taking into account reflections and  shadows     This case study illustrates the following techniques       Dividing the main task among the workers      Adapting the sequential program structure to the master worker paradigm     Rayshade   s main routine is shown below   As mentioned in the Introduction  we   ve  removed some sections of code and replaced them with descriptive comments  and we   ve  ignored others  like declarations  altogether      main argc  argv       Setup    parse_options argc  argv     read_input_file      Initialization    startpic       start new picture      More setup    raytrace        After some initial setup  Rayshade processes the command line options  validating them  for correctness and determining which options have been selected  It then reads in the  image and then performs some additional initialization steps     Linda User Guide 73    74 Case Studies    Next  Rayshade calls the function startpic  which logically creates a new picture   folowed by some additional setup activities  Finally it calls rayt race which initiates the  actual rendering     The bulk of the work is handled by the routine do trace  called by raytrace     do trace     called by raytrace        Set sampling parameters      Trace each scanline  writing results to output file      for  y   StartLine  y  gt   0  y       trace jit line y  out buf    outline out buf         After testing and setting parameters controlling the samp
171. of this code differ because of the way             define NRA 62    number of rows in matrix A      define NCA 15   number of columns in matrix A        define NCB 7    number of columns in matrix B     main   1  int iz jp kit mise  7    double a NRA  NCA      matrix A to be multiplied     b NCA   NCB      matrix B to be multiplied     C NRA   NCB     result matrix C          Initialize A  B  and C matrices     for  i150  i lt NRA  i     for  j 0  j  NCA  j     eLxpp3le  itj     for  i0  i  NCA  i     for  j 0  j lt NCB  j     bli  lil  3 3     for  i 0 i lt NRA  i     for  j 0  j lt NCB  j     celta   0 0          Perform matrix multiply     for  i 0 i lt NRA  i     for  j 0 j lt NCB  j      for  k 0 k lt NCA k     cltlldits alil k    SIRI TI             Okay  it s a trivial program     printf   Here is the result matrix n     for  i 0  i lt NRA  i     printe  ya     for  j 0  j lt NCB  j     printf   56 25 Te od jd qs       Linda User Guide 137       printi   ass          HEE HEHE EEE EEE EEE EEE EEE EEE EEE EE HEHEHE EE HEE HEE HEE HE HE HE EE E E HEE HE E E E HEE E E E HH HE E    Serial   Matrix Multiply Makefile     FILE  make ser_mm c     DESCRIPTION  See ser mm c     USE  make  f make ser mm c    k k k k k AE aE aE A aE AE HE aE aE aE aE Ha a AE AE FE aE aaa aE a a aE E aE HE aaa AE aa aaa Ea aE E                                                                                                                                        E   ec    mm  ser mm c    CC  ser mm c 
172. of this wildcard character is  supported  Here is an example entry using the ampersand wildcatd    mapto  net  amp       sappho         blake t 73    These entries map the root directory to  net   hostname for the nodes sappho and blake   Thus  the local directory  tests new on sappho would be mapped to the generic  directory  net  sappho tests new by this entry    The ampersand character can also be used within the actual translation specifications     map working           home test work 8     How ntsnet Finds Executables    This example equivalences the directories  home test  work  hostname for all of the  nodes within the network  Here is a more complicated example     map  net  amp       O fp    This entry maps the local root directory on all nodes to the generic directory  net    hostname  where hostname is replaced by the name of the local node   It also translates  directories of that form back to their local equivalents when performing translation on  remote directories  preventing unnecessaty automounting     When wildcarded entries produce multiple matches for a given translation  the longest  matching string is used  When there are two matching strings of equal length  then the  more specific match  1 e   containing fewer wildcards  is chosen     Distributing Executables    ntsnet can distribute executable files prior to program execution  This is controlled by  setting the distribute resource to true  application specific   When different executables  are requir
173. of workers between 1 and  d n    MAXWORKERS    scanf    d    amp numworkers      138 Sample Programs    Matrix Multiply     BRR KKK kok k kok k kok k Koko ke ke ke ke KARA master Ltask     KKK KK KR KK KK ke ke ke e RARAS    ye Initialize A and B       for  i 0  i  lt  NRA  i       for  3 0  j  lt  NCA  j       alili  si   ji            for  i 0  i  lt  NCA  i       for  3 0  j  lt  NCB  j     Bl ji si   jz       put out the arrays for the tasks to read       out  arrav b   bi  out  array a     al        Put out work for them to do   ten columns at a time       work count 0    nbr rows   10    offset   0    extra   NRA   nbr rows    if  extra  gt  0  1  out   offset   offset    out   array data   offset  extra    offset   offset   extra   work count  s     else    out   offset   offset         while  offset  lt  NRA     out   array_data   offset  nbr_rows     offset   offset   nbr_rows   work count             Start up worker tasks        id   1    while  id  lt   numworkers     printf   Starting task td Xn   id    eval   worker   worker      id t            Receive results from worker tasks       Linda User Guide 139    for  k 1 k lt  work_count k       in   results    offset   nbr_rows   c       put results into results array     for  i offset  i lt nbr_rowstoffset  i       for  j 0  j  NCB  j    1  results i  j    clill j         Print results       for  i 0 i lt NRA  i        for  j 0  3 lt NCB  j    1   printf     2f    results r  3l3      printr   An            KR KKK K
174. ok kok ok kok k ok k oko k oko oko k oko k kok k kok Koko k kok k kok ck kokck kck ck Koko kkk      subroutine prtdat    XKOKCKCKCKCKCkCK Ck kCkCk kCkCk kCkCK Ck kCK Ck kc k k kCKCk kCkCk kCkCk kok k Ck kCKCk kok k kc k k kc k k kc k Ck kc k ck kok ck kok ck ckck ck ckok sk ke kk                                                                     void prtdat  int nx  int ny  float  ul  char  fnam     int ix  iy  FILE  fp   fp   fopen fnam   w     for  iy   ny 1  iy  gt   0  iy    4  for  ix     r ix  lt   nx l  ixit  4  fprintf  fp    8 3t     ultix nytiy     if  ix    nx 1   fprinbrt tg  T  ya  else  fprintfi fg   Aevi        fclose fp       HEHEHE EEE EEE EEE EEE EEE EEE EEE EEE TEEGATE HEE HEE HEE HE HE HE EE ERE E ERE E HE    Serial   heat2D Makefile      FI    DESCRIP     USI mak    T   ES             oF       TION           p              make ser heat2D c  See ser heat2D c       f make s    P     r heat2D c          EAE AE AE AE AE TEE aE AE AE AE FE EE ER                                                         HAA AAT AE k aT aE FE AE AE AE AE aE aE HEE aE AE E aE aE E AE AE aE a aE AE AE aE aE a aE aE a aE Ea aE               u ec    heat2D  ser_heat2D c  S CC  ser_heat2D c    Parallel Version     o heat2D    P  e ok k kok k oko k oko k kok kk kkk k kkk k k k k k k k k k k kk k k k k kk k k kCkCk Ck kc k Ck kCk Ck kCk ck k k kkk k kkk k k k k k                      be high in the midd   boundaries are held  time stepping     LINDA  FILE  clinda heat2d   OTHER FILES  
175. ompiler to use for compiling      files  defaults to f77 in most cases   and to xIf under AIX      LINDA FORTRAN LINK    LINDA PATH    122 Linda Usage and Syntax Summary    Used by the linda fortran link shell script  specifies the  command to use for linking the executable  same defaults as for  LINDA FORTRAN      Specifies the path to the Linda installation directory  The directory  specification must contain a terminal slash     LINDA PROCS    POSTOPP QC    Tuplescope Reference    Used by the Linda Code Development System as the return value  for the Iprocs and flprocs support functions  under CDS  Iprocs is  not truly meaningful and is provided only for compatibility with  TCP Linda      Used by the postcpp  cc shell script  specifies the C compiler to  use for compiling  c1 files  defaults to cc      POSTFL FORTRAN Used by the postfl fortran shell script  specifies the Fortran       TSNET PATH    Tuplescope Reference    Menu Buttons  Modes    Aggregates    Run  Break  Continue  Debug    Save    Quit    The Modes Menu  Single Step    compiler to use for compiling   f files generated by Fortran Linda  from     1 source files  same defaults as for LINDA FORTRAN      Used by ntsnet  specifies its search path for local executables  Its  value is a colon separated list of directories     Set debugging modes on or off    Specify format for aggregates displays  active when Display Aggregates  is on   Available formats are  Long  Short  Float  Double  Character  and  Hexadecimal 
176. ons  the following steps are necessary to create and run a TCP  Linda program       Make sure that the bin subdirectory of the Linda distribution tree is in the search  path       Define the set of hosts to be used for program execution by creating the file     tsnet config in your home directory  containing a single line like this one     Tsnet  Appl nodelist  moliere sappho blake shelley    Replace the sample node names with the appropriate ones for your network        T TCP Linda is also available for certain distributed memory parallel computers  However  the  discussion here centers on networks  although identical considerations apply in both cases  This  discussion applies only to TCP Linda version 2 4 7 or higher     Linda User Guide 43    What ntsnet Does    44 Using TCP Linda    Compile the program  using a command like the following     Cle  6 hello world hello world sel    See the    Quick Start  section of Chapter 2 for a more detailed discussion of this  and the following step     Finally  run the program  preceding the normal invocation command and  arguments with ntsnet       ntsnet hello world 4    ntsnet will automatically run the program on the defined set of nodes     ntsnet is responsible for the following tasks        9                Parsing the command line options and configuration file entries    Querying remote systems for load averages    Locating local executable files    Determining what set of nodes to run on  based on its scheduling algorithm   
177. ose minimum     medium  or maximum window size     Mouse Button Effect   Left Make the window minimum size  Middle Make the window medium size  Right Make the window maximum size    A textual representation of the tuple class  positioned to the right of the sizing  box  This shows the tuple class s structure  Here is an example       gl  bals   INT  INT     Clicking with the left mouse button on the tuple representation string will close  the tuple class window     Linda User Guide 97      Spherical icons for each tuple that exists in this class  Left clicking on a tuple  opens a small window that displays its contents  For example       globals  1000 225   If a tuple contains a large amount of data  scroll bars will appear  and you can    scroll through it  viewing one part of it at a time  You can also scroll with the  keyboard arrow keys     Key Scrolling Effect   Up arrow Scroll to previous line   Down atrow Scroll to next line   Ctrl up arrow Scroll to previous page   Ctrl down arrow Scroll to next page   Left arrow Move to the beginning of the tuple  Right arrow Move to the end of the tuple    If the tuple contains an aggregate  by default the aggregates s contents is not  shown  instead  the word BLOCK appears  The Modes and Aggregates menus  control the display of aggregates  see Viewing Aggregates below         Clicking on an individual tuple window with the right mouse button will refresh  its contents  clicking on it with the left mouse button will close it       I
178. other provisions have been made   using the current directory as  the default directory in each case  The  p option is no longer necessary in such cases     Permissions and Security Issues    ntsnet assumes that all remote systems and directories are accessible  By default  it uses  the local username for running remote processes  The user resource  node specific  can  be used to specify an alternate username on a remote system  For example  the following  configuration file entry tells ntsnet to use the username guest when running process on    node sappbo     Tsnet sappho user  guest    There is no provision for specifying passwords for use on remote systems  The standard  TCP IP mechanisms   the  etc hosts equiv file and individual user   rhosts  files   should be used to assure access     ntsnet Worker Process Scheduling    The TCP Linda System provides considerable control over how processes are scheduled  on temote nodes  ntsnet uses the node list resources we   ve already looked at to determine  the list of nodes where the application may potentially run  Whether a node is actually  used depends on a number of other factors  which we ll examine in turn     Forming The Execution Group    The list of potential nodes on which an application may run is determined by the nodelist  resource  the set of nodes on which an application actually runs is called the execution  group  ntsnet begins with the node set and successively starts remote processes using the  scheduling rule
179. ou are using TCP Linda  Chapter 3  Using TCP Linda  describes the special features  and considerations of the Linda implementation for networks of UNIX workstations   The first section provides an introduction and    quick start    for new users of TCP Linda   The remainder of the chapter describes the general features of TCP Linda     To see examples of using Linda with the C and Fortran programming languages  Chapter  4  Case Studies  presents several extended program examples illustrating the process of  transforming a sequential program into a parallel program with Linda     Chapter 5  Using Tuplescope  describes the Tuplescope visualization and debugging tool  and the Linda Code Development System  CDS  and how to use them for high level  debugging of Linda programs running on a single node           Chapter 6  Linda Usage and Syntax Summary  provides a programmer s reference to the  features of the Linda programming environment  including both C Linda and  Fortran Linda language constructs and the elements of the Linda Toolkit     Chapter 7  Sample Programs  contains five sets of example programs that demonstrate  how to use Linda to parallelize serial programs  Each example includes the serial version  of a program written in C and a parallel version created with Linda  The examples are  presented in order of increasing complexity        The Bibliography lists books and articles that may be of interest to Linda users  Some  items provide more advanced treatment of para
180. ource 52  120  parallel value 53  58    S  sample programs 127   152  2D heat eguation 150  array assignment 127  concurrent wave equation 142  matrix multiply 137  pi calculation 132  scalars 28  secutity 61  shape of arrays 32  shell scripts  location 118  speedfactor resource 65  120  start timer 106  status messages 67  stride 33  strings 35  structures  arrays of 34  C 33  varying length 35  suffix resource 60  120  suffixstring resource 60  120  support functions 106  Syntax  formal C Linda 105    syntax requirements 18    T  task  adjusting size of 80  compared to worker 9   TCP Linda 43   68  appropriate granularity for 68  file permission requirements 61  process scheduling 61  running programs 21  searching for executables 51  secutity 61  shell scripts 118  specifying the working directory  61  tuple space size 22   TDL 101  template 12  matching rules 26  37  termination 37  threshold resource 65   66  120  throwing away data 36  timer split 106  timing functions 106  translate resource 57  120  tsnet command 50  61   tsnet config file 46  tsnet config application_name file  45  tsnet map file 46  53  TSNET_PATH environment  variable 46  51  123  tuple  allowed field types 26  arrays in 28  36  arrays of structures in 34  character strings in 35  classes 96  colon notation 29  common blocks in 33  counter 19  data 11  data types in evals 24  definition 10    discarding 36  formal to actual assignment 12  Fortran 90 array syntax 33  ignoring fields in 36  length of
181. out   varying struct    s5s          bytes   sizeof  STYPE     sizeof int     buf len l        current structure length     out   varying struct    sibytes      The first out operation creates a tuple with a varying structure as its second field  The  second out operation creates a tuple whose second field is also a varying structure  for  this instance  the current length is set to the size in bytes of STYPE  including the first  element of array buf  plus the size in bytes of the remainder of array buf  the product of  the size of one element and its number of elements minus 1         C Character Strings    In keeping with C usage  character strings are simply arrays of characters  and the normal  array considerations and matching rules apply  The only exception occurs when  comparing string constants with character arrays     The length of a string constant is the number of characters in it plus one  for the null  terminator  Thus  the string    hello    has a length of 6  and a five element character array    will not match it  it requites a six element array     char s 6    int len     out   string    hello      length   6     in  string   38      Linda User Guide 35    The array colon notation may also be used with strings for fields where the length of the  string is variable     out  sterimg    Litas  in  string   ra  len  7    Note that the literal string in the out operation needed to include the colon in order for  the template in the in operation to match since th
182. oving it from existing  node sets        true       node specific             workerload    The third term of the numerator adjusts the load average for the    number of workers the node already has running  even though they    are not yet consuming substantial CPU resources  You can alter this    value from its default of 1 to reflect your estimate of how much load a    single worker process adds     Linda User Guide 65    66 Using TCP Linda    speedfactor The speedfactor resource s value is used to normalize the corrected  load averages for differing CPU speeds   and hence different total  capacities   on different nodes  Generally  a value of 1 is given to the  slowest system type within the network  Faster systems will then be  assigned correspondingly higher speedfactor values  Multiprocessor  systems ate also given higher speedfactor values  generally set to the  value appropriate for a single CPU multiplied by the number of  processors     Once all the adjusted loads are calculated  ntsnet finds the node having the lowest value   which presumably will be the most lightly loaded when execution begins  It then checks  its adjusted load against the value of its threshold resource  If the adjusted load doesn   t  exceed it  ntsnet next checks the number of processes already started  If this number is  less than the value of the maxprocspernode resource  another process is started   otherwise  ntsnet continues on to the next least heavily loaded node  This scheduling  proce
183. ow ntsnet Finds Executables       local generic remote     home chavez work    mabig a mapfrom  flaubert   u home chavez work              aurora   home chavez work         gt  blake   home chavez work  m gt  chaucer   u chavez work  m  degas   tmp             erasmus   mf chavez work    I  flaubert   u home chavez work        gt  gogol   net aurora home chavez work             This particular rule facilitates generic to remote directory translation  enabling remote  processes to be started on degas from any other node  using  tmp as the working  directory  Note that the generic directory we ve specified is the one to which her home  directory will be translated from any other node  This rule will work only for user chavez   other users would need to construct their own versions     This rule will not allow chavez to run ntsnet on degas  however  since it uses mapfrom  If it  were necessaty to be able to use degas as the local node  then the rule should be written as  a map  The preceding diagram illustrates the map translation process using all three rules  for our sample ntsnet command  executed on flauber       We ve noted that generic directories need not exist as real directory locations  Here is a  map file entry which takes advantage of this fact     map special      chaucer  u guest linda    blake  home guest1 app1 linda   sappho  usr guests linda bin   aurora  public bin    zola fuses local bin     special is not a real directory on any node  this entry defines the s
184. pecified directories  on the nodes listed as equivalent     Map translation can be disabled by setting the translate resource  application specific  to  false  it is true by default   or by using the corresponding command line options      translate  translate   Remember also that map translation is used only for determining  executable file locations on nodes for which the setting of rexecdir is parallel     Linda User Guide 57    Similarly  the remote working directory is determined by map translation of the local   current  working directory only for nodes where the rworkdir resource is set to  parallel  also its default value   In particular  map translation does nothing to  filenames internal to the application     Map Translation Entry Wildcards    58 Using TCP Linda    There are two wildcard characters allowed in map translation file entries       An asterisk     can be used as a wildcard in local and remote node names         An ampersand   amp   can be used to substitute the current node name within a  translation     For example  the following mapto entry handles the most common version of  automounted directories     mapto  net         gt   tmp mnt net     It specifies the same remote directory for every remote node for the generic directory    net  The asterisk wildcard character can be used as above to represent the full node  name  It can also be used as the initial component of a node name to specify wildcarded  subdomains      medusa   com  No other placement 
185. pectively     C Version Fortran Version    in  coord   2x  y  inm  eoord   2x  Py     If no matching tuple exists in tuple space  then the operation will block  The following  tuple searches for the same sort of tuple  but specifies that the value in its second field  must match the current value of x     C Version Fortran Version    ini ocoor     xe Ty   in  coord   x  Ty     The rd operation functions identically to in except that it does not remove the matching  tuple from tuple space  For example  the following rd operation will attempt to match the  same kind of tuple as in the examples with in  except that this time the value in the  second field must be 3     Linda User Guide 21    C Version Fortran Version  rd  ooord   3  Tyl  rdi eoord   3  Ry     When the rd operation successfully matches a tuple  the value in its third field will be  assigned to the variable y  The tuple will remain in tuple space  available for use by other  processes     out    The out operation adds a tuple to tuple space  Prior to adding it  out evaluates all of its  fields  resolving them to actual values  out returns after the tuple has been added to tuple  space     For example  the following out operation places a    coord    tuple into tuple space     C Version Fortran Version  out   eoord      3  10  ceuta  ecard  3  105    If any of the fields in an out operation contain expressions  they are evaluated before the  tuple is placed into tuple space  For example  this out operation will com
186. ple space to a file  named    program N dump  where program is the application name and N is a  integer incremented each successive save operation     Linda User Guide 125    126 Linda Usage and Syntax Summary    7    Sample Programs    This chapter contains five sets of example programs that demonstrate how to use Linda  to parallelize serial programs  Each example includes the serial version of a program  written in C and a parallel version created with C Linda  The examples are presented in  order of increasing complexity     Array Assignment    Serial Version     KR k k kok k oko k oko k oko k oko k kok I kok k k ok k oko k Koko RRA RARA k kok RRA RRA kkk kkk kk kok ck kkk ck ck ck k kk                      Serial Example   Array Assignment   C Version   PILhE array c   OTHER FILES  make array c   DESCRIPTION   In this simple example  an array is initialized and values assigned   In the parallel version the master task initiates numtasks 1 number of  worker tasks  It then distributes an equal portion of an array to each  worker task  Each worker task receives its portion of the array  and       performs a simple value assignment to each of its elements  The value   assigned to each element is simply that element s index in the array l   Each worker task then sends its portion of the array back to the master  task  As the master receives back each portion of the array  selected    elements are displayed   XKOKCKCKCKCKCKCK Ck kCk Ck kCk Ck kCkCK Ck kCKCk kCk Ck kCKCk kCkCk 
187. processes  This can be used to filter  out unwanted tuples or processes     color color Change matching items to the indicated color  Co or must be on one of   red  orange  yellow  green  blue  indigo and violet  On  monochrome displays  the colors are mapped to distinct black and white  pie shaped icons     save Dumps the contents of tuple space to a disk file  This is equivalent to  clicking on the Save button during program execution  Save operations    are not legal when the Dynamic Tuple Fetch option is in effect     Here are some examples of complete TDL statements           if   linda_op    out   then color red  if   process  lt  5  or  process  gt  7   then hide  if   linda op    out  and  field 2    0   then break    The first statement turns the process icons for processes performing out operations the   color red  The second statement hides all process icons except those for process numbers  5 and 6  The final statement causes execution to pause whenever a tuple is placed in tuple  space whose second field is nonzero  Note the syntax of the TDL statements  including   both the parentheses and square brackets which must surround conditions     Linda User Guide 103    104 Using Tuplescope    6    Linda Usage and Syntax Summary    Linda Operations    in s     rd s     rdp s   amp  inp s     eval s     out s     Withdraw a tuple matching s from tuple space  If no matching tuple is  available  execution suspends until one is  If more than one matching  tuple exists  one
188. pute the value of  the function    with the argument x  and then place a tuple with that value in its third field  into tuple space  the second field will of course contain the current value of x      C Version Fortran Version  out  Tepora  x      se   4 Cle  OOS e  RL     The evaluation order of the fields is not defined and cannot be relied upon  For example   in the following C Linda out operation  x may or may not be incremented before it is  used as the argument to the function f     C Version Fortran Version  out  coerd   FE  El   outst    coord     oz EXI     Similarly  in the Fortran Linda version  there is no way to determine which routine will be  called first should both    and g modify x     These types of constructions should be avoided     No specified action is taken 1f tuple space is full when an out operation attempts to place  a tuple there  Current Linda implementations will abort execution and print a diagnostic  message  Typically  such events are treated by programmers along the lines of a stack or  heap overflow in conventional programming languages  the system is rebuilt with a larger  tuple space  Future Linda versions may raise exception flags  Under TCP Linda  the size  of tuple space is limited only by the total available virtual memory of all participating  nodes     eval    As we saw in the previous chapters  an eval operation creates a process tuple consisting  of the fields specified as its argument and then returns  Here we ll go into more det
189. r an  N atom molecule is approximately N  N 1  2  and so we want each worker to do about     N   N 1  2    num workers    of them  If there are N atoms  there are N  possible interactions  but we throw out pairs  where an atom is interacting with itself  The factor of 1 2 comes from the fact that the  interactions are symmetric  A s effect on B is equivalent to B   s effect on A  and thus only  need to be computed once  We should also remove bonded pairs  but if this number is  small compared to the total number of interactions  then our formula is a reasonable  approximation to the ideal number of interactions per worker     In some cases  the total work can just be divided as evenly as possible among the  wotkers  Here  we might like to just give each worker the number of interactions we  computed above  cotrecting the last worker s amount for any remainder and other integer  rounding we had to do  However  the interaction pairs are actually stored as rows in a  lower triangular matrix  a matrix with all of its non zero elements on or below the  diagonal   and for maximal efficiency  we want to give only complete rows of the matrix  to each worker  Given this constraint  it is necessary to figure out how many rows to give  each wotker so that the number of elements each one gets comes out about the same   and close to the target number      To do so  the program uses the fact that there are I non zero elements in row I of a lower    triangular matrix  and assigns consecutiv
190. r vel s         Te   temperature  str kinetic energy str vel                verl_scale  coor  sqrt  T Te          for  i20  i  lt  NSteps  i       verl step str coor vel s            Te   temperature  str kinetic_energy  str  vel             verl_scale  coor  sqrt  T Te          exit  0       The key routines here are verl init  which performs the setup work  and  verl step  which oversees the calculation at each time step  The actual computations  are performed by routines called from these functions  As we ll see  it is in the latter that  the major changes for the C Linda version appear     Linda User Guide 87    88 Case Studies    The following diagram illustrates the structure of the parallel version of the molecular  dynamics program  The master still does much of the initial setup work  but one step   summing the special charges  is divided among the workers  Once the master has  gathered and processed the workers    results from the setup phase  the main calculation  loop begins  Some parts of it remain with the master  in fact  the nonbonded interactions  so dominate the execution time that it is not worth parallelizing any of the other steps     Structure of the Parallel Molecular Dynamics Program       eval  workers    Read data worker  from disk setup  in task  amp   globals   Y  compute  charges   Y   out charges      Setup      seq  setup    out globals    outtasks            in 8 sum  changes         i Tuple  Repeat Calculate Space  NSteps bonded Repeat  times 
191. rent value can be set  for it for every application program and node combination  For example  you can specify    Customizing Network Execution    a different working directory when running program big job on node zoZere and on  node chaucer  and you can specify different working directories for programs big job and  medjob on node molere     In contrast  the resource speedfactor is node specific  meaning that you can specify a  different value for each potential execution node  but not for node application program  combinations  the value for a node applies to all TCP Linda applications that run on it   Such resources usually specify intrinsic characteristics of a node which don   t depend on  the application program being run  For example  speedfactor specifies how fast a node is  relative to other nodes in the network  something which is relatively constant across  different applications  at least in theory      Finally  the resource maxprocspernode is an application specific resource  meaning that  its value can be specified separately for different application programs  The value set for  an individual application is used for whatever nodes it may execute on  This resource  specifies the maximum number of processes per node that can be run for a given  application program  the default is 1     Here are some example entries       some sample  tsnet config file entries    ntsnet hello_world moliere rworkdir   tmp             ntsnet hello_world maxprocspernode  1          n
192. rgc  argv      Ray Tracing    read input file      Initialization    if  worker   return     startpici      Start new picture     More setup   raytrace       The third argument is a flag indicating whether the caller is a worker process or not  a  value of 1 means it is a worker   This flag is used to ensure that the worker exits at the  proper time  and the remaining initialization steps are executed only at the beginning of    the job t    A few lines were also added to parse_options to handle additional options specifying  the number of workers to use for the parallel version     rayshade main still ends by calling rayt race  No changes were made in that  routine  but a lot was done to the support routine do_trace that raytrace calls     do_trace becomes the master process in the parallel version of Rayshade  It   s  important to note that the master does not always need to be the top level routine  real_main or even the original main function  any routine can become the master  process     Here is do_trace     do trace        out   JitSamples   JitSamples     out   TopRay   TopRay    for  i   0  i lt  Workers  i          eval   worker   worker       out   scaninfo   StartLine     for  y   StartLine  y  gt   0  y       ini  result   w  Y out bufi    outline out buf      The new version begins by placing two parameters needed by lower level routines into  tuple space  The worker processes will read them and pass them along to the routines  that need them  The function then 
193. rue  then files of these names  will be copied to the remote nodes prior to execution  Otherwise  they must already exist  there  in the proper directory      The command line options  suffix  suffix may also be used to specify the value of the  suffix resource for the current run     ntsnet Worker Process Scheduling    Specifying the Working Directory for Each Node    The working directory used for program execution on a remote node is also mapped  from the current directory on the local node if the rworkdir resource  application and  node specific  is set to parallel for that node  Alternatively  an explicit working  directory can be set for an application  node  or application node combination by giving  an explicit directory as rworkdir s value     The  p option can be used to specify a remote directory for both rexecdir and rworkdir in  a single option  It overrides any other method of setting either resource  including other  command line options  Note that the old tsnet command requirement of including  p  with  cwd as its option is no longer necessary  If the current directory is NFS mounted  on all desired nodes  using the same path everywhere  then a command like this       ntsnet hello world    will work properly  even assuming no settings have been made via the ntsnet  configuration file   It will run the executable hello world  located in the current  directory on all relevant nodes  by default  the nodes in the file tsnet   nodes in the  current directory if no 
194. s closed tuple window   open windows grayed out   click to open     X tuple space partition Dog X le space partition Que X tuple space partition Que    sl   globals  INT  INT  ZI  task  INT  gj  result  INT     window sizing box                       Ex blocked process after  m pocese locker A    g completing an out Li  o 9   9999909    process after    procen blocked A 22 s    9 successful in  process after    successful rd  tuples  click to view        Tuple class windows    The Tuplescope display consists of two parts  the control panel window and separate  windows for each tuple class  tuple classes consist of the compiler generated distinct  partitionings of tuple space  based upon tuple field types and any unique constant  strings   Note that the display above is deliberately artificial and was constructed to  present all of Tuplescope s features in a compact form rather than to represent any real   or even possible  situation     The Control Panel    The control panel is a separate Tuplescope window that contains these items       The name of the executable  appearing on both the window title bar and in a box  in the upper left of the display       A row of menu buttons along the top of the window  labeled Modes through Quit   Click a menu button to display its associated menu  if any  or perform its  designated action        Aslider control bar in the upper right corner of the window that has a sliding bar  that moves along a tick marked area  The default position for
195. s described below until a sufficient number of them are running     Linda User Guide 61    62 Using TCP Linda    Technically  the processes started by ntsnet are known as eva  servers  An eval servet is a  process started by ntsnet that waits to execute a Linda process  The process on the local  node is known as the master in this context  and it too eventually becomes an eval server   The eval servers on remote nodes  and any additional eval setvers on the local node  are  known as workers  Note that this terminology is independent of the master worker  concepts within the application program  When used in this chapter  the terms master  and worker refer exclusively to the initiating process on the local node and all other eval  servers started by ntsnet  respectively     How many worker processes are desired is controlled by the maxworkers and  minworkers resources  application specific   The default values are the number of  distinct nodes in the nodelist resource  not counting the local node  for maxworkers and  1 for minworkers  The master attempts to start maxworkers workers when execution  begins  if at least minworkers eventually join the execution group  the job proceeds   Otherwise execution terminates     More specifically  execution will begin according to the following rules       ntsnet will attempt to start up to maxworkers worker processes  Until the time  specified by the minwait resource  execution will begin immediately whenever  maxworkers workers have joi
196. s from each worker          for  wrker   0  wrker  lt  nworkers  wrker       in   pr results   round nbr   pi recv       keep running total of pi       pisum      pisum   pi recv          end for ea worker loop             Calculate the average value of pi for this iteration       pi    pisum      homepi    nworkers 1       1 includes master s calc             Master calculates the average value of pi over all iterations     avepi     avepi   round nbr    pi   round nbr   1    prince    After  3d throws  average value of pi    10 8f n     DARTS    round_nbr   1   avepi         end for each round loop       return  0             KOR k k kok k oko k ok ko k kok k kok kok ok RARA RR RR A I A A k kok k I OI I eK kk      Start of Worker      Task    Fe AK RARA RR RR A k oko k kok k Koko Koko k kok k kok kok k K kk kk kk k kk ke ke ke ke e e ke x      int worker int id                local variabl    es          double my_pi   int round      Set seed for    srandom  id      random number generator equal to task ID             Calculate pi  for  round   0     using dartboard algorithm     round  lt  ROUNDS  round         my_pi   dboard DARTS     out   pi_results   round  my pi      return 0       KKK KK RARA RAR RAR ok ok oko kok ok k oko RR RRA I A k kkk kok ck kok ck kck ck kok ck Koko k kk      dboard    KOKCKCKCKCkCkCKCk kCkCk kCkCk A kCk Ck kCkCk KCkCk kCk Ck kCK Ck k Ck kCkCk kCk Ck kCk k Ck kCk Ck kc k Ck kck ck kok ck kck ck ckck I KK      fdefine sqr x    x   x      long ra
197. s the default     114 Linda Usage and Syntax Summary    The ntsnet Command     useglobalconfig     useglobalmap     useglobalmap     v  v     vv  vv     verbose     verbose     veryverbose     veryverbose    This option causes ntsnet not to use the resource definitions in the  global configuration file  ntsnet will only use the command line options  and the user s local configuration file if one exists  This is useful if a user  does not wish to use the configuration file installed by the system  administrator     This option causes ntsnet to use the global map translation file  The  translations in the global map translation file are used in addition to the  uset   s local map translation file if one exists  This is the default     This option causes ntsnet not to use the global map translation file   ntsnet will only use the user   s local map translation file if one exists  This  is useful if a user does not wish to use the map file installed by the  system administrator    Synonymous with  verbose  verbose     Synonymous with  veryverbose  veryverbose     This mode displays remote commands being issued and other  information useful for debugging configuration and map files     This option turns off verbose mode  This is the default     Turns on very verbose mode  which produces the maximum amount of  informational status messages     This option turns off very verbose mode  It is the default      wait minwait  maxwait      workerload  oad    This option specifies the mini
198. s the value of the  LINDA PROCS environment variable  or the value 6 if it is not defined     The clc and flc Commands    Command Syntax    cle  options  source files  fle  options  source files    clc expects source files to be of one of the following types   c   cl   o    10  flc expects  source files to be of one of the following types           1  0    10  By default  these  commands produce the executable a out  override this name with  o   The following  figure illustrates the compilation and linking process     Linda User Guide 107    Command line  clc  o sundae choc cl straw lo vanilla c    shell script   calls cpp     clc  parser executable    Compilation       executable   aka  the Linda engine      ac    _Itxxx o vanilla c  l i       E   Itcp es support lo    Itcp es support d0 Linda Runtime    Support  Itcp linda a        i etc     a 7 shell script 4  eo  linda_cc_link  calls cc  amp  Id                i    C Linda Compiling and Linking Process          Linking    108 Linda Usage and Syntax Summary    The clc and flc Commands    Gumurj     0Je  e sqnjs  e epui  do   uoddns  eumuny epur        UeJ110J epul             yxxxu     ueJ140J epuil    Fejiuea    uonejiduo3    J eTTTUPBA OT  Me13s TI 00UD eepuns o        OTF     eui  PUBWIWIOD    ss9901g Buryur  pue Buijiduo3 epurT1 ue140 4    yu ue0j epul       z  jeue    i    9p02 UeILIO4  ejiduioo       o  ui   vu      o xxxe   48    op woddns se day  o  uoddns se doy       ueJjuoJ jpsod    a    109    Linda User Guide 
199. se with Linda is the distributed data structures using virtual shared memory     All interprocess communication is accomplished via this global data space  Processes  never explicitly send messages to one another  but rather place data into the shared data  space  When another process needs that data  it obtains it from the shared data space   Linda handles all data transfer operations  so the program need not worry about the exact  mechanism by which it occurs  Linda operations require no additional overhead over any  message passing scheme  and in fact sometimes are more efficient     Programs using the distributed data structure method often use a master worker  computation strategy  Under this approach  the total work to be done by the program is  broken into a number of discrete tasks which are stored in the global data space  One  process  known as the master  is responsible for generating the tasks and gathering and  processing the results  Actual program execution involves a number of component  processes known as workers  Each worker removes a task  completes it  and then grabs  another  continuing until some condition is met  For example  it may encounter a special  type of task   known as a poison pi     telling it to die  or it may be terminated by some  other mechanism  Depending on the situation  the master process may also perform task  computations in addition to its other duties     8 Overview of Parallel Programming and the Linda Model    Approaches to Paral
200. sing its menu item   The check mark will then disappear  and the other formats will become active  You may  then select the desired format  Exit from this menu by choosing the Exit Aggregates  Menu item     What format a tuple containing an aggregate uses depends on the Dynamic Tuple Fetch  setting on the Modes menu  If Dynamic Tuple Fetch is active  then an aggregate is  displayed in the current Aggregates menu display format when you click on its tuple icon   If Dynamic Tuple Fetch is not in effect  then the format that was in effect when the tuple  entered tuple space will be used regardless of the current setting     Viewing Process Information    Clicking on a process icon displays a scrollable file window containing the text of the  source file corresponding to it   Tuplescope requires source files to be in the current  working directory   The line containing the tuple space operation corresponding to that  icon is indicated by a caret        Scroll bars or the keyboard scrolling keys can be used to  examine the source file  To close the window  click the left mouse button anywhere  within it     Tuplescope Run Modes    Tuplescope has a number of different modes for program execution  First  execution  speed can be controlled with the slider control on the Tuplescope control panel discussed  previously  This is independent of the other run controls we ll look at in this section     Clicking on the Run button will commence program execution  The program will execute  in 
201. ss continues until maxworkers processes are started or until no qualifying node can  be found  The goal of the scheduling process is to minimize the load on the maximally  loaded node     Here are some sample ntsnet configuration file entries showing the uses of these  resources     Tsnet Appl getload  true       Tsnet Appl loadperiod  10      use the default speedfactor of 1 for these systems    Tsnet Node slowmach  priestley pinter      fastguy is 5 25X slowmach    Tsnet Appl fastguys  sand stahl      goethe has 2 heads and gogol has 4  each is 1 5X slowmach    Tsnet Node multiprocs  goethe gogol    Tsnet Appl nodelist   slowmach  fastguys  multiprocs      scheduler parameters    Tsnet Appl masterload     5          Tsnet Appl workerload  1      maxprocspernode is set high so gogol can get a lot    Tsnet  Appl maxprocspernode  8    Special Purpose Resources    Tsnet stahl speedfactor  5 25    Tsnet sand speedfactor  5 25       Tsnet goethe speedfactor  3       Tsnet gogol speedfactor  6    This file attempts to set speedfactor values for the various nodes reflecting their relative  CPU capacities  The maxprocspernode resource is set to a high value so that the  multiprocessor system with 4 CPU   s can run up to two workers per processor     Special Purpose Resources    ntsnet provides several resources for use with various optional features which will be  described in this section     Tuple Redirection Optimization    By default  the TCP Linda system uses  up e redirection 
202. starts Workers worker processes  each running the  routine worker        t While this program structure  in which the workers each call the main routine  is unusual  the  technique of having workers repeat the setup code is not  because it is often faster     Linda User Guide 75    76 Case Studies    Once the workers have all started  do trace creates the task tuple    scaninfo   As in the  Freewake example  Rayshade uses a counter tuple to indicate which task   in this case   which scan line   should be done next  Here the counter is initially set to the maximum  scan line number to be computed  each worker will decrement the counter as it grabs a  task  and when the counter falls below 0  all tasks will be complete     The second for loop gathers completed scan lines from tuple space  again counting  down from the maximum scan line number  and sends each one to the output file  The  lines need to be gathered in order so that they are placed into the file in the proper  location  they may not artive in tuple space in that order  however  so do_trace may  spend some time blocked     The last piece needed for the parallel version of Rayshade is the worker function     worker        rd  comm args    7 args  arge    for     i   0  i  lt  afgo tTf i   argv i     char     argsti    rayshade main argc  argv  1    rd  lopRay   7 TopRay         rd  JitSamples     JitSamples    Set sampling parameters   while  1      in  scaninte   7 Y     out   scaninfo   yal     if  y  lt  0    brea
203. string  In this case  the entry has the effect of setting up equivalences among the listed set  of remote directories     Wildcards are allowed in map translation file entries       The asterisk character     can be used for any node name or as the first    component of a node name  e g      com        The ampersand character   amp   substitutes the current node name   at the time and    in the context in which the translation is taking place   within a directory    pathname  It can be used in either generic or specific directory specifications     See the discussion in Chapter 4 for full details on map translation file entries     Environment Variables    The following environment variables are used within the Linda system     DEBUGGER             LINDA CC    LINDA CC LINK    LINDA CLC    LINDA FLC    LINDA FORTRAN    Specifies the debugger to use when combining a native debugger  with Tuplescope  The default is dbx  xdb under HP UX      Used by the 1inda cc shell script  specifies the C compiler to use  for compiling  c files  defaults to cc      Used by the linda cc link shell script  specifies the command  to use for linking the executable  defaults to cc      Used by the C Linda compiler  specifies which type of executable to  build  linda tcp or cds  Code Development System      Used by the Fortran Linda compiler  specifies which type of  executable to build  linda tcp or cds  Code Development  System      Used by the 1inda fortran shell script  specifies the Fortran  c
204. structure     p      Structure fields are one way to create tuples containing one record of data  For example   the following loop retrieves NREC records from a database and places them into the    tuple space  Each record is identified by the integer record number in the tuple s second  field        ine dos       struct REC Ss           for  150  i  lt  NBEC  i    4  get next rec  amp s    Guti  record   iy 8         Structures may also be used as actuals in templates   in  structure   t    In this case  the structures in the tuple and template must have the same structure tag  the  same size  and they must be identical on a byte by byte basis  including values in any  padding bytes  When using such constructs  be careful to take structure padding into  account     An atray of structures is treated just like any other array     int len   struct STYPI             se 20       20     dub  struct stray   si      matches  sets l  n   10     int struct array    t len         34 Using the Linda Operations    Specifying Tuples and Basic Tuple Matching Rules    Varying Length Structures    You can use the colon notation with structures to specify them as varying  In this case   the length is taken to be the size of the structure in bytes  This construct was designed for  structures having varying arrays as their last elements     Here are two examples        struct STYPE      double guy Dy es  int buf_len   int bwt  1    V   int bytes    STYPE  s           declared struct length     
205. t   double   struct   union     unsigned    int   long   short   char      Timing Functions        The C Linda function names ate listed below     start timer      Initializes and starts the stopwatch in the current process  A separate call  to start timer is required in each process where timing is desired     timer split string     print times      Support Functions    Takes a stopwatch reading when called and labels the time split with the  specified string  length    32 bytes   The maximum number of timer  splits is 32     Prints a table listing all time splits executed so far for this process  Each  row includes the time split and its associated string     The C Linda function names are listed below  The Fortran Linda versions have an f  prepended to the C Linda function name      fllexit status      f lhalt status      f lintoff       f linton      106 Linda Usage and Syntax Summary    Replacement for the C exit function  The lexit routine allows an eval  process to abort the execution of the routine invoked through the eval  operation but still continue as an eval server  The status value  int   passed to lexit is placed into the corresponding field of the live tuple   subject to typecasting restrictions      Terminates execution of the entire Linda application  not just the local  process   after calling any termination handlers specified by lonexit  see  below   Provides the exit value returned by ntsnet     Blocks the interrupts associated with tuple space handling 
206. ta tuple is placed into tuple space   While it is not literally true that an eval creates the processes that evaluate its arguments   it can be helpful to think of it this way     For example  this eval statement will result in a process being created to evaluate its third  argument  f  i      C Form Fortran Form  eval  test   i  f i    eval   test   i  f 1      evals ate often used to initiate worker processes  as in the following loop     Linda User Guide 11    C Form Fortran Form  for  i20 i  lt  NWORKERS  i    Do 5 I 1 NWORKERS  eval  worker   worker     eval  worker   worker             5 Continue    This loop starts NWORKERS worker processes  In this case  the primary function of the  eval is simply to start the process  rather than to perform a computation and place the  result into tuple space  Formally  howevet  when each worker finishes  a tuple of the    form   C Form Fortran Form    worker   0    worker   0     is placed into tuple space  assuming that the workers terminate normally and adhere to  the usual UNIX retutn value convention of zero for success      The other two operations allow a process to access the data in tuple space  A rd operation  reads a tuple from tuple space  and an in operation removes a tuple from tuple space     Both rd and in take a template as their argument  A template specifies what sort of tuple  to retrieve  Like tuples  templates consist of a sequence of typed fields  some of which  hold values  such fields are known as actuals    
207. te     Here is a sample run     Program   hello world 8    Execution o  world from number       lo  world from number       lo  world from number       lo  world from number       lo  world from number       lo  world from number       lo  world from number        gt  Aw a  amp  O EG                lo  world from number       I  0000000 0           g  oO    lo_world is finished        It is to be expected that the messages from the various processes will display in  non numerical order since we   ve done nothing to force them to display sequentially   Linda programs are essentially asynchronous  and there is no guarantee that a particular  process will execute before any other  Indeed  we would not want to do so  since we   re  trying to achieve a simultaneously executing parallel version of hello world     To run on a network  the program is compiled and linked in essentially the same way  but  running it requires using a slightly different command  For this version  we might want to    add the node name to the output line from each worker     C  gethostname  name  20    printf  Hello  world from number bd running on  s Mn  i name      Shown below are the commands to create and run the modified program  C Linda  version is shown      20 Using the Linda Operations    Program  Execution with  TCP Linda    Linda Operations      clc  o hello world hello world cl    ntsnet hello world 8   Hello  world from number running on moliere   Hello  world from number running on ibsen   He
208. th  left hand  neighbor     if  first    1          r    out   RtoL   left  iteration  values 1     in  LtoR  id  iteration   values 0         end if             Exchange data with  right hand  neighbor     if  firsttnpoints 1 l  tpoints  1  out  LtoR  right iteration  values npoints     in  RtoL  id iteration   values  npoints 1          end if                Update points along line     for  j 1  j  lt   npoints  j                        Global endpoints     if   first j 1    1      first j 1    tpoints      newval j    0 0      else       Use wave equation to update points     newval j     2 0   values j     oldval j      sqteu    values j 1       2 0   values 31     values 3 1             end if          end npoints for       for  j 1  j  lt   npoints  j         oldval j    values j    values j    newval jl     148 Sample Programs    Concurrent Wave Equation            end nsteps for loop     return  0                    Receive results from workers and print       mf   int get_results  int tpoints  int nworkers        local variables     int  ly jy K  tpts  first  Length   double results 1000                  Store worker s results in results array     for  i 1  i  lt   nworkers  i    1     Receive number of points and first point     in   results      1   first   resultsilength       put it into the results array     first   first     1     adjust first to array starting at U          j 0   for  k first  k  lt  length   first  k    j       values k    results jl        
209. that divides the work into  many tiny tasks is said to be fine grained  while one that divides the work into a small  number of relatively large ones is called coarse grained     There is no absolute correct level of granularity  Neither coarse grained nor fine grained  parallelism is inherently better or worse than the other  However  when overhead  overwhelms computation  a program is too fine grained for its environment  whatever its  absolute granularity level may be  The optimum level depends on both the algorithms a  program implements and the hardware environment it runs in  A level of granularity that  runs efficiently in one environment  for example  a parallel computer with very fast  interprocessor communication channels  may not perform as well in another  such as a  network of workstations with much slower communication between processors      There are two ways to address this issue  and we ll look at examples of both of them in  the course of this manual   First  many problems offer a choice of granularity level  For  example  if a program must execute eight independent matrix multiply operations  a  parallel program could perform all eight of them at the same time  or execute eight  parallel matrix multiplication operations one after another  Which approach is correct  depends on the structure of the overall problem  and each is better than the other in  some circumstances     The other solution is to build adjustable granularity into programs so that they can
210. the  high resource is false  Linda internodal communication is throttled so    Linda User Guide 117    kainterval    kaon    lindarcparg    lindarsharg    loadperiod    masterload    maxnodes    that it does not flood the network and thereby degrade the performance  of the network Linda application and other network users  For small  networks  2 4 nodes  specifying high as true will probably not make a  difference  On large networks  specifying high as true  and thus asking  the Linda kernel not to throttle internodal communication  may cause  the network to flood     Specifies how often  in seconds  each Linda process sends out a keep  alive message  The default is 100 seconds  The range of legal values is  100 to 31536000  one year   The range is silently enforced  This  resource is only useful when the keep alive mechanism is used  that is   when kaon is true     Specifies whether of not the keep alive mechanism is used  The default  is true unless debug is true  in which case it is false     Specifies a string to be passed to the linda rcp shell script called by  ntsnet to distribute executables to remote nodes  This resource provides  a hook enabling the user to change the behavior of the shell script   which can itself be modified by the user   The default implementation  of linda_rep  located in the Linda bin subdirectory  takes no arguments  and so ignores the value of this resource  This is a node specific  resource     Specifies a string to be passed to the linda rs
211. the particular data  records are is less important than the general issues all such searches raise  many of which  are applicable to other types of problems as well  Readers wanting a more rigorous and  specific treatment of this topic should consult Chapters 6 and 7 of How to Write Parallel  Programs by Carriero and Gelernter     This case study discusses the following techniques and issues     Distinct task tuples   Using watermarking to avoid overwhelming tuple space   Task ordering to aid in load balancing            e     Dealing with unequal task sizes     Here is a simple sequential program  in Fortran  to search a database   Program DB S  arch    Call Get target  target    10 Call Get next DB  rec   If  rec  EQ  EOF  go to 100  Call Compare  target rec result                                Call Process  result   Goto 10       100 Continue    This program compates a target record  or key  against records in a database  The  following discussion assumes that the operation of comparing two data records takes a  substantial amount of time  Many such database applications exist  including ones  designed for DNA sequence searching  credit application retrieval  and many other sorts  of complex string matching     The program hides all the details of the operation in its component subroutines   get_next_record retrieves another record from the database  compare compares  the record to the target  and process takes compare   s result and keeps track of which  records have mat
212. the right will  create 100 concurrent processes to evaluate the function f for each i value  As each  process finishes  the resulting data tuple will go into tuple space     eval   s Inherited Environment    In C Linda  the environment available to evaled functions consists solely of the bindings  for explicitly named parameters to the function  Static and global initialized variables in  the function are not currently reinitialized for the second and following evals and thus  will have unpredictable values  The created process does not inherit the entire  environment of its parent process  Thus  in the preceding eval example  the environment  passed to the function f will include only the variable x     Under Fortran Linda  created processes inherit only the environment present when the    Linda program was initiated   with no modifications due to execution of user code  In  many implementations  this is achieved by saving a clean copy of the program image  from which to clone new processes           Some distributed memory Linda systems satisfy these semantics exactly only when the number  of evals does not exceed the number of processors     Linda User Guide 23    Consider the following program structure     Block Data   Integer val   Common  params val  Data val  5    End       Subroutine real main  Integer val  Common  params  val    val   1  Call     3   eval   worker      3               Return  End       Subroutine F I   Integer val  Common  params  val    Return  End
213. these topics     Program Preparation    Linda programs must be compiled with the  linda tuple_scope option to use Tuplescope   The environment variable LINDA_CLC or LINDA_FLC should also be set to cds  for  Code Development System   For example  the following commands will prepare the    C Linda program test24 for use with Tuplescope      setenv LINDA CLC cds  clc  o test24  linda tuple scope test24 cl    o  oe    Invoking Tuplescope    Once an executable has been prepared for use with Tuplescope  simply invoking it will  start the debugger  For example  the following command would initiate a Tuplescope  session with the test 24 application we prepared above     9      test24 arguments    Of course  this command would need to be run from a shell window within an X  Windows session  If desired  you may also include X Toolkit options following the  program arguments to customize the resulting Tuplescope windows        T    f you use the Bourne shell  the equivalent commands are  LINDA CLC cds  export  LINDA CLC     Linda User Guide 95    The Tuplescope Display    Below is a canonical diagram of the Tuplescope display  note that exact window  properties and look depend on the window manager in use      Tuplescope control panel    executable name          Jak   Hodes   arme   Run   Break   Contre   Debug   Seve   Gus  EBM  L J                    T  process starting up menu buttons            run speed slider         globals l task   result  worker                tuple window icon
214. tiation Delays    ntsnet initiates worker processes by running the rsh command in the background on the  local node  creating each successive process as soon as the previous command returns   Under some unusual network circumstances  such a procedure can overwhelm a network  server process and result in errors  The delay resource is provided to handle such  situations  It specifies the amount of time to pause between successive rsh command    Linda User Guide 67    initiations  in milliseconds  the default value is 0   If you experience such problems  try  setting its value to 1000  1 second   The  delay command line option may also be used to  specify the value for this resource     Appropriate Granularity for Network Applications    A network environment often has a significantly different granularity profile from other  parallel computing environments  such as shared or distributed memory multiprocessors   because its communications speed is so much slower  This is the result of the differing  communications bandwidth achievable via Ethernet and typical hardware interconnects in  parallel computers  In general  networks impose relatively high communications  overhead  and parallel programs need to be relatively coarse grained to achieve good  performance     On a typical 100mbs Ethernet network  an out in combination takes approximately 500  microseconds  and maximum throughput is about 10MB per second  Thus to out and  then in a tuple of size S takes about  S 10000000    
215. to specify two rules  for each host  to and from a generic value  known as a generic directory  A generic directory  is a string   often a directory pathname   used essentially as a key into the various entries  in the map translation file  The generic directory serves as the target for specific local and  remote directories as ntsnet attempts to translate them for use on various nodes     Map translation file entries use the following commands   mapto Map a specific local directory to a generic directory     mapfrom Map a generic directory to a specific directory  usually on a  remote node      Linda User Guide 53    54 Using TCP Linda    map Eguivalent to a mapto and a mapfrom  mapping specific  directories to and from a generic directory     Here is the general syntax for a map translation file entry     mapverb generic dir    node name   specific dir   node name   specific dir     The generic directory is always translated to a specific directory before being used as a  location for executables or as the working directory on any node  Thus  there is no  requirement that it even exist as a real directory path  In fact  it is possible to use any  arbitrary symbolic name as a generic directory     aurora     home          NFS mounted directory                    ae    as      t s vi  a    a e  chaucer 2   blake  lu a    home  a e  4       e       e    a  a  a  a    E  d    degas ES at erasmus  Amp z ox  mf  a A ot  a  a  a          gt                  oa       a    a s      
216. tsnet moliere speedfactor  2    Lines beginning with an exclamation point     are comments  The second line sets the  working directory to  tmp on the node moliere when the application hello_world is  run there  The third line sets the maximum number of processes that can run on any one  node when the application hello world is executing to 1  and the final line sets the  speed factor for the node molere to 2   where the default value is 1   indicating that it s  about twice as fast as the norm     In configuration file entries  the program  application and node components  if used  can  be either the class name or a specific instance of a class  Class names   recognizable by  their initial capital letter   act as wildcards  stating that the entry applies to all instances of  the specified class  In this way  they can setve as default values for specific instances     specific applications and or nodes   for which no explicit entries have been created  The  following table lists the possible values for each of these three components of a  configuration file entry     Linda User Guide 47    48 Using TCP Linda    Item Associated Class Name Example Instance    program Tsnet ntsnet  application Appl user application program name  node Node node name  user defined node resource    Currently  nt snet is the only valid specific instance of the class Tsnet  so the two are  effectively equivalent  Thus  for every entry  the program component will be either the  class Tsnet ot its only
217. u  sqtau        dtime   0 3       1 0   dx   1 0    tau    e   dtime Y dx     Sqtau   tau   tau    newval i     2 0   values i     oldval i       isqtau    values ri 1     2 0   values il    v  lues 1 1        Linda User Guide 143       EA    Update the points a specified number of times          void update      int i  jy  tpts        update values for each point along string        update points along line     for  j   1  j  lt   tpoints  j          global endpoints     Xr  Xi    1     1j    tpolnts    newval j    0 0   else  do math j            for  j   1  j  lt   tpoints  j       oldval j    values j    jl newval jl     values j    J  print it out for validation       tpts    Epoints     10    Epoints  10    print  i tfirst 3d points  for validation    n    tots     for  i   07 1     tpEs  i    prantr  s4 2f     values i     prints  ya s               Main program          main      int left  right        get program parameters and initialize wave values     init param       init line          update values along the line for nstep time steps     update        return 0     144 Sample Programs    k    sy    Concurrent Wave Equation          HEE k k EEE EEE EEE EEE EEE EEE E E E FE HE EE HEE HEE HEE HE HE HE EE E ERE E E E E HE    SERIAL Wave Equation Makefile     FILE  make ser_wave c     DESCRIPTION  See ser_wave c     USE  mak f make ser_wave c   HEE EEE EEE EEE EEE EEE HEHEHE EEE EE HEE HEE HEE HEE HEE HEE HE HE HE EE ERE E E E E HE                                     
218. uple space by matching a tuple to a  template  not by  say  specifying a memory address or a position in a list  This  characteristic defines tuple space as an associative memory model     The examples so far have all used a string as the first element of the tuple  This is not  required but is a good practice to adopt in most cases because it makes Linda programs  more readable and easier to debug  It also helps the Linda compiler to easily separate  tuples into discrete classes enabling more efficient matching     The next chapter will look at more complicated tuples and tuple matching scenarios and  will present some simple examples of parallel programs using Linda     Linda User Guide 13    How and Where to Parallelize    This section contains a short discussion of how to find the computational kernels in a  program  It discusses the UNIX profiling utilities prof and gprof  The steps described  here are independent of Linda and are usually done before parallelizing the program   They are designed to help you determine where to focus your efforts within the original  application  This section by nature is introductory and brief  consult the relevant manual  pages and related works in the Bibliography for more detailed discussions of these  commands and profiling in general     To use the UNIX profiling utilities  the  p  for prof  or  pg  for gprof  options must be  included on the link command  Note that they ate not needed for compilations  only for  linking  For example
219. useglobalconfig sets the resource to false  and the global configuration file will be  ignored  The polarities of the plus and minus signs may seem counterintuitive at times   just remember that minus means on  which is the usual convention used by most X  applications     Note that all options which require parameters   for example  the command line option   maxprocspernode option we looked at earlier   are preceded by a hyphen  a prepended  plus sign has no meaning for them and will generate an error   Their values follow them   separated by a space     Linda User Guide 49    Not all resources have named command line options  The  opt option is provided so that  any resource   s value can be specified from the command line  It takes a valid  configuration file entry as its parameter  enclosed in double quotation marks       ntsnet  opt  Tsnet moliere available  no  hello world       A third type of resource enables users to create named node lists  Here are some  examples     Tsnet Appl fran  moliere gaughin voltaire pascal       Tsnet Appl eng  chaucer marlowe blake joyc       Tsnet Appl chem   sparcs  rs6k priestley dalton    Each of these lines defines a name for a list of nodes  The first line defines a list of nodes  to be associated with the name sparcs  for example  When a list name is used as a  component in another list  its name is preceded by an at sign  to indicate resource  indirection   as in the third line above  Up to 16 levels of indirection are allowed    
220. ve joined the execution  group  then execution will begin  Otherwise  execution will begin as  soon as minworkers workers do join or the maxwait interval has expired   the latter includes the time in minwait   If there are still not maxworkers  workers when the maxwait interval ends  execution will terminate     Specifies the minimum number of workers started for a given  application  The default is 1  Thus  the default minimum shall be a  master process and one worker  see maxworkers      Specifies whether or not workers on a specific node run nice   d  This  resource is node and application specific  The default is true  When the  high resource is set to true  this resource is ignored  When debug is true   its value is overridden to be false     Specifies the pathname of the file containing a list of nodes on which  this application can run  The default is t snet   nodes  If nodefile and  nodelist are both undefined  ntsnet shall look for the list of nodes on  which to run in the tsnet   nodes file in the current working directory   This is the default behavior and is backwards compatible with the old  tsnet utility  The nodefile resource is only used if the nodelist resource is  set to  nodefile  which is the default value  See the description of the  nodelist resource for more details     Specifies a space separated list of nodes on which an application may    run  The nodelist value can be set to any one or a combination of these  items  the key word  nodefile  a node na
221. xt   usually commands  to the operating system   typed by the user     Italic type is used for replaceable arguments in operation and command definitions  for  summaty lines and other descriptions within example code  and occasionally for  emphasis     Boldface sans serif type is used for Linda operation names used in a generic way within  normal text  for non varying text within syntax definitions  and for menu buttons and    other controls in The Tuplescope Display     Italic sans serif type is used for replaceable arguments within formal syntax definitions and  when they are referred to within the text     Linda User Guide 3    4 Introduction    1    Overview of Parallel Programming and  the Linda Model    People always are trying to make programs run faster  One way to do so is to divide the  work the program must do into pieces that can be worked on at the same time  on  different processors   More formally  creating a parallel program depends on finding  independent computations that can be executed simultaneously     Producing a parallel program generally involves three steps       Developing and debugging a sequential program  For existing programs  this step  is already done      Transforming the sequential program into a parallel program      Optimizing the parallel program     Of course  if the second step is done perfectly then the third one will be unnecessary  but  in practice that rarely happens  It   s usually much easier   and quicker in the long run   to  tr
    
Download Pdf Manuals
 
 
    
Related Search
    
Related Contents
IKEA gas Range User's Manual  会社紹介パンフレット (2014年2月版)  Dough It Yourself Cookbook  Rexel Budget A4 Pockets  CHAS - Scooter Clube do Brasil  GSE 675 Counting Scale User Manual - Avery Weigh  importanti indicazioni di sicurezza  カタログ  400 Clip Fast Access System User Manual, Accom  Franke Oven CR 910 M WH 48D    Copyright © All rights reserved. 
   Failed to retrieve file